Dear FluCoMa community,
I’m posting to describe a project that I’m working on in collaboration with @amgum. We want to “productionize” an ML workflow using the FluCoMa libraries (i.e. create a continuous training pipeline that runs in the cloud). The goal is to provide a recipe, including the infra, that can be shared and re-used by others for research purposes. It’s early, but I think this workflow will involve analyzing a corpus of sound in Google Colab and then providing a proof-of-concept to “ship it” on a platform like Vertex AI, so that uploading new sounds triggers the training workflow. The full pipeline is still a bit speculative, and we will need to reduce the scope to keep it realistic and feasible. We want a small-data version of something that could scale to big data. We would probably use @jamesbradbury’s Python bindings to bring FluCoMa into the more traditional Python-based data science ecosystem. We could use some help with the technical scoping.
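To make the "new sounds trigger training" idea concrete, here is a minimal local sketch of that one step, assuming the corpus is a folder of audio files and that a small JSON manifest records which files have already been processed. Everything in it is hypothetical and for illustration only: the names `find_new_sounds` and `run_pipeline`, the manifest file, and the `train` callback are placeholders, and it does not use Vertex AI or the FluCoMa bindings. In a cloud setup the same check would presumably be replaced by an upload event from the storage bucket.

```python
import hashlib
import json
from pathlib import Path

# Assumption: these are the audio formats in the corpus.
AUDIO_EXTENSIONS = {".wav", ".aif", ".aiff", ".flac"}


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_manifest(manifest_path: Path) -> set:
    """Load the set of digests already processed (empty if no manifest yet)."""
    if manifest_path.exists():
        return set(json.loads(manifest_path.read_text()))
    return set()


def find_new_sounds(corpus_dir: Path, manifest_path: Path) -> list:
    """List audio files whose content is not yet recorded in the manifest."""
    seen = load_manifest(manifest_path)
    new_files = []
    for path in sorted(corpus_dir.rglob("*")):
        is_audio = path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
        if is_audio and file_digest(path) not in seen:
            new_files.append(path)
    return new_files


def run_pipeline(corpus_dir: Path, manifest_path: Path, train) -> bool:
    """Call train(new_files) if there are unseen sounds, then update the manifest.

    `train` is a hypothetical placeholder for the real analysis/training step.
    Returns True if training was triggered, False otherwise.
    """
    new_files = find_new_sounds(corpus_dir, manifest_path)
    if not new_files:
        return False
    train(new_files)
    # Record the new files only after training succeeded.
    seen = load_manifest(manifest_path)
    seen.update(file_digest(path) for path in new_files)
    manifest_path.write_text(json.dumps(sorted(seen)))
    return True
```

Calling `run_pipeline(Path("corpus"), Path("manifest.json"), train=my_training_function)` on a schedule would retrain only when unseen audio appears. Hashing file contents rather than names means a renamed file does not retrigger training, which may or may not be what we want.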
@amgum is a data scientist, while I work in DevOps by day, and we’re working together within the structure of a professional peer mentorship. We want our work to serve as an example to others. In addition to sharing the notebooks and infra code, part of this project is to reflect on and showcase what makes, for us, a successful collaboration between Data Science and DevOps. @weefuzzy shared this video with me, which might serve as orientation.
This is a learning experiment for both of us. I’ve never done full-blown MLOps, and @amgum has mainly worked with financial data and things like that, not spectral time-series data. The specific inspiration for this was Alice Eldridge’s demonstration of her analysis of rainforest sounds.
I hope that’s enough context. I’m eager to share this with the FluCoMa community and hope we can get your support and encouragement.
Right now, specifically, we’re looking for public datasets (on Kaggle or elsewhere) of sounds of the natural environment, and of spectral data derived from them, to get our bearings on the data itself. If anybody has any pointers, would like to help, or wants to know more, please get in touch in this thread or by DM.