Hi guys, I’ve recently transitioned from “ML engineer” to “Data Scientist” (by that I mean I now have more sole responsibility for external products that use machine learning, which requires more direct communication with stakeholders and product people).
My first project wasn’t a complete waste, but it wasn’t a complete success either.
Some things weren’t obvious at the start, and they clearly hampered development of the ML part. The main one: the product didn’t yet have the data to support machine learning, even though it’s a great candidate for it later on.
What are some good explanations you have found of the prerequisites for needing/using ML?
How can we help teams without direct ML experience prepare for a time when they might actually need ML in their products?
Examples off the top of my head:
- Do you have enough data points to capture some truth about the distribution you’re trying to explain?
- Would you, as a human, be able to tell example A and B apart based on the data you have about them?
- Is your data in a structured form that a computer can work with? (Is it numerical or categorical, or can it be turned into structured data via image or NLP models? PS: turning it into structured data is probably a last resort.)
- Does your data already exist somewhere, or does it have to be collected from scratch?
++
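For the "can you tell A and B apart" point, one quick sanity check, sketched here in Python under the assumption that the features are categorical, is to group examples by their exact feature values and see how often identical inputs carry different labels. Within each group, the best any model can do is predict the most common label, so this gives an upper bound on accuracy for any model that only sees those features. The function names and the churn data below are hypothetical, just for illustration.

```python
from collections import Counter, defaultdict


def best_achievable_accuracy(rows, labels):
    """Upper bound on accuracy (on this sample) for ANY model using only these features.

    Examples with identical feature values are indistinguishable to a model,
    so within each group it can at best predict the majority label.
    """
    groups = defaultdict(Counter)
    for features, label in zip(rows, labels):
        groups[tuple(features)][label] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return correct / len(labels)


def conflicting_groups(rows, labels):
    """Return feature combinations that appear with more than one label."""
    groups = defaultdict(set)
    for features, label in zip(rows, labels):
        groups[tuple(features)].add(label)
    return {features: sorted(ls) for features, ls in groups.items() if len(ls) > 1}


# Hypothetical example: (device, plan) -> did the user churn?
rows = [
    ("mobile", "free"),
    ("mobile", "free"),
    ("desktop", "paid"),
    ("desktop", "paid"),
    ("mobile", "paid"),
]
labels = ["churn", "stay", "stay", "stay", "churn"]

print(best_achievable_accuracy(rows, labels))  # 0.8
print(conflicting_groups(rows, labels))  # {('mobile', 'free'): ['churn', 'stay']}
```

This only bounds accuracy on the sample you already have, not on new data. With continuous features exact duplicates are rare, so finding few conflicts there says little about separability.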
Really appreciate any thoughts or links on the subject!