ML Ops with Andrew Ng
These are my notes from a talk by Andrew Ng about Machine Learning Ops, and going from model-centric to data-centric AI.
AI Systems = code (model/algorithm) + data
We need to shift our mindset to improve our data in a systematic way to improve our ML outcomes.
Improving the quality of the data can be more impactful than tweaking our models.
Data is food for AI:
- Cooking is 80% prep (source and prepare high quality ingredients), and 20% action (cook a meal)
- AI, too, is 80% prep (source and prepare high quality data), and 20% action (train a model)
Lifecycle of an ML project:
- Scope project (decide what to work on, define it)
- Collect data (define and collect it)
- Train model (training, error analysis, iterative improvement): feeds back into collecting data
- Deploy in production (deploy, monitor, maintain system): feeds back into collecting data & training models
ML Ops: Making Data Quality Systematic
Make sure the data is labeled consistently. For instance, there are multiple ways to transcribe an audio clip:
- "Um, today's weather"
- "Um... today's weather"
- "Today's weather"
Any of these can be OK, but the labeling must be consistent across all of your data. If you include the "um" in one example, include it in every example.
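One way to enforce a single convention is to normalize labels programmatically. A minimal sketch, assuming a hypothetical convention of dropping the filler "um" and lowercasing (the specific rule matters less than applying one rule everywhere):

```python
import re

def normalize_transcript(text: str) -> str:
    """Apply one consistent labeling convention: drop the filler
    "um", collapse whitespace, and lowercase. (A hypothetical
    convention; the point is to pick ONE rule and apply it to
    every example in the dataset.)"""
    text = re.sub(r"\b[Uu]m\b[.,!?]*", "", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()

# All three transcriptions collapse to the same label:
for raw in ["Um, today's weather", "Um... today's weather", "Today's weather"]:
    print(normalize_transcript(raw))  # "today's weather" each time
```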
Take a scientific approach to labeling data. For instance, make sure labeling instructions are clear enough so that labeling across different labelers is consistent. Test it out with multiple labelers and revise your labeling instructions if you're not getting consistent results.
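To make "test it out with multiple labelers" concrete, you can measure agreement on a shared batch before labeling the full dataset. A minimal sketch using raw pairwise agreement (real projects often use chance-corrected metrics like Cohen's or Fleiss' kappa):

```python
from itertools import combinations

def agreement_rate(labels_by_annotator):
    """Fraction of pairwise (annotator, example) comparisons that
    agree. `labels_by_annotator` is one label list per annotator,
    aligned by example index."""
    total = agree = 0
    for a, b in combinations(labels_by_annotator, 2):
        for x, y in zip(a, b):
            total += 1
            agree += (x == y)
    return agree / total

# Three labelers label the same five audio clips:
ann1 = ["weather", "sports", "news", "news", "weather"]
ann2 = ["weather", "sports", "news", "sports", "weather"]
ann3 = ["weather", "sports", "news", "news", "weather"]
print(agreement_rate([ann1, ann2, ann3]))  # 13 of 15 comparisons agree
```

A low score is a signal to revise the labeling instructions and re-test, not to start labeling at scale.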
Model-centric view: Collect what data you can and develop a model good enough to deal with the noise in the data. Hold the data fixed and iteratively improve the code/model.
Data-centric view: The consistency of the data is paramount. Use tools to improve the data quality; this will allow multiple models to do well. Hold the code fixed and iteratively improve the data.
Error analysis: Identify the types of data the algorithm does poorly on.
Systematic process (training):
- Train a model.
- Error analysis.
- Get more of the data the algorithm is having trouble with (discovered in error analysis).
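The error-analysis step above can be sketched as tagging each misclassified example with a category and counting errors per category; `tag_fn` stands in for a hypothetical tagging scheme you invent while inspecting errors:

```python
from collections import Counter

def error_slices(examples, predictions, labels, tag_fn):
    """Count misclassified examples per tag. `tag_fn` maps an
    example to a category such as "background noise" or
    "accented speech" (categories you define during error analysis)."""
    errors = [ex for ex, p, y in zip(examples, predictions, labels) if p != y]
    return Counter(tag_fn(ex) for ex in errors)

# Toy example: the model struggles with noisy audio.
examples = [{"condition": "noisy"}, {"condition": "clean"}, {"condition": "noisy"}]
preds, labels = [0, 1, 0], [1, 1, 1]
print(error_slices(examples, preds, labels, lambda ex: ex["condition"]))
# most errors fall in the "noisy" slice -> collect more noisy audio
```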
Systematic process (deploy):
- Monitor performance in deployment and flow new data back for continuous refinement of model.
- Do this by systematically checking for concept drift/data drift (performance degradation).
- Flow data back to retrain/update model regularly.
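One illustrative drift check for the monitoring step: compare a production feature's mean against the training distribution. This is a crude standardized mean-shift (production systems often use per-feature KS tests or population stability index instead), and the threshold is an assumed policy:

```python
import statistics

def drift_score(train_values, prod_values):
    """Standardized shift of the production mean relative to the
    training spread. Illustrative only; a large score suggests the
    input distribution has moved (data drift)."""
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values)
    return abs(statistics.mean(prod_values) - mu) / sigma

train = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05]   # feature values at training time
prod = [1.4, 1.5, 1.6, 1.45, 1.55]         # same feature in production
if drift_score(train, prod) > 3.0:          # threshold is an assumed policy
    print("possible data drift -- flag for retraining")
```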
ML Ops is making the development and deployment of ML systems systematic.
AI software has many feedback loops in the process (deployment feeds back into training and data collection), so MLOps (maintaining data quality and infrastructure) should be involved early in the process, starting with data collection.
ML Ops' most important task is to ensure consistently high-quality data in all phases of the ML project lifecycle. High-quality data:
- Is defined consistently (definition of labels y is unambiguous).
- Covers the important cases well (good coverage of inputs x).
- Has timely feedback from production data (distribution covers data drift and concept drift).
- Is sized appropriately.
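Three of these checks can be sketched as a small report over an assumed dataset shape (each example a dict with hypothetical "x", "y", and "slice" fields); the timeliness check is omitted since it depends on the production feedback pipeline:

```python
def data_quality_report(dataset, label_set, required_slices, min_size=1000):
    """Sketch of three data-quality checks. The dataset shape and
    the min_size threshold are assumptions for illustration."""
    return {
        # labels defined consistently: no y outside the agreed label set
        "unknown_labels": {ex["y"] for ex in dataset} - label_set,
        # coverage: important input slices with no examples at all
        "missing_slices": required_slices - {ex["slice"] for ex in dataset},
        # sizing: enough examples overall
        "big_enough": len(dataset) >= min_size,
    }

dataset = [
    {"x": "clip1.wav", "y": "weather", "slice": "noisy"},
    {"x": "clip2.wav", "y": "sports", "slice": "clean"},
]
report = data_quality_report(dataset, {"weather", "sports"},
                             {"noisy", "clean", "accented"})
print(report)  # missing the "accented" slice, and far too small
```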