Jade Abbott - ML's Hidden Tasks
Lots of tutorials
Loads of resources
ML
Endless examples
https://miro.medium.com/max/1552/1*Nv2NNALuokZEcV6hYEHdGA.png
Challenge
How to make this work in the real world?
Machine Learning’s Surprises
A Checklist for Developers
when Building ML Systems
Hi, I’m Jade Abbott
@alienelf
masakhane.io
Surprises while...
After deployment of model
Some context
❖ I won’t be talking about training machine learning models
❖ I won’t be talking about which models to choose
❖ I work primarily in deep learning & NLP
❖ I am a one-person ML team working in a startup context
❖ I work in a normal world where data is scarce and we need to collect more
The Problem
"I want to meet..." + "I can provide..."
Yes, they should meet / No, they shouldn’t
pet sitting, cat breeding, software development, chef lessons
Language Model + Downstream Task

Surprises trying to deploy the model
Expectations
CI/CD
model
API
Unit Tests
user testing
Surprise #1
Skin Cancer Detection (1)
Husky/Dog Classifier (2)
1. https://visualsonline.cancer.gov/details.cfm?imageid=9288
2. https://arxiv.org/pdf/1602.04938.pdf
Explanations
https://github.com/marcotcr/lime
https://pair-code.github.io/what-if-tool/
Surprise #3
Make it measurable!
❖ It’s in English
❖ Is it robust to spelling errors?
❖ How does it perform with malicious data?
https://pair-code.github.io
http://aif360.mybluemix.net
https://github.com/fairlearn/fairlearn
https://github.com/jphall663/awesome-machine-learning-interpretability
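One way to make "Is it robust to spelling errors?" measurable is to perturb each input and count how often the prediction stays the same. The sketch below is only an illustration: `add_typo`, `typo_robustness` and `toy_predict` are hypothetical names, the typo model is a single adjacent-character swap, and the keyword matcher stands in for a real model.

```python
import random

def add_typo(text, rng):
    """Swap two adjacent characters at a random position (a crude typo model)."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

def typo_robustness(predict, examples, seed=0):
    """Fraction of examples whose prediction is unchanged after one typo."""
    rng = random.Random(seed)
    unchanged = sum(predict(t) == predict(add_typo(t, rng)) for t in examples)
    return unchanged / len(examples)

# Hypothetical stand-in for a real model: fires on a keyword.
def toy_predict(text):
    return "cat" in text.lower()

examples = ["I can provide cat breeding", "I want to meet a chef"]
score = typo_robustness(toy_predict, examples)
print(f"prediction unchanged for {score:.0%} of examples")
```

The same pattern can be reused for other perturbations (malicious input, another language) by swapping out `add_typo`.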
Expectations
CI/CD
model
API
Unit Tests
user testing
Reality
Choose a useful metric
Evaluate model
model
Choose threshold
API
Explain predictions
Fairness Framework
Unit Tests
user testing
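The "Choose threshold" step can be sketched as a sweep over candidate thresholds on held-out scores. This is a minimal sketch under assumptions not in the slides: the metric is precision with a minimum target, the function names are hypothetical, and the scores and labels are made-up toy values.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when every score >= threshold counts as a "yes"."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum(y and not p for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def best_threshold(scores, labels, min_precision):
    """Lowest threshold that meets the precision target.

    Recall can only fall as the threshold rises, so the lowest
    qualifying threshold also gives the best recall among them.
    """
    for t in sorted(set(scores)):
        p, r = precision_recall(scores, labels, t)
        if p >= min_precision:
            return t, p, r
    return None

# Toy held-out scores and labels (illustrative only).
scores = [0.95, 0.9, 0.7, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False, False]
result = best_threshold(scores, labels, min_precision=0.75)
print(result)
```

Which metric is "useful" depends on the product; a precision floor is just one possible choice.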

Surprises
agile cycle
Bug Triage
bug tracking tool
I can provide | I want to meet | Predicted | Expected | Outcome
marijuana and other drugs which improves health | a doctor | YES | NO | False Positive
medicine | a drug addiction sponsor | YES | YES | True Positive
medicine | a pharmacist | YES | YES | True Positive
illegal drugs | a drug dealer | YES | NO | False Positive
Is my “bug” fixed?
Classification Error
politicians-false-neg
designers-too-general
drugs-doctors-false-pos
tech-too-general
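A per-problem classification error like the one listed above could be computed by keeping a small set of test patterns for each tracked problem and re-running them against every model. The sketch below is an assumption about how that might look: `error_rate`, `problems` and `toy_predict` are hypothetical names, and the expected labels are one reading of the drugs/doctors example from the slides.

```python
def error_rate(predict, patterns):
    """Share of a problem's test patterns that the model gets wrong."""
    wrong = sum(predict(offer, request) != expected
                for offer, request, expected in patterns)
    return wrong / len(patterns)

# Each tracked problem owns (offer, request, expected "should meet") patterns.
problems = {
    "drugs-doctors-false-pos": [
        ("I can provide medicine", "I want to meet a pharmacist", True),
        ("I can provide illegal drugs", "I want to meet a drug dealer", False),
    ],
}

# Hypothetical stand-in model that always answers "yes, they should meet".
def toy_predict(offer, request):
    return True

report = {name: error_rate(toy_predict, pats) for name, pats in problems.items()}
print(report)
```

A problem counts as "fixed" once its error rate drops below whatever level the team agrees on; that level is not specified here.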
% Users Affected x Normalized Error x Harm
How do we triage these “bugs”?
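The product of users affected, normalized error and harm can be turned into a ranking. A minimal sketch, assuming all three factors are recorded per problem; the `Problem` class is hypothetical and every number below is invented purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class Problem:
    name: str
    users_affected: float    # fraction of users hitting this problem, 0..1
    normalized_error: float  # error on the problem's test patterns, 0..1
    harm: float              # hand-assigned weight; higher means more harmful

    @property
    def priority(self):
        return self.users_affected * self.normalized_error * self.harm

# Invented values, for illustration only.
problems = [
    Problem("politicians-false-neg", 0.10, 0.5, 2),
    Problem("drugs-doctors-false-pos", 0.02, 0.8, 10),
    Problem("tech-too-general", 0.30, 0.2, 1),
]

ranked = sorted(problems, key=lambda p: p.priority, reverse=True)
for p in ranked:
    print(f"{p.name}: {p.priority:.2f}")
```

With these made-up numbers, a rare but harmful problem outranks a common but mild one, which is the point of including harm in the product.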
0.8 0.75
Re-evaluate ALL models
0.72 0.75
Surprise #7
❖ My data?
❖ My model?
❖ My preprocessing?
How to figure out what changed?
Experiment: ea2541df, da1341bb
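One way to answer "data, model or preprocessing?" is to fingerprint each ingredient of a training run separately and compare runs part by part. This is a sketch under assumptions: the function names are hypothetical, inputs are assumed to be JSON-serialisable, and the eight-character IDs are simply truncated SHA-256 hashes, not necessarily how the experiment IDs on the slide were produced.

```python
import hashlib
import json

def fingerprint(obj):
    """Short, stable hash of any JSON-serialisable object."""
    blob = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:8]

def describe_run(data_rows, model_config, preprocessing_config):
    """Record one fingerprint per ingredient of a training run."""
    return {
        "data": fingerprint(data_rows),
        "model": fingerprint(model_config),
        "preprocessing": fingerprint(preprocessing_config),
    }

def what_changed(run_a, run_b):
    """Names of the ingredients whose fingerprints differ between two runs."""
    return [part for part in run_a if run_a[part] != run_b[part]]

data = [["I can provide medicine", "I want to meet a pharmacist", True]]
run_a = describe_run(data, {"layers": 2}, {"lowercase": True})
run_b = describe_run(data, {"layers": 2}, {"lowercase": False})
print(what_changed(run_a, run_b))
```

Storing the fingerprints next to each result makes two experiments comparable after the fact.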
agile cycle
Prioritization
bug tracking tool
Surprises maintaining and improving the model over time
Expectation
User behaviour drifts
Now what?
● Regularly sample data from production for training
● Regularly refresh your test set
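Regular sampling from production can be sketched with reservoir sampling, which draws a uniform sample from a stream of unknown length without holding it all in memory. Reservoir sampling is a technique swapped in here for illustration; the slides do not say how the sampling was done, and `production_log` is a made-up stand-in for real traffic.

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Uniformly sample up to k items from an iterable (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample

# Made-up stand-in for a production request log.
production_log = (f"request-{n}" for n in range(1000))
to_label = reservoir_sample(production_log, k=5)
print(to_label)
```

Part of each sample could go to training and part to a refreshed test set, kept strictly separate.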
Surprise #9
Data labellers are rarely experts
Surprise #10
Changing and updating the data so often gets messy
Needed to check the following
● Data Leakage
● Duplicates
● Distributions
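The three checks above can be sketched as small functions run on every data update. This is a minimal sketch assuming each row is a `(text, label)` tuple and that an exact text match counts as a duplicate or a leak; real data would likely need normalisation first. All names and rows are illustrative.

```python
from collections import Counter

def find_duplicates(rows):
    """Rows that appear more than once within one dataset."""
    counts = Counter(rows)
    return [row for row, n in counts.items() if n > 1]

def find_leakage(train, test):
    """Texts that appear in both the training set and the test set."""
    return sorted({t for t, _ in train} & {t for t, _ in test})

def label_distribution(rows):
    """Fraction of rows per label, to compare across datasets or over time."""
    counts = Counter(label for _, label in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

train = [("pet sitting", True), ("cat breeding", False), ("pet sitting", True)]
test = [("cat breeding", False), ("chef lessons", True)]

print(find_duplicates(train))
print(find_leakage(train, test))
print(label_distribution(test))
```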
Expectation
Pick problem
Generate/select unlabelled data
Get data labelled on crowdsourced platform
Review sample from each data labeller
Approve
Escalate conflicting data labels
Model tells you which patterns it’s uncertain about
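Two steps of this labelling loop lend themselves to a short sketch: picking the examples the model is least sure about, and finding examples whose labellers disagree. The code assumes binary probabilities where 0.5 means maximal uncertainty; the function names and toy data are hypothetical.

```python
def most_uncertain(scored, k):
    """Pick the k examples whose probability is closest to 0.5.

    scored: list of (example, probability_of_yes) pairs.
    """
    ordered = sorted(scored, key=lambda pair: abs(pair[1] - 0.5))
    return [example for example, _ in ordered[:k]]

def conflicting_labels(annotations):
    """Examples that received more than one distinct label.

    annotations: list of (example, labeller, label) triples.
    """
    seen = {}
    for example, _labeller, label in annotations:
        seen.setdefault(example, set()).add(label)
    return sorted(ex for ex, labels in seen.items() if len(labels) > 1)

scored = [("a", 0.9), ("b", 0.55), ("c", 0.1), ("d", 0.48)]
annotations = [
    ("a", "labeller-1", True),
    ("a", "labeller-2", False),
    ("b", "labeller-1", True),
    ("b", "labeller-2", True),
]

print(most_uncertain(scored, k=2))
print(conflicting_labels(annotations))
```

Examples returned by `conflicting_labels` would be the ones to escalate for review rather than approve automatically.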
Explain Predictions
Fairness Framework
After First Release
The Checklist
ML Problem Tracker
Reproducible Training
Comparable Results
Result Management