1. How Do We Build and Maintain Machine Learning Systems? From Techfest 2016, Gene Olafsen. https://arxiv.org/pdf/1707.06742.pdf
2. What is your worst nightmare? Question context: building machine learning models for an enterprise that are deployed in production.
3. Response: “[...] Manage versions. Manage data versions. Being able to reproduce the models. What if, you know, the data disappears, the person disappears, the model disappears... And we cannot reproduce this. I have seen this hundreds of times in Bing. I have seen it every day. Like... Oh yeah, we had a good model. Ok, I need to tweak it. I need to understand it. And then... Now we cannot reproduce it. That is my biggest nightmare!”
4. Break it down: "To put context to this testimony, we review what building a machine learning model may look like in a product group:"
5. Step 1: Data Collection. A problem owner collects data, writes labeling guidelines, and optionally contributes some labels.
6. Step 2: Labeling. The problem owner outsources the task of labeling a large portion of the data (e.g., 50,000 examples).
7. Amazon Mechanical Turk (MTurk). MTurk aims to make accessing human intelligence simple, scalable, and cost-effective. Businesses or developers who need tasks done (called Human Intelligence Tasks, or “HITs”) can use the robust MTurk API to access thousands of high-quality, global, on-demand Workers, and then programmatically integrate the results of that work directly into their business processes and systems. MTurk enables developers and businesses to achieve their goals more quickly and at a lower cost than was previously possible.
8. MTurk fees. The price you (the Requester) pay for a Human Intelligence Task (“HIT”) comprises two components: the amount you pay Workers plus a fee paid to Amazon Mechanical Turk (MTurk), which is based on the amount you pay Workers. The minimum fee is $0.01 per assignment or bonus payment. There is an additional fee for using the Masters Qualification.
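The fee structure above lends itself to a quick budgeting sketch. The 20% base fee rate below is an illustrative assumption, not a figure from the slides; actual rates and surcharges (including the Masters Qualification fee) should be taken from the current MTurk pricing page. All function names are hypothetical:

```python
# Hypothetical cost estimator for an MTurk labeling job.
# ASSUMPTION: 20% base fee rate; the $0.01 minimum fee is from the slide.
def hit_cost(worker_reward, fee_rate=0.20, min_fee=0.01, extra_fee_rate=0.0):
    """Total cost of one assignment: worker reward plus MTurk fee."""
    fee = max(worker_reward * (fee_rate + extra_fee_rate), min_fee)
    return worker_reward + fee

def job_cost(num_assignments, worker_reward, **kwargs):
    """Total cost of a labeling job with identical assignments."""
    return num_assignments * hit_cost(worker_reward, **kwargs)

# e.g., the 50,000-example job from Step 2 at $0.05 per label
total = job_cost(50_000, 0.05)
```

Even at a few cents per label, a 50,000-example job runs to thousands of dollars, which is part of why relabeling after a guideline change (Step 3) is so costly.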
9. Step 3: Labeling Issues. The problem owner examines the labels and may discover that the guidelines are incorrect or that the sampled examples are inappropriate or inadequate for the problem. When that happens, GOTO step 1.
10. Step 4: Model Selection. An ML expert is consulted to select the algorithm (e.g., a deep neural network), the architecture (number of layers, units per layer, etc.), the objective function, the regularizers, the cross-validation sets, etc.
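Cross-validation, one of the choices listed above, can be sketched in a few lines. This is a minimal pure-Python illustration using a trivial "model" that predicts the mean of its training labels; the function names are made up for this example:

```python
# Minimal k-fold cross-validation sketch (illustrative, not from the talk).
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n items."""
    fold = n // k
    for i in range(k):
        lo = i * fold
        hi = (i + 1) * fold if i < k - 1 else n  # last fold takes the remainder
        test = list(range(lo, hi))
        train = [j for j in range(n) if j < lo or j >= hi]
        yield train, test

def cv_error(ys, fit, k=5):
    """Mean squared error of a model fit on each training fold,
    evaluated on the held-out fold. `fit` maps training labels to
    a single constant prediction (e.g., their mean)."""
    errs = []
    for train, test in k_fold_indices(len(ys), k):
        pred = fit([ys[j] for j in train])
        errs += [(pred - ys[j]) ** 2 for j in test]
    return sum(errs) / len(errs)
```

In practice the expert would compare cross-validated error across candidate architectures and regularization settings, keeping the choice with the lowest held-out error.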
11. Step 5: Feature Engineering. Engineers adjust existing features or create new features to improve performance. Models are trained and deployed on a fraction of traffic for testing.
12. Step 6: Train and Test the Model. If the system does not perform well on test traffic, GOTO step 1.
13. Step 7: Deploy. The model is deployed on full traffic. Performance of the model is monitored, and if that performance drops below a critical level, the model is revised by returning to step 1.
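The monitoring-and-retrain trigger in Step 7 can be sketched as a rolling accuracy window with a critical threshold. The class name, window size, and threshold below are illustrative assumptions:

```python
# Sketch of the Step 7 monitor: flag the model for retraining when
# accuracy over a recent window falls below a critical level.
# ASSUMPTION: window size and threshold are illustrative defaults.
from collections import deque

class ModelMonitor:
    def __init__(self, window=1000, critical_accuracy=0.90):
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = wrong
        self.critical = critical_accuracy

    def record(self, prediction_correct):
        """Log whether the latest prediction was correct."""
        self.outcomes.append(1 if prediction_correct else 0)

    def needs_retraining(self):
        """True once a full window of evidence shows sub-critical accuracy."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet
        return sum(self.outcomes) / len(self.outcomes) < self.critical
```

When `needs_retraining()` fires, the process loops back to step 1, which is exactly where the versioning and staffing problems on the following slides bite.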
14. Risks. Time may be your enemy...
15. Degrade over time. An iteration through steps 1 to 6 typically takes weeks. The system can be stable at step 7 for months. When it eventually breaks, it can be for a variety of reasons:
- the data distribution has changed
- the competition has improved and the requirements have increased
- new features are available and some old features are no longer available
- the definition of the problem has changed
- a security update or other change has broken the code
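The first failure mode, a changed data distribution, is one of the few that can be detected automatically. A rough sketch, assuming a simple mean-shift test on one feature; the function names and the 3-sigma threshold are common heuristics chosen for illustration, not something prescribed in the talk:

```python
# Rough drift check: compare the mean of one feature in recent live
# traffic against its training-time mean, in standard-error units.
# ASSUMPTION: 3-sigma threshold is an illustrative heuristic.
import statistics

def mean_shift_sigmas(train_values, live_values):
    """How many standard errors the live mean is from the training mean."""
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values)
    sem = sigma / len(live_values) ** 0.5  # standard error of the live mean
    return abs(statistics.mean(live_values) - mu) / sem

def drifted(train_values, live_values, threshold=3.0):
    return mean_shift_sigmas(train_values, live_values) > threshold
```

Real deployments would track many features and use distribution-level tests, but even this crude check turns "the data changed" from a post-mortem discovery into an alert.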
16. Staff changes over time. The problem owner, the machine learning expert, or a key engineer may have moved on to another group or another company.
17. Restaff. "Because multiple players with different expertise are involved, it takes a significant amount of effort and coordination to understand why the model does not perform as well as expected after being retrained."
18. Features, Labels, Data. The features and the labels were not versioned or documented. No one understands how the data was collected, because it was done in an ad hoc, organic fashion.
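The versioning gap above has a cheap partial remedy: fingerprint the examples, labels, and labeling guidelines together so every trained model can record exactly which data produced it. A minimal sketch, with hypothetical names; real systems would use a data-versioning tool rather than raw hashes:

```python
# Sketch of lightweight dataset versioning: a stable fingerprint over
# the examples, their labels, and the labeling guidelines.
# ASSUMPTION: inputs are JSON-serializable; names are illustrative.
import hashlib
import json

def dataset_version(examples, labels, guidelines_text):
    """Return a short, deterministic version id for a labeled dataset."""
    payload = json.dumps(
        {"examples": examples, "labels": labels, "guidelines": guidelines_text},
        sort_keys=True,  # canonical ordering so equal data hashes equally
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
```

Storing this id alongside each trained model means that months later, when the problem owner has moved on, it is at least possible to tell whether the data a retrain uses matches the data the original model saw.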
19. In the worst case, the model is operating but no one can tell whether it is performing as expected, and no one wants the responsibility of turning it off. Machine learning "litter" starts accumulating everywhere. These problems are not new to machine learning in practice (Sculley et al., 2014).
20. Reflection: "The example illustrates the fact that building a machine learning model involves more than just collecting data and applying learning algorithms, and that the management process of building machine learning solutions can be fraught with inefficiencies."
21. Inefficiencies and Metrics. Machine learning projects typically consist of a single monolithic model trained on a large labeled data set. If the model's summary performance metrics (e.g., accuracy, F1 score) were the only requirements and the performance remained unchanged, adding examples would not be a problem even if the new model errs on examples that were previously predicted correctly. However, for many problems where predictability and quality control are important, any negative progress on model quality leads to laborious testing of the entire model and incurs high maintenance cost. A single monolithic model lacks the modularity required for most people to isolate and address the root cause of a regression problem.
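The regression problem described above can be made concrete: comparing old and new model predictions example by example surfaces exactly the cases a summary metric hides, namely examples the old model got right that the new model now gets wrong. A small sketch with hypothetical names:

```python
# Per-example regression check between two model versions: find the
# indices where the old model was correct but the new model is not.
# Summary metrics like accuracy can stay flat while this set grows.
def regressions(labels, old_preds, new_preds):
    """Indices of examples that regressed from old model to new model."""
    return [
        i
        for i, (y, old, new) in enumerate(zip(labels, old_preds, new_preds))
        if old == y and new != y
    ]
```

A nonempty regression set is what triggers the laborious whole-model retesting the slide describes; with a monolithic model there is no smaller component to which the regressed examples can be attributed.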