CI for ML: Code, Data and Model

Run unit tests, schema tests and behavioural tests on every PR.


Code + Data + Model

Three test layers

Code tests: pure-Python unit tests on feature functions and pre-processing.
Data tests: schema, null rate and value ranges; tools include Great Expectations, Soda and dbt tests.
Model tests: behavioural tests of three kinds, namely minimum functionality (does the model give the expected output on a sentinel input?), invariance (paraphrasing a sentence does not flip its predicted sentiment) and directional expectation (raising income does not lower the predicted credit limit).
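A minimal sketch of the first two layers as pytest-style functions in plain Python. Everything named here is hypothetical: `log_income`, `EXPECTED_SCHEMA`, `check_batch`, the column names and the thresholds. `check_batch` is a hand-rolled stand-in for the kind of check Great Expectations, Soda or dbt tests express declaratively; it does not use any of their APIs.

```python
import math


# Hypothetical feature function (illustrative; not from the original text).
def log_income(income: float) -> float:
    """Return log(1 + income); negative income is rejected as invalid."""
    if income < 0:
        raise ValueError("income must be non-negative")
    return math.log1p(income)


# Code test: a plain unit test on the feature function.
def test_log_income():
    assert log_income(0) == 0.0
    assert math.isclose(log_income(math.e - 1), 1.0)
    try:
        log_income(-1.0)
    except ValueError:
        pass  # expected: invalid input is rejected
    else:
        raise AssertionError("negative income should raise ValueError")


# Data test: schema (column types), null-rate and range checks on a batch of rows.
# Hypothetical schema and thresholds.
EXPECTED_SCHEMA = {"customer_id": str, "income": float, "age": int}


def check_batch(rows: list[dict], max_null_rate: float = 0.01) -> list[str]:
    """Return human-readable failures; an empty list means the batch passes."""
    failures = []
    for col, expected_type in EXPECTED_SCHEMA.items():
        values = [row.get(col) for row in rows]
        null_rate = sum(v is None for v in values) / len(rows) if rows else 0.0
        if null_rate > max_null_rate:
            failures.append(f"{col}: null rate {null_rate:.1%} exceeds {max_null_rate:.1%}")
        wrong = [v for v in values if v is not None and not isinstance(v, expected_type)]
        if wrong:
            failures.append(f"{col}: expected {expected_type.__name__}, got {type(wrong[0]).__name__}")
    for row in rows:
        age = row.get("age")
        if isinstance(age, int) and not 18 <= age <= 120:
            failures.append(f"age out of range: {age}")
    return failures
```

pytest collects `test_log_income` automatically; a data-test step could run `check_batch` on a sampled batch and fail the build whenever the returned list is non-empty.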

The model-test categories are adapted from 'Beyond Accuracy: Behavioral Testing of NLP Models with CheckList' (Ribeiro et al., 2020).
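The three model-test kinds can be sketched as pytest-style functions. The two models below are toy stand-ins (hypothetical `predict_sentiment` and `predict_credit_limit`, with made-up word lists and coefficients); in a real pipeline these calls would go to the saved, trained model.

```python
# Toy stand-ins for trained models (hypothetical); in CI these would wrap the saved model.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "awful", "terrible", "hate"}


def predict_sentiment(text: str) -> str:
    """Lexicon-count sentiment: strip punctuation, then compare positive vs negative hits."""
    cleaned = "".join(c for c in text.lower() if c.isalnum() or c.isspace())
    words = cleaned.split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score >= 0 else "negative"


def predict_credit_limit(income: float, debt: float) -> float:
    """Linear toy model, floored at zero."""
    return max(0.0, 0.3 * income - 0.5 * debt)


# Minimum functionality: sentinel inputs with an unambiguous expected label.
def test_minimum_functionality():
    assert predict_sentiment("I love this product.") == "positive"
    assert predict_sentiment("This is terrible.") == "negative"


# Invariance: a label-preserving paraphrase must not change the prediction.
def test_invariance_to_paraphrase():
    pairs = [
        ("The service was great.", "The service was really great!"),
        ("The food was awful.", "Honestly, the food was awful."),
    ]
    for original, paraphrase in pairs:
        assert predict_sentiment(original) == predict_sentiment(paraphrase), (original, paraphrase)


# Directional expectation: raising income, all else equal, must not lower the limit.
def test_directional_income():
    base = {"income": 40_000.0, "debt": 10_000.0}
    for bump in (1_000.0, 10_000.0):
        raised = {**base, "income": base["income"] + bump}
        assert predict_credit_limit(**raised) >= predict_credit_limit(**base)
```

These assertions are all-or-nothing. CheckList itself reports a failure rate per test, so a team with larger generated test suites might instead assert that the failure rate stays below an agreed threshold.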

A pragmatic PR pipeline

lint  →  unit tests  →  data tests (sample)  →  train (smoke)
                                              →  behavioural tests on saved model
                                              →  metric gate (no regression > 0.5%)
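The metric gate at the end of the pipeline can be sketched as a small function a CI step calls. The diagram does not say whether "0.5%" is relative or absolute; this sketch assumes a relative drop from the baseline metric (swap the `drop` computation for a plain difference if the team means 0.5 percentage points). `metric_gate`, `main` and `MAX_RELATIVE_REGRESSION` are hypothetical names.

```python
# Assumption: "no regression > 0.5%" is read as a relative drop from the baseline metric.
MAX_RELATIVE_REGRESSION = 0.005


def metric_gate(candidate: float, baseline: float,
                max_regression: float = MAX_RELATIVE_REGRESSION,
                higher_is_better: bool = True) -> bool:
    """Return True if the candidate metric is within the allowed regression of the baseline."""
    if baseline <= 0:
        raise ValueError("a relative gate needs a positive baseline metric")
    # A positive `drop` means the candidate is worse than the baseline.
    if higher_is_better:   # e.g. accuracy, AUC
        drop = (baseline - candidate) / baseline
    else:                  # e.g. RMSE, log loss
        drop = (candidate - baseline) / baseline
    return drop <= max_regression


def main(args: list[str]) -> int:
    """CLI-style entry point: args = [candidate, baseline]; returns a process exit code."""
    candidate, baseline = float(args[0]), float(args[1])
    ok = metric_gate(candidate, baseline)
    print(f"candidate={candidate:.4f} baseline={baseline:.4f} -> {'PASS' if ok else 'FAIL'}")
    return 0 if ok else 1
```

A CI job would fail the PR by exiting non-zero, for example by ending the script with `sys.exit(main(sys.argv[1:]))`. Under the relative reading, a candidate accuracy of 0.895 against a 0.900 baseline is a drop of about 0.56% and fails, while 0.896 passes.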

Analogy

Code tests check the recipe text is well-formed. Data tests check the ingredients haven't gone off. Behavioural tests are the taster confirming the dish still tastes right. You need all three; missing one means a class of bug ships unchallenged.
