The challenge
- Production models ran as C#/.NET services (Azure Functions on Kubernetes) that were hard to scale to herd-wide computation.
- Numerical parity with the existing system had to be preserved exactly through the migration.
- Early Python prototypes used row-by-row pandas iteration, unsuitable for farm/herd scale.
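To illustrate why row-by-row iteration struggles at herd scale, here is a minimal sketch contrasting the two styles in pandas. The column names (`milk_kg`, `days_in_milk`), the 305-day cutoff, and the toy scoring rule are hypothetical and chosen only for illustration; they are not taken from the production models.

```python
import numpy as np
import pandas as pd

# Hypothetical daily summary rows; names and values are illustrative only.
df = pd.DataFrame({
    "cow_id": [1, 2, 3, 4],
    "milk_kg": [32.5, 28.0, 41.2, 19.8],
    "days_in_milk": [45, 120, 60, 310],
})

def score_rowwise(frame):
    """Row-by-row style: one Python-level loop iteration per cow."""
    scores = []
    for _, row in frame.iterrows():
        # Toy rule (assumption): halve the score late in lactation.
        penalty = 0.5 if row["days_in_milk"] > 305 else 1.0
        scores.append(row["milk_kg"] * penalty)
    return scores

def score_vectorized(frame):
    """Column-wise style: the same rule applied to whole columns at once."""
    penalty = np.where(frame["days_in_milk"] > 305, 0.5, 1.0)
    return frame["milk_kg"] * penalty

rowwise = score_rowwise(df)
vectorized = score_vectorized(df)
```

Both functions produce the same scores; the column-wise form avoids the per-row Python loop and translates more directly into Spark column expressions.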
Our approach
- Re-implemented the sub-models (production, health, reproduction, robot efficiency, lactation-curve fitting) and the composite cow-ranking index in Python.
- Preserved the exact model constants and thresholds from the reference implementation so that outputs stay at parity with it.
- Ported the gradient-descent curve-fitting solver to the scientific-Python stack.
- Re-engineered row-wise pandas logic into vectorized PySpark for scale on Databricks.
- Built a numerical-parity test harness validating Python outputs against the legacy reference before cutover.
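As a rough illustration of the gradient-descent curve fitting mentioned above, the sketch below fits a lactation curve with plain batch gradient descent. Everything specific here is an assumption: the text does not say which curve or loss the solver uses, so this example takes the Wood model, y(t) = a · t^b · e^(−ct), and fits it in log space (where it is linear in its parameters) to keep the example short and stable. The parameter values, learning rate, and step count are illustrative, and the standard library is used only to keep the block self-contained.

```python
import math

def wood(t, a, b, c):
    """Wood lactation curve (assumed model): yield on day t after calving."""
    return a * t ** b * math.exp(-c * t)

def fit_wood_gd(days, yields, lr=0.5, steps=5000):
    """Fit ln(y) = ln(a) + b*ln(t) - c*t by batch gradient descent."""
    n = len(days)
    x1 = [math.log(t) for t in days]    # feature 1: ln(t)
    x2 = [float(t) for t in days]       # feature 2: t
    z = [math.log(y) for y in yields]   # target: ln(y)

    def mean_std(values):
        m = sum(values) / n
        s = math.sqrt(sum((v - m) ** 2 for v in values) / n)
        return m, s

    # Standardize features so a single learning rate suits both.
    m1, s1 = mean_std(x1)
    m2, s2 = mean_std(x2)
    u1 = [(v - m1) / s1 for v in x1]
    u2 = [(v - m2) / s2 for v in x2]

    w0 = w1 = w2 = 0.0
    for _ in range(steps):
        # Residuals of the current linear model in standardized space.
        resid = [w0 + w1 * p + w2 * q - target
                 for p, q, target in zip(u1, u2, z)]
        # Gradient of the half mean squared error w.r.t. each weight.
        g0 = sum(resid) / n
        g1 = sum(r * p for r, p in zip(resid, u1)) / n
        g2 = sum(r * q for r, q in zip(resid, u2)) / n
        w0 -= lr * g0
        w1 -= lr * g1
        w2 -= lr * g2

    # Map standardized weights back to the Wood parameters.
    b = w1 / s1
    c = -w2 / s2
    a = math.exp(w0 - w1 * m1 / s1 - w2 * m2 / s2)
    return a, b, c

days = list(range(5, 306, 10))
true_params = (20.0, 0.2, 0.004)  # illustrative values, not real herd data
yields = [wood(t, *true_params) for t in days]
a, b, c = fit_wood_gd(days, yields)
```

On this noise-free synthetic data the fit should recover the generating parameters closely. A real solver would also need to handle noisy or missing test-day records, which this sketch does not attempt.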
Architecture
- Daily summary data: Spark / Delta
- Feature engineering: vectorized PySpark
- Sub-models: production · health · reproduction · efficiency · curve fitting
- Composite ranking: cow index
- Per-cow advice: Databricks jobs
- Parity check: vs. legacy reference
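A parity check against the legacy reference could be sketched as follows. The function name `check_parity`, the dictionary layout, the field names, and the tolerances are all hypothetical; the actual harness and its acceptance criteria are not described here.

```python
import math

def check_parity(reference, candidate, rel_tol=1e-9, abs_tol=1e-12):
    """Return (cow_id, field, ref_value, new_value) for every mismatch.

    Tolerances are illustrative assumptions; setting both to 0.0 makes
    the comparison require exact equality.
    """
    mismatches = []
    for cow_id, ref_row in reference.items():
        new_row = candidate.get(cow_id)
        if new_row is None:
            mismatches.append((cow_id, "<missing row>", None, None))
            continue
        for field, ref_value in ref_row.items():
            new_value = new_row.get(field)
            if new_value is None or not math.isclose(
                ref_value, new_value, rel_tol=rel_tol, abs_tol=abs_tol
            ):
                mismatches.append((cow_id, field, ref_value, new_value))
    return mismatches

# Hypothetical outputs keyed by cow, one value per sub-model.
legacy = {
    "cow-001": {"production": 0.8125, "health": 0.64},
    "cow-002": {"production": 0.5, "health": 0.91},
}
ported = {
    "cow-001": {"production": 0.8125, "health": 0.64},
    "cow-002": {"production": 0.5, "health": 0.9100001},  # deliberate drift
}

mismatches = check_parity(legacy, ported)
```

Reporting every mismatch rather than stopping at the first makes it easier to see whether drift is isolated to one sub-model or spread across several.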
Outcomes
- Cow modelling now runs as scalable Databricks jobs in place of the legacy .NET services.
- Validated numerical parity with the reference implementation.
- A maintainable Python codebase ready for herd-scale execution and future iteration.
