Gold Recovery Process Modeling
Multi-stage regression pipeline to predict rougher and final gold recovery from plant telemetry. Cleaned process data, validated recovery calculations, and optimized against a weighted sMAPE business metric.
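The weighted business metric can be sketched as below. This is a minimal sketch, not the notebook's code. It assumes the usual sMAPE definition, 100 × mean(|y − ŷ| / ((|y| + |ŷ|) / 2)), computed separately for each stage. It also assumes a 25% rougher / 75% final weighting; adjust the weights to the project's actual specification. `smape` and `weighted_smape` are hypothetical helper names.

```python
import numpy as np


def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent.

    Pairs where both the target and the prediction are zero contribute 0
    (an assumed convention; the 0/0 case is otherwise undefined).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    diff = np.abs(y_true - y_pred)
    # Divide only where the denominator is non-zero; elsewhere keep 0.
    ratio = np.divide(diff, denom, out=np.zeros_like(diff), where=denom != 0)
    return float(np.mean(ratio) * 100.0)


def weighted_smape(rougher_true, rougher_pred, final_true, final_pred,
                   rougher_weight=0.25, final_weight=0.75):
    """Combine the two stage-level sMAPE scores (25/75 split is an assumption)."""
    return (rougher_weight * smape(rougher_true, rougher_pred)
            + final_weight * smape(final_true, final_pred))
```

Because recovery can legitimately be 0 in plant data, the explicit 0/0 convention matters. Without it a single all-zero row would turn the whole score into NaN.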
Key metrics.
Best-performing model in the final notebook: Random Forest
Weighted business metric (sMAPE) on test data: 7.26%
Rougher-stage recovery error: 4.70
Final recovery prediction error: 7.74
What the project tries to solve.
Predict both rougher-stage and final-stage gold recovery from process telemetry so plant operators can estimate output quality earlier in the pipeline.
Status: Study project, being refactored for portfolio use.
Notebook: gold_recovery_process_modeling.ipynb
Repo path: data-science-projects/gold-recovery-process-modeling
This project reads more like real industrial data science than a standard classroom notebook. It has multiple targets tied to a multi-stage process, plant-process constraints, process-oriented data cleaning, a domain-specific evaluation metric, and a need to reconcile the features available in the train and test sets.
How I approached it.
Validated the recovery calculation itself before trusting the labels.
Aligned training and test features to avoid relying on unavailable plant measurements at inference time.
Modeled rougher and final recovery separately, then combined their errors into the weighted business metric.
Compared linear, decision tree, and random forest regressors against the weighted sMAPE objective.
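The recovery check and the feature-alignment step above can be sketched as follows. This is a minimal sketch under stated assumptions, not the notebook's code. It assumes the standard two-product recovery formula, R = C * (F - T) / (F * (C - T)) * 100, where C, F, and T are the gold shares in a stage's concentrate, feed, and tailings. The functions `compute_recovery`, `recovery_mae`, and `shared_features` are hypothetical, and the caller supplies the actual column names.

```python
import numpy as np


def compute_recovery(concentrate, feed, tail):
    """Recompute stage recovery (percent) with the two-product formula.

    concentrate: gold share in the stage's concentrate (C)
    feed:        gold share in the stage's feed (F)
    tail:        gold share in the stage's tailings (T)
    Rows where F * (C - T) == 0 are returned as NaN.
    """
    c = np.asarray(concentrate, dtype=float)
    f = np.asarray(feed, dtype=float)
    t = np.asarray(tail, dtype=float)
    denom = f * (c - t)
    # np.where evaluates both branches, so silence divide-by-zero warnings.
    with np.errstate(divide="ignore", invalid="ignore"):
        recovery = np.where(denom != 0, c * (f - t) / denom * 100.0, np.nan)
    return recovery


def recovery_mae(stored, concentrate, feed, tail):
    """Mean absolute error between stored labels and recomputed recovery.

    Rows where either value is NaN are skipped.
    """
    calc = compute_recovery(concentrate, feed, tail)
    stored = np.asarray(stored, dtype=float)
    valid = ~np.isnan(calc) & ~np.isnan(stored)
    return float(np.mean(np.abs(stored[valid] - calc[valid])))


def shared_features(train_columns, test_columns, targets):
    """Train columns that also exist in test, minus the targets.

    Keeps the train order so the feature matrix layout stays stable.
    """
    test_set = set(test_columns)
    target_set = set(targets)
    return [col for col in train_columns
            if col in test_set and col not in target_set]
```

Given the stored rougher recovery column and its three input-share columns, `recovery_mae` reduces the label check to a single number. `shared_features(train.columns, test.columns, targets)` then yields a feature list that is guaranteed to exist at inference time.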
What I would improve next.
Refactor the dual-target workflow into a more production-style pipeline with shared preprocessing.
Add clearer diagnostics around train / test distribution shift and feature drift across process stages.
Explain the business meaning of sMAPE and where the current model would and would not be trusted operationally.
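One possible shape for the shared-preprocessing refactor is sketched below with scikit-learn. This is a sketch under stated assumptions, not a planned design. A single median imputer feeds a `MultiOutputRegressor`, which still fits a separate random forest for each target. A custom scorer applies the weighted sMAPE during cross-validation. The column order [rougher, final], the 25/75 weights, the hyperparameters, and `build_pipeline` are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import make_scorer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import Pipeline


def smape(y_true, y_pred):
    """sMAPE in percent; 0/0 pairs contribute 0 (assumed convention)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    diff = np.abs(y_true - y_pred)
    ratio = np.divide(diff, denom, out=np.zeros_like(diff), where=denom != 0)
    return float(np.mean(ratio) * 100.0)


def weighted_smape_2d(y_true, y_pred, weights=(0.25, 0.75)):
    """Weighted sMAPE over a 2-column target assumed to be [rougher, final]."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return (weights[0] * smape(y_true[:, 0], y_pred[:, 0])
            + weights[1] * smape(y_true[:, 1], y_pred[:, 1]))


def build_pipeline(random_state=0):
    """Shared preprocessing, then one random forest per target."""
    return Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("model", MultiOutputRegressor(
            RandomForestRegressor(n_estimators=100,
                                  random_state=random_state))),
    ])


# Lower sMAPE is better, so the scorer negates it for scikit-learn.
weighted_smape_scorer = make_scorer(weighted_smape_2d, greater_is_better=False)
```

Usage would look like `cross_val_score(build_pipeline(), X, y, cv=KFold(n_splits=5), scoring=weighted_smape_scorer)`. The scorer negates the metric, so flip the sign before reporting. Sharing one feature matrix across both targets is itself an assumption. If the rougher model should only see rougher-stage inputs, a per-target `ColumnTransformer` would be needed instead.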