Data Science Version Control and Continuous Delivery

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Data Version Control

dvc

DAGsHub

dvc + DAGsHub looks like a good lightweight approach to data version control.
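To make the "lightweight" part concrete, here is a rough sketch of the general idea behind dvc-style data versioning: the data file goes into a content-addressed cache, and only a tiny pointer file holding its hash is tracked in Git. This is not dvc's actual implementation or file format. The `.ptr.json` suffix, the SHA-256 hash, the cache layout, and the function names `track`, `checkout`, and `demo` are all assumptions made for illustration.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into the cache and write a small pointer file next to it.

    The pointer file (hypothetical ".ptr.json" suffix) is what would be
    committed to Git; the cache would be synced to remote storage separately.
    """
    h = file_hash(data_file)
    cached = cache_dir / h[:2] / h[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():
        shutil.copy2(data_file, cached)
    pointer = data_file.with_name(data_file.name + ".ptr.json")
    pointer.write_text(json.dumps({"path": data_file.name, "hash": h}))
    return pointer


def checkout(pointer: Path, cache_dir: Path) -> Path:
    """Restore the data file version recorded in a pointer file."""
    meta = json.loads(pointer.read_text())
    h = meta["hash"]
    target = pointer.with_name(meta["path"])
    shutil.copy2(cache_dir / h[:2] / h[2:], target)
    return target


def demo() -> tuple[str, str]:
    """Track two versions of a file, then roll back to the first one."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cache = root / "cache"
        data = root / "train.csv"

        data.write_text("x,y\n1,2\n")
        pointer = track(data, cache)
        v1_pointer = pointer.read_text()  # stands in for a Git commit

        data.write_text("x,y\n1,2\n3,4\n")
        track(data, cache)
        latest = data.read_text()

        pointer.write_text(v1_pointer)  # stands in for checking out the old commit
        restored = checkout(pointer, cache).read_text()
        return latest, restored
```

Calling `demo()` returns the contents of the second version and of the file restored from the first pointer. Because Git only ever sees the pointer file, the repository stays small no matter how large the data gets.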

dolthub is another good option for data version control.

Pachyderm

Data Versioning, Data Pipelines, and Data Lineage

Retracing Your Steps in Machine Learning

Model Life Cycle Tracking

mlflow tracks every detail of a model's life cycle (including training, serving, etc.), but it seems a bit complicated to use.

References

Continuous Delivery for Machine Learning

https://towardsdatascience.com/version-control-ml-model-4adb2db5f87c

How to version control your production machine learning models