Version Control for ML

Hi All,

I find it difficult to keep track of my experiments when working on models: metrics (split by train, val & test), hyper-parameters, architectures, cross-validation results and so on. Any tips on how to do this efficiently and consistently across different algorithms (RFs, linear/logistic regression, neural nets, etc.)? I end up creating a new Excel sheet for each project to cover all of its experiments, and I find it cumbersome to maintain when switching algorithms within the same project. A rough sketch of the kind of per-run record I keep rebuilding by hand is below.
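To make the pain point concrete, here is a minimal sketch of what I would like to log consistently for every run (the file and helper names are made up, it is just to show the shape of the data: hyper-parameters plus per-split metrics):

import csv
import json
from datetime import datetime
from pathlib import Path

LOG_FILE = Path("experiments.csv")  # hypothetical per-project log

def log_run(model_name, params, metrics):
    """Append one run (hyper-parameters + per-split metrics) as a row in the CSV."""
    row = {
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "model": model_name,
        "params": json.dumps(params),
        # flatten {"val": {"auc": 0.88}} into a "val_auc" column, etc.
        **{f"{split}_{name}": value
           for split, split_metrics in metrics.items()
           for name, value in split_metrics.items()},
    }
    new_file = not LOG_FILE.exists()
    with LOG_FILE.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if new_file:
            writer.writeheader()
        writer.writerow(row)

# the same call should work for an RF, a logistic regression or a neural net
log_run("random_forest",
        params={"n_estimators": 500, "max_depth": 8},
        metrics={"train": {"auc": 0.93}, "val": {"auc": 0.88}, "test": {"auc": 0.87}})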

I came across dvc.org & comet-ml.org when I was searching for version control systems specifically for the machine learning use case. Have you come across these in the past? Any better alternatives? What was your experience like? How should one tackle this when collaborating in a team?

All suggestions are welcome :slight_smile:

Thank You!!

-Deepak


Hi @Deepak_S ,

Jakub from neptune.ml here.
It feels like Neptune would solve your problems. It helps you keep track of code, hyper-parameters, data versions, summary charts (e.g. prediction distributions) and things like that. Some time ago I wrote a blog post about organizing your experimentation process with Neptune, so you can check it out here if you want. What is more, it also supports discussions that link to code or charts, so collaboration on your machine learning project is smoother.
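To give you a feel for the workflow, a minimal sketch looks roughly like this (I am simplifying, and the project name is made up; check the docs for the exact, current function names and the API token setup):

import neptune

# assumes NEPTUNE_API_TOKEN is set in your environment and the project exists
neptune.init(project_qualified_name="my_workspace/my_project")

# one experiment per run; hyper-parameters are attached when the experiment is created
neptune.create_experiment(name="rf-baseline",
                          params={"n_estimators": 500, "max_depth": 8})

# values logged under the same name build up a chart in the UI
for val_auc in (0.84, 0.87, 0.88):
    neptune.log_metric("valid_auc", val_auc)

neptune.stop()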

Also, I have just added a simple callback to our neptune-contrib library that lets you monitor fastai training in Neptune. I explain how it works in this blog post, but basically, with no change to your workflow, you can track code, hyperparameters, metrics and more.
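From memory, using it looks more or less like the snippet below; the exact import path and callback name are in the blog post and the neptune-contrib docs, so treat this as a rough sketch rather than copy-paste code:

import neptune
from fastai.vision import *  # fastai v1, as in the course notebooks
from neptunecontrib.monitoring.fastai import NeptuneMonitor

neptune.init(project_qualified_name="my_workspace/my_project")
neptune.create_experiment(name="fastai-mnist", params={"epochs": 3, "lr": 1e-2})

path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)

# the callback sends losses and metrics to Neptune as training runs
learn = cnn_learner(data, models.resnet18, metrics=accuracy,
                    callback_fns=[NeptuneMonitor])
learn.fit_one_cycle(3, 1e-2)

neptune.stop()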

Before you ask, Neptune is now open and free for non-organizations.
Read more about it on the docs page to get a better view.

I tried using Neptune, but I ran into a problem with conflicting PIL version dependencies. Do you have any recommendations on how to overcome this issue?

You can simply downgrade torchvision to 0.4.0 and it should work:

pip install torchvision==0.4.0

At least, that worked for me.
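If it helps, once you have a combination that works, it is worth pinning it in a requirements.txt so the environment is reproducible; something along these lines (only the torchvision pin comes from this thread, the other entries are just examples):

# requirements.txt
torchvision==0.4.0   # avoids the PIL dependency conflict mentioned above
neptune-client
neptune-contrib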

Thank you, I will give that a try.