How I monitor and track my machine learning experiments anywhere

By Kimberly Cook | Apr 17, 2018

I started working at Comet.ml a few months ago, where we are building a really amazing tool for machine learning engineers. The short version is that we help you track experiments with a single line of code that automagically saves everything needed to make your model reproducible. You get rich experiment logging and history without being tied to a single platform.

In my own time, I decided to put @cometml to the test while training a model for logo detection, using RetinaNet.

Note: I work for Comet.ml, but all of this was done on my own time, for fun.

RetinaNet is a very high-quality object detector based on the method from the paper "Focal Loss for Dense Object Detection" (by Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár). The Keras implementation of this paper can be found here:

Using this repository, I augmented the training script and added the example code. With a single line of code, I got a live view of the model's training process, access to the Keras RetinaNet code, a snapshot of the hyperparameters I used for the run, and the results.
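That "single line" is Comet's `Experiment` object. Here is a minimal sketch of what it looks like at the top of a training script — the project name is a placeholder, and the `except` branch is a hypothetical offline stand-in I've added only so the sketch runs without the `comet_ml` package or an API key:

```python
try:
    # The real integration: one import plus one constructor call at the
    # top of the training script. Comet picks up the rest automatically.
    from comet_ml import Experiment
    experiment = Experiment(project_name="logo-detection")
except Exception:
    # Hypothetical stand-in so this sketch runs anywhere (no network,
    # no API key). Not part of the real integration.
    class Experiment:
        def __init__(self, **kwargs):
            self.project_name = kwargs.get("project_name")
    experiment = Experiment(project_name="logo-detection")
```

Everything after that line — model definition, fit loop, logging — runs unchanged; the `experiment` object is what streams results to the web dashboard.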

I am obviously biased because I work on this product, but I honestly have to say I was impressed. While running various training processes in the past, I repeatedly got stuck or was underwhelmed by the output. With Comet.ml, I have a live interface into the training process that extends beyond the bash terminal. I do most of my development locally but train models on a remote development machine. I can now train the model and monitor the overall process through Comet.ml, rather than needing to keep a session open to watch for changes.

I'm reiterating the process I went through below for anyone else who wants to try.

Set up my environment
I started with setting up my remote environment and getting the code I would be using for the RetinaNet training process. I used a dataset of logos from various companies, very similar to something that can be found on Kaggle.

I had to install the RetinaNet library and various dependencies on my remote machine. Because my machine has a GPU, I installed tensorflow-gpu version 1.4. I also updated the training script that RetinaNet uses and added the single line of code from Comet.ml to kick off experiment tracking.
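For reference, the setup amounted to something like the following. The TensorFlow pin matches what I used at the time (newer setups will want a newer release), and the RetinaNet install step is an assumption — the repository linked above is the source of truth for its install instructions:

```shell
# GPU build of TensorFlow, pinned to the version used here.
pip install tensorflow-gpu==1.4

# Comet client for the one-line experiment tracking.
pip install comet_ml

# Keras RetinaNet and its dependencies, installed from inside a local
# clone of the implementation repo linked above.
pip install .
```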

Side note: we make it really easy to connect your training process to your GitHub repo. That way, once you figure out how to get the best training result, you can create a pull request that takes a snapshot of your code and hyperparameters.

All I had to do was copy the initialization snippet into the Keras RetinaNet training file and run the code. That's it.

Track your experiment
Once Comet.ml is installed, the experiment code will pull in all the hyperparameters you define at runtime. What's also very cool is that, depending on whether you have set your project to be public or private, you get a web URL for monitoring the experiment's training in real time.
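Concretely, parameters defined at runtime end up attached to the experiment, and per-step metrics feed the live chart. A hedged sketch — the hyperparameter names and values below are illustrative, not the ones RetinaNet actually uses, and the `except` branch is a hypothetical stand-in so the sketch runs without `comet_ml` or an API key:

```python
try:
    from comet_ml import Experiment
    experiment = Experiment(project_name="logo-detection")
except Exception:
    # Hypothetical offline stand-in mirroring the two calls used below.
    class Experiment:
        def __init__(self, **kwargs):
            self.params, self.metrics = {}, []
        def log_parameters(self, params):
            self.params.update(params)
        def log_metric(self, name, value, step=None):
            self.metrics.append((name, value, step))
    experiment = Experiment(project_name="logo-detection")

# Hyperparameters defined at runtime get attached to the experiment...
hyperparams = {"backbone": "resnet50", "batch_size": 1, "learning_rate": 1e-5}
experiment.log_parameters(hyperparams)

# ...and per-step metrics show up on the live dashboard chart.
experiment.log_metric("loss", 2.37, step=1)
```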

Monitor the experiment
In my case, the training process was estimated to take 10 hours, so I was able to detach from my active session and monitor it from the web. The dashboard for the experiment shows a live chart of the loss and accuracy metrics. It also provides a clear picture of the code and hyperparameters used to produce the recorded result.

Finally, Comet.ml also logs the terminal output. This is really useful because you can just keep the website open rather than holding an active SSH session to the server running your experiment. You also don't have to worry about accidentally disconnecting and losing your progress.

Is it useful?
If you struggle with managing your experiment history and reproducing results, you should definitely check out Comet.ml.

The article was originally published here

Source: HOB