Data Science & Machine Learning in Containers

Opeyemi Bamigbade
20 min read · Nov 19, 2020

Deep Dive Into Containerization for Data Science & Machine Learning

image source: neptune.ai

This is a summary of a detailed article I originally published with neptune.ai. For the full write-up and further details, follow the Full article link on neptune.ai.

When building data science and machine learning-powered products, the research-development-production workflow is not linear, unlike traditional software development where the specs are known and the problems are (mostly) understood beforehand.

There is a lot of trial and error involved, including testing new algorithms, trying new data versions (and managing them), packaging the product for production, accounting for end users' views and perspectives, handling feedback loops, and more. All of this makes managing such projects a challenge.

Isolating the development environment from the production systems is a must if you want to ensure that your application will actually work. Putting your ML model development work inside a (Docker) container can really help with:

  • managing the product development,
  • keeping your environments clean (and making them easy to reset),
  • most importantly, making it easier to move things from development to production (see the sketch below).
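To make that concrete, here is a minimal sketch of what this can look like in practice. It assumes a hypothetical project directory containing a Dockerfile and a train.py script, and an arbitrary image tag ml-dev:latest; it simply drives the Docker CLI from Python to build the image and run training inside a disposable container.

```python
import subprocess
from pathlib import Path

# Hypothetical project layout: a Dockerfile and a train.py script
# living side by side in the current project directory.
PROJECT_DIR = Path(".").resolve()
IMAGE_TAG = "ml-dev:latest"  # assumed image name; pick your own


def build_image() -> None:
    """Build the development image from the project's Dockerfile."""
    subprocess.run(
        ["docker", "build", "-t", IMAGE_TAG, str(PROJECT_DIR)],
        check=True,
    )


def run_training() -> None:
    """Run the training script inside a disposable container.

    The project directory is mounted into the container so artifacts
    (models, metrics) written by train.py land back on the host, and
    --rm discards the container afterwards, keeping the environment
    clean and easy to reset.
    """
    subprocess.run(
        [
            "docker", "run", "--rm",
            "-v", f"{PROJECT_DIR}:/workspace",
            "-w", "/workspace",
            IMAGE_TAG,
            "python", "train.py",
        ],
        check=True,
    )


if __name__ == "__main__":
    build_image()
    run_training()
```

Because the container is thrown away after each run, "resetting" the environment is just a matter of rebuilding the image, and the same image can later be shipped toward production.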
