Data Science and Machine Learning
DOCKER AND PYTHON
Sr. Developer Advocate @Microsoft
ixek | https://bit.ly/europython-ml-docker
…Science and Machine Learning
- Security and performance
- Do not reinvent the wheel: automate
- Tips and tricks for using Docker
How are your users or colleagues meant to know what dependencies they need?
ImportError: No module named x, y, z
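To make the problem concrete, here is a minimal sketch (not from the talk) that reports which of a list of top-level modules cannot be imported in the current environment, using importlib.util.find_spec. The module names in the example are hypothetical placeholders; shipping an image that already contains the dependencies removes this guesswork for your users.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    # find_spec returns None when no importer can locate the module.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical dependency list; replace with the packages your project uses.
    required = ["json", "numpy", "pandas"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All dependencies found")
```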
…the problem of how to get software to run reliably when moved from one computing environment to another:
Your laptop → Test environment → Staging environment → Production environment
…is containerised
[Diagram: several apps running on Docker, on top of the host operating system, on top of the infrastructure]
At the app level: each app runs as an isolated process.
IMAGE VS CONTAINER
…app
- When you run an image, it creates a container
[Diagram: $ docker run starts a container from a Docker image tagged latest or 1.0.2]
COMMON PAIN POINTS IN DS AND ML
…evolving projects (iterative R&D process)
- Docker is complex and upskilling can take a lot of time
- Are containers secure enough for my data / model / algorithm?
HOW IS IT DIFFERENT FROM WEB APPS, FOR EXAMPLE?
…a model either
- Heavily relies on data
- Mixture of wheels and compiled packages
- Security access levels for both data and software
- Mixture of stakeholders: data scientists, software engineers, ML engineers
BUILDING DOCKER IMAGES
…set of instructions to install software, configure your image or copy files
BEST PRACTICES
…LABELS
- Split complex RUN statements and sort them
- Prefer COPY over ADD to add files
https://docs.docker.com/develop/develop-images/dockerfile_best-practices/
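The advice to split complex RUN statements and sort them can be illustrated with a small, hypothetical Python helper that renders an apt-get RUN instruction with one package per line, de-duplicated and sorted alphabetically. The package names are placeholders; the trailing cleanup of /var/lib/apt/lists follows the pattern used in Docker's best-practices guide.

```python
def apt_install_run(packages):
    """Render a Dockerfile RUN instruction installing apt packages.

    Packages are de-duplicated and sorted alphabetically, one per line,
    so that diffs stay small and duplicates are easy to spot.
    """
    pkgs = sorted(set(packages))
    lines = ["RUN apt-get update && apt-get install -y --no-install-recommends \\"]
    lines += [f"        {pkg} \\" for pkg in pkgs]
    # Removing the apt lists in the same layer keeps the image smaller.
    lines.append("    && rm -rf /var/lib/apt/lists/*")
    return "\n".join(lines)


if __name__ == "__main__":
    print(apt_install_run(["git", "build-essential", "curl", "git"]))
```

Generating the line is optional; the point is the shape of the instruction: one package per line, in alphabetical order, in a single layer.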
SPEED UP YOUR BUILD AND PROOF
…files
https://docs.docker.com/develop/develop-images/dockerfile_best-practices/
MOUNT VOLUMES TO ACCESS DATA
…using a database)
- Avoid file permission issues by creating a non-root user
https://docs.docker.com/develop/develop-images/dockerfile_best-practices/
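A minimal sketch of the reading side, assuming the data is bind-mounted into the container (for example with docker run -v /path/on/host:/data ...) instead of being copied into the image. The DATA_DIR variable and the /data default are hypothetical conventions chosen for this example.

```python
import os
from pathlib import Path


def data_dir(env_var="DATA_DIR", default="/data"):
    """Return the directory where the data volume is expected to be mounted.

    Hypothetical convention: the mount point defaults to /data and can be
    overridden with the DATA_DIR environment variable at `docker run` time.
    """
    return Path(os.environ.get(env_var, default))


def list_csv_files(directory):
    """Return the sorted names of the CSV files in `directory`.

    Returns an empty list if the directory does not exist (no volume mounted).
    """
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return sorted(path.name for path in directory.glob("*.csv"))


if __name__ == "__main__":
    print(list_csv_files(data_dir()))
```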
MINIMISE PRIVILEGE: FAVOUR A LESS PRIVILEGED USER
…runs as root by default)
- Minimise capabilities
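The real fix lives in the Dockerfile (create a user and switch to it with USER), and capabilities can be dropped at run time, for example with docker run --cap-drop ALL. As an extra, hypothetical safety net, the entry point of an app could refuse to start as root; a minimal Python sketch (os.geteuid is POSIX-only):

```python
import os


def running_as_root(uid=None):
    """Return True if the effective user ID is 0 (root).

    `uid` can be passed explicitly for testing; otherwise os.geteuid() is used.
    """
    if uid is None:
        uid = os.geteuid()
    return uid == 0


def ensure_not_root(uid=None):
    """Raise PermissionError if the process is running as root."""
    if running_as_root(uid):
        raise PermissionError(
            "Refusing to run as root; add a non-root USER to the Dockerfile"
        )
```

Calling ensure_not_root() at the top of the entry point makes a misconfigured image fail fast instead of silently running with full privileges.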
DON’T LEAK SENSITIVE INFORMATION
…in an intermediate layer, they are cached. Keep them out of your Dockerfile.
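One way to keep credentials out of the Dockerfile is to read them at run time. The hypothetical helper below first looks for a file under /run/secrets (the default mount location for Docker Swarm and Docker Compose secrets) and otherwise falls back to an environment variable passed with docker run -e. Values set with ENV or ARG inside the Dockerfile would still end up in the image metadata or history, so they are not a safe source.

```python
import os
from pathlib import Path


def read_secret(name, secrets_dir="/run/secrets"):
    """Return a secret value at run time, or None if it is not available.

    1. Look for a file called `name` in `secrets_dir`.
    2. Otherwise fall back to an environment variable named after `name`
       in upper case, e.g. passed with `docker run -e`.
    """
    secret_file = Path(secrets_dir) / name
    if secret_file.is_file():
        # Strip the trailing newline that secret files often contain.
        return secret_file.read_text().strip()
    return os.environ.get(name.upper())
```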
USE MULTI-STAGE BUILDS
- Not all of your dependencies will have been packaged as wheels, so you might need a compiler: build a compile image and a runtime image
- Smaller images overall
…data science, or Cookiecutter Docker Science
https://github.com/docker-science/cookiecutter-docker-science
https://drivendata.github.io/cookiecutter-data-science/
…of tools like repo2docker, already configured and optimised for data science and scientific computing.
https://repo2docker.readthedocs.io/en/latest
…GitHub Actions, whatever you prefer) and delegate your builds. Also, build often.
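When a CI service builds the image for you, it can also tag it so the image stays identifiable (test, production, R&D). A minimal sketch, assuming the commit SHA comes from the CI system (GitHub Actions, for example, exposes it as the GITHUB_SHA environment variable); the stage names and the tag layout are hypothetical choices for this example.

```python
import re

# Hypothetical stage names for test, production and R&D images.
STAGES = {"test", "production", "rnd"}


def image_tag(repository, stage, commit_sha, length=7):
    """Return an image reference such as 'myorg/model:production-1a2b3c4'."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}, expected one of {sorted(STAGES)}")
    if not re.fullmatch(r"[0-9a-f]{7,40}", commit_sha):
        raise ValueError("commit_sha must be a lowercase hexadecimal git SHA")
    # Short SHA keeps the tag readable while still pointing at one commit.
    return f"{repository}:{stage}-{commit_sha[:length]}"
```

The CI job could then pass the resulting string to docker build -t and docker push, so no image is ever built or pushed by hand.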
TOP TIPS
…system packages
2. Never work as root; minimise privileges
3. You do not want to use Alpine Linux (go for Debian buster or stretch, or the Jupyter Docker stacks)
4. Always know what you are expecting: pin / version EVERYTHING (use pip-tools, conda, poetry or pipenv)
5. Leverage the build cache
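Tools such as pip-tools (pip-compile), poetry or pipenv produce fully pinned lock files. As a lightweight, hypothetical check that a requirements.txt really is pinned, the sketch below flags every requirement that does not start with an exact == version. It is simplified: it strips comments naively, skips pip options such as -r, and is not a full parser of the requirements format.

```python
import re

# A requirement counts as pinned here only if it uses an exact "==" version,
# optionally after extras such as requests[security].
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[A-Za-z0-9,._-]+\])?==[^=\s]+")


def unpinned_requirements(text):
    """Return requirement lines from a requirements.txt-style string that are not pinned."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```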
TOP TIPS
…need to compile code? Need to reduce your image size?
8. Make your images identifiable (test, production, R&D); also be careful when accessing databases and when using ENV variables and build variables
9. Do not reinvent the wheel! Use repo2docker
10. Automate: no need to build and push manually
11. Use a linter
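A real Dockerfile linter such as hadolint covers far more rules. Purely to illustrate what such a tool checks, here is a toy Python checker for a handful of the practices from this talk: a pinned base image tag, COPY rather than ADD, no secret-looking ENV or ARG values, a non-root USER and at least one LABEL. The rules and messages are hypothetical simplifications; it does not join continuation lines and does not understand FROM lines that refer to earlier build stages.

```python
SECRET_WORDS = ("PASSWORD", "SECRET", "TOKEN")


def parse_instructions(text):
    """Split a Dockerfile into (KEYWORD, arguments) pairs, skipping comments.

    Continuation lines are not joined, so instructions spread over several
    lines are only partly checked.
    """
    instructions = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        instructions.append((parts[0].upper(), parts[1] if len(parts) > 1 else ""))
    return instructions


def lint_dockerfile(text):
    """Return a list of warnings for a few Dockerfile best practices."""
    instructions = parse_instructions(text)
    warnings = []
    for keyword, args in instructions:
        if keyword == "FROM":
            image = args.split()[0] if args else ""
            name = image.split("/")[-1]  # ignore registry and namespace parts
            if ":" not in name and "@" not in name:
                warnings.append(f"FROM {image}: pin the base image to a tag or digest")
            elif name.endswith(":latest"):
                warnings.append(f"FROM {image}: avoid the moving 'latest' tag")
        elif keyword == "ADD":
            warnings.append("ADD: prefer COPY for copying local files")
        elif keyword in ("ENV", "ARG") and any(w in args.upper() for w in SECRET_WORDS):
            warnings.append(f"{keyword} {args}: possible secret baked into the image")

    # The last USER instruction decides which user the container runs as.
    users = [args.split(":")[0] for keyword, args in instructions if keyword == "USER"]
    if not users or users[-1] in ("root", "0"):
        warnings.append("USER: the container will run as root")
    if not any(keyword == "LABEL" for keyword, _ in instructions):
        warnings.append("LABEL: add labels to make the image identifiable")
    return warnings
```

Running it on a Dockerfile that starts FROM python:latest, uses ADD and never switches USER would produce one warning for each of those issues.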