DVC Cheatsheet

DVC is a tool for data version control and reproducible machine learning workflows. It allows users to initialize a DVC environment, add files under DVC control, run commands to generate outputs, and reproduce or modify the pipeline by pulling and pushing data between a local cache and remote storage. Common commands include dvc init, dvc add, dvc run, dvc repro, dvc push, dvc pull and dvc status.

Uploaded by

Etienne Koa

https://github.com/iterative/dvc
https://dvc.org/chat


Basics

Initializing

Initialize a DVC environment
$ dvc init

Remote

Set up a remote to keep and share data files
$ dvc remote add -d myremote /path
*Possible remotes include local, s3, gs, azure, ssh, hdfs and http.

Show all available remotes
$ dvc remote list

Modify remote settings
$ dvc remote modify myremote
*Use if remote requires extra configuration

Adding Files

Add files under DVC control
$ dvc add filename
*Use --no-commit to stop adding the file to the cache.

Share Data

Push all data files to the remote storage
$ dvc push

Push outputs of a specific .dvc file
$ dvc push filename.dvc

Retrieve Data

Download files from the remote storage
$ dvc pull

Download files from a specific .dvc file
$ dvc pull filename.dvc

Checkout files from cache into working space
$ dvc checkout


The Pipeline

Add transformations and generate a stage file from a given command
$ dvc run -d dependencyfile \
      -o outputfile python command.py
*Use --file to specify the name of the generated .dvc file.
*Use --metrics to output a file containing the metric.

Metrics

Collect and display project metrics
$ dvc metrics show
*Use --all to show the metrics in all branches.

Visualizing

Show stages in a pipeline
$ dvc pipeline show --ascii file.dvc
*Add --commands or --outs to show more detail.

Show connected pipelines of DVC stages
$ dvc pipeline list

Reproducing

Reproduce outputs defined in .dvc file
$ dvc repro filename.dvc
*Name a .dvc file “Dvcfile” to be used by dvc repro by default


Other Commands

Set/unset cache directory location
$ dvc cache dir /path

Commit outputs to cache
$ dvc commit
*Use if you specified --no-commit in dvc add/run/repro

Config repository or global options
$ dvc config
*Config the default remote using core.remote myremote
*Config core (loglevel, remote), cache and state settings

Fetch files from the remote to the local cache
$ dvc fetch file.dvc

Remove unused objects from cache
$ dvc gc

Import file from URL to local directory
$ dvc import url /path
*Supported schemes include local, s3, gs, azure, ssh, hdfs and http.

Remove data files tracked by dvc
$ dvc remove filename.dvc

Show changed stages in the pipeline
$ dvc status
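
Example workflow

The commands in this cheat sheet could be chained roughly as in the sketch below. It is only an illustration, not an official recipe: the file names data.csv, train.py and model.pkl and the remote path /tmp/dvc-storage are hypothetical, and the git init step assumes the project is kept in a Git repository. The script is a dry run by default, so it only prints each command; newer DVC releases may not provide every command listed here (dvc run in particular), so check your installed version before setting DRY_RUN=0.

```shell
#!/bin/sh
# Dry run by default: each step is printed instead of executed.
# Set DRY_RUN=0 to execute for real (needs git and a DVC version that
# still provides the commands used in this cheat sheet).
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
step() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '$ %s\n' "$*"
    else
        "$@"
    fi
}

workflow() {
    step git init                                    # assumption: project lives in a Git repo
    step dvc init                                    # initialize a DVC environment
    step dvc remote add -d myremote /tmp/dvc-storage # hypothetical local remote path
    step dvc add data.csv                            # hypothetical data file
    step dvc run -d data.csv -o model.pkl python train.py  # hypothetical stage
    step dvc push                                    # push data files to the remote
    step dvc status                                  # show changed stages
}

workflow
```

Wrapping every command in step keeps the sequence readable while letting you review it before anything touches the working directory.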

Made by Carl Handlin based on the documentation for DVC at https://dvc.org/doc
