DVC Cheatsheet

DVC (Data Version Control) is a system designed for managing machine learning projects, enabling users to track, share, and reproduce experiments. Key functionalities include initializing a repository, tracking experiments, defining pipelines, logging metrics, and managing remote storage. The cheat sheet provides commands for various operations, including adding files, pushing data, and visualizing pipelines.

Uploaded by

khushirajpurohit617

DVC (Data Version Control) Cheat Sheet

DVC is a version control system for machine learning projects, allowing you to track, share, and reproduce your experiments.

---

1. Getting Started

Initialize DVC in a Repository:

dvc init

Add Files or Directories to DVC:

dvc add <file_or_directory>

Commit Changes:

Use Git to commit the generated .dvc file and the updated .gitignore:

git add <file>.dvc .gitignore

git commit -m "Track data with DVC"

Configure Remote Storage:

dvc remote add -d myremote <remote_storage_url>

Push Data to Remote Storage:

dvc push

Pull Data from Remote Storage:

dvc pull

---

2. Tracking Experiments

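Reproduce the Pipeline (reruns only stages whose dependencies changed):

dvc repro

Run and Record an Experiment (DVC 2.0 and later):

dvc exp run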

Track Parameters:

Specify parameters in a params.yaml file and link them to pipeline stages with the -p option of dvc stage add (or the params field in dvc.yaml).

Example params.yaml:

learning_rate: 0.01

batch_size: 32
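
One way to link these parameters to a stage is to list them under params in dvc.yaml. The sketch below writes both files by hand. The stage name train and the files train.py, data.csv, and model.pkl are illustrative assumptions taken from the pipeline example in section 3. It overwrites any params.yaml and dvc.yaml in the current directory, so try it in a scratch repository.

```shell
# Write the hyperparameters file from the example above.
cat > params.yaml <<'EOF'
learning_rate: 0.01
batch_size: 32
EOF

# Declare a stage that depends on both parameters.
# Entries under "params" are looked up in params.yaml by default.
# train.py, data.csv and model.pkl are hypothetical names.
cat > dvc.yaml <<'EOF'
stages:
  train:
    cmd: python train.py
    deps:
      - train.py
      - data.csv
    params:
      - learning_rate
      - batch_size
    outs:
      - model.pkl
EOF
```

With this in place, dvc repro reruns the train stage when either value changes. The same declaration can be generated by passing -p learning_rate,batch_size to dvc stage add.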

---

3. Pipelines

Define a Pipeline Stage:

dvc stage add -n <stage_name> -d <dependency> -o <output> <command>

Example:

dvc stage add -n train -d train.py -d data.csv -o model.pkl python train.py

Visualize the Pipeline:

dvc dag

Run the Entire Pipeline:

dvc repro

---

4. Metrics and Plots

Log Metrics:

Use a metrics.json or similar file to store metrics as a valid JSON object:

{

"accuracy": 0.95,

"loss": 0.05

}

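Track the metrics file by declaring it as a metrics output of the stage that writes it (the dvc metrics add command was removed in DVC 1.0).

Example (the stage and script names are illustrative):

dvc stage add -n evaluate -d evaluate.py -d model.pkl -M metrics.json python evaluate.py

Show Tracked Metrics:

dvc metrics show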
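
In DVC 1.0 and later, a metrics file is declared under metrics in dvc.yaml, as an output of the stage that produces it. A minimal sketch, assuming a hypothetical evaluate stage whose script evaluate.py writes metrics.json. Here the file is written by hand to keep the example self-contained, and the dvc.yaml contains only this one stage. It overwrites metrics.json and dvc.yaml in the current directory.

```shell
# Write the metrics file (normally the evaluation script would do this).
cat > metrics.json <<'EOF'
{
  "accuracy": 0.95,
  "loss": 0.05
}
EOF

# Declare metrics.json as a metrics output of a hypothetical "evaluate" stage.
# cache: false keeps the small file out of the DVC cache so it can be committed to Git.
cat > dvc.yaml <<'EOF'
stages:
  evaluate:
    cmd: python evaluate.py
    deps:
      - evaluate.py
      - model.pkl
    metrics:
      - metrics.json:
          cache: false
EOF
```

Once the stage is declared, dvc metrics show prints the values and dvc metrics diff compares them between commits.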

Visualize Plots:

Use DVC to generate plots from tracked data files:

dvc plots show <file>

---

5. Versioning Data

Check File Status:

dvc status

Stop Tracking Data (the file itself stays in the workspace):

dvc remove <file>.dvc

Checkout Specific Versions:

git checkout <commit_hash>

dvc checkout

---

6. Sharing Projects

Push Project to Git and DVC Remote:

git push

dvc push

Clone a Repository and Retrieve Data:

git clone <repo_url>

cd <repo_directory>

dvc pull

---

7. Useful Commands

Show Pipeline Stages:

dvc stage list

Remove Unused Cache (keeps data referenced by the current workspace):

dvc gc -w

Show Differences in Metrics:

dvc metrics diff

---

8. Remote Storage Options

DVC supports various remote storage backends:

- AWS S3: s3://bucket-name/path

- Google Drive: gdrive://<folder-id>

- Azure Blob Storage: azure://container-name/path

- SSH: ssh://user@server/path

- Local Directory: /path/to/storage

Configure remotes using:

dvc remote add -d <name> <url>

---

9. Useful Links

- Official Documentation: https://dvc.org/doc

- DVC GitHub: https://github.com/iterative/dvc
