Data - DSPy

Uploaded by

anapaula.althoff

22/10/2024, 15:36 Data - DSPy

Data

DSPy is a machine learning framework, so working in it involves training sets, development
sets, and test sets.

For each example in your data, we typically distinguish between three types of values: the
inputs, the intermediate labels, and the final label. You can use DSPy effectively without any
intermediate or final labels, but you will need at least a few example inputs.

How much data do I need and how do I collect data for my task?

Concretely, you can use DSPy optimizers usefully with as few as 10 example inputs, but having
50-100 examples (or even better, 300-500 examples) goes a long way.

How can you get examples like these? If your task is extremely unusual, please invest in
preparing ~10 examples by hand. Oftentimes, depending on your metric, you need only
inputs and not labels, so this is not that hard.

However, chances are that your task is not actually that unique. You can almost always find
somewhat adjacent datasets on, say, HuggingFace datasets or other forms of data that you
can leverage here.

If you find data whose license is permissive enough, we suggest you use it. Otherwise, you
can also start using/deploying/demoing your system and collect some initial data that way.
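
One lightweight way to collect inputs from early usage is to append each request to a JSON Lines file and read it back later. The sketch below uses only the Python standard library; the function names, the file name, and the record fields are hypothetical and are not part of DSPy.

```python
import json
import os
import tempfile


def log_input(path, record):
    """Append one input record (a dict) to the file as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_inputs(path):
    """Read all logged records back as a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical usage: log two incoming questions, then reload them.
log_path = os.path.join(tempfile.mkdtemp(), "collected_inputs.jsonl")
log_input(log_path, {"question": "What is the refund policy?"})
log_input(log_path, {"question": "How do I reset my password?"})
collected = load_inputs(log_path)
print(len(collected))  # 2
```

Because each record is a plain dict of fields, it needs no labels at collection time; labels can be added later for the subset of inputs you decide to annotate.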

DSPy Example objects

The core data type for data in DSPy is Example. You will use Examples to represent items in
your training set and test set.

DSPy Examples are similar to Python dicts but have a few useful utilities. Your DSPy modules
will return values of the type Prediction, which is a special sub-class of Example.

When you use DSPy, you will do a lot of evaluation and optimization runs. Your individual
datapoints will be of type Example:

import dspy

qa_pair = dspy.Example(question="This is a question?", answer="This is an answer.")

print(qa_pair)
print(qa_pair.question)
print(qa_pair.answer)

https://dspy-docs.vercel.app/building-blocks/4-data/ 1/4

Output:

Example({'question': 'This is a question?', 'answer': 'This is an answer.'}) (input_keys=None)
This is a question?
This is an answer.

Examples can have any field keys and any value types, though usually values are strings.

object = Example(field1=value1, field2=value2, field3=value3, ...)

You can now express your training set for example as:

trainset = [dspy.Example(report="LONG REPORT 1", summary="short summary 1"), ...]

Specifying Input Keys

In traditional ML, "inputs" and "labels" are kept separate.

In DSPy, the Example objects have a with_inputs() method, which can mark specific fields
as inputs. (The rest are just metadata or labels.)

# Single Input.
print(qa_pair.with_inputs("question"))

# Multiple Inputs; be careful about marking your labels as inputs unless you
# mean it.
print(qa_pair.with_inputs("question", "answer"))

Values can be accessed using the . (dot) operator. For example, given an object defined as
Example(name="John Doe", job="sleep"), you can read the value of the key name through object.name.

To access or exclude certain keys, use the inputs() and labels() methods, which return new
Example objects containing only the input keys or only the non-input keys, respectively.

article_summary = dspy.Example(article="This is an article.", summary="This is a summary.").with_inputs("article")

input_key_only = article_summary.inputs()
non_input_key_only = article_summary.labels()

print("Example object with Input fields only:", input_key_only)
print("Example object with Non-Input fields only:", non_input_key_only)

Output

Example object with Input fields only: Example({'article': 'This is an article.'}) (input_keys=None)
Example object with Non-Input fields only: Example({'summary': 'This is a summary.'}) (input_keys=None)
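
To make the input/label partition concrete, here is a toy stand-in written in plain Python. It is not DSPy's implementation: the class name MiniExample, its to_dict() helper, and its internals are hypothetical and only mimic the behaviour described above (dot access, with_inputs(), inputs(), labels()).

```python
class MiniExample:
    """Toy stand-in for illustration only; NOT DSPy's Example class."""

    def __init__(self, input_keys=None, **fields):
        self._fields = dict(fields)
        self._input_keys = set(input_keys) if input_keys is not None else None

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so it serves field access.
        fields = self.__dict__.get("_fields", {})
        if name in fields:
            return fields[name]
        raise AttributeError(name)

    def with_inputs(self, *keys):
        # Return a copy in which the given fields are marked as inputs.
        return MiniExample(input_keys=keys, **self._fields)

    def inputs(self):
        # Keep only the fields that were marked as inputs.
        if self._input_keys is None:
            raise ValueError("call with_inputs() before inputs()")
        return MiniExample(**{k: v for k, v in self._fields.items() if k in self._input_keys})

    def labels(self):
        # Keep only the fields that were NOT marked as inputs.
        marked = self._input_keys or set()
        return MiniExample(**{k: v for k, v in self._fields.items() if k not in marked})

    def to_dict(self):
        return dict(self._fields)


example = MiniExample(article="This is an article.", summary="This is a summary.").with_inputs("article")
print(example.article)
print(example.inputs().to_dict())
print(example.labels().to_dict())
```

The point of the sketch is that marking inputs does not remove anything: every field stays on the object, and inputs() and labels() are just two complementary views of the same fields.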

Loading Dataset from sources

One of the most convenient ways to import datasets in DSPy is to use DataLoader. The first
step is to create a DataLoader object, which can then be used to call utilities that load
datasets in different formats:

from dspy.datasets import DataLoader

dl = DataLoader()

For most dataset formats, this is straightforward: you pass the file path to the method for
that format, and you get back a list of Example objects for the dataset:

import pandas as pd

csv_dataset = dl.from_csv(
    "sample_dataset.csv",
    fields=("instruction", "context", "response"),
    input_keys=("instruction", "context")
)

json_dataset = dl.from_json(
    "sample_dataset.json",
    fields=("instruction", "context", "response"),
    input_keys=("instruction", "context")
)

parquet_dataset = dl.from_parquet(
    "sample_dataset.parquet",
    fields=("instruction", "context", "response"),
    input_keys=("instruction", "context")
)

pandas_dataset = dl.from_pandas(
    pd.read_csv("sample_dataset.csv"),  # DataFrame
    fields=("instruction", "context", "response"),
    input_keys=("instruction", "context")
)
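
As a rough mental model of what fields and input_keys select, the sketch below parses a small in-memory CSV with the standard csv module. It does not use DataLoader and is not how DataLoader is implemented; the helper name rows_to_records, the extra source column, and the sample rows are made up for illustration.

```python
import csv
import io

# Hypothetical sample data with one extra column ("source") that we do not want.
SAMPLE_CSV = """instruction,context,response,source
Summarize,Long text A,Short A,web
Translate,Bonjour,Hello,book
"""


def rows_to_records(csv_text, fields, input_keys):
    """Keep only `fields` from each row, then split them into inputs and labels."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        kept = {name: row[name] for name in fields}
        records.append({
            "inputs": {k: v for k, v in kept.items() if k in input_keys},
            "labels": {k: v for k, v in kept.items() if k not in input_keys},
        })
    return records


records = rows_to_records(
    SAMPLE_CSV,
    fields=("instruction", "context", "response"),
    input_keys=("instruction", "context"),
)
print(records[0]["inputs"])
print(records[0]["labels"])
```

In this model, fields decides which columns survive at all, and input_keys decides which of the surviving columns count as inputs; everything else that survives is treated as a label.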

These are some of the formats that DataLoader can load directly from a file. In the backend,
most of these methods use the load_dataset method from the datasets library to load these
formats. When working with text data, however, you will often use HuggingFace datasets. To
import an HF dataset as a list of Example objects, use the from_huggingface method:


blog_alpaca = dl.from_huggingface(
    "intertwine-expel/expel-blog",
    input_keys=("title",)
)

You can access each split of the dataset by indexing with the name of that split:

train_split = blog_alpaca['train']

# Since this is the only split in the dataset we can split this into
# train and test split ourselves by slicing or sampling 75 rows from the train
# split for testing.
testset = train_split[:75]
trainset = train_split[75:]
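
Slicing keeps the dataset's original order. If the rows might be sorted in some way, a seeded shuffle before slicing is one alternative. The sketch below uses plain Python lists; the helper name shuffled_split, the seed value, and the placeholder rows are assumptions for illustration.

```python
import random


def shuffled_split(items, test_size, seed=0):
    """Shuffle a copy of `items` reproducibly, then split off `test_size` items.

    Returns (train_items, test_items). The input list is left unchanged.
    """
    if not 0 <= test_size <= len(items):
        raise ValueError("test_size must be between 0 and len(items)")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]


rows = [f"row-{i}" for i in range(300)]  # placeholder standing in for a loaded split
train_rows, test_rows = shuffled_split(rows, test_size=75)
print(len(train_rows), len(test_rows))  # 225 75
```

Using a dedicated random.Random(seed) instance keeps the split reproducible across runs without touching the global random state.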

You load a HuggingFace dataset via from_huggingface exactly as you would with load_dataset.
This includes passing specific splits, subsplits, read instructions, etc. For code snippets,
refer to the cheatsheet snippets for loading from HF.
