Accelerated Data Science Getting Started Cheat Sheet Cudf 2003937 r4

This document is a cheat sheet for getting started with GPU-accelerated DataFrames in Python using cuDF. It covers creating DataFrames from various data sources, extracting properties from DataFrames and Series, saving DataFrames to disk or converting them to other formats, querying and transforming DataFrames, and performing string operations on Series. It lists functions to load and save DataFrames from and to CSV, JSON, and Parquet files and to convert between cuDF and pandas DataFrames. DataFrames can be queried, transformed with custom functions, joined, and aggregated, and strings in a Series can be analyzed with regular expressions.

Uploaded by

Junio


GPU Accelerated DataFrames in Python
Getting Started Cheat Sheet

Try out enterprise solutions for free with NVIDIA LaunchPad.

Get started with immediate access to hands-on labs at nvidia.com/try-data-science

For additional cheat sheets go to: nvidia.com/rapids-kit/

CREATE
Instantiate DataFrames from files and host memory.

Create a DataFrame.
cudf.DataFrame([1,2,3,4], columns=['foo']) - from a list of elements
cudf.DataFrame({'foo': [1,2,3,4], 'bar': ['a','b','c',None]}) - from a dictionary of columns
cudf.DataFrame([(1,'a'), (2,'b')], columns=['foo','bar']) - from a list of tuples
cudf.from_pandas(pd.DataFrame([1,2,3,4], columns=['ints'])) - Convert pandas DataFrame (CPU) to cuDF DataFrame (GPU).
cudf.read_csv('results.csv') - Read contents of a CSV file.
cudf.read_csv('results.csv', nrows=2, usecols=['foo']) - Read two rows and column foo of a CSV file.
cudf.read_csv('results.csv', skiprows=1, names=['foo','bar']) - Replace column names when reading a CSV file.
cudf.read_json('results.json') - Read contents of a JSON file.
cudf.read_json('results.json', lines=True, engine='cudf') - Read contents of a lines-formatted JSON file using GPU.
cudf.read_parquet('results/df_default.parquet') - Read contents of a Parquet file.
cudf.read_parquet('results/df_default.parquet', columns=['foo']) - Read column foo from a Parquet file.

PROPERTIES
Extract properties from DataFrames and Series.

FOR SERIES
Retrieve rows by index label.
ser.loc[1] - row with index 1
ser.loc[1:4] - rows with indices 1 to 4
ser.values - Get an array of all elements.

SAVE
Persist data to disk or convert to other memory representations.
df.to_csv('results.csv') - Save cuDF DataFrame in a CSV format with index and header.
df.to_csv('results.csv', index=False, header=False) - Save cuDF DataFrame in a CSV format without index and header.
df.to_dlpack() - Convert DataFrame to DLPack tensor for deep learning.
df.to_json('results.json') - Save cuDF DataFrame in a JSON format.
df.to_json('results.json', orient='records', lines=True) - Save cuDF DataFrame in a JSON Lines format.
df.to_pandas() - Convert cuDF DataFrame (GPU) to pandas DataFrame (CPU).
df.to_parquet('results.parquet') - Save cuDF DataFrame in a Parquet format.
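
The CSV entries under CREATE and SAVE can be tried as a round trip. The following is a minimal sketch, not part of the original cheat sheet: it assumes a pandas fallback (aliased as the hypothetical name `xdf`) when cuDF is not installed, on the assumption that both libraries accept these particular calls, and it writes to a temporary directory rather than the `results.csv` path used above.

```python
import os
import tempfile

try:
    import cudf as xdf      # GPU path, as used throughout this cheat sheet
except ImportError:
    import pandas as xdf    # assumption: CPU stand-in accepting the same calls

df = xdf.DataFrame({'foo': [1, 2, 3, 4], 'bar': ['a', 'b', 'c', 'd']})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'results.csv')

    # Default to_csv also writes the index, which comes back as an extra column.
    df.to_csv(path)
    with_index = xdf.read_csv(path)

    # index=False leaves only the data columns in the file.
    df.to_csv(path, index=False)
    round_trip = xdf.read_csv(path)

    # Read only the first two rows of column foo.
    subset = xdf.read_csv(path, nrows=2, usecols=['foo'])

print(len(with_index.columns), list(round_trip.columns), subset.shape)
```

Writing with `index=False` is usually what you want when the index is just the default row numbering, since it avoids the extra column on the next read.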

Create a Series.
cudf.Series([0,1,2,3]) - from a list of elements
df['foo'] - get column 'foo' from DataFrame as a cuDF Series

QUERY
Extract information from data.
df.head() - Retrieve top 5 rows from DataFrame.
df.head(2) - Retrieve top 2 rows from DataFrame.
df.memory_usage() - Learn how much memory your DataFrame consumes (in bytes).
df.nlargest(3, 'foo') - Retrieve 3 rows with largest values in column foo.
df.nsmallest(2, 'foo') - Retrieve 2 rows with smallest values in column foo.
df.query('foo == 1') - Get all rows where column foo equals 1.
df.query('foo > 10') - Get all rows where column foo is greater than 10.
df.sample() - Fetch a random row.
df.sample(3) - Fetch 3 random rows.

PROPERTIES
Extract properties from DataFrames and Series.

FOR DATAFRAMES
df.columns - Get a list of column names.
df.dtypes - Get a list of columns with data types.
Retrieve rows and columns by index label.
df.loc[3] - row with index 3
df.loc[3, 'foo'] - row with index 3 and column 'foo'
df.loc[2:5, ['foo', 'bar']] - rows with labels 2 to 5 and columns 'foo' and 'bar'
df.shape - Know data shape (row #, col #).
df.size - Know total number of elements.
df.values - Get an array with all elements.

TRANSFORM
Alter the information and structure of DataFrames.
df.apply_rows(func, incols=['foo'], outcols={'bar': 'float64'}, kwargs={}) - Apply custom transformation defined in func to column foo and store in column bar.
def func(foo, bar):
    for i, f in enumerate(foo):
        bar[i] = f + 1
func (above) - Kernel definition to use in the apply_rows() function.
cudf.concat([df1, df2]) - Append a DataFrame to another DataFrame.
df.drop(1) - Remove row with index equal to 1.
df.drop([1,2]) - Remove rows with index equal to 1 and 2.
df.drop('foo', axis=1) - Remove column foo.
df.dropna() - Remove rows with one or more missing values.
df.dropna(subset='foo') - Remove rows with a missing value in column foo.
df.fillna(-1) - Replace any missing value with a default.
df.fillna({'foo': -1}) - Replace a missing value in column foo with a default.
df1.join(df2) - Join with a DataFrame on index.
df1.merge(df2, on='foo', how='inner') - Perform an inner join with a DataFrame on column foo.
df1.merge(df2, left_on='foo', right_on='bar', how='left') - Perform a left outer join with a DataFrame on different keys.
df.rename({'foo': 'bar'}, axis=1) - Rename column foo to bar.
df.rename({1: 101}) - Replace index 1 with value 101.
df.reset_index() - Replace index and retain the old one as a column.
df.reset_index(drop=True) - Replace index and remove the old one.
df.set_index('foo') - Replace index with the values of column foo.
df.set_index('foo', drop=False) - Replace index with the values of column foo and retain the column.

STRING
Operate on string columns on GPU.
ser.str.contains('foo') - Check if each string in the Series contains foo.
ser.str.contains('foo[a-z]+') - Check if each string in the Series contains words starting with foo.
ser.str.extract('(foo)') - Retrieve regex groups matching pattern in Series of strings.
ser.str.extract('[a-z]+flow (\d)') - Retrieve IDs of dataflows, workflows, etc., in Series of strings.
ser.str.findall('([a-z]+flow)') - Retrieve all instances of words like dataflow, workflow, etc.
ser.str.len() - Find the length of each string.
ser.str.lower() - Cast all the letters in a string to lowercase characters.
ser.str.match('[a-z]+flow') - Check whether each string matches the pattern from its start.
ser.str.ngrams_tokenize(n=2, separator='_') - Generate all bi-grams from a string, joined by an underscore.
ser.str.pad(width=10) - Pad every string to a width of at least 10 characters.
ser.str.pad(width=10, side='both', fillchar='$') - Pad every string to a width of 10 with the word centered and padded with dollar signs.
ser.str.replace('foo', 'bar') - Replace all instances of word foo with bar.
ser.str.replace('f..', 'bar') - Replace every 3-character sequence beginning with f with bar.
ser.str.split() - Split the string on spaces.
ser.str.split(',', n=5) - Split the string on commas at most 5 times (the 6th element retains the remainder of the string).
tokens, masks, metadata = ser.str.subword_tokenize('hash.txt') - Tokenize text using perfectly hashed BERT vocabulary.
ser.str.upper() - Cast all the letters in a string to uppercase characters.
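
The regex-based STRING entries are easy to misread, so here is a small sketch of what the patterns above select. It uses Python's standard `re` module on a plain list instead of a cuDF Series, on the assumption that cuDF's regex engine agrees with `re` for patterns this simple; the sample strings are made up for illustration.

```python
import re

strings = ['dataflow 1', 'my workflow 2', 'food', 'fob']

# ser.str.contains(pat): the pattern may match anywhere (like re.search).
contains = [bool(re.search(r'[a-z]+flow', s)) for s in strings]

# ser.str.match(pat): the pattern must match at the start (like re.match).
matches = [bool(re.match(r'[a-z]+flow', s)) for s in strings]

# ser.str.extract(pat): keep the capture group, None where nothing matches.
ids = []
for s in strings:
    m = re.search(r'[a-z]+flow (\d)', s)
    ids.append(m.group(1) if m else None)

# ser.str.replace('f..', 'bar'): 'f..' means "f plus any two characters",
# so it also fires inside longer words.
replaced = [re.sub(r'f..', 'bar', s) for s in strings]

# ser.str.split(',', n=5): at most 5 splits, hence at most 6 pieces.
pieces = 'a,b,c,d,e,f,g'.split(',', 5)

print(contains)   # [True, True, False, False]
print(matches)    # [True, False, False, False]
print(ids)        # ['1', '2', None, None]
print(replaced)   # ['databarw 1', 'my workbarw 2', 'bard', 'bar']
print(pieces)     # ['a', 'b', 'c', 'd', 'e', 'f,g']
```

Note the difference between the second and first results for 'my workflow 2': the pattern is found inside the string, but not at its start.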

CATEGORICAL
Work with categorical columns on GPU.
ser.cat.add_categories(['foo','bar']) - Extend the list of categorical allowed values.
ser.cat.categories - Retrieve the list of all categories.
ser.cat.remove_categories(['foo']) - Remove the foo category from categorical column.

DATETIME
Deal with date and time columns on GPU.
ser.dt.day - Extract day from DateTime column.
ser.dt.dayofweek - Extract the day of the week from DateTime column.
ser.dt.year - Extract year from DateTime column.

MATH/STAT
Perform mathematical and statistical operations on columns.
df.corr() - Calculate the correlation coefficients between columns.
df.exp() - Exponentiate values in all columns.
df.kurt() - Find kurtosis of each column.
df.log() - Take the natural logarithm of values in all columns.
df.pow(2) - Raise values in all columns to the power of 2.
df.skew() - Find skewness of each column.
df.sqrt() - Find square roots of values in all columns.

SUMMARIZE
Learn from data by aggregating and exploring.
df.groupby(by='foo').agg({'bar': 'sum', 'baz': 'count'}) - Aggregate DataFrame: sum elements of bar, count elements of baz by values of foo.
df.describe() - Learn basic statistics about DataFrame.
df.describe(percentiles=[.1,.9]) - Learn basic statistics about DataFrame, reporting the 1st and 9th deciles.
df.max() - Learn the maximum value in each column.
df.max(axis=1) - Learn the maximum value in each row.
df.mean() - Learn the average value of each column.
df.mean(axis=1) - Learn the average value of each row.
df.min() - Learn the minimum value in each column.
df.min(axis=1) - Learn the minimum value in each row.
df.quantile() - Learn the median of each column.
df.quantile(.25) - Learn the 1st quartile of each column.
df.std() - Learn the standard deviation of each column.
df.std(axis=1) - Learn the standard deviation of each row.
df.sum() - Get the sum of each column.
df.sum(axis=1) - Get the sum of each row.
ser.unique() - Find all unique values in Series.
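
To make the column-wise versus row-wise (`axis=1`) distinction and the groupby aggregation concrete, here is a short sketch on a four-row table. It is illustrative only: the data is invented, and the pandas fallback (aliased as the hypothetical name `xdf`) assumes pandas and cuDF behave the same for these calls.

```python
try:
    import cudf as xdf      # GPU path, as used throughout this cheat sheet
except ImportError:
    import pandas as xdf    # assumption: CPU stand-in accepting the same calls

df = xdf.DataFrame({
    'foo': ['a', 'a', 'b', 'b'],
    'bar': [1.0, 2.0, 3.0, 4.0],
    'baz': [10, 20, 30, 40],
})

# One output row per distinct value of foo: bar is summed, baz is counted.
agg = df.groupby(by='foo').agg({'bar': 'sum', 'baz': 'count'})

numeric = df[['bar', 'baz']]
col_total = numeric.sum()         # one value per column
row_total = numeric.sum(axis=1)   # one value per row

# quantile(0.5) is the median; for 1, 2, 3, 4 that is 2.5.
median_bar = df['bar'].quantile(0.5)

print(agg)
print(col_total)
print(row_total)
print(median_bar)
```

Group rows are looked up by label (for example `agg['bar'].loc['a']`) rather than by position, because the order in which groups come back is not something this sketch relies on.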

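The kernel listed under TRANSFORM for `apply_rows()` returns nothing and instead writes into its output argument. The sketch below emulates that contract on the CPU with plain Python lists standing in for the column arrays; it is an assumption-laden illustration of the calling convention only, and says nothing about how cuDF compiles or parallelizes the function on the GPU.

```python
def func(foo, bar):
    # Same body as the cheat-sheet kernel: read from foo, write into bar.
    for i, f in enumerate(foo):
        bar[i] = f + 1


foo = [1, 2, 3, 4]
# Stand-in for the preallocated output column that
# outcols={'bar': 'float64'} asks cuDF to create.
bar = [0.0] * len(foo)

result = func(foo, bar)

print(bar)     # [2, 3, 4, 5]
print(result)  # None: the output lives in bar, not in the return value
```

The argument names matter in the real call: `incols=['foo']` and `outcols={'bar': ...}` are matched to the parameters `foo` and `bar` of the kernel.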