Conversion Guide R Python Data Manipulation

This document provides a summary of key concepts for data manipulation and transformation in R and Python:
- It outlines common commands for file management, exploring data, filtering and selecting subsets of data, and performing mathematical operations in both R and Python.
- It also summarizes common data types that can be contained in columns for R and Python, as well as typical transformations for concatenating, reshaping and changing the dimensions of data frames between the two languages.
- The document is intended as a conversion guide between the two popular languages for data science, R and Python, to assist users in mapping functions and concepts between them.

15.003 Software Tools - Data Science    Afshine Amidi & Shervine Amidi

Conversion Guide between R and Python: Data manipulation
Afshine Amidi and Shervine Amidi
August 21, 2020

R data type    Python data type    Description
character      object              String-related data
factor         object              String-related data that can be put in buckets, or ordered
numeric        float64             Numerical data
int            int64               Numerical data that are integers
POSIXct        datetime64          Timestamps
Main concepts

File management – The table below summarizes the useful commands to make sure the working directory is correctly set:

Category    R Command                                Python Command
Paths       setwd(path)                              os.chdir(path)
Paths       getwd()                                  os.getcwd()
Paths       file.path(path_1, ..., path_n)           os.path.join(path_1, ..., path_n)
Files       list.files(path, include.dirs = TRUE)    os.listdir(path)
Files       file_test('-f', path)                    os.path.isfile(path)
Files       file_test('-d', path)                    os.path.isdir(path)
Files       read.csv(path_to_csv_file)               pd.read_csv(path_to_csv_file)
Files       write.csv(df, path_to_csv_file)          df.to_csv(path_to_csv_file)

Exploring the data – The table below summarizes the main functions used to get a complete overview of the data:

Category        R Command                          Python Command
Look at data    df %>% select(col_list)            df[col_list]
Look at data    df %>% head(n) / df %>% tail(n)    df.head(n) / df.tail(n)
Look at data    df %>% summary()                   df.describe()
Data types      df %>% str()                       df.dtypes / df.info()
Data types      df %>% NROW() / df %>% NCOL()      df.shape

Data preprocessing

Filtering – We can filter rows according to some conditions as follows:

R
    df %>%
      filter(some_col some_operation some_value_or_list_or_col)

where some_operation is one of the following:

Category    R Command                   Python Command
Basic       == / !=                     == / !=
Basic       <, <=, >=, >                <, <=, >=, >
Basic       & / |                       & / |
Advanced    is.na()                     pd.isnull()
Advanced    %in% (val_1, ..., val_n)    .isin([val_1, ..., val_n])
Advanced    %like% 'val'                .str.contains('val')

Mathematical operations – The table below sums up the main mathematical operations that can be performed on columns:

Operation    R Command     Python Command
√x           sqrt(x)       np.sqrt(x)
⌊x⌋          floor(x)      np.floor(x)
⌈x⌉          ceiling(x)    np.ceil(x)
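
The filtering operators and column-wise mathematical functions listed in this guide can be sketched in pandas as follows. This is a minimal sketch assuming pandas and NumPy are installed. The data frame, its column names (city, temp) and the thresholds are hypothetical, chosen only to exercise each operator once.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
df = pd.DataFrame({
    "city": ["Boston", "Paris", "Berlin", "Bogota"],
    "temp": [4.0, 9.5, np.nan, 16.0],
})

# Basic operators: comparisons combined with & (and) or | (or).
# Each comparison is parenthesized because & and | bind more tightly.
mild = df[(df["temp"] > 5) & (df["temp"] <= 20)]

# Advanced operators: missing values, membership, substring match.
missing = df[pd.isnull(df["temp"])]
in_list = df[df["city"].isin(["Paris", "Berlin"])]
has_bo = df[df["city"].str.contains("Bo")]

# Column-wise math, applied to the rows with a known temperature.
known = df[~pd.isnull(df["temp"])].copy()
known["temp_sqrt"] = np.sqrt(known["temp"])
known["temp_floor"] = np.floor(known["temp"])
known["temp_ceil"] = np.ceil(known["temp"])

print(mild, missing, in_list, has_bo, known, sep="\n\n")
```

Comparisons against a missing value evaluate to False, which is why the row with a missing temperature drops out of the first filter without any explicit handling.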
Data types – The main data types that can be contained in columns are summed up in the data type table at the top of this guide.

Data frame transformation

Common transformations – The common data frame transformations are summarized in the table below:


Category            R Command                 Python Command
Concatenation       rbind(df_1, ..., df_n)    pd.concat([df_1, ..., df_n], axis=0)
Concatenation       cbind(df_1, ..., df_n)    pd.concat([df_1, ..., df_n], axis=1)
Dimension change    spread(df, key, value)    pd.pivot_table(df, values='some_values', index='some_index', columns='some_column', aggfunc=np.sum)
Dimension change    gather(df, key, value)    pd.melt(df, id_vars='variable', value_vars='other_variable')
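
The concatenation and dimension-change commands above can be sketched on a toy data set as follows. This is a minimal sketch assuming pandas is installed. The data frames and column names (store, item, sales) are hypothetical. The string "sum" is passed as aggfunc in place of np.sum, since recent pandas versions appear to prefer the string form.

```python
import pandas as pd

# Hypothetical toy data in long format; names are illustrative only.
df_1 = pd.DataFrame({"store": ["A", "A"], "item": ["x", "y"], "sales": [1, 2]})
df_2 = pd.DataFrame({"store": ["B", "B"], "item": ["x", "y"], "sales": [3, 4]})

# Row-wise concatenation (rbind); ignore_index renumbers the rows.
long_df = pd.concat([df_1, df_2], axis=0, ignore_index=True)

# Column-wise concatenation (cbind); rows are aligned on the index.
# The suffix keeps the column names of the second frame distinct.
side_by_side = pd.concat([df_1, df_2.add_suffix("_2")], axis=1)

# Long to wide (spread): one row per store, one column per item.
wide_df = pd.pivot_table(
    long_df, values="sales", index="store", columns="item", aggfunc="sum"
)

# Wide to long (gather): turn the index back into a column, then melt.
back_to_long = pd.melt(
    wide_df.reset_index(),
    id_vars="store",
    value_vars=["x", "y"],
    var_name="item",
    value_name="sales",
)

print(long_df, side_by_side, wide_df, back_to_long, sep="\n\n")
```

Note that pd.concat with axis=1 aligns rows on the index rather than on position, so frames with different indexes may need reset_index(drop=True) first to behave like cbind.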

Massachusetts Institute of Technology    https://www.mit.edu/~amidi
