
Cheat Sheet: Python For Data Science

This document provides a cheat sheet on using Python Pandas for data science. It covers importing and exporting data, viewing DataFrame contents, data selection and slicing, grouping data, descriptive statistics, and creating test data. Pandas brings labeled data structures, similar to R data frames, to Python, along with easy-to-use tools for data analysis and manipulation.

Uploaded by

Shishir Ray
Copyright
© All Rights Reserved

PYTHON FOR DATA SCIENCE CHEAT SHEET

Python Pandas

What is Pandas?

It is a library that provides easy-to-use data structures and data analysis tools for the Python programming language.

Import Convention

import pandas as pd - Import pandas

Pandas Data Structure

• Series:
s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
• Data Frame:
data_mobile = {'Mobile': ['iPhone', 'Samsung', 'Redmi'], 'Color': ['Red', 'White', 'Black'], 'Price': ['High', 'Medium', 'Low']}
df = pd.DataFrame(data_mobile, columns=['Mobile', 'Color', 'Price'])

Importing Data

• pd.read_csv(filename)
• pd.read_table(filename)
• pd.read_excel(filename)
• pd.read_sql(query, connection_object)
• pd.read_json(json_string)

Exporting Data

• df.to_csv(filename)
• df.to_excel(filename)
• df.to_sql(table_name, connection_object)
• df.to_json(filename)

Create Test/Fake Data

• pd.DataFrame(np.random.rand(4,3)) - 3 columns and 4 rows of random floats (requires import numpy as np)
• pd.Series(new_series) - Creates a series from an iterable new_series

Plotting

• Histogram: df.plot.hist()
• Scatter Plot: df.plot.scatter(x='column1', y='column2')

Operations

View DataFrame Contents:
• df.head(n) - look at first n rows of the DataFrame.
• df.tail(n) - look at last n rows of the DataFrame.
• df.shape - Gives the number of rows and columns (an attribute, so no parentheses).
• df.info() - Information of Index, Datatype and Memory.
• df.describe() - Summary statistics for numerical columns.

Selection:
• iloc
  • df.iloc[0] - Select first row of data frame
  • df.iloc[1] - Select second row of data frame
  • df.iloc[-1] - Select last row of data frame
  • df.iloc[:,0] - Select first column of data frame
  • df.iloc[:,1] - Select second column of data frame
• loc
  • df.loc[[0], [column labels]] - Select by row label & column labels (with the default integer index, row label 0 is also the first row position)
  • df.loc['row1':'row3', 'column1':'column3'] - Select and slice on labels (both end labels are included)

Sort:
• df.sort_index() - Sorts by labels along an axis
• df.sort_values(by='Column label') - Sorts by the values along an axis
• df.sort_values(column1) - Sorts values by column1 in ascending order
• df.sort_values(column2, ascending=False) - Sorts values by column2 in descending order

Arithmetic Operations:

Group By
• df.groupby(column) - Returns a groupby object for values from one column
• df.groupby([column1, column2]) - Returns a groupby object for values from multiple columns
• df.groupby(column1)[column2].mean() - Returns the mean of the values in column2, grouped by the values in column1
• df.groupby(column1)[column2].median() - Returns the median of the values in column2, grouped by the values in column1

Functions
Mean:
• df.mean() - mean of each column
Median
• df.median() - median of each column
Standard Deviation
• df.std() - standard deviation of each column
Max
• df.max() - highest value in each column
Min
• df.min() - lowest value in each column
Count
• df.count() - number of non-null values in each DataFrame column
Describe
• df.describe() - Summary statistics for numerical columns
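
The entries in this sheet can be combined into a short session. The following is a minimal sketch, not part of the original sheet: the Mobile and Color columns follow the sheet's example, while the extra row and the numeric Price values are made-up assumptions so that sorting, grouping and statistics have numbers to work on.

```python
import pandas as pd

# Hypothetical sample data: numeric prices are invented for illustration.
df = pd.DataFrame({
    "Mobile": ["iPhone", "Samsung", "Redmi", "Samsung"],
    "Color": ["Red", "White", "Black", "Black"],
    "Price": [900, 700, 200, 500],
})

# View DataFrame contents
first_two = df.head(2)       # first 2 rows
n_rows, n_cols = df.shape    # shape is an attribute, not a method

# Selection
first_row = df.iloc[0]       # by position, returns a Series
last_row = df.iloc[-1]
# By label: loc slices include the end label, so 0:1 yields two rows
by_label = df.loc[0:1, ["Mobile", "Price"]]

# Sort
cheapest_first = df.sort_values("Price")
priciest_first = df.sort_values(by="Price", ascending=False)

# Group by
mean_price = df.groupby("Mobile")["Price"].mean()
median_price = df.groupby("Mobile")["Price"].median()

# Functions
overall_mean = df["Price"].mean()
non_null_counts = df.count()  # non-null values per column
```

Statistics such as mean() are applied here to the numeric Price column only, since the text columns have no meaningful mean.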
FURTHERMORE:
Python for Data Science Certification Training Course
