PYTHON FOR DATA SCIENCE CHEAT SHEET
Python Pandas

What is Pandas?
It is a library that provides easy-to-use data structures and data analysis tools for the Python programming language.

Import Convention
import pandas as pd - Import pandas under the alias pd

Pandas Data Structure
• Series:
s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
• Data Frame:
data_mobile = {'Mobile': ['iPhone', 'Samsung', 'Redmi'],
               'Color': ['Red', 'White', 'Black'],
               'Price': ['High', 'Medium', 'Low']}
df = pd.DataFrame(data_mobile,
                  columns=['Mobile', 'Color', 'Price'])

Importing Data
• pd.read_csv(filename)
• pd.read_table(filename)
• pd.read_excel(filename)
• pd.read_sql(query, connection_object)
• pd.read_json(json_string)

Exporting Data
• df.to_csv(filename)
• df.to_excel(filename)
• df.to_sql(table_name, connection_object)
• df.to_json(filename)

Create Test/Fake Data
• pd.DataFrame(np.random.rand(4,3)) - 3 columns and 4 rows of random floats (requires import numpy as np)
• pd.Series(new_series) - Creates a series from an iterable new_series

Plotting
• Histogram: df.plot.hist()
• Scatter Plot: df.plot.scatter(x='column1',y='column2')

Operations
View DataFrame Contents:
• df.head(n) - Look at the first n rows of the DataFrame.
• df.tail(n) - Look at the last n rows of the DataFrame.
• df.shape - Gives the number of rows and columns (an attribute, so no parentheses).
• df.info() - Information on index, datatypes and memory.
• df.describe() - Summary statistics for numerical columns.
Selection:
• iloc
  • df.iloc[0] - Select first row of data frame
  • df.iloc[1] - Select second row of data frame
  • df.iloc[-1] - Select last row of data frame
  • df.iloc[:,0] - Select first column of data frame
  • df.iloc[:,1] - Select second column of data frame
• loc
  • df.loc[0, 'column label'] - Select a single value by row label and column label
  • df.loc['row1':'row3', 'column1':'column3'] - Select by slicing on labels (both endpoints are included)
Sort:
• df.sort_index() - Sorts by labels along an axis
• df.sort_values(by='Column label') - Sorts by the values along an axis
• df.sort_values(column1) - Sorts values by column1 in ascending order
• df.sort_values(column2,ascending=False) - Sorts values by column2 in descending order

Operations - Arithmetic Operations:
Group By
• df.groupby(column) - Returns a groupby object for values from one column
• df.groupby([column1,column2]) - Returns a groupby object for values from multiple columns
• df.groupby(column1)[column2].mean() - Returns the mean of the values in column2, grouped by the values in column1
• df.groupby(column1)[column2].median() - Returns the median of the values in column2, grouped by the values in column1
Functions
Mean:
• df.mean() - mean of each column
Median
• df.median() - median of each column
Standard Deviation
• df.std() - standard deviation of each column
Max
• df.max() - highest value in each column
Min
• df.min() - lowest value in each column
Count
• df.count() - number of non-null values in each DataFrame column
Describe
• df.describe() - Summary statistics for numerical columns
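As a minimal sketch of the viewing, selection, and sorting entries on this sheet, the snippet below rebuilds the sample mobile DataFrame and applies a few of them. It assumes the Price values are meant to be quoted strings and that the frame keeps its default integer index, so row label 0 and row position 0 happen to coincide; variable names such as first_row are chosen here for illustration only.

```python
import pandas as pd

# Sample data from the Data Frame entry; Price values are assumed to be strings.
data_mobile = {
    "Mobile": ["iPhone", "Samsung", "Redmi"],
    "Color": ["Red", "White", "Black"],
    "Price": ["High", "Medium", "Low"],
}
df = pd.DataFrame(data_mobile, columns=["Mobile", "Color", "Price"])

# shape is an attribute (a tuple), not a method call.
n_rows, n_cols = df.shape

# Position-based selection with iloc.
first_row = df.iloc[0]       # Series holding the row at position 0
last_row = df.iloc[-1]       # Series holding the last row
first_col = df.iloc[:, 0]    # Series holding the column at position 0

# Label-based selection with loc; slices include both endpoints.
single_value = df.loc[0, "Color"]
label_slice = df.loc[0:1, "Mobile":"Color"]

# Sorting returns a new DataFrame; the original df is left unchanged.
by_color = df.sort_values("Color")
by_color_desc = df.sort_values("Color", ascending=False)

print(by_color)
```

Note that iloc counts positions while loc matches labels, so the two only agree here because the index is the default 0, 1, 2.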
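The group-by and summary-function entries on this sheet can be sketched on a small numeric table. The sales frame, its brand and units columns, and all values below are hypothetical and invented purely for illustration. The numeric_only=True argument is passed because recent pandas versions may raise an error when df.mean() is called on a frame that also contains text columns.

```python
import pandas as pd

# Hypothetical data: column names and values are made up for this example.
sales = pd.DataFrame({
    "brand": ["A", "A", "B", "B", "B"],
    "units": [10, 20, 1, 2, 9],
})

# Mean and median of units within each brand.
mean_units = sales.groupby("brand")["units"].mean()
median_units = sales.groupby("brand")["units"].median()

# Column-wise summaries over the whole frame.
overall_mean = sales.mean(numeric_only=True)  # skips the text column
highest_units = sales["units"].max()
lowest_units = sales["units"].min()
non_null_counts = sales.count()               # non-null values per column

print(mean_units)
print(median_units)
```

For brand B the mean (4.0) and the median (2.0) differ because the single large value 9 pulls the mean upward, which is a quick way to see why the two entries are not interchangeable.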