Python Pandas-2
Pandas II
PRESENTED BY
SUSMITA CHOLKAR
>>> import pandas as pd
>>> dFrame1=pd.DataFrame([[1, 2, 3], [4, 5], [6]], columns=['C1', 'C2', 'C3'], index=['R1', 'R2', 'R3'])
>>> dFrame2=pd.DataFrame([[10, 20], [30], [40, 50]], columns=['C2', 'C5'], index=['R4', 'R2','R5'])
dFrame2
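# Note: DataFrame.append() was removed in pandas 2.0; pd.concat([dFrame1, dFrame2]) is the replacement.
dFrame1 = dFrame1.append(dFrame2)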
dFrame1
print(dFrame1)
dFrame2=dFrame2.append(dFrame1)
dFrame2
Alternatively, if we append dFrame1 to dFrame2, the rows of
dFrame2 precede the rows of dFrame1. To make the column labels
appear in sorted order, set the parameter sort=True. With
sort=False, the column labels appear in unsorted order.
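The row order and the effect of sort can be sketched with pd.concat(), which is used here in place of append() because append() is not available in pandas 2.0 and later. The missing cells are written out explicitly as None, which is an assumption made for clarity; the variable names combined and combined_sorted are illustrative only.

```python
import pandas as pd

dFrame1 = pd.DataFrame([[1, 2, 3], [4, 5, None], [6, None, None]],
                       columns=['C1', 'C2', 'C3'], index=['R1', 'R2', 'R3'])
dFrame2 = pd.DataFrame([[10, 20], [30, None], [40, 50]],
                       columns=['C2', 'C5'], index=['R4', 'R2', 'R5'])

# Rows of dFrame1 come first, followed by the rows of dFrame2.
combined = pd.concat([dFrame1, dFrame2], sort=False)

# Rows of dFrame2 come first; sort=True puts the column labels in sorted order.
combined_sorted = pd.concat([dFrame2, dFrame1], sort=True)

print(combined)
print(combined_sorted)
```

Cells that exist in only one of the two frames are filled with NaN in the result.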
tail() function
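As a brief sketch, tail() returns the last rows of a DataFrame (five by default). The DataFrame below and its column name n are made up for illustration.

```python
import pandas as pd

# Hypothetical data: ten rows holding the numbers 1 to 10
df = pd.DataFrame({'n': range(1, 11)})

last_five = df.tail()    # default: last 5 rows
last_two = df.tail(2)    # last 2 rows

print(last_five)
print(last_two)
```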
merge() Function
It is used to merge two DataFrames that have some common values. You can
specify the common fields with the on parameter of the merge() function. It
follows the RDBMS concept of parent and child columns in the DataFrames: one
column should hold common data.
Syntax:
pd.merge(left, right, how='inner', on=None, left_on=None,
right_on=None, left_index=False, right_index=False, sort=False)
Have a look at this code:
p1=({'P_ID':[1,2,3,4,5],
'First_Name':['Sachin','Saurav','Virendra','Mahendra Sinh','Gautam'],
'Last_Name':['Tendulker','Ganguly','Sehvag','Dhoni','Gambhir']})
d1=pd.DataFrame(p1)
p2=({'P_ID':[1,2,3,4,5],'Runs':[18987,12120,11345,10345,12789]})
d2=pd.DataFrame(p2)
# P_ID is the only common column, so it is used as the merge key
players=pd.merge(d1,d2)
print(players)
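The on and how parameters from the syntax can be illustrated with a smaller pair of frames in which the keys do not match completely. This is a minimal sketch: the reduced data set (P_ID 3 missing from d2, P_ID 4 missing from d1) is an assumption chosen to make the differences visible.

```python
import pandas as pd

d1 = pd.DataFrame({'P_ID': [1, 2, 3],
                   'First_Name': ['Sachin', 'Saurav', 'Virendra']})
d2 = pd.DataFrame({'P_ID': [1, 2, 4],
                   'Runs': [18987, 12120, 10345]})

# how='inner' (the default): only keys present in both frames
inner = pd.merge(d1, d2, on='P_ID')

# how='left': every row of d1 is kept; Runs is NaN where d2 has no match
left = pd.merge(d1, d2, on='P_ID', how='left')

# how='outer': keys from either frame
outer = pd.merge(d1, d2, on='P_ID', how='outer')

print(inner)
print(left)
print(outer)
```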
To directly import a .csv file into a DataFrame
A simple way to store big data sets is to use CSV files (comma-separated values files).
CSV files contain plain text and are a well-known format that can be read by most tools, including Pandas.
If you print a large DataFrame with many rows, Pandas shows only the first 5 rows and the
last 5 rows.
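A minimal sketch of reading CSV data with pd.read_csv() follows. To keep it self-contained, the CSV text is held in a string and wrapped in io.StringIO; passing a file path (for example a hypothetical 'players.csv') to read_csv() works the same way. The column names Name and Runs are chosen for illustration.

```python
import io
import pandas as pd

# CSV content kept in a string so the example runs without a file on disk
csv_text = "Name,Runs\nSachin,18987\nSaurav,12120\nVirendra,11345\n"

# read_csv() accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.shape)   # (number of rows, number of columns)
```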
iterrows()
items()
itertuples()
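The three iteration methods listed above can be compared side by side. This is a short sketch on a small made-up DataFrame: iterrows() yields (index label, row) pairs, items() yields (column label, column) pairs, and itertuples() yields one named tuple per row.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Sachin', 'Saurav'], 'Runs': [18987, 12120]},
                  index=['R1', 'R2'])

# iterrows(): each step gives the index label and the row as a Series
rows = [(label, row['Name']) for label, row in df.iterrows()]

# items(): each step gives the column label and the column as a Series
cols = [(name, list(col)) for name, col in df.items()]

# itertuples(): each step gives a named tuple; the index is in the Index field
tuples = [(t.Index, t.Name, t.Runs) for t in df.itertuples()]

print(rows)
print(cols)
print(tuples)
```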
The max() and min() functions find the maximum and the minimum values,
respectively, of a DataFrame along the given axis.
Syntax:
<dataframe>.min(axis=None,skipna=None,numeric_only=None)
<dataframe>.max(axis=None,skipna=None,numeric_only=None)
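As an illustration of the axis parameter, the sketch below computes the maximum of each column and the minimum of each row. The marks data is hypothetical.

```python
import pandas as pd

# Hypothetical marks of three students in two subjects
df = pd.DataFrame({'Maths': [90, 75, 60], 'Science': [85, 95, 70]},
                  index=['A', 'B', 'C'])

col_max = df.max()         # default axis=0: one value per column
row_min = df.min(axis=1)   # axis=1: one value per row

print(col_max)
print(row_min)
```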
Syntax:
DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort',
na_position='last')
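The by, ascending, and na_position parameters can be tried out as follows. This is a small sketch; the missing Runs value is an assumption introduced only to show where NaN rows are placed.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Sachin', 'Saurav', 'Virendra', 'Mahendra'],
                   'Runs': [18987, 12120, None, 10345]})

# Highest Runs first; rows with NaN go last by default
by_runs = df.sort_values(by='Runs', ascending=False)

# Ascending order, with NaN rows placed first
nan_first = df.sort_values(by='Runs', na_position='first')

print(by_runs)
print(nan_first)
```

Because inplace=False by default, df itself is left unchanged and each call returns a new sorted DataFrame.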
import pandas as pd
import numpy as np
df = pd.DataFrame({"A":[-5, 8, 12, None, 5, 3],
"B":[-1, None, 6, 4, None, 3],
"C":["sam", "haris", "alex", np.nan, "peter", "nathan"]})
print(df)
# count of non-NA values in each column (axis=0 counts down the rows)
print(df.count(axis = 0))
output:
A    5
B    4
C    5
dtype: int64
DataFrame - quantile() function
The quantile() function is used to get the values at the given quantile over the
requested axis.
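A short sketch of quantile() on made-up data: a single quantile returns a Series with one value per column, while a list of quantiles returns a DataFrame with one row per quantile.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})

median = df.quantile(0.5)               # 50th percentile of each column
quartiles = df.quantile([0.25, 0.75])   # one row per requested quantile

print(median)
print(quartiles)
```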
count 7.000000
mean 24.285714
std 11.700631
min 10.000000
25% 17.500000
50% 20.000000
75% 30.000000
max 45.000000
Data Aggregations
Aggregation means transforming a dataset to produce a single numeric
value from an array of values. Aggregation can be applied to one or more
columns together. Common aggregate functions are max(), min(), sum(),
count(), std(), and var().
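The aggregate functions named above can be applied one at a time or several at once through agg(). The sketch below uses hypothetical marks data.

```python
import pandas as pd

df = pd.DataFrame({'Maths': [90, 75, 60], 'Science': [85, 95, 70]})

# A single aggregate on one column gives a single value
total_maths = df['Maths'].sum()

# Several aggregates on all columns give one row per function
summary = df.agg(['max', 'min', 'sum'])

print(total_maths)
print(summary)
```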
GROUP BY Functions
In pandas, the DataFrame.groupby() function is used to split the data into groups based on
some criteria.
Pandas objects like a DataFrame can be split on any of their axes. The groupby() function
works on a split-apply-combine strategy, shown below as a 3-step process:
Step 1: Split the data into groups by creating a GroupBy object from the original
DataFrame.
Step 2: Apply the required function.
Step 3: Combine the results to form a new DataFrame.
In a group-by operation, we split the data into sets and
apply some functionality on each subset. Once the
subsets are ready, we can perform any statistical
function or discard some data based on a condition.
groupby() essentially splits the data into different groups.
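The split-apply-combine steps described above can be sketched as follows. The Team and Runs columns are made up for illustration, and filter() is used here as one possible way to discard groups by a condition.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B', 'A'],
                   'Runs': [10, 20, 30, 40, 50]})

# Step 1: split the rows into groups, one per Team value
grouped = df.groupby('Team')

# Steps 2 and 3: apply a function to each group and combine the results
totals = grouped['Runs'].sum()
stats = grouped['Runs'].agg(['min', 'max', 'mean'])

# Discard whole groups by a condition: keep teams whose total exceeds 70
kept = grouped.filter(lambda g: g['Runs'].sum() > 70)

print(totals)
print(stats)
print(kept)
```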
reindex(): This function is used to change the row labels and column labels of a DataFrame. To
reindex means to conform the data to match a given set of labels along a particular axis. It can:
Reorder the existing data to match a new set of labels.
Insert missing value (NA) markers in label locations where no data for the label existed.
Syntax:
<df>.reindex(labels=None, index=None, columns=None,fill_value=nan, axis=None)
Parameters:
labels: New labels/index to conform the axis specified by 'axis' to.
fill_value: Value to use for the entries introduced by new labels that had no data (defaults to
NaN). NaN values that already exist in the data are not filled.
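A minimal sketch of reindex() on a small made-up DataFrame: existing rows are reordered, a label with no data (R4) gets NaN or the given fill_value, and the columns parameter reorders columns.

```python
import pandas as pd

df = pd.DataFrame({'C1': [1, 2, 3], 'C2': [4, 5, 6]},
                  index=['R1', 'R2', 'R3'])

# Reorder rows and introduce a new label; R4 has no data, so it becomes NaN
reordered = df.reindex(['R3', 'R1', 'R4'])

# Same labels, but new entries are filled with 0 instead of NaN
filled = df.reindex(['R3', 'R1', 'R4'], fill_value=0)

# Reorder the columns
swapped = df.reindex(columns=['C2', 'C1'])

print(reordered)
print(filled)
print(swapped)
```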