0% found this document useful (0 votes)
35 views5 pages

Pandas

The document discusses creating and manipulating DataFrames in Python Pandas. It covers topics like creating DataFrames from different data structures, selecting data, filtering, adding/modifying columns and rows, importing/exporting CSV files, and descriptive statistics.

Uploaded by

khushhal2024
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
35 views5 pages

Pandas

The document discusses creating and manipulating DataFrames in Python Pandas. It covers topics like creating DataFrames from different data structures, selecting data, filtering, adding/modifying columns and rows, importing/exporting CSV files, and descriptive statistics.

Uploaded by

khushhal2024
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 5

PYTHON PANDAS

Creating DataFrame:

Syntax:  DF.DataFrame(data,columns,index)

1. Creating Empty DataFrame


DF=pd.DataFrame( )
2. Creating DataFrame from 2D lists (List of Lists)
L=[ [ 10, 20, 30, 40], [12, 12, 13,14 ] ]  Created a list L
DF=pd.DataFrame( L )  Passed it to DataFrame( )
3. Creating DataFrame from 2D dictionary (dictionary of dictionaries)
D={ 1: { 'a':10, 'b':20, 'd':120 },2 : { 'a':30, 'b':60, 'c':90} }  Created a nested (2D) dictionary
DF=pd.DataFrame(D)  Passed it to DataFrame( )
4. Creating DataFrame from ndarray
a1=np.array([91,2,3,4,5])  Created array a1
a2=np.array([10,20,30,40])  Created array a2
>>> DF=pd.DataFrame([a1,a2])  passed the arrays to DataFrame ( ) (in the form of list)
5. Creating DataFrame from List of Dictionaries
A=[{ 'a':10, 'b':20, 'd':120 },{ 'a':30, 'b':60, 'c':90} ]
>>>>DF=pd.DataFrame(A)
6. Creating DataFrame from Dictionary of Lists
A= {'Maths':[80,85,90],'Science':[78,89,88],'English':[85,74,65]}
>>>>DF=pd.DataFrame(A)
7. Creating DataFrame from Dictionary of Series
DF1=pd.DataFrame({'Maths':pd.Series([80,85,90],index=['a','b','c']),'Science':pd.Series([78,89,88],index=['a','b','c'
]),'English':pd.Series([85,74,65],index=['a','b','c'])})
8. Creating DataFrame from another DataFrame

Attributes of DataFrame:

1. index 6. shape
2. columns 7.values
3. axes 8. ndim
4. dtype 9. empty
5. size 10. T

Selecting/Accessing DataFrame:

1. Accessing a column  DF[‘col_name’] or DF.col_name


2. Accessing Multiple columns  DF[ [ list of column names] ]
3. Accessing a row  DF.loc[ ‘row_label’ ]
4. Accessing Multiple rows DF.loc[ [ list of row_labels] ]
5. Accessing a subset of Dataframe DF.loc [ index , columns ]
index = list of row labels or slice (only labels or all i.e. : ) of row labels
columns = list of column labels or slice (only labels or all i.e. : ) of column labels
OR
DF.iloc[ row_start_index : row_end_index , col_start_index : col_end_index)

Note: When we use iloc[ ], then end index is excluded but while using loc[ ], the end index is included.

6. Selecting/Accessing individual value  DF.colmn_name[ ‘row_label’ or index ]


Ex. DF.population[‘Delhi’] or DF.population[2]
Filtering Dataframe using operators:

1. Using index label: DataFrameName[DataFrameName[ col_name] operator value]]


Example  DF[DF[‘Marks’]>90]]
DF[ (DF[‘Marks’]>90]) & (DF[‘Gender’] == “Female”]) ]
2. Using dot notation: DataFrameName[DataFrameName. col_name operator value]
Example  DF[DF.Marks>90]
DF[(DF.Marks>90) & (DF.Gender == “Female”)]
3. Using .query( ) function: DataFrameName.query( ‘ col_name operator value’ )
Example  DF.query( ‘ Maths > 90 & Science > 85’ )

Adding/Modifying Columns of a DataFrame:

DF[‘col_name’]= [List of values]  will add/overite a column with given values

DF[‘new_col_name’]= Single_Value  will add/overwrite a column with all values as same

DF.assign(colum_name=[ list of values])  will add/overite a column with given values

DF.loc[ : , ‘col_name’] = [ List of Values ] or a Single Value

Note: you may also use dot( . ) notation of modify/add a column Ex. DF.col_name = [list of values]

 It the column name does not exist in the DataFrame, it will add a new column but if a column name already exist
with the same name, it will overwrite the values of that particular column.
 If you are assigning a list of values, the values in the column required must match the number of values you have
assigned otherwise it will raise an error i.e. ValueError.
 assign( ) function can be used to add/modify the values for multiple columns. (the column_name must be
written without quotation marks)

Adding/Modifying Rows of a DataFrame: use .loc( )

DF.loc[‘row_label’]= [ List of values ]  will add/overite a row values with given values

DF.loc[‘row_label’]= Single_Value  will add/overwrite a row values with all values as same

DF.at[‘row_label’] = [ List of values ]

DF.loc [ ‘row_label’ , start : end ] = [ List of values ]  will only modify the given slice of row

Note: To update a subset of DataFrame: DF.loc[ row , columns ]

DF.loc[ list of row_labels/Slice, list of column_names or Slice]

Deleting Columns/Row of a DataFrame:

1. del DF[‘Col_Name’]  it can only delete column


It will delete the column and update the values in DF. (the original DataFrame will be updated)
It can be used to drop the complete dataframe Ex. del DF
2. DF.drop[[list of columns/indexes],axis= 0/1) By default the axis=0
Drop function returns us the datframe after deleted values so you may store it in some variable. It will not affect
the original DF.

Renaming Rows/Columns:

DF.rename(columns={old_name:new_name,old_name:new_name,……},inplace=True/False)  For columns

DF.rename(index={old_name:new_name,old_name:new_name,……},inplace=True/False)  For Rows


Note: To rename both rows and columns DF.rename( {columns={ ……….}, index = { ………….} )

Boolean Indexing: For this first of all you need to create the DF with Boolean indexing.

To display rows with False index  DF.loc[ 0 ] or DF.loc[ False ]

To display rows with True index  DF.loc[ True ] or DF.loc[ 1 ]

Importing and Exporting Data between CSV files and DataFrame

Importing Data from CSV files: read_csv( ) function is used

read_csv( “ file path”, sep, header, names)

To read a CSV named ‘resultsheet.csv’ and store in a dataframe named ‘DF’

DF=pd.read_csv(“E:/Python programs/resultsheet.csv”, sep= “,”, header=0, names=[‘RNo’, ‘Name’, ‘Eng’, ‘Hin’])

Note: The default value of sep is space (Means the data items will be separated by space character)

Exporting Data from DataFrame to CSV File: to_csv( ) function is used

DF.to_csv( “ file path”, sep, header, index)

To create a CSV file ‘resultsheetnew’ from a dataframe named ‘DF’

DF.to_csv(“E:/Python programs/resultsheetnew.csv”, sep= “,”, header=True/False, index=True/False)

Note: By default the value of hearer and index attributes is True. If you don’t want column labels/ row index to be
stored in CSV file, you can make it False.

Series V/s Numpy Array:

Series Numpy Array


 In series we can define our own labeled index to  NumPy arrays are accessed by their integer
access elements of an array. These can be position using numbers only.
numbers or letters.
 The elements can be indexed in descending  The indexing starts with zero for the first
order also. element and the index is fixed.
 If two series are not aligned, NaN or missing  There is no concept of NaN values and if there
values are generated are no matching values in arrays, alignment
fails.
 Series require more memory.  NumPy occupies lesser memory.

.
*********************

PANDAS – 2
Descriptive Statistics:

Name UT Maths Science S.St Hindi Eng


0 Raman 1 22 21 18 20 21
1 Raman 2 21 20 17 22 24
2 Raman 3 14 19 15 24 23
3 Zuhaire 1 20 17 22 24 19
4 Zuhaire 2 23 15 21 25 15
5 Zuhaire 3 22 18 19 23 13
6 Ashravy 1 23 19 20 15 22
7 Ashravy 2 24 22 24 17 21
8 Ashravy 3 12 25 19 21 23
9 Mishti 1 15 22 25 22 22
10 Mishti 2 18 21 25 24 23
11 Mishti 3 17 18 20 25 20
Calculating Maximum Value: df.max(numeric_only=True/False, axis=0/1)
To calculate maximum value in each column. By default, all these functions extract a value for each column i.e. axis=0.
However to find the descriptive statistics for each row, we need to specify axis=1 as its argument.
Example. To print maximum marks obtained in each subject in Unit Test 2.
>>> DFUT2=df[df.UT==2] or DFUT2=df[df[‘UT’]==2]
>>> DFUT2.max(numeric_only=True)

Calculating Minimum Value: df.min(numeric_only=True/False, axis=0/1)


Example. To display minimum marks obtained by Raman in all unit tests in Hindi.
dfRaman=df[df.Name=='Raman']  Extract the records of Raman only and store in
dfRamandfRamanHndi= dfRaman.Hindi  Extract the record of Hindi from Raman’s
print(dfRamanHndi.max())  Apply function on finally extracted record

Calculating Sum of Values: df.sum(numeric_only=True/False, axis=0/1)


Example: Write python statement to print the total marks obtained by Raman in each subject.
>>> dfRaman=df[df.Name=='Raman']
>>> dfRaman[['Maths','Science','S.St','Hindi','Eng']].sum()
Example: Write python statement to print the total marks obtained by Raman in all subjects in each Unit Test.
>>> dfRaman=df[df.Name=='Raman']
>>> dfRaman[['Maths','Science','S.St','Hindi','Eng']].sum(axis=1)

Calculating Number of Values: df.count(numeric_only=True/False, axis=0/1)

Calculating mean (average): df.mean(numeric_only=True/False, axis=0/1)


mean() will display the average of the values of each column of a DataFrame. It is only applicable for numeric values.

Calculating median (Middle Value): df.median(numeric_only=True/False, axis=0/1)


median() will display the middle values of each column of a DataFrame. It is only applicable for numeric values.

Calculating Mode (Most appeared): df.mode(numeric_only=True/False, axis=0/1)


Mode is defined as the value that appears the most number of times in a data set.

Calculating Quartile: df.quantile( q , numeric_only, axis )


quantile() function is used to get the quartiles. It will output the quartile of each column or row of the DataFrame in four
parts i.e. the first quartile is 25% (parameter q = .25), the second quartile is 50% (Median), the third quartile is 75%
(parameter q = .75). By default, it will display the second quantile (median) of all numeric values.
position = (n-1) × q + 1 to locate the quartile position (Sorting of the dataset is required to calculate it)
Example: >>> Series1=pd.Series([15,18,10,22,23,42,41,36,80,75])
>>> Series1.quantile(.25)
19.0
N=10 (total number of values in data set) q = .25
(10-1) × .25 + 1 = 3.25 position
Arrange data in ascending order 10 15 18 22 23 36 41 42 75 80
3rd 4th positions
rd th
Difference 3 and 4 position values is 4, so fractional part will be multiplied by 4 (i.e. 18 + (4 × .25) = 19)
Example: Write the statement to display the first and third quartiles of all subjects.
>>> df[['Maths','Science','S.St','Hindi','Eng']].quantile([.25,.75])
Maths Science S.St Hindi Eng
0.25 16.50 18.00 18.75 20.75 19.75
0.75 22.25 21.25 22.50 24.00 23.00

Calculating Variance: df.var(numeric_only, axis)


Variance is the average of squared differences from the mean.

Calculating Standard Deviation: df.std((numeric_only, axis)


Standard deviation is calculated as the square root of the variance.

Data Aggregations: Aggregation means to transform the dataset and produce a single numeric value from an array.
Aggregation can be applied to one or more columns together. Aggregate functions are max(),min(), sum(), count(), std(),
var().
>>> df.aggregate('max')  will calculate max for each column
>>> df.aggregate(['max','count'])  will calculate max and total items for each column
>>> df['Maths'].aggregate(['max','min'],axis=0)  will calculate max and min value of Maths column
>>> df[['Maths','Science']].aggregate('sum',axis=1)  will calculate sum of Maths and Science in each row.

Sorting a DataFrame: df.sort_values( by = ‘colname’, axis = 0/1, ascending = True/False)


>>> df.sort_values(by=['Name'])  will sort the df in ascending order of
Name
>>> dfUT2 = df[df.UT == 2]  will sort the
>>> print(dfUT2.sort_values(by=['Science']))

You might also like