
Session 11 Lecture 1

The document discusses cleaning and analyzing a student exam data set using Pandas. Various data cleaning steps are applied including converting date columns, filling missing values, dropping duplicates, and plotting the data.

Uploaded by

detomal301

In [6]: import pandas as pd

        # Load the raw exam sheet and print every row.
        data = pd.read_csv('/content/SAMPLE_FOR_CLEANING.csv')
        print(data.to_string())

    ROLL. NO.     STUDENT_NAME              DATE OF EXAM  MARKS  PERCENTAGE
0   0901IT181001  AADITYA KHANTAL           25-05-2021    15.0   75.0
1   0901IT181002  ADITYA JOSHI              05-25-2021    NaN    NaN
2   0901IT181003  AJAY GARG                 25-May-21     16.0   80.0
3   0901IT181004  AKASH KACHHAWAY           25-05-2021    17.0   85.0
4   0901IT181005  AKSHAT KOTHAVADE          25-05-2021    12.0   60.0
5   0901IT181006  ALAKH NIRANJAN THAKURIYA  05-25-2021    11.0   55.0
6   0901IT181007  ALOK KUMAR                05-25-2021    13.0   65.0
7   0901IT181008  AMAN DIXIT                25-05-2021    NaN    NaN
8   0901IT181009  AMIT BAMNIYA              25-05-2021    12.0   60.0
9   0901IT181010  ANKIT KUMAR               25-05-2021    12.5   62.5
10  0901IT181011  ANKIT RAJ TIRKEY          25-May-21     13.0   65.0
11  0901IT181011  ANKIT RAJ TIRKEY          25-May-21     13.0   65.0

In [5]: # Convert the mixed-format date strings to datetime values.
        data['DATE OF EXAM'] = pd.to_datetime(data['DATE OF EXAM'])
        print(data.to_string())

    ROLL. NO.     STUDENT_NAME              DATE OF EXAM  MARKS  PERCENTAGE
0   0901IT181001  AADITYA KHANTAL           2021-05-25    15.0   75.0
1   0901IT181002  ADITYA JOSHI              2021-05-25    NaN    NaN
2   0901IT181003  AJAY GARG                 2021-05-25    16.0   80.0
3   0901IT181004  AKASH KACHHAWAY           2021-05-25    17.0   85.0
4   0901IT181005  AKSHAT KOTHAVADE          2021-05-25    12.0   60.0
5   0901IT181006  ALAKH NIRANJAN THAKURIYA  2021-05-25    11.0   55.0
6   0901IT181007  ALOK KUMAR                2021-05-25    13.0   65.0
7   0901IT181008  AMAN DIXIT                2021-05-25    NaN    NaN
8   0901IT181009  AMIT BAMNIYA              2021-05-25    12.0   60.0
9   0901IT181010  ANKIT KUMAR               2021-05-25    12.5   62.5
10  0901IT181011  ANKIT RAJ TIRKEY          2021-05-25    13.0   65.0
11  0901IT181011  ANKIT RAJ TIRKEY          2021-05-25    13.0   65.0
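
The conversion above leaves it to pandas to guess the layout of each date string, and how mixed layouts are handled can differ between pandas versions. A minimal sketch of a more explicit approach follows, assuming the column only ever contains the three layouts visible in the printed data. The names `dates`, `formats`, and `parsed` are illustrative and do not come from the notebook.

```python
import pandas as pd

# Sample strings in the three layouts that appear in the DATE OF EXAM column.
dates = pd.Series(["25-05-2021", "05-25-2021", "25-May-21"])

# Candidate layouts, tried in order. The order is an assumption: an ambiguous
# string such as "05-06-2021" would be read as day-month-year.
formats = ["%d-%m-%Y", "%m-%d-%Y", "%d-%b-%y"]

# Parse with the first layout; strings that do not match become NaT.
parsed = pd.to_datetime(dates, format=formats[0], errors="coerce")
for fmt in formats[1:]:
    # Fill only the entries that the earlier layouts could not parse.
    parsed = parsed.fillna(pd.to_datetime(dates, format=fmt, errors="coerce"))

print(parsed)
```

Any string that matches none of the listed layouts stays `NaT`, which makes unparsed dates easy to find with `parsed.isna()`.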

In [ ]: # Render each date as a month-day-year string, e.g. 05-25-21.
        data['DATE OF EXAM'].apply(lambda x: pd.to_datetime(x).strftime('%m-%d-%y'))


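In [12]: data = pd.read_csv('/content/SAMPLE_FOR_CLEANING.csv')
         # Fill missing MARKS with 21; assign the result back instead of
         # calling fillna(..., inplace=True) on the selected column.
         data['MARKS'] = data['MARKS'].fillna(21)
         data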

Out[12]:
    ROLL. NO.     STUDENT_NAME              DATE OF EXAM  MARKS  PERCENTAGE
0   0901IT181001  AADITYA KHANTAL           25-05-2021    15.0   75.0
1   0901IT181002  ADITYA JOSHI              05-25-2021    21.0   NaN
2   0901IT181003  AJAY GARG                 25-May-21     16.0   80.0
3   0901IT181004  AKASH KACHHAWAY           25-05-2021    17.0   85.0
4   0901IT181005  AKSHAT KOTHAVADE          25-05-2021    12.0   60.0
5   0901IT181006  ALAKH NIRANJAN THAKURIYA  05-25-2021    11.0   55.0
6   0901IT181007  ALOK KUMAR                05-25-2021    13.0   65.0
7   0901IT181008  AMAN DIXIT                25-05-2021    21.0   NaN
8   0901IT181009  AMIT BAMNIYA              25-05-2021    12.0   60.0
9   0901IT181010  ANKIT KUMAR               25-05-2021    12.5   62.5
10  0901IT181011  ANKIT RAJ TIRKEY          25-May-21     13.0   65.0
11  0901IT181011  ANKIT RAJ TIRKEY          25-May-21     13.0   65.0

In [15]: # Replace any MARKS value above 20 with 12, one row at a time.
         for x in data.index:
             if data.loc[x, 'MARKS'] > 20:
                 data.loc[x, 'MARKS'] = 12
         print(data.to_string())

    ROLL. NO.     STUDENT_NAME              DATE OF EXAM  MARKS  PERCENTAGE
0   0901IT181001  AADITYA KHANTAL           25-05-2021    15.0   75.0
1   0901IT181002  ADITYA JOSHI              05-25-2021    12.0   NaN
2   0901IT181003  AJAY GARG                 25-May-21     16.0   80.0
3   0901IT181004  AKASH KACHHAWAY           25-05-2021    17.0   85.0
4   0901IT181005  AKSHAT KOTHAVADE          25-05-2021    12.0   60.0
5   0901IT181006  ALAKH NIRANJAN THAKURIYA  05-25-2021    11.0   55.0
6   0901IT181007  ALOK KUMAR                05-25-2021    13.0   65.0
7   0901IT181008  AMAN DIXIT                25-05-2021    12.0   NaN
8   0901IT181009  AMIT BAMNIYA              25-05-2021    12.0   60.0
9   0901IT181010  ANKIT KUMAR               25-05-2021    12.5   62.5
10  0901IT181011  ANKIT RAJ TIRKEY          25-May-21     13.0   65.0
11  0901IT181011  ANKIT RAJ TIRKEY          25-May-21     13.0   65.0
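
The row-by-row loop can also be written as a single boolean-mask assignment. The sketch below uses a small stand-in frame rather than the CSV file; the frame name `marks` and the constant `MAX_MARKS` are illustrative, and treating 20 as the maximum score is an assumption taken from the loop's threshold.

```python
import pandas as pd

# Stand-in data: two out-of-range values (21.0) among valid marks.
marks = pd.DataFrame({"MARKS": [15.0, 21.0, 16.0, 21.0, 12.5]})

MAX_MARKS = 20  # assumed maximum score, taken from the loop's threshold

# Select every row whose MARKS exceeds the maximum and overwrite it with 12.
marks.loc[marks["MARKS"] > MAX_MARKS, "MARKS"] = 12

print(marks)
```

The mask version avoids the Python-level loop, which matters little for 12 rows but scales better on larger sheets.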

In [ ]: print(data.duplicated())

In [ ]: data.drop_duplicates()
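
As a small illustration of the two duplicate-handling cells, the sketch below runs them on a stand-in frame that mirrors the repeated final row of the sheet. The names `students`, `dup_mask`, and `deduped` are illustrative. Note that `drop_duplicates()` returns a new DataFrame, so the result has to be assigned if the cleaned data is needed later.

```python
import pandas as pd

# Stand-in data: the last row repeats the one before it.
students = pd.DataFrame({
    "ROLL. NO.": ["0901IT181010", "0901IT181011", "0901IT181011"],
    "MARKS": [12.5, 13.0, 13.0],
})

# True for every row that repeats an earlier row in full.
dup_mask = students.duplicated()

# Keep the first occurrence of each row and renumber the index.
deduped = students.drop_duplicates().reset_index(drop=True)

print(dup_mask)
print(deduped)
```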

In [ ]: # Correlate only the numeric columns (MARKS and PERCENTAGE).
        data.corr(numeric_only=True)

In [19]: import matplotlib.pyplot as plt


In [20]: data = pd.read_csv('/content/SAMPLE_FOR_CLEANING.csv')
         print(data.to_string())

    ROLL. NO.     STUDENT_NAME              DATE OF EXAM  MARKS  PERCENTAGE
0   0901IT181001  AADITYA KHANTAL           25-05-2021    15.0   75.0
1   0901IT181002  ADITYA JOSHI              05-25-2021    NaN    NaN
2   0901IT181003  AJAY GARG                 25-May-21     16.0   80.0
3   0901IT181004  AKASH KACHHAWAY           25-05-2021    17.0   85.0
4   0901IT181005  AKSHAT KOTHAVADE          25-05-2021    12.0   60.0
5   0901IT181006  ALAKH NIRANJAN THAKURIYA  05-25-2021    11.0   55.0
6   0901IT181007  ALOK KUMAR                05-25-2021    13.0   65.0
7   0901IT181008  AMAN DIXIT                25-05-2021    NaN    NaN
8   0901IT181009  AMIT BAMNIYA              25-05-2021    12.0   60.0
9   0901IT181010  ANKIT KUMAR               25-05-2021    12.5   62.5
10  0901IT181011  ANKIT RAJ TIRKEY          25-May-21     13.0   65.0
11  0901IT181011  ANKIT RAJ TIRKEY          25-May-21     13.0   65.0

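In [21]: # Line plot of the numeric columns; missing values leave gaps.
         data.plot()
         plt.show()

In [22]: data = pd.read_csv('/content/SAMPLE_FOR_CLEANING.csv')
         # Fill the gaps by assigning the result back instead of calling
         # fillna(..., inplace=True) on the selected column.
         data['MARKS'] = data['MARKS'].fillna(12)
         data['PERCENTAGE'] = data['PERCENTAGE'].fillna(60)
         data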

Out[22]:
    ROLL. NO.     STUDENT_NAME              DATE OF EXAM  MARKS  PERCENTAGE
0   0901IT181001  AADITYA KHANTAL           25-05-2021    15.0   75.0
1   0901IT181002  ADITYA JOSHI              05-25-2021    12.0   60.0
2   0901IT181003  AJAY GARG                 25-May-21     16.0   80.0
3   0901IT181004  AKASH KACHHAWAY           25-05-2021    17.0   85.0
4   0901IT181005  AKSHAT KOTHAVADE          25-05-2021    12.0   60.0
5   0901IT181006  ALAKH NIRANJAN THAKURIYA  05-25-2021    11.0   55.0
6   0901IT181007  ALOK KUMAR                05-25-2021    13.0   65.0
7   0901IT181008  AMAN DIXIT                25-05-2021    12.0   60.0
8   0901IT181009  AMIT BAMNIYA              25-05-2021    12.0   60.0
9   0901IT181010  ANKIT KUMAR               25-05-2021    12.5   62.5
10  0901IT181011  ANKIT RAJ TIRKEY          25-May-21     13.0   65.0
11  0901IT181011  ANKIT RAJ TIRKEY          25-May-21     13.0   65.0
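
The cell above fills both columns with hand-picked constants (12 and 60). As an alternative sketch, not the notebook's method, the missing MARKS could be filled with the column median and PERCENTAGE derived from MARKS. This assumes the exam is scored out of 20, which matches the printed rows (for example 15.0 marks and 75.0 percent) but is not stated anywhere in the notebook. The names `scores` and `MAX_MARKS` are illustrative.

```python
import pandas as pd

# Stand-in data with the same kind of gaps as the exam sheet.
scores = pd.DataFrame({
    "MARKS": [15.0, None, 16.0, None, 12.5],
    "PERCENTAGE": [75.0, None, 80.0, None, 62.5],
})

MAX_MARKS = 20  # assumption: inferred from the MARKS/PERCENTAGE ratio

# Fill missing marks with the median of the observed marks.
scores["MARKS"] = scores["MARKS"].fillna(scores["MARKS"].median())

# Fill missing percentages from the (now complete) marks column.
scores["PERCENTAGE"] = scores["PERCENTAGE"].fillna(
    scores["MARKS"] / MAX_MARKS * 100
)

print(scores)
```

Deriving PERCENTAGE from MARKS keeps the two columns consistent with each other, which independent constants do not guarantee.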

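In [ ]: # Line plot of the numeric columns after filling the gaps.
        data.plot()
        plt.show()

In [24]: # Same data as a bar chart, one group of bars per row.
         data.plot(kind='bar')
         plt.show()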
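
By default the bar chart labels each group with the row index. A minimal sketch of labelling the bars with student names instead is shown below, using a stand-in frame. The `Agg` backend and the output file name `marks_bar.png` are assumptions made so the example runs without a display; in a notebook, `plt.show()` would be used as in the cells above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in data: three rows from the exam sheet.
scores = pd.DataFrame({
    "STUDENT_NAME": ["AADITYA KHANTAL", "AJAY GARG", "ANKIT KUMAR"],
    "MARKS": [15.0, 16.0, 12.5],
})

# One bar per student, with names on the x-axis and marks on the y-axis.
ax = scores.plot(kind="bar", x="STUDENT_NAME", y="MARKS", legend=False)
ax.set_ylabel("MARKS")

plt.tight_layout()
plt.savefig("marks_bar.png")  # hypothetical output file name
```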
