
IMBA (2020-25)

Python Programming

Group Assignment-2

Term-1
Submitted to: Prof. Manoj Kumar

Group Number-10A

Roll No. Name


23ibm110 Ansh Chhaya
23ibm137 Kushagra Choudhary
23ibm139 Maanvijay Solanki
23ibm152 Rony Sheth
23ibm158 Shatakshi Srivastava
23ibm162 Shubhi Pateriya

Q.1 Solve the following questions using the CSV file named accidental-deaths-in-usa-monthly.csv (using the pandas library).
a. To read the CSV file
b. To see the top 10 rows of the table
c. To see the last 8 rows of the table
d. To get information about the columns and their data
e. To know the data types of the column data
f. To know the index
g. Use of loc and iloc
h. To convert to timestamps
i. Find non-missing values in the data table
j. Use of the replace function
k. Use of sort_values

Code:
import pandas as pd

# a. To read the CSV file


file_path = "/accidental-deaths-in-usa-monthly.csv"
data = pd.read_csv(file_path)

# b. To see the top 10 rows of the table


top_10_rows = data.head(10)
print("b. Top 10 rows:")
print(top_10_rows)

# c. To see the last 8 rows of the table


last_8_rows = data.tail(8)
print("c. Last 8 rows:")
print(last_8_rows)

# d. To get information about the columns and their data
# (DataFrame.info() prints its report directly and returns None)
column_info = data.info()

# e. To know about data types of the column data


data_types = data.dtypes

# f. To know the index


index = data.index

# g. Use of loc and iloc


# Example of using loc to select rows and columns by labels
subset_loc = data.loc[5:10, ['Month',
                             'Accidental deaths in USA: monthly, 1973 ? 1978']]
# Example of using iloc to select rows and columns by integer positions
subset_iloc = data.iloc[5:11, 0:2]
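
# A small supplementary sketch: loc is label-based and includes the end label
# (5:10 selects rows 5..10, i.e. 6 rows), while iloc is position-based and
# excludes the end position (5:11 also selects rows 5..10). Single-cell lookups,
# with illustrative variable names:
first_month_by_label = data.loc[0, 'Month']   # row label 0, column label 'Month'
first_month_by_position = data.iloc[0, 0]     # first row, first column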

# h. To convert the 'Month' column to timestamps
data['Month'] = pd.to_datetime(data['Month'], format='%Y-%m')
print("h.\n",data)

# i. Find non-missing values in the data-table


non_missing_values = data.notnull().sum()

# j. Use of replace function


data['Accidental deaths in USA: monthly, 1973 ? 1978'] = (
    data['Accidental deaths in USA: monthly, 1973 ? 1978'].replace(',', ''))
print("j.\n",data)

# k. Use of sort_values
sorted_data = data.sort_values(by='Month')
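
# A short sketch of optional sort_values arguments (deaths_col is just a local
# alias for the long column name): sort descending by the deaths column and keep
# any missing values at the end.
deaths_col = 'Accidental deaths in USA: monthly, 1973 ? 1978'
sorted_desc = data.sort_values(by=deaths_col, ascending=False, na_position='last')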

# Display the results
print("\nd. Column Information:")
print(column_info)

print("\ne. Data Types:")


print(data_types)

print("\nf. Index:")
print(index)

print("\ng. Subset using loc:")


print(subset_loc)

print("\ng. Subset using iloc:")


print(subset_iloc)

print("\ni. Non-Missing Values:")


print(non_missing_values)

print("\nk. Data after replacing commas:")


print(sorted_data)

Output:
b. Top 10 rows:
Month Accidental deaths in USA: monthly, 1973 ? 1978
0 1973-01 9007
1 1973-02 8106
2 1973-03 8928
3 1973-04 9137
4 1973-05 10017
5 1973-06 10826
6 1973-07 11317
7 1973-08 10744
8 1973-09 9713
9 1973-10 9938

c. Last 8 rows:
Month Accidental deaths in USA: monthly, 1973 ? 1978
64 1978-05 9115
65 1978-06 9434
66 1978-07 10484
67 1978-08 9827
68 1978-09 9110
69 1978-10 9070
70 1978-11 8633
71 1978-12 9240
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 72 entries, 0 to 71
Data columns (total 2 columns):
 #   Column                                          Non-Null Count  Dtype
---  ------                                          --------------  -----
 0   Month                                           72 non-null     object
 1   Accidental deaths in USA: monthly, 1973 ? 1978  72 non-null     int64
dtypes: int64(1), object(1)
memory usage: 1.2+ KB

h.
Month Accidental deaths in USA: monthly, 1973 ? 1978
0 1973-01-01 9007
1 1973-02-01 8106
2 1973-03-01 8928
3 1973-04-01 9137
4 1973-05-01 10017
.. ... ...
67 1978-08-01 9827
68 1978-09-01 9110
69 1978-10-01 9070
70 1978-11-01 8633
71 1978-12-01 9240

[72 rows x 2 columns]

j.
Month Accidental deaths in USA: monthly, 1973 ? 1978
0 1973-01-01 9007
1 1973-02-01 8106
2 1973-03-01 8928
3 1973-04-01 9137
4 1973-05-01 10017
.. ... ...
67 1978-08-01 9827
68 1978-09-01 9110
69 1978-10-01 9070
70 1978-11-01 8633
71 1978-12-01 9240

[72 rows x 2 columns]

d. Column Information:
None

e. Data Types:
Month object
Accidental deaths in USA: monthly, 1973 ? 1978 int64
dtype: object

f. Index:
RangeIndex(start=0, stop=72, step=1)

g. Subset using loc:


Month Accidental deaths in USA: monthly, 1973 ? 1978
5 1973-06 10826
6 1973-07 11317
7 1973-08 10744
8 1973-09 9713
9 1973-10 9938
10 1973-11 9161

g. Subset using iloc:


Month Accidental deaths in USA: monthly, 1973 ? 1978
5 1973-06 10826
6 1973-07 11317
7 1973-08 10744
8 1973-09 9713
9 1973-10 9938
10 1973-11 9161

i. Non-Missing Values:
Month 72
Accidental deaths in USA: monthly, 1973 ? 1978 72
dtype: int64

k. Data after sort_values (sorted by 'Month'):


Month Accidental deaths in USA: monthly, 1973 ? 1978
0 1973-01-01 9007
1 1973-02-01 8106
2 1973-03-01 8928
3 1973-04-01 9137
4 1973-05-01 10017
.. ... ...
67 1978-08-01 9827
68 1978-09-01 9110
69 1978-10-01 9070
70 1978-11-01 8633
71 1978-12-01 9240

[72 rows x 2 columns]

Q.2 Solve the following questions using the Excel file named StudentsPerformance.xlsx (using the pandas library).
a. To read the Excel file
b. Use of groupby in the example
c. Use of pipe in the example
d. To get absolute values; use of the all and any functions
e. Use of the between and correlation functions
f. Use of mean, median and mode
g. Use of pct_change
h. Use of the skew and sem functions
i. Use of the value_counts function
j. Find missing values in the data table
k. Use of sort_index

Code:
import pandas as pd

# a. To read the data file (the question names StudentsPerformance.xlsx;
# a CSV export of the same data is read here with read_csv)
file_path = "/StudentsPerformance.csv"
df = pd.read_csv(file_path)

# Display the first few rows of the DataFrame


print("a. Reading the Excel file:")
print(df.head())

# b. Use of groupby:
grouped_data = df.groupby('gender')['math score'].mean()
print("\nb. Using groupby:")
print(grouped_data)
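
# A minimal sketch of a broader groupby: aggregate several score columns with
# several statistics at once via .agg (the column and statistic choices below
# are illustrative, not required by the question).
score_summary = df.groupby('gender')[['math score', 'reading score']].agg(['mean', 'max'])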

# c. Use of pipe:
def custom_function(data):
    # placeholder: returns the DataFrame unchanged, so pipe passes it through as-is
    return data

result = df.pipe(custom_function)
print("\nc. Using pipe:")
print(result.head())
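
# A sketch of a pipe step that actually transforms the frame (the helper name
# add_total_score and the 'total score' column are illustrative assumptions):
def add_total_score(frame):
    out = frame.copy()
    out['total score'] = out[['math score', 'reading score', 'writing score']].sum(axis=1)
    return out

piped = df.pipe(add_total_score)   # same as add_total_score(df), but chainable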

# d. To get absolute value, all, and any functions:
df['abs_math_score'] = df['math score'].abs()
# Series.all() tests whether every value is truthy (non-zero);
# lt(40).any() tests whether at least one value is below 40
all_reading_truthy = df['reading score'].all()
any_less_than_40 = df['writing score'].lt(40).any()
print("\nd. Using abs, all, and any functions:")
print(df['abs_math_score'].head())
print(f"All values in 'reading score' are non-zero: {all_reading_truthy}")
print(f"Any values in 'writing score' < 40: {any_less_than_40}")

# e. Use of between and correlation function:
filtered_data = df[df['math score'].between(70, 90)]
correlation = df['math score'].corr(df['reading score'])
print("\ne. Using between and correlation functions:")
print(filtered_data.head())
print(f"Correlation between 'math_score' and 'reading_score':
{correlation}")

# f. Use of mean, median, and mode:


mean_math_score = df['math score'].mean()
median_math_score = df['math score'].median()
mode_math_score = df['math score'].mode().values[0]
print("\nf. Using mean, median, and mode:")
print(f"Mean 'math_score': {mean_math_score}")
print(f"Median 'math_score': {median_math_score}")
print(f"Mode 'math_score': {mode_math_score}")

# g. Use of pct_change:
df['math_score_pct_change'] = df['math score'].pct_change() * 100
print("\ng. Using pct_change:")
print(df['math_score_pct_change'].head())
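
# Note: pct_change returns the fractional change relative to the previous row,
# (x_i - x_{i-1}) / x_{i-1}; the code above multiplies by 100 to express it as a
# percentage, and the first row is NaN because it has no predecessor. Row order
# carries no time meaning in this dataset, so this is purely a demonstration.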

# h. Use of skew and sem functions:


skewness = df['math score'].skew()
sem_math_score = df['math score'].sem()
print("\nh. Using skew and sem functions:")
print(f"Skewness of 'math_score': {skewness}")
print(f"SEM of 'math_score': {sem_math_score}")

# i. value_counts function:
gender_counts = df['gender'].value_counts()
print("\ni. Using value_counts:")
print(gender_counts)
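
# A one-line sketch of a common variant: normalize=True returns proportions
# instead of raw counts.
gender_shares = df['gender'].value_counts(normalize=True)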

# j. Find missing values in the data table:


missing_values = df.isnull().sum()
print("\nj. Finding missing values:")
print(missing_values)
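
# A sketch of optional follow-up steps once missing values are located (kept as
# comments because the question only asks to find them):
# df_dropped = df.dropna()                             # drop rows with any NaN
# df_filled = df.fillna({'math_score_pct_change': 0})  # fill one column's NaN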

# k. Use of sort_index:
sorted_df = df.sort_index(ascending=True)
print("\nk. Using sort_index:")
print(sorted_df.head())
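
# A brief sketch of sort_index variants: axis=1 orders the columns
# alphabetically, and ascending=False would reverse the row order.
columns_sorted = df.sort_index(axis=1)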

Output:
a. Reading the Excel file:
gender race/ethnicity parental level of education lunch \
0 female group B bachelor's degree standard
1 female group C some college standard
2 female group B master's degree standard
3 male group A associate's degree free/reduced
4 male group C some college standard

test preparation course math score reading score writing score


0 none 72 72 74
1 completed 69 90 88
2 none 90 95 93
3 none 47 57 44
4 none 76 78 75

b. Using groupby:
gender
female 63.633205
male 68.728216
Name: math score, dtype: float64

c. Using pipe:
gender race/ethnicity parental level of education lunch \
0 female group B bachelor's degree standard
1 female group C some college standard
2 female group B master's degree standard
3 male group A associate's degree free/reduced
4 male group C some college standard

test preparation course math score reading score writing score


0 none 72 72 74
1 completed 69 90 88
2 none 90 95 93
3 none 47 57 44
4 none 76 78 75

d. Using abs, all, and any functions:


0 72
1 69
2 90
3 47
4 76
Name: abs_math_score, dtype: int64
All values in 'reading score' are non-zero: True
Any values in 'writing score' < 40: True

e. Using between and correlation functions:


gender race/ethnicity parental level of education lunch \
0 female group B bachelor's degree standard
2 female group B master's degree standard
4 male group C some college standard
5 female group B associate's degree standard
6 female group B some college standard

test preparation course math score reading score writing score \
0 none 72 72 74
2 none 90 95 93
4 none 76 78 75
5 none 71 83 78
6 completed 88 95 92
abs_math_score
0 72
2 90
4 76
5 71
6 88
Correlation between 'math_score' and 'reading_score': 0.8175796636720546

f. Using mean, median, and mode:


Mean 'math_score': 66.089
Median 'math_score': 66.0
Mode 'math_score': 65

g. Using pct_change:
0 NaN
1 -4.166667
2 30.434783
3 -47.777778
4 61.702128
Name: math_score_pct_change, dtype: float64

h. Using skew and sem functions:


Skewness of 'math_score': -0.27893514909431694
SEM of 'math_score': 0.4794986944695449

i. Using value_counts:
female 518
male 482
Name: gender, dtype: int64

j. Finding missing values:


gender 0
race/ethnicity 0
parental level of education 0
lunch 0
test preparation course 0
math score 0
reading score 0
writing score 0
abs_math_score 0
math_score_pct_change 1
dtype: int64

k. Using sort_index:
gender race/ethnicity parental level of education lunch \
0 female group B bachelor's degree standard
1 female group C some college standard
2 female group B master's degree standard
3 male group A associate's degree free/reduced
4 male group C some college standard

test preparation course math score reading score writing score \
0 none 72 72 74
1 completed 69 90 88
2 none 90 95 93
3 none 47 57 44
4 none 76 78 75

abs_math_score math_score_pct_change
0 72 NaN
1 69 -4.166667
2 90 30.434783
3 47 -47.777778
4 76 61.702128

