Group 10A - GA2
Python Programming
Group Assignment-2
Term-1
Submitted to: Prof. Manoj Kumar
Group Number-10A
Q.1 Solve the following questions using the CSV file named (accidental-deaths-in-usa-monthly.csv). (Using pandas library)
a. To read csv file
b. To see the top 10 rows of the table
c. To see the last 8 rows of the table
d. To get information about the columns and their data
e. To know the data types of the columns
f. To know the index
g. Use of loc and iloc
h. To convert to timestamp
i. Find non-missing values in the data-table
j. Use of replace function
k. Use of sort_values
Code:
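import pandas as pd
# a. To read csv file (path assumed relative to the working directory)
data = pd.read_csv("accidental-deaths-in-usa-monthly.csv")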
# h. To use to timestamp
data['Month'] = pd.to_datetime(data['Month'], format='%Y-%m')
print("h.\n",data)
# k. Use of sort_values
sorted_data = data.sort_values(by='Month')
# f. To know index
print("\nf. Index:")
print(data.index)
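The listing above only shows steps h, k and f, while the output below also covers b, c, d, e, i and j. The following is a minimal sketch of how those remaining steps could look, not the group's original code. It assumes a five-row stand-in table built from the values in the "Top 10 rows" output, a shortened hypothetical column name `Deaths`, and an arbitrary replacement value for step j.

```python
import pandas as pd

# Assumption: five rows copied from the report's output; "Deaths" is a
# shortened, hypothetical name for the long column header in the CSV.
sample = pd.DataFrame({
    "Month": ["1973-01", "1973-02", "1973-03", "1973-04", "1973-05"],
    "Deaths": [9007, 8106, 8928, 9137, 10017],
})

top_rows = sample.head(3)        # b. first rows (the report uses head(10))
last_rows = sample.tail(2)       # c. last rows (the report uses tail(8))
sample.info()                    # d. prints a summary of columns and dtypes
col_types = sample.dtypes        # e. data type of each column

# g. loc selects by label, iloc by integer position
by_label = sample.loc[0:2, "Deaths"]    # label slice: both endpoints included
by_position = sample.iloc[0:2, 1]       # position slice: end excluded

# h. (alternative reading of the task) to_timestamp turns periods into timestamps
months = pd.to_datetime(sample["Month"], format="%Y-%m")
month_start = months.dt.to_period("M").dt.to_timestamp()

non_missing = sample.count()     # i. non-missing values per column

# j. replace one value with another (9007 -> 9000 is an arbitrary illustration)
replaced = sample.replace({"Deaths": {9007: 9000}})
```

Note that `loc[0:2]` returns three rows while `iloc[0:2]` returns two, which is the main practical difference between the two indexers on a default integer index.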
Output:
b. Top 10 rows:
Month Accidental deaths in USA: monthly, 1973 ? 1978
0 1973-01 9007
1 1973-02 8106
2 1973-03 8928
3 1973-04 9137
4 1973-05 10017
5 1973-06 10826
6 1973-07 11317
7 1973-08 10744
8 1973-09 9713
9 1973-10 9938
c. Last 8 rows:
Month Accidental deaths in USA: monthly, 1973 ? 1978
64 1978-05 9115
65 1978-06 9434
66 1978-07 10484
67 1978-08 9827
68 1978-09 9110
69 1978-10 9070
70 1978-11 8633
71 1978-12 9240
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 72 entries, 0 to 71
Data columns (total 2 columns):
 #   Column                                          Non-Null Count  Dtype
---  ------                                          --------------  -----
 0   Month                                           72 non-null     object
 1   Accidental deaths in USA: monthly, 1973 ? 1978  72 non-null     int64
dtypes: int64(1), object(1)
memory usage: 1.2+ KB
h.
Month Accidental deaths in USA: monthly, 1973 ? 1978
0 1973-01-01 9007
1 1973-02-01 8106
2 1973-03-01 8928
3 1973-04-01 9137
4 1973-05-01 10017
.. ... ...
67 1978-08-01 9827
68 1978-09-01 9110
69 1978-10-01 9070
70 1978-11-01 8633
71 1978-12-01 9240
j.
Month Accidental deaths in USA: monthly, 1973 ? 1978
0 1973-01-01 9007
1 1973-02-01 8106
2 1973-03-01 8928
3 1973-04-01 9137
4 1973-05-01 10017
.. ... ...
67 1978-08-01 9827
68 1978-09-01 9110
69 1978-10-01 9070
70 1978-11-01 8633
71 1978-12-01 9240
d. Column Information:
None
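The `None` shown above is what pandas produces when the result of `info()` is passed to `print`: `info()` writes its summary directly to the console and returns nothing. The sketch below illustrates this on a two-row stand-in table (the column name `Deaths` is a hypothetical shorthand) and shows one way to capture the summary as text instead.

```python
import io
import pandas as pd

# Assumption: a two-row stand-in table; "Deaths" is a hypothetical column name.
frame = pd.DataFrame({"Month": ["1973-01", "1973-02"], "Deaths": [9007, 8106]})

# info() prints its summary itself and returns None, so
# print(frame.info()) shows the summary followed by the word "None".
returned = frame.info()

# To keep the summary as a string, pass a text buffer via the buf parameter.
buffer = io.StringIO()
frame.info(buf=buffer)
summary_text = buffer.getvalue()
```

Calling `data.info()` on its own line, without `print`, avoids the stray `None` in the output.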
e. Data Types:
Month object
Accidental deaths in USA: monthly, 1973 ? 1978 int64
dtype: object
f. Index:
RangeIndex(start=0, stop=72, step=1)
i. Non-Missing Values:
Month 72
Accidental deaths in USA: monthly, 1973 ? 1978 72
dtype: int64
Q.2 Solve the following questions using the Excel file named (StudentsPerformance.xlsx). (Using pandas library)
a. To read excel file
b. Use of groupby in the example
c. Use of pipe in the example
d. To get absolute value, all and any function
e. Use of between and correlation functions
f. Use of mean, median and mode
g. Use of pct_change
h. Use of skew and sem function
i. value_counts function
j. Find missing values in the data table
k. Use of sort_index
Code:
import pandas as pd
# a. To read the CSV file
file_path = "/StudentsPerformance.csv"
df = pd.read_csv(file_path)
# b. Use of groupby:
grouped_data = df.groupby('gender')['math score'].mean()
print("\nb. Using groupby:")
print(grouped_data)
# c. Use of pipe:
def custom_function(data):
    # Your custom processing here
    return data
result = df.pipe(custom_function)
print("\nc. Using pipe:")
print(result.head())
# e. Use of between and correlation function:
filtered_data = df[df['math score'].between(70, 90)]
correlation = df['math score'].corr(df['reading score'])
print("\ne. Using between and correlation functions:")
print(filtered_data.head())
print(f"Correlation between 'math_score' and 'reading_score': {correlation}")
# g. Use of pct_change:
df['math_score_pct_change'] = df['math score'].pct_change() * 100
print("\ng. Using pct_change:")
print(df['math_score_pct_change'].head())
# i. value_counts function:
gender_counts = df['gender'].value_counts()
print("\ni. Using value_counts:")
print(gender_counts)
# k. Use of sort_index:
sorted_df = df.sort_index(ascending=True)
print("\nk. Using sort_index:")
print(sorted_df.head())
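Steps d, f, h and j do not appear in the listing above, although the output includes an `abs_math_score` column. The following is a minimal sketch of how those steps could be written, not the group's original code. It assumes a five-row stand-in table whose scores are copied from the sort_index output, and the threshold 85 is an arbitrary illustration.

```python
import pandas as pd

# Assumption: the first five rows of scores as printed in the report's output.
scores = pd.DataFrame({
    "gender": ["female", "female", "female", "male", "male"],
    "math score": [72, 69, 90, 47, 76],
    "reading score": [72, 90, 95, 57, 78],
    "writing score": [74, 88, 93, 44, 75],
})
math_scores = scores["math score"]

# d. absolute value, all and any
scores["abs_math_score"] = math_scores.abs()
all_non_negative = (math_scores >= 0).all()   # True only if every row passes
any_above_85 = (math_scores > 85).any()       # True if at least one row passes

# f. mean, median and mode
math_mean = math_scores.mean()
math_median = math_scores.median()
math_mode = math_scores.mode()    # a Series, because several values can tie

# h. skewness and standard error of the mean
math_skew = math_scores.skew()
math_sem = math_scores.sem()

# j. missing values per column
missing_per_column = scores.isnull().sum()
```

On this small sample every math score is unique, so `mode()` returns all five values; on the full dataset it would return only the most frequent score or scores.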
Output:
a. Reading the Excel file:
gender race/ethnicity parental level of education lunch \
0 female group B bachelor's degree standard
1 female group C some college standard
2 female group B master's degree standard
3 male group A associate's degree free/reduced
4 male group C some college standard
b. Using groupby:
gender
female 63.633205
male 68.728216
Name: math score, dtype: float64
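As a small extension of the groupby step, the sketch below computes more than one statistic per group with `agg`. It assumes a five-row stand-in table taken from the rows printed in this report, so its group means differ from the full-dataset values shown above.

```python
import pandas as pd

# Assumption: the first five rows of the dataset as printed in the report.
scores = pd.DataFrame({
    "gender": ["female", "female", "female", "male", "male"],
    "math score": [72, 69, 90, 47, 76],
})

# One row per gender, one column per statistic.
grouped = scores.groupby("gender")["math score"].agg(["mean", "count"])
```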
c. Using pipe:
gender race/ethnicity parental level of education lunch \
0 female group B bachelor's degree standard
1 female group C some college standard
2 female group B master's degree standard
3 male group A associate's degree free/reduced
4 male group C some college standard
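The pipe output above is identical to the input because `custom_function` returns its argument unchanged. The sketch below shows what a chain of non-trivial pipe steps might look like. The helper functions `add_pass_flag` and `keep_passed` and the threshold of 70 are hypothetical, and the table is a five-row stand-in built from rows printed in this report.

```python
import pandas as pd

# Assumption: the first five rows of the dataset as printed in the report.
scores = pd.DataFrame({
    "gender": ["female", "female", "female", "male", "male"],
    "math score": [72, 69, 90, 47, 76],
})

def add_pass_flag(frame, threshold):
    """Return a copy with a boolean column marking scores at or above threshold."""
    return frame.assign(passed=frame["math score"] >= threshold)

def keep_passed(frame):
    """Keep only the rows flagged as passed."""
    return frame[frame["passed"]]

# pipe passes the DataFrame as the first argument and forwards extra keywords.
result = scores.pipe(add_pass_flag, threshold=70).pipe(keep_passed)
```

Because `assign` returns a new DataFrame, the original `scores` table is left without the `passed` column.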
test preparation course math score reading score writing score \
0 none 72 72 74
2 none 90 95 93
4 none 76 78 75
5 none 71 83 78
6 completed 88 95 92
abs_math_score
0 72
2 90
4 76
5 71
6 88
Correlation between 'math_score' and 'reading_score': 0.8175796636720546
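The filtered rows above include a math score of exactly 90 because `between` includes both endpoints by default. The sketch below illustrates this, along with the `method` parameter of `corr`, on a seven-row stand-in table whose scores are copied from rows printed in this report. It assumes pandas 1.3 or later for the string form of `inclusive`, and its correlation values differ from the full-dataset value.

```python
import pandas as pd

# Assumption: scores for rows 0 to 6 as they appear in the report's outputs.
scores = pd.DataFrame({
    "math score": [72, 69, 90, 47, 76, 71, 88],
    "reading score": [72, 90, 95, 57, 78, 83, 95],
})

# Default: both endpoints are included, so the score of 90 is kept.
in_range = scores[scores["math score"].between(70, 90)]

# Excluding both endpoints drops the row with a score of exactly 90.
strict = scores[scores["math score"].between(70, 90, inclusive="neither")]

# corr uses the Pearson coefficient by default; other methods can be named.
pearson = scores["math score"].corr(scores["reading score"])
spearman = scores["math score"].corr(scores["reading score"], method="spearman")
```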
g. Using pct_change:
0 NaN
1 -4.166667
2 30.434783
3 -47.777778
4 61.702128
Name: math_score_pct_change, dtype: float64
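The values above can be checked by hand: `pct_change` compares each row with the previous one, so the second value is (69 - 72) / 72 × 100. The sketch below reproduces the calculation on the first five math scores printed in this report. Since the rows are individual students rather than time steps, the result only compares neighbouring rows and carries no trend meaning.

```python
import pandas as pd

# Assumption: the first five math scores as printed in the report.
math_scores = pd.Series([72, 69, 90, 47, 76], name="math score")

# Built-in percentage change between consecutive rows.
pct = math_scores.pct_change() * 100

# The same calculation written out: (current - previous) / previous * 100.
previous = math_scores.shift(1)
manual = (math_scores - previous) / previous * 100
```

The first entry is `NaN` because the first row has no previous row to compare with.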
i. Using value_counts:
female 518
male 482
Name: gender, dtype: int64
k. Using sort_index:
gender race/ethnicity parental level of education lunch \
0 female group B bachelor's degree standard
1 female group C some college standard
2 female group B master's degree standard
3 male group A associate's degree free/reduced
4 male group C some college standard
test preparation course math score reading score writing score \
0 none 72 72 74
1 completed 69 90 88
2 none 90 95 93
3 none 47 57 44
4 none 76 78 75
abs_math_score math_score_pct_change
0 72 NaN
1 69 -4.166667
2 90 30.434783
3 47 -47.777778
4 76 61.702128
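The sort_index output above matches the original row order because the table was already ordered by its index. The sketch below shows a case where `sort_index` has a visible effect: restoring the original order after a `sort_values` call. It assumes the first five math scores printed in this report.

```python
import pandas as pd

# Assumption: the first five math scores as printed in the report.
scores = pd.DataFrame({"math score": [72, 69, 90, 47, 76]})

# Sorting by value shuffles the index labels.
by_score = scores.sort_values(by="math score")

# Sorting by index puts the rows back in their original order.
restored = by_score.sort_index(ascending=True)

# ascending=False lists the rows from the last index label to the first.
reversed_rows = scores.sort_index(ascending=False)
```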