
IMBA (2020-25)

Python Programming

Group Assignment-2

Term-1
Submitted to: Prof. Manoj Kumar

Group Number-10A

Roll No. Name


23ibm110 Ansh Chhaya
23ibm137 Kushagra Choudhary
23ibm139 Maanvijay Solanki
23ibm152 Rony Sheth
23ibm158 Shatakshi Srivastava
23ibm162 Shubhi Pateriya

Q.1 Solve the following questions using the CSV file named accidental-deaths-in-usa-monthly.csv (using the pandas library).
a. To read the CSV file
b. To see the top 10 rows of the table
c. To see the last 8 rows of the table
d. To get information about the columns and their data
e. To know the data types of the column data
f. To know the index
g. Use of loc and iloc
h. To convert to timestamps
i. Find non-missing values in the data table
j. Use of the replace function
k. Use of sort_values

Code:
import pandas as pd

# a. To read the CSV file


file_path = "/accidental-deaths-in-usa-monthly.csv"
data = pd.read_csv(file_path)

# b. To see the top 10 rows of the table


top_10_rows = data.head(10)
print("b. Top 10 rows:")
print(top_10_rows)

# c. To see the last 8 rows of the table


last_8_rows = data.tail(8)
print("c. Last 8 rows:")
print(last_8_rows)

# d. To get information about the columns and their data
# (DataFrame.info() prints its report directly and returns None)
column_info = data.info()

# e. To know about data types of the column data


data_types = data.dtypes

# f. To know the index


index = data.index

# g. Use of loc and iloc


# Example of using loc to select rows and columns by labels
subset_loc = data.loc[5:10, ['Month',
                             'Accidental deaths in USA: monthly, 1973 ? 1978']]
# Example of using iloc to select rows and columns by integer positions
subset_iloc = data.iloc[5:11, 0:2]
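
# A small supplementary sketch: loc is label-based and includes the end label
# (5:10 selects rows 5..10, i.e. 6 rows), while iloc is position-based and
# excludes the end position (5:11 also selects rows 5..10). Single-cell lookups,
# with illustrative variable names:
first_month_by_label = data.loc[0, 'Month']   # row label 0, column label 'Month'
first_month_by_position = data.iloc[0, 0]     # first row, first column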

# h. To convert the 'Month' column to timestamps
data['Month'] = pd.to_datetime(data['Month'], format='%Y-%m')
print("h.\n",data)

# i. Find non-missing values in the data-table


non_missing_values = data.notnull().sum()

# j. Use of replace function


data['Accidental deaths in USA: monthly, 1973 ? 1978'] = (
    data['Accidental deaths in USA: monthly, 1973 ? 1978'].replace(',', ''))
print("j.\n",data)

# k. Use of sort_values
sorted_data = data.sort_values(by='Month')
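
# A short sketch of optional sort_values arguments (deaths_col is just a local
# alias for the long column name): sort descending by the deaths column and keep
# any missing values at the end.
deaths_col = 'Accidental deaths in USA: monthly, 1973 ? 1978'
sorted_desc = data.sort_values(by=deaths_col, ascending=False, na_position='last')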

# Display the results
print("\nd. Column Information:")
print(column_info)

print("\ne. Data Types:")


print(data_types)

print("\nf. Index:")
print(index)

print("\ng. Subset using loc:")


print(subset_loc)

print("\ng. Subset using iloc:")


print(subset_iloc)

print("\ni. Non-Missing Values:")


print(non_missing_values)

print("\nk. Data after replacing commas:")


print(sorted_data)

Output:
b. Top 10 rows:
Month Accidental deaths in USA: monthly, 1973 ? 1978
0 1973-01 9007
1 1973-02 8106
2 1973-03 8928
3 1973-04 9137
4 1973-05 10017
5 1973-06 10826
6 1973-07 11317
7 1973-08 10744
8 1973-09 9713
9 1973-10 9938

c. Last 8 rows:
Month Accidental deaths in USA: monthly, 1973 ? 1978
64 1978-05 9115
65 1978-06 9434
66 1978-07 10484
67 1978-08 9827
68 1978-09 9110
69 1978-10 9070
70 1978-11 8633
71 1978-12 9240
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 72 entries, 0 to 71
Data columns (total 2 columns):
 #   Column                                          Non-Null Count  Dtype
---  ------                                          --------------  -----
 0   Month                                           72 non-null     object
 1   Accidental deaths in USA: monthly, 1973 ? 1978  72 non-null     int64
dtypes: int64(1), object(1)
memory usage: 1.2+ KB

h.
Month Accidental deaths in USA: monthly, 1973 ? 1978
0 1973-01-01 9007
1 1973-02-01 8106
2 1973-03-01 8928
3 1973-04-01 9137
4 1973-05-01 10017
.. ... ...
67 1978-08-01 9827
68 1978-09-01 9110
69 1978-10-01 9070
70 1978-11-01 8633
71 1978-12-01 9240

[72 rows x 2 columns]

j.
Month Accidental deaths in USA: monthly, 1973 ? 1978
0 1973-01-01 9007
1 1973-02-01 8106
2 1973-03-01 8928
3 1973-04-01 9137
4 1973-05-01 10017
.. ... ...
67 1978-08-01 9827
68 1978-09-01 9110
69 1978-10-01 9070
70 1978-11-01 8633
71 1978-12-01 9240

[72 rows x 2 columns]

d. Column Information:
None

e. Data Types:
Month object
Accidental deaths in USA: monthly, 1973 ? 1978 int64
dtype: object

f. Index:
RangeIndex(start=0, stop=72, step=1)

g. Subset using loc:


Month Accidental deaths in USA: monthly, 1973 ? 1978
5 1973-06 10826
6 1973-07 11317
7 1973-08 10744
8 1973-09 9713
9 1973-10 9938
10 1973-11 9161

g. Subset using iloc:


Month Accidental deaths in USA: monthly, 1973 ? 1978
5 1973-06 10826
6 1973-07 11317
7 1973-08 10744
8 1973-09 9713
9 1973-10 9938
10 1973-11 9161

i. Non-Missing Values:
Month 72
Accidental deaths in USA: monthly, 1973 ? 1978 72
dtype: int64

k. Data after sort_values (sorted by 'Month'):


Month Accidental deaths in USA: monthly, 1973 ? 1978
0 1973-01-01 9007
1 1973-02-01 8106
2 1973-03-01 8928
3 1973-04-01 9137
4 1973-05-01 10017
.. ... ...
67 1978-08-01 9827
68 1978-09-01 9110
69 1978-10-01 9070
70 1978-11-01 8633
71 1978-12-01 9240

[72 rows x 2 columns]

Q.2 Solve the following questions using the Excel file named StudentsPerformance.xlsx (using the pandas library).
a. To read the Excel file
b. Use of groupby in the example
c. Use of pipe in the example
d. To get absolute values; use of the all and any functions
e. Use of the between and correlation functions
f. Use of mean, median and mode
g. Use of pct_change
h. Use of the skew and sem functions
i. Use of the value_counts function
j. Find missing values in the data table
k. Use of sort_index

Code:
import pandas as pd

# a. To read the data file (the question names StudentsPerformance.xlsx;
# a CSV export of the same data is read here with read_csv)
file_path = "/StudentsPerformance.csv"
df = pd.read_csv(file_path)

# Display the first few rows of the DataFrame


print("a. Reading the Excel file:")
print(df.head())

# b. Use of groupby:
grouped_data = df.groupby('gender')['math score'].mean()
print("\nb. Using groupby:")
print(grouped_data)
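
# A minimal sketch of a broader groupby: aggregate several score columns with
# several statistics at once via .agg (the column and statistic choices below
# are illustrative, not required by the question).
score_summary = df.groupby('gender')[['math score', 'reading score']].agg(['mean', 'max'])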

# c. Use of pipe:
def custom_function(data):
    # placeholder: returns the DataFrame unchanged, so pipe passes it through as-is
    return data

result = df.pipe(custom_function)
print("\nc. Using pipe:")
print(result.head())
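
# A sketch of a pipe step that actually transforms the frame (the helper name
# add_total_score and the 'total score' column are illustrative assumptions):
def add_total_score(frame):
    out = frame.copy()
    out['total score'] = out[['math score', 'reading score', 'writing score']].sum(axis=1)
    return out

piped = df.pipe(add_total_score)   # same as add_total_score(df), but chainable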

# d. To get absolute value, all, and any functions:
df['abs_math_score'] = df['math score'].abs()
# Series.all() tests whether every value is truthy (non-zero);
# lt(40).any() tests whether at least one value is below 40
all_reading_truthy = df['reading score'].all()
any_less_than_40 = df['writing score'].lt(40).any()
print("\nd. Using abs, all, and any functions:")
print(df['abs_math_score'].head())
print(f"All values in 'reading score' are non-zero: {all_reading_truthy}")
print(f"Any values in 'writing score' < 40: {any_less_than_40}")

# e. Use of between and correlation function:
filtered_data = df[df['math score'].between(70, 90)]
correlation = df['math score'].corr(df['reading score'])
print("\ne. Using between and correlation functions:")
print(filtered_data.head())
print(f"Correlation between 'math_score' and 'reading_score':
{correlation}")

# f. Use of mean, median, and mode:


mean_math_score = df['math score'].mean()
median_math_score = df['math score'].median()
mode_math_score = df['math score'].mode().values[0]
print("\nf. Using mean, median, and mode:")
print(f"Mean 'math_score': {mean_math_score}")
print(f"Median 'math_score': {median_math_score}")
print(f"Mode 'math_score': {mode_math_score}")

# g. Use of pct_change:
df['math_score_pct_change'] = df['math score'].pct_change() * 100
print("\ng. Using pct_change:")
print(df['math_score_pct_change'].head())
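
# Note: pct_change returns the fractional change relative to the previous row,
# (x_i - x_{i-1}) / x_{i-1}; the code above multiplies by 100 to express it as a
# percentage, and the first row is NaN because it has no predecessor. Row order
# carries no time meaning in this dataset, so this is purely a demonstration.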

# h. Use of skew and sem functions:


skewness = df['math score'].skew()
sem_math_score = df['math score'].sem()
print("\nh. Using skew and sem functions:")
print(f"Skewness of 'math_score': {skewness}")
print(f"SEM of 'math_score': {sem_math_score}")

# i. value_counts function:
gender_counts = df['gender'].value_counts()
print("\ni. Using value_counts:")
print(gender_counts)
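
# A one-line sketch of a common variant: normalize=True returns proportions
# instead of raw counts.
gender_shares = df['gender'].value_counts(normalize=True)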

# j. Find missing values in the data table:


missing_values = df.isnull().sum()
print("\nj. Finding missing values:")
print(missing_values)
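
# A sketch of optional follow-up steps once missing values are located (kept as
# comments because the question only asks to find them):
# df_dropped = df.dropna()                             # drop rows with any NaN
# df_filled = df.fillna({'math_score_pct_change': 0})  # fill one column's NaN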

# k. Use of sort_index:
sorted_df = df.sort_index(ascending=True)
print("\nk. Using sort_index:")
print(sorted_df.head())
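
# A brief sketch of sort_index variants: axis=1 orders the columns
# alphabetically, and ascending=False would reverse the row order.
columns_sorted = df.sort_index(axis=1)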

Output:
a. Reading the Excel file:
gender race/ethnicity parental level of education lunch \
0 female group B bachelor's degree standard
1 female group C some college standard
2 female group B master's degree standard
3 male group A associate's degree free/reduced
4 male group C some college standard

test preparation course math score reading score writing score


0 none 72 72 74
1 completed 69 90 88
2 none 90 95 93
3 none 47 57 44
4 none 76 78 75

b. Using groupby:
gender
female 63.633205
male 68.728216
Name: math score, dtype: float64

c. Using pipe:
gender race/ethnicity parental level of education lunch \
0 female group B bachelor's degree standard
1 female group C some college standard
2 female group B master's degree standard
3 male group A associate's degree free/reduced
4 male group C some college standard

test preparation course math score reading score writing score


0 none 72 72 74
1 completed 69 90 88
2 none 90 95 93
3 none 47 57 44
4 none 76 78 75

d. Using abs, all, and any functions:


0 72
1 69
2 90
3 47
4 76
Name: abs_math_score, dtype: int64
All values in 'reading score' are non-zero: True
Any values in 'writing score' < 40: True

e. Using between and correlation functions:


gender race/ethnicity parental level of education lunch \
0 female group B bachelor's degree standard
2 female group B master's degree standard
4 male group C some college standard
5 female group B associate's degree standard
6 female group B some college standard

test preparation course math score reading score writing score \
0 none 72 72 74
2 none 90 95 93
4 none 76 78 75
5 none 71 83 78
6 completed 88 95 92
abs_math_score
0 72
2 90
4 76
5 71
6 88
Correlation between 'math_score' and 'reading_score': 0.8175796636720546

f. Using mean, median, and mode:


Mean 'math_score': 66.089
Median 'math_score': 66.0
Mode 'math_score': 65

g. Using pct_change:
0 NaN
1 -4.166667
2 30.434783
3 -47.777778
4 61.702128
Name: math_score_pct_change, dtype: float64

h. Using skew and sem functions:


Skewness of 'math_score': -0.27893514909431694
SEM of 'math_score': 0.4794986944695449

i. Using value_counts:
female 518
male 482
Name: gender, dtype: int64

j. Finding missing values:


gender 0
race/ethnicity 0
parental level of education 0
lunch 0
test preparation course 0
math score 0
reading score 0
writing score 0
abs_math_score 0
math_score_pct_change 1
dtype: int64

k. Using sort_index:
gender race/ethnicity parental level of education lunch \
0 female group B bachelor's degree standard
1 female group C some college standard
2 female group B master's degree standard
3 male group A associate's degree free/reduced
4 male group C some college standard

test preparation course math score reading score writing score \
0 none 72 72 74
1 completed 69 90 88
2 none 90 95 93
3 none 47 57 44
4 none 76 78 75

abs_math_score math_score_pct_change
0 72 NaN
1 69 -4.166667
2 90 30.434783
3 47 -47.777778
4 76 61.702128

