
ML Lab A1 A4

The document provides Python code for exploratory data analysis on two datasets, focusing on observations, features, occupations, and various metrics related to a football tournament. It also includes a function for calculating regression error metrics such as SSE, MSE, RMSE, and R2 score using actual and predicted values. The code demonstrates data manipulation and analysis using pandas and sklearn libraries.

Uploaded by

safiapathan03


A1).

Load the dataset from the file below and write Python code to answer the following exploratory
analysis questions:
a) How many observations are there in this dataset
num_observations = len(df)  # df is the DataFrame loaded from the CSV file

print(f"There are {num_observations} observations in the dataset.")

b) How many various features are there in the dataset

num_features = len(df.columns)

print(f"There are {num_features} features in the dataset.")

c) How many different occupations (unique) are there in the dataset.

num_unique_occupations = df['Occupation'].nunique()

print(f"There are {num_unique_occupations} different occupations in the dataset.")

d) What occupation is the most common.

most_common_occupation = df['Occupation'].mode()[0]

print(f"The most common occupation is: {most_common_occupation}")

e) What is the average age of all the people in this dataset

average_age = df['Age'].mean()

print(f"The average age of all people in the dataset is: {average_age:.2f}")

f) What is the average age of people in each occupation group

average_age_per_occupation = df.groupby('Occupation')['Age'].mean()
print("Average age of people in each occupation group:")
print(average_age_per_occupation)

g) What are the occupations of the youngest and oldest people in this dataset

youngest_person_age = df['Age'].min()
youngest_person_occupation = df[df['Age'] == youngest_person_age]['Occupation'].iloc[0]

oldest_person_age = df['Age'].max()
oldest_person_occupation = df[df['Age'] == oldest_person_age]['Occupation'].iloc[0]

print(f"The occupation of the youngest person is: {youngest_person_occupation}")

print(f"The occupation of the oldest person is: {oldest_person_occupation}")
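
The A1 answers above can be tried end to end on a small in-memory table. This is only a sketch: the rows are made up for illustration and are not from the lab's CSV file, and the column names 'Age' and 'Occupation' are assumed from the answers above. It also collects every occupation tied at the minimum or maximum age, where the answer to g) reports only the first matching row.

```python
import pandas as pd

# Hypothetical stand-in for the lab's CSV file (made-up rows).
df = pd.DataFrame({
    "Age": [24, 53, 23, 24, 33, 42, 57, 23],
    "Occupation": ["technician", "other", "writer", "technician",
                   "technician", "executive", "administrator", "student"],
})

# a), b): rows are observations, columns are features
num_observations, num_features = df.shape

# c), d), e)
num_unique_occupations = df["Occupation"].nunique()
most_common_occupation = df["Occupation"].mode()[0]
average_age = df["Age"].mean()

# g): keep all occupations tied at the youngest / oldest age
youngest_mask = df["Age"] == df["Age"].min()
oldest_mask = df["Age"] == df["Age"].max()
youngest_occupations = sorted(df.loc[youngest_mask, "Occupation"].unique().tolist())
oldest_occupations = sorted(df.loc[oldest_mask, "Occupation"].unique().tolist())

print(num_observations, num_features, num_unique_occupations)
print(most_common_occupation, average_age)
print(youngest_occupations, oldest_occupations)
```

Note that mode() returns all tied values in sorted order, so mode()[0] picks the alphabetically first occupation when several are equally common.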
A2. Load the dataset from the file below and write Python code to answer the following exploratory
analysis questions:
a) How many teams participated in this tournament.
b) List top two teams with high discipline and bottom two teams with low discipline (you can
consider red and yellow cards to calculate discipline)
c) On average, how many yellow cards are given per team.
d) How many teams scored more than 5 goals and which are those teams.
e) Which team is most accurate in shooting?
f) How many teams made more fouls than their opponents?

import pandas as pd
import numpy as np
data = pd.read_csv('A4-Football.csv')  # named 'data' to match the answers below
data.head()

a) teams_participated = data['Team'].nunique()

print(f"The number of teams that participated in the tournament: {teams_participated}")

b) data['Discipline'] = data['Red Cards'] + data['Yellow Cards']

# 'Discipline' here is the total number of cards, so fewer cards means higher discipline
# Find the top two teams with high discipline (fewest cards)
top_teams = data.groupby('Team')['Discipline'].sum().nsmallest(2)
print("Top two teams with high discipline (fewest cards):")
print(top_teams)

# Find the bottom two teams with low discipline (most cards)
bottom_teams = data.groupby('Team')['Discipline'].sum().nlargest(2)
print("\nBottom two teams with low discipline (most cards):")
print(bottom_teams)
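
As a rough illustration of the ranking in b), the sketch below uses made-up card counts (not values from 'A4-Football.csv') and assumes that discipline is scored as the total number of cards, so that fewer cards means better discipline. Sorting the totals once gives both ends of the ranking.

```python
import pandas as pd

# Hypothetical card counts, for illustration only.
data = pd.DataFrame({
    "Team": ["A", "B", "C", "D", "E"],
    "Yellow Cards": [9, 4, 7, 5, 12],
    "Red Cards": [0, 0, 1, 0, 2],
})

# Assumption: total cards per team is the discipline score (lower is better).
cards = (data["Yellow Cards"] + data["Red Cards"]).groupby(data["Team"]).sum()
ranked = cards.sort_values()  # ascending: best-disciplined teams first

most_disciplined = ranked.head(2).index.tolist()
least_disciplined = ranked.tail(2).index.tolist()
print("High discipline:", most_disciplined)
print("Low discipline:", least_disciplined)
```

A weighted score, for example counting a red card more heavily than a yellow card, would be an equally reasonable choice; the question leaves that open.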

c) average_yellow = data.groupby('Team')['Yellow Cards'].mean()

# Calculate overall average of yellow cards across all teams

overall_average_yellow = average_yellow.mean()
print(f"On average, {overall_average_yellow:.2f} yellow cards are given per team.")
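
Whether the mean of per-team means is the right answer to c) depends on the layout of the file. If each team appears in exactly one row, it equals the plain column mean. If teams repeat (for example one row per match), summing per team first gives a different number. The sketch below shows the difference on hypothetical per-match data; the rows are made up.

```python
import pandas as pd

# Hypothetical data with one row per match, so teams repeat.
matches = pd.DataFrame({
    "Team": ["A", "A", "B", "B", "B", "C"],
    "Yellow Cards": [2, 4, 1, 0, 2, 5],
})

# Average of each team's TOTAL yellow cards
per_team_total = matches.groupby("Team")["Yellow Cards"].sum()
avg_total_per_team = per_team_total.mean()

# Average of each team's PER-ROW mean (what the answer above computes)
per_team_mean = matches.groupby("Team")["Yellow Cards"].mean()
avg_of_means = per_team_mean.mean()

print(f"Average total per team: {avg_total_per_team:.2f}")
print(f"Average of per-team means: {avg_of_means:.2f}")
```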

d) teams_goals = data[data['Goals Scored'] > 5]

# Count the number of teams that scored more than 5 goals

num_teams_more_5_goals = teams_goals['Team'].nunique()
print(f"{num_teams_more_5_goals} teams scored more than 5 goals.")
print("\nThe teams that scored more than 5 goals:")
print(teams_goals[['Team', 'Goals Scored']])

e) most_accurate_team = data.loc[data['Shooting Accuracy'].idxmax()]

print(f"The most accurate team in shooting is: {most_accurate_team['Team']} "
      f"with a shooting accuracy of {most_accurate_team['Shooting Accuracy']}.")
# If you need the shooting accuracy values for all teams, you can sort the
# DataFrame:

sorted_teams_by_accuracy = data.sort_values(by='Shooting Accuracy', ascending=False)
print("\nTeams sorted by shooting accuracy:")
print(sorted_teams_by_accuracy[['Team', 'Shooting Accuracy']])
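
The answer to e) assumes that 'Shooting Accuracy' is a numeric column. If the CSV stores it as text such as "51.9%" (an assumption here; the file itself is not shown), idxmax() and sort_values() would not compare the values numerically. A minimal sketch of converting first, on made-up rows:

```python
import pandas as pd

# Hypothetical rows; the percent-string format is an assumption.
data = pd.DataFrame({
    "Team": ["A", "B", "C"],
    "Shooting Accuracy": ["51.9%", "9.5%", "47.8%"],
})

# Strip the trailing '%' and convert to float before comparing.
accuracy = pd.to_numeric(data["Shooting Accuracy"].str.rstrip("%"))

best_row = data.loc[accuracy.idxmax()]
print(f"Most accurate team: {best_row['Team']} ({accuracy.max()}%)")
```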

f) teams_more_fouls_than_opponents = data[data['Own Fouls'] > data['Opponent Fouls']]

# Count the number of teams that made more fouls than their opponents
num_teams_more_fouls_than_opponents = teams_more_fouls_than_opponents['Team'].nunique()
print(f"{num_teams_more_fouls_than_opponents} teams made more fouls than their opponents.")
print("\nThe teams that made more fouls than their opponents:")
print(teams_more_fouls_than_opponents[['Team', 'Own Fouls', 'Opponent Fouls']])
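
Answers d) and f) both rely on boolean-mask filtering. The sketch below shows the pattern on made-up rows, with column names assumed from the answers above. The comparison is strict, so a team with exactly as many fouls as its opponents is not counted.

```python
import pandas as pd

# Hypothetical rows, for illustration only.
data = pd.DataFrame({
    "Team": ["A", "B", "C", "D"],
    "Goals Scored": [4, 12, 6, 2],
    "Own Fouls": [48, 51, 60, 73],
    "Opponent Fouls": [51, 43, 60, 80],
})

# A boolean Series selects the rows where the condition is True.
high_scorers = data.loc[data["Goals Scored"] > 5, "Team"].tolist()
fouled_more = data.loc[data["Own Fouls"] > data["Opponent Fouls"], "Team"].tolist()

print(len(high_scorers), "teams scored more than 5 goals:", high_scorers)
print(len(fouled_more), "teams made more fouls than their opponents:", fouled_more)
```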

A4). Write Python code for calculating regression error metrics such as SSE, MSE,
RMSE and R2 score. The function should take the actual target values and the targets predicted by
the model as input and return these error metrics as output.

The code below calculates the Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean
Squared Error (RMSE), and R-squared (R2) score, first manually with NumPy and then with
sklearn.metrics. The Sum of Squared Errors (SSE) is the numerator of the MSE,
Σ(actual - predicted)^2, so SSE = n × MSE.
The sklearn.metrics module provides the functions mean_squared_error and r2_score for
evaluating regression models and calculating performance metrics:
mean_squared_error: This function calculates the Mean Squared Error (MSE), the average
squared difference between the actual and predicted values. It is a widely used metric for
evaluating regression models. The formula for MSE is:

MSE = Σ(actual - predicted)^2 / n

r2_score: This function calculates the R-squared (R2) score, also known as the coefficient of
determination. It measures the proportion of variance in the dependent variable (target)
that is predictable from the independent variables (features). The R2 score is at most 1,
where 1 indicates a perfect fit; 0 corresponds to always predicting the mean of the target,
and the score is negative when the model fits worse than that.

R2 Score = 1 - (SSres / SStot)

where SSres is the sum of squared residuals and SStot is the total sum of squares.
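
To make the formula R2 = 1 - (SSres / SStot) concrete, the sketch below evaluates it on three hand-picked prediction vectors. The helper name r2 and the numbers are illustrative only.

```python
import numpy as np

def r2(y_true, y_pred):
    """R2 = 1 - SSres / SStot, computed directly from the definition."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # sum of squared residuals
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]

perfect = r2(y, y)                        # predictions equal the targets
baseline = r2(y, [2.5, 2.5, 2.5, 2.5])    # always predict the mean of y
reversed_ = r2(y, [4.0, 3.0, 2.0, 1.0])   # worse than predicting the mean

print(perfect, baseline, reversed_)  # 1.0 0.0 -3.0
```

The third case shows that the score drops below 0 when the residual sum of squares exceeds the total sum of squares.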
import numpy as np
import sklearn.metrics as metrics
import matplotlib.pyplot as plt
y = np.array([-3, -1, -2, 1, -1, 1, 2, 1, 3, 4, 3, 5])
yhat = np.array([-2, 1, -1, 0, -1, 1, 2, 2, 3, 3, 3, 5])
x = list(range(len(y)))
plt.scatter(x, y, color="blue", label="original")
plt.plot(x, yhat, color="red", label="predicted")
plt.legend()
plt.show()
# calculate manually
d = y - yhat
sse_f = np.sum(d**2)
mse_f = np.mean(d**2)
mae_f = np.mean(np.abs(d))
rmse_f = np.sqrt(mse_f)
r2_f = 1 - (sse_f / np.sum((y - np.mean(y))**2))
print("Results by manual calculation:")
print("SSE:", sse_f)
print("MAE:", mae_f)
print("MSE:", mse_f)
print("RMSE:", rmse_f)
print("R-Squared:", r2_f)
# calculate with sklearn.metrics (it has no SSE function, so derive SSE from MSE)
mae = metrics.mean_absolute_error(y, yhat)
mse = metrics.mean_squared_error(y, yhat)
sse = mse * len(y)
rmse = np.sqrt(mse)  # mse**(0.5)
r2 = metrics.r2_score(y, yhat)
print("Results of sklearn.metrics:")
print("SSE:", sse)
print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
print("R-Squared:", r2)
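
The question asks for a function that takes the actual and predicted targets and returns the metrics. One possible way to package the manual calculation is sketched below; the function name regression_metrics and the dictionary return type are choices made for this illustration, not part of the original answer. It uses only NumPy and reuses the sample y and yhat values from the script above.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return SSE, MSE, RMSE and R2 for actual vs. predicted targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")

    residuals = y_true - y_pred
    sse = float(np.sum(residuals ** 2))    # sum of squared errors
    mse = sse / y_true.size                # mean squared error
    rmse = float(np.sqrt(mse))             # root mean squared error
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    # R2 is undefined when the targets are constant (ss_tot == 0)
    r2 = 1.0 - sse / ss_tot if ss_tot > 0 else float("nan")
    return {"SSE": sse, "MSE": mse, "RMSE": rmse, "R2": r2}

y = [-3, -1, -2, 1, -1, 1, 2, 1, 3, 4, 3, 5]
yhat = [-2, 1, -1, 0, -1, 1, 2, 2, 3, 3, 3, 5]

result = regression_metrics(y, yhat)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Returning a dictionary keeps the metric names attached to the values; a tuple or a named tuple would work just as well.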

