ML Lab A1 A4
Load the dataset from the file below and write Python code to answer the following exploratory
analysis questions:
a) How many observations are there in this dataset?
num_observations = len(df)  # df is the DataFrame loaded from the CSV file
g) What are the occupations of the youngest and oldest people in this dataset?
youngest_person_age = df['Age'].min()
youngest_person_occupation = df[df['Age'] == youngest_person_age]['Occupation'].iloc[0]
oldest_person_age = df['Age'].max()
oldest_person_occupation = df[df['Age'] == oldest_person_age]['Occupation'].iloc[0]
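The filter-then-`.iloc[0]` pattern above returns only the first match when several people share the minimum or maximum age. A minimal sketch of an alternative, assuming a toy DataFrame with the same `Age` and `Occupation` columns (the rows and the name `people` are made up for illustration and are not from the lab dataset), uses `idxmin`/`idxmax` and also shows one way to list every tied row:

```python
import pandas as pd

# Hypothetical toy data; not from the lab dataset
people = pd.DataFrame({
    "Age": [7, 73, 25, 7],
    "Occupation": ["student", "retired", "engineer", "none"],
})

# idxmin / idxmax return the index label of the first minimum / maximum
youngest_occupation = people.loc[people["Age"].idxmin(), "Occupation"]
oldest_occupation = people.loc[people["Age"].idxmax(), "Occupation"]

# To keep every tied row, compare against the extreme value instead
all_youngest = people.loc[
    people["Age"] == people["Age"].min(), "Occupation"
].tolist()

print(youngest_occupation, oldest_occupation, all_youngest)
```

Whether ties matter depends on the dataset; if they do, the comparison form is the safer choice.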
import pandas as pd
import numpy as np
df=pd.read_csv('A4-Football.csv')
df.head()
# Find the bottom two teams with the lowest discipline
bottom_teams = df.groupby('Team')['Discipline'].sum().nsmallest(2)
print("\nBottom two teams with lowest discipline:")
print(bottom_teams)
# Count the number of teams that made more fouls than their opponents
# (the row filter is assumed from the column names used in the final print)
teams_more_fouls_than_opponents = df[df['Own Fouls'] > df['Opponent Fouls']]
num_teams_more_fouls_than_opponents = teams_more_fouls_than_opponents['Team'].nunique()
print(f"{num_teams_more_fouls_than_opponents} teams made more fouls than their opponents.")
print("\nThe teams that made more fouls than their opponents:")
print(teams_more_fouls_than_opponents[['Team', 'Own Fouls', 'Opponent Fouls']])
A4) Write Python code to calculate regression error metrics such as SSE, MSE, RMSE and the R2
score. The function should take the actual target values and the model's predicted targets as
input and return these error metrics as output.
Here's a Python function that calculates the Sum of Squared Errors (SSE), Mean Squared
Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R2) score:
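A minimal sketch of such a function follows. It computes everything with NumPy rather than calling `sklearn.metrics`, so the arithmetic stays visible. The name `regression_metrics`, the dictionary return type, and the choice to return NaN for constant targets are assumptions made for illustration:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return SSE, MSE, RMSE and R^2 for actual vs. predicted targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")

    residuals = y_true - y_pred
    sse = float(np.sum(residuals ** 2))   # sum of squared errors
    mse = sse / y_true.size               # mean squared error
    rmse = float(np.sqrt(mse))            # root mean squared error
    # total sum of squares around the mean of the actual values
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    # R^2 is undefined for constant targets; this sketch returns NaN there
    r2 = 1.0 - sse / ss_tot if ss_tot > 0 else float("nan")
    return {"SSE": sse, "MSE": mse, "RMSE": rmse, "R2": r2}

# Small made-up example: one prediction is off by 1
result = regression_metrics([1, 2, 3], [1, 2, 4])
print(result)
```

Returning a dictionary keeps the four metrics labelled; a tuple would work equally well if the caller prefers unpacking.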
The line from sklearn.metrics import mean_squared_error, r2_score imports two functions from
the sklearn.metrics module. Both are used to evaluate regression models:
mean_squared_error: This function calculates the Mean Squared Error (MSE), which
measures the average squared difference between the actual and predicted values. It is a
widely used metric for evaluating regression models. The formula for MSE is:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
where n is the number of observations, y_i is the actual value and \hat{y}_i is the predicted value.
r2_score: This function calculates the R-squared (R2) score, the proportion of the variance in
the target that the model explains. The formula for R2 is:
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$
where SSres is the sum of squared residuals and SStot is the total sum of squares.
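As a quick check of these definitions, consider three hypothetical points with actual values $y = (1, 2, 3)$ and predictions $\hat{y} = (1, 2, 4)$. These numbers are made up for illustration and are not from the lab data. Working the metrics by hand, with $SS_{res}$ equal to the SSE and $\bar{y} = 2$, gives:

```latex
\begin{align*}
\mathrm{SSE} &= \sum_{i=1}^{3} (y_i - \hat{y}_i)^2 = 0^2 + 0^2 + (-1)^2 = 1 \\
\mathrm{MSE} &= \frac{\mathrm{SSE}}{n} = \frac{1}{3} \approx 0.333 \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}} \approx 0.577 \\
SS_{tot} &= \sum_{i=1}^{3} (y_i - \bar{y})^2 = (-1)^2 + 0^2 + 1^2 = 2 \\
R^2 &= 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{1}{2} = 0.5
\end{align*}
```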
import numpy as np
import sklearn.metrics as metrics
import matplotlib.pyplot as plt
y = np.array([-3, -1, -2, 1, -1, 1, 2, 1, 3, 4, 3, 5])
yhat = np.array([-2, 1, -1, 0, -1, 1, 2, 2, 3, 3, 3, 5])
x = list(range(len(y)))
plt.scatter(x, y, color="blue", label="original")
plt.plot(x, yhat, color="red", label="predicted")
plt.legend()
plt.show()
# Calculate the metrics manually
d = y - yhat  # residuals
mse_f = np.mean(d**2)
mae_f = np.mean(np.abs(d))
rmse_f = np.sqrt(mse_f)
r2_f = 1 - (np.sum(d**2) / np.sum((y - np.mean(y))**2))
print("Results by manual calculation:")
print("MAE:",mae_f)
print("MSE:", mse_f)
print("RMSE:", rmse_f)
print("R-Squared:", r2_f)
mae = metrics.mean_absolute_error(y, yhat)
mse = metrics.mean_squared_error(y, yhat)
rmse = np.sqrt(mse)  # equivalent to mse**0.5
r2 = metrics.r2_score(y,yhat)
print("Results of sklearn.metrics:")
print("MAE:",mae)
print("MSE:", mse)
print("RMSE:", rmse)
print("R-Squared:", r2)
Output: