DA Programs
1. Data Preprocessing
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')
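1. Mean Imputation
Replace missing values with the column mean. The original cell did not survive the export; this reconstruction matches the printed output below (the NaN in A becomes the column mean 2.333..., the NaN in B becomes 6.666...).
# Sample DataFrame with missing values (reconstructed from the output)
df = pd.DataFrame({'A': [1, 2, np.nan, 4],
                   'B': [5, np.nan, 7, 8]})
# Fill each column's NaN with that column's mean
df = df.fillna(df.mean())
print(df)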
A B
0 1.000000 5.000000
1 2.000000 6.666667
2 2.333333 7.000000
3 4.000000 8.000000
2. Forward/Backward Fill
Replace missing values with the previous or next value in the sequence.
# Forward fill (use .bfill() for backward fill); fillna(method=...) is deprecated
df['A'] = df['A'].ffill()
df['B'] = df['B'].ffill()
print(df)
A B
0 1.000000 5.000000
1 2.000000 6.666667
2 2.333333 7.000000
3 4.000000 8.000000
3. K-Nearest Neighbors (KNN) Imputation
Replace missing values using the KNN algorithm.
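The code cell was lost in the export; a sketch consistent with the array printed below (on the original A/B data, k=2 reproduces the imputed values 6.5 and 2.5):
from sklearn.impute import KNNImputer

df = pd.DataFrame({'A': [1, 2, np.nan, 4],
                   'B': [5, np.nan, 7, 8]})
# Impute each missing entry from its 2 nearest neighbors
imputer = KNNImputer(n_neighbors=2)
print(imputer.fit_transform(df))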
[[1. 5. ]
[2. 6.5]
[2.5 7. ]
[4. 8. ]]
Outlier Detection
import pandas as pd
import numpy as np
# Create a sample DataFrame
df = pd.DataFrame({
'A': [1, 2, 3, 4, 100]
})
A
0 1
1 2
2 3
3 4
4 100
2. Machine Learning Methods
Use machine learning algorithms such as One-Class SVM and Local Outlier Factor (LOF) to detect anomalies.
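The detection cell itself is missing; a minimal LOF sketch (the parameters are assumptions, and LOF on five points is sensitive to them, so it may not reproduce the output below exactly):
from sklearn.neighbors import LocalOutlierFactor

# fit_predict returns -1 for anomalies and 1 for inliers
lof = LocalOutlierFactor(n_neighbors=2)
labels = lof.fit_predict(df[['A']])
df_cleaned = df[labels == 1]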
print(df_cleaned)
A
1 2
2 3
3 4
4 100
Redundant Feature Elimination
import pandas as pd
import numpy as np
# Create a sample DataFrame
df = pd.DataFrame({
'A': [1, 2, 3, 4, 5],
'B': [2, 4, 6, 8, 10],
'C': [3, 6, 9, 12, 15]
})
# Calculate the absolute correlation matrix
corr_matrix = df.corr().abs()
# Identify highly correlated features: scan only the upper triangle so each pair is counted once
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
# Eliminate redundant features
df_eliminated = df.drop(to_drop, axis=1)
print(df_eliminated)
   A
0  1
1  2
2  3
3  4
4  5
2. Implement any one imputation model
KNN Imputation Model: KNN imputation is a popular method for handling missing values. It works by finding the k most similar data points
(nearest neighbors) to the row with the missing value. The missing value is then imputed with the average of those neighbors' values.
import pandas as pd
from sklearn.impute import KNNImputer
import numpy as np
# Create a sample dataset with missing values
data = {'A': [1, 2, np.nan, 4, 5],
'B': [np.nan, 3, 4, 5, 6],
'C': [7, 8, 9, np.nan, 11]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Create a KNN imputer with k=3
imputer = KNNImputer(n_neighbors=3)
# Fit the imputer to the data and transform the missing values
imputed_data = imputer.fit_transform(df)
# Convert the imputed data back to a DataFrame
imputed_df = pd.DataFrame(imputed_data, columns=df.columns)
print("\nImputed DataFrame:")
print(imputed_df)
Original DataFrame:
A B C
0 1.0 NaN 7.0
1 2.0 3.0 8.0
2 NaN 4.0 9.0
3 4.0 5.0 NaN
4 5.0 6.0 11.0
Imputed DataFrame:
A B C
0 1.000000 4.0 7.000000
1 2.000000 3.0 8.000000
2 2.333333 4.0 9.000000
3 4.000000 5.0 9.333333
4 5.000000 6.0 11.000000
3. Implement Linear Regression
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('salary_Data.csv')
dataset
YearsExperience Salary
0 1.1 39343.0
1 1.3 46205.0
2 1.5 37731.0
3 2.0 43525.0
4 2.2 39891.0
5 2.9 56642.0
6 3.0 60150.0
7 3.2 54445.0
8 3.2 64445.0
9 3.7 57189.0
10 3.9 63218.0
11 4.0 55794.0
12 4.0 56957.0
13 4.1 57081.0
14 4.5 61111.0
15 4.9 67938.0
16 5.1 66029.0
17 5.3 83088.0
18 5.9 81363.0
19 6.0 93940.0
20 6.8 91738.0
21 7.1 98273.0
22 7.9 101302.0
23 8.2 113812.0
24 8.7 109431.0
25 9.0 105582.0
26 9.5 116969.0
27 9.6 112635.0
28 10.3 122391.0
29 10.5 121872.0
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:,1].values
X
array([[ 1.1],
[ 1.3],
[ 1.5],
[ 2. ],
[ 2.2],
[ 2.9],
[ 3. ],
[ 3.2],
[ 3.2],
[ 3.7],
[ 3.9],
[ 4. ],
[ 4. ],
[ 4.1],
[ 4.5],
[ 4.9],
[ 5.1],
[ 5.3],
[ 5.9],
[ 6. ],
[ 6.8],
[ 7.1],
[ 7.9],
[ 8.2],
[ 8.7],
[ 9. ],
[ 9.5],
[ 9.6],
[10.3],
[10.5]])
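The splitting and fitting cells did not survive the export; a sketch consistent with the LinearRegression() output below (the 1/3 test split and random_state are assumptions):
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
regressor = LinearRegression()
regressor.fit(X_train, y_train)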
LinearRegression()
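The plotting cell was also lost; a standard sketch that visualizes the fitted line against the training data:
plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.title('Salary vs Experience (Training set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()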
4. Implementation of Logistic Regression
Import Libraries
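The import cell did not survive the export; these are the libraries the later cells rely on (loading and splitting the dataset happened in another lost cell, so X_train, X_test, y_train, and y_test are assumed to exist):
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             classification_report, roc_curve, auc)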
Feature Scaling
# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
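The training cell is missing; a minimal sketch assuming the scaled X_train/y_train above:
# Train the logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)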
LogisticRegression()
Evaluation Metrics
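The evaluation code itself was lost; a sketch that would produce the printout below:
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")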
Accuracy: 73.03%
Confusion Matrix and Classification Report
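A sketch of the corresponding cell:
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))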
Confusion Matrix:
[[36 13]
[11 29]]
Classification Report:
precision recall f1-score support
accuracy 0.73 89
macro avg 0.73 0.73 0.73 89
weighted avg 0.73 0.73 0.73 89
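The ROC inputs (fpr, tpr, roc_auc) used by the plot below were computed in a lost cell; presumably something like:
# Compute ROC curve inputs from the predicted probabilities of the positive class
y_prob = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)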
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='darkorange', lw=2,
label=f'ROC Curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--', label='Random')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve\nAccuracy: {:.2f}%'.format(
accuracy * 100))
plt.legend(loc="lower right")
plt.show()
5. Decision Tree Induction for Classification
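The import cell is missing; the cells below rely on:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report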
Load Dataset
# Load dataset (Iris dataset)
iris = load_iris()
X, y = iris.data, iris.target # Features and labels
Data Splitting
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Model Training
# Initialize and train the Decision Tree Classifier
model = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
model.fit(X_train, y_train)
DecisionTreeClassifier(max_depth=3, random_state=42)
Predictions
# Make predictions
y_pred = model.predict(X_test)
Model Evaluation
# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Model Accuracy:", accuracy)
Classification Report:
precision recall f1-score support
accuracy 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30
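The conclusion refers to visualizing the tree and feature importance; those cells were lost, but a standard sketch for the tree plot:
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

plt.figure(figsize=(10, 6))
plot_tree(model, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True)
plt.show()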
The Decision Tree Classifier is a simple yet effective machine learning model for classification
tasks. In this implementation, we used the Iris dataset to train and evaluate the model. The
decision tree achieves high accuracy, and by visualizing the tree and feature importance, we
gain insights into how decisions are made. This method is useful for explainable AI but can be
prone to overfitting if not carefully tuned. To improve generalization, techniques such as pruning
or ensemble methods like Random Forest can be applied.
6. Implement Random Forest Classifier
1. Import Required Libraries
We will import pandas, matplotlib, seaborn, and scikit-learn to build the model.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
2. Import Dataset
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target
df
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) target
3. Data Preparation
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
4. Splitting the Dataset
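The splitting cell did not survive the export; a sketch (the 80/20 ratio is an assumption):
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)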
5. Feature Scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
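Sections 6 (Model Training) and 7 (Evaluation) were lost in the export; a sketch consistent with the confusion-matrix heatmap and accuracy printout below:
# 6. Model Training
classifier = RandomForestClassifier(n_estimators=100, random_state=42)
classifier.fit(X_train, y_train)

# 7. Evaluation
y_pred = classifier.predict(X_test)
conf_matrix = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)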
plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix, annot=True, fmt='g', cmap='Blues', cbar=False,
            xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()
print(f"Accuracy: {accuracy * 100:.2f}%")
Accuracy: 100.00%
8. Feature Importance
feature_importances = classifier.feature_importances_
plt.barh(iris.feature_names, feature_importances)
plt.xlabel('Feature Importance')
plt.title('Feature Importance in Random Forest Classifier')
plt.show()
Conclusion: From the graph we can see that petal width (cm) is the most
important feature followed closely by petal length (cm). The sepal width (cm)
and sepal length (cm) have lower importance in determining the model’s
predictions. This indicates that the classifier relies more on the petal
measurements to make predictions about the flower species.
7. Implement ARIMA on Time Series data
import warnings
warnings.filterwarnings('ignore')
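The data-generation cell did not survive the export; this reconstruction matches the printed values below (np.random.seed(42), a cumulative-sum random walk over 100 daily observations from 2020-01-01 to 2020-04-09):
import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

np.random.seed(42)
dates = pd.date_range('2020-01-01', periods=100, freq='D')
values = np.cumsum(np.random.randn(100))  # random walk
df = pd.DataFrame({'Value': values}, index=dates)
df.index.name = 'Date'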
df
Value
Date
2020-01-01 0.496714
2020-01-02 0.358450
2020-01-03 1.006138
2020-01-04 2.529168
2020-01-05 2.295015
... ...
2020-04-05 -10.712354
2020-04-06 -10.416233
2020-04-07 -10.155178
2020-04-08 -10.150065
2020-04-09 -10.384652
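The model-fitting cell is likewise missing; the summary below corresponds to an ARIMA(2, 1, 2) fit:
model = ARIMA(df['Value'], order=(2, 1, 2))
fitted = model.fit()
print(fitted.summary())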
SARIMAX Results
==============================================================================
Dep. Variable: Value No. Observations: 100
Model: ARIMA(2, 1, 2) Log Likelihood -130.434
Date: Sat, 22 Mar 2025 AIC 270.869
Time: 22:03:24 BIC 283.845
Sample: 01-01-2020 HQIC 276.119
- 04-09-2020
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ar.L1 0.1299 0.283 0.459 0.646 -0.425 0.684
ar.L2 0.8677 0.230 3.776 0.000 0.417 1.318
ma.L1 -0.0583 0.330 -0.177 0.860 -0.705 0.588
ma.L2 -0.9353 0.281 -3.323 0.001 -1.487 -0.384
sigma2 0.8134 0.148 5.502 0.000 0.524 1.103
===================================================================================
Ljung-Box (L1) (Q): 0.46 Jarque-Bera (JB): 0.33
Prob(Q): 0.50 Prob(JB): 0.85
Heteroskedasticity (H): 1.02 Skew: -0.14
Prob(H) (two-sided): 0.96 Kurtosis: 3.07
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
8. Object segmentation using hierarchical based methods
Install Required Libraries
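The install command itself was not preserved; given the skimage imports below, presumably:
%pip install scikit-image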
import numpy as np
import matplotlib.pyplot as plt
from skimage import io, segmentation, color
from skimage.filters import sobel
Load an Image
# Here, n_segments=300 controls the number of superpixels.
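The segmentation cell survives only as the comment above; a sketch using skimage's bundled astronaut image as a stand-in, since the original image path was lost:
from skimage import data

image = data.astronaut()  # bundled sample image (assumption)
# SLIC superpixel segmentation, per the surviving comment
segments = segmentation.slic(image, n_segments=300, compactness=10, start_label=1)
out = color.label2rgb(segments, image, kind='avg')

plt.figure(figsize=(6, 6))
plt.imshow(segmentation.mark_boundaries(out, segments))
plt.axis('off')
plt.title('SLIC Superpixel Segmentation')
plt.show()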
Conclusion
9. Perform Visualization techniques (types of maps - Bar, Column, Line, Scatter, 3D Cubes etc)
Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
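The data-creation cell and the definitions of categories, values, z, w, and y used by the later plots were lost; one plausible reconstruction (without the original seed, the values below will differ from this draw):
np.random.seed(1)  # seed is an assumption
df = pd.DataFrame({
    'Category': np.random.choice(list('ABCDE'), 50),
    'Value': np.random.randint(10, 100, 50)
})
# Variables used by the plots that follow (definitions are assumptions)
categories = sorted(df['Category'].unique())
values = df.groupby('Category')['Value'].mean().reindex(categories).values
z = np.arange(10)
w = np.random.randn(10).cumsum()
y = np.random.rand(10)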
df
Category Value
0 C 98
1 B 69
2 D 50
3 D 38
4 C 24
5 D 54
6 D 74
7 A 98
8 C 80
9 E 18
10 C 97
11 E 10
12 A 17
13 B 97
14 D 72
15 A 20
16 D 90
17 B 17
18 B 44
19 A 44
20 B 42
21 E 14
22 B 50
23 D 37
24 D 16
25 D 82
26 D 81
27 E 21
28 C 43
29 A 42
30 D 57
31 B 32
32 D 71
33 B 97
34 B 46
35 D 53
36 E 95
37 B 44
38 B 74
39 D 56
40 B 87
41 B 12
42 D 10
43 D 14
44 A 99
45 E 23
46 E 36
47 B 18
48 E 88
49 B 24
10 Visualization Techniques
1. Bar Chart
plt.figure(figsize=(6, 4))
plt.bar(categories, values, color='skyblue')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Bar Chart')
plt.show()
2. Column Chart
plt.figure(figsize=(6, 4))
plt.barh(categories, values, color='salmon')
plt.xlabel('Values')
plt.ylabel('Categories')
plt.title('Column Chart')
plt.show()
3. Line Plot
plt.figure(figsize=(6, 4))
plt.plot(z, w, marker='o', linestyle='-', color='green')
plt.xlabel('X-Axis')
plt.ylabel('Y-Axis')
plt.title('Line Chart')
plt.show()
4. Scatter Plot
plt.figure(figsize=(6, 4))
plt.scatter(y, w, color='purple', alpha=0.7)
plt.xlabel('X-Axis')
plt.ylabel('Y-Axis')
plt.title('Scatter Plot')
plt.show()
5. Histogram
plt.figure(figsize=(6, 4))
plt.hist(df['Value'], bins=8, color='orange', alpha=0.7)
plt.xlabel('Value Ranges')
plt.ylabel('Frequency')
plt.title('Histogram')
plt.show()
6. Box Plot
plt.figure(figsize=(6, 4))
sns.boxplot(x='Category', y='Value', data=df, hue='Category', palette="Set2", legend=False)
plt.title('Box Plot')
plt.show()
7. Violin Plot
plt.figure(figsize=(6, 4))
sns.violinplot(x='Category', y='Value', data=df, hue='Category', palette="muted", legend=False)
plt.title('Violin Plot')
plt.show()
9. Heatmap
plt.figure(figsize=(6, 4))
data = np.random.rand(5, 5)
sns.heatmap(data, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Heatmap')
plt.show()
10. 3D Cube Visualization
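The cell defining the cube's faces was lost; a standard unit-cube reconstruction:
# Eight corners of a unit cube and its six quadrilateral faces
vertices = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
                     [0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1]])
faces = [[vertices[j] for j in idx] for idx in
         ([0, 1, 2, 3], [4, 5, 6, 7], [0, 1, 5, 4],
          [2, 3, 7, 6], [1, 2, 6, 5], [0, 3, 7, 4])]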
fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(111, projection='3d')
# Draw cube
ax.add_collection3d(Poly3DCollection(faces, alpha=0.3, linewidths=1, edgecolors='r'))
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
ax.set_title('3D Cube')
plt.show()
Conclusion
This script demonstrates 10 visualization techniques to analyze data effectively:
✔ Violin Plots & Heatmaps – Provide in-depth insights into variable relationships.
10. Perform Descriptive analytics on healthcare data
Import Required Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
The synthetic dataset contains the following fields: Age, Gender (Male/Female), Blood Pressure, Cholesterol Level, Diabetes (Yes/No), and Hospital Stay (days).
data = {
'Age': np.random.randint(20, 80, 100),
'Gender': np.random.choice(['Male', 'Female'], 100),
'Blood_Pressure': np.random.randint(90, 180, 100),
'Cholesterol': np.random.randint(150, 300, 100),
'Diabetes': np.random.choice(['Yes', 'No'], 100),
'Hospital_Stay': np.random.randint(1, 15, 100)
}
df = pd.DataFrame(data)
# Summary statistics
print(df.describe())
# Gender distribution counts
print(df['Gender'].value_counts())
# Visualization
plt.figure(figsize=(5, 4))
sns.countplot(x='Gender', data=df, hue='Gender', palette='coolwarm', legend=False)
plt.title('Gender Distribution')
plt.show()
Gender
Male 61
Female 39
Name: count, dtype: int64
3. Distribution of Age
plt.figure(figsize=(6, 4))
sns.histplot(df['Age'], bins=10, kde=True, color='blue')
plt.xlabel('Age')
plt.ylabel('Count')
plt.title('Age Distribution of Patients')
plt.show()
4. Average Hospital Stay Based on Diabetes Condition
# Group by diabetes and compute mean hospital stay
print(df.groupby('Diabetes')['Hospital_Stay'].mean())
# Visualization
plt.figure(figsize=(5, 4))
sns.boxplot(x='Diabetes', y='Hospital_Stay', data=df, hue='Diabetes', palette='Set2', legend=False)
plt.title('Hospital Stay Duration Based on Diabetes')
plt.show()
Diabetes
No 7.551724
Yes 7.642857
Name: Hospital_Stay, dtype: float64
5. Blood Pressure vs Cholesterol Level
plt.figure(figsize=(6, 4))
sns.scatterplot(x='Blood_Pressure', y='Cholesterol', hue='Diabetes', data=df, palette='coolwarm')
plt.title('Blood Pressure vs Cholesterol Level')
plt.xlabel('Blood Pressure')
plt.ylabel('Cholesterol Level')
plt.show()
Conclusion
Using Descriptive Analytics, we derived key insights:
✔ Gender Distribution: The sample skews male (61 males vs. 39 females).
✔ Diabetes & Hospital Stay: Average stay is nearly identical for diabetic and non-diabetic patients (7.64 vs. 7.55 days), so diabetes shows no meaningful effect in this sample.
11. Perform Predictive analytics on Product Sales data
Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from statsmodels.tsa.holtwinters import ExponentialSmoothing
file_path = "sales_data_sample.csv"
df = pd.read_csv(file_path, encoding='latin1')
EDA
Dataset Overview:
Day MONTH_ID YEAR_ID QUANTITYORDERED PRICEEACH SALES
0 24 2 2003 30 95.70 2871.00
1 7 5 2003 34 81.35 2765.90
2 1 7 2003 41 94.74 3884.34
3 25 8 2003 45 83.26 3746.70
4 10 10 2003 49 100.00 5205.27
Summary Statistics:
Day MONTH_ID YEAR_ID QUANTITYORDERED PRICEEACH \
count 2823.000000 2823.000000 2823.00000 2823.000000 2823.000000
mean 14.291534 7.092455 2003.81509 35.092809 83.658544
std 8.777409 3.656633 0.69967 9.741443 20.174277
min 1.000000 1.000000 2003.00000 6.000000 26.880000
25% 6.000000 4.000000 2003.00000 27.000000 68.860000
50% 14.000000 8.000000 2004.00000 35.000000 95.700000
75% 21.000000 11.000000 2004.00000 43.000000 100.000000
max 31.000000 12.000000 2005.00000 97.000000 100.000000
SALES
count 2823.000000
mean 3553.889072
std 1841.865106
min 482.130000
25% 2203.430000
50% 3184.800000
75% 4508.000000
max 14082.800000
Missing Values:
Day 0
MONTH_ID 0
YEAR_ID 0
QUANTITYORDERED 0
PRICEEACH 0
SALES 0
dtype: int64
plt.figure(figsize=(10, 6))
sns.heatmap(df[['SALES', 'QUANTITYORDERED', 'PRICEEACH', 'MONTH_ID', 'YEAR_ID']].corr(), annot=True, cmap='coolwarm')
plt.title('Feature Correlation Heatmap')
plt.show()
plt.figure(figsize=(8, 5))
sns.histplot(df['SALES'], bins=30, kde=True)
plt.xlabel('Sales')
plt.ylabel('Frequency')
plt.title('Sales Distribution')
plt.show()
plt.figure(figsize=(8, 5))
sns.boxplot(y=df['SALES'])
plt.title('Sales Outlier Detection')
plt.show()
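The cells that prepare features and fit the model were lost; a sketch consistent with the LinearRegression() output and the metrics below (the exact feature choice is an assumption based on the correlation heatmap):
# Hypothetical feature set: quantity, unit price, and calendar fields
X = df[['QUANTITYORDERED', 'PRICEEACH', 'MONTH_ID', 'YEAR_ID']]
y = df['SALES']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)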
LinearRegression()
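The metrics table below was presumably built by a cell like:
r2 = r2_score(y_test, y_pred)
results = pd.DataFrame({
    'Metric': ['Mean Absolute Error', 'Mean Squared Error',
               'Root Mean Squared Error', 'R2 Score'],
    'Value': [mean_absolute_error(y_test, y_pred),
              mean_squared_error(y_test, y_pred),
              np.sqrt(mean_squared_error(y_test, y_pred)),
              r2]
})
print(results)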
Metric Value
0 Mean Absolute Error 6.557240e+02
1 Mean Squared Error 1.019664e+06
2 Root Mean Squared Error 1.009784e+03
3 R2 Score 7.396571e-01
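The future_data frame used in the next cell was defined in a lost cell; a hypothetical reconstruction (the original input values were not preserved):
# Hypothetical future scenarios for prediction
future_data = pd.DataFrame({
    'QUANTITYORDERED': [30, 40, 50],
    'PRICEEACH': [80.0, 90.0, 100.0],
    'MONTH_ID': [1, 2, 3],
    'YEAR_ID': [2005, 2005, 2005]
})
future_predictions = model.predict(future_data)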
future_data['PREDICTED_SALES'] = future_predictions
print("\nFuture Sales Predictions:\n")
print(future_data.to_string(index=False))
Conclusion
print("\nConclusion:\n")
print("1. The EDA revealed strong correlations between Quantity Ordered, Price Each, and Sales.\n")
print("2. The Linear Regression model achieved an R2 score of {:.2f}, indicating {} predictive accuracy.\n".format(
print("3. Future sales predictions highlight expected revenue based on given input values.\n")
print("4. Businesses can leverage these insights to optimize pricing, inventory, and sales strategies.")
Conclusion:
1. The EDA revealed strong correlations between Quantity Ordered, Price Each, and Sales.
2. The Linear Regression model achieved an R2 score of 0.74, indicating good predictive accuracy.
3. Future sales predictions highlight expected revenue based on given input values.
4. Businesses can leverage these insights to optimize pricing, inventory, and sales strategies.
12. Apply Predictive analytics for Weather forecasting.
Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
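The loading cell was lost; the column list below matches the weather table from the nycflights13 dataset, so presumably something like (the filename is hypothetical):
df = pd.read_csv('weather.csv')  # hypothetical filename
print("Missing Values:")
print(df.isnull().sum())
print("\nDataset Info:")
print(df.info())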
Missing Values:
origin 0
year 0
month 0
day 0
hour 0
temp 1
dewp 1
humid 1
wind_dir 418
wind_speed 3
wind_gust 3
precip 0
pressure 2730
visib 0
time_hour 0
dtype: int64
Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26130 entries, 0 to 26129
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 origin 26130 non-null object
1 year 26130 non-null int64
2 month 26130 non-null int64
3 day 26130 non-null int64
4 hour 26130 non-null int64
5 temp 26129 non-null float64
6 dewp 26129 non-null float64
7 humid 26129 non-null float64
8 wind_dir 25712 non-null float64
9 wind_speed 26127 non-null float64
10 wind_gust 26127 non-null float64
11 precip 26130 non-null float64
12 pressure 23400 non-null float64
13 visib 26130 non-null float64
14 time_hour 26130 non-null object
dtypes: float64(9), int64(4), object(2)
memory usage: 3.0+ MB
None
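The preprocessing and feature-selection cells were lost; a sketch (the feature list is an assumption based on the conclusion's mention of humidity, wind speed, and pressure):
# Assumed feature list, with 'temp' as the prediction target
features = ['dewp', 'humid', 'wind_speed', 'pressure', 'precip', 'visib']
target = 'temp'
df = df.dropna(subset=features + [target])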
X = df[features]
y = df[target]
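The splitting, scaling, and training cells did not survive; a sketch consistent with the model output below:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)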
RandomForestRegressor(random_state=42)
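The evaluation metrics printed below were computed in a lost cell; presumably:
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)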
print("Model Evaluation:")
print(f"Mean Absolute Error (MAE): {mae}")
print(f"Mean Squared Error (MSE): {mse}")
print(f"Root Mean Squared Error (RMSE): {rmse}")
Model Evaluation:
Mean Absolute Error (MAE): 0.10836664495115607
Mean Squared Error (MSE): 0.07925993917915358
Root Mean Squared Error (RMSE): 0.2815314177479195
r2 = r2_score(y_test, y_pred)
print(f"R² Score (Accuracy): {r2}")
Final Conclusion:
The Random Forest model predicts temperature with an R² score of 1.00 on the held-out test set. Key features such as
humidity, wind speed, and pressure significantly influence predictions, though a score this close to perfect mainly
reflects how tightly dew point and humidity track temperature. Visualizations confirm a strong correlation between
actual and predicted values.