Lab9

The document outlines an experiment conducted by Dhruv Sharma involving data analysis using Python libraries such as pandas, numpy, and scikit-learn. It describes the process of loading a dataset, handling missing values, standardizing the data, and applying Isolation Forest for anomaly detection, followed by training a Random Forest classifier. Finally, it includes a visualization of the classification results using a scatter plot.


EXPERIMENT - 9

Name: Dhruv Sharma

SAP ID: 500107715

Roll No: R2142220916
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest
# Load dataset
df = pd.read_csv('/content/Global_AI_Content_Impact_Dataset.csv')
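# (the /content/ path above assumes Google Colab; point it at a local copy of the CSV otherwise)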
# Drop non-numeric columns (if any)
df_numeric = df.select_dtypes(include=[np.number])
# Handle missing values (reassign instead of fillna(..., inplace=True),
# which can trigger pandas' chained-assignment warning on a filtered frame)
df_numeric = df_numeric.fillna(df_numeric.mean())
# Standardize the data
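# (note: tree ensembles such as Isolation Forest and Random Forest are
# largely scale-invariant, so standardizing here mainly keeps the two
# plotted axes on comparable scales)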
scaler = StandardScaler()
X_scaled = scaler.fit_transform(df_numeric)

# First, use Isolation Forest to create pseudo-labels (anomalies)
iso_forest = IsolationForest(contamination=0.05, random_state=42)
pseudo_labels = iso_forest.fit_predict(X_scaled)
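# fit_predict returns +1 for inliers and -1 for outliers; with
# contamination=0.05, roughly 5% of the rows get flagged as anomalies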
# Train Random Forest on these pseudo-labels
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_scaled, (pseudo_labels == 1).astype(int))  # 1 for normal, 0 for anomaly
# Predict using Random Forest
rf_labels = rf.predict(X_scaled)
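# Sanity check (not in the original lab): fraction of points where the
# Random Forest reproduces the Isolation Forest pseudo-labels; expect a
# value near 1.0, since we predict on the same data the model was trained on
agreement = np.mean(rf_labels == (pseudo_labels == 1).astype(int))
print(f'Agreement with pseudo-labels: {agreement:.3f}')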
# Plotting
plt.figure(figsize=(10, 6))
# rf_labels is 1 for normal and 0 for anomaly, so index 0 must hold the
# anomaly color and index 1 the normal color
colors = np.array(['#e41a1c', '#377eb8'])  # red = anomaly, blue = normal
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=colors[rf_labels], s=20)
plt.title('Random Forest Classification (using pseudo-anomaly labels)')
plt.xlabel('Feature 1 (scaled)')
plt.ylabel('Feature 2 (scaled)')
plt.grid(True)
plt.show()
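The scatter above plots only the first two scaled columns, which may not be the most informative pair in a wider dataset. A variant worth trying (not part of the original experiment) is to embed the points with PCA before plotting; a minimal sketch, reusing X_scaled, rf_labels, and colors from the code above:

from sklearn.decomposition import PCA

# Project the standardized features onto the first two principal
# components so the plot reflects overall variance rather than
# whichever two columns happen to come first in the file
X_2d = PCA(n_components=2, random_state=42).fit_transform(X_scaled)

plt.figure(figsize=(10, 6))
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=colors[rf_labels], s=20)
plt.title('Random Forest Classification (PCA projection)')
plt.xlabel('PC 1')
plt.ylabel('PC 2')
plt.grid(True)
plt.show()

The Random Forest predictions are unchanged here; only the 2-D view of the points differs.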
