The document contains Python code for detecting and handling outliers in a dataset using the IQR method, visualized through boxplots. It also includes a section that visualizes the California Housing dataset with scatter plots comparing house age to average rooms and bedrooms. The code demonstrates data manipulation and visualization techniques using libraries such as NumPy, Pandas, Matplotlib, and Seaborn.
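In the IQR method, the outlier bounds (Tukey fences) are built from the first and third quartiles $Q_1$ and $Q_3$, and any value outside them is flagged. The factor 1.5 is the one used in the code. The numbers in the last line are a small illustrative example, not values from the dataset.

```latex
\mathrm{IQR} = Q_3 - Q_1, \qquad
L = Q_1 - 1.5\,\mathrm{IQR}, \qquad
U = Q_3 + 1.5\,\mathrm{IQR}

x \text{ is an outlier} \iff x < L \ \text{or}\ x > U

\text{e.g. } Q_1 = 3,\ Q_3 = 7 \;\Rightarrow\; \mathrm{IQR} = 4,\ L = 3 - 6 = -3,\ U = 7 + 6 = 13
```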
DP 4

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Generate a sample dataset with some outliers
np.random.seed(42)
data = np.random.normal(50, 15, 200)  # 200 draws from a normal distribution (mean 50, std 15)
data_with_outliers = np.append(data, [150, 200, 5, 2])  # Append four extreme values as outliers

# Create a DataFrame
df = pd.DataFrame(data_with_outliers, columns=["Value"])

# Plot the original dataset; points beyond the whiskers are drawn individually as outliers
plt.figure(figsize=(8, 5))
sns.boxplot(x=df["Value"], color="skyblue")
plt.title("Boxplot: Detecting Outliers")
plt.show()

# Detect outliers using the IQR rule
Q1 = df["Value"].quantile(0.25)
Q3 = df["Value"].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Identify outliers: rows whose value lies outside [lower_bound, upper_bound]
is_outlier = (df["Value"] < lower_bound) | (df["Value"] > upper_bound)
outliers = df[is_outlier]
print("\nOutliers detected using IQR method:")
print(outliers)

# Handle outliers: replace them with the median, keep all other values unchanged
median_value = df["Value"].median()
df["Value_Handled"] = np.where(is_outlier, median_value, df["Value"])

# Plot the dataset after handling outliers
plt.figure(figsize=(8, 5))
sns.boxplot(x=df["Value_Handled"], color="lightgreen")
plt.title("Boxplot After Handling Outliers")
plt.show()
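The IQR steps in the program above can be wrapped in small reusable functions. This is a minimal sketch, and the names `iqr_bounds`, `iqr_outlier_mask` and `cap_outliers` are hypothetical helpers, not part of pandas. The last function swaps in a different handling technique, capping values at the fences with `Series.clip`, as an alternative to the median replacement used above.

```python
import pandas as pd


def iqr_bounds(values: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1 = values.quantile(0.25)  # pandas default: linear interpolation
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return float(q1 - k * iqr), float(q3 + k * iqr)


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask that is True where a value lies outside the fences."""
    lower, upper = iqr_bounds(values, k)
    return (values < lower) | (values > upper)


def cap_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Alternative handling (hypothetical helper): clip each value to the
    nearest fence instead of replacing outliers with the median."""
    lower, upper = iqr_bounds(values, k)
    return values.clip(lower=lower, upper=upper)


# Small worked example: Q1 = 3, Q3 = 7, IQR = 4, so the fences are (-3, 13)
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(iqr_bounds(s))                     # (-3.0, 13.0)
print(s[iqr_outlier_mask(s)].tolist())   # [100]
print(cap_outliers(s).tolist())          # 100 is capped at the upper fence, 13
```

Capping keeps every row but limits how far an extreme value can pull statistics, whereas median replacement moves flagged values to the centre of the distribution. Which one is more appropriate depends on the analysis.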

OR

import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

# Load the California Housing dataset as pandas objects
data = fetch_california_housing(as_frame=True)
df = data['data']  # feature DataFrame (the target is stored separately in data['target'])

# Create a grid with 1 row and 2 columns
fig, axs = plt.subplots(1, 2, figsize=(10, 5))

# Plot scatter plot of house age against average rooms
axs[0].scatter(df['HouseAge'], df['AveRooms'])
axs[0].set_xlabel('House Age')
axs[0].set_ylabel('Average Rooms')
axs[0].set_title('House Age vs. Average Rooms')

# Plot scatter plot of house age against average bedrooms
axs[1].scatter(df['HouseAge'], df['AveBedrms'])
axs[1].set_xlabel('House Age')
axs[1].set_ylabel('Average Bedrooms')
axs[1].set_title('House Age vs. Average Bedrooms')

# Adjust spacing between subplots
plt.tight_layout()

# Display the plots
plt.show()
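The IQR rule from the first program can also be applied to the housing features plotted here, to count how many extreme values each scatter plot contains. This is a sketch under stated assumptions: `iqr_outlier_counts` is a hypothetical helper, and the small `demo` DataFrame is synthetic data that only borrows the real column names, so the example runs without downloading the dataset.

```python
import pandas as pd


def iqr_outlier_counts(df: pd.DataFrame, columns: list[str], k: float = 1.5) -> dict[str, int]:
    """Count values outside the fences Q1 - k*IQR and Q3 + k*IQR, per column."""
    counts = {}
    for col in columns:
        q1 = df[col].quantile(0.25)
        q3 = df[col].quantile(0.75)
        iqr = q3 - q1
        outside = (df[col] < q1 - k * iqr) | (df[col] > q3 + k * iqr)
        counts[col] = int(outside.sum())
    return counts


# Synthetic stand-in with the same column names (NOT the real housing data)
demo = pd.DataFrame({
    "HouseAge": [10, 20, 30, 40, 50],
    "AveRooms": [4.0, 5.0, 5.5, 6.0, 40.0],
    "AveBedrms": [1.0, 1.0, 1.1, 1.0, 9.0],
})
print(iqr_outlier_counts(demo, ["HouseAge", "AveRooms", "AveBedrms"]))
# {'HouseAge': 0, 'AveRooms': 1, 'AveBedrms': 1}
```

On the real data, the same function would be called as `iqr_outlier_counts(df, ['HouseAge', 'AveRooms', 'AveBedrms'])`, with `df` loaded by `fetch_california_housing(as_frame=True)` as in the program above. The resulting counts are not reproduced here.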
