Exp 1 A

The document outlines a program to analyze the California Housing dataset by creating histograms and box plots for all numerical features to understand their distributions and identify outliers. It specifies the outputs to include statistical measures such as mean, standard deviation, and the number of outliers for each feature. The program utilizes Python libraries like pandas, matplotlib, and seaborn for data visualization and analysis.

Uploaded by

Shobha Hiremath


Term work 1: Develop a program to create histograms for all numerical features and analyze the distribution of each feature. Generate box plots for all numerical features and identify any outliers. Use the California Housing dataset.

Objective:

This program helps in understanding the distribution of each numerical feature and in identifying potential outliers in the dataset.

The California Housing dataset is used as the example.

Output to be observed:

Each attribute of the California Housing dataset is analysed for:

i. Mean of the values of the attribute
ii. Standard deviation of the attribute
iii. First quartile (25%): the value below which 25% of the data lie
iv. Median (50%): the value below which 50% of the data lie
v. Outliers: values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR, where Q1 and Q3 are the 25% and 75% quartiles and IQR = Q3 - Q1
vi. Histogram of the individual feature and its analysis
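
The measures in items i to v can be sketched for a single feature as follows. This is a minimal illustration, assuming the common 1.5 × IQR rule for outliers; the function name `summarize_feature` and the ten sample values are hypothetical and are not taken from the dataset.

```python
import pandas as pd


def summarize_feature(values: pd.Series) -> dict:
    """Return the mean, spread, quartiles and IQR-based outlier count."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    # Values outside [lower, upper] are treated as outliers
    outliers = values[(values < lower) | (values > upper)]
    return {
        "mean": values.mean(),
        "std": values.std(),
        "q1": q1,
        "median": values.median(),
        "q3": q3,
        "lower_bound": lower,
        "upper_bound": upper,
        "n_outliers": len(outliers),
    }


# Hypothetical sample: nine ordinary values and one extreme value
sample = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
summary = summarize_feature(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these sample values the quartiles are 3.25 and 7.75, so the bounds are -3.5 and 14.5, and only the value 100 is counted as an outlier.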

Python Instructions
# -*- coding: utf-8 -*-
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing

# Fetch the California Housing dataset
california_housing = fetch_california_housing(as_frame=True)
data = california_housing.frame

# Display the first few rows of the dataset
print(data.head())


# Create histograms for all numerical features
def create_histograms(data):
    data.hist(bins=30, figsize=(20, 15))
    plt.suptitle('Histograms of Numerical Features', fontsize=20)
    plt.show()


# Create box plots for all numerical features
def create_box_plots(data):
    plt.figure(figsize=(20, 15))
    # The frame has 9 columns, which fit a 3 x 3 grid of subplots
    for i, column in enumerate(data.columns):
        plt.subplot(3, 3, i + 1)
        sns.boxplot(y=data[column])
        plt.title(f'Box Plot of {column}')
    plt.suptitle('Box Plots of Numerical Features', fontsize=20)
    plt.tight_layout(rect=[0, 0.03, 1, 0.95])
    plt.show()


# Analyze the distribution and identify outliers
def analyze_distribution(data):
    for column in data.columns:
        print(f'\nAnalyzing {column}:')
        print(data[column].describe())
        q1 = data[column].quantile(0.25)
        q3 = data[column].quantile(0.75)
        iqr = q3 - q1
        # Values more than 1.5 * IQR beyond the quartiles count as outliers
        lower_bound = q1 - 1.5 * iqr
        upper_bound = q3 + 1.5 * iqr
        outliers = data[(data[column] < lower_bound) | (data[column] > upper_bound)]
        print(f'Number of outliers in {column}: {len(outliers)}')


# Execute the functions
create_histograms(data)
create_box_plots(data)
analyze_distribution(data)

OUTPUT
   MedInc  HouseAge  AveRooms  ...  Latitude  Longitude  MedHouseVal
0  8.3252      41.0  6.984127  ...     37.88    -122.23        4.526
1  8.3014      21.0  6.238137  ...     37.86    -122.22        3.585
2  7.2574      52.0  8.288136  ...     37.85    -122.24        3.521
3  5.6431      52.0  5.817352  ...     37.85    -122.25        3.413
4  3.8462      52.0  6.281853  ...     37.85    -122.25        3.422

[5 rows x 9 columns]

Analyzing MedInc:
count 20640.000000
mean 3.870671
std 1.899822
min 0.499900
25% 2.563400
50% 3.534800
75% 4.743250
max 15.000100
Name: MedInc, dtype: float64
Number of outliers in MedInc: 681

Analyzing HouseAge:
count 20640.000000
mean 28.639486
std 12.585558
min 1.000000
25% 18.000000
50% 29.000000
75% 37.000000
max 52.000000
Name: HouseAge, dtype: float64
Number of outliers in HouseAge: 0

Analyzing AveRooms:
count 20640.000000
mean 5.429000
std 2.474173
min 0.846154
25% 4.440716
50% 5.229129
75% 6.052381
max 141.909091
Name: AveRooms, dtype: float64
Number of outliers in AveRooms: 511

Analyzing AveBedrms:
count 20640.000000
mean 1.096675
std 0.473911
min 0.333333
25% 1.006079
50% 1.048780
75% 1.099526
max 34.066667
Name: AveBedrms, dtype: float64
Number of outliers in AveBedrms: 1424

Analyzing Population:
count 20640.000000
mean 1425.476744
std 1132.462122
min 3.000000
25% 787.000000
50% 1166.000000
75% 1725.000000
max 35682.000000
Name: Population, dtype: float64
Number of outliers in Population: 1196

Analyzing AveOccup:
count 20640.000000
mean 3.070655
std 10.386050
min 0.692308
25% 2.429741
50% 2.818116
75% 3.282261
max 1243.333333
Name: AveOccup, dtype: float64
Number of outliers in AveOccup: 711

Analyzing Latitude:
count 20640.000000
mean 35.631861
std 2.135952
min 32.540000
25% 33.930000
50% 34.260000
75% 37.710000
max 41.950000
Name: Latitude, dtype: float64
Number of outliers in Latitude: 0

Analyzing Longitude:
count 20640.000000
mean -119.569704
std 2.003532
min -124.350000
25% -121.800000
50% -118.490000
75% -118.010000
max -114.310000
Name: Longitude, dtype: float64
Number of outliers in Longitude: 0

Analyzing MedHouseVal:
count 20640.000000
mean 2.068558
std 1.153956
min 0.149990
25% 1.196000
50% 1.797000
75% 2.647250
max 5.000010
Name: MedHouseVal, dtype: float64
Number of outliers in MedHouseVal: 1071
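
To compare features, the outlier counts reported above can be expressed as a share of the 20640 rows. The snippet below is a small sketch that only reuses the numbers printed in this output; the variable names are illustrative.

```python
# Outlier counts copied from the program output above
outlier_counts = {
    "MedInc": 681,
    "HouseAge": 0,
    "AveRooms": 511,
    "AveBedrms": 1424,
    "Population": 1196,
    "AveOccup": 711,
    "Latitude": 0,
    "Longitude": 0,
    "MedHouseVal": 1071,
}
total_rows = 20640  # the 'count' reported for every feature

# Percentage of rows flagged as outliers for each feature
outlier_share = {
    name: 100 * count / total_rows for name, count in outlier_counts.items()
}

# Print features from the largest share to the smallest
for name, share in sorted(
    outlier_share.items(), key=lambda item: item[1], reverse=True
):
    print(f"{name:12s} {share:5.2f}%")
```

AveBedrms has the largest share, about 6.90% of the rows, while HouseAge, Latitude and Longitude have no values outside the 1.5 × IQR bounds.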
The graph sheets are to be attached, drawn to an appropriate scale.
i. Two representations are to be presented on each graph sheet
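
One possible way to prepare such a sheet is to draw the histogram and the box plot of a feature side by side and save them as one image. The sketch below is an assumption about how this could be done, not part of the original program: the helper `save_feature_sheet`, the file name, and the randomly generated demo values are all hypothetical, and it uses plain matplotlib rather than seaborn.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to a file without opening a window
import matplotlib.pyplot as plt
import numpy as np


def save_feature_sheet(values, name, path):
    """Save a histogram and a box plot of one feature side by side."""
    fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(11, 5))

    # Left panel: histogram of the feature
    ax_hist.hist(values, bins=30)
    ax_hist.set_title(f"Histogram of {name}")
    ax_hist.set_xlabel(name)
    ax_hist.set_ylabel("Frequency")

    # Right panel: box plot of the same feature
    ax_box.boxplot(values)
    ax_box.set_title(f"Box Plot of {name}")
    ax_box.set_ylabel(name)

    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return fig


# Hypothetical right-skewed demo data standing in for a real feature
rng = np.random.default_rng(0)
demo_values = rng.lognormal(mean=1.0, sigma=0.5, size=500)
sheet_path = os.path.join(tempfile.gettempdir(), "demo_feature_sheet.png")
sheet = save_feature_sheet(demo_values, "DemoFeature", sheet_path)
print(f"Saved {sheet_path}")
```

For the actual term work, a column such as `data['MedInc']` from the program above could be passed in place of the demo values.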
