EX. NO: 3 Performing Statistical Analysis On A Dataset DATE: 21/08/2024

DATA ANALYTICS 3

Uploaded by

robertdowneyrdj708

AIM:

To perform statistical analysis on a dataset, including multiple regression and statistical hypothesis tests (t-tests, Z-tests and ANOVA).

CODE:

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import statsmodels.api as sm
# ttest_ind and ztest used below are the statsmodels versions
from statsmodels.stats.weightstats import ttest_ind, ztest
from scipy import stats

df = pd.read_csv('/content/Student_Performance.csv')
df

OUTPUT:

CODE:

df.info()

OUTPUT:

CODE:

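# Encode the categorical column as integers
label_encoder = LabelEncoder()
df['Extracurricular Activities'] = label_encoder.fit_transform(df['Extracurricular Activities'])
df['Extracurricular Activities'].unique()

# Features are all columns except the last; the target is the last column
x = df.iloc[:, 0:-1]
y = df.iloc[:, -1]

# Hold out 20% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# Fit a multiple linear regression and score it on the test set
model = LinearRegression()
model.fit(x_train, y_train)

y_pred = model.predict(x_test)

r2_score(y_test, y_pred)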

OUTPUT:

0.9880686410711422
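
The R² value above is the share of the target's variance that the model explains. As a rough illustration of what `r2_score`, `mean_absolute_error` and `mean_squared_error` compute, the sketch below works the formulas by hand with the standard library only. The helper name `regression_metrics` and the four data points are made up for the example and are not taken from Student_Performance.csv.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse), "r2": r2}

# Hypothetical values, chosen only so the arithmetic is easy to follow
y_true = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.8, 5.1, 7.2, 8.9]
metrics = regression_metrics(y_true, y_hat)
print(metrics)
```

An R² close to 1 means the residual sum of squares is small relative to the total sum of squares.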

CODE:

city_hall_dataset = pd.read_csv('/content/train.csv')
city_hall_dataset

OUTPUT:

CODE:

def results(p):
    # Decide at the 5% significance level which hypothesis is accepted
    if p['p_value'] < 0.05:
        p['hypothesis_accepted'] = 'alternative'
    else:
        p['hypothesis_accepted'] = 'null'
    result_df = pd.DataFrame(p, index=[''])
    cols = ['value1', 'value2', 'score', 'p_value', 'hypothesis_accepted']
    return result_df[cols]

city_hall_dataset['SalePrice'] = np.log1p(city_hall_dataset['SalePrice'])
logged_budget = np.log1p(120000) #logged $120 000 is 11.695
logged_budget

OUTPUT:

11.695255355062795
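
Because SalePrice is now on the log1p scale, every value compared with it has to be on the same scale. The sketch below uses the standard library's `math.log1p` and `math.expm1` (rather than NumPy, which is an assumption made only to keep the example dependency-free) to show the transform and how to get back to dollars.

```python
import math

budget = 120000
logged_budget = math.log1p(budget)      # log(1 + 120000), about 11.695
recovered = math.expm1(logged_budget)   # exp(x) - 1 undoes log1p

print(logged_budget)
print(round(recovered, 2))
```

Any mean computed on the logged prices can be converted back the same way with `expm1` before it is reported in dollars.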

One Sample T Test - 2 Tails

Question to answer: how does a budget of $120 000 compare with the average Ames house SalePrice?
Is 120 000 (11.7 logged) different from the mean SalePrice of the population?
We take a sample of 25 observations and perform a one-sample t-test.

CODE:

sample = city_hall_dataset.sample(n=25)
p = {}
p['value1'], p['value2'] = sample['SalePrice'].mean(), logged_budget
p['score'], p['p_value'] = stats.ttest_1samp(sample['SalePrice'], popmean=logged_budget)
results(p)
OUTPUT:

INFERENCE:
The null hypothesis is rejected: the budget differs from the average sale price of homes in Ames.
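
The score reported by the one-sample test above is the t-statistic, t = (sample mean - hypothesised mean) / (s / sqrt(n)), with n - 1 degrees of freedom. A minimal hand computation is sketched below; the function name `one_sample_t` and the five logged prices are hypothetical and are not drawn from train.csv.

```python
import math
import statistics

def one_sample_t(sample, popmean):
    """Return the one-sample t-statistic and its degrees of freedom."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    sample_sd = statistics.stdev(sample)      # uses the n - 1 denominator
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - popmean) / standard_error, n - 1

# Hypothetical logged sale prices, for illustration only
logged_prices = [12.0, 12.2, 11.8, 12.1, 11.9]
t_stat, dof = one_sample_t(logged_prices, 11.695)
print(t_stat, dof)
```

The p-value is then read from the t-distribution with `dof` degrees of freedom, which is the part the library call takes care of.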

One sample T-test | One-tailed

Question: is the budget of $120 000 less than the mean SalePrice?

CODE:

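p['value1'], p['value2'] = sample['SalePrice'].mean(), logged_budget

p['score'], p['p_value'] = stats.ttest_1samp(sample['SalePrice'], popmean=logged_budget)
# Halving the two-tailed p-value is valid only when the score has the sign the
# alternative expects (here: score > 0, i.e. sample mean above the budget)
p['p_value'] = p['p_value'] / 2 if p['score'] > 0 else 1 - p['p_value'] / 2
results(p)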

OUTPUT:

INFERENCE:
The alternative hypothesis is accepted. Hence, at the 5% significance level, we conclude that
the mean SalePrice is above our budget, so the budget is not enough.
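
The conversion from a two-tailed to a one-tailed p-value used above relies on the test distribution being symmetric, and on the direction of the statistic. The sketch below spells out that rule; `one_tailed_p` is a hypothetical helper, not a SciPy or statsmodels function.

```python
def one_tailed_p(statistic, p_two_tailed, alternative):
    """Convert a two-tailed p-value from a symmetric test to a one-tailed one.

    alternative is 'greater' (true mean above the reference value)
    or 'less' (true mean below the reference value).
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    if alternative == "greater":
        sign_matches = statistic > 0
    else:
        sign_matches = statistic < 0
    half = p_two_tailed / 2
    # If the statistic points the wrong way, the one-tailed p-value is large
    return half if sign_matches else 1 - half

print(one_tailed_p(2.5, 0.02, "greater"))
print(one_tailed_p(-2.5, 0.02, "greater"))
```

Simply halving the p-value without checking the sign would report strong evidence even when the sample mean lies on the wrong side of the budget.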

Two sample T-test | Two-tailed | Means

Houses may be small or large, so we divide the population into two groups by living area (GrLivArea).
Null Hypothesis : SalePrice of smaller houses = SalePrice of larger houses
Alternative Hypothesis : SalePrice of smaller houses ≠ SalePrice of larger houses

CODE:

# Sort by living area: the first 730 rows form the smaller group, the rest the larger group
smaller_houses = city_hall_dataset.sort_values('GrLivArea')[:730].sample(n=25)
larger_houses = city_hall_dataset.sort_values('GrLivArea')[730:].sample(n=25)
p['value1'], p['value2'] = smaller_houses['SalePrice'].mean(), larger_houses['SalePrice'].mean()
# statsmodels' ttest_ind returns the t-statistic, the p-value and the degrees of freedom
p['score'], p['p_value'], p['df'] = ttest_ind(smaller_houses['SalePrice'], larger_houses['SalePrice'])
results(p)

OUTPUT:

INFERENCE:
There is a difference in sale price between smaller and larger houses.
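
For two independent groups, the textbook pooled-variance t-statistic divides the difference in means by a standard error built from both groups' variances, with n1 + n2 - 2 degrees of freedom. The sketch below works through that arithmetic; `pooled_two_sample_t` and the two small groups are invented for illustration and assume equal variances in both groups.

```python
import math
import statistics

def pooled_two_sample_t(a, b):
    """Return the pooled-variance t-statistic and degrees of freedom."""
    n1, n2 = len(a), len(b)
    var1, var2 = statistics.variance(a), statistics.variance(b)
    # Weighted average of the two sample variances
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_value = (statistics.mean(a) - statistics.mean(b)) / standard_error
    return t_value, n1 + n2 - 2

# Hypothetical groups, chosen so the numbers come out round
group_small = [1.0, 2.0, 3.0, 4.0, 5.0]
group_large = [3.0, 4.0, 5.0, 6.0, 7.0]
t_value, dof = pooled_two_sample_t(group_small, group_large)
print(t_value, dof)
```

A negative statistic here means the first group's mean is below the second's.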

Two sample T-test | One-tailed | Means

Null Hypothesis : SalePrice of smaller houses >= SalePrice of larger houses

Alternative Hypothesis : SalePrice of smaller houses < SalePrice of larger houses

CODE:

p['value1'], p['value2'] = smaller_houses['SalePrice'].mean(), larger_houses['SalePrice'].mean()

p['score'], p['p_value'], p['df'] = ttest_ind(smaller_houses['SalePrice'], larger_houses['SalePrice'], alternative='smaller')
results(p)

OUTPUT:

INFERENCE:
Larger houses are more expensive.

Two sample Z-test | One-tailed | Means

With a larger sample (100 houses per group), the sample means are approximately normally
distributed, so a Z-test is used.
Null Hypothesis : SalePrice of smaller houses >= SalePrice of larger houses
Alternative Hypothesis : SalePrice of smaller houses < SalePrice of larger houses
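
For a one-tailed test of this form, the p-value is the lower-tail area of the standard normal distribution at the observed z-statistic. A minimal sketch with the standard library's `statistics.NormalDist` follows; the function name `two_sample_z_smaller` and the tiny groups are hypothetical and serve only to show the arithmetic (a real Z-test needs large samples, as noted above).

```python
import math
from statistics import NormalDist, mean, variance

def two_sample_z_smaller(a, b):
    """z-statistic and p-value for the alternative mean(a) < mean(b)."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    z = (mean(a) - mean(b)) / standard_error
    # Lower-tail probability: small when mean(a) is well below mean(b)
    p_value = NormalDist().cdf(z)
    return z, p_value

# Hypothetical groups; far too small for a real Z-test
z_value, z_p = two_sample_z_smaller([1.0, 2.0, 3.0, 4.0, 5.0],
                                    [3.0, 4.0, 5.0, 6.0, 7.0])
print(z_value, z_p)
```

The null hypothesis is rejected at the 5% level when this lower-tail probability falls below 0.05.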

CODE:

smaller_houses = city_hall_dataset.sort_values('GrLivArea')[:730].sample(n=100, random_state=1)
larger_houses = city_hall_dataset.sort_values('GrLivArea')[730:].sample(n=100, random_state=1)
p['value1'], p['value2'] = smaller_houses['SalePrice'].mean(), larger_houses['SalePrice'].mean()
p['score'], p['p_value'] = ztest(smaller_houses['SalePrice'], larger_houses['SalePrice'], alternative='smaller')
results(p)

OUTPUT:

One sample Z-test | One-tailed

Null Hypothesis : Mean SalePrice of smaller houses <= 11.695

Alternative Hypothesis : Mean SalePrice of smaller houses > 11.695

CODE:

p['value1'], p['value2'] = smaller_houses['SalePrice'].mean(), logged_budget

p['score'], p['p_value'] = ztest(smaller_houses['SalePrice'], value=logged_budget, alternative='larger')
results(p)

OUTPUT:

INFERENCE:
This means $120 000 can't buy a small house (on average).

ANOVA Test

Null Hypothesis : No difference between SalePrice means

Alternative Hypothesis : Difference between SalePrice means

CODE:

replacement = {'FV': "Floating Village Residential", 'C (all)': "Commercial",
               'RH': "Residential High Density", 'RL': "Residential Low Density",
               'RM': "Residential Medium Density"}

smaller_houses['MSZoning_FullName'] = smaller_houses['MSZoning'].replace(replacement)
mean_price_by_zone = smaller_houses.groupby('MSZoning_FullName')['SalePrice'].mean().to_frame()

CODE:

sh = smaller_houses.copy()
# One SalePrice group per zone; zones with no houses in the sample are skipped
zones = ['FV', 'C (all)', 'RH', 'RL', 'RM']
groups = [sh.loc[sh.MSZoning == zone, 'SalePrice'] for zone in zones]
groups = [g for g in groups if len(g) > 0]
p['score'], p['p_value'] = stats.f_oneway(*groups)
results(p)[['score', 'p_value', 'hypothesis_accepted']]

OUTPUT:

INFERENCE:
The mean SalePrice differs between zones.
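
The ANOVA score is the F-statistic: the variation between group means divided by the variation within groups, each scaled by its degrees of freedom. The sketch below computes it by hand; `one_way_f` and the three groups are hypothetical and unrelated to the MSZoning data.

```python
from statistics import mean

def one_way_f(groups):
    """Return the one-way ANOVA F-statistic and (between, within) degrees of freedom."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)

# Hypothetical groups, chosen so the sums of squares are easy to check
f_value, f_dof = one_way_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(f_value, f_dof)
```

A large F means the group means are spread out far more than the scatter inside each group would explain.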

RESULT:

Multiple regression, the t-tests, the Z-tests and the ANOVA test were performed successfully.
