
Data Analytics Lab Manual

Part B

1. Probability
a. Calculating Simple Probabilities
# Import necessary libraries
import pandas as pd

# Load the Titanic training dataset
df = pd.read_csv('train.csv')

# Calculate the probability of each outcome of the 'Survived' column
probability_event = df['Survived'].value_counts() / len(df['Survived'])
print(probability_event)

OUTPUT
Survived
0 0.616162
1 0.383838
Name: count, dtype: float64
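
As a small extension, the same idea gives a conditional probability. The sketch below assumes the standard Titanic 'Sex' column is present in train.csv; P(Survived = 1 | Sex) is then just the mean of the 0/1 'Survived' column within each group.

# Conditional probability sketch (assumes a 'Sex' column, as in the
# standard Titanic training set): P(Survived = 1 | Sex)
conditional_prob = df.groupby('Sex')['Survived'].mean()
print(conditional_prob)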

b. Applications of Probability Distributions

# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Drop missing values in the 'Age' column for simplicity
titanic_data = df.dropna(subset=['Age'])

# Plot the histogram of ages (using the cleaned data, since NaN values
# would break the histogram computation)
plt.hist(titanic_data['Age'], bins=30, density=True, alpha=0.5, color='b',
         label='Age Distribution')

# Fit a normal distribution to the data and overlay its density curve
mu, std = norm.fit(titanic_data['Age'])
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = norm.pdf(x, mu, std)
plt.plot(x, p, 'k', linewidth=2, label='Fitted Normal PDF')

# Display the plot
plt.xlabel('Age')
plt.ylabel('Density')
plt.legend()
plt.show()
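
The normal distribution above is continuous; as a further illustration, the following hedged sketch models the number of survivors among n randomly chosen passengers as Binomial(n, p), with p taken as the empirical survival rate from exercise 1a. The sample size n = 10 is an illustrative choice, not part of the original exercise.

# Binomial sketch (illustrative): number of survivors out of n passengers
from scipy.stats import binom

p = df['Survived'].mean()   # empirical survival probability
n = 10                      # hypothetical sample size
k = np.arange(0, n + 1)
pmf = binom.pmf(k, n, p)

plt.bar(k, pmf, color='g', alpha=0.6)
plt.xlabel('Number of survivors out of 10 passengers')
plt.ylabel('Probability')
plt.title('Binomial PMF (illustrative)')
plt.show()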

2. Test of Significance
a. t-Test: One Sample, Two Independent Samples, and Paired

# Import necessary libraries
import pandas as pd
from scipy.stats import ttest_ind

# Load the dataset
df = pd.read_csv('StudentsPerformance.csv')

# Separate data for male students
male_scores = df[df['gender'] == 'male']['math score']
male_scores

OUTPUT
3 47
4 76
7 40
8 64
10 58
..
985 57
987 81
990 86
994 63
996 62
Name: math score, Length: 482, dtype: int64

# Separate data for female students
female_scores = df[df['gender'] == 'female']['math score']
female_scores

OUTPUT
0 72
1 69
2 90
5 71
6 88
..
993 62
995 88
997 59
998 68
999 77
Name: math score, Length: 518, dtype: int64

# Perform independent two-sample t-test
t_statistic, p_value = ttest_ind(male_scores, female_scores)

# Print the results


print(f'T-Statistic: {t_statistic}')
print(f'P-Value: {p_value}')

OUTPUT
T-Statistic: 5.383245869828983
P-Value: 9.120185549328822e-08

# Interpret the results
alpha = 0.05
if p_value < alpha:
    print("There is a significant difference in math scores between male and female students.")
else:
    print("There is no significant difference in math scores between male and female students.")

OUTPUT
There is a significant difference in math scores between male and female students.
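
The heading of this exercise also lists one-sample and paired t-tests. A minimal sketch of both follows, assuming the standard 'reading score' column is present in StudentsPerformance.csv; the hypothesised population mean of 70 is an illustrative value, not from the manual.

# One-sample and paired t-tests (sketch)
from scipy.stats import ttest_1samp, ttest_rel

# One-sample: does the mean math score differ from a hypothesised mean of 70?
t_one, p_one = ttest_1samp(df['math score'], popmean=70)
print(f'One-sample t-test: t = {t_one:.3f}, p = {p_one:.4f}')

# Paired: each student's math score vs. their reading score
t_paired, p_paired = ttest_rel(df['math score'], df['reading score'])
print(f'Paired t-test: t = {t_paired:.3f}, p = {p_paired:.4f}')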

b. ANOVA: Comparing Multiple Groups (e.g., Ethnicity)

from scipy.stats import f_oneway

# Load the dataset


df = pd.read_csv('StudentsPerformance.csv')

# List the unique ethnicity groups
ethnicity_groups = df['ethnicity'].unique()
ethnicity_groups

OUTPUT
array(['group B', 'group C', 'group A', 'group D', 'group E'],
dtype=object)

# Extract math scores for each ethnicity group
ethnicity_data = {ethnicity: df[df['ethnicity'] == ethnicity]['math score']
                  for ethnicity in ethnicity_groups}
ethnicity_data

OUTPUT
{'group B': 0 72
2 90
5 71
6 88
7 40
..
969 75
976 60
980 8
982 79
991 65
Name: math score, Length: 190, dtype: int64,
'group C': 1 69
4 76
10 58
15 69
16 88
..
979 91
984 74
986 40
996 62
997 59
Name: math score, Length: 319, dtype: int64,
'group A': 3 47
13 78
14 50
25 73
46 55
..
974 54
983 78
985 57
988 44
994 63
Name: math score, Length: 89, dtype: int64,
'group D': 8 64
11 40
20 66
22 44
24 74
..
989 67
992 55
993 62
998 68
999 77
Name: math score, Length: 262, dtype: int64,
'group E': 32 56
34 97
35 81
44 50
50 53
...
962 100
968 68
987 81
990 86
995 88
Name: math score, Length: 140, dtype: int64}

# Perform one-way ANOVA


f_statistic, p_value_anova = f_oneway(*ethnicity_data.values())

# Print the results


print(f'F-Statistic: {f_statistic}')
print(f'P-Value (ANOVA): {p_value_anova}')

OUTPUT
F-Statistic: 14.593885166332637
P-Value (ANOVA): 1.3732194030370688e-11

# Interpret the results


if p_value_anova < alpha:
    print("There is a significant difference in math scores among different ethnicities.")
else:
    print("There is no significant difference in math scores among different ethnicities.")

OUTPUT
There is a significant difference in math scores among different ethnicities.
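
A significant ANOVA result says only that at least one group mean differs; it does not identify which pairs differ. A hedged follow-up sketch using Tukey's HSD test from statsmodels (an import beyond the manual's code) performs the pairwise comparisons:

# Post-hoc pairwise comparisons with Tukey's HSD (sketch)
from statsmodels.stats.multicomp import pairwise_tukeyhsd

tukey = pairwise_tukeyhsd(endog=df['math score'],
                          groups=df['ethnicity'],
                          alpha=0.05)
print(tukey.summary())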

3. Correlation and Regression Analysis
a. Scatter Diagram, Calculation of the Correlation Coefficient

# Import necessary libraries


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score, confusion_matrix

# Load the house price dataset
df = pd.read_csv('data.csv')

# Display the columns in the dataset
df.columns

# Display the first five rows of the dataset
print(df.head())

OUTPUT
                  date      price  bedrooms  bathrooms  sqft_living  sqft_lot  \
0  2014-05-02 00:00:00   313000.0       3.0       1.50         1340      7912
1  2014-05-02 00:00:00  2384000.0       5.0       2.50         3650      9050
2  2014-05-02 00:00:00   342000.0       3.0       2.00         1930     11947
3  2014-05-02 00:00:00   420000.0       3.0       2.25         2000      8030
4  2014-05-02 00:00:00   550000.0       4.0       2.50         1940     10500

   floors  waterfront  view  condition  sqft_above  sqft_basement  yr_built  \
0     1.5           0     0          3        1340              0      1955
1     2.0           0     4          5        3370            280      1921
2     1.0           0     0          4        1930              0      1966
3     1.0           0     0          4        1000           1000      1963
4     1.0           0     0          4        1140            800      1976

   yr_renovated                    street       city  statezip country
0          2005      18810 Densmore Ave N  Shoreline  WA 98133     USA
1             0           709 W Blaine St    Seattle  WA 98119     USA
2             0  26206-26214 143rd Ave SE       Kent  WA 98042     USA
3             0           857 170th Pl NE   Bellevue  WA 98008     USA
4          1992         9105 170th Ave NE    Redmond  WA 98052     USA

a. Scatter Diagram
# Scatter diagram for two variables (e.g., sqft_living vs. price)
plt.scatter(df['sqft_living'], df['price'])
plt.title('Scatter Diagram: sqft_living vs. price')
plt.xlabel('sqft_living')
plt.ylabel('price')
plt.show()

# Calculate the correlation coefficient


correlation_coefficient = df['sqft_living'].corr(df['price'])
print(f'Correlation Coefficient (sqft_living vs. price): {correlation_coefficient}')

OUTPUT
Correlation Coefficient (sqft_living vs. price): 0.43041002543262824
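
The same Pearson correlation can be computed for several variables at once. A short sketch, using numeric columns shown in df.head() above:

# Pairwise correlation matrix for selected numeric columns
corr_matrix = df[['price', 'sqft_living', 'bedrooms', 'bathrooms']].corr()
print(corr_matrix)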

b. Linear Regression: Fitting, Testing Model Adequacy, and Prediction (Simple and Multiple)

# Simple Linear Regression


X_simple = sm.add_constant(df[['sqft_living']])
y_simple = df['price']
model_simple = sm.OLS(y_simple, X_simple).fit()

# Summary of the simple linear regression
print(model_simple.summary())

OUTPUT
OLS Regression Results
==============================================================================
Dep. Variable: price R-squared: 0.185
Model: OLS Adj. R-squared: 0.185
Method: Least Squares F-statistic: 1045.
Date: Fri, 24 Nov 2023 Prob (F-statistic): 7.55e-207
Time: 17:00:16 Log-Likelihood: -66971.
No. Observations: 4600 AIC: 1.339e+05
Df Residuals: 4598 BIC: 1.340e+05
Df Model: 1
Covariance Type: nonrobust
===============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
const        1.295e+04   1.83e+04      0.709      0.479   -2.29e+04    4.88e+04
sqft_living   251.9501      7.792     32.334      0.000     236.674     267.227
==============================================================================
Omnibus:                    12550.690   Durbin-Watson:                   1.980
Prob(Omnibus):                  0.000   Jarque-Bera (JB):        504349454.972
Skew:                          33.420   Prob(JB):                         0.00
Kurtosis:                    1623.778   Cond. No.                     5.72e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
[2] The condition number is large, 5.72e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

# Multiple Linear Regression


X_multi = sm.add_constant(df[['sqft_living', 'bedrooms', 'bathrooms']])
y_multi = df['price']
model_multi = sm.OLS(y_multi, X_multi).fit()
# Summary of the multiple linear regression
print(model_multi.summary())

OUTPUT
OLS Regression Results
==============================================================================
Dep. Variable: price R-squared: 0.190
Model: OLS Adj. R-squared: 0.190
Method: Least Squares F-statistic: 359.8
Date: Fri, 24 Nov 2023 Prob (F-statistic): 6.78e-210
Time: 17:00:49 Log-Likelihood: -66957.
No. Observations: 4600 AIC: 1.339e+05
Df Residuals: 4596 BIC: 1.339e+05
Df Model: 3
Covariance Type: nonrobust
===============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
const        1.232e+05   3.02e+04      4.080      0.000     6.4e+04    1.82e+05
sqft_living   274.6629     12.692     21.641      0.000     249.781     299.545
bedrooms    -5.514e+04   1.04e+04     -5.296      0.000   -7.56e+04   -3.47e+04
bathrooms     1.33e+04    1.5e+04      0.889      0.374    -1.6e+04    4.26e+04
==============================================================================
Omnibus: 12588.478 Durbin-Watson: 1.978
Prob(Omnibus): 0.000 Jarque-Bera (JB): 516518988.559
Skew: 33.683 Prob(JB): 0.00
Kurtosis: 1643.227 Cond. No. 9.83e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
[2] The condition number is large, 9.83e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

# Prediction
predictions_simple = model_simple.predict(X_simple)
predictions_simple
OUTPUT
0 350567.418016
1 932572.220763
2 499217.995341
3 516854.504515
4 501737.496651
...
4595 393398.940296
4596 380801.433743
4597 771324.136885
4598 539530.016310
4599 388359.937675
Length: 4600, dtype: float64

# Prediction with the multiple regression model (already fitted above)
predictions_multi = model_multi.predict(X_multi)
predictions_multi
OUTPUT
0 345726.155445
1 883214.890294
2 514428.895515
3 536981.112223
4 468684.238234
...
4595 395744.662521
4596 391988.957691
4597 817716.458384
4598 503232.046882
4599 400228.844801
Length: 4600, dtype: float64
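
To make the "Testing Model Adequacy" part of this exercise explicit, the in-sample fit of the two models can be compared with the metrics already imported from sklearn. This is a minimal sketch; the R-squared values simply reproduce those reported in the OLS summaries above.

# Compare model adequacy: R-squared and RMSE (in-sample)
r2_simple = r2_score(y_simple, predictions_simple)
r2_multi = r2_score(y_multi, predictions_multi)
rmse_simple = np.sqrt(mean_squared_error(y_simple, predictions_simple))
rmse_multi = np.sqrt(mean_squared_error(y_multi, predictions_multi))
print(f'Simple model:   R^2 = {r2_simple:.3f}, RMSE = {rmse_simple:.0f}')
print(f'Multiple model: R^2 = {r2_multi:.3f}, RMSE = {rmse_multi:.0f}')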

c. Fitting of Logistic Regression

# Assuming 'waterfront' is a binary variable indicating waterfront or not
X_logistic = df[['sqft_living', 'waterfront']]

# Binary target variable: 1 if the price is above the median, else 0
y_logistic = (df['price'] > df['price'].median()).astype(int)

X_train_log, X_test_log, y_train_log, y_test_log = train_test_split(
    X_logistic, y_logistic, test_size=0.2, random_state=42)
print(X_logistic)
print(y_logistic)

OUTPUT
X_logistic:
sqft_living waterfront
0 1340 0
1 3650 0
2 1930 0
3 2000 0
4 1940 0
... ... ...
4595 1510 0
4596 1460 0
4597 3010 0
4598 2090 0
4599 1490 0

[4600 rows x 2 columns]


y_logistic:
0 0
1 1
2 0
3 0
4 1
..
4595 0
4596 1
4597 0
4598 0
4599 0
Name: price, Length: 4600, dtype: int32

# Logistic Regression
logreg = LogisticRegression()
logreg.fit(X_train_log, y_train_log)

OUTPUT
LogisticRegression()
# Predictions
y_pred_log = logreg.predict(X_test_log)
y_pred_log
OUTPUT
array([0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0,
0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1,
1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1,
1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1,
1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0,
1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0,
1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1,
0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1,
0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0,
1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1,
1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0,
1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,
0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0,
0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0,
1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1,
0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1,
0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1,
1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0,
0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1,
0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0,
1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0,
1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1,
0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0,
0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0,
1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0,
0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1,
1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0,
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1,
0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0,
0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0,
1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1,
0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1,
1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1,
1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1])

# Model Evaluation
accuracy_log = accuracy_score(y_test_log, y_pred_log)
conf_matrix_log = confusion_matrix(y_test_log, y_pred_log)
print(f'Accuracy (Logistic Regression): {accuracy_log}')
print(f'Confusion Matrix (Logistic Regression): \n{conf_matrix_log}')
OUTPUT
Accuracy (Logistic Regression): 0.7184782608695652
Confusion Matrix (Logistic Regression):
[[351 119]
[140 310]]
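
Accuracy and the confusion matrix can be supplemented with per-class precision and recall. A hedged sketch using sklearn's classification_report (an extra import beyond those above); the class labels are illustrative names for the 0/1 target defined earlier:

# Per-class precision, recall and F1 for the logistic model
from sklearn.metrics import classification_report
print(classification_report(y_test_log, y_pred_log,
                            target_names=['below median', 'above median']))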
