DA Lab ANSWERS

Uploaded by

sakthi

AD8412 - DATA ANALYTICS LAB

1. Apply the following functions to the list of BMI values for people living in a rural area:

bmi_list = [29, 18, 20, 22, 19, 25, 30, 28,22, 21, 18, 19, 20, 20, 22, 23]

(i) random.choice()
(ii) random.sample()
(iii) random.randint()
PROGRAM:

import random
from random import sample

def BMI(height, weight):
    # BMI = weight (kg) divided by height (m) squared
    bmi = weight / (height ** 2)
    return bmi

bmi_list = [29, 18, 20, 22, 19, 25, 30, 28, 22, 21, 18, 19, 20, 20, 22, 23]
height = 1.79832
weight = 70
bmi = BMI(height, weight)
print("The BMI is", format(bmi), "so ", end='')
if bmi < 18.5:
    print("underweight")
elif bmi < 24.9:
    print("Healthy")
elif bmi < 30:
    print("overweight")
else:
    print("Suffering from Obesity")

OUTPUT:
The BMI is 21.64532402096181 so Healthy

(i) random.choice()

print(random.choice(bmi_list))

output:
30

(ii) random.sample()

print(sample(bmi_list,3))

output: [18, 25, 22]

(iii) random.randint()

# random integer between 0 and 12, both ends inclusive (not drawn from bmi_list)
print(random.randint(0, 12))

output: 9
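
Because these calls return a different value on every run, the outputs shown above will not repeat. A minimal sketch, assuming reproducible results are wanted, seeds the generator first. The seed value 42 and the helper name draw are arbitrary choices made here for illustration and are not part of the original answer.

```python
import random

bmi_list = [29, 18, 20, 22, 19, 25, 30, 28, 22, 21, 18, 19, 20, 20, 22, 23]

def draw(seed):
    # Seed the generator so the same seed always gives the same draws.
    random.seed(seed)
    one = random.choice(bmi_list)        # a single element of the list
    three = random.sample(bmi_list, 3)   # three elements from distinct positions
    index = random.randint(0, len(bmi_list) - 1)  # integer, both ends inclusive
    return one, three, index

first = draw(42)
second = draw(42)
print(first)
print(first == second)  # True: same seed, same sequence of draws
```

Here randint() is given the range of valid list positions, so its result could be used as an index into bmi_list.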

2. Use the random.choices() function to select multiple random items from a sequence with
repetition.

For example, you have a list of names and want to choose four of them at random,
and it is acceptable if a name repeats.

names = ["Roger", "Nadal", "Novac", "Andre", "Sarena", "Mariya", "Martina", "KUMAR"]

PROGRAM:

import random

names=["Roger", "Nadal", "Novac", "Andre", "Sarena", "Mariya", "Martina","Kumar"]

# choose four random items with replacement, so repetition is allowed

sample_list3 = random.choices(names, k=4)

print(sample_list3)

Output:
['Novac', 'Novac', 'Martina', 'Sarena']
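
The difference between sampling with and without replacement shows up when more items are requested than the list holds. A short sketch, with k=12 chosen only for illustration:

```python
import random

names = ["Roger", "Nadal", "Novac", "Andre", "Sarena", "Mariya", "Martina", "Kumar"]

# choices() draws with replacement, so k may exceed len(names)
with_repeats = random.choices(names, k=12)
print(len(with_repeats), len(set(with_repeats)))

# sample() draws without replacement, so k > len(names) raises ValueError
try:
    random.sample(names, 12)
    sample_failed = False
except ValueError:
    sample_failed = True
print(sample_failed)  # True
```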

3. Write a Python program to demonstrate the use of sample() function for string and tuple
types.

import random

string = "Welcome World"

print("With string:", random.sample(string, 4))

output: With string: ['r', 'm', 'W', 'W']

tuple1 = ("Selshia", "AI", "computer", "science", "Jansons", "Engineering", "btech")

print("With tuple:", random.sample(tuple1, 4))

output:
With tuple: ['Jansons', 'Selshia', 'btech', 'Engineering']

4. Write a python script to implement the Z-Test for the following problem:

A school claims that its students are more intelligent than those of an average school.
The IQ scores of 50 students are measured and their mean turns out to be 110. The mean of
the population IQ is 100 and the standard deviation is 15. Check whether the principal's
claim is right or not at a 5% significance level.

PROGRAM:

import math
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest

mean_iq = 110
# standard error of the mean: population sd / sqrt(n)
sd_iq = 15 / math.sqrt(50)
alpha = 0.05
null_mean = 100

# simulate 50 scores with spread sd_iq around the sample mean
data = sd_iq * randn(50) + mean_iq
print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))

# one-sided test: H0 mean = 100 against H1 mean > 100
ztest_Score, p_value = ztest(data, value=null_mean, alternative='larger')
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

OUTPUT:
mean=109.65 stdv=2.06
Reject Null Hypothesis
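
Since the sample mean, population standard deviation and sample size are all given, the test statistic can also be computed directly, without simulating data. A minimal sketch using only the standard library (statistics.NormalDist, available from Python 3.8); the variable names are chosen here for illustration:

```python
import math
from statistics import NormalDist

sample_mean = 110   # mean IQ of the 50 students
pop_mean = 100      # population mean under the null hypothesis
pop_sd = 15         # population standard deviation
n = 50
alpha = 0.05

# z = (sample mean - hypothesised mean) / standard error
std_error = pop_sd / math.sqrt(n)
z = (sample_mean - pop_mean) / std_error

# one-sided p-value: P(Z > z) under the standard normal distribution
p_value = 1 - NormalDist().cdf(z)

print("z = %.3f" % z)  # z = 4.714
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")
```

A z value of about 4.71 lies far beyond the one-sided 5% critical value of 1.645, so the null hypothesis is rejected and the claim is supported.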

5. Write a Python program to demonstrate the ‘T-Test’ with suitable libraries for a sample
student’s data. (Create and use dataset of your own)

import pandas as pd
from scipy import stats

df = pd.read_csv("paired_ttest - paired_ttest.csv")

# paired (dependent) t-test on the two columns
tscore, pvalue = stats.ttest_rel(df['Brand 1'], df['Brand 2'])
alpha = 0.20
print(tscore, pvalue)
if pvalue > alpha:
    print("Fail to reject null hypothesis")
else:
    print("Reject null hypothesis")

output:
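
The CSV file used above is not included with the answer. As a self-contained sketch, the paired t statistic can be computed by hand from the per-student differences, t = mean(d) / (sd(d) / sqrt(n)). The marks below are made-up values for ten students before and after a revision class, and 2.262 is the two-sided 5% critical value of the t distribution with 9 degrees of freedom taken from a standard table. With scipy installed, stats.ttest_rel(after, before) should give the same statistic together with an exact p-value.

```python
import math
import statistics

# hypothetical marks of the same ten students, before and after a revision class
before = [72, 65, 80, 58, 90, 77, 68, 84, 61, 75]
after = [78, 70, 83, 65, 92, 85, 70, 88, 66, 79]

# the paired test works on the per-student differences
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)          # sample standard deviation (n - 1)
tscore = mean_d / (sd_d / math.sqrt(n))

t_critical = 2.262  # two-sided, alpha = 0.05, n - 1 = 9 degrees of freedom
print("t = %.3f" % tscore)
if abs(tscore) > t_critical:
    print("Reject null hypothesis")
else:
    print("Fail to reject null hypothesis")
```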

6. Import the necessary libraries in Python for implementing ‘One-Way ANOVA Test’ in a
sample dataset. (Create and use dataset of your own)

Program:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# load data file
df = pd.read_csv("https://reneshbedre.github.io/assets/posts/anova/onewayanova.txt", sep="\t")

# reshape the dataframe into long format, suitable for the statsmodels package
df_melt = pd.melt(df.reset_index(), id_vars=['index'], value_vars=['A', 'B', 'C', 'D'])

# replace column names
df_melt.columns = ['index', 'treatments', 'value']

# generate a boxplot to see the data distribution by treatments. Using a boxplot, we can
# easily detect the differences between different treatments
ax = sns.boxplot(x='treatments', y='value', data=df_melt)
ax = sns.swarmplot(x="treatments", y="value", data=df_melt)
plt.show()

output:
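
The program above only plots the data; the test itself compares the variation between treatment means with the variation inside each treatment. A minimal sketch of that F statistic in plain Python follows. The four groups are illustrative values assumed here, not necessarily the contents of the file loaded above, and one_way_f is a hypothetical helper name. With scipy installed, scipy.stats.f_oneway computes the same statistic and also returns the p-value.

```python
def one_way_f(groups):
    # F = (between-group mean square) / (within-group mean square)
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# illustrative data: four treatments, five observations each (assumed values)
A = [25, 30, 28, 36, 29]
B = [45, 55, 29, 56, 40]
C = [30, 29, 33, 37, 27]
D = [54, 60, 51, 62, 73]

f_value = one_way_f([A, B, C, D])
print("F = %.2f" % f_value)  # F = 17.49
```

A large F value means the treatment means differ by much more than the scatter within each treatment would explain.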

7. Import the necessary libraries in Python for implementing ‘Two-Way ANOVA Test’ in a
sample dataset. (Create and use dataset of your own)
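
As a minimal sketch for this exercise, the F statistics of a balanced two-way design with replication can be computed in plain Python. The plant-height data, the factor names (fertilizer, water) and the function name two_way_anova are all made up for illustration; libraries such as statsmodels can produce a full ANOVA table with p-values.

```python
def mean(values):
    return sum(values) / len(values)

def two_way_anova(cells):
    # cells[i][j] holds the replicates for level i of factor A and level j of factor B
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])

    grand = mean([v for row in cells for cell in row for v in cell])
    a_means = [mean([v for cell in row for v in cell]) for row in cells]
    b_means = [mean([v for row in cells for v in row[j]]) for j in range(b)]

    # sums of squares for the two main effects, the interaction and the error
    ss_a = b * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_means)
    ss_cells = n * sum((mean(cell) - grand) ** 2 for row in cells for cell in row)
    ss_ab = ss_cells - ss_a - ss_b
    ss_error = sum((v - mean(cell)) ** 2
                   for row in cells for cell in row for v in cell)

    ms_error = ss_error / (a * b * (n - 1))
    return {
        "A": (ss_a / (a - 1)) / ms_error,
        "B": (ss_b / (b - 1)) / ms_error,
        "A:B": (ss_ab / ((a - 1) * (b - 1))) / ms_error,
    }

# hypothetical plant heights: rows = fertilizer (low, high), columns = water (daily, weekly)
heights = [
    [[4, 5, 6], [3, 4, 5]],
    [[8, 9, 10], [5, 6, 7]],
]
f_values = two_way_anova(heights)
print(f_values)
```

For this made-up data the F values are 27 for fertilizer, 12 for water and 3 for their interaction, each with 1 and 8 degrees of freedom.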

8. Let us consider a dataset where we have a value of response y for every feature x:

x: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
y: 1, 3, 2, 5, 7, 8, 8, 9, 10, 12

Generate a regression line for this sample data using Python.

PROGRAM:

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plot the observed points
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response on the fitted line
    y_pred = b[0] + b[1]*x
    plt.plot(x, y_pred, color="g")

    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

def main():
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))

    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()

OUTPUT:
Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
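
The shortcut SS_xy = sum(x*y) - n*m_x*m_y used in estimate_coef is algebraically equal to the sum of products of deviations, sum((x - m_x)*(y - m_y)), and likewise for SS_xx. A small pure-Python sketch, using the same data, computes the coefficients from the deviation form as a cross-check; the helper name fit_line is an assumption made here.

```python
def fit_line(x, y):
    # least-squares intercept and slope from deviations about the means
    n = len(x)
    m_x = sum(x) / n
    m_y = sum(y) / n
    ss_xy = sum((xi - m_x) * (yi - m_y) for xi, yi in zip(x, y))
    ss_xx = sum((xi - m_x) ** 2 for xi in x)
    b_1 = ss_xy / ss_xx
    b_0 = m_y - b_1 * m_x
    return b_0, b_1

x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 3, 2, 5, 7, 8, 8, 9, 10, 12]
b_0, b_1 = fit_line(x, y)
print("b_0 = %.4f, b_1 = %.4f" % (b_0, b_1))  # b_0 = 1.2364, b_1 = 1.1697
```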

9. Import scipy and draw the line of Linear Regression for the following data:

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]

y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

Where the x-axis represents age, and the y-axis represents speed. We have registered the
age and speed of 13 cars as they were passing a tollbooth.

PROGRAM:

import matplotlib.pyplot as plt
from scipy import stats

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    return slope * x + intercept

mymodel = list(map(myfunc, x))

plt.scatter(x, y)
plt.plot(x, mymodel)
plt.xlabel('Age')
plt.ylabel('Speed Of Cars')
plt.show()

OUTPUT:
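
Once the line is fitted it can be used to predict a value that was not observed, for example the speed of a 10-year-old car. A minimal sketch in plain Python (so that it runs without scipy) recomputes the least-squares slope and intercept that stats.linregress estimates; the function name predict_speed and the query age of 10 are chosen here for illustration.

```python
x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# least-squares estimates of the regression line y = slope * x + intercept
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))
intercept = mean_y - slope * mean_x

def predict_speed(age):
    # point on the fitted line for the given age
    return slope * age + intercept

print("slope = %.2f, intercept = %.2f" % (slope, intercept))
print("predicted speed at age 10: %.2f" % predict_speed(10))
```

The negative slope says that, in this sample, older cars tend to pass the tollbooth more slowly.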

10. Implement the time series analysis concept for a sample dataset using Pandas.
(Create and use dataset of your own)

Refer to the 12th program.

11. Write a Python program to visualize the time series concepts using Matplotlib.
(Create and use dataset of your own)

Refer to the 12th program.

12. Demonstrate various time series models using Python.(Create and use dataset of your own)

PROGRAM:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'], index_col='date')

# Draw Plot
def plot_df(df, x, y, title="", xlabel='Date', ylabel='Value', dpi=100):
    plt.figure(figsize=(16,5), dpi=dpi)
    plt.plot(x, y, color='tab:red')
    plt.gca().set(title=title, xlabel=xlabel, ylabel=ylabel)
    plt.show()

plot_df(df, x=df.index, y=df.value,
        title='Monthly anti-diabetic drug sales in Australia from 1992 to 2008.')

OUTPUT:
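
The program above only plots the series. One of the simplest time series models is a moving average, which smooths the data and can serve as a naive forecast for the next period. A minimal sketch in plain Python follows, with made-up monthly sales figures and a hypothetical helper moving_average; in pandas a comparable rolling mean is available through Series.rolling(window).mean().

```python
def moving_average(values, window):
    # mean of each run of `window` consecutive observations
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# hypothetical monthly sales (illustrative values only)
sales = [10, 12, 14, 16, 18, 20]

smoothed = moving_average(sales, window=3)
print(smoothed)  # [12.0, 14.0, 16.0, 18.0]

# naive forecast for the next month: the most recent moving average
forecast = smoothed[-1]
print(forecast)  # 18.0
```

A wider window gives a smoother curve but reacts more slowly to recent changes in the series.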
