
6.lab Activity

This document outlines a lab experiment focused on introducing machine learning concepts, including statistical measures such as mean, median, mode, standard deviation, and percentiles, as well as data visualization techniques like scatter plots and linear regression. It explains the importance of understanding data types and distributions, and provides practical examples using Python's NumPy and Matplotlib libraries. The document aims to help learners analyze data and predict outcomes through machine learning techniques.

Uploaded by Azhar


Experiment No. 06

Introduction to Machine Learning, Mean, Median, Mode, Standard Deviation, Percentiles, Data Distribution, Normal Data Distribution, Scatter Plot, & Linear Regression

Objectives:

This lab will guide you through how to:

1. Understand and calculate mean, median, and mode.
2. Calculate standard deviation and percentiles.
3. Explore data distributions, with a focus on the normal distribution.
4. Visualize data using scatter plots.
5. Implement and understand simple linear regression.
Equipment Required:
A computer with Python installed, along with the NumPy, SciPy, and Matplotlib libraries used in the examples.

Theory:

Machine Learning
Machine Learning is making the computer learn from studying data and statistics.
Machine Learning is a step in the direction of artificial intelligence (AI).
A machine learning program analyzes data and learns to predict the outcome.
Data Set
To a computer, a data set is any collection of data. It can be anything from an array to a complete database.
Example of an array:
[99,86,87,88,111,86,103,87,94,78,77,85,86]
By looking at the array, we can guess that the average value is probably around 80 or 90, and we can also determine the highest and the lowest values, but what else can we do?
Similarly, by looking at a database of cars, we might see that the most popular color is white and that the oldest car is 17 years old. But what if we could predict whether a car has an AutoPass just by looking at its other values?
That is what Machine Learning is for: analyzing data and predicting the outcome!

In Machine Learning it is common to work with very large data sets. In this lab
we will try to make it as easy as possible to understand the different concepts of
machine learning, and we will work with small easy-to-understand data sets.
Data Types
To analyze data, it is important to know what type of data we are dealing with.
We can split the data types into three main categories:
- Numerical
- Categorical
- Ordinal
Numerical data are numbers, and can be split into two numerical categories:
- Discrete data: counted data that are limited to integers. Example: the number of cars passing by.
- Continuous data: measured data that can take any value. Example: the price of an item, or the size of an item.

Categorical data are values that cannot be compared with each other on a scale. Example: a color value, or any yes/no values.

Ordinal data are like categorical data, but they can be ranked against each other. Example: school grades, where A is better than B, and so on.

By knowing the data type of your data source, you will know which technique to use when analyzing it.

Mean, Median, and Mode

What can we learn from looking at a group of numbers?
In Machine Learning (and in mathematics) there are often three values that interest us:
- Mean: the average value
- Median: the midpoint value
- Mode: the most common value
Example: We have registered the speed of 13 cars:
speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]
What is the average, the middle, or the most common speed value?

Mean
The mean value is the average value.
To calculate the mean, find the sum of all values, and divide the sum by the number of values:
(99+86+87+88+111+86+103+87+94+78+77+85+86) / 13 = 89.77

Example
Use the NumPy mean() method to find the average speed:

import numpy

speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]

x = numpy.mean(speed)

print(x)
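To see that numpy.mean() applies exactly the formula above, here is a minimal sketch that computes the same value in plain Python, without NumPy. The variable names are our own choice for illustration.

```python
# The mean "by hand": the sum of all values divided by how many values there are.
speed = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

total = sum(speed)          # 1167
mean = total / len(speed)   # 1167 / 13

print(round(mean, 2))       # 89.77, the same value numpy.mean() gives (up to rounding)
```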

Median
The median value is the value in the middle, after you have sorted all the values:
77, 78, 85, 86, 86, 86, 87, 87, 88, 94, 99, 103, 111

With 13 values, the middle one is the 7th value, so the median is 87. It is important that the numbers are sorted before you can find the median.

Example
Use the NumPy median() method to find the middle value:
import numpy

speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]

x = numpy.median(speed)

print(x)

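If there are two numbers in the middle, divide the sum of those numbers by two. For example, if we remove the 111 from the list, 12 values remain, and the two middle values are 86 and 87:

77, 78, 85, 86, 86, 86, 87, 87, 88, 94, 99, 103

(86 + 87) / 2 = 86.5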

Example
Using the NumPy module:

import numpy

speed = [99,86,87,88,86,103,87,94,78,77,85,86]

x = numpy.median(speed)

print(x)
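The rule for odd and even counts can be written as a small function. This is a sketch in plain Python; the function name median() is our own, not part of NumPy, and it should agree with numpy.median() on both speed lists above.

```python
def median(values):
    """Return the middle value of the sorted data; average the two middle values when the count is even."""
    ordered = sorted(values)           # the data must be sorted first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                     # odd count: there is a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: mean of the two middle values

odd_speed = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
even_speed = [99, 86, 87, 88, 86, 103, 87, 94, 78, 77, 85, 86]

print(median(odd_speed))   # 87
print(median(even_speed))  # 86.5
```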

Mode

The mode value is the value that appears the greatest number of times:

99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86

The value 86 appears three times, more often than any other value, so the mode is 86.

The SciPy module has a method for this.

Example
Use the SciPy mode() method to find the number that appears the most:

from scipy import stats

speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]

x = stats.mode(speed)

print(x)
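stats.mode() returns a ModeResult object holding both the most common value and how many times it occurs; the exact printed form differs between SciPy versions (older versions print arrays, newer ones print plain numbers). If you only need the most common value, a sketch using the standard library's collections.Counter gives the same answer without SciPy:

```python
from collections import Counter

speed = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

counts = Counter(speed)                      # maps each value to how often it occurs
value, frequency = counts.most_common(1)[0]  # the single most frequent (value, count) pair

print(value, frequency)  # 86 3
```

When several values tie for the highest count, most_common() lists them in the order they were first encountered, so this sketch returns the first of them.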
What is Standard Deviation?

Standard deviation is a number that describes how spread out the values are.

A low standard deviation means that most of the numbers are close to the mean (average) value.

A high standard deviation means that the values are spread out over a wider range.

Example: This time we have registered the speed of 7 cars:

speed = [86,87,88,86,87,85,86]

The standard deviation is:

0.9

This means that most of the values lie within 0.9 of the mean value, which is 86.4.

Let us do the same with a selection of numbers with a wider range:

speed = [32,111,138,28,59,77,97]

The standard deviation is:

37.85

This means that most of the values lie within 37.85 of the mean value, which is 77.4.

As you can see, a higher standard deviation indicates that the values are spread out over a wider
range.

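The NumPy module has a method to calculate the standard deviation. By default, numpy.std() computes the population standard deviation, dividing by the number of values: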

Example
Use the NumPy std() method to find the standard deviation:

import numpy

speed = [86,87,88,86,87,85,86]

x = numpy.std(speed)
print(x)

Example
import numpy

speed = [32,111,138,28,59,77,97]

x = numpy.std(speed)

print(x)
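The standard deviation is the square root of the average squared distance from the mean. numpy.std() divides by N (the number of values) by default; passing ddof=1, as in numpy.std(speed, ddof=1), divides by N - 1 instead, which gives the sample standard deviation. The sketch below, with a hypothetical helper std() of our own, reproduces both in plain Python.

```python
import math

def std(values, ddof=0):
    """Standard deviation; ddof=0 gives the population version (numpy.std default), ddof=1 the sample version."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(v - mean) ** 2 for v in values]
    variance = sum(squared_deviations) / (n - ddof)   # average squared distance from the mean
    return math.sqrt(variance)

narrow = [86, 87, 88, 86, 87, 85, 86]
wide = [32, 111, 138, 28, 59, 77, 97]

print(round(std(narrow), 2))          # 0.9
print(round(std(wide), 2))            # 37.85
print(round(std(narrow, ddof=1), 2))  # 0.98, slightly larger because it divides by N - 1
```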

What are Percentiles?

Percentiles are used in statistics to give you a number that describes the value that a given percentage of the values are lower than.

Example: Let's say we have an array of the ages of all the people who live on a street.

ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]

What is the 75th percentile? The answer is 43, meaning that 75% of the people are 43 or younger.

The NumPy module has a method for finding the specified percentile:

Example
Use the NumPy percentile() method to find the percentiles:

import numpy

ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]

x = numpy.percentile(ages, 75)

print(x)

Example
What is the age that 90% of the people are younger than?

import numpy

ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]
x = numpy.percentile(ages, 90)

print(x)
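By default, numpy.percentile() uses linear interpolation between the two nearest values in the sorted data. The sketch below reimplements that default in plain Python (the helper name percentile() is our own) and reproduces the answers for both examples: 43 for the 75th percentile and 61 for the 90th.

```python
import math

def percentile(values, p):
    """Percentile with linear interpolation between the two nearest ranks (numpy.percentile's default method)."""
    ordered = sorted(values)
    rank = (p / 100) * (len(ordered) - 1)   # fractional position in the sorted list
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower                 # how far to move from the lower value toward the upper one
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]

print(percentile(ages, 75))  # 43.0
print(percentile(ages, 90))  # 61.0
```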

Data Distribution

Earlier in this lab, we worked with very small amounts of data in our examples, just to understand the different concepts.

In the real world, data sets are much bigger, but it can be difficult to gather real-world data, at least at an early stage of a project.

How Can we Get Big Data Sets?

To create big data sets for testing, we use the Python module NumPy, which comes with a number of methods to create random data sets of any size.

Example
Create an array containing 250 random floats between 0 and 5:

import numpy

x = numpy.random.uniform(0.0, 5.0, 250)

print(x)
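Because numpy.random.uniform() draws new random numbers every time, the output changes on each run. If you want repeatable results, for example when comparing your output with a classmate's, one option is NumPy's Generator interface with a fixed seed. The seed value 42 below is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seeded generator: the same numbers on every run

x = rng.uniform(0.0, 5.0, 250)         # 250 floats in the half-open interval [0.0, 5.0)

print(x.min(), x.max(), len(x))
```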

Histogram

To visualize the data set we can draw a histogram with the data we collected.

We will use the Python module Matplotlib to draw a histogram.

Example
Draw a histogram:

import numpy
import matplotlib.pyplot as plt

x = numpy.random.uniform(0.0, 5.0, 250)

plt.hist(x, 5)
plt.show()

Histogram Explained

We use the array from the example above to draw a histogram with 5 bars.

The first bar represents how many values in the array are between 0 and 1, the second bar represents how many values are between 1 and 2, and so on. (Strictly, Matplotlib places the bin edges between the smallest and largest values in the data, which for 250 uniform values lie close to 0 and 5.)

This gives us a result like the following:

- 52 values are between 0 and 1
- 48 values are between 1 and 2
- 49 values are between 2 and 3
- 51 values are between 3 and 4
- 50 values are between 4 and 5

Note: The array values are random numbers, so you will not see exactly the same result on your computer.
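If you want the bar heights as numbers instead of a picture, numpy.histogram() counts the values per bin. This sketch passes range=(0.0, 5.0) so that the bin edges are exactly 0, 1, 2, 3, 4, and 5; the seed is arbitrary, so your counts will differ from the ones listed above.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
x = rng.uniform(0.0, 5.0, 250)

# Count the values in 5 equal-width bins spanning 0 to 5 (edges 0, 1, 2, 3, 4, 5).
counts, edges = np.histogram(x, bins=5, range=(0.0, 5.0))

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{count} values are between {left:g} and {right:g}")
```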

Big Data Distributions

An array containing 250 values is not considered very big, but now you know how to create a
random set of values, and by changing the parameters, you can create the data set as big as you
want.

Example
Create an array with 100000 random numbers, and display them using a
histogram with 100 bars:

import numpy
import matplotlib.pyplot as plt

x = numpy.random.uniform(0.0, 5.0, 100000)

plt.hist(x, 100)
plt.show()

Normal Data Distribution

Now we have learned how to create a completely random array of a given size, between two given values.

In this section, we will learn how to create an array where the values are concentrated around a given value.

In probability theory, this kind of data distribution is known as the normal data distribution, or the Gaussian data distribution, named after the mathematician Carl Friedrich Gauss.

Example
A typical normal data distribution:

import numpy
import matplotlib.pyplot as plt

x = numpy.random.normal(5.0, 1.0, 100000)

plt.hist(x, 100)
plt.show()
Note: A normal distribution graph is also known as the bell curve because of its characteristic bell shape.

Histogram Explained

We use the array from the numpy.random.normal() method, with 100000 values, to draw a histogram with 100 bars.

We specify that the mean value is 5.0 and the standard deviation is 1.0.

This means that the values are concentrated around 5.0: roughly 68% of them lie within 1.0 of the mean (between 4.0 and 6.0), and roughly 95% lie within 2.0 of it.

As you can see from the histogram, the bars are tallest between 4.0 and 6.0, with a peak at approximately 5.0.
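We can check the 68% and 95% figures numerically by counting how many generated values fall within one and two standard deviations of the mean. This sketch uses a seeded generator (seed 0 is arbitrary); the printed fractions vary slightly with the seed but should stay close to 0.683 and 0.954.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
x = rng.normal(5.0, 1.0, 100000)   # mean 5.0, standard deviation 1.0

# The mean of a boolean array is the fraction of True values.
within_1_sd = np.mean(np.abs(x - 5.0) <= 1.0)   # fraction between 4.0 and 6.0
within_2_sd = np.mean(np.abs(x - 5.0) <= 2.0)   # fraction between 3.0 and 7.0

print(round(within_1_sd, 3), round(within_2_sd, 3))
```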

Scatter Plot

A scatter plot is a diagram where each value in the data set is represented by a dot.

The Matplotlib module has a method for drawing scatter plots. It needs two arrays of the same length, one for the values of the x-axis and one for the values of the y-axis:
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]

y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

The x array represents the age of each car.

The y array represents the speed of each car.

Example
Use the scatter() method to draw a scatter plot diagram:

import matplotlib.pyplot as plt

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

plt.scatter(x, y)
plt.show()
Scatter Plot Explained

The x-axis represents ages, and the y-axis represents speeds.

What we can read from the diagram is that the two fastest cars were both 2 years old, and the
slowest car was 12 years old.

Note: It seems that the newer the car, the faster it drives, but that could be a coincidence; after all, we only registered 13 cars.

Random Data Distributions

In Machine Learning, data sets can contain thousands or even millions of values.

You might not have real-world data when you are testing an algorithm, so you might have to use randomly generated values.

As we have learned, the NumPy module can help us with that!

Let us create two arrays that are both filled with 1000 random numbers from a normal data
distribution.

The first array will have the mean set to 5.0 with a standard deviation of 1.0.

The second array will have the mean set to 10.0 with a standard deviation of 2.0:

Example
A scatter plot with 1000 dots:

import numpy
import matplotlib.pyplot as plt

x = numpy.random.normal(5.0, 1.0, 1000)

y = numpy.random.normal(10.0, 2.0, 1000)

plt.scatter(x, y)
plt.show()
Scatter Plot Explained

We can see that the dots are concentrated around the value 5 on the x-axis, and 10 on the y-axis.

We can also see that the spread is wider on the y-axis than on the x-axis.

Regression

The term regression is used when you try to find the relationship between variables.

In Machine Learning, and in statistical modeling, that relationship is used to predict the outcome
of future events.

Linear Regression

Linear regression uses the relationship between the data points to draw a straight line that fits them as closely as possible.

This line can be used to predict future values.

In Machine Learning, predicting the future is very important.

How Does it Work?

Python has methods for finding a relationship between data points and for drawing a line of linear regression. We will show you how to use these methods instead of going through the mathematical formula.

In the example below, the x-axis represents age, and the y-axis represents speed. We have
registered the age and speed of 13 cars as they were passing a tollbooth. Let us see if the data we
collected could be used in a linear regression:

Example
Start by drawing a scatter plot:
import matplotlib.pyplot as plt

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

plt.scatter(x, y)
plt.show()

Example
Import scipy and draw the line of Linear Regression:

import matplotlib.pyplot as plt

from scipy import stats

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    # y value on the regression line for a given x
    return slope * x + intercept

mymodel = list(map(myfunc, x))

plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()

Example Explained

Import the modules you need:

import matplotlib.pyplot as plt

from scipy import stats

Create the arrays that represent the values of the x and y axes:

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

Execute a method that returns some important key values of Linear Regression:

slope, intercept, r, p, std_err = stats.linregress(x, y)

Create a function that uses the slope and intercept values to return a new value. This new value
represents where on the y-axis the corresponding x value will be placed:

def myfunc(x):
    # y value on the regression line for a given x
    return slope * x + intercept

Run each value of the x array through the function. This will result in a new array with new
values for the y-axis:

mymodel = list(map(myfunc, x))

Draw the original scatter plot:

plt.scatter(x, y)

Draw the line of linear regression:

plt.plot(x, mymodel)

Display the diagram:

plt.show()
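For the curious, the slope and intercept returned by stats.linregress() come from ordinary least squares: the line that minimizes the sum of squared vertical distances to the points. A plain-Python sketch of those formulas, assuming the same x and y arrays, is shown below; its results should match linregress().

```python
x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares for a straight line y = slope * x + intercept:
# slope = sum((xi - mean_x) * (yi - mean_y)) / sum((xi - mean_x) ** 2)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)

slope = sxy / sxx
intercept = mean_y - slope * mean_x   # the fitted line passes through (mean_x, mean_y)

print(round(slope, 3), round(intercept, 3))  # -1.751 103.106
```

Plugging x = 10 into this line gives slope * 10 + intercept, about 85.6.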

R for Relationship

It is important to know how strong the relationship between the x-axis values and the y-axis values is; if there is no relationship, linear regression cannot be used to predict anything.

This relationship - the coefficient of correlation - is called r.

The r value ranges from -1 to 1, where 0 means no relationship, and 1 (or -1) means a perfect linear relationship (positive or negative).

Python and the SciPy module will compute this value for you; all you have to do is feed it the x and y values.

Example
How well does my data fit in a linear regression?

from scipy import stats

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
slope, intercept, r, p, std_err = stats.linregress(x, y)

print(r)

Note: The result -0.76 shows that there is a relationship, not perfect, but it
indicates that we could use linear regression in future predictions.
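The r value that linregress() returns is Pearson's correlation coefficient. A plain-Python sketch of its formula, for the same 13 cars, reproduces the -0.76 above. Keep in mind that r only measures how well a straight line fits; a strong curved relationship can still give an r near 0.

```python
import math

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Pearson correlation: how x and y vary together, divided by how much each varies on its own.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)

print(round(r, 2))  # -0.76
```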

Predict Future Values

Now we can use the information we have gathered to predict future values.

Example: Let us try to predict the speed of a 10-year-old car.

To do so, we need the same myfunc() function from the example above:

def myfunc(x):
    # y value on the regression line for a given x
    return slope * x + intercept

Example
Predict the speed of a 10-year-old car:

from scipy import stats

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    # y value on the regression line for a given x
    return slope * x + intercept

speed = myfunc(10)

print(speed)

The example predicted a speed of 85.6, which we could also read from the diagram.
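NumPy offers an alternative to stats.linregress() when you only need the line itself: numpy.polyfit() with degree 1 fits a straight line by least squares and returns the slope and intercept. Working with NumPy arrays also lets us predict several ages at once instead of calling myfunc() one value at a time. The ages in the array below are illustrative.

```python
import numpy as np

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

# Fit a degree-1 polynomial (a straight line); coefficients come back highest power first.
slope, intercept = np.polyfit(x, y, 1)

ages = np.array([2, 10, 17])
predicted = slope * ages + intercept   # vectorized: one predicted speed per age

print(predicted.round(1))
```

Predictions far outside the ages we measured (2 to 17 years) are extrapolations and should be treated with caution.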
Bad Fit?

Let us create an example where linear regression would not be the best method to predict future
values.

Example
These values for the x- and y-axis should result in a very bad fit for linear
regression:

import matplotlib.pyplot as plt

from scipy import stats

x = [89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40]
y = [21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15]

slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    # y value on the regression line for a given x
    return slope * x + intercept

mymodel = list(map(myfunc, x))

plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()

And the r for relationship?

Example
You should get a very low r value.

from scipy import stats

x = [89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40]
y = [21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15]

slope, intercept, r, p, std_err = stats.linregress(x, y)

print(r)

The result, 0.013, indicates a very weak relationship and tells us that this data set is not suitable for linear regression.
Task 1: Mean, Median, and Mode

1. Create a dataset of 20 random integers between 1 and 100.

2. Calculate and print the mean, median, and mode of the dataset.
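One detail that often trips people up in Task 1: NumPy's random integer functions exclude the upper bound, so to include 100 you pass 101. A sketch using the Generator interface (numpy.random.randint(1, 101, 20) behaves the same way):

```python
import numpy as np

rng = np.random.default_rng()
# integers() excludes the upper bound by default, so 101 is needed to allow 100.
data = rng.integers(1, 101, size=20)

print(data)
```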

Task 2: Standard Deviation and Percentiles

1. Calculate and print the standard deviation of the dataset created in Task 1.

2. Compute and print the 25th, 50th, and 75th percentiles.

Task 3: Data Distribution and Normal Distribution

1. Plot a histogram of the data from Task 1.

2. Generate a new dataset of 1000 values that follows a normal distribution (mean=50,
std_dev=10).

3. Plot a histogram of the normal distribution to see the bell curve.

Task 4: Scatter Plot

1. Generate two sets of random data with 50 points each.

2. Plot these data points on a scatter plot to observe any relationship.

Task 5: Linear Regression

1. Using the x and y datasets from Task 4, perform a linear regression.

2. Plot the scatter plot along with the regression line.

Activity Name 

Group No. 

Student Roll No.


No. | CLO | PLO | Domain + Taxonomy | Criteria | Awarded Score (out of 4 for each cell)
1 | 5 | 5 | P3 | Operational Skills for Anaconda/Python/Spyder |
