Data Science Lab 5

Uploaded by

Tayyaba Faisal


Bahria University, Islamabad Campus

Department of Computer Science

CSL487: Introduction to Data Science Lab

Class: BSCS-6A

Lab 5: Prediction of data in Python

Date: 5-3-2020

Time: 8:30 AM-11:00 AM

Instructor: Tayyaba Faisal


Table of Contents

Features Data Type
Scikit-learn
Linear Regression
    How to Find the Regression Equation
    How to Use the Regression Equation
    How to Find the Coefficient of Determination
K Means
LAB TASKS
Lab 5: Prediction of data in Python

Introduction

The purpose of this lab is to become familiar with data science in Python. In this lab we explore
prediction techniques on data in Python through examples. I encourage you to type all Python
commands on your own machine.

Tools/Software Requirement
Python, Jupyter Notebook

Note: Comment your program.

Features Data Type

There are four basic types of data:
1. Numeric

Measurable quantities stored as integer or real values

2. Nominal

Categories, states, or “names of things”

e.g., Hair_color = {auburn, black, blond, brown, grey, red, white}, occupation, ID
numbers, zip codes
3. Binary

Nominal attribute with only 2 states (0 and 1)

e.g., gender, medical test (positive vs. negative)
4. Ordinal

Values have a meaningful order (ranking)

e.g., Size = {small, medium, large}, grades, army rankings
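As a rough illustration of how these four types can be represented in pandas, the sketch below builds a small table and declares the nominal and ordinal columns explicitly. The column names and values are hypothetical and are not taken from any lab dataset.

```python
import pandas as pd

# Hypothetical toy records; column names and values are illustrative only.
df = pd.DataFrame({
    "height_cm": [170.2, 158.0, 181.5],        # numeric
    "hair_color": ["black", "brown", "red"],   # nominal
    "test_positive": [True, False, True],      # binary
    "size": ["small", "large", "medium"],      # ordinal
})

# Nominal: categories with no inherent order.
df["hair_color"] = df["hair_color"].astype("category")

# Ordinal: categories with an explicit ranking small < medium < large.
size_type = pd.CategoricalDtype(["small", "medium", "large"], ordered=True)
df["size"] = df["size"].astype(size_type)

print(df.dtypes)
print(df["size"].cat.codes.tolist())  # integer rank of each size value
```

Declaring the order lets comparisons and sorting respect the ranking, which plain strings would not.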

Scikit-learn
Scikit-learn is probably the most useful library for machine learning in Python. Built on
NumPy, SciPy and matplotlib, it contains many efficient tools for
machine learning and statistical modeling, including classification, regression,
clustering and dimensionality reduction.

Linear Regression

Linear regression is a predictive modeling technique. It is used whenever there is a linear relation
between the dependent and the independent variables.

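y = b0 + b1 * x

where y is the dependent variable, x is the independent variable, b0 is the intercept and b1 is
the slope.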
It is used to estimate how much y changes when x changes by a given amount.

Figure: scatter plot with a flower’s sepal length on the x-axis and its petal length on the
y-axis.

How to Find the Regression Equation

In the table below, the xi column shows scores on the aptitude test. Similarly, the yi column
shows statistics grades. The last two columns show deviation scores: the difference between the
student's score and the average score on each test. The last two rows show the sums and mean scores
that we will use to conduct the regression analysis.

Student   xi    yi    (xi - x̄)   (yi - ȳ)

1         95    85     17          8
2         85    95      7         18
3         80    70      2         -7
4         70    65     -8        -12
5         60    70    -18         -7
Sum      390   385
Mean      78    77

And for each student, we also need to compute the squares of the deviation scores (the last two
columns in the table below).

Student   xi    yi    (xi - x̄)²   (yi - ȳ)²

1         95    85    289          64
2         85    95     49         324
3         80    70      4          49
4         70    65     64         144
5         60    70    324          49
Sum      390   385    730         630
Mean      78    77
And finally, for each student, we need to compute the product of the deviation scores.

Student   xi    yi    (xi - x̄)(yi - ȳ)

1         95    85    136
2         85    95    126
3         80    70    -14
4         70    65     96
5         60    70    126
Sum      390   385    470
Mean      78    77

The regression equation is a linear equation of the form: ŷ = b0 + b1x . To conduct a regression
analysis, we need to solve for b0 and b1. Computations are shown below. Notice that all of our
inputs for the regression analysis come from the above three tables.

First, we solve for the regression coefficient (b1):

b1 = Σ [ (xi - x̄)(yi - ȳ) ] / Σ [ (xi - x̄)² ]

b1 = 470/730

b1 = 0.644

Once we know the value of the regression coefficient (b1), we can solve for the regression
intercept (b0):

b0 = ȳ - b1 * x̄

b0 = 77 - (0.644)(78)

b0 = 26.768

Therefore, the regression equation is: ŷ = 26.768 + 0.644x .
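The hand computation above can be checked with a few lines of plain Python. This is a minimal sketch using the five (xi, yi) pairs from the tables; the variable names are chosen here for illustration.

```python
# Aptitude scores (x) and statistics grades (y) from the tables above.
x = [95, 85, 80, 70, 60]
y = [85, 95, 70, 65, 70]

n = len(x)
x_mean = sum(x) / n  # 78.0
y_mean = sum(y) / n  # 77.0

# Sum of products of deviations and sum of squared x deviations.
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
sxx = sum((xi - x_mean) ** 2 for xi in x)

b1 = sxy / sxx             # slope
b0 = y_mean - b1 * x_mean  # intercept

print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")
```

The intercept printed here (about 26.781) differs slightly from 26.768 because the hand calculation rounds b1 to 0.644 before computing b0, while the code keeps full precision.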

How to Use the Regression Equation

Once you have the regression equation, using it is a snap. Choose a value for the independent
variable (x), perform the computation, and you have an estimated value (ŷ) for the dependent
variable.

In our example, the independent variable is the student's score on the aptitude test. The dependent
variable is the student's statistics grade. If a student made an 80 on the aptitude test, the estimated
statistics grade (ŷ) would be:

ŷ = b0 + b1x

ŷ = 26.768 + 0.644x = 26.768 + 0.644 * 80

ŷ = 26.768 + 51.52 = 78.288
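The same fit and prediction can be sketched with scikit-learn's LinearRegression. This assumes scikit-learn and NumPy are installed; the variable names are illustrative. Because the library does not round b1 to three decimals, its intercept comes out near 26.78 rather than 26.768, and the prediction agrees with the hand result to about two decimal places.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Aptitude scores must be a 2-D array: one row per student, one column per feature.
X = np.array([95, 85, 80, 70, 60]).reshape(-1, 1)
y = np.array([85, 95, 70, 65, 70])

model = LinearRegression()
model.fit(X, y)

print("intercept b0:", model.intercept_)
print("slope b1:", model.coef_[0])

# Estimated statistics grade for an aptitude score of 80.
predicted = model.predict(np.array([[80]]))[0]
print("prediction for x = 80:", predicted)
```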

When you use a regression equation, do not use values for the independent variable that are
outside the range of values used to create the equation. That is called extrapolation, and it can
produce unreasonable estimates.

In this example, the aptitude test scores used to create the regression equation ranged from 60 to
95. Therefore, only use values inside that range to estimate statistics grades. Using values outside
that range (less than 60 or greater than 95) is problematic.
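One way to enforce this rule in code is to refuse predictions outside the fitted range. The sketch below uses the rounded coefficients from this example; the function name and the choice to raise an error, rather than only warn, are assumptions made for illustration.

```python
# Coefficients and x range taken from the worked example in this lab.
B0, B1 = 26.768, 0.644
X_MIN, X_MAX = 60, 95

def predict_grade(aptitude_score):
    """Estimate the statistics grade, rejecting values that would extrapolate."""
    if not X_MIN <= aptitude_score <= X_MAX:
        raise ValueError(
            f"{aptitude_score} is outside the fitted range [{X_MIN}, {X_MAX}]"
        )
    return B0 + B1 * aptitude_score

print(round(predict_grade(80), 3))

try:
    predict_grade(50)
except ValueError as err:
    print("refused:", err)
```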

How to Find the Coefficient of Determination

Whenever you use a regression equation, you should ask how well the equation fits the data. One
way to assess fit is to check the coefficient of determination, which can be computed from the
following formula.

R² = { ( 1 / N ) * Σ [ (xi - x̄) * (yi - ȳ) ] / (σx * σy ) }²

where N is the number of observations used to fit the model, Σ is the summation symbol, xi is the
x value for observation i, x̄ is the mean x value, yi is the y value for observation i, ȳ is the mean y
value, σx is the standard deviation of x, and σy is the standard deviation of y.

Computations for the sample problem of this lesson are shown below. We begin by computing
the standard deviation of x (σx):

σx = sqrt [ Σ ( xi - x̄ )² / N ]

σx = sqrt( 730/5 ) = sqrt(146) = 12.083

Next, we find the standard deviation of y, (σy):

σy = sqrt [ Σ ( yi - ȳ )² / N ]

σy = sqrt( 630/5 ) = sqrt(126) = 11.225

And finally, we compute the coefficient of determination (R²):

R² = { ( 1 / N ) * Σ [ (xi - x̄) * (yi - ȳ) ] / (σx * σy ) }²

R² = [ ( 1/5 ) * 470 / ( 12.083 * 11.225 ) ]²

R² = ( 94 / 135.632 )² = ( 0.693 )² = 0.48

A coefficient of determination equal to 0.48 indicates that about 48% of the variation in statistics
grades (the dependent variable) can be explained by the relationship to math aptitude scores
(the independent variable). This would be considered a good fit to the data, in the sense that it
would substantially improve an educator's ability to predict student performance in statistics
class.
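The coefficient of determination for this example can be reproduced in plain Python by following the formula step by step. This is a small sketch; the variable names are illustrative.

```python
import math

# Aptitude scores (x) and statistics grades (y) from the worked example.
x = [95, 85, 80, 70, 60]
y = [85, 95, 70, 65, 70]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Population standard deviations (divide by N, as in the formulas above).
sigma_x = math.sqrt(sum((xi - x_mean) ** 2 for xi in x) / n)
sigma_y = math.sqrt(sum((yi - y_mean) ** 2 for yi in y) / n)

# Correlation coefficient r, then R squared.
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
r = (sxy / n) / (sigma_x * sigma_y)
r_squared = r ** 2

print(f"sigma_x = {sigma_x:.3f}, sigma_y = {sigma_y:.3f}")
print(f"r = {r:.3f}, R^2 = {r_squared:.2f}")
```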

K Means
K-Means can be summarized as below:
• Initialisation – K initial “means” (centroids) are generated at random
• Assignment – K clusters are created by associating each observation with the nearest
centroid
• Update – The centroid of each cluster becomes the new mean

The Assignment and Update steps are repeated until the centroids no longer change.
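The three steps above can be sketched directly in NumPy. This is an illustrative implementation, not the one scikit-learn uses: the function name k_means, the toy points, and the choice to initialise the centroids by picking K random observations are all assumptions made here.

```python
import numpy as np

def k_means(points, k, n_iter=100, seed=0):
    """Plain K-Means: initialise, then alternate assignment and update."""
    rng = np.random.default_rng(seed)
    # Initialisation: pick k distinct observations as the starting centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment: label each point with the index of its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy data: two visually separate groups of points.
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])
centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

Because the starting centroids are random, different seeds can give different cluster numbering, and on harder data they can give different final clusters.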

Next, we cluster data using the K-Means algorithm in Python. As always, we need to start by
importing the required libraries.
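import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
# make_blobs is imported from sklearn.datasets; the samples_generator path no longer exists
from sklearn.datasets import make_blobs
# The class name is KMeans (capital M)
from sklearn.cluster import KMeans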

In this tutorial, we’ll generate our own data using the make_blobs function from
the sklearn.datasets module. The centers parameter specifies the number of clusters.
# Generate 300 two-dimensional points grouped around 4 centers
X, y = make_blobs(n_samples=300, centers=4, cluster_std=0.60,
                  random_state=0)
# Plot the raw, unlabeled points
plt.scatter(X[:, 0], X[:, 1])
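Once the data is generated, fitting the model might look like the sketch below. It assumes scikit-learn is installed; n_clusters=4 matches the centers value used above, while n_init=10 and random_state=0 are illustrative choices rather than settings given in this lab.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Regenerate the same data so this block runs on its own.
X, y = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Fit K-Means with 4 clusters and get a cluster label for every point.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)  # one (x, y) centroid per cluster
print(kmeans.inertia_)          # sum of squared distances to the nearest centroid
```

The labels array can be passed as the c argument of plt.scatter to colour each point by its cluster.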

LAB TASKS

Task 1
List the data type of each feature in bigsales.csv, choosing from the four types described above.
Feature Name        Data Type

Task 2
The Housing dataset contains information collected by the U.S. Census Service concerning
housing in the area of Boston, Mass. There are 506 samples and 13 feature variables in this
dataset. The objective is to predict house prices from the given features
using linear regression. The following features will be considered for the regression:
LSTAT: Percentage of lower status of the population
MEDV: Median value of owner-occupied homes in $1000s

Task 3

Import the bigmart-sales dataset (download from Piazza)

• Cluster the data using K-Means

Deliverables: Submit Python files as zip archive before the next lab along with lab journal.
