Data Mining and Predictive Modelling Assignment

*For practice purpose

Uploaded by

rkumar25022000


DMDW Lab using PYTHON
5th Semester
Department of Computer Science and Engineering
GIET University, Gunupur

ASSIGNMENT 1
MEASURES OF CENTRAL TENDENCY
A measure of central tendency describes a set of data by identifying the
central position around which the other values are clustered.
The three common measures of central tendency are:
1. Mean
2. Median
3. Mode
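MEAN
The mean is the sum of all values divided by the number of values.
Because every value contributes to it, the mean is sensitive to outliers.
In the salary table below the mean is 30.7k, although eight of the ten
staff earn between 12k and 18k.

Staff     1    2    3    4    5    6    7    8    9   10
Salary  15k  18k  16k  14k  15k  15k  12k  17k  90k  95k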
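
The effect of the two large salaries on the mean can be checked with a short sketch. The values come from the table above (in thousands); the 50k cut-off used to separate the two outliers is an assumption chosen only for illustration.

```python
# Salaries from the table above, in thousands.
salaries_k = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]


def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)


mean_all = mean(salaries_k)

# Assumption for illustration: treat salaries of 50k or more as outliers.
typical = [s for s in salaries_k if s < 50]
mean_typical = mean(typical)

print("Mean of all salaries:", mean_all)
print("Mean without the two outliers:", mean_typical)
```

The two outliers roughly double the mean, which is why the mean alone is a poor summary of this data set.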
MEDIAN
The median is the middle score of a data set that has been arranged in
order of magnitude. It is less affected by outliers and skewed data than
the mean. To calculate the median, suppose we have the data below:
Ex-1) 65 55 89 56 35 14 56 55 87 45 92
We first rearrange the data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89 92
With 11 scores, the median is the middle (6th) score, which is 56.
Ex-2) 65 55 89 56 35 14 56 55 87 45
We again rearrange the data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89
With 10 scores there is no single middle score, so we take the 5th and
6th scores (55 and 56) and average them to get a median of 55.5.
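
Both worked examples can be reproduced with a small function. This is a minimal sketch; the function name `median` and the variable names are chosen here for illustration.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)          # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the single middle score
    # even count: average the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2


ex1 = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]   # 11 scores
ex2 = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]       # 10 scores

median_ex1 = median(ex1)
median_ex2 = median(ex2)
print("Ex-1 median:", median_ex1)
print("Ex-2 median:", median_ex2)
```

Using `sorted` rather than `list.sort` leaves the caller's list unchanged.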
MODE
The mode is the most frequent score in our data set. On a bar chart or
histogram it corresponds to the highest bar (fig-1).

[Figures omitted: Fig-1, Fig-2, Fig-3]

Normally, the mode is used for categorical data where we wish to know
which is the most common category, as illustrated in fig-2.
However, one problem with the mode is that it is not necessarily unique:
two or more values can share the highest frequency, as in fig-3.
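
The case of several values sharing the highest frequency can be illustrated with `statistics.multimode` from the Python standard library (available from Python 3.8). The commute categories below are made-up sample data; the scores reuse the first eight sorted values from the median example.

```python
from statistics import multimode

# Hypothetical categorical data: "bus" and "car" both occur twice.
commute = ["bus", "car", "bus", "train", "car"]
commute_modes = multimode(commute)

# Numeric data in which 55 and 56 both occur twice.
scores = [14, 35, 45, 55, 55, 56, 56, 65]
score_modes = multimode(scores)

print("Most common categories:", commute_modes)
print("Most common scores:", score_modes)
```

`multimode` returns every value that reaches the highest frequency, in the order each first appears in the data, so a tie is reported rather than hidden.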
SKEWED DISTRIBUTIONS
An example of a normally distributed set of data is presented below.

[Figure omitted: normally distributed data]

• In a symmetrical distribution with a single peak, such as the normal
distribution, the mean, median and mode are equal.
• The mean is widely preferred as the measure of central tendency in this
case because it includes all the values in the data set in its
calculation.
SKEWED DISTRIBUTIONS (CONTD.)
However, when our data is skewed, as with the right-skewed data set
below:

[Figure omitted: right-skewed data]

• The median is generally considered to be the best representative of the
central location of the data.
• The more skewed the distribution, the greater the difference between
the median and the mean.
• The more skewed the data, the greater the emphasis that should be placed
on using the median rather than the mean.
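
The growing gap between mean and median can be seen in a small sketch. The three data sets are made up for illustration: they differ only in how far the largest value is stretched to the right.

```python
from statistics import mean, median

# Hypothetical samples; only the largest value changes.
samples = {
    "symmetric": [1, 2, 3, 4, 5],
    "mildly right-skewed": [1, 2, 3, 4, 10],
    "strongly right-skewed": [1, 2, 3, 4, 50],
}

gaps = {}
for name, values in samples.items():
    # The median stays at 3 while the mean is pulled toward the long tail.
    gaps[name] = mean(values) - median(values)
    print(name, "mean:", mean(values), "median:", median(values),
          "gap:", gaps[name])
```

The median is the same in all three samples, while the mean moves with the extreme value.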
SUMMARY OF WHEN TO USE THE MEAN, MEDIAN AND MODE

Use the following summary table to find the best measure of central
tendency for each type of variable.

Type of Variable               Best measure of central tendency
Nominal                        Mode
Ordinal                        Median
Interval/Ratio (not skewed)    Mean
Interval/Ratio (skewed)        Median
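
The summary table can also be written as a small lookup function. This is only a sketch of the table's rule; the function name `best_measure` and the string labels are hypothetical and chosen for illustration.

```python
def best_measure(variable_type, skewed=False):
    """Return the measure suggested by the summary table.

    variable_type: "nominal", "ordinal" or "interval/ratio" (labels assumed here).
    skewed: only matters for interval/ratio variables.
    """
    kind = variable_type.lower()
    if kind == "nominal":
        return "mode"
    if kind == "ordinal":
        return "median"
    if kind == "interval/ratio":
        return "median" if skewed else "mean"
    raise ValueError("unknown variable type: " + variable_type)


print(best_measure("nominal"))
print(best_measure("interval/ratio", skewed=True))
```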
VARIANCE AND STANDARD DEVIATION
EXAMPLE
The ages of you and your friends are 25, 26, 27, 30, and 32.
First, we find the mean age: (25 + 26 + 27 + 30 + 32) / 5 = 28.
Then, we calculate the difference from the mean for each of the 5
friends:
25 - 28 = -3
26 - 28 = -2
27 - 28 = -1
30 - 28 = 2
32 - 28 = 4
Next, to calculate the variance, we square each difference from the
mean and then average the results:
Variance = ((-3)^2 + (-2)^2 + (-1)^2 + 2^2 + 4^2) / 5
         = (9 + 4 + 1 + 4 + 16) / 5 = 6.8
The variance is 6.8. The standard deviation is the square root of the
variance, which is approximately 2.61. Because we divide by 5, the
number of values, this is the population variance.
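
The same steps can be followed in code. This is a minimal sketch that repeats the hand calculation on the five ages and then cross-checks it against `statistics.pvariance` and `statistics.pstdev` from the standard library, which compute the population variance and population standard deviation.

```python
import math
import statistics

ages = [25, 26, 27, 30, 32]

# Step 1: mean age.
mean_age = sum(ages) / len(ages)

# Step 2: difference of each age from the mean.
deviations = [age - mean_age for age in ages]

# Step 3: square the differences and average them (population variance).
variance = sum(d ** 2 for d in deviations) / len(ages)

# Step 4: the standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print("Mean:", mean_age)
print("Deviations:", deviations)
print("Variance:", variance)
print("Standard deviation:", round(std_dev, 2))

# Cross-check with the standard library.
print("pvariance:", statistics.pvariance(ages))
print("pstdev:", round(statistics.pstdev(ages), 2))
```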
PRACTICE-1
Write Python code for the following statistical operations, with and
without library functions:
✔ Mean
✔ Median
✔ Mode
✔ Standard Deviation
✔ Variance
MEAN WITHOUT LIBRARY FUNCTION

# Mean without using library

n_num = [1, 2, 3, 4, 5]
n = len(n_num)
get_sum = sum(n_num)
mean = get_sum / n
print("Mean / Average is: " + str(mean))
MEDIAN WITHOUT LIBRARY FUNCTION
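# Median without using library
n_num = [1, 2, 3, 4, 5]
n = len(n_num)
n_num.sort()  # the median is defined on sorted data
if n % 2 == 0:
    # even count: average the two middle values
    median1 = n_num[n // 2]
    median2 = n_num[n // 2 - 1]
    median = (median1 + median2) / 2
else:
    # odd count: take the single middle value
    median = n_num[n // 2]
print("Median is: " + str(median))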
MODE WITHOUT LIBRARY FUNCTION
# Python program to print mode of elements
# Counter is part of the standard library; no statistics package is used
from collections import Counter

n_num = [1, 2, 3, 4, 5, 5]
n = len(n_num)
data = Counter(n_num)  # maps each value to its frequency
max_count = max(data.values())
mode = [k for k, v in data.items() if v == max_count]
if len(mode) == n:
    # every value occurs exactly once, so no value stands out
    get_mode = "No mode found"
else:
    get_mode = "Mode is / are: " + ', '.join(map(str, mode))
print(get_mode)
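MEAN, MEDIAN, STANDARD DEVIATION AND VARIANCE WITH LIBRARY FUNCTION
import numpy

speed = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
x = numpy.mean(speed)    # arithmetic mean
y = numpy.median(speed)  # middle value of the sorted data
s = numpy.std(speed)     # population standard deviation (ddof=0 by default)
v = numpy.var(speed)     # population variance (ddof=0 by default)
print(x)
print(y)
print(s)
print(v)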
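
By default, numpy.std and numpy.var divide by n (ddof=0), which gives the population figures; passing ddof=1 divides by n - 1 and gives the sample figures. The sketch below shows the same distinction on the speed data using only the standard library, so it runs without NumPy installed: `statistics.pvariance` divides by n and `statistics.variance` divides by n - 1.

```python
import statistics

speed = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
n = len(speed)

# Population variance: squared deviations averaged over n.
pop_var = statistics.pvariance(speed)

# Sample variance: squared deviations divided by n - 1.
sample_var = statistics.variance(speed)

print("Population variance:", pop_var)
print("Sample variance:", sample_var)

# The two differ only by the factor n / (n - 1).
print("Ratio:", sample_var / pop_var, "expected:", n / (n - 1))
```

Which one to report depends on whether the 13 speeds are the whole population of interest or a sample drawn from a larger one.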
MODE WITH LIBRARY FUNCTION
from scipy import stats
speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]
x = stats.mode(speed)
print(x)
ANACONDA PLATFORM
• Anaconda Individual Edition is the world's most popular Python
distribution platform with over 20 million users worldwide.
• Anaconda Navigator is a desktop graphical user interface (GUI) included
in the Anaconda distribution that allows users to launch applications and
manage conda packages, environments and channels without using
command-line commands.
Anaconda Installation Steps
BENEFITS OF USING PYTHON ANACONDA
• It is free and open-source.
• It provides more than 1500 Python/R data science packages for download.
• It creates an environment that is easily manageable for deploying any
project.
• Manage libraries, dependencies, and environments with conda.
• Build and train ML and deep learning models with scikit-learn,
TensorFlow and Theano.
• Use Dask, NumPy, Pandas and Numba to analyze data scalably and fast.
• Perform visualization with Matplotlib, Bokeh, Datashader, and Holoviews.
THE JUPYTER NOTEBOOK
The Jupyter Notebook is an open-source web application that
allows you to create and share documents that contain live code,
equations, visualizations and narrative text.
Uses include: data cleaning and transformation, numerical
simulation, statistical modeling, data visualization, machine
learning, and much more.
INTRODUCTION TO GOOGLE-COLAB
Colaboratory, or 'Colab' for short, allows you to write and execute Python in your
browser, with
✔ Zero configuration required
✔ Free access to GPUs
✔ Easy sharing
Advantages
• It runs the same tasks and code that Jupyter Notebook executes, using
Python 3 (Colab no longer provides Python 2 runtimes).
• It is the Google Docs of code. A notebook can be shared and edited in
real time by different team members, who can add comments, see the edit
history and go back to previous versions, as in Google Docs.
• No more Anaconda. It is all cloud-based and does not require any setup
or installation. If a library you want to use is not available on Colab,
install it with pip as usual; it is installed into the notebook's virtual
environment.
• Personalization. Add your own shortcuts, choose a light, dark or
adaptive theme, and change fonts.
• Playground mode. With two clicks you can open a new notebook that won't
be saved and try different code options without affecting your original
code.
ASSIGNMENT-1 QUESTION
1. Write Python code to find the mean, median and mode, with and without
using library functions.
2. Write Python code to calculate the variance and standard deviation of
a set of elements, with and without using library functions.
3. Practice some basic Python programs with List, Tuple, Dictionary and
String.
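
As a starting point for question 3, the sketch below touches each of the four types once. The variable names and values are made up for illustration and are not part of the assignment.

```python
# List: ordered and mutable.
marks = [65, 55, 89]
marks.append(56)      # add an element at the end
marks.sort()          # sort in place, smallest first

# Tuple: ordered and immutable; it can be unpacked into variables.
point = (3, 4)
x, y = point

# Dictionary: maps keys to values.
salary_k = {"staff1": 15, "staff2": 18}
salary_k["staff3"] = 16   # insert a new key

# String: immutable sequence of characters.
title = "measures of central tendency"
words = title.split()     # split on whitespace into a list of words

print(marks)
print(x + y)
print(sorted(salary_k))
print(title.title(), len(words))
```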
