
Data Analytics With Python Lecture 1

Uploaded by

Sukant Tekade


WEEK 01

Lecture 1: Introduction to data analytics

Objective of the course
• The principal focus of this course is to build conceptual understanding through simple and practical examples rather than a repetitive, point-and-click mentality
• This course should make you comfortable using analytics in your career and your life
• You will know how to work with real data; you may have learned many different methodologies, but choosing the right methodology is what matters
• The danger in using quantitative methods does not generally lie in the inability to perform the calculation
• The real threat is a lack of fundamental understanding of:
– Why to use a particular technique or procedure
– How to use it correctly, and
– How to correctly interpret the results

Learning objectives
1. Define data and its importance
2. Define data analytics and its types
3. Explain why analytics is important in today’s business environment
4. Explain how statistics, analytics and data science are interrelated
5. Why Python?
6. Explain the four different levels of Data:
– Nominal
– Ordinal
– Interval and
– Ratio
1. Define data and its importance
• Variable, Measurement and Data
• What is generating so much data?
• How does data add value to the business?
• Why is data important?
1.1 Variable, Measurement and Data
• Variable – a characteristic of any entity being studied that is capable of taking on different values
• Measurement – a standard process used to assign numbers to particular attributes or characteristics of a variable
• Data – recorded measurements
1.2 What is generating so much data?
• Data can be generated by
– Humans,
– Machines or
– Human-machine combinations
• It can be generated anywhere where any information is generated and stored in structured or
unstructured formats
1.4 Why is data important?
• Data helps one make better decisions
• Data helps one solve problems by finding the reason for underperformance
• Data helps one evaluate performance
• Data helps one improve processes
• Data helps one understand consumers and the market
2. Define data analytics and its types
• Define data analytics
• Why is analytics important?
• Data analysis
• Data analytics vs. Data analysis
• Types of Data analytics
2.1. Define data analytics
• Analytics is defined as “the scientific process of transforming data into insights for making better
decisions”
• Analytics is the use of data, information technology, statistical analysis, quantitative methods, and mathematical or computer-based models to help managers gain improved insight about their business operations and make better, fact-based decisions – James Evans
• Analysis = Analytics?
2.2 Why is analytics important?
• Opportunity abounds for the use of analytics and big data such as:
1. Determining credit risk
2. Developing new medicines
3. Finding more efficient ways to deliver products and services
4. Preventing fraud
5. Uncovering cyber threats
6. Retaining the most valuable customers
2.3 Data analysis
• Data analysis is the process of examining, transforming, and arranging raw data in a specific way to
generate useful information from it
• Data analysis allows for the evaluation of data through analytical and logical reasoning to lead to
some sort of outcome or conclusion in some context
• Data analysis is a multi-faceted process that involves a number of steps, approaches, and diverse
techniques

2.4 Data analytics vs. Data analysis
• Analysis ≠ Analytics
• Data analysis ≠ Data analytics
• Business analysis ≠ Business analytics
2.5 Classification of Data analytics
Based on the phase of workflow and the kind of analysis required, there are four major types of data
analytics.
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics
Descriptive Analytics
• Descriptive analytics is the conventional form of business intelligence and data analysis
• It seeks to provide a depiction or “summary view” of facts and figures in an understandable format
• This either informs or prepares data for further analysis
• Descriptive analysis or statistics can summarize raw data and convert it into a form that can be easily understood by humans
• It can describe in detail an event that has occurred in the past
Example
A common example of descriptive analytics is company reports that simply provide a historic review, such as:
• Data queries
• Reports
• Descriptive statistics
• Data visualization
• Data dashboards
Diagnostic analytics
• Diagnostic Analytics is a form of advanced analytics which examines data or content to answer the
question “Why did it happen?”
• Diagnostic analytical tools aid an analyst to dig deeper into an issue so that they can arrive at the
source of a problem
• In a structured business environment, tools for descriptive and diagnostic analytics go hand in hand
Example
• It uses techniques such as:
1. Data discovery
2. Data mining
3. Correlations
Predictive analytics
• Predictive analytics helps to forecast trends based on current events
• Predictive analytical models can estimate the probability of an event happening in the future, or the time at which it will happen
• Many different but co-dependent variables are analysed to predict a trend in this type of analysis

Example
• A set of techniques that use a model constructed from past data to predict the future or to ascertain the impact of one variable on another:
1. Linear regression
2. Time series analysis and forecasting
3. Data mining

Prescriptive analytics
• A set of techniques to indicate the best course of action
• It tells what decision to make to optimize the outcome
• The goal of prescriptive analytics is to enable:
1. Quality improvements
2. Service enhancements
3. Cost reductions
4. Increased productivity
Prescriptive analytics: Example
• Optimization model
• Simulation
• Decision analysis
3. Explain why analytics is important
• Demand for data analytics
• Elements of data analytics

4. Data analyst and Data scientist

• The requisite skill set
• Difference between Data analyst and Data scientist
6. Explain the four different levels of Data
• Types of Variables
• Levels of Data Measurement
• Compare the four different levels of Data: Nominal, Ordinal, Interval and Ratio
• Usage Potential of Various Levels of Data
• Data Level, Operations, and Statistical Methods
6.2 Levels of Data Measurement
• Nominal: lowest level of measurement
• Ordinal
• Interval
• Ratio: highest level of measurement
6.3.1 Nominal
• A nominal scale classifies data into distinct categories in which no ranking is implied
• Example: Gender, Marital Status

6.3.2 Ordinal scale

• An ordinal scale classifies data into distinct categories in which ranking is implied
• Example:
– Product satisfaction: Satisfied, Neutral, Unsatisfied
– Faculty rank: Professor, Associate Professor, Assistant Professor
– Student grades: A, B, C, D, F
6.3.3. Interval scale
• An interval scale is an ordered scale in which the difference between measurements is a
meaningful quantity but the measurements do not have a true zero point.
• Example – Temperature in Fahrenheit and Celsius – Y
6.3.4 Ratio scale
• A ratio scale is an ordered scale in which the difference between the measurements is a
meaningful quantity and the measurements have a true zero point.
• Example – Weight – Age – Salary
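
The practical difference between an interval scale and a ratio scale can be checked with a little arithmetic. The sketch below is illustrative only. The temperatures and weights are made-up values, and the kilogram-to-pound factor is an approximate constant. It shows that a ratio of two temperatures changes when the unit changes (no true zero), while a ratio of two weights does not.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a Celsius reading to Fahrenheit."""
    return celsius * 9 / 5 + 32


# Interval scale: temperature (hypothetical readings).
c_low, c_high = 10.0, 20.0
f_low, f_high = celsius_to_fahrenheit(c_low), celsius_to_fahrenheit(c_high)

ratio_celsius = c_high / c_low        # looks like "twice as hot"
ratio_fahrenheit = f_high / f_low     # same two temperatures, different ratio

# Differences stay comparable: a 10 degree C gap is always an 18 degree F gap.
gap_fahrenheit = f_high - f_low

# Ratio scale: weight (hypothetical values, approximate conversion factor).
KG_TO_LB = 2.20462
w_low_kg, w_high_kg = 30.0, 60.0
ratio_kg = w_high_kg / w_low_kg
ratio_lb = (w_high_kg * KG_TO_LB) / (w_low_kg * KG_TO_LB)

print("temperature ratio in C:", ratio_celsius)
print("temperature ratio in F:", ratio_fahrenheit)
print("weight ratio in kg:", ratio_kg)
print("weight ratio in lb:", ratio_lb)
```

Because the weight ratio survives a change of unit, statements such as "twice as heavy" are meaningful, whereas "twice as hot" in Celsius is not.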

Lecture 2: Python – Fundamentals

Learning objectives:
1. Installing Python
2. Fundamentals of Python
3. Data Visualisation
Python Installation
Installation Process
Step 1: Type https://www.anaconda.com in the address bar of a web browser.
Step 2: Click on the download button
Step 3: Download the Python 3.7 version for Windows
Step 4: Double click on file to run the application
Step 5: Follow the instructions until completion of installation process

About Jupyter Notebook

• Command mode allows you to edit the notebook as a whole
• To leave edit mode and return to command mode, press the Escape key
• Execution (three ways)
o Ctrl+Enter (runs the cell and keeps it selected)
o Shift+Enter (runs the cell and moves to the next cell)
o Run button on the Jupyter interface
• A comment line is written with a preceding # symbol.

About Jupyter Notebook: important shortcut keys (in command mode)
• A -> Create a cell above
• Y -> Change to a code cell
• B -> Create a cell below
• D + D -> Delete the cell
• M -> Change to a Markdown cell
Fundamentals of Python
• Loading a simple delimited data file
• Counting how many rows and columns were loaded
• Determining which type of data was loaded
• Looking at different parts of the data by subsetting rows and columns

GET THE NUMBER OF ROWS AND COLUMNS

GET COLUMN NAMES

GET THE DTYPE OF EACH COLUMN

PANDAS TYPES VERSUS PYTHON TYPES
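
The slide headings above refer to code that did not survive in these notes. The following is a minimal sketch of those steps, assuming pandas is installed. The tab-separated text is a hypothetical stand-in for the course data file, with made-up values and only the column names that the notes mention (`country`, `continent`, `year`, `lifeExp`).

```python
import io

import pandas as pd

# Hypothetical tab-delimited sample; the values are made up for illustration.
TSV_TEXT = (
    "country\tcontinent\tyear\tlifeExp\n"
    "Country A\tAsia\t1952\t40.0\n"
    "Country A\tAsia\t2007\t60.0\n"
    "Country B\tEurope\t1952\t50.5\n"
    "Country B\tEurope\t2007\t70.5\n"
)

# Load a simple delimited file; sep="\t" says the fields are tab-separated.
df = pd.read_csv(io.StringIO(TSV_TEXT), sep="\t")

# Get the number of rows and columns (shape is an attribute, not a method).
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# Get the column names.
column_names = list(df.columns)
print(column_names)

# Get the dtype of each column. Whole numbers load as int64 and decimals
# as float64; text columns appear as object or a string dtype, depending
# on the pandas version.
print(df.dtypes)
```

With a real file on disk, the `io.StringIO(...)` argument would be replaced by the file path.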

Looking At Columns, Rows, and Cells

• Get the country column and save it to its own variable

# Show the first 5 observations

# Show the last 5 observations

# Looking at country, continent, and year
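
The comments above describe code that is missing from these notes. A small sketch of what they refer to follows, assuming pandas and a hypothetical seven-row data frame with made-up values. The variable names are illustrative.

```python
import pandas as pd

# Hypothetical data; only the column names come from the notes.
df = pd.DataFrame(
    {
        "country": [f"Country {letter}" for letter in "ABCDEFG"],
        "continent": ["Asia", "Asia", "Europe", "Europe", "Africa", "Africa", "Asia"],
        "year": [1952] * 7,
        "lifeExp": [40.0, 45.0, 50.0, 55.0, 60.0, 65.0, 70.0],
    }
)

# Get the country column and save it to its own variable.
country_df = df["country"]

# Show the first 5 observations.
first_five = country_df.head()
print(first_five)

# Show the last 5 observations.
last_five = country_df.tail()
print(last_five)

# Looking at country, continent, and year: pass a list of column names.
subset = df[["country", "continent", "year"]]
print(subset.head())
```

A single name in brackets returns one column, while a list of names returns a data frame with only those columns.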

Lecture 3: Python – Fundamentals – II

With iloc, we can pass in -1 to get the last row, something we couldn’t do with loc, because loc looks up index labels rather than positions.
Subsetting Columns
• The Python slicing syntax uses a colon, :
• If we have just a colon, the slice refers to everything.
• So, if we just want to get some columns using the loc or iloc syntax, we can write something like df.loc[:, [columns]] to subset the column(s).

# subset columns with loc
# note the position of the colon
# it is used to select all rows
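
A short sketch of the `loc` and `iloc` behaviour described above, assuming pandas and a hypothetical three-row data frame with the default integer index:

```python
import pandas as pd

# Hypothetical data with the default index labels 0, 1, 2.
df = pd.DataFrame(
    {
        "country": ["Country A", "Country B", "Country C"],
        "continent": ["Asia", "Europe", "Africa"],
        "year": [1952, 1952, 1952],
        "lifeExp": [40.0, 50.0, 60.0],
    }
)

# iloc is position-based, so -1 means "the last row".
last_row = df.iloc[-1]

# loc is label-based; no row is labelled -1 here, so the lookup fails.
try:
    df.loc[-1]
    loc_accepts_minus_one = True
except KeyError:
    loc_accepts_minus_one = False

# Subset columns with loc: the colon before the comma selects all rows.
subset_loc = df.loc[:, ["year", "lifeExp"]]

# The same columns by integer position with iloc.
subset_iloc = df.iloc[:, [2, 3]]

print(last_row)
print(subset_loc)
```

Both subsets hold the same data. `loc` names the columns, while `iloc` depends on their order in the frame.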
Grouped Means
# For each year in our data, what was the average life expectancy?
# To answer this question,
# we need to split our data into parts by year;
# then we get the 'lifeExp' column and calculate the mean
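
The split, select, and average steps in the comments above map onto a single pandas expression. This is a minimal sketch with hypothetical values, chosen so the means are easy to check by hand.

```python
import pandas as pd

# Hypothetical data: three countries observed in two years.
df = pd.DataFrame(
    {
        "country": ["Country A", "Country B", "Country C"] * 2,
        "year": [1952, 1952, 1952, 2007, 2007, 2007],
        "lifeExp": [40.0, 50.0, 60.0, 60.0, 70.0, 80.0],
    }
)

# Split by year, take the 'lifeExp' column, then average within each group.
mean_life_exp = df.groupby("year")["lifeExp"].mean()
print(mean_life_exp)
```

The result is a Series indexed by year, so `mean_life_exp.loc[1952]` returns the average for that year.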
Visual Representation of the Data
• Histogram -- vertical bar chart of frequencies
• Frequency Polygon -- line graph of frequencies
• Ogive -- line graph of cumulative frequencies
• Pie Chart -- proportional representation for categories of a whole
• Stem and Leaf Plot
• Pareto Chart
• Scatter Plot
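
The frequencies behind a histogram and the cumulative frequencies behind an ogive can be computed before any plotting. The sketch below uses only the standard library. The values and the class width of 5 are hypothetical choices for illustration.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical observations.
values = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
BIN_WIDTH = 5

# Map each value to the lower bound of its class interval, then count.
counts = Counter((value // BIN_WIDTH) * BIN_WIDTH for value in values)

# Note: a class with no observations would simply be absent here.
bin_starts = sorted(counts)
frequencies = [counts[start] for start in bin_starts]   # histogram heights
cumulative = list(accumulate(frequencies))              # ogive heights

# Text rendering: one row per class, one '#' per observation.
for start, freq, cum in zip(bin_starts, frequencies, cumulative):
    label = f"{start}-{start + BIN_WIDTH - 1}"
    print(f"{label:>5} {'#' * freq:<6} frequency={freq} cumulative={cum}")
```

Plotting `frequencies` as bars gives the histogram, joining them with a line gives the frequency polygon, and plotting `cumulative` as a line gives the ogive.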
Principles of Excellent Graphs
• The graph should not distort the data
• The graph should not contain unnecessary adornments (sometimes referred to as chart junk)
• The scale on the vertical axis should begin at zero
• All axes should be properly labelled
• The graph should contain a title
• The simplest possible graph should be used for a given set of data
Lecture 4: Central Tendency and Dispersion
Lecture objectives
• Central tendency
• Measures of Dispersion

Measures of Central Tendency

• Measures of central tendency yield information about “particular places or locations in a group of
numbers.”
• A single number to describe the characteristics of a set of data
Summary statistics
• Central tendency or measures of location
– Arithmetic mean – Weighted mean – Median – Percentile
• Dispersion
– Skewness – Kurtosis – Range – Interquartile range – Variance – Standard score – Coefficient of
variation
Arithmetic Mean
• Commonly called ‘the mean’
• It is the average of a group of numbers
• Applicable for interval and ratio data
• Not applicable for nominal or ordinal data
• Affected by each value in the data set, including extreme values
• Computed by summing all values in the data set and dividing the sum by the number of values in
the data set
Weighted Average
• Sometimes we wish to average numbers, but we want to assign more importance, or weight, to
some of the numbers.
• The average you need is the weighted average.
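
A minimal sketch of both averages follows. The scores and weights are hypothetical, and `weighted_mean` is an illustrative helper rather than a library function.

```python
from statistics import mean


def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical exam scores, with the last exam counting the most.
scores = [80, 90, 70]
weights = [2, 3, 5]

simple_average = mean(scores)                      # every score counts equally
weighted_average = weighted_mean(scores, weights)  # (160 + 270 + 350) / 10

print(simple_average)
print(weighted_average)
```

The weighted average is pulled towards 70 because that score carries half of the total weight.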

Median
• Middle value in an ordered array of numbers
• Applicable for ordinal, interval, and ratio data
• Not applicable for nominal data
• Unaffected by extremely large and extremely small values
Median: Computational Procedure
• First Procedure
– Arrange the observations in an ordered array
– If there is an odd number of terms, the median is the middle term of the ordered array
– If there is an even number of terms, the median is the average of the middle two terms
• Second Procedure
– The median’s position in an ordered array is given by (n+1)/2.
Median: Example with an Odd Number of Terms
Ordered Array 3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22
• There are 17 terms in the ordered array.
• Position of median = (n+1)/2 = (17+1)/2 = 9
• The median is the 9th term, 15.
• If the 22 is replaced by 100, the median is 15.
• If the 3 is replaced by -103, the median is 15.
Median: Example with an Even Number of Terms
Ordered Array 3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21
• There are 16 terms in the ordered array
• Position of median = (n+1)/2 = (16+1)/2 = 8.5
• The median is between the 8th and 9th terms, 14.5
• If the 21 is replaced by 100, the median is 14.5
• If the 3 is replaced by -88, the median is 14.5
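
The two worked examples can be reproduced in a few lines. The sketch below applies the (n+1)/2 position rule through an illustrative helper, `median_by_position`, and cross-checks it against `statistics.median` from the standard library.

```python
from statistics import median

odd_array = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
even_array = odd_array[:-1]   # the same array without the final 22


def median_by_position(ordered):
    """Median of an already ordered list, using the (n + 1) / 2 position."""
    position = (len(ordered) + 1) / 2      # 1-based, may end in .5
    lower = ordered[int(position) - 1]     # term at the whole-number position
    if position == int(position):
        return lower                       # odd count: the middle term
    upper = ordered[int(position)]         # even count: the next term
    return (lower + upper) / 2


median_odd = median_by_position(odd_array)
median_even = median_by_position(even_array)

# Replacing an extreme value leaves the median unchanged.
median_with_outlier = median(odd_array[:-1] + [100])

print(median_odd, median_even, median_with_outlier)
```

The outlier check mirrors the slide's point that the median is unaffected by extremely large or small values.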
Mode
• The most frequently occurring value in a data set
• Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)
• Bimodal -- Data sets that have two modes
• Multimodal -- Data sets that contain more than two modes
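
Because the mode only counts occurrences, it also works for nominal data. A short sketch with hypothetical values, using `statistics.multimode` (available from Python 3.8), which returns every value tied for the highest count:

```python
from statistics import multimode

# Hypothetical nominal data: two categories tie, so the set is bimodal.
marital_status = ["single", "married", "married", "single", "divorced"]
modes_nominal = multimode(marital_status)

# Hypothetical numeric data with a single mode.
ratings = [3, 4, 4, 5]
modes_numeric = multimode(ratings)

print(modes_nominal)
print(modes_numeric)
```

A result with two entries indicates a bimodal data set, and more than two indicates a multimodal one.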

Percentiles
• Measures of central tendency that divide a group of data into 100 parts
• Example: 90th percentile indicates that at most 90% of the data lie below it, and at least 10% of
the data lie above it
• The median and the 50th percentile have the same value
• Applicable for ordinal, interval, and ratio data
• Not applicable for nominal data
Percentiles: Computational Procedure
• Organize the data into an ascending ordered array
• Calculate the pth percentile location: $i = \frac{p}{100}(n)$
• Determine the percentile’s location and its value.
• If i is a whole number, the percentile is the average of the values at the i and (i+1) positions
• If i is not a whole number, the percentile is at the position given by the whole-number part of (i+1) in the ordered array
Percentiles: Example
• Raw Data: 14, 12, 19, 23, 5, 13, 28, 17
• Ordered Array: 5, 12, 13, 14, 17, 19, 23, 28
• Location of 30th percentile: $i = \frac{30}{100}(8) = 2.4$
• The location index, i, is not a whole number; i + 1 = 2.4 + 1 = 3.4; the whole-number portion is 3; the 30th percentile is at the 3rd location of the array; the 30th percentile is 13.
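
The computational procedure above can be written as a short function. This is a sketch of that specific rule, assuming an integer p strictly between 0 and 100. The name `percentile_by_location` is illustrative, and library routines such as `statistics.quantiles` use different interpolation rules, so they may return slightly different values.

```python
def percentile_by_location(data, p):
    """p-th percentile using the location rule i = (p / 100) * n."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")

    ordered = sorted(data)
    n = len(ordered)

    # Integer arithmetic: i = whole + remainder / 100, with no rounding error.
    whole, remainder = divmod(p * n, 100)

    if remainder == 0:
        # i is a whole number: average the values at positions i and i + 1.
        return (ordered[whole - 1] + ordered[whole]) / 2

    # i is not whole: take the whole-number part of i + 1 as the position.
    # The 1-based position whole + 1 is the 0-based index whole.
    return ordered[whole]


raw_data = [14, 12, 19, 23, 5, 13, 28, 17]
p30 = percentile_by_location(raw_data, 30)   # i = 2.4, so the 3rd value
p50 = percentile_by_location(raw_data, 50)   # i = 4, so average of 4th and 5th

print(p30, p50)
```

The 50th percentile from this rule equals the median of the same data, in line with the statement that the two have the same value.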
Dispersion
• Measures of variability describe the spread or the dispersion of a set of data
• Reliability of measure of central tendency
• To compare dispersion of various samples

Measures of Variability or dispersion

Common Measures of Variability
• Range
• Inter-quartile range
• Mean Absolute Deviation
• Variance
• Standard Deviation
• Z scores
• Coefficient of Variation
Range – ungrouped data
• The difference between the largest and the smallest values in a set of
data
• Simple to compute
• Ignores all data points except the two extremes
• Example: Range = Largest – Smallest = 48 - 35 = 13
Quartiles
• Measures of central tendency that divide a group of data into four
subgroups
• Q1: 25% of the data set is below the first quartile
• Q2: 50% of the data set is below the second quartile
• Q3: 75% of the data set is below the third quartile
• Q1 is equal to the 25th percentile
• Q2 is located at 50th percentile and equals the median
• Q3 is equal to the 75th percentile
• Quartile values are not necessarily members of the data set.

Interquartile Range
• Range of values between the first and third quartiles
• Range of the “middle half”
• Less influenced by extremes
• Interquartile range: IQR = Q3 - Q1
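
A minimal sketch of the range, the quartiles, and the interquartile range follows, using hypothetical data and the standard library. It relies on `statistics.quantiles` with its default method, which is an assumption. That method interpolates between neighbouring values, so its quartiles can differ from those given by the percentile location rule in these notes.

```python
from statistics import median, quantiles

# Hypothetical observations.
data = [5, 12, 13, 14, 17, 19, 23, 28]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(data, n=4)

# Interquartile range: the spread of the middle half of the data.
iqr = q3 - q1

print("range:", data_range)
print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr)
```

Changing the largest value from 28 to 280 would change the range sharply but leave the interquartile range untouched.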
Uses of Standard Deviation
• Indicator of financial risk
• Quality Control
– construction of quality control charts
– process capability studies
• Comparing populations
– household incomes in two cities
– employee absenteeism at two plants
Lecture 5: Central Tendency and Dispersion- II

Coefficient of Variation
• Ratio of the standard deviation to the mean, expressed as a percentage
• Measurement of relative dispersion

$C.V. = \frac{\sigma}{\mu}(100)$
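
The coefficient of variation formula can be sketched with the standard library as follows. The values are hypothetical and are treated as a whole population, so the population standard deviation (`pstdev`) matches the σ in the formula.

```python
from statistics import mean, pstdev

# Hypothetical population values.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(values)         # population mean
sigma = pstdev(values)    # population standard deviation

# C.V. = (sigma / mu) * 100, a unit-free percentage.
cv_percent = 100 * sigma / mu

print("mean:", mu)
print("standard deviation:", sigma)
print("coefficient of variation (%):", cv_percent)
```

Because the result has no units, it can be used to compare the relative dispersion of data sets measured on different scales.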
Measures of Shape
• Skewness
– Absence of symmetry
– Extreme values in one side of a distribution
• Kurtosis
– Peakedness of a distribution
– Leptokurtic: high and thin
– Mesokurtic: normal shape
– Platykurtic: flat and spread out
• Box and Whisker Plots
– Graphic display of a distribution
– Reveals skewness

Skewness
The skewness of a distribution is measured by comparing the relative positions of the mean, median and mode.
• Distribution is symmetrical
– Mean = Median = Mode
• Distribution skewed right
– Median lies between mode and mean, and mode is less than mean
• Distribution skewed left
– Median lies between mode and mean, and mode is greater than mean
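
The comparison of mean, median and mode can be sketched as a small function. This encodes only the rule of thumb stated above, not a formal skewness coefficient. The function name and the three data sets are hypothetical.

```python
from statistics import mean, median, mode


def skew_direction(values):
    """Classify skew from the relative positions of mean, median and mode."""
    avg, mid, most_common = mean(values), median(values), mode(values)
    if avg == mid == most_common:
        return "symmetrical"
    if most_common < mid < avg:
        return "skewed right"   # a long right tail pulls the mean upward
    if avg < mid < most_common:
        return "skewed left"    # a long left tail pulls the mean downward
    return "inconclusive"       # the three measures do not line up


right_tail = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
left_tail = [16 - value for value in right_tail]   # mirror image of the above
balanced = [1, 2, 3, 3, 3, 4, 5]

print(skew_direction(right_tail))
print(skew_direction(left_tail))
print(skew_direction(balanced))
```

Real data rarely gives exactly equal values for the three measures, so the "symmetrical" branch is best read as an idealised case.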
Box and Whisker Plot
• Five specific values are used:
– Median, Q2
– First quartile, Q1
– Third quartile, Q3
– Minimum value in the data set
– Maximum value in the data set
