Data Analysis3

Uploaded by

ericgasper008

DATA ANALYSIS

• In today’s data-driven world, organizations rely on data analysis to
uncover patterns, trends, and relationships within their data.
• The term data analysis refers to the systematic application of statistical
and logical techniques to describe, summarize, and evaluate data.
• This process can involve transforming raw data into a more
understandable format, identifying significant patterns, and drawing
conclusions based on the findings.
• It essentially refers to the practice of examining datasets to draw
conclusions about the information they contain.
• It can also be defined as “the process of inspecting, cleaning,
transforming, and modeling data to discover useful information, draw
conclusions, and support decision-making.”
Methods of data analysis

• Descriptive analytics answers: ‘What is the current prevalence of HIV across the
country?’
• Diagnostic analytics answers: ‘Why are HIV patients stopping their medication?’
• Predictive analytics answers: ‘Which HIV patients are at risk of stopping their
medication in the near future?’
• Prescriptive analytics answers: ‘What actions can be taken to reduce the number of
HIV patients who stop taking their medication?’
• 1. Descriptive analytics
• Descriptive analytics focuses on answering the question, ‘What is
happening?’ or ‘What has happened?’ by analyzing past data.
• Of all the types of data analytics, this is the most straightforward
approach as it summarizes and simplifies the main features and
characteristics of complex datasets through interactive visualizations.
• 2. Predictive analytics
• Predictive analytics uses historical data to answer the question, ‘What
may happen next?’ This model is used to predict future outcomes, find
patterns, and identify risks or growth opportunities.
• While descriptive analytics serves as a reflective mirror, showing us a
holistic picture of our past activities, predictive analytics acts as a
crystal ball, providing a sneak peek into the future.
• 3. Prescriptive analytics
• Unlike predictive analytics, which focuses on future outcomes,
prescriptive analytics helps decision-makers identify the best course
of action to help them achieve their business goals.
• The primary goal of this model is to answer the question: ‘What
should we do?’
• 4. Diagnostic analytics
• Diagnostic analytics examines past data to identify the root causes
behind a particular outcome. This type of analytics aims to answer the
question, ‘Why did this happen?’
• It focuses on uncovering insights into historical data patterns,
anomalies, and correlations to facilitate a deeper understanding of a
particular business problem.
DATA ENTRY
• Data entry is the process of digitizing data by entering it into a
computer system for organization and management purposes.
• Data entry is often done with a keyboard and at times also using a
mouse, although a manually-fed scanner may be involved.
• Although most data entered into a computer are stored in a database,
a significant amount is stored in a spreadsheet. The use of
spreadsheets instead of databases for data entry can be traced to
1979.
TYPES OF DATA ENTRY
• Manual data entry
• This method involves individuals manually entering data using
keyboards or keypads. It is suitable for small data entry tasks or situations
where data is received in physical formats such as paper documents.
• Online data entry
• This type of data entry involves inputting data directly into online
forms or systems. It is commonly used for tasks such as online surveys
or customer registration.
DATA CLEANING
• Data cleaning refers to the process of fixing or removing incorrect,
corrupted, incorrectly formatted, duplicate, or incomplete data
within a dataset.
• There is no single way to prescribe the exact steps of the data
cleaning process because they vary from dataset to dataset.
Ways of cleaning data
• 1. Remove duplicate data
• 2. Fix structural errors, for example incorrect naming
• 3. Handle missing data
• 4. Filter unwanted outliers
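
The four cleaning steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the choice to fill a missing age with the median, and the 1.5 × IQR outlier rule are all hypothetical choices made for the example, not steps prescribed by the slides.

```python
from statistics import median, quantiles

# Hypothetical raw records: (patient_id, region, age). None marks a missing age.
raw = [
    (1, "North", 34),
    (2, "north ", 41),   # structural error: inconsistent naming
    (2, "north ", 41),   # exact duplicate of the previous row
    (3, "South", None),  # missing value
    (4, "South", 38),
    (5, "NORTH", 29),
    (6, "South", 45),
    (7, "North", 410),   # implausible age, likely a typing slip
]

# 1. Remove duplicate rows while keeping the original order.
deduped = list(dict.fromkeys(raw))

# 2. Fix structural errors: trim spaces and use one capitalisation for labels.
fixed = [(pid, region.strip().title(), age) for pid, region, age in deduped]

# 3. Handle missing data: here (an assumption) fill with the median known age.
known_ages = [age for _, _, age in fixed if age is not None]
fill = median(known_ages)
filled = [(pid, region, fill if age is None else age) for pid, region, age in fixed]

# 4. Filter unwanted outliers with the 1.5 * IQR rule (one common convention).
q1, _, q3 = quantiles([age for _, _, age in filled], n=4)
low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
clean = [row for row in filled if low <= row[2] <= high]

for row in clean:
    print(row)
```

Other choices are equally valid: missing rows can be dropped instead of filled, and outliers should only be removed when there is reason to believe they are errors.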
Characteristics of quality data
• Accuracy
• Completeness
• Consistency
• Uniformity
DATA SUMMARIZATION
• Data summarization refers to presenting a compact description of a
dataset. In other words, data summarization is the presentation of a
dataset in an easy, informative, and comprehensive manner.
• A data summary is carefully derived from the entire dataset and
reveals its significant patterns and trends in a clear manner.
Types of data summarization
1. Based on Centrality
• A dataset can be summarised on the basis of its centrality. The centrality of a
dataset describes its centre or middle value. In other
words, it identifies one central value around which all other values of
the dataset revolve. The other name for centrality is ‘average.’
• There are several ways to find the centrality of a dataset. However, the most
popular ones are the mean, mode and median. These three summarise
the distribution of the dataset.
• Mean
• Mean is used to calculate the numerical average of a dataset. The
arithmetic mean is calculated by adding all the values of the given
dataset and dividing the sum by the number of items therein. The
mathematical formula is as follows:
x̄ = ∑x / n
• Here, ‘x̄’ represents the mean,
‘∑’ represents ‘summation’,
‘x’ represents each value in the dataset, and
‘n’ represents ‘number of items’.
• For example: consider the following heights of 10 men in centimeters
(cm): 165, 167, 169, 169, 171, 173, 175, 176, 176, 169
• The mean height is calculated by adding the heights of the ten men
and dividing the sum by 10.
Arithmetic mean = (165 + 167 + 169 + 169 + 171 + 173 + 175 + 176 + 176
+ 169) / 10
x̄ = 1710 / 10 = 171 cm
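
The same calculation can be checked with a short Python sketch. The variable names are illustrative only; the data are the ten heights from the example above.

```python
from statistics import mean

heights_cm = [165, 167, 169, 169, 171, 173, 175, 176, 176, 169]

total = sum(heights_cm)   # the summation of all values
n = len(heights_cm)       # number of items
x_bar = total / n         # mean = sum of values / number of items

print(total, n, x_bar)    # 1710 10 171.0

# The standard library's mean() gives the same answer.
assert x_bar == mean(heights_cm)
```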
• Mode
• Mode refers to the most frequently occurring value in the sample. In other
words, it is the most frequent number in the given dataset.
The mode is used less often than the mean or median in statistical analysis.
• Although it can be calculated for any type of sample, it is mostly
used where the sample size is large or the given values are integers.
• Note that it is possible to have more than one mode. For example: in
the following set of numbers (8, 7, 8, 8, 9, 6, 5, 6, 4, 6, 7) the modes are
8 and 6, since each occurs three times in the dataset.
• This dataset is referred to as bimodal because it has two modes.
• It is also possible not to have a mode in a set of numbers.
For example: in the following set of numbers (5, 4, 9, 7, 6, 3, 8) there is
no number which occurs more frequently than any other; therefore,
there is no mode.
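
Both mode examples can be reproduced in Python. The helper `modes` below is a hypothetical name introduced for this sketch; it wraps the standard library's `statistics.multimode` (available from Python 3.8) and, following the convention used in the text, reports no mode when every distinct value is equally frequent.

```python
from statistics import multimode

def modes(values):
    """Return the list of modes, or an empty list when no value stands out."""
    candidates = multimode(values)
    # multimode() returns every value when all are equally frequent;
    # the text treats that situation as "no mode".
    if len(candidates) > 1 and len(candidates) == len(set(values)):
        return []
    return candidates

bimodal = [8, 7, 8, 8, 9, 6, 5, 6, 4, 6, 7]
no_repeats = [5, 4, 9, 7, 6, 3, 8]

print(modes(bimodal))     # [8, 6]
print(modes(no_repeats))  # []
```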
• Median
• Median refers to the middle value of the series when arranged in
ascending or descending order. With an even number of values, it is the
mean of the two middle values. When the distribution is normal, the
mean and median tend to coincide.
• For example, below is a series of durations (in days) of absence from
classes due to sickness: 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6, 6, 7, 8, 10,
10, 38, 80.
• The median duration is 5 days, the 11th of the 21 ordered values.
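
A brief Python sketch of the median calculation for the absence data follows. It also prints the mean of the same series for comparison, which is an addition of this sketch rather than something the slides compute: the two long absences (38 and 80 days) pull the mean up while the median is unaffected.

```python
from statistics import mean

days_absent = [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6, 6, 7, 8, 10, 10, 38, 80]

ordered = sorted(days_absent)
n = len(ordered)

# Odd count: take the middle value. Even count: average the two middle values.
if n % 2 == 1:
    middle = ordered[n // 2]
else:
    middle = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(n, middle)          # 21 5
print(mean(days_absent))  # 10
```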
2. Based on Dispersion
• The term ‘dispersion’ means ‘spread.’ To elaborate, dispersion describes
how scattered the sample values are around the mean. It shows the
variability present within the given data.
• Range
• The range is defined as the difference between the maximum value and the
minimum value. For example: if the lowest and highest of a series of
diastolic blood pressure readings are 65 mm Hg and 95 mm Hg, then the range =
95 - 65 = 30 mm Hg.
• The range is seldom used in statistical analysis because:
• It wastes information, since it uses only the two extreme
values.
• The two extreme values are the ones most likely to be faulty.
• The range tends to increase as the number of observations increases.
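
The range, and its sensitivity to a single faulty extreme value, can be illustrated with a small Python sketch. Only the lowest (65) and highest (95) readings come from the example above; the other readings and the mistyped value are hypothetical.

```python
# Diastolic blood pressure readings in mm Hg.
# Only 65 and 95 come from the text; the middle values are made up.
diastolic = [65, 72, 80, 88, 95]

value_range = max(diastolic) - min(diastolic)
print(value_range)  # 30

# One faulty extreme reading (a hypothetical typing slip: 195 instead of 95)
# changes the range completely, although the other readings are untouched.
with_error = diastolic + [195]
print(max(with_error) - min(with_error))  # 130
```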
• Standard Deviation (SD)
• Standard deviation is the most widely used measure of dispersion. It is
most informative for normally distributed data and shows how far the
values are spread from the mean.
• To rephrase, a large SD indicates that the data contain values far below
or far above the mean, so it gives an understanding of how scattered the
data are. It can be thought of as the typical (‘average’) deviation from the mean.
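• The formula for the SD of a sample is
SD = √( ∑(x − x̄)² / (n − 1) )
where ‘x̄’ is the mean and ‘n’ is the number of items. For a whole
population, divide by n instead of n − 1.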
• Variance
• The variance represents the amount of spread or variability around
the mean of a set of data. It is based on the squared deviations of the
values from the mean.
• Because the variance is in squared units, we take its square root, the
standard deviation, to describe our data in the original units.
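
The variance and standard deviation of the ten heights from the mean example can be worked out step by step. The slides do not say whether they intend the sample formula (dividing by n − 1) or the population formula (dividing by n), so this sketch shows both as an assumption-labelled illustration.

```python
from math import sqrt

heights_cm = [165, 167, 169, 169, 171, 173, 175, 176, 176, 169]

n = len(heights_cm)
x_bar = sum(heights_cm) / n                      # mean = 171.0

# Squared deviation of each value from the mean, then their sum.
squared_deviations = [(x - x_bar) ** 2 for x in heights_cm]
sum_of_squares = sum(squared_deviations)         # 134.0

# Sample version: divide by n - 1.
sample_variance = sum_of_squares / (n - 1)       # in cm squared
sample_sd = sqrt(sample_variance)                # back in cm

# Population version: divide by n.
population_variance = sum_of_squares / n
population_sd = sqrt(population_variance)

print(round(sample_variance, 2), round(sample_sd, 2))          # 14.89 3.86
print(round(population_variance, 2), round(population_sd, 2))  # 13.4 3.66
```

Python's `statistics` module offers `variance` and `stdev` for the sample versions and `pvariance` and `pstdev` for the population versions.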
Tools used in data analysis
• Numerous statistical software systems are currently available. The
commonly used ones are:
• Statistical Package for the Social Sciences (SPSS, developed by IBM
Corporation),
• Statistical Analysis System (SAS, developed by SAS Institute, North
Carolina, United States of America),
• R (created by Ross Ihaka and Robert Gentleman and now maintained by the R Core Team),
• Minitab (developed by Minitab Inc),
• Stata (developed by StataCorp), and
• MS Excel (developed by Microsoft).
• Briefly explain the methods of data analysis, with examples.
• With regard to data summarization, explain the types and calculate the
SD and variance of the following:

sn      1   2   3   4   5   6   7   8
Value   24  34  38  46  47  53  53  61
