Descriptive Statistics Applications: Pavan Kumar A
APPLICATIONS
Pavan Kumar A
INTRODUCTION TO DATA CLEANING
Data Wrangling is the process of
transforming raw data into consistent
data that can be analyzed.
Data cleaning is one of the primary pain
points of data science.
Data scientists spend 80% of their data
analysis time cleaning data.[1]
Summary Statistics
Tabular Statistics
Graphical Statistics
Summary Statistics
Source: http://www.tulane.edu/~panda2/Analysis2/datclean/stats_with_errors.html
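As a minimal sketch of how summary statistics can expose an entry error, the Python snippet below computes the count, minimum, maximum, mean and median of one variable. The function name summarize and the sample ages are hypothetical and are not taken from the source slides; the value 230 is an assumed typing error for 23.

```python
import statistics

def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

# Hypothetical ages in years; 230 is an assumed data-entry error for 23.
ages = [21, 25, 23, 230, 22, 24]
summary = summarize(ages)
print(summary)
```

An impossible maximum, or a mean far from the median, is a hint that at least one value deserves a second look.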
ERROR DETECTION
Descriptive Statistics : Graphical Analysis (Scatter Plot)
Some errors appear only when two variables are compared with each other.
Source: http://www.tulane.edu/~panda2/Analysis2/datclean/stats_with_errors.html
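A scatter plot is inspected by eye. As a rough numeric stand-in (a swapped-in technique, not the method shown on the slide), the sketch below fits a least-squares line through two variables and flags the points that lie far from it. The function names, the threshold k=2.0 and the age and weight data are all hypothetical. Note that the suspicious weight of 4 kg is within the normal range of the weight column on its own; it only looks wrong next to an age of 7 months.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def flag_pairs(xs, ys, k=2.0):
    """Return indices of points whose residual exceeds k residual standard deviations."""
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    spread = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > k * spread]

# Hypothetical data: age in months and weight in kg.
ages_months = [1, 2, 3, 4, 5, 6, 7, 8]
weights_kg = [4, 5, 6, 7, 8, 9, 4, 11]  # the pair (7, 4) is an assumed error

suspects = flag_pairs(ages_months, weights_kg)
print(suspects)
```

A large error also pulls the fitted line towards itself, so this check is only a first pass and does not replace looking at the plot.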
ERROR DETECTION
Descriptive Statistics : Tabular Analysis (Frequency)
Frequencies help locate 'dirty' data (an unequal distribution) among the entered
variables.
Example 2: Baby ages
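The following sketch builds a frequency table with Python's collections.Counter. The ages are made up for illustration and are not the data from the slide; the function name frequency_table is hypothetical, and the value 60 is an assumed entry error.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count) pairs sorted by value."""
    return sorted(Counter(values).items())

# Hypothetical baby ages in months; 60 is an assumed data-entry error.
baby_ages_months = [3, 4, 4, 5, 3, 4, 60, 5, 4, 3]

table = frequency_table(baby_ages_months)
for value, count in table:
    print(f"{value:>3}  {count}")
```

A value that occurs only once and sits far from the rest of the table is a natural candidate for checking against the original record.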
ERROR DETECTION
Logic Checks
We can often detect errors in data simply by seeing if the responses are logical.
Example
We would expect to see 100% of responses, not 110%.
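A logic check of this kind can be automated. The sketch below tests whether a set of percentage shares adds up to 100%, allowing a small tolerance for rounding. The function name, the tolerance of 0.5 and the sample shares are hypothetical.

```python
import math

def percentages_are_consistent(percentages, tolerance=0.5):
    """Return True if the shares add up to 100% within a rounding tolerance."""
    return math.isclose(sum(percentages), 100.0, abs_tol=tolerance)

# Hypothetical response shares for one survey question.
valid_shares = [45.0, 35.0, 20.0]    # adds up to 100
invalid_shares = [45.0, 35.0, 30.0]  # adds up to 110, so something is wrong

print(percentages_are_consistent(valid_shares))
print(percentages_are_consistent(invalid_shares))
```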
3. Best way: set outliers to the mean (for multiple-variable analysis), for
normally distributed data values.
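One possible reading of this step is sketched below. The slides do not say how an outlier is identified, so the 1.5 * IQR rule used here is an assumption, as are the function name and the sample data. Outliers are replaced with the mean of the remaining values, so that the error itself does not distort the replacement.

```python
import statistics

def replace_outliers_with_mean(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the mean of the rest."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in values if low <= v <= high]
    inlier_mean = statistics.mean(inliers)
    return [v if low <= v <= high else inlier_mean for v in values]

# Hypothetical ages in years; 230 is an assumed data-entry error.
ages = [21, 25, 23, 230, 22, 24, 26, 20]
cleaned = replace_outliers_with_mean(ages)
print(cleaned)
```

The order of the records is preserved, which keeps each cleaned value aligned with the other variables of the same row in a multiple-variable analysis.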
THANK YOU !!!!