
05-Data Exploring and Analysis

This document discusses techniques for exploring and analyzing grouped data, including performing statistical analysis on groups, iterating through groups, and applying aggregation, transformation, and filtration methods to extract useful insights from grouped data in Python. Key topics covered include common statistical analysis methods in Pandas like describe(), mean(), corr(), count(), and how to group data, iterate through groups, and apply aggregations, transformations, and filters to grouped data.

Uploaded by

Sabrina Sibarani
Copyright: © All Rights Reserved

IF2106 – Data Engineering

Data Exploring and Analysis (2)


• Statistical Analysis
• Data Grouping

Undergraduate

Computer Science
Overview

• Learn how to statistically analyze grouped data, iterate through groups, and apply aggregations, transformations, and filtration techniques
Objectives

Upon completion of this Unit, you are expected to be able to:

• Properly perform and practice data exploration and analysis techniques
Contents

a. Statistical Analysis

b. Data Grouping
Statistical Analysis
Data Analysis

• Pandas provides numerous methods for data analysis
• Also, you can define your own methods for specific statistical analysis
Statistical Analysis

• df.describe(): Summary statistics for numerical columns
• df.mean(): Returns the mean of all columns
• df.corr(): Returns the correlation between columns in a data frame
• df.count(): Returns the number of non-null values in each data frame column
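The four methods above can be tried on a small data frame. The following is a minimal sketch: the column names and values are hypothetical, invented only for illustration, and one value is deliberately missing to show how count() treats nulls.

```python
import pandas as pd

# Hypothetical sample data (not from the slides); one weight is missing on purpose.
df = pd.DataFrame({
    "height": [150.0, 160.0, 170.0, 180.0],
    "weight": [50.0, 60.0, None, 80.0],
})

summary = df.describe()    # count, mean, std, min, quartiles, max per numeric column
means = df.mean()          # mean of each column; missing values are skipped by default
correlations = df.corr()   # pairwise correlation matrix (Pearson by default)
non_null = df.count()      # number of non-null values in each column

print(summary)
print(means)
print(correlations)
print(non_null)
```

If a data frame also contains non-numeric columns, recent pandas versions may require passing numeric_only=True to mean() and corr().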
Statistical Analysis (Cont.)

• The correlation coefficient measures the degree to which the movements of two variables are associated
• The most common correlation coefficient, the Pearson correlation coefficient, measures the linear relationship between two variables
• However, in a nonlinear relationship, this correlation coefficient may not always be a suitable measure of dependence
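The limitation with nonlinear relationships can be illustrated with a small sketch. The data below is made up for this example: y_quadratic depends entirely on x, yet its Pearson coefficient with x comes out at approximately zero because the dependence is not linear.

```python
import pandas as pd

# Hypothetical values, symmetric around zero, chosen for illustration only.
x = pd.Series([-2.0, -1.0, 0.0, 1.0, 2.0])

y_linear = 3 * x + 1     # a perfectly linear function of x
y_quadratic = x ** 2     # fully determined by x, but not linearly

# Series.corr() uses the Pearson method by default.
r_linear = x.corr(y_linear)
r_quadratic = x.corr(y_quadratic)

print("linear:", r_linear)
print("quadratic:", r_quadratic)
```

Here the coefficient for the linear case is approximately 1, while the quadratic case is approximately 0 even though y_quadratic is completely determined by x.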
Statistical Analysis (Cont.)

• The range of values for the correlation coefficient is -1.0 to 1.0
• In other words, the values cannot exceed 1.0 or be less than -1.0; a correlation of -1.0 indicates a perfect negative correlation, and a correlation of 1.0 indicates a perfect positive correlation
• The correlation coefficient is denoted as r
• If its value is greater than zero, the relationship is positive; if the value is less than zero, the relationship is negative
• A value of zero indicates that there is no linear relationship between the two variables
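The range and sign of r can be checked with a short sketch. The column names and the helper describe_sign() are hypothetical, introduced only for this illustration.

```python
import pandas as pd

# Hypothetical data: "up" rises with x, "down" falls as x rises.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "up": [2, 4, 6, 8, 10],
    "down": [10, 8, 6, 4, 2],
})

r = df.corr()              # Pearson correlation matrix
r_up = r.loc["x", "up"]
r_down = r.loc["x", "down"]


def describe_sign(value):
    """Hypothetical helper: put the sign of a correlation coefficient into words."""
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "no linear relationship"


print(r_up, describe_sign(r_up))
print(r_down, describe_sign(r_down))
```

Because both columns are exact linear functions of x, the two coefficients sit at the ends of the range, approximately 1.0 and -1.0.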
Statistical Analysis (Cont.)

• df.max(): Returns the highest value in each column
• df.min(): Returns the lowest value in each column
• df.median(): Returns the median of each column
• df.std(): Returns the standard deviation of each column
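A minimal sketch of these four methods follows, using hypothetical columns a and b. Note that std() in pandas computes the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

highest = df.max()      # highest value in each column
lowest = df.min()       # lowest value in each column
middle = df.median()    # median of each column
spread = df.std()       # sample standard deviation (ddof=1) of each column

print(highest, lowest, middle, spread, sep="\n")
```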
Data Grouping
• You can split data into groups to perform more specific analysis over the data set
• Once you perform data grouping, you can compute summary statistics (aggregation), perform specific group operations (transformation), and discard data with some conditions (filtration)
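Splitting a data frame into groups can be sketched with groupby(). The team and score columns below are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical data: five scores belonging to two teams.
df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "score": [10, 20, 30, 40, 50],
})

grouped = df.groupby("team")    # split the rows by the value of "team"

n_groups = grouped.ngroups      # number of distinct groups
group_sizes = grouped.size()    # number of rows in each group

print(n_groups)
print(group_sizes)
```

The grouped object does no computation by itself; aggregation, transformation, or filtration is applied to it afterwards.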
Iterating Through Groups

• You can iterate through the groups of a groupby object
• You can also select a specific group using the get_group() method
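Both operations can be sketched as follows, assuming a hypothetical data frame with team and score columns. Iterating over a groupby object yields one (group name, sub data frame) pair per group.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "score": [10, 20, 30, 40, 50],
})

grouped = df.groupby("team")

# Iterate through the groups: each step gives the group key and its rows.
group_totals = {}
for name, group in grouped:
    group_totals[name] = int(group["score"].sum())

# Select a single group directly by its key.
team_b = grouped.get_group("B")

print(group_totals)
print(team_b)
```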
Aggregations

• Aggregation functions return a single aggregated value for each group
• Once the groupby object is created, you can implement various functions on the grouped data
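A minimal aggregation sketch follows, again on hypothetical team and score data. It shows one aggregation at a time and several at once through agg().

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "score": [10, 20, 30, 40, 50],
})

grouped = df.groupby("team")

# A single aggregation: one value per group.
mean_scores = grouped["score"].mean()

# Several aggregations at once: one row per group, one column per function.
stats = grouped["score"].agg(["sum", "mean", "max"])

print(mean_scores)
print(stats)
```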
Transformations

• A transformation on a group or a column returns an object that is indexed the same as, and is the same size as, the one being grouped
• Thus, the transform function should return a result that is the same size as the group chunk
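The size-preserving behavior can be seen in a short sketch. The data and the new column names team_mean and centered are hypothetical; the example subtracts each team's mean from the scores of its members.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "score": [10, 20, 30, 40, 50],
})

# transform() broadcasts each group's mean back to every row of that group,
# so the result has the same index and length as the original column.
team_mean = df.groupby("team")["score"].transform("mean")

df["team_mean"] = team_mean
df["centered"] = df["score"] - df["team_mean"]

print(df)
```

Unlike an aggregation, which would return two values here (one per team), the transformation returns five values, one per original row.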
Filtration

• Pandas provides direct filtering for grouped data through the filter() method, which keeps only the groups that satisfy a condition
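Filtration can be sketched in two ways, both on hypothetical team and score data: group-level filtering with the groupby filter() method, which keeps or discards whole groups, and row-level filtering with a boolean mask. The thresholds are arbitrary and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "score": [10, 20, 30, 40, 50],
})

# Group-level filtration: keep only the teams whose mean score exceeds 20.
# The function receives each group as a data frame and returns True or False.
strong_teams = df.groupby("team").filter(lambda g: g["score"].mean() > 20)

# Row-level filtering with a boolean mask, independent of any grouping.
high_scores = df[df["score"] >= 20]

print(strong_teams)
print(high_scores)
```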


Summary

This Unit covered how to explore and analyze data in different collection
structures. Here’s a recap of what was covered in this Unit:

• How to apply statistical analysis on the derived data from implementing Python data grouping, iterating through groups, aggregations, transformations, and filtration techniques
Discussion
