Unit 4

Uploaded by

abernakumari87

AD3301 DATA EXPLORATION AND VISUALIZATION

UNIT - 4
BIVARIATE ANALYSIS
Presented By
Dr R Murugadoss
Professor
Artificial Intelligence & Data Science
- Relationships between Two Variables
- Percentage Tables
- Analyzing Contingency Tables
- Handling Several Batches
- Scatterplots and Resistant Lines
- Transformations
Two variables are related if knowing one gives you information
about the other. For example, height and weight are related;
people who are taller tend to be heavier.

Association Examples:
◦ Smoking is associated with heart disease.
◦ Weight is associated with height.
◦ Income is associated with education.
Both variables are categorical. We analyze an association by comparing conditional
proportions (percentages) and summarize the data in a contingency table.
Examples of categorical variables are gender and class standing.
Both variables are quantitative. To analyze this situation we consider how one
variable, called a response variable, changes in relation to changes in the other variable
called an explanatory variable. Graphically we use scatterplots to display two
quantitative variables. Examples are age, height, weight (i.e. things that are measured).
One variable is categorical and the other is quantitative, for instance height and
gender. These are best compared by using side-by-side boxplots to display any
differences or similarities in the center and variability of the quantitative variable (e.g.
height) across the categories (e.g. Male and Female).
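The comparison of center and variability across categories can also be sketched numerically. The example below is a minimal illustration, assuming a small made-up dataset; the column names `gender` and `height` and all values are hypothetical. It computes the median and interquartile range per category, which are the quantities a side-by-side boxplot displays.

```python
import pandas as pd

# Hypothetical data: heights in centimetres for two categories.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "height": [175.0, 162.0, 180.0, 158.0, 170.0, 165.0],
})

# Group the quantitative variable by the categorical one.
grouped = df.groupby("gender")["height"]

# Median (center) and interquartile range (variability) per category.
summary = pd.DataFrame({
    "median": grouped.median(),
    "iqr": grouped.quantile(0.75) - grouped.quantile(0.25),
})
print(summary)
```

A larger gap between the medians than the typical spread within each group suggests that the quantitative variable differs across the categories.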
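The correlation coefficient measures the strength and direction of the linear
relationship between two quantitative variables. It ranges from -1 to 1:
-1 indicates a perfectly negative linear correlation between two variables
0 indicates no linear correlation between two variables
1 indicates a perfectly positive linear correlation between two variables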
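As a rough illustration of these values, the sketch below computes the Pearson correlation coefficient with `numpy.corrcoef`. The height and weight numbers are made up for the example and do not come from the slides.

```python
import numpy as np

# Hypothetical paired measurements (height in cm, weight in kg).
height = np.array([150.0, 160.0, 170.0, 180.0, 190.0])
weight = np.array([52.0, 58.0, 69.0, 77.0, 88.0])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry is the correlation between the two variables.
r = np.corrcoef(height, weight)[0, 1]

# An exact increasing linear relationship gives a coefficient of 1,
# and flipping the sign of one variable flips the sign of r.
r_perfect = np.corrcoef(height, 2 * height + 1)[0, 1]
r_negated = np.corrcoef(height, -weight)[0, 1]

print(r, r_perfect, r_negated)
```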
A percentage is calculated by dividing a value by the sum of all
the values and then multiplying the result by 100. The same
calculation applies to pandas DataFrames. Here, the built-in
sum() method of a pandas Series is used to compute the sum of
all the values in a column.
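A minimal sketch of this calculation follows. The DataFrame and its `store` and `sales` columns are hypothetical and serve only to illustrate the formula.

```python
import pandas as pd

# Hypothetical sales figures per store.
df = pd.DataFrame({
    "store": ["A", "B", "C", "D"],
    "sales": [50, 30, 15, 5],
})

# Divide each value by the column total, then multiply by 100.
df["percent"] = df["sales"] / df["sales"].sum() * 100

print(df)
```

The percentages in the new column add up to 100, which is a quick check that the total was computed over the intended column.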
Contingency Table is one of the techniques for exploring two or even
more variables. It is basically a tally of counts between two or more
categorical variables.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the loan data and show the first 10 rows
data = pd.read_csv("loan_status.csv")
print(data.head(10))
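Once the data is loaded, a contingency table and the matching percentage table can be built with `pandas.crosstab`. The sketch below uses a small inline DataFrame, because the columns of loan_status.csv are not shown in the source. The column names `gender` and `loan_status` and all values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical records; these columns are illustrative and are not
# taken from loan_status.csv.
data = pd.DataFrame({
    "gender": ["Male", "Male", "Female", "Female",
               "Male", "Female", "Male", "Female"],
    "loan_status": ["Y", "N", "Y", "Y", "Y", "N", "N", "Y"],
})

# Contingency table: counts for each combination of categories.
counts = pd.crosstab(data["gender"], data["loan_status"])

# Percentage table: each row is converted to percentages of its row total.
row_pct = pd.crosstab(data["gender"], data["loan_status"],
                      normalize="index") * 100

print(counts)
print(row_pct)
```

Row percentages answer the question "within each gender, what share has each loan status?". Passing `normalize="columns"` instead would condition on the loan status.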
Most analytics applications rely on batch processing, in which
data is processed in groups (batches) at scheduled intervals.
For example, aggregating daily sales by individual store and
writing the result to the data warehouse every night can make
business intelligence (BI) reporting queries run faster. Batch
systems must scale with the size of the dataset handled by each
job run. In exploratory data analysis, a batch can also mean a
group of observations on one variable; several such batches are
typically compared using side-by-side boxplots.
Scatterplots and Resistant Lines
Scatter Plot
In a scatter plot, the values of two variables are plotted along two axes, and the resulting pattern
reveals any correlation between the variables.
A scatter plot is also useful for assessing the strength of the relationship and for spotting
outliers in the data.

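import numpy
import matplotlib.pyplot as plt

# Draw 1000 random values for each variable (mean, standard deviation, size)
x = numpy.random.normal(5.0, 1.0, 1000)
y = numpy.random.normal(10.0, 2.0, 1000)

# Plot y against x and display the figure
plt.scatter(x, y)
plt.show()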

The scatter() function of matplotlib.pyplot draws the scatter plot; it takes the x and y
values as its first two arguments.
The resistant line basics

The eda_rline function fits a robust line through a bivariate dataset.

It does so by first breaking the data into three roughly equal sized
batches following the x-axis variable. It then uses the batches’ median
values to compute the slope and intercept.
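The three-batch idea described above can be sketched in Python as follows. This is a simplified single-pass version written for illustration. It is not the eda_rline implementation, and the function name `resistant_line` and the choice of the median residual as the intercept are assumptions. The comparison with `numpy.polyfit` shows how an ordinary least-squares fit reacts to the same outlier.

```python
import numpy as np

def resistant_line(x, y):
    """Simplified three-group resistant line (illustrative sketch).

    Assumes the left and right groups have different median x values.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Sort the points by x, then split them into three roughly equal batches.
    order = np.argsort(x)
    x, y = x[order], y[order]
    left, _, right = np.array_split(np.arange(len(x)), 3)

    # Medians of the outer batches determine the slope.
    x_left, y_left = np.median(x[left]), np.median(y[left])
    x_right, y_right = np.median(x[right]), np.median(y[right])
    slope = (y_right - y_left) / (x_right - x_left)

    # Intercept chosen as the median of the residuals (an assumption here).
    intercept = np.median(y - slope * x)
    return slope, intercept

# Hypothetical data on the line y = 2x + 1, with one outlier at the end.
x = np.arange(1, 10)
y = 2.0 * x + 1.0
y[8] = 100.0

slope, intercept = resistant_line(x, y)
ols_slope, ols_intercept = np.polyfit(x, y, 1)

print("resistant line:", slope, intercept)
print("least squares: ", ols_slope, ols_intercept)
```

Because the slope and intercept are built from medians, the single outlier does not move the resistant line, whereas the least-squares slope is pulled towards it.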
What is a Resistance Line? A Resistance line, sometimes also known
as a Speed Line, helps identify stock trends and levels of support and
resistance. Resistance lines are technical analysis tools used by
equity analysts and investors to determine the price trend of a specific
stock. This is a concept from financial charting and is distinct from
the resistant line described above.
Data transformation is the process of converting raw data into a format
or structure that is more suitable for model building and for data
discovery in general. It is an essential step in feature engineering
that makes insights easier to discover.
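One common transformation in exploratory analysis is the logarithm, which pulls in a long right tail. The sketch below is a minimal illustration on made-up values; the choice of a base-10 log is an assumption, and other transformations (square root, reciprocal) may suit other data better.

```python
import numpy as np

# Hypothetical right-skewed values spanning several orders of magnitude.
raw = np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])

# The base-10 logarithm spaces these values evenly.
logged = np.log10(raw)

# Before the transformation the mean sits far above the median (skew);
# afterwards the two agree.
print("raw:   ", np.mean(raw), np.median(raw))
print("logged:", np.mean(logged), np.median(logged))
```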
