Unit - 1 EDA

The document outlines the objectives and fundamentals of Exploratory Data Analysis (EDA), emphasizing its significance in understanding data through various techniques including univariate, bivariate, and multivariate analysis. It details the phases of data analysis, including data collection, processing, cleaning, and visualization, while highlighting the importance of data types and measurement scales. Additionally, it discusses the role of EDA in data science projects and the use of software tools for effective data exploration and visualization.


AD3301 - DATA EXPLORATION AND VISUALIZATION

L T P C
3 0 2 4

OBJECTIVES:

TO OUTLINE AN OVERVIEW OF EXPLORATORY DATA ANALYSIS.


TO IMPLEMENT DATA VISUALIZATION USING MATPLOTLIB.
TO PERFORM UNIVARIATE DATA EXPLORATION AND ANALYSIS.
TO APPLY BIVARIATE DATA EXPLORATION AND ANALYSIS.
TO USE DATA EXPLORATION AND VISUALIZATION TECHNIQUES FOR MULTIVARIATE
AND TIME SERIES DATA.

UNIT I EXPLORATORY DATA ANALYSIS    9

EDA FUNDAMENTALS – UNDERSTANDING DATA SCIENCE – SIGNIFICANCE OF EDA – MAKING SENSE OF DATA – COMPARING EDA WITH CLASSICAL AND BAYESIAN ANALYSIS – SOFTWARE TOOLS FOR EDA – VISUAL AIDS FOR EDA – DATA TRANSFORMATION TECHNIQUES – MERGING DATABASES, RESHAPING AND PIVOTING,
UNIT - 1
EXPLORATORY DATA
ANALYSIS
TOPIC - 1
EXPLORATORY DATA
ANALYSIS
FUNDAMENTALS
DATA?
DATA - Collection of objects,
events and facts in the form
of numbers, text, audio &
videos.
How do we get meaningful & useful information from data?
EXPLORATORY DATA
ANALYSIS (EDA)
EDA - Process of investigating datasets, explaining subjects, extracting the information enfolded in the data, and visualizing the outcomes
WHY IS EDA IMPORTANT?
• Just like everything in this world, data has its imperfections.
• Raw data is usually skewed, may have outliers, or may have too many missing values.
• A model built on such data results in sub-optimal performance.
TOPIC - 2
UNDERSTANDING
DATA SCIENCE
UNDERSTANDING DATA
SCIENCE
Data Science involves cross-disciplinary knowledge from data, statistics, computer science, and mathematics
DATA SCIENCE
PROJECT FLOW
Data Science Project Flow with EDA
as part of data preparation
PHASES OF DATA
ANALYSIS
CRISP-DM (Cross Industry Standard Process for Data Mining) is one of the frameworks in data mining.

1. DATA COLLECTION
2. DATA PROCESSING
3. DATA CLEANING
4. EDA
5. MODELLING & ALGORITHMS
STAGES OF DATA
ANALYSIS
STAGES (PILLARS) OF EDA

CRISP-DM (Cross Industry Standard Process for Data Mining) frameworks in data mining

1. DATA REQUIREMENTS
2. DATA COLLECTION
3. DATA PROCESSING
4. DATA CLEANING
5. EDA
6. MODELLING & ALGORITHMS
7. DATA PRODUCT
8. COMMUNICATION
DATA REQUIREMENTS
• There can be various sources of data for an organization.
• It is important to comprehend what type of data needs to be collected, curated, and stored for the organization.

DEMENTIA PATIENT - CASE STUDY

An application tracking the sleeping pattern of patients suffering from dementia requires what types of sensor data storage?
01. SLEEP DATA

02. HEART RATE FROM THE PATIENTS

03. ELECTRO-DERMAL ACTIVITIES

04. USER ACTIVITY PATTERNS

All of these data points are required to correctly diagnose the mental state of the person.
TYPES OF DATA

NUMERICAL (numbers)

CATEGORICAL (collection of information divided into groups)
DATA COLLECTION
• Data can be collected from several
objects on several events using
different types of sensors and storage
tools.
• Data collected from several sources
must be stored in the correct format
DATA PROCESSING
• Preprocessing involves the process of
pre-curating the dataset before actual
analysis.
• Common tasks involve correctly
exporting the dataset, placing them
under the right tables, structuring
them, and exporting them in the
correct format
DATA CLEANING
• Preprocessed data is still not ready for detailed
analysis.
• It must be correctly transformed for an
incompleteness check, duplicate check, error
check, and missing value check.
• Finding inaccuracies in the dataset,
understanding the overall data quality,
removing duplicate items, and filling in the
missing values.
• These tasks are performed in the data cleaning
stage.
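These cleaning tasks can be sketched in pandas; the patient records below are made-up illustration data, not from the course material:

```python
import pandas as pd
import numpy as np

# Hypothetical patient records with a duplicate row and missing values
df = pd.DataFrame({
    "patient_id": [1, 2, 2, 3, 4],
    "age": [34.0, 51.0, 51.0, np.nan, 46.0],
    "weight": [70.0, 82.5, 82.5, 64.0, np.nan],
})

df = df.drop_duplicates()                   # duplicate check: remove duplicate items
print(df.isna().sum())                      # incompleteness check: missing values per column
df["age"] = df["age"].fillna(df["age"].median())         # fill in missing ages
df["weight"] = df["weight"].fillna(df["weight"].mean())  # fill in missing weights
print(df)
```

After these steps the frame has no duplicates and no missing values, and is ready for the EDA stage.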
EDA
• Exploratory data analysis is the stage where we actually start to understand the message contained in the data.
• It should be noted that several types
of data transformation techniques
might be required during the process
of exploration.
MODELING & ALGORITHM
• From a data science perspective,
generalized models can exhibit
relationships among different
variables, such as correlation or
causation.
• These models involve one or more
variables that depend on other
variables to cause an event.
For example, when buying pens:
Total price of pens (Total) = price of one pen (UnitPrice) * number of pens bought (Quantity)
Total = UnitPrice * Quantity
• Total price is dependent on the unit
price.
• Hence, the total price is referred to as
the dependent variable, and the unit
price is referred to as an independent
variable.
• A model always describes the relationship between independent and dependent variables.
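The pen example can be written as a tiny Python function, with Total as the dependent variable computed from the independent ones:

```python
def total_price(unit_price, quantity):
    """Total (dependent variable) is determined by UnitPrice and Quantity."""
    return unit_price * quantity

print(total_price(10.0, 3))  # 30.0
```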
DATA PRODUCT
• Any computer software that uses data as
inputs, produces outputs, and provides
feedback based on the output to control the
environment is referred to as a data product.
• A data product is generally based on a model
developed during data analysis.
• For example, a recommendation model that
inputs user purchase history and recommends
a related item that the user is highly likely to
buy.
COMMUNICATION
• This stage deals with disseminating the results
to end stakeholders to use the results for
business intelligence.
• One of the most notable steps in this stage is
data visualization.
• Visualization deals with information relay
techniques such as tables, charts, summary
diagrams, and bar charts to show the analyzed
result.
TOPIC - 3
SIGNIFICANCE OF
EDA
SIGNIFICANCE OF EDA
• Different fields of science, economics, engineering, and marketing accumulate and store data primarily in databases.
• Appropriate and well-established decisions
should be made using the data collected.
• It is practically impossible to make sense of
datasets containing more than a handful of
data points without the help of computer
programs.
SIGNIFICANCE OF EDA
• To get insights out of the collected data and to make further
decisions, data mining is performed where we go through
distinctive analysis processes.
• Exploratory data analysis is key, and usually the first exercise
in data mining.
• It allows us to visualize data to understand it as well as to
create hypotheses for further analysis.
• EDA actually reveals ground truth about the content without
making any underlying assumptions.
• In fact, data scientists use this process to actually understand what types of modeling and hypotheses can be created.
KEY COMPONENTS OF EDA

01. SUMMARIZING DATA - PANDAS
02. STATISTICAL ANALYSIS - SCIPY
03. VISUALIZATION OF DATA - MATPLOTLIB & PLOTLY
STEPS IN EDA

1. PROBLEM DEFINITION
2. DATA PREPARATION
3. DATA ANALYSIS
4. DEVELOPMENT & REPRESENTATION OF RESULTS
PROBLEM DEFINITION
• Before trying to extract useful insight from
the data, it is essential to define the
business problem to be solved.
• The main tasks involved in problem
definition are defining the main objective of
the analysis, defining the main
deliverables, outlining the main roles and
responsibilities, obtaining the current
status of the data, defining the timetable,
and performing cost/benefit analysis.
• Based on such a problem definition, an
execution plan can be created
DATA PREPARATION
• This step involves methods for preparing the dataset before actual analysis:
• Define the sources of data
• Define data schemas and tables
• Understand the main characteristics of the data
• Clean the dataset
• Delete non-relevant datasets
• Transform the data
• Divide the data into required chunks for analysis
DATA ANALYSIS
• This is one of the most crucial steps, dealing with descriptive statistics and analysis of the data:
• Summarizing the data
• Finding the hidden correlations and relationships among the data
• Developing predictive models
• Evaluating the models
• Calculating the accuracies
TECHNIQUES USED FOR DATA SUMMARIZATION

• Summary Tables
• Graphs
• Descriptive Statistics
• Inferential Statistics
• Correlation Statistics
• Grouping
• Searching
• Mathematical Models

DEVELOPMENT & REPRESENTATION OF THE RESULTS
• This step involves presenting the dataset to the target audience in the form of graphs, summary tables, maps, and diagrams.
• Common plot types: Scatter Plots, Character Plots, Histograms, Box Plots, Residual Plots, Mean Plots
Some commonly used plots for EDA are:

• Histograms: To check the distribution of a specific


variable
• Scatter plots: To check the dependency between
two variables
• Feature correlation plot (heatmap): To understand
the dependencies between multiple variables
• Time series plots: To identify trends and seasonality in time-dependent data
EXPLORATORY TOOLS
PYTHON
ENTERPRISE APPLICATIONS
TOPIC - 4
MAKING SENSE OF
DATA
MAKING SENSE OF DATA

• Different disciplines store different kinds of data for different purposes.

Medical Researchers - Patient Data
Universities - Students' & Teachers' Data
Real estate industries - House & Building datasets
• A dataset contains many observations about a particular object.
• For instance, a dataset about patients in a hospital can contain
many observations.
• A patient can be described by:
Patient identifier (ID)
Name
Date of Birth
Address
Email
Gender
Weight
• Each of these features that describes a patient is a variable.
• Each observation can have a specific value for each of these
variables.
PATIENT INFORMATION DATABASE

CATEGORIES OF DATA?

MOST DATASETS BROADLY FALL INTO TWO GROUPS

NUMERICAL

CATEGORICAL

NUMERICAL DATA?
NUMERICAL DATA
• This data has a sense of measurement involved in it; for
example, a person's age, height, weight, blood pressure,
heart rate, temperature, number of teeth, number of bones,
and the number of family members.
• This data is often referred to as quantitative data in statistics.

NUMERICAL DATA: DISCRETE DATA, CONTINUOUS DATA
DISCRETE DATA?
DISCRETE DATA
• This is data that is countable and its values can be listed
out.
• For example, if we flip a coin, the number of heads in 200
coin flips can take values from 0 to 200 (finite) cases.
• A variable that represents a discrete dataset is referred
to as a discrete variable.
• The discrete variable takes a fixed number of distinct
values.
• For example, the Country variable can have values such
as Nepal, India, Norway, and Japan. It is fixed.
• The Rank variable of a student in a classroom can take
values from 1, 2, 3, 4, 5 and so on.
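The coin-flip example can be sketched in Python; the seed and flip count here are arbitrary choices for illustration:

```python
import random

random.seed(1)
flips = [random.choice("HT") for _ in range(200)]
heads = flips.count("H")  # discrete variable: only whole numbers 0..200 are possible
print(heads)
```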
CONTINUOUS DATA?
CONTINUOUS DATA
• A variable that can have an infinite number of numerical
values within a specific range is classified as continuous
data.
• A variable describing continuous data is a continuous
variable.
• For example, what is the temperature of your city today? Can the set of possible values be finite?
• Similarly, the weight variable in the previous section is a
continuous variable.
TRY IT!!

Check the preceding table and determine which of the variables are discrete and which are continuous. Can you justify your answer?
CATEGORICAL DATA?
CATEGORICAL DATA
• This type of data represents the characteristics of an object; for example, gender, marital status, type of address, or categories of the movies.
• This data is often referred to as qualitative datasets in statistics.
TYPES OF CATEGORICAL DATA
• Gender (Male, Female, Other, or Unknown)

• Marital Status (Divorced, Legally Separated, Married,


Never Married, Domestic Partner, Unmarried,
Widowed, or Unknown)

• Movie genres (Action, Adventure, Comedy, Crime,


Drama, Fantasy, Historical, Horror, Mystery,
Philosophical, Political, Romance, Saga, Satire,
Science Fiction, Social, Thriller, Urban, or Western)
CATEGORICAL DATA
Blood type (A, B, AB, or O)

Types of drugs (Stimulants, Depressants,


Hallucinogens, Dissociatives, Opioids, Inhalants, or
Cannabis)
• A variable describing categorical data is referred to as a
categorical variable.
• These types of variables can have one of a limited
number of values.
TYPES OF CATEGORICAL VARIABLES?

CATEGORICAL VARIABLE
• DICHOTOMOUS VARIABLE
• POLYTOMOUS VARIABLE
DICHOTOMOUS VARIABLE
• A binary categorical variable can take exactly
two values and is also referred to as a
dichotomous variable.
• For example, when you create an experiment,
the result is either success or failure.
• Hence, results can be understood as a binary
categorical variable.
POLYTOMOUS VARIABLES
• Polytomous variables are categorical variables that can take more than two possible values.
• For example, marital status can have several values, such as divorced, legally separated, married, never married, domestic partner, unmarried, widowed, and unknown.
• Since marital status can take more than two possible values, it is a polytomous variable.
MEASUREMENT SCALES

NOMINAL

ORDINAL

INTERVAL

RATIO
NOMINAL?
NOMINAL

• In nominal, the scales are generally referred to


as labels.
• These scales are mutually exclusive and do
not carry any numerical importance.
• Nominal scales are considered qualitative
scales.
ORDINAL?
ORDINAL
• The main difference between the ordinal and nominal scales is the order.
• In ordinal scales, the order of the values is a significant factor.
• Example: the Likert scale.
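In pandas, an ordinal scale such as the Likert scale can be represented with an ordered categorical; the survey answers below are invented for illustration:

```python
import pandas as pd

# Hypothetical Likert-scale survey answers stored as an ordered categorical
answers = pd.Series(pd.Categorical(
    ["Agree", "Neutral", "Strongly agree", "Disagree", "Agree"],
    categories=["Strongly disagree", "Disagree", "Neutral",
                "Agree", "Strongly agree"],
    ordered=True,
))
# Because the scale is ordinal, comparisons respect the order of the labels
print(answers.min(), "->", answers.max())
```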
INTERVAL?
INTERVAL
• Interval scales are widely used in statistics, for example, in the measures of central tendency (mean, median, mode) and standard deviations.
RATIO?
RATIO
• Ratio scales contain order, exact values, and an absolute zero, which makes them usable in descriptive and inferential statistics.
• Mathematical operations, measures of central tendency, and measures of dispersion such as the coefficient of variation can also be computed from such scales.
• Examples include measures of energy, mass, length, duration, electrical energy, plane angle, and volume.
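A small NumPy sketch of the coefficient of variation on ratio-scale data; the mass values are made up for illustration:

```python
import numpy as np

masses = np.array([12.0, 15.5, 9.8, 14.2])  # ratio-scale data: a true zero exists
mean = masses.mean()
cv = masses.std() / mean  # coefficient of variation is meaningful only on ratio scales
print(mean, cv)
```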
TOPIC - 5
COMPARING EDA
WITH CLASSICAL &
BAYESIAN ANALYSIS
COMPARING EDA WITH CLASSICAL AND
BAYESIAN ANALYSIS

1. CLASSICAL DATA ANALYSIS

2. EXPLORATORY DATA ANALYSIS

3. BAYESIAN DATA ANALYSIS


TOPIC - 6
SOFTWARE TOOLS
AVAILABLE FOR EDA
SOFTWARE TOOLS FOR EDA?
SOFTWARE TOOLS AVAILABLE FOR
EDA
PYTHON

R PROGRAMMING

WEKA

KNIME
PYTHON
• This is an open source programming language
widely used in data analysis, data mining, and
data science

R PROGRAMMING LANGUAGE
• R is an open source programming language that
is widely utilized in statistical computation and
graphical data analysis
WEKA
• This is an open source data mining package that
involves several EDA tools and algorithms

KNIME
• This is an open source tool for data analysis and
is based on Eclipse
GETTING
STARTED
WITH EDA
1. Write a Python program to read/write files
2. Write a Python program for error handling
3. Write a program using object-oriented concepts in Python
READING/WRITING TO FILES

filename = "datamining.txt"
file = open(filename, mode="r", encoding="utf-8")
lines = file.readlines()
print(lines)
file.close()
ERROR HANDLING
try:
    Value = int(input("Type a number between 47 and 100:"))
except ValueError:
    print("You must type a number between 47 and 100!")
else:
    if (Value > 47) and (Value <= 100):
        print("You typed: ", Value)
    else:
        print("The value you typed is incorrect!")
OBJECT-ORIENTED CONCEPT
class Disease:
    def __init__(self, disease='Depression'):
        self.type = disease
    def getName(self):
        print("Mental Health Diseases {0}".format(self.type))

d1 = Disease('Social Anxiety Disorder')
d1.getName()
WHAT IS NUMPY?
NUMPY
• NumPy is a python library that can be used to
perform a variety of mathematical operations
on arrays.
• It adds powerful data structures to Python that
guarantee efficient calculations with arrays and
matrices.
IMPORTING NUMPY
import numpy as np

CREATING DIFFERENT TYPES OF NUMPY ARRAYS
# Defining and printing 1D array
my1DArray = np.array([1, 8, 27, 64])
print(my1DArray)

# Defining and printing 2D array
my2DArray = np.array([[1, 2, 3, 4], [2, 4, 9, 16], [4, 8, 18, 32]])
print(my2DArray)

# Defining and printing 3D array
my3Darray = np.array([[[1, 2, 3, 4], [5, 6, 7, 8]], [[1, 2, 3, 4], [9, 10, 11, 12]]])
print(my3Darray)
1 DIMENSIONAL ARRAY
• A One-Dimensional Array is the simplest form of an
Array in which the elements are stored linearly and
can be accessed individually by specifying the index
value of each element stored in the array.

2 DIMENSIONAL ARRAY
DISPLAYING BASIC INFORMATION - DATA TYPE, SHAPE, SIZE, AND STRIDES OF A NumPy ARRAY
# Print out memory address
print(my2DArray.data)
# Print the shape of the array
print(my2DArray.shape)
# Print out the data type of the array
print(my2DArray.dtype)
# Print the strides of the array
print(my2DArray.strides)
BUILT-IN NUMPY FUNCTIONS?
BUILT-IN NUMPY FUNCTIONS
#Array of ones
ones = np.ones((3,4))
# Array of ones with integer data type
ones = np.ones((3, 4), dtype=int)
print(ones)
# Array of zeros
zeros = np.zeros((2,3,4),dtype=np.int16)
print(zeros)
It creates a 3-dimensional array with a shape of (2,
3, 4), meaning it has 2 blocks, each containing 3 rows and 4
columns. The dtype=np.int16 parameter specifies that the
array elements should be of 16-bit integer data type.
BUILT-IN NUMPY FUNCTIONS
# Array with random values
np.random.random((2,2))
It creates a 2x2 array with random values between
0 & 1.
# Empty array
emptyArray = np.empty((3,2))
print(emptyArray)
It creates a 2-dimensional array with a shape of (3,
2).
# Full array
fullArray = np.full((2,2),7)
print(fullArray)
Each element in the array is 7 because you specified
7 as the fill value.
BUILT-IN NUMPY FUNCTIONS

# Array of evenly-spaced values (start, stop, step)
evenSpacedArray = np.arange(10, 25, 5)
print(evenSpacedArray)
# Array of evenly-spaced values (start, stop, number of samples)
evenSpacedArray2 = np.linspace(0, 2, 9)
print(evenSpacedArray2)
PANDAS

Pandas Library is used to get meaningful insight from the data.

import numpy as np
import pandas as pd
print("Pandas Version:", pd.__version__)
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500)
PANDAS

In pandas, we can create data structures in two ways:
• Series
• Dataframes
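A minimal sketch of the two structures:

```python
import pandas as pd

# A Series is a one-dimensional labelled array
s = pd.Series([2, 3, 5], index=["a", "b", "c"])

# A DataFrame is a two-dimensional table whose columns are Series sharing one index
df = pd.DataFrame({"primes": s, "doubled": s * 2})
print(df)
```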
CREATE A DATAFRAME FROM A SERIES
series = pd.Series([2, 3, 7, 11, 13, 17, 19, 23])
print(series)
# Creating data frame from Series
series_df = pd.DataFrame({
'A': range(1, 5),
'B': pd.Timestamp('20190526'),
'C': pd.Series(5, index=list(range(4)), dtype='float64'),
'D': np.array([3] * 4, dtype='int64'),
'E': pd.Categorical(["Depression", "Social Anxiety", "Bipolar Disorder", "Eating Disorder"]),
'F': 'Mental health',
'G': 'is challenging' })
print(series_df)
CREATE A DATAFRAME FROM A DICTIONARY

# Creating data frame from Dictionary
dict_df = [{'A': 'Apple', 'B': 'Ball'}, {'A': 'Aeroplane', 'B': 'Bat', 'C': 'Cat'}]
dict_df = pd.DataFrame(dict_df)
print(dict_df)
CREATE A DATAFRAME FROM N-DIMENSIONAL
ARRAYS
# Creating a dataframe from ndarrays
sdf = {
'County':['Østfold', 'Hordaland', 'Oslo', 'Hedmark', 'Oppland',
'Buskerud'],
'ISO-Code':[1,2,3,4,5,6],
'Area': [4180.69, 4917.94, 454.07, 27397.76, 25192.10, 14910.94],
'Administrative centre': ["Sarpsborg", "Oslo", "City of Oslo",
"Hamar", "Lillehammer", "Drammen"]
}
sdf = pd.DataFrame(sdf)
LOAD A DATASET FROM AN EXTERNAL
SOURCE INTO A PANDAS DATAFRAME

columns = ['age', 'workclass', 'fnlwgt', 'education',


'education_num', 'marital_status', 'occupation', 'relationship',
'ethnicity', 'gender', 'capital_gain', 'capital_loss',
'hours_per_week', 'country_of_origin', 'income']
df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-
learning-databases/adult/adult.data',names=columns)
df.head(10)
df.info()? loc & iloc?

df.info() - Displays the rows, columns, data types, and memory used by the dataframe.
• loc() and iloc() are methods used for slicing data from the Pandas DataFrame.
• They help in the convenient selection of data from the DataFrame in Python.
• They are used in filtering the data according to some conditions.
df.iloc[10]?
# Selects a row
df.iloc[10]

Select 10 rows using iloc?
# Selects 10 rows
df.iloc[0:10]

Select the last 2 rows using iloc?
# Selects the last 2 rows
df.iloc[-2:]
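A small sketch contrasting loc (label-based) with iloc (position-based) on a toy DataFrame; the data is invented for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [25, 32, 47], "city": ["Oslo", "Hamar", "Bergen"]},
    index=["p1", "p2", "p3"],
)
print(df.loc["p2"])              # loc: select by index label
print(df.iloc[1])                # iloc: select by integer position (same row here)
adults = df.loc[df["age"] > 30]  # loc also supports boolean filtering
print(adults)
```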
If the values are greater than zero, we
change the color to black (the default
color); if the value is less than zero, we
change the color to red; and finally,
everything else would be colored green.
Define a Python function to accomplish
this
def colorNegativeValueToRed(value):
    if value < 0:
        color = 'red'
    elif value > 0:
        color = 'black'
    else:
        color = 'green'
    return 'color: %s' % color
SCIPY?

MATPLOTLIB?