
UNIT-1

EXPLORATORY DATA ANALYSIS


SYLLABUS
• EDA fundamentals – Understanding data science – Significance of EDA – Making sense of data – Comparing EDA with classical and Bayesian analysis – Software tools for EDA – NumPy – Pandas – SciPy – Matplotlib
1.1 Exploratory Data Analysis Fundamentals

Data
• Data encompasses a collection of discrete objects, numbers, words, events, facts, measurements, observations, or even descriptions of things. Such data is collected and stored during the events and processes occurring in several disciplines, including biology, economics, engineering, marketing, and others.
Information
• Processing data elicits useful information, and processing such information generates useful knowledge.
EDA
• Exploratory Data Analysis (EDA) is a process of examining the available dataset to discover patterns, spot anomalies, test hypotheses, and check assumptions using statistical measures.

The primary aim of EDA


• The primary aim of EDA is to examine what data can tell us before actually going through formal modeling or hypothesis formulation. John Tukey promoted EDA among statisticians to examine and discover the data and create new hypotheses that could be used to develop new approaches to data collection and experimentation.
1.2 Understanding data science

Data science involves cross-disciplinary knowledge from computer science, data, statistics, and mathematics. There are several phases of data analysis, including data requirements, data collection, data processing, data cleaning, exploratory data analysis, modeling and algorithms, and data product and communication.
PHASES OF DATA ANALYSIS
• Data requirements
• Data collection
• Data processing
• Data cleaning
• Exploratory data analysis
• Modeling and algorithms
• Data product
• Communication
• Data requirements: There can be various sources of data for an organization. It is important to
comprehend what type of data is required for the organization to be collected, curated, and stored.

• For example, an application tracking the sleeping pattern of patients suffering from dementia requires several types of sensor data to be stored, such as sleep data, the patient's heart rate, electro-dermal activity, and user activity patterns. All of these data points are required to correctly diagnose the mental state of the person. Hence, these are mandatory requirements for the application. In addition, the data must be categorized as numerical or categorical, and the format of storage and dissemination must be defined.

• Data collection: Data collected from several sources must be stored in the correct format and
transferred to the right information technology personnel within a company. As mentioned
previously, data can be collected from several objects on several events using different types of
sensors and storage tools.
• Data processing: Preprocessing involves pre-curating the dataset before actual analysis. Common tasks involve correctly exporting the dataset, placing it under the right tables, structuring it, and exporting it in the correct format.

• Data cleaning: Preprocessed data is still not ready for detailed analysis. It must be
correctly transformed for an incompleteness check, duplicates check, error check, and
missing value check. These tasks are performed in the data cleaning stage, which
involves responsibilities such as matching the correct record, finding inaccuracies in
the dataset, understanding the overall data quality, removing duplicate items, and
filling in the missing values.
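A minimal pandas sketch of these cleaning tasks (the column names, values, and fill strategy are illustrative assumptions, not part of the original example):
import numpy as np
import pandas as pd
# Hypothetical raw data containing a duplicate record and missing values
raw = pd.DataFrame({
    'patient_id': [1, 1, 2, 3, 4],
    'heart_rate': [72, 72, np.nan, 85, 90],
    'sleep_hours': [6.5, 6.5, 7.2, np.nan, 5.8]
})
# Remove duplicate items and fill in the missing values
cleaned = raw.drop_duplicates()
cleaned = cleaned.fillna({'heart_rate': cleaned['heart_rate'].mean(),
                          'sleep_hours': cleaned['sleep_hours'].median()})
print(cleaned)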
• EDA: Exploratory data analysis, is the stage where we actually start to understand the message contained in the
data. It should be noted that several types of data transformation techniques might be required during the process
of exploration.
• Modeling and algorithm: From a data science perspective, generalized models or mathematical formulas can
represent or exhibit relationships among different variables, such as correlation or causation. These models or
equations involve one or more variables that depend on other variables to cause an event.
• For example, when buying, say, pens, the total price of pens (Total) = price for one pen (UnitPrice) * the number of pens bought (Quantity). Hence, our model would be Total = UnitPrice * Quantity. Here, the total price depends on the unit price and the quantity bought. Hence, the total price is referred to as the dependent variable, and the unit price and quantity are referred to as independent variables. In general, a model describes the relationship between independent and dependent variables. Inferential statistics deals with quantifying relationships between particular variables.
• The Judd model for describing the relationship between data, model, and error still holds true: Data = Model +
Error.
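As a small illustration of this model and the Judd decomposition, the following sketch uses made-up numbers (the noise term simply stands in for the Error component):
import numpy as np
unit_price = 10.0                            # price of one pen (independent variable)
quantity = np.array([1, 2, 3, 4, 5])         # number of pens bought (independent variable)
model = unit_price * quantity                # Total = UnitPrice * Quantity
observed = model + np.random.normal(0, 0.5, size=quantity.size)  # hypothetical observed totals
error = observed - model                     # Data = Model + Error
print(model)
print(error)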
• Data Product: Any computer software that uses data as inputs, produces outputs, and
provides feedback based on the output to control the environment is referred to as a data
product. A data product is generally based on a model developed during data analysis, for
example, a recommendation model that inputs user purchase history and recommends a
related item that the user is highly likely to buy.

• Communication: This stage deals with disseminating the results to end stakeholders to use
the result for business intelligence. One of the most notable steps in this stage is data
visualization. Visualization deals with information relay techniques such as tables, charts,
summary diagrams, and bar charts to show the analyzed result.

1.3 The significance of EDA

• Different fields of science, economics, engineering, and marketing accumulate and


store data primarily in electronic databases. Appropriate and well-established
decisions should be made using the data collected. It is practically impossible to
make sense of datasets containing more than a handful of data points without the
help of computer programs. To be certain of the insights that the collected data
provides and to make further decisions, data mining is performed, in which we go
through distinctive analysis processes. Exploratory data analysis is key, and is usually
the first exercise in data mining. It
allows us to visualize data to understand it as well as to create hypotheses for further
analysis. The exploratory analysis centers around creating a synopsis of data or
insights for the next steps in a data mining project.
• Key components of exploratory data analysis include summarizing data, statistical analysis, and visualization
of data. Python provides expert tools for exploratory analysis, with pandas for summarizing; scipy, along with
others, for statistical analysis; and matplotlib and plotly for visualizations.
• Steps in EDA
• Problem definition: Before trying to extract useful insight from the data, it is essential to define the business
problem to be solved. The problem definition works as the driving force for a data analysis plan execution. The
main tasks involved in problem definition are defining the main objective of the analysis, defining the main
deliverables, outlining the main roles and responsibilities, obtaining the current status of the data, defining the
timetable, and performing cost/benefit analysis. Based on such a problem definition, an execution plan can be
created.
• Data preparation: This step involves methods for preparing the dataset before actual analysis. In this step, we
define the sources of data, define data schemas and tables, understand the main characteristics of the data, clean
the dataset, delete non-relevant datasets, transform the data, and divide the data into required chunks for analysis.
• Data analysis: This is one of the most crucial steps that deals with descriptive statistics
and analysis of the data. The main tasks involve summarizing the data, finding the hidden
correlation and relationships among the data, developing predictive models, evaluating the
models, and calculating the accuracies. Some of the techniques used for data
summarization are summary tables, graphs, descriptive statistics, inferential statistics,
correlation statistics, searching, grouping, and mathematical models.
• Development and representation of the results: This step involves presenting the dataset
to the target audience in the form of graphs, summary tables, maps, and diagrams. This is
also an essential step as the result analyzed from the dataset should be interpretable by the
business stakeholders, which is one of the major goals of EDA. Most of the graphical
analysis techniques include scattering plots, character plots, histograms, box plots, residual
plots, mean plots, and others.
1.4 Making sense of data
• Numerical data

This data has a sense of measurement involved in it; for example, a person's age, height, weight, blood pressure,
heart rate, temperature, number of teeth, number of bones, and the number of family members. This data is
often referred to as quantitative data in statistics. The numerical dataset can be either discrete or continuous
types.

• Discrete data

• This is data that is countable and its values can be listed out. For example, if we flip a coin, the number of
heads in 200 coin flips can take values from 0 to 200 (finite) cases. A variable that represents a discrete
dataset is referred to as a discrete variable. The discrete variable takes a fixed number of distinct values. For
example, the Country variable can have values such as Nepal, India, Norway, and Japan. It is fixed. The Rank
variable of a student in a classroom can take values from 1, 2, 3, 4, 5, and so on.
• Continuous data
• A variable that can have an infinite number of numerical values within a specific range is classified as continuous data. A variable describing continuous data is a continuous variable.
• For example, what is the temperature of your city today? Temperature can take any value within a range, so its possible values cannot be listed out as a finite set. Similarly, the weight variable in the previous section is a continuous variable.
Categorical data

• Categorical data
This type of data represents the characteristics of an object; for example, gender, marital status, type
of address, or categories of the movies. This data is often referred to as qualitative datasets in
statistics. To understand clearly, here are some of the most common types of categorical data you can
find in data:
Gender (Male, Female, Other, or Unknown)
Marital Status (Annulled, Divorced, Interlocutory, Legally Separated, Married, and so on)
Movie genres (Action, Adventure, Comedy, Crime, Drama, Fantasy, Historical, Horror, Mystery, Philosophical, Political, Romance, Saga, Satire, Science Fiction, Social, Thriller, Urban, or Western)
Blood type (A, B, AB, or O)
Types of drugs (Stimulants, Depressants, Hallucinogens, Dissociatives, Opioids, Inhalants, or Cannabis)
• A variable describing categorical data is referred to as a categorical variable.

• A binary categorical variable can take exactly two values and is also referred to as a dichotomous variable.
For example, when you create an experiment, the result is either success or failure. Hence, results can be
understood as a binary categorical variable.

• Polytomous variables are categorical variables that can take more than two

possible values
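A short illustrative sketch using pandas categoricals (the variable names and values are assumed for the example):
import pandas as pd
# Dichotomous (binary) categorical variable: exactly two possible values
result = pd.Categorical(["success", "failure", "success"],
                        categories=["success", "failure"])
# Polytomous categorical variable: more than two possible values
blood_type = pd.Categorical(["A", "O", "AB", "B", "O"],
                            categories=["A", "B", "AB", "O"])
print(result.categories.size)      # 2
print(blood_type.categories.size)  # 4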
Measurement scales

•Nominal
•Ordinal
•Interval
•Ratio
Nominal
• Nominal scales are used for labeling variables without any quantitative value. The scales are generally referred to as labels, and they are mutually exclusive and do not carry any numerical importance.

• Nominal scales are considered qualitative scales and the measurements that are taken using qualitative scales are
considered qualitative data.

• Examples

• What is your gender?

• The languages that are spoken in a particular country

• Biological species

• Parts of speech in grammar (noun, pronoun, adjective, and so on)

• Taxonomic ranks in biology (Archaea, Bacteria, and Eukarya)


How to analyse nominal data
• Frequency is the number of times each label occurs within the dataset.
• Proportion can be calculated by dividing the frequency by the total
number of events.
• Then, you could compute the percentage of each proportion.
• And to visualize the nominal dataset, use either a pie chart or a bar
chart.
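A hedged sketch of these steps using pandas and matplotlib (the gender values are an assumed example):
import pandas as pd
import matplotlib.pyplot as plt
gender = pd.Series(["Male", "Female", "Female", "Other", "Female", "Male"])
frequency = gender.value_counts()           # how often each label occurs
proportion = frequency / frequency.sum()    # frequency divided by the total number of events
percentage = proportion * 100               # percentage of each proportion
print(percentage)
frequency.plot(kind='bar')                  # use kind='pie' for a pie chart
plt.show()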
Ordinal

• In ordinal scales, the order of the values is a significant factor. An easy tip to remember the ordinal scale is that it sounds like an order.
A Likert scale
A Likert scale is a rating scale used to measure opinions, attitudes, or behaviors. It consists of a statement
or a question, followed by a series of five or seven answer statements. Respondents choose the option that
best corresponds with how they feel about the statement or question.
The median item is allowed as the measure of central tendency; however, the average
is not permitted.
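A small sketch showing why the median, rather than the mean, is reported for a five-point Likert item (the responses are hypothetical):
import pandas as pd
# 1 = Strongly disagree ... 5 = Strongly agree (ordinal codes, not true numerical values)
responses = pd.Series([4, 5, 3, 4, 2, 5, 4, 1, 4, 3])
print(responses.median())   # an acceptable measure of central tendency for ordinal data
# responses.mean() would treat the ordinal codes as interval data, which is not permitted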
Interval

• In interval scales, both the order and the exact differences between the values are significant.
• Interval scales are widely used in statistics, for example, in measures of central tendency: mean, median, mode, and standard deviation.
Ratio

• Ratio scales contain order, exact values, and absolute zero, which
makes it possible to be used in descriptive and inferential statistics.
These scales provide numerous possibilities for statistical analysis.
• Mathematical operations, the measure of central tendencies, and
the measure of dispersion and coefficient of variation can also be
computed from such scales.
• Examples include measures of energy, mass, length, duration, electrical energy, plane angle, and volume.
A summary of the data types and scale measures:
• Nominal – qualitative labels with no order; analyzed with frequencies, proportions, percentages, and the mode.
• Ordinal – ordered categories; the median is a valid measure of central tendency, but the mean is not.
• Interval – ordered values with meaningful differences but no absolute zero; mean, median, mode, and standard deviation apply.
• Ratio – ordered values with meaningful differences and an absolute zero; all mathematical operations and descriptive and inferential statistics apply.
1.5 Comparing EDA with classical and Bayesian analysis
• Classical data analysis: For the classical data
analysis approach, the problem definition and
data collection step are followed by model
development, which is followed by analysis and
result communication.
• Exploratory data analysis approach: For the EDA
approach, it follows the same approach as classical
data analysis except the model imposition and the data
analysis steps are swapped. The main focus is on the
data, its structure, outliers, models, and visualizations.
Generally, in EDA, we do not impose any
deterministic or probabilistic models on the data.
• Bayesian data analysis approach: The Bayesian approach incorporates prior probability distribution knowledge into the analysis steps.
• Prior probability, in Bayesian statistics, is the probability of an
event before new data is collected. This is the best rational
assessment of the probability of an outcome based on the
current knowledge before an experiment is performed.
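A minimal sketch of how a prior probability is updated with new data using Bayes' rule (all numbers are invented for illustration):
# Posterior = Likelihood * Prior / Evidence
prior = 0.3                          # probability of the hypothesis before new data
likelihood = 0.8                     # probability of the observed data if the hypothesis is true
evidence = 0.8 * 0.3 + 0.2 * 0.7     # total probability of observing the data
posterior = likelihood * prior / evidence
print(posterior)                     # updated belief after the data is collected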
1.6 Software tools available for EDA

• Python: This is an open source programming language widely used in data analysis, data mining, and data science.
• R programming language: R is an open source programming language.
• Weka: This is an open source data mining package that involves
several EDA tools and algorithms
• KNIME: This is an open source tool for data analysis and is based on
Eclipse
Numpy
1. For importing numpy, we will use the following code:
import numpy as np
2. For creating different types of numpy arrays
# Defining 1D array
my1DArray = np.array([1, 8, 27, 64])
print(my1DArray)
# Defining and printing 2D array
my2DArray = np.array([[1, 2, 3, 4], [2, 4, 9, 16], [4, 8, 18, 32]])
print(my2DArray)
#Defining and printing 3D array
my3Darray = np.array([[[1, 2, 3, 4], [5, 6, 7, 8]], [[1, 2, 3, 4], [9, 10, 11, 12]]])
print(my3Darray)
3. For displaying basic information, such as the data type, shape, size, and
strides of a NumPy array, we will use the following code:

# Print out memory address
print(my2DArray.data)
# Print the shape of the array
print(my2DArray.shape)
# Print out the data type of the array
print(my2DArray.dtype)
# Print the strides of the array
print(my2DArray.strides)
4. For creating an array using built-in NumPy functions, we will use the following code

# Array of ones
ones = np.ones((3,4))
print(ones)
# Array of zeros
zeros = np.zeros((2,3,4),dtype=np.int16)
print(zeros)
# Array with random values
np.random.random((2,2))
# Empty array
emptyArray = np.empty((3,2))
print(emptyArray)
# Full array
fullArray = np.full((2,2),7)
print(fullArray)
5. For NumPy arrays and file operations, we will use the
following code:
# Save a numpy array into file
x = np.arange(0.0,50.0,1.0)
np.savetxt('data.out', x, delimiter=',')
# Loading numpy array from text
z = np.loadtxt('data.out', unpack=True)
print(z)
# Loading numpy array using genfromtxt method
my_array2 = np.genfromtxt('data.out',
skip_header=1,
filling_values=-999)
print(my_array2)
6. For inspecting NumPy arrays, we will use the following
code:

# Print the number of `my2DArray`'s dimensions
print(my2DArray.ndim)
# Print the number of `my2DArray`'s elements
print(my2DArray.size)
# Print information about `my2DArray`'s memory layout
print(my2DArray.flags)
# Print the length of one array element in bytes
print(my2DArray.itemsize)
# Print the total consumed bytes by `my2DArray`'s elements
print(my2DArray.nbytes)
Broadcasting is a mechanism that permits NumPy to
operate with arrays of different shapes when performing
arithmetic operations:

# Rule 1: Two dimensions are compatible if they are equal
# Rule 2: Two dimensions are also compatible when one of them is 1
# Rule 3: Arrays can be broadcast together if they are compatible in all dimensions
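A short example of these rules in action (the array shapes and values are arbitrary):
import numpy as np
x = np.ones((3, 4))                  # shape (3, 4)
row = np.array([1, 2, 3, 4])         # shape (4,): stretched across the 3 rows (Rule 2)
col = np.array([[10], [20], [30]])   # shape (3, 1): stretched across the 4 columns (Rule 2)
print(x + row)                       # result has shape (3, 4)
print(x + col)                       # result has shape (3, 4)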
For seeing NumPy mathematics at work, we will use
the following example:
# Basic operations (+, -, *, /, %)
x = np.array([[1, 2, 3], [2, 3, 4]])
y = np.array([[1, 4, 9], [2, 3, -2]])
# Add two arrays
add = np.add(x, y)
print(add)
# Subtract two arrays
sub = np.subtract(x, y)
print(sub)
# Multiply two arrays
mul = np.multiply(x, y)
print(mul)
# Divide x by y
div = np.divide(x, y)
print(div)
# Calculate the remainder of x and y
rem = np.remainder(x, y)
print(rem)
Let's now see how we can create a subset and slice an
array using an index:

x = np.array([10, 20, 30, 40, 50])
# Select items at index 0 and 1
print(x[0:2])
# Select the item at rows 0 and 1, column 1 from the 2D array
y = np.array([[1, 2, 3, 4], [9, 10, 11, 12]])
print(y[0:2, 1])
# Specifying conditions
biggerThan2 = (y >= 2)
print(y[biggerThan2])
Pandas
• Wes McKinney open sourced the pandas library (https://github.com/wesm), which has been widely used in data science.
1. To set default parameters
import numpy as np
import pandas as pd
print("Pandas Version:", pd.__version__)
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500)
2. Create data structures in two ways: series and dataframes.
Check the following snippet to understand how we can create a dataframe from
series, dictionary, and n-dimensional arrays.

• create a dataframe from a series:


series = pd.Series([2, 3, 7, 11, 13, 17, 19, 23])
print(series)
# Creating dataframe from Series
series_df = pd.DataFrame({
'A': range(1, 5),
'B': pd.Timestamp('20190526'),
'C': pd.Series(5, index=list(range(4)), dtype='float64'),
'D': np.array([3] * 4, dtype='int64'),
'E': pd.Categorical(["Depression", "Social Anxiety", "Bipolar Disorder", "Eating Disorder"]),
'F': 'Mental health',
'G': 'is challenging'
})
print(series_df)
create a dataframe for a dictionary:

# Creating dataframe from Dictionary


dict_df = [{'A': 'Apple', 'B': 'Ball'}, {'A': 'Aeroplane', 'B': 'Bat', 'C': 'Cat'}]
dict_df = pd.DataFrame(dict_df)
print(dict_df)
create a dataframe from n-dimensional arrays:

# Creating a dataframe from ndarrays


sdf = {
'County':['Ostfold', 'Hordaland', 'Oslo', 'Hedmark', 'Oppland',
'Buskerud'],
'ISO-Code':[1,2,3,4,5,6],
'Area': [4180.69, 4917.94, 454.07, 27397.76, 25192.10,
14910.94],
'Administrative centre': ["Sarpsborg", "Oslo", "City of Oslo",
"Hamar", "Lillehammer", "Drammen"]
}
sdf = pd.DataFrame(sdf)
print(sdf)
load a dataset from an external source into a
pandas DataFrame

columns = ['age', 'workclass', 'fnlwgt', 'education', 'education_num',
           'marital_status', 'occupation', 'relationship', 'ethnicity',
           'gender', 'capital_gain', 'capital_loss', 'hours_per_week',
           'country_of_origin', 'income']
df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data',
                 names=columns)
df.head(10)
The following code displays the rows, columns, data types, and memory used by the dataframe:
df.info()
• select rows and columns in any dataframe:
# Selects a row
df.iloc[10]
# Selects the first 10 rows
df.iloc[0:10]
# Selects a range of rows
df.iloc[10:15]
# Selects the last 2 rows
df.iloc[-2:]
# Selects every other row in columns 3 and 4
df.iloc[::2, 3:5].head()
SciPy

SciPy is an open source scientific library for Python. It depends on the NumPy library, which provides efficient n-dimensional array manipulation functions. For statistical analysis, we mainly use the scipy.stats module from the SciPy library.
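A minimal sketch of statistical analysis with scipy.stats (the sample values and the tested mean are invented):
import numpy as np
from scipy import stats
sample = np.array([2.1, 2.5, 2.8, 3.0, 3.2, 3.6, 3.9])
print(stats.describe(sample))                              # count, min/max, mean, variance, skewness, kurtosis
t_stat, p_value = stats.ttest_1samp(sample, popmean=3.0)   # one-sample t-test
print(t_stat, p_value)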
Matplotlib

Matplotlib provides a huge library of customizable plots, along with a comprehensive set of backends. It can be utilized to create professional reporting applications, interactive analytical applications, complex dashboard applications, web/GUI applications, embedded views, and many more.
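A minimal sketch of a customizable Matplotlib plot (the data is arbitrary):
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x), label='sin(x)')   # a simple line plot
plt.xlabel('x')
plt.ylabel('y')
plt.title('A simple Matplotlib line plot')
plt.legend()
plt.show()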
