Lec 2 Getting To Know Data EDA

The document discusses Exploratory Data Analysis (EDA), highlighting its purpose and benefits, such as understanding data characteristics, identifying missing values, and discovering patterns. It covers the types of data and attributes, including univariate, bivariate, and multivariate data, as well as nominal, ordinal, and numeric attributes. Additionally, it explains statistical measures like central tendencies and spread, which are essential for summarizing and interpreting data effectively.

Big Data Analytics

Getting to Know Data & Exploratory Data Analysis

EDA: Purpose & Benefits

Size, Dimension, and Resolution of Data
Types of Attributes
Statistical EDA
Measures of Central Tendencies and Spread
Bivariate EDA: Correlation, Contingency Table
Graphical EDA
Types of Diagrams

Imdad ullah Khan

Imdad ullah Khan (LUMS) Getting to know Data & EDA 1 / 64

Exploratory Data Analysis (EDA): Purpose and Benefits

EDA: Initial investigation of data using summary statistics and diagrams

The objectives of EDA are to

understand the data (what it is, where it comes from, what it represents, the kinds of values it holds, its specific characteristics)
find out whether there are missing values (and how to deal with them)
spot anomalies (are there outliers?)
discover patterns (what does the data look like?)
understand relationships between features (measure similarity, distance and relationship type)
check our assumptions
visually describe the data

EDA: Purpose and Benefits

Preliminary exploration and inspection of data is essential for analysis

It guides preprocessing steps
It gives a clear picture of data sizes, which helps in selecting the right
data structures, tools and even modeling strategies
It can help reduce data size (number of dimensions or records)

Data object and Attribute

Data object
represents an entity in the data set
also called data item, point, instance, example, sample, row, observation
e.g. a patient, movie, student, customer, product, book, tweet
described by a set of attributes

Attribute
is a data field, representing a feature/characteristic of data objects
also called variable, feature, dimension, column, coordinate, field
e.g. reaction to a test, genre/director, course, address, price/category,
author, publisher, word

Size and dimensions of data

Size of data refers to the number of data objects

Dimension of data refers to the number of attributes

Sparsity in Data
If most of the feature values are missing (or zero), then the data is called sparse

Missing values could be represented as NaN, blank, -, 0

Sparsity can be a problem for many statistical methods
For efficient computation, we can use libraries for sparse data
e.g. sparse matrix multiplication, sparse storage schemes
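
A sparse storage scheme can be sketched in a few lines. The example below assumes a dictionary-of-keys layout in which 0 marks a missing value; the toy `ratings` matrix and the helper names `to_sparse` and `sparsity` are illustrative and do not come from the lecture.

```python
# Dictionary-of-keys sparse storage: keep only the entries that are present.
# The toy matrix and the helper names are illustrative assumptions.

def to_sparse(rows, missing=0):
    """Map (row, col) -> value for every entry that is not the missing marker."""
    return {
        (i, j): value
        for i, row in enumerate(rows)
        for j, value in enumerate(row)
        if value != missing
    }


def sparsity(rows, missing=0):
    """Fraction of entries that equal the missing marker."""
    total = sum(len(row) for row in rows)
    stored = len(to_sparse(rows, missing))
    return (total - stored) / total


ratings = [
    [0, 0, 5, 0],
    [0, 0, 0, 0],
    [3, 0, 0, 0],
]
sparse_ratings = to_sparse(ratings)  # only 2 of the 12 entries are stored
print(sparse_ratings, sparsity(ratings))
```

Storing only the present entries is what makes sparse libraries efficient; dedicated libraries use more compact layouts than a plain dictionary.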

Resolution of Data

Different resolutions reveal different patterns

If the resolution is too fine, a pattern may be buried in noise

If the resolution is too coarse, the pattern may disappear
e.g. the number of bins in a histogram sets the resolution at which a distribution is viewed

Types of Data

Types of data based on number of attributes

Univariate Data
Bivariate Data
Multivariate Data

Types of Data

Univariate: Consists of only one feature per observation. Analysis deals with only one quantity that changes

Heights (cm): 164, 167.3, 170, 174.2, 178, 180, 186

What is the average height?

How much do the values deviate from the average height?
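
Both questions can be answered directly from the heights listed above. The snippet is a minimal sketch using Python's standard `statistics` module; using the population standard deviation (dividing by n) as the measure of typical deviation is an assumption, chosen to match the variance definition given later in the lecture.

```python
from statistics import fmean, pstdev

heights_cm = [164, 167.3, 170, 174.2, 178, 180, 186]

average = fmean(heights_cm)                      # average height
deviations = [h - average for h in heights_cm]   # signed deviation of each value
typical_deviation = pstdev(heights_cm)           # population standard deviation

print(round(average, 2), round(typical_deviation, 2))
```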

Types of Data

Bivariate: Involves two different features per observation

Analysis of this type of data deals with comparisons, relationships,
causes and explanations

Temperature (°C)    Ice Cream Sales
20                  2000
25                  2500
35                  5000
43                  7800

Are the temperature and ice cream sales related/dependent?

As temperature increases, sales also increase

Types of Data

Multivariate: Objects are described by more than 2 features

Analysis examines whether one or more of the features are predictive of a certain outcome
The predictive variables are independent variables and the outcome is the dependent variable

Roll Num    CS100  SS101  MT200  MGMT240  Major
19100115    A      B      B      C        CS
19100120    B      A      B      C        PHY
19100122    B      B      C      A        CS
19100126    C      A      C      A        EE
19100127    B      A      C      C        CS
19100133    C      B      A      B        PHY
19100135    C      C      A      C        Maths

Types of Attributes

Roll Num    Gender  Grade  Age  Major
19100115    Male    B      23   CS
19100120    Male    A      22   PHY
19100122    Female  B      21   CS
19100126    Male    C      19   EE
19100127    Female  A      21   CS
19100133    Female  B      20   PHY
19100135    Male    C      22   Maths

Nominal/Categorical Attributes
Ordinal Attributes
Numeric Attributes

Types of Attributes: Nominal/Categorical
Possible values are symbols, labels or names of things or categories
e.g. gender, major, state, color
Describe a feature qualitatively; values have no order
Not quantitative, so arithmetic operations cannot be performed on them
male − female = ?? green + blue = ??
Can be coded by numbers (numeric symbols), e.g. postal codes, roll numbers
Can compute the frequency of values and the most frequent value (mode)
Cannot compute a middle value (median)
Cannot compute an average value (mean)
Binary Attribute: a special case of nominal with two values, e.g. true/false, Pass/Fail, 0/1
Symmetric: both symbols carry the same weight, e.g. gender
Asymmetric: the two symbols are not equally important, e.g. Pass/Fail

Types of Attributes: Ordinal Attributes

Possible values have a meaningful order

Grades: A, B, C, D
Serving Sizes: Small, Medium, Large
Ratings: poor, average, excellent
No quantified difference between two levels
A is higher/better than B, but
we cannot quantify how much higher A is than B, or
whether the difference between A and B is the same as the difference between B and C
Can be obtained by discretizing numeric quantities (data reduction)

Can compute the frequency of values and the most frequent value (mode)

Can compute a middle value (median)
Cannot compute a meaningful average value (mean)

Types of Attributes: Numeric Attributes
Quantitative and measurable
can quantify the difference between two values
e.g. temperature, age, number of courses, height, years of experience
Can compute the frequency of values and the most frequent value (mode)
Can compute a middle value (median)
Can compute the average value (mean) of an attribute
Discrete Numeric Attributes
values come from a finite or countably infinite set
Continuous Numeric Attributes
values are real (continuous)
Interval-Scaled: no true zero point, so ratios have no meaning
e.g. temperature in Celsius: 30 °C is not twice as hot as 15 °C
Ratio-Scaled: well-defined zero point, so ratios are meaningful
e.g. temperature in Kelvin: 30 K is twice as hot as 15 K

Statistical EDA

Statistical Description of Data

Estimates that give an overall picture of data

Summary statistics are numbers that summarize properties of data
Typical values of variables (features/attributes)
Spread and distribution of values
Dependencies and correlations among variables

Measures of Central Tendencies

These measures describe the location of data

location of concentration or middle of data

Data is “distributed” around this “center”

Computed for each attribute
Three common types of locations
Mode
Mean
Median

These measures do not give information regarding

extreme values in data
distribution or spread of the data

Frequency

Nominal and Ordinal attributes are generally described with frequencies

The frequency of a value is the number of times the value occurs in the dataset

Sometimes we use the fraction or percentage of times the value appears (relative frequency)

The relative frequencies form the empirical probability mass function

Measures of Central Tendencies: Mode

For location of nominal and ordinal attributes one can use the most
frequent value

Mode is the most frequent element

A dataset can have more than one mode
unimodal (one mode in data)
multi-modal (bimodal, trimodal): more than one mode in data

Not the same as the majority element (a value with frequency more than 50%)
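
Frequencies, modes and the majority element can be computed with a counter. The sketch below uses the Major column of the student table from the attribute-types slide; the variable names are illustrative.

```python
from collections import Counter

# Major column of the student table shown earlier in the lecture.
majors = ["CS", "PHY", "CS", "EE", "CS", "PHY", "Maths"]

frequency = Counter(majors)
relative_frequency = {value: count / len(majors) for value, count in frequency.items()}

top_count = max(frequency.values())
modes = [value for value, count in frequency.items() if count == top_count]
majority = [value for value, count in frequency.items() if count > len(majors) / 2]

print(frequency, modes, majority)
```

Here CS is the mode with 3 of 7 occurrences, but there is no majority element because no value exceeds 50%.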

Measures of Central Tendencies: Mean

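For a dataset $X = \{x_1, x_2, \ldots, x_n\}$

(Arithmetic) Mean is the average of the data set
▷ This definition readily extends to higher dimensional data
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Weighted Mean
$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
Harmonic Mean
$$\bar{x} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
Geometric Mean
$$\bar{x} = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$$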
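
The four means can be compared on a small example. This is a sketch using Python's standard `statistics` module; the values `[1, 2, 4]`, the weights and the helper `weighted_mean` are illustrative assumptions.

```python
from statistics import fmean, geometric_mean, harmonic_mean


def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


values = [1, 2, 4]
arithmetic = fmean(values)                            # (1 + 2 + 4) / 3
harmonic = harmonic_mean(values)                      # 3 / (1/1 + 1/2 + 1/4)
geometric = geometric_mean(values)                    # (1 * 2 * 4) ** (1/3)
weighted = weighted_mean(values, weights=[1, 1, 2])   # (1 + 2 + 8) / 4

print(arithmetic, harmonic, geometric, weighted)
```

For positive data the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean.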

Other Types of Mean
Arithmetic mean is sensitive to outliers ▷ unstable statistic
Just one very high/low value (think ±∞) makes the mean very high/low
2.5 2.5 3 3.5 3.5 3.5 3.5 4 4 4 4.5 4.5 4.5 5 5 5.5 5.5 6 98 99

Mean = 13.57

Trimmed Mean: Ignore k% of values at both extremes to compute the mean

With k = 10, the two smallest and the two largest of the 20 values are dropped:
3 3.5 3.5 3.5 3.5 4 4 4 4.5 4.5 4.5 5 5 5.5 5.5 6

Trimmed mean = 4.34
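
The two numbers above can be reproduced with a short sketch. The helper `trimmed_mean` is illustrative; it assumes k% is rounded down to a whole number of values per side, and k = 10 is the trimming level that yields 4.34 for this data.

```python
from statistics import fmean


def trimmed_mean(values, percent):
    """Mean after dropping `percent`% of the values at each extreme."""
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)  # number of values dropped per side
    kept = ordered[k:len(ordered) - k]
    return fmean(kept)


data = [2.5, 2.5, 3, 3.5, 3.5, 3.5, 3.5, 4, 4, 4,
        4.5, 4.5, 4.5, 5, 5, 5.5, 5.5, 6, 98, 99]

plain = fmean(data)               # pulled up by the two outliers 98 and 99
trimmed = trimmed_mean(data, 10)  # drops 2.5, 2.5, 98 and 99

print(plain, trimmed)
```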

Measures of Central Tendencies: Median

Median is the middle value of a sorted dataset

For an odd number of values it is the middle one; for an even number it is the average of the two middle values

Median is less sensitive to outliers than the mean
Median is good for asymmetric distributions and where data has outliers

For the 20 values used in the trimmed-mean example: Median = 4.25, Mean = 13.57

Various possible definitions for median of higher dimensional data

Mean together with variance (see below) has nice properties

Measures of Spread

Location measures do not tell anything about extremes or spread (how extreme are the extremes)
Measures of spread describe the distribution of data

Max
Min
Range := max - min
Midrange := average of min and max
Inter-Quartile Range (midspread) := 3rd quartile - 1st quartile
Variance and Standard Deviation

Quantile

Quantiles are points taken at regular intervals so that the data is divided into roughly equal-sized consecutive subsets

The ith q-quantile is a data point x such that ∼ i/q fraction of points are less than x and ∼ (q−i)/q fraction of points are greater than x
Median is the first 2-quantile
3rd quartile := 3rd 4-quantile := 75th percentile

Five-Number Summary

Five-number summary (elementary EDA of numeric univariate data)

Min: minimum (0th percentile)
1st/lower quartile (25th percentile)
Median (50th percentile)
3rd/upper quartile (75th percentile)
Max: maximum (100th percentile)

interquartile range := upper quartile − lower quartile
data range := maximum − minimum
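
A five-number summary can be sketched with the standard `statistics.quantiles` function. Quartile conventions differ between tools; the `"inclusive"` interpolation method used here is an assumption, and other methods can give slightly different quartiles. The data is the 20-value example used for the mean and median.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


data = [2.5, 2.5, 3, 3.5, 3.5, 3.5, 3.5, 4, 4, 4,
        4.5, 4.5, 4.5, 5, 5, 5.5, 5.5, 6, 98, 99]

minimum, q1, median, q3, maximum = five_number_summary(data)
iqr = q3 - q1                   # interquartile range
data_range = maximum - minimum  # data range

print(minimum, q1, median, q3, maximum, iqr, data_range)
```

The interquartile range stays small even though the data range is large, because the two outliers only affect the maximum.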

EDA: Measures of Spread

Variance: Measures the deviation in values relative to the mean

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$
Variance is the mean squared deviation from the mean
Squared to avoid cancellation of +ve and −ve deviations
Mean deviation could be 0 for data with significant spread
the mean and the mean (signed) deviation from the mean are both 0 for each of
{−5, −10, 5, 10} and {−100, −50, 50, 100}
▷ There is significantly more spread in the latter data
Mean Absolute Deviation: $\mathrm{MAD} := \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$

Variance is easy to compute and has useful mathematical properties
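
The two example sets make the point concrete: the signed mean deviation is 0 for both, while variance and MAD separate them. This is a minimal sketch; the helper names are illustrative and the population (divide by n) definitions are used, as in the formulas above.

```python
from statistics import fmean, pstdev, pvariance


def mean_deviation(values):
    """Mean signed deviation from the mean (always 0 up to rounding)."""
    center = fmean(values)
    return fmean([x - center for x in values])


def mean_absolute_deviation(values):
    """Mean absolute deviation from the mean (MAD)."""
    center = fmean(values)
    return fmean([abs(x - center) for x in values])


narrow = [-5, -10, 5, 10]
wide = [-100, -50, 50, 100]

for name, values in (("narrow", narrow), ("wide", wide)):
    print(name, mean_deviation(values), pvariance(values),
          pstdev(values), mean_absolute_deviation(values))
```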

Measures of Spread

Standard Deviation

Variance has a different unit than the original data (the square of the unit)

Standard deviation also measures deviation in values relative to the mean
Standard deviation is the square root of variance
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$
Standard deviation restores the measure to the original unit of data

Normal Distribution (Bell-Curve): The Empirical (Three-Sigma) Rule

For a normal distribution, a known fraction of values falls within k st-dev of the mean

About 68% lie within k = 1 st-dev ($\bar{x} \pm 1\sigma$)

About 95% lie within k = 2 st-dev ($\bar{x} \pm 2\sigma$)
About 99.7% lie within k = 3 st-dev ($\bar{x} \pm 3\sigma$)
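
These percentages follow from the normal cumulative distribution: the fraction within k standard deviations of the mean is erf(k/√2). A small sketch, with the helper name `normal_coverage` chosen for illustration:

```python
from math import erf, sqrt


def normal_coverage(k):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    return erf(k / sqrt(2))


coverage = {k: normal_coverage(k) for k in (1, 2, 3)}
for k, fraction in coverage.items():
    print(k, round(100 * fraction, 2))
```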

EDA: Chebyshev's Inequality

For any distribution of data (with finite variance), at least a 1 − 1/k² fraction of values must fall within k st-dev of the mean
At least 75% must lie within k = 2 st-dev ($\bar{x} \pm 2\sigma$)
At least ∼ 89% must lie within k = 3 st-dev ($\bar{x} \pm 3\sigma$)
At least ∼ 93.75% must lie within k = 4 st-dev ($\bar{x} \pm 4\sigma$)
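
The distribution-free guarantees listed here are the values of 1 − 1/k² (Chebyshev's inequality). The sketch below computes that bound and checks it against the heavily skewed 20-value example from the mean slides; the helper names are illustrative.

```python
from statistics import fmean, pstdev


def chebyshev_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    return 1 - 1 / k**2


def fraction_within(values, k):
    """Observed fraction of values within k population st-devs of the mean."""
    center, spread = fmean(values), pstdev(values)
    return sum(abs(x - center) <= k * spread for x in values) / len(values)


data = [2.5, 2.5, 3, 3.5, 3.5, 3.5, 3.5, 4, 4, 4,
        4.5, 4.5, 4.5, 5, 5, 5.5, 5.5, 6, 98, 99]

for k in (2, 3, 4):
    print(k, chebyshev_bound(k), fraction_within(data, k))
```

The observed fractions are never below the bound, even though this data is far from normal.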

Bivariate Measures

Used for bivariate data or pairs of attributes, more detail later

Nominal or Ordinal Attributes

Contingency Table
χ² statistic

Numeric Attributes
Covariance
Correlation
Correlation Matrix

Contingency Table
Contingency table summarizes data with two nominal or ordinal features
Used to determine whether the two variables are associated (χ²-test)

(nominal) A and B taking values in $\{a_1, a_2, \ldots, a_p\}$ and $\{b_1, b_2, \ldots, b_q\}$

$f_{ij}$: frequency of joint occurrence of $(a_i, b_j)$

▷ observed frequency of the joint event $(A = a_i, B = b_j)$

Contingency Table C: one column per value $a_i$, one row per value $b_j$; the cell in column $a_i$ and row $b_j$ holds $f_{ij}$

        a_1     a_2     ...     a_p
b_1     f_11    f_21    ...     f_p1
b_2     f_12    f_22    ...     f_p2
...     ...     ...     ...     ...
b_q     f_1q    f_2q    ...     f_pq

χ²-test for two attributes A and B

χ²-statistic: A “correlation” between two nominal attributes A and B taking values in $\{a_1, a_2, \ldots, a_p\}$ and $\{b_1, b_2, \ldots, b_q\}$

$f_{ij}$: frequency of joint occurrence of $(a_i, b_j)$

▷ observed frequency of the joint event $(A = a_i, B = b_j)$
The expected frequency $e_{ij}$ of the joint event $(A = a_i, B = b_j)$ is computed under the independence assumption
Estimating probabilities: $P_{a_i} = \Pr\{A = a_i\} = \frac{\sum_{j=1}^{q} f_{ij}}{N}$ and $P_{b_j} = \Pr\{B = b_j\} = \frac{\sum_{i=1}^{p} f_{ij}}{N}$, where $N = \sum_{i=1}^{p}\sum_{j=1}^{q} f_{ij}$ is the total number of observations
$e_{ij} = P_{a_i} \cdot P_{b_j} \cdot N$
The χ² value (Pearson’s χ²-statistic) is
$$\chi^2 = \sum_{i=1}^{p}\sum_{j=1}^{q} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$

Large χ² values indicate that the variables are related
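
The computation can be sketched end to end: build a contingency table from paired observations, then compute Pearson's statistic from observed and expected counts. The example uses the Gender and Major columns of the student table from the attribute-types slide; with only 7 observations the resulting value is purely illustrative, and the helper name `chi_square` is an assumption.

```python
from collections import Counter


def chi_square(table):
    """Pearson's chi-square statistic for a table of observed counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: P(row i) * P(col j) * N
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# (Gender, Major) pairs from the student table.
pairs = [("Male", "CS"), ("Male", "PHY"), ("Female", "CS"), ("Male", "EE"),
         ("Female", "CS"), ("Female", "PHY"), ("Male", "Maths")]
counts = Counter(pairs)
genders = sorted({g for g, _ in pairs})
majors = sorted({m for _, m in pairs})
table = [[counts[(g, m)] for m in majors] for g in genders]

student_chi2 = chi_square(table)
print(genders, majors, table, student_chi2)
```

A table whose rows are proportional to each other, such as `[[10, 20], [20, 40]]`, gives a statistic of 0 because the observed counts equal the expected counts.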

Covariance and Correlation

Covariance and correlation are helpful in understanding the dependency/relationship between two numeric variables

Covariance between two variables $x = \{x_1, x_2, \ldots, x_n\}$ and $y = \{y_1, y_2, \ldots, y_n\}$ with means $\bar{x}$ and $\bar{y}$, resp., is defined as
$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n}$$
▷ Covariance reveals the “proportionality” between variables
Note that when $x_i$ and $y_i$ are both greater or both smaller than their respective means, $(x_i - \bar{x})(y_i - \bar{y})$ is positive; when one is greater and the other smaller, it is negative
cov(x, y) < 0 ⟹ inverse proportionality
cov(x, y) > 0 ⟹ direct proportionality
cov(x, y) = 0 ⟹ no linear relation

Covariance and Correlation

Some properties of covariance that readily follow from definition

cov(x, y) = cov(y, x)
cov(x, x) = var(x)
If x and y are independent, then cov(x, y) = 0
For constant a and b
cov(x, a) = 0
cov(ax, by) = ab cov(x, y)
cov(x + a, y + b) = cov(x, y)

cov(x, y + z) = cov(x, y) + cov(x, z)

Covariance and Correlation

Correlation
Covariance depends on the magnitude and scale of variables x and y
Correlation quantifies how strongly two variables are linearly related

$$r_{xy} = \mathrm{corr}(x, y) = \frac{\mathrm{cov}(x, y)}{\sigma_x \, \sigma_y}$$
$-1 \le \mathrm{corr}(x, y) \le 1$

It is not affected by changes in scale of variables x and y

corr(x, y) = −1 ⟹ perfect negative linear association

corr(x, y) = 1 ⟹ perfect positive linear association
corr(x, y) = 0 ⟹ no linear association
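
The difference between covariance and correlation shows up when a variable is rescaled. The sketch below uses the temperature and ice cream sales data from the bivariate example and converts Celsius to Fahrenheit; the helper names `cov` and `corr` are illustrative and use the population (divide by n) definitions above.

```python
from math import sqrt
from statistics import fmean


def cov(x, y):
    """Population covariance of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    return fmean([(a - mx) * (b - my) for a, b in zip(x, y)])


def corr(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    return cov(x, y) / sqrt(cov(x, x) * cov(y, y))


temperature_c = [20, 25, 35, 43]
ice_cream_sales = [2000, 2500, 5000, 7800]

c = cov(temperature_c, ice_cream_sales)
r = corr(temperature_c, ice_cream_sales)

# Rescaling one variable changes the covariance but not the correlation.
temperature_f = [t * 9 / 5 + 32 for t in temperature_c]
c_f = cov(temperature_f, ice_cream_sales)
r_f = corr(temperature_f, ice_cream_sales)

print(c, r, c_f, r_f)
```

The Fahrenheit covariance is 1.8 times the Celsius one, matching cov(ax + b, y) = a cov(x, y), while the correlation is unchanged.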

Correlation
Figure: scatter plots of pairs of variables; the x- and y-axes represent the variables, and the correlation of each pair (1, 0.8, 0.4, 0, −0.4, −0.8, −1) is shown above its panel

Correlation matrix

For multivariate numeric data, the correlation matrix is

A table of pairwise correlation coefficients between variables
Each cell shows the correlation between two variables
Used to summarize data, as an input into a more advanced analysis, and as a diagnostic for advanced analyses
Also used to identify and remove redundant (highly correlated) variables
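
A correlation matrix and a simple redundancy check can be sketched as follows. The `temperature` and `sales` columns come from the bivariate example; `sales_doubled` is a hypothetical column added only to show a perfectly redundant variable, and the 0.999 threshold is an arbitrary illustrative choice.

```python
from math import sqrt
from statistics import fmean


def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


columns = {
    "temperature": [20, 25, 35, 43],
    "sales": [2000, 2500, 5000, 7800],
    "sales_doubled": [4000, 5000, 10000, 15600],  # hypothetical redundant column
}
names = list(columns)

# matrix[i][j] is the correlation between variables names[i] and names[j].
matrix = [[corr(columns[a], columns[b]) for b in names] for a in names]

# Pairs whose absolute correlation is (almost) 1 carry the same information.
redundant = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
             if abs(corr(columns[a], columns[b])) > 0.999]

print(matrix, redundant)
```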

Graphical EDA

Diagrammatic Representations of Data

Easy to understand: Numbers do not tell the whole story. Diagrammatic representation of data makes it easier to understand
Simplified Presentation: Large volumes of complex data can be represented in a simplified and intelligible diagram
Reveals hidden facts: Diagrams help in bringing out facts and relationships in the data that are not noticeable in raw/tabular form
Easy to compare: Diagrams make it easier to compare data

Visualizing Data for Insight

Purpose of Graphical EDA: To reveal underlying structures, detect outliers and anomalies, and understand patterns within the data through visual methods.
Simplifies complex quantitative information.
Facilitates faster comprehension and decision-making.
Helps in spotting trends, patterns, and outliers.
Common Tools: Histograms, Box plots, Scatter plots, etc.

Types of Diagrams

We will briefly discuss and use the following types of diagrams

▷ More on importance of visualization later

Bar Charts
Histogram ▷ and also overlapping histogram
Box Plot ▷ and also side-by-side box-plots
Scatter Plot ▷ and scatter plot matrix
Heat map
Line Graph
Parallel Axis Plot
Word-Cloud

Bar charts

Generally used for nominal and ordinal variables

Different bars (usually colored/shaded differently) for distinct values (levels, categories, symbols) of the variable
Height of a bar represents the frequency of the corresponding symbol (value)
Can reveal variables that carry no or limited information, e.g. constants
Note that we can use pie charts for the same purpose too,
but humans perceive differences in lengths better than in angles

Histograms
Represent distribution of data in a numeric/continuous variable
(estimates probability distribution of a numeric variable)
Group values by a series of intervals (bins - usually consecutive
non-overlapping subintervals covering range of data)
Plot the number of values falling in each bin (represented by the
height of the bar)
Normalized histogram shows proportion of values in each bin

Histograms

A histogram with an appropriate number/length of bins reveals

Where the data is located

Where/what the extremes are
What the distribution of the data is
How the data is spread out
Whether the distribution is symmetric or skewed (left or right)
Whether the data is unimodal, bimodal or more
Outliers in the data, if any

Histograms

Number and sizes of bins are important considerations

Bins do not have to be of equal sizes
For unequal bin sizes the height of a bar is not the frequency of values in the bin; it is the frequency density
Frequency density: number of items per unit of the variable on the x-axis
Area of the bar is proportional to the frequency

Too many bins in a histogram give too much unnecessary detail (show too much noise)
Too few bins reveal almost nothing and obscure the underlying patterns
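
The effect of the number of bins can be seen by counting values per bin by hand. The sketch below assumes equal-width bins spanning the data range and reuses the 20-value example from the mean slides; the helper `histogram` is illustrative and not a library function.

```python
def histogram(values, bins):
    """Counts of values in `bins` equal-width intervals covering [min, max]."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that the maximum value lands in the last bin.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts


data = [2.5, 2.5, 3, 3.5, 3.5, 3.5, 3.5, 4, 4, 4,
        4.5, 4.5, 4.5, 5, 5, 5.5, 5.5, 6, 98, 99]

coarse = histogram(data, 2)   # two wide bins hide the shape of the bulk
fine = histogram(data, 40)    # many narrow bins, most of them empty

print(coarse)
print(fine)
```

With 2 bins all 18 typical values fall into one bar; with 40 bins the bulk still occupies only the first two bars because the outliers stretch the range.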

Overlapping Histograms

Useful in observing distribution of values with respect to a nominal variable

Box Plots

Another way of displaying the distribution of data (somewhat)

Box-Plots or Box and Whisker diagrams

Top and bottom lines of the box are the 3rd and 1st quartiles of the data
Length of the box is the inter-quartile range (midspread)
The line in the middle of the box is the median of the data
The top whisker denotes the largest value in the data that is within 1.5 times the midspread above the 3rd quartile (at most Q3 + 1.5 · IQR)
Similarly, the bottom whisker denotes the smallest value that is at least Q1 − 1.5 · IQR
Anything above or below the whiskers is considered an outlier
Relative location of the median within the box tells us about the data distribution
We find out at which end the outliers are, if any
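
The quantities drawn in a box plot can be computed without plotting. The sketch below assumes the common 1.5 · IQR whisker convention and the `"inclusive"` quartile method of `statistics.quantiles`; plotting libraries may use slightly different quartile rules. The data is the 20-value example from the mean slides.

```python
from statistics import quantiles


def box_plot_stats(values):
    """Quartiles, whisker ends and outliers under the 1.5 * IQR convention."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside),    # smallest value not below Q1 - 1.5 * IQR
        "whisker_high": max(inside),   # largest value not above Q3 + 1.5 * IQR
        "outliers": outliers,
    }


data = [2.5, 2.5, 3, 3.5, 3.5, 3.5, 3.5, 4, 4, 4,
        4.5, 4.5, 4.5, 5, 5, 5.5, 5.5, 6, 98, 99]

stats = box_plot_stats(data)
print(stats)
```

Both outliers lie above the top whisker, which shows at which end of the data the extreme values sit.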

Box Plots

Can get some idea of skew by observing which whisker is shorter

There are various conventions for whiskers; sometimes the top whisker is the 90th percentile
Uni-modality and multi-modality are generally not clear from box plots

Side-by-side Box Plots

Extremely useful for comparisons of two or more variables.

To compare numeric variables, we draw their box-plots in parallel

Side-by-side Box Plots

Side by side groupwise box plots are extremely useful

Groups are based on values of a categorical variable
It reveals whether a factor (the categorical variable) is important
It addresses whether the location of data differs between groups
To some extent it also reveals whether distribution and variation differ between groups
Overlapping histograms are more suitable for the latter question,
unless there is too much overlap

Scatter Plot

Scatter plot is the best way to visualize two-dimensional numeric data

It directly represents the two-dimensional observations as points in $\mathbb{R}^2$.
Plot one variable on the x-axis and the other on the y-axis

It shows how the two variables are related to each other

▷ reveals correlations between the variables
If one or both variables are highly skewed, then scatter plots are hard to examine, as the bulk of the data is concentrated in a small part of the plot
For this we should apply some kind of transformation (explained later) to one or both of the variables
log-scaled plots can also be used in such cases

Scatter Plot Matrix
Pairwise scatter plots, pairwise correlations and individual histograms
or density plots
Summarize the relationships of all pairs of numerical attributes

Scatter Plot Matrix
Scatter plot (matrix) can be combined with information in a nominal
attribute encoded through color or marker shape

Heat Map

Presents pairwise relationship between attributes of multivariate data

Provides a numerical value of the correlation between each pair of variables
Also provides an easy-to-understand visual representation of those numbers (color shades)
Darker red showing high correlation
Dark blue showing no or negative correlation
Can be used to visualize any matrix

Line graphs
Line graphs are used for time series e.g. player’s yearly average,
student’s semester gpa or hourly energy consumption
Two or more time series can be compared in different colors or
markers (legend should be provided)

Parallel Axis Plot

Word-Cloud

Very useful in text analytics

A word cloud shows words used in a text corpus (collection of documents)
with size of words proportional to their importance (e.g. tf-idf)

Figure: two word clouds. The one on the left is clearly for a collection of articles about US politics and political news, while the one on the right seems to be for a corpus on astronomy/astrophysics
