
The slides are derived from the following publisher's instructor material. This work is protected by United States copyright laws and is provided solely for the use of instructors in teaching their courses and assessing student learning. Dissemination or sale of any part of this work will destroy the integrity of the work and is not permitted. All recipients of this work are expected to abide by these restrictions.

Data Mining and Predictive Analytics, Second Edition, by Daniel Larose and Chantal Larose, John Wiley and Sons, Inc., 2015.
Data Preprocessing
Outline:

This chapter shows how to:
– Evaluate the quality of the data
– Clean the raw data
– Deal with missing data
– Perform transformations on certain variables

[Figure: the CRISP-DM standard process, with the Business/Research Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment phases]
3
Why Do We Preprocess Data?

• Raw data often incomplete, noisy

• May contain:
– Redundant fields
– Missing values
– Outliers
– Data in a form not suitable for data mining
– Erroneous values

4
Why Do We Preprocess Data? (cont’d)
• For data mining purposes, database values must undergo data
cleaning and data transformation
• Data often from legacy databases where values:
– Not looked at in years
– Expired
– No longer relevant
– Missing
• Minimize GIGO (Garbage In, Garbage Out)
– If garbage going into the analysis is minimized, then garbage in the results is minimized
• Data preparation is 60% of the effort for the data mining process (Pyle)

5
Data Cleaning
Data errors such as the following may appear in a customer table:

• Five-numeral U.S. zip code?
– Not all countries use the same zip code format, e.g., 90210 (U.S.) vs. J2S7K7 (Canada)
– Should expect unusual values in some fields

• Four-digit zip code?
– Leading zero truncated, e.g., 6269 vs. 06269 (New England states)
– Database field was numeric and chopped off the leading zero (a padding fix is sketched below)
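A minimal sketch of repairing truncated leading zeros; the customers data frame and its zip field are hypothetical illustrations, not part of the cars data used later:

# Hypothetical example: restore leading zeros lost when zip codes were stored as numbers
customers = data.frame(zip = c(6269, 90210, 2134))
customers$zip = formatC(customers$zip, width = 5, format = "d", flag = "0")
customers$zip   # "06269" "90210" "02134"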

6
Data Cleaning (cont’d)

• Income Field Contains $10,000,000?


– Assumed to measure gross annual income
– Possibly valid
– Still considered outlier (extreme data value)
– Some statistical and data mining methods affected by outliers
• Income Field Contains -$40,000?
– Income less than $0?
– Value beyond bounds for expected income, therefore an error
– Caused by data entry error?
– Discuss anomaly with database administrator

7
Data Cleaning (cont’d)

• Income Field Contains $99,999?


– Other values appear rounded to nearest $5,000
– Value may be completely valid
– Value represents database code used to denote missing value?
– Confirm values are in the expected unit of measure, such as U.S. dollars
– Which unit of measure is used for income?
– Is the income of the customer with zip code J2S7K7 reported in Canadian dollars?
– Discuss anomaly with database administrator

8
Data Cleaning (cont’d)

• Age Field Contains “C”?


– Other records have numeric values for field
– Record categorized into group labeled “C”
– Value must be resolved
– Data mining software expects numeric values for field
• Age Field Contains 0?
– Zero value used to indicate a missing/unknown value?
– Customer refused to provide their age? (a recoding sketch follows below)
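A minimal sketch of recoding a sentinel value to a proper missing value; the customers data frame and its age field are hypothetical, but the idea (convert "0 means unknown" to NA so later missing-value handling applies) is the one discussed above:

# Hypothetical example: treat age == 0 as missing rather than a real age
customers = data.frame(age = c(34, 0, 51, 0, 27))
customers$age[customers$age == 0] = NA
summary(customers$age)   # NAs are now counted as missing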

9
Data Cleaning (cont’d)

• Marital Status Field Contains “S”?


– What does this symbol mean?
– Does “S” imply single or separated?
– Discuss anomaly with database administrator

10
Handling Missing Data

• Missing values pose problems to data analysis methods

• More common in databases containing large number of fields

• Absence of information rarely beneficial to task of analysis

• In contrast, having more data almost always better

• Careful analysis is required to handle the issue; a quick count of missing values per field (sketched below) is a useful first step
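A minimal sketch of taking stock of missing values before choosing a strategy, assuming the cars_preprocessing.csv file that is loaded on the next slide:

# Count missing values in each field of the cars data
cars = read.csv("cars_preprocessing.csv", stringsAsFactors = FALSE, na.strings = "")
colSums(is.na(cars))          # NAs per column
sum(!complete.cases(cars))    # number of records with at least one NA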

11
Handling Missing Data (cont’d)
• Examine the cars_preprocessing dataset containing records for 261 automobiles manufactured in the 1970s and 1980s

cars = read.csv("cars_preprocessing.csv", stringsAsFactors = FALSE, na.strings = "")
# Create a working copy of the dataset as cars.cleaned
cars.cleaned = cars
head(cars.cleaned)

12
Handling Missing Data (cont’d)
• Delete Records Containing Missing Values?
– Not necessarily best approach
– Pattern of missing values may be systematic
– Deleting records creates biased subset
– Valuable information in other fields lost

• Four Alternative Methods Are Available
1. Replace missing values with a user-defined constant
2. Replace missing values with the mode (for categorical variables) or the mean (for numeric variables)
3. Replace missing values with a value generated at random from the observed distribution of the variable
4. Replace missing values with imputed values based on the other characteristics of the record

13
Handling Missing Data (cont’d)
• (1) Replace Missing Values with User-defined Constant
– Missing numeric values replaced with 0.0

cars.cleaned$hp[c(4,5)] = 0

Or use the following code to replace all NAs with 0

cars.cleaned$hp[is.na(cars.cleaned$hp)] = 0

– Missing categorical values replaced with “Missing”

cars.cleaned$brand[is.na(cars.cleaned$brand)] = "Missing"

14
Handling Missing Data (cont’d)
• (2) Replace Missing Values with Mode or Mean
– Mode of categorical field cylinders = 4
– Missing values replaced with this value

# Replace values with mean and mode
our_table = table(cars.cleaned$cylinders)
pos_mode = which.max(our_table)
our_mode = names(pos_mode)
cars.cleaned$cylinders[is.na(cars.cleaned$cylinders)] = our_mode

– Mean for non-missing values in numeric field mpg = 23.12171
– Missing values replaced with 23.12171

cars.cleaned$mpg[is.na(cars.cleaned$mpg)] = mean(na.omit(cars.cleaned$mpg))

15
Handling Missing Data (cont’d)
• (3) Replace Missing Values with Random Values
– Values randomly taken from underlying distribution
– Method superior compared to mean substitution
– Measures of location and spread remain closer to original
# Generate one random observation for each missing value,
# drawn from the observed (non-missing) distribution
n_missing = sum(is.na(cars.cleaned$cubicinches))
obs_cubicinches = sample(na.omit(cars.cleaned$cubicinches), n_missing, replace = TRUE)
cars.cleaned$cubicinches[is.na(cars.cleaned$cubicinches)] = obs_cubicinches

– No guarantee that the resulting records make sense
– Suppose the randomly-generated values are cylinders = 8 and cubicinches = 82
– What is the likely value, given the record's other attribute values? (imputation)
– For example, an American car with 300 cubic inches and 150 horsepower
– A Japanese car with 100 cubic inches and 90 horsepower
– The American car is expected to have more cylinders
16
Handling Missing Data (cont’d)
• (4) Imputation
– In data imputation, we need to answer “What would be the
most likely value for this missing value, given all the other
attributes for a particular record?”
• An American car with 300 cubic inches and 150 horsepower would
probably be expected to have more cylinders than a Japanese car with 100
cubic inches and 90 horsepower.

– Imputation of missing data requires tools such as multiple regression or classification and regression trees (future topics); a simple regression-based sketch appears below
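A minimal sketch of regression-based imputation on the cars data; the specific predictors chosen here are only illustrative, and the records being imputed are assumed to have non-missing values for those predictors:

# Fit a model for cubicinches on complete records, then predict the missing values
complete_rows = !is.na(cars$cubicinches)
fit = lm(cubicinches ~ weightlbs + hp, data = cars[complete_rows, ])
missing_rows = is.na(cars$cubicinches)
cars.cleaned$cubicinches[missing_rows] = predict(fit, newdata = cars[missing_rows, ])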

17
Identifying Misclassifications

– Verify values valid and consistent

table(cars.cleaned$brand)

– Frequency distribution shows five classes: USA, France, US, Europe, and Japan
– Count for USA = 1 and France = 1?
– Two records classified inconsistently with respect to the origin of the manufacturer
– Maintain consistency by relabeling USA → US and France → Europe (see the sketch below)
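A minimal sketch of the relabeling, continuing with cars.cleaned; the class names are those reported by table() above:

# Relabel inconsistent brand values so each origin appears under one class
cars.cleaned$brand[cars.cleaned$brand == "USA"] = "US"
cars.cleaned$brand[cars.cleaned$brand == "France"] = "Europe"
table(cars.cleaned$brand)   # should now show only US, Europe, and Japan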
18
Graphical Methods for Identifying Outliers
• Outliers are values that lie near extreme limits of data range
• Outliers may represent errors in data entry
• Certain statistical methods very sensitive to outliers and may produce
unstable results
• A histogram examines values of numeric fields

# Create a histogram
hist(cars$weightlbs, col = "blue", border = "black", xlab = "Weight",
     ylab = "Counts", main = "Histogram of Car Weights")

# Draw a box around the plot
box(which = "plot", lty = "solid", col = "black")

19
Graphical Methods for Identifying Outliers (cont’d)

– A histogram examines the values of numeric fields
– This histogram shows vehicle weights for the cars data set
– The extreme left tail contains one outlier weighing only 192.5 pounds
– Perhaps the value of 192.5 is an error
– Should it be 1925?
– Cannot know for sure; requires further investigation (a sketch for locating the record follows below)
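A minimal sketch of locating the suspicious record so it can be investigated; the 1000-pound cutoff is only an illustrative assumption:

# Inspect the record(s) with implausibly low weight
cars[which(cars$weightlbs < 1000), ]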

20
Graphical Methods for Identifying Outliers (cont’d)
– Two-dimensional scatter plots help identify outliers between variable pairs
– A scatter plot of mpg against weightlbs shows two possible outliers
– Most data points cluster together along the x-axis
– However, one car weighs 192.5 pounds and another gets over 500 miles per gallon?

# Create a scatterplot
plot(cars$weightlbs, cars$mpg, xlim = c(0, 5000), ylim = c(0, 600),
     xlab = "Weight", ylab = "MPG", main = "Scatterplot of MPG by Weight",
     type = "p", pch = 20, col = "blue")
# Add open black circles
points(cars$weightlbs, cars$mpg, type = "p", col = "black")

21
Graphical Methods for Identifying Outliers (cont’d)

– Most data points cluster together along the x-axis
– However, one car weighs 192.5 pounds and another gets over 500 miles per gallon?

22
Measures of Center And Spread
• The numerical measures of center estimate where the center of a particular variable lies
– Mean
– Median
– Mode
• Mean: the average of the valid values taken by the variable
‒ For extremely skewed data sets, the mean becomes less representative of the
variable center
‒ Also, the mean is sensitive to the presence of outliers
• Median: defined as the field value in the middle when the field
values are sorted into ascending order
‒ The median is resistant to the presence of outliers
• Mode: represents the field value occurring with the greatest
frequency
‒ The mode may be used with either numerical or categorical data, but is not always
associated with the variable center
23
Measures of Center And Spread (cont’d)

• Measures of spread (variability) include the range (maximum − minimum), the standard deviation, the mean absolute deviation, and the interquartile range

• The sample standard deviation is perhaps the most widespread measure of variability and is defined by

  s = sqrt( Σ(x − x̄)² / (n − 1) )

• The standard deviation can be interpreted as the "typical" distance between a field value and the mean; most field values lie within two standard deviations of the mean (a quick check on the cars data is sketched below)
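A minimal sketch checking the two-standard-deviation claim on the cars data, reusing the cleaned weightlbs field from earlier slides:

# Proportion of weight values within two standard deviations of the mean
w = cars.cleaned$weightlbs
mean(abs(w - mean(w)) <= 2 * sd(w))   # typically a large proportion for mound-shaped data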

24
Measures of Center And Spread (cont’d)

# Descriptive statistics
mean(cars.cleaned$weightlbs)     # Mean
median(cars.cleaned$weightlbs)   # Median
length(cars.cleaned$weightlbs)   # Number of observations
sd(cars.cleaned$weightlbs)       # Standard deviation
summary(cars.cleaned$weightlbs)  # Min, Q1, Median, Mean, Q3, Max

25
Data Transformation
• Variables tend to have ranges different from each other
• In baseball, two fields may have ranges:
– Batting average: [ 0.0, 0.400 ]
– Number of home runs: [ 0, 70 ]

• Some data mining algorithms are adversely affected by differences in variable ranges
• Variables with greater ranges tend to have a larger influence on the data model's results

• Therefore, numeric field values should be normalized

26
Min-Max Normalization

• Min-max normalization works by seeing how much greater the field value is than the minimum value min(X), and scaling this difference by the range:

  X* = (X − min(X)) / (max(X) − min(X)) = (X − min(X)) / range(X)

• For example, for an ultra-light vehicle weighing only 1613 pounds (the field minimum), the min-max normalization is (1613 − 1613) / range(X) = 0

• The heaviest vehicle (the field maximum) has a min-max normalization value of range(X) / range(X) = 1
27
Z-score Standardization

• Z-score standardization works by taking the difference between the field value and the field mean value, and scaling this difference by the standard deviation of the field values:

  Z = (X − mean(X)) / SD(X)

• For example, for a vehicle weighing only 1613 pounds, the Z-score standardization is (1613 − mean(weightlbs)) / SD(weightlbs), a negative value, since this weight lies below the mean

• For the heaviest car, the Z-score standardization is positive, since its weight lies above the mean
28
Z-score Standardization

# Transformations
# Min-max normalization
mi = min(cars.cleaned$weightlbs)
ma = max(cars.cleaned$weightlbs)
minmax.weight = (cars.cleaned$weightlbs - mi) / (ma - mi)
minmax.weight

# Z-score standardization
m = mean(cars.cleaned$weightlbs)
s = sd(cars.cleaned$weightlbs)
z.weight = (cars.cleaned$weightlbs - m) / s
z.weight
29
Transformations To Achieve Normality

• Normal distribution is a continuous probability distribution (bell curve)

• Centered at mean 𝜇 and its spread determined by SD 𝜎 (sigma)

• Figure below shows the normal distribution that has mean 𝜇 = 0 and
SD 𝜎 = 1, known as the standard normal distribution Z

• A common misconception is that variables that have had the Z-score standardization applied to them follow the standard normal Z distribution

• This is not correct!

30
Transformations To Achieve Normality (cont’d)

• Z-standardized data will have mean = 0 and standard deviation = 1, but that does not mean the data are normally distributed

[Figure: histograms of the original data and of the standardized data, showing the same shape]

31
Skewness

• Measuring the skewness of a distribution tells us about its symmetry (or lack of it)
• A common approximation, used in the code on the next slide, is skewness ≈ 3 × (mean − median) / standard deviation
32
Skewness (cont’d)

# Skewness of the original weight and of the Z-standardized weight
(3 * (mean(cars$weightlbs) - median(cars$weightlbs))) / sd(cars$weightlbs)
(3 * (mean(z.weight) - median(z.weight))) / sd(z.weight)

33
Transformations To Achieve Normality

• To make our data "more normally distributed," we must first make it symmetric
• To eliminate skewness, we apply a transformation to the data
• Common transformations are:
– Natural log transformation: ln(weight)
– Square root transformation: sqrt(weight)
– Inverse square root transformation: 1 / sqrt(weight)

34
Transformations To Achieve Normality (cont’d)

• Natural log transformation: ln(weight)

• Square root transformation: sqrt(weight)

• Inverse square root transformation: 1 / sqrt(weight)
– (best of the three, but still not really normal)
35
Transformations To Achieve Normality (cont’d)
# Transformations for normality
# Square root
sqrt.weight = sqrt(cars.cleaned$weightlbs)
sqrt.weight_skew = (3 * (mean(sqrt.weight) - median(sqrt.weight))) / sd(sqrt.weight)

# Natural log (ln() is provided by the SciViews package)
library(SciViews)
ln.weight = ln(cars.cleaned$weightlbs)
ln.weight_skew = (3 * (mean(ln.weight) - median(ln.weight))) / sd(ln.weight)

# Inverse square root
invsqrt.weight = 1 / sqrt(cars.cleaned$weightlbs)
invsqrt.weight_skew = (3 * (mean(invsqrt.weight) - median(invsqrt.weight))) / sd(invsqrt.weight)

36
Transformations To Achieve Normality (cont’d)
• One of the three transformations may produce a distribution closer to normal than the others; compare their skewness values

• To check for normality, construct a normal probability (Q-Q) plot

• If the distribution is normal, the bulk of the points in the plot should fall on a straight line

[Figure: normal probability plots of a non-normal and a normal variable, labeled "Not Normal" and "Normal"]

• When the algorithm is done with its analysis, don't forget to "de-transform" the data
37
Transformations To Achieve Normality (cont’d)

# Normal Q-Q plot of the inverse-square-root-transformed weight
qqnorm(invsqrt.weight, col = "red")
qqline(invsqrt.weight, col = "blue")

38
Transformations To Achieve Normality (cont’d)
# Side-by-side histograms
par(mfrow = c(1, 2))
# Create two histograms
hist(cars$weightlbs, breaks = 20, xlim = c(1000, 5000),
     main = "Histogram of Weight", xlab = "Weight", ylab = "Counts")
box(which = "plot", lty = "solid", col = "black")

hist(z.weight, breaks = 20, xlim = c(-2, 3),
     main = "Histogram of Z-score of Weight", xlab = "Z-score of Weight", ylab = "Counts")
box(which = "plot", lty = "solid", col = "black")

39
Histogram with Normal Distribution Overlay

# Histogram with normal distribution overlay
par(mfrow = c(1, 1))

# Simulate a large normal sample with the same mean and SD as the transformed data
x = rnorm(1000000, mean = mean(invsqrt.weight), sd = sd(invsqrt.weight))

# prob = TRUE plots densities rather than counts, so the normal curve can be overlaid
hist(invsqrt.weight, breaks = 30, xlim = c(0.0125, 0.0275), col = "lightblue",
     prob = TRUE, border = "black", xlab = "Inverse Square Root of Weight",
     ylab = "Density", main = "Histogram of Inverse Square Root of Weight")
box(which = "plot", lty = "solid", col = "black")

# Overlay with the normal density
lines(density(x), col = "red")

40
Numerical Methods For Identifying Outliers

• The Z-score method for identifying outliers states:
– A data value is an outlier if it has a Z-score that is either less than −3 or greater than 3
– Variable values with Z-scores much beyond this range may bear further investigation
– However, one should not automatically omit outliers from analysis (a flagging sketch follows below)
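A minimal sketch of flagging Z-score outliers in the weight field, reusing z.weight from the standardization code earlier:

# Flag records whose standardized weight lies beyond +/- 3
outlier_rows = which(abs(z.weight) > 3)
cars.cleaned[outlier_rows, ]   # inspect these records rather than deleting them automatically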

41
IQR (Interquartile range)
• Unfortunately, the mean and SD, which are both part of the formula for the Z-score standardization, are rather sensitive to the presence of outliers

• Therefore, data analysts have developed more robust statistical methods for outlier detection

• One elementary robust method is to use the IQR

42
IQR
• The quartiles of a data set divide the data set into the following four parts, each containing 25% of the data:
– The first quartile (Q1) is the 25th percentile.
– The second quartile (Q2) is the 50th percentile, that is, the median.
– The third quartile (Q3) is the 75th percentile.

• IQR is calculated as IQR = Q3 − Q1, and may be interpreted to represent the spread of the middle 50% of the data

• A data value is an outlier if:
– a. it is located 1.5(IQR) or more below Q1, or
– b. it is located 1.5(IQR) or more above Q3.

43
IQR (cont’d)

• Set of numbers: {1, 6, 3, 14, 5, 2, 7, 8, 4}
– 25th percentile: Q1 = 2.5
– 75th percentile: Q3 = 7.5
• Interquartile range, or the difference between these quartiles:
– IQR = 7.5 − 2.5 = 5
• A number would be identified as an outlier if:
– a. it is lower than Q1 − 1.5(IQR) = 2.5 − 1.5(5) = −5, or
– b. it is higher than Q3 + 1.5(IQR) = 7.5 + 1.5(5) = 12.5
• Here, 14 > 12.5, so 14 is flagged as an outlier (see the sketch below)
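A minimal sketch of the same calculation in R; note that quantile() with type = 6 matches the quartile convention used above (R's default type = 7 gives slightly different quartiles for small samples):

x = c(1, 6, 3, 14, 5, 2, 7, 8, 4)
q = quantile(x, probs = c(0.25, 0.75), type = 6)   # Q1 = 2.5, Q3 = 7.5
iqr = q[2] - q[1]                                  # 5
lower = q[1] - 1.5 * iqr                           # -5
upper = q[2] + 1.5 * iqr                           # 12.5
x[x < lower | x > upper]                           # 14 is flagged as an outlier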

44
Dummy Variables

• Some analytical methods, such as regression, require predictors to be numeric

• A dummy variable is a categorical variable taking only two values, 0 and 1

• When a categorical predictor takes k ≥ 3 possible values, define k − 1 dummy variables

45
Dummy Variables (cont’d)
• If the categorical predictor region has k = 4 possible categories, {north, east, south, west}, then the analyst could define the following k − 1 = 3 dummy variables:
– north_dummy: If region = north then north_dummy = 1; otherwise north_dummy = 0.
– east_dummy: If region = east then east_dummy = 1; otherwise east_dummy = 0.
– south_dummy: If region = south then south_dummy = 1; otherwise south_dummy = 0.
• The dummy variable for west is not needed, as region = west is already uniquely identified by zero values for each of the three existing flag variables

library(fastDummies)

# By default dummy_cols creates one dummy column per level;
# adding remove_first_dummy = TRUE would give the k - 1 coding described above
cars.cleaned.new = dummy_cols(cars.cleaned, select_columns = c("brand", "cylinders"))

46
Transforming Categorical Variables Into Numerical Variables

• In most instances, the data analyst should avoid transforming categorical variables to numerical variables, since doing so assumes an ordering of the categories

• The exception is for categorical variables that are clearly ordered, such as the variable survey response, taking the values always, usually, sometimes, never (a mapping sketch follows below)
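A minimal sketch of one possible mapping for an ordered categorical variable; the survey_response values come from the example above, but the 1-4 scale is an illustrative assumption:

# Map an ordered categorical variable to a numeric scale
survey_response = c("always", "usually", "sometimes", "never", "usually")
scale_map = c(never = 1, sometimes = 2, usually = 3, always = 4)
survey_numeric = scale_map[survey_response]
survey_numeric   # 4 3 2 1 3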

47
Removing Variables That Are Not Useful

• Duplicate records lead to an overweighting of the data values

• We wish to remove variables that will not help the analysis, regardless of the proposed data mining task or algorithm:
– Unary variables, which take on only a single value, so a unary variable is not so much a variable as a constant
– Variables which are very nearly unary

• Example
– Suppose that 99.95% of the players in a field hockey league are female, with the remaining 0.05% male; the gender variable is nearly unary and unlikely to help the analysis
(A sketch for finding duplicates and unary variables follows below)
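A minimal sketch of checking for duplicate records and (nearly) unary variables in the cars data; the 99.9% threshold for "nearly unary" is an illustrative assumption:

# Remove exact duplicate records
cars.cleaned = cars.cleaned[!duplicated(cars.cleaned), ]

# Identify unary or nearly unary variables (dominant value covers >= 99.9% of records)
dominant_share = sapply(cars.cleaned, function(col) max(table(col)) / length(col))
names(dominant_share)[dominant_share >= 0.999]   # candidates to drop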

48
The slides are derived from the following publisher's instructor material. This work is protected by United States copyright laws and is provided solely for the use of instructors in teaching their courses and assessing student learning. Dissemination or sale of any part of this work will destroy the integrity of the work and is not permitted. All recipients of this work are expected to abide by these restrictions.

Data Mining and Predictive Analytics, Second Edition, by Daniel Larose and Chantal Larose, John Wiley and Sons, Inc., 2015.
