Exploratory data analysis in the context of data mining and resampling

ABSTRACT
Today there are quite a few widespread misconceptions about exploratory data analysis (EDA). One
of these misconceptions is that EDA is opposed to statistical modeling. Actually, the
essence of EDA is not about putting aside all modeling and preconceptions; rather, researchers
are urged not to begin the analysis with strong preconceptions alone, and thus modeling is still
legitimate in EDA. In addition, the nature of EDA has been changing due to the emergence of
new methods and convergence between EDA and other methodologies, such as data mining
and resampling. Therefore, conventional conceptual frameworks of EDA might no longer be
capable of coping with this trend. In this article, EDA is introduced in the context of data mining
and resampling with an emphasis on three goals: cluster detection, variable selection, and
pattern recognition. Two Step clustering, classification trees, and neural networks, which are
powerful techniques to accomplish the preceding goals, respectively, are illustrated with
concrete examples.

Key words: exploratory data analysis, data mining, resampling, cross-validation, data visualization, clustering, classification trees, neural networks

CONVENTIONAL VIEWS OF EDA


Exploratory data analysis was named by Tukey (1977) as an alternative to confirmatory data analysis (CDA). As mentioned
before, EDA is an attitude or philosophy about how data analysis should be carried out, instead
of being a fixed set of techniques. Tukey (1977) often related EDA to detective work. In EDA, the
role of the researcher is to explore the data in as many ways as possible until a plausible “story”
of the data emerges. Therefore, the “data detective” should be skeptical of the “face” value of
the data and keep an open mind to unanticipated results when the hidden patterns are
unearthed. Throughout many years, different researchers formulated different definitions,
classifications, and taxonomies of EDA. For example, Velleman and Hoaglin (1981) outlined four
basic elements of exploratory data analysis: residual, re-expression (data transformation),
resistant, and display (data visualization). Based upon Velleman and Hoaglin’s framework,
Behrens and Yu (2003) elaborated the above four elements with updated techniques, and
renamed “display” to “revelation.” Each of them is briefly introduced as follows:
1. Residual analysis: EDA follows the formula that data = fit + residual or data = model + error.
The fit or the model is the expected values of the data whereas the residual or the error is the
values that deviate from that expected value. By examining the residuals, the researcher can
assess the model’s adequacy (Yu, 2009b).
2. Re-expression or data transformation: When the distribution is skewed or the data structure obscures the pattern, the data can be rescaled in order to improve interpretability. Typical examples of data transformation include using natural log transformation or inverse probability transformation to normalize a distribution, using square root transformation to stabilize variances, and using logarithmic transformation to linearize a trend (Yu, 2009b).
3. Resistance
procedures: Parametric tests are based on the mean estimation, which is sensitive to outliers or
skewed distributions. In EDA, resistant estimators are usually used. The following are common
examples: median, trimean (a measure of central tendency based on the arithmetic average of
the values of the first quartile, the third quartile, and the median counted twice), Winsorized
mean (a robust version of the mean in which extreme scores are pulled back to the majority of
the data), and trimmed mean (a mean computed after the most extreme scores are removed). It is important to point out that there is a subtle difference between “resistance” and “robustness,” though the two terms are often used interchangeably. Resistance is about being immune to outliers, while robustness is about being immune to assumption violations. In the former the goal is to obtain a data summary, while in the latter the goal is to make a probabilistic inference. (A brief computational sketch of re-expression and resistant estimation appears after this list.)
4. Revelation or data visualization:
Graphing is a powerful tool for revealing hidden patterns and relationships among variables.
Typical examples of graphical tools for EDA are Trellis displays and 3D plots (Yu & Stockford,
2003). Although the use of scientific and statistical visualization is fundamental to EDA, the two should not be equated, because data visualization is concerned with just one aspect of data characterization (patterns), whereas EDA encompasses a wider focus, as introduced in the previous three elements (NIST Sematech, 2006).
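
To make elements 2 and 3 concrete, here is a minimal computational sketch of re-expression and resistant estimation. The skewed variable is simulated, and the 10% trimming and Winsorizing levels are illustrative assumptions, not recommendations from the article.

```python
# Minimal sketch of re-expression (element 2) and resistant estimation
# (element 3) on a hypothetical right-skewed variable.
import numpy as np
from scipy import stats
from scipy.stats import mstats

rng = np.random.default_rng(1)
x = rng.lognormal(mean=3.0, sigma=1.0, size=500)   # simulated skewed data

# Re-expression: a natural log transformation pulls in the long right tail;
# a square root transformation is a milder alternative.
x_log = np.log(x)
x_sqrt = np.sqrt(x)

# Resistant estimators of central tendency (less sensitive to outliers).
q1, median, q3 = np.percentile(x, [25, 50, 75])
trimean = (q1 + 2 * median + q3) / 4               # median counted twice
trimmed = stats.trim_mean(x, 0.1)                  # drop the extreme 10% in each tail
winsorized = mstats.winsorize(x, limits=(0.1, 0.1)).mean()  # pull extremes back, then average

print("mean:", x.mean(), "median:", median, "trimean:", trimean,
      "trimmed mean:", trimmed, "Winsorized mean:", winsorized)
```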
According to NIST Sematech (2006), EDA entails a variety of techniques for accomplishing the following tasks: 1) maximize insight; 2)
uncover underlying structure; 3) extract important variables; 4) detect outliers and anomalies;
5) test underlying assumptions; 6) develop parsimonious models; and 7) determine optimal
factor settings. Comparing NIST's EDA approach with Velleman and Hoaglin's, and Behrens and Yu's, it is not difficult to see many common threads. For example, “maximize insight” and “uncover underlying structure” are similar to revelation.

LIMITATIONS OF CONVENTIONAL VIEWS OF EDA

Although the preceding EDA framework provides researchers with helpful guidelines in
EDA Although the preceding EDA framework provides researchers with helpful guidelines in
data analysis, some of the above elements are no longer as important as before due to the
emergence of new methods and convergence between EDA and other methodologies, such as
data mining and resampling. Data mining is a cluster of techniques that has been employed in
the Business Intelligence (BI) field for many years (Han & Kamber, 2006). According to Larose
(2005), data mining is the process of automatically extracting useful information and
relationships from immense quantities of data. Data mining does not start with a strong
preconception, a specific question, or a narrow hypothesis; rather, it aims to detect patterns that
are already present in the data. Similarly, Luan (2002) views data mining as an extension of EDA.
Like EDA, resampling departs from theoretical distributions used by CDA. Rather, its inference is
based upon repeated sampling within the same sample, and that is why this school is called
resampling (Yu, 2003, 2007). How these two methodologies alter the features of EDA will be
discussed next.

CATEGORIES AND TECHNIQUES OF EDA

Clustering: Two Step cluster analysis

Clustering is essentially grouping observations based upon their proximity to each other on multiple
dimensions. At first glance, clustering analysis is similar to discriminant analysis. But in the
latter the analyst must know the group membership for the classification in advance. Because
discriminant analysis assigns cases to pre-existing groups, it is not as exploratory as cluster
analysis, which aims to identify the grouping categories in the first place. If there are just two
dimensions (variables), the analyst could simply use a scatterplot to look for the clumps. But
when there are many variables, the task becomes more challenging and thus it necessitates
algorithms. There are three major types of clustering algorithms: 1) Hierarchical clustering, 2)
non-hierarchical clustering (k-means clustering), and 3) Two Step clustering. The last one is
considered the most versatile because it has several desirable features that are absent in other
clustering methods. For example, both hierarchical clustering and k-means clustering could
handle continuous variables only, but Two Step clustering accepts both categorical and
continuous variables. This is the case because in Two Step clustering the distance measurement
is based on the log-likelihood method (Chiu et al., 2001). In computing log-likelihood, the
continuous variables are assumed to have a normal distribution and the categorical variables
are assumed to have a multinomial distribution. Nevertheless, the algorithm is reasonably
robust against the violation of these assumptions, and thus assumption checking is unnecessary.
Second, while k-means clustering requires a pre-specified number of clusters and therefore
strong prior knowledge is required, Two Step clustering is truly data-driven due to its capability
of automatically returning the number of clusters. Last but not least, while hierarchical
clustering is suitable for small data sets only, Two Step clustering is so scalable that it can
analyze thousands of observations efficiently. As the name implies, Two Step clustering is
composed of two steps. The first step is called pre-clustering. In this step, the procedure
constructs a cluster features (CF) tree by scanning all cases one by one (Zhang et al., 1996).
When a case is scanned, the pre-cluster algorithm applies the log likelihood distance measure to
determine whether the case should be merged with other cases or form a new pre-cluster on its
own and wait for similar cases in further scanning. After all cases are exhausted, all pre-clusters
are treated as entities and become the raw data for the next step. In this way, the task is
manageable no matter how large the sample size is, because the size of the distance matrix is
dependent on just a few pre-clusters rather than all cases. Also, the researcher has the option to
turn on outlier handling. If this option is selected, entries that cannot fit into any pre-clusters
are treated as outliers at the end of CF-tree building. Further, in this pre-clustering step, all
continuous variables are automatically standardized. In other words, there is no need for the
analyst to perform outliers detection and data transformation in separate steps. In step two,
the hierarchical clustering algorithm is applied to the pre-clusters and then proposes a set of solutions. To determine the best number of clusters, the solutions are compared against each
other based upon the Akaike Information Criterion (AIC) (Akaike, 1973) or the Bayesian
Information Criterion (BIC) (Schwarz, 1978). AIC is a fitness index for trading off the complexity
of a model against how well the model fits the data. To reach a balance between fitness and
parsimony, AIC not only rewards goodness of fit, but also gives a penalty to overfitting and
complexity. Hence, the best model is the one with the lowest AIC value. However, both Berk
(2008) and Shmueli (2009) agreed that although AIC is a good measure of predictive accuracy, it
can be over-optimistic in estimating fitness. In addition, because AIC aims to yield a predictive
model, using AIC for model selection is inappropriate for a model of causal explanation. BIC was
developed as a remedy to AIC. Like AIC, BIC also uses a penalty against complexity, but this
penalty is much stronger than that of AIC. In this sense, BIC is in alignment with Ockham's razor: all other things being equal, the simplest model tends to be the best one.
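
For reference, the standard forms of the two criteria, where k is the number of estimated parameters, L is the maximized likelihood, and n is the sample size, are:

```latex
\mathrm{AIC} = 2k - 2\ln L, \qquad \mathrm{BIC} = k\ln(n) - 2\ln L
```

Because the BIC penalty k ln(n) exceeds the AIC penalty 2k whenever n is larger than about 7, BIC favors simpler solutions at virtually any realistic sample size.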
To illustrate Two Step clustering, a data set listing 400 of the world's best colleges and universities compiled
by U.S. News and World Report (2009) was utilized. The criteria used by U.S. News and World
Report for selecting the best institutions include: Academic peer review score, employer review
score, student to faculty score, international faculty score, international students score, and
citations per faculty score. However, an educational researcher might not find the list helpful
because the report ranks these institutions by the overall scores. It is tempting for the
educational researcher to learn about how these best institutions relate to each other and what
their common threads are. In addition to the preceding measures, geographical location could
be taken into account. Because the data set contains both categorical and continuous variables,
the researcher employed Two Step cluster analysis in Predictive Analytics SoftWare
(PASW) Statistics (SPSS Inc., 2009). It is important to note that the clustering result may be
affected by the order of the cases in the file. In the original data set, the table has been sorted
by the rank in an ascending order. In an effort to minimize the order effect, the cases were re-
arranged in random order before the analysis was conducted. To run a Two Step cluster analysis,
the researcher must assign the categorical and continuous variables into the proper fields, as
shown in Figure 1; in this example, BIC rather than AIC was used for simplicity.
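
Two Step clustering itself is implemented in PASW/SPSS rather than in common open-source libraries, but its key data-driven idea, letting an information criterion choose the number of clusters, can be sketched with scikit-learn's GaussianMixture as a rough stand-in. Unlike Two Step, this stand-in handles only continuous variables, and the file and column names below are hypothetical.

```python
# Minimal sketch: choose the number of clusters by BIC, in the spirit of
# Two Step clustering's automatic cluster-count selection.
# NOTE: GaussianMixture is a stand-in, not the SPSS/PASW Two Step algorithm.
import pandas as pd
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Hypothetical file and column names for the ranking data set.
df = pd.read_csv("best_colleges_2009.csv")
continuous_cols = ["peer_review", "employer_review", "student_faculty",
                   "intl_faculty", "intl_students", "citations"]
X = StandardScaler().fit_transform(df[continuous_cols])

bic_by_k = {}
for k in range(2, 11):                        # candidate numbers of clusters
    gm = GaussianMixture(n_components=k, n_init=5, random_state=1).fit(X)
    bic_by_k[k] = gm.bic(X)                   # lower BIC = preferred solution

best_k = min(bic_by_k, key=bic_by_k.get)
labels = GaussianMixture(n_components=best_k, n_init=5,
                         random_state=1).fit_predict(X)
print("Cluster solution chosen by BIC:", best_k)
```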

EDA AND RESAMPLING


At first glance, exploratory data mining is very similar to conventional EDA except that the
former employs certain advanced algorithms for automation. Actually, the differences between
conventional EDA and exploratory data mining could be found at the epistemological level. As
mentioned before, EDA suggests variables, constructs and hypotheses that are worth pursuing
and CDA takes the next step to confirm the findings. However, using resampling (Yu, 2003,
2007), data mining is capable of suggesting and validating a model at the same time. One may
argue that data mining should be classified as a form of CDA when validation has taken place. It
is important to point out that exploratory data mining usually aims to yield prediction rather
than theoretical explanations of the relationships between variables (Shmueli & Koppius, 2008;
Yu, in press). Hence, the researcher still has to construct a theoretical model in the context of
CDA (e.g. structural equation modeling) if explanation is the research objective.
Resampling in the context of exploratory data mining addresses two important issues, namely,
generalization across samples and under-determination of theory by evidence (Kieseppa, 2001).
It is very common that one sample yields one set of best predictors in a regression analysis, but another sample yields a different set (Thompson, 1995). In other words, this kind of analysis can provide a post hoc model for an existing sample (in-sample forecasting), but the model may fail in out-of-sample forecasting. This occurs when a specific model is overfitted to a specific data set, which weakens the generalizability of the conclusion. Further, even if a researcher finds a so-called best-fitting model, there may be
numerous possible models to fit the same data. To counteract the preceding problems, most
data mining procedures employ cross-validation to enhance generalizability. For example, to remediate the problem of under-determination of theory by data, neural networks can search through different candidate models with a genetic algorithm, which begins by randomly generating pools of equations. These initial randomly generated equations are fitted to the training data set
and prediction accuracy of the outcome measure is assessed using the test set to identify a
family of the fittest models. Next, these equations are hybridized or randomly recombined to
create the next generation of equations. Parameters from the surviving population of equations
may be combined or excluded to form new equations as if they were genetic traits inherited
from their “parents.” This process continues until no further improvement in predicting the
outcome measure of the test set can be achieved (Baker & Richards, 1999). In addition to cross-
validation, bootstrapping, another resampling technique, is also widely employed in data mining
(Salford Systems, 2009), but it is beyond the scope of this article to introduce bootstrapping.
Interested readers are encouraged to consult Yu (2003, 2007).
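
To make the cross-validation idea concrete, the following minimal sketch estimates the out-of-sample accuracy of a classification tree with 10-fold cross-validation in scikit-learn; the data file, the binary "retained" outcome, and the tree settings are hypothetical.

```python
# Minimal sketch: k-fold cross-validation estimates out-of-sample accuracy
# instead of judging a model only by its fit to the sample at hand.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("retention.csv")            # hypothetical data set
X = df.drop(columns=["retained"])            # predictors
y = df["retained"]                           # binary outcome

tree = DecisionTreeClassifier(max_depth=4, random_state=1)

# 10-fold cross-validation: each fold serves once as the hold-out test set.
scores = cross_val_score(tree, X, y, cv=10, scoring="accuracy")
print("Mean out-of-sample accuracy: %.3f (SD %.3f)" % (scores.mean(), scores.std()))
```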

CONCLUDING REMARKS
This article introduces several new EDA tools, including Two Step clustering, recursive classification trees, and neural networks, in the context of data mining and resampling, but these are just a fraction of the plethora of exploratory data mining tools available. In each category of EDA
there are different methods to accomplish the same goal, and each method has numerous
options (e.g., the number of folds in k-fold cross-validation). In evaluating the efficacy of classification
trees and other classifiers, Wolpert and Macready (1997) found that there is no single best method and termed this phenomenon "no free lunch": every output comes with a price
(drawback). For instance, simplicity is obtained at the expense of fitness, and vice versa. As
illustrated before, sometimes simplicity could be an epistemologically sound criterion for
selecting the “best” solution. In the example of PISA data, the classification tree model is
preferable to the logistic regression model because of its predictive accuracy. Also, in the example of the world's best universities, BIC, which tends to impose heavier penalties on complexity, is more favorable than AIC. But in the example of the retention study, when the
researcher suspected that there are entangled relationships among variables, a complex,
nonlinear neural net was constructed even though this black box lacks transparency. One way or the other, the data explorer must pay a price. Ultimately, whether a simple or a complex
approach should be adopted is tied to usefulness. Altman and Royston (2000) asserted that
“usefulness is determined by how well a model works in practice, not by how many zeros there
are in associated p values" (p. 454). While this statement pinpoints the blind faith in p values among users of inferential statistics, it is also applicable to EDA. A data explorer should not hop around
solutions and refuse to commit himself/herself to a conclusion in the name of exploration;
rather, he/she should contemplate which solution could yield more implications for the
research community. Last but not least, exploratory data mining techniques could be
simultaneously or sequentially employed. For example, because both neural networks and
classification trees are capable of selecting important predictors, they could be run side by side
and evaluated by classification agreement and ROC curves (a brief sketch of this side-by-side comparison appears below).
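
A minimal sketch of such a side-by-side comparison, assuming a hypothetical data set with a binary outcome and using scikit-learn's decision tree and multilayer perceptron as stand-ins for the classification tree and neural network:

```python
# Minimal sketch: run a classification tree and a neural network side by side,
# then compare them by classification agreement and area under the ROC curve.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

df = pd.read_csv("retention.csv")                     # hypothetical data set
X, y = df.drop(columns=["retained"]), df["retained"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

tree = DecisionTreeClassifier(max_depth=4, random_state=1).fit(X_train, y_train)
net = make_pipeline(StandardScaler(),                 # scale inputs for the net
                    MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000,
                                  random_state=1)).fit(X_train, y_train)

agreement = (tree.predict(X_test) == net.predict(X_test)).mean()
print("Classification agreement:", round(agreement, 3))
print("Tree ROC AUC:", roc_auc_score(y_test, tree.predict_proba(X_test)[:, 1]))
print("Net ROC AUC: ", roc_auc_score(y_test, net.predict_proba(X_test)[:, 1]))
```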
On other occasions, a sequential approach might be more appropriate. For instance, if the researcher suspects that the
observations are too heterogeneous to form a single population, clustering could be conducted
to divide the sample into sub-samples. Next, variable selection procedures could be run to
narrow down the predictor list for each sub-sample. Last, the researcher could focus on the
inter-relationships among just a few variables using pattern recognition methods. The
combinations and possibilities are virtually limitless. Data detectives are encouraged to explore
the data with skepticism and openness.
References
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle.
Altman, D. G., & Royston, P. (2000). What do we mean by validating a prognostic model?
Statistics in Medicine, 19, 453-473.

Baker, B. D., & Richards, C. E. (1999). A comparison of conventional linear regression methods
and neural networks for forecasting educational spending. Economics of Education Review, 18,
405-415.

NIST Sematech. (2006). What is EDA? Retrieved September 30, 2009, from http://www.itl.nist.gov/div898/handbook/eda/section1/eda11.htm

Salford Systems. (2009). Random forest. [Computer software and manual]. San Diego, CA:
Author.

SPSS, Inc. (2009). PASW Statistics 17 [Computer software and manual]. Chicago, IL: Author.

Shmueli, G., Patel, N., & Bruce, P. (2007). Data mining for business intelligence: Concepts,
techniques, and applications in Microsoft Office Excel with XLMiner. Hoboken, N.J.: Wiley-
Interscience.

Thompson, B. (1995). Stepwise regression and stepwise discriminant analysis need not apply
here: A guidelines editorial. Educational and Psychological Measurement, 55, 525-534.

Tukey, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.


Velleman, P. F., & Hoaglin, D. C. (1981). Applications, basics, and computing of exploratory data analysis. Boston, MA: Duxbury Press.
Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE
Transactions on Evolutionary Computation, 1(1), 67–82.

Yu, C. H. (2003). Resampling methods: Concepts, applications, and justification. Practical Assessment, Research and Evaluation, 8(19). Retrieved July 4, 2009, from http://pareonline.net/getvn.asp?v=8&n=19
Yu, C. H. (2007). Resampling: A conceptual and procedural introduction. In J. Osborne (Ed.), Best practices in quantitative methods (pp. 283-298). Thousand Oaks, CA: Sage Publications.
Yu, C. H. (2009b). Exploratory data analysis and data visualization. Retrieved October 10, 2009, from http://www.creativewisdom.com/teaching/WBI/EDA.shtml
