
Special Issue Article

Published online 4 August 2014 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/qre.1706. Qual. Reliab. Engng. Int. 2014, 30:905–917

Using Visual Data Mining to Enhance the Simple Tools in Statistical Process Control: A Case Study
Huw D. Smith,a Fadel M. Megahed,a*† L. Allison Jones-Farmerb
and Mark Clarkb
Statistical process control (SPC) is a collection of problem-solving tools used to achieve process stability and improve process
capability through variation reduction. Because of its sound statistical basis and intuitive use of visual displays, SPC has been
extensively used in manufacturing, health care, and service industries. Deploying SPC involves both a technical aspect and
a proper environment for continuous improvement activities based on management support and worker empowerment.
Many of the commonly used SPC tools, including histograms, fishbone diagrams, scatter plots, and defect concentration
diagrams, were proposed prior to the advent of microcomputers as efficient methods to record and visualize data for single
(or few) variable(s) processes. As the volume, variety, and velocity of data continue to evolve, there are opportunities to
supplement and improve these methods for understanding and visualizing process variation. In this paper, we propose
enhancements to some of the basic quality tools that can be easily applied with a desktop computer. We demonstrate
how these updated tools can be used to better characterize, understand, and/or diagnose variation in a case study involving
a US manufacturer of structural tubular metal products. Finally, we create the quality visualization toolkit to allow
practitioners to implement some of these visualization tools without the need for training, extensive statistical background,
and/or specialized statistical software. Copyright © 2014 John Wiley & Sons, Ltd.

Keywords: animated graphs; fishbone diagram; phase I methods; statistical engineering; visual analytics

1. Introduction
The goal of data analysis is to gain understanding from the collected data. A feature of process data is that it varies over time. The information in this variation is important to the understanding of how the process is performing, and statistical process control (SPC) is the primary tool for monitoring and reducing this variation1 [p. 1]. Key tools used in SPC, and in process improvement in general, include the histogram, check sheet, Pareto chart, cause-and-effect diagram, defect concentration diagram, scatter diagram,
and the control chart. The collective use of these seven visual tools was popularized by Prof. Kaoru Ishikawa in the 1950s and 1960s.2
These tools are often referred to as the seven (basic) quality tools.3 Using these visual tools allowed factory workers to diagnose and
possibly eliminate their quality problems without detailed knowledge of statistics. This has been a well-documented reason for the
widespread adoption of these early SPC methods.3
Modern SPC applications are quite different from early SPC applications; see, for example, Megahed et al.,4 Wells et al.,5 and Wells et al.6 The volume of data being acquired from production systems continues to expand at exponential rates.7 Emerging measurement technologies (such as coordinate measuring machines, machine vision systems, and 3D surface scanners) diversify the types of data being collected, pushing data collection away from the historically low-dimensional data. The use of
computerized data acquisition systems has transformed the nature of the process monitoring and control problem, as real-time
process data are now available on hundreds of processes and product quality characteristics.5,6,8
Despite the changes in the nature and volume of process data, the seven basic quality tools remain the most widely used methods
in industry. These basic tools constitute the basis for much of the work in six sigma9 and lean manufacturing.10 Simple graphical tools
have been historically useful in understanding the nature of process variation and diagnosing the root causes of process changes, yet they
have received little attention in the literature.5 To address the importance of a more holistic view of SPC, Hoerl and Snee11 proposed a
new paradigm for quantitative approaches to quality improvement, statistical engineering. Statistical engineering is defined as ‘the

a Department of Industrial and Systems Engineering, Auburn University, Auburn, AL 36849, USA
b Department of Aviation and Supply Chain Management, Auburn University, Auburn, AL 36849, USA

*Correspondence to: Fadel M. Megahed, Department of Industrial and Systems Engineering, Auburn University, Auburn, AL 36849, USA.

E-mail: [email protected]


study of how to best use statistical concepts, methods and tools, and integrate them with IT and other relevant sciences to generate
improved results’.11
Aligned with the motivation behind statistical engineering, we explore how visual data mining tools can enhance some of the well-
known SPC graphical tools. We highlight several enhancements to the traditional seven basic tools. These enhancements were
deployed in a US manufacturer of structural tubular metal products. So that others may use these enhancements, we provide the
quality visualization toolkit (QVT) as supplemental material to this paper. The QVT contains an Excel program to create the measure
of risk and error (MORE) plot (as well as other plots discussed later) based on user input data. Microsoft Excel is a widely used software
package that can be found in almost any industry; its ease-of-use and widespread popularity among engineering practitioners made
it a compelling choice for our QVT.
We begin in Section 2 by providing background information on the field of visual data mining and discuss the importance of sound
data collection methods. In Sections 3 and 4, we discuss tools for visualizing discrete and continuous data, respectively. In Section 5, we discuss methods for visualizing relationships among variables. Advice to practitioners and concluding remarks are given in Sections 6 and 7.

2. A brief introduction to visual data mining


Visual data mining is an approach to exploratory data analysis that is based on the integration of concepts from computer science,
cognitive psychology, and data analysis to assist in uncovering trends and patterns that may be missed with other nonvisual
methods.12 Visual data mining also helps overcome one of the main limitations in data mining approaches, where the ‘data is
analyzed in a hypothesis testing mode in which one might have a priori notions about what the important results will be before
the analysis actually begins’13 [p. i]. The use of visualizations has proven to be a simple, effective, and assumption-free approach to
discover trends in the data.12–15 In addition, visualizing the associations and correlations among the data can provide a solid
foundation for statistical and mathematical modeling in the cases when additional analysis is needed.
The use of visualizations in statistical and mathematical sciences is not a new phenomenon, as it dates back to the visual proof of
the Pythagorean theorem. A more applied example can be seen in the work of John Snow, whose plots of the 1850s cholera outbreak on a map of London allowed him to discover the cause of cholera.16,3,4 Wickham17 discusses some of the historical foundations
of statistical graphics. Well-done graphical displays can help us to solve complex problems without making any assumptions or needing to understand complicated mathematical or statistical algorithms. The visual exploration of data can also be used to supplement model-based methods, and it often leads to better results, especially in situations when automated data mining algorithms fail.12
Not all graphical representations of data are useful, and some can be misleading. There are several factors/guidelines that can help in
choosing/developing informative statistical graphics. For example, Tufte18,13–15 introduced the term graphical excellence to reflect on
graphics that communicate complex ideas with clarity, precision, and efficiency. Keim et al.12 provided some general rules for expressive
and effective visualizations. The expressiveness of a visual relates to the constraint that all relevant attributes, without any others, must be
expressed by the visualization.19 Effective graphics are the ones that allow the viewer to interpret the information hidden within the data
correctly and quickly. We consider these guidelines when we develop our graphics for quality engineering applications.

3. Visualizing discrete data distributions


In this section, we highlight several tools that the authors have deployed in a leading US manufacturer of tubular metal products
while dealing with count data. The structural tubular metal product is created in a foundry where ingredients are combined and
heated to a specific temperature in a furnace. Once the mixture conditions are met and the molten metal has reached the correct
temperature, the molten metal is poured from the furnace into a ladle. The ladle is then transported to the molds, where the molten metal is poured and the tubular products are formed. Once the tubes have cooled to a certain temperature, they are
removed from the mold, and quality data are obtained. Critical-to-quality data are collected on the final product.
The data described throughout this paper is based on data obtained from this manufacturer but has been modified for
presentation to maintain confidentiality. Discrete as well as continuous measurements are of interest in metal casting operations.
Discrete measurements of interest include defect counts and the location of defects on the product. This data is used to drive quality
improvement through deeper process understanding. In many cases, charts of discrete count data might be the first indication that the process could be out of control and that the proportion nonconforming is increasing.

3.1. Pareto chart


Pareto charts are widely used to categorize defects or quality problems from the most frequently occurring to the least. One
limitation of Pareto charts is that they do not provide temporal information on the occurrence of these defects. For example,
consider the Pareto chart in Figure 1. Under the assumption that the cost exposure for all these defects is the same, a
practitioner may conclude that it is most important to investigate the root causes behind the Joint Flash problem. This
conclusion is correct only if an analysis of the temporal nature of each of these defects shows that there is no evidence of
an improved rate of the Joint Flash defect over the data collection period. However, if this defect only occurred in the first
week of the data, this conclusion may not be valid (it would still be interesting to identify why the process improved), and it may be more urgent to focus on improving a different aspect of the process. Accordingly, it is useful to develop software tools that allow users to observe temporal effects in the defect occurrence rate. This can be carried out through either animation or a plot of the incidence rate of different defects over time.

Figure 1. A Pareto chart of 1-year data from the metal casting operation
Our proposed method for adding a temporal aspect into Pareto charts includes either the use of animation (Figure 2) or a line
chart that plots the cumulative occurrence of defects over time (Figure 3). Although it is not possible to show the full effect of the
animation here, Figure 2 shows three screenshots of the Pareto chart as it develops over time. The same period of the company's casting defect data was plotted on the line chart in Figure 3, yielding valuable additional insight into the temporal nature of defect development. A quick inspection of a typical Pareto chart (Figure 1) leaves the practitioner prioritizing the blue (Joint Flash), red (sand hole), and finally green (rippling) bars as the top three defect areas to focus on. However, the temporal analyses given in Figures 2 and 3 yield an additional insight: the mechanical damage defect developed most of its occurrences in just a few small
jumps beginning at time 231. In this case, the same human error was made on several occasions within a short period. Additionally,
the practitioner gets an idea of the rate at which each defect is developing and can monitor whether the rate of change of the
defects accelerates over time. Both the animated Pareto chart and the cumulative line plot of defect occurrences can be created
in the QVT.
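For readers who want to prototype this outside the QVT, the following is a minimal sketch of the cumulative-occurrence line chart in base R. The data frame layout and the column names (time, defect_type) are hypothetical stand-ins for whatever logging format is available, and the defect log itself is simulated.

```r
# Minimal sketch of a cumulative-occurrence line chart (cf. Figure 3).
# Assumes one row per logged defect with a numeric 'time' index and a
# 'defect_type' label; both column names are hypothetical placeholders.
set.seed(10)
defect_log <- data.frame(
  time = sort(sample(1:365, 300, replace = TRUE)),
  defect_type = sample(c("Joint Flash", "Sand Hole", "Rippling", "Mechanical Damage"),
                       300, replace = TRUE)
)
types <- unique(defect_log$defect_type)
plot(NULL, xlim = range(defect_log$time),
     ylim = c(0, max(table(defect_log$defect_type))),
     xlab = "Time", ylab = "Cumulative defect count")
for (i in seq_along(types)) {
  d <- defect_log[defect_log$defect_type == types[i], ]
  lines(d$time, seq_len(nrow(d)), col = i, lwd = 2)  # step up at each occurrence
}
legend("topleft", legend = types, col = seq_along(types), lwd = 2)
```

The animated Pareto chart in the QVT is conceptually the same data, redrawn as bar heights at successive time points.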
It is important to note that the animated Pareto reveals a very important piece of information for the metal casting operation.
Rippling stays relatively flat in the middle of the analyzed period but starts to grow quickly around time 141. This pattern cannot
be extracted from the traditional Pareto chart in Figure 1. More importantly, this pattern indicates a possible tooling problem that is developing over time and needs to be addressed in order to reduce rippling defects in future products. Rippling is a common defect in metal casting operations and can be caused by a wear problem in the extrusion process.

Figure 2. Snapshots of the animated Pareto chart feature taken from the provided QVT tool

Figure 3. Cumulative line chart for the metal casting process

3.2. Defect concentration diagram


A defect concentration diagram is a graphical tool that is used to analyze the causes of defects on a part or product. The defect
concentration diagram includes a schematic drawing of the part of interest, showing all the relevant views. Various types of
defects are overlaid on the drawing, indicating the location of the defect on the part. The diagram is analyzed to see if the location
of the defects on the part provides any useful information about the potential root cause of the defects. A common approach to
using a defect concentration diagram includes regression analysis. Specifically, a general framework for the occurrence of a
defect (or event in a nonmanufacturing setting) can be represented by the following logistic regression model:20

$$\operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right) = g(x) = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k, \qquad (1)$$

where p is the defect rate and X1, …, Xk represent a comprehensive list of all potential process and product variables that can cause the defect. This comprehensive list can be obtained via a traditional fishbone diagram, machine learning techniques, and/or engineering process knowledge. Here, we focus on a visual data mining approach to generate insight without the need for advanced statistical knowledge, such as evaluating the addition of interaction terms or nonlinear terms to Eq. (1). Thus, we suggest that practitioners incorporate several of these characteristics (potentially the ones they think are most relevant) into the defect concentration diagram. An illustrative example highlighting the differences between the more traditional defect concentration diagram and our proposed diagram is shown in Figures 4 and 5, respectively. In Figure 4, practitioners are asked to specify only the location and type of the fault. In our modifications, we suggest encoding additional information, using size to specify the magnitude of the fault and color to indicate the shift during which the fault occurred. It is important to note, however, that our QVT allows encoding additional information pertaining to the predictor variables of Eq. (1). For example, symbol outline color
can be used to encode an additional variable, and animation can be used to show temporal effects. Inclusion of these additions
in a quality/data analysis environment simplifies root cause analysis, presents valuable information about the process, and provides
pointers to practitioners regarding the right course of action to take (because it provides more visual information regarding
important predictor variables).
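To make the connection to Eq. (1) concrete, the sketch below fits a logistic regression of the same form with R's glm function. The predictor names (temperature, shift, pour_time) and the simulated data are hypothetical illustrations, not the plant's actual variables.

```r
# Sketch of the logistic regression model in Eq. (1), fit with glm().
# 'temperature', 'shift', and 'pour_time' are hypothetical stand-ins for the
# X1, ..., Xk suggested by a fishbone diagram or process knowledge; the data
# are simulated purely for illustration.
set.seed(1)
n <- 500
casting <- data.frame(
  temperature = rnorm(n, mean = 1450, sd = 15),
  shift       = factor(sample(c("morning", "afternoon", "evening"), n, replace = TRUE)),
  pour_time   = runif(n, 20, 60)
)
# Simulated defect indicator: the odds of a defect increase with temperature
casting$defect <- rbinom(n, 1, plogis(-40 + 0.027 * casting$temperature))

fit <- glm(defect ~ temperature + shift + pour_time,
           family = binomial(link = "logit"), data = casting)
summary(fit)  # the coefficients correspond to beta_0, beta_1, ..., beta_k
```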
Figure 5 depicts a 360-degree view diagram of a tubular metal product produced in the casting operation. The diagram is
simplified for demonstration purposes. The casting operation performs visual inspections on all of its tubular metal products as part of its quality program to look for pinholes, cracks, and other defects. Similar to the Pareto chart, the multicharacteristic spatiotemporal defect concentration diagram uncovers information hidden by the aggregation process. Three snapshots of the visual inspection results are shown in Figure 6 in time order. The animated diagram again provides insight into the temporal nature of quality: as time goes on, the 'ridging' defect spreads in an orderly fashion along the length of the product. A quick look into the process shows that over time, the hydraulic support pistons are not supporting the tubular products evenly, causing an angle to develop during the extrusion process. This angle, caused by tool fatigue, creates a 'brushstroke', or ridging, along the tubular metal product.
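A static version of such a multicharacteristic diagram can be sketched in a few lines of base R, mapping magnitude to symbol size, shift to color, and defect type to the plotting symbol, as in Figure 5. The unrolled coordinate system and all column names are hypothetical placeholders.

```r
# Sketch of a multicharacteristic defect concentration diagram (cf. Figure 5).
# Each defect sits at its (x, y) location on an unrolled 360-degree view of the
# tube; size encodes magnitude, colour encodes shift, symbol encodes type.
set.seed(2)
defects <- data.frame(
  x         = runif(40, 0, 360),   # circumferential position (degrees)
  y         = runif(40, 0, 200),   # position along tube length (cm)
  magnitude = runif(40, 0.5, 3),   # e.g. defect size in mm
  shift     = factor(sample(c("morning", "afternoon", "evening"), 40, replace = TRUE)),
  type      = factor(sample(c("pinhole", "crack", "ridging"), 40, replace = TRUE))
)
plot(defects$x, defects$y,
     cex = defects$magnitude,              # size -> magnitude
     col = as.integer(defects$shift),      # colour -> shift
     pch = as.integer(defects$type),       # plotting symbol -> defect type
     xlab = "Circumferential position (degrees)",
     ylab = "Position along tube (cm)")
legend("topright", legend = levels(defects$shift),
       col = seq_along(levels(defects$shift)), pch = 16, title = "Shift")
```

Animating such a plot over an added time column would recover the spatiotemporal view shown in Figure 6.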
Other useful quality applications of the multicharacteristic defect concentration diagram include the identification of time-related defect clusters. The importance of such a phenomenon extends beyond metal casting operations. For example, when monitoring wafer production processes, it is well documented that defective chips often occur in clusters or display systematic patterns,21 resulting from variations in manufacturing process conditions.22


Figure 4. Traditional defect concentration diagram that only highlights the type and location of defects. This defect concentration diagram is from Montgomery3 [p. 212]

Figure 5. A multicharacteristic defect concentration diagram depicting the type of defect, its location, size, and shift number (i.e. morning, afternoon, and evening)

4. Visualizing continuous data distributions


4.1. The box plot
The box plot is a graphical display that can simultaneously depict several important features of the data, such as central tendency,
variability, departure from symmetry, and the identification of outliers. These features are shown through the ‘display of three quartiles,
the minimum and maximum of the data on a rectangular box, aligned either horizontally or vertically'3 [p. 75]. Box plots are commonly used in designed experiments because they can provide insight into differences in location and/or scale parameters among groups, as well as the presence of potential outliers. The NIST/SEMATECH e-Handbook of Statistical Methods23 states that the box plot is
… an important EDA (exploratory data analysis) tool for determining if a factor has a significant effect on the response with
respect to either location or variation. The box plot is also an effective tool for summarizing large quantities of information.
Although box plots provide excellent summaries of data, the appearance of the box plot is quite dependent on the sample size.
Further, a box plot gives no information as to statistical differences among groups or adequacy of the sample size. One solution to
these limitations is the MORE plot. First proposed by Nelson,24 the MORE plot differs from the typical box plot in several ways:
inclusion of the mean in addition to the median; the use of flexible quantiles (instead of the usual first and third quartiles); and the presentation
of confidence intervals for the quantiles. A comparison of a standard box plot versus the MORE plot is shown in Figure 7.
Figure 6. Snapshots of the animation of the multicharacteristic spatiotemporal defect concentration diagram

Figure 7. A box plot versus a MORE plot of casting temperature

This comparison is based on the casting temperature for one of the plant's furnaces. Temperature is an important factor in metal
casting operations as it can lead to defects, such as scabbing. Scabbing occurs when shells or irregular crusts form on the surface of a cast piece. As the temperature increases, the chance of scabs occurring also increases. The plots in Figure 7 are based
on 5706 observations obtained from temperature gages in the plant. The similarities between the two are visually obvious, but the
MORE plot gives the user flexibility to choose quantile values (here, we show the 5th and 95th percentiles on the MORE plot) and overlays the confidence intervals for the quantiles. We arbitrarily chose the 5th and 95th percentiles, but as Banks et al.25 stated, any percentiles (symmetric or asymmetric) could be chosen. The small shaded regions surrounding the upper and lower quantiles show 95% confidence intervals created using the large-sample formula of Banks et al.25
While there are some similarities between the two plots in Figure 7, the MORE plot can be more informative. Through its inclusion of confidence intervals, the MORE plot helps in verifying sample size adequacy, which is important for determining the amount of estimation error. For example, the tight confidence intervals depicted in Figure 7 indicate that the sample size for estimating
temperature is adequate and that the estimation error is negligible. This is not surprising because a sample size of ~5000 is often
considered to be sufficient to estimate the mean of a continuous variable.
Figure 8 shows a MORE plot of a random sample (without replacement) of 500 observations from the temperature data. When
comparing this plot with the MORE plot shown in Figure 7, it is clear that the width of the confidence regions for the quantiles is much
larger in Figure 8, indicating the presence of more sampling variability due to the smaller sample size. Although not used for
inference, this can help the user to better frame interpretations from the plots in terms of sample size and expected sampling
fluctuations. Plots similar to those in Figures 7 and 8 can be created in the QVT.
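For readers who want to reproduce the quantile confidence intervals outside the QVT, the sketch below computes a distribution-free, large-sample interval for a chosen quantile from the order statistics, in the spirit of the formula of Banks et al.25 (the exact expression implemented in the QVT may differ). The temperatures are simulated because the plant data are confidential.

```r
# Sketch of a large-sample, distribution-free confidence interval for a
# quantile, as overlaid on the MORE plot. The order-statistic indices follow
# the usual normal approximation: n*p -/+ z * sqrt(n*p*(1-p)).
quantile_ci <- function(x, p, conf = 0.95) {
  x <- sort(x)
  n <- length(x)
  z <- qnorm(1 - (1 - conf) / 2)
  lo <- max(1, floor(n * p - z * sqrt(n * p * (1 - p))))
  hi <- min(n, ceiling(n * p + z * sqrt(n * p * (1 - p))))
  c(estimate = quantile(x, p, names = FALSE), lower = x[lo], upper = x[hi])
}

temps <- rnorm(5706, mean = 1450, sd = 12)  # simulated casting temperatures
quantile_ci(temps, 0.05)                    # 5th percentile with 95% CI
quantile_ci(temps, 0.95)                    # 95th percentile with 95% CI
quantile_ci(sample(temps, 500), 0.95)       # wider interval for n = 500 (cf. Figure 8)
```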
To take advantage of the time ordering of SPC data, we introduce the concept of an animated MORE plot. Although we cannot
present the animation directly in the paper, we have provided the animation tool in the QVT. The animated MORE plot gives the visual
appearance of the MORE plot but is animated to show distributional changes over time. In Figure 9, we give a snapshot of six
consecutive points in the MORE animation of the temperature data using the same quantiles and confidence interval settings. When
viewed in succession, the plots give a clear picture of how the process variability is changing. These plots were used to visualize the
changing casting temperature distribution at certain time intervals (to correlate with scabbing frequency). For processes measured
by continuous variables, the animated MORE plot provides clear benefits and allows the user to quantify the impact of low precision.

Figure 8. The MORE plot for a random sample of 500 observations from the casting temperature dataset

Figure 9. A snapshot of six consecutive periods in a casting temperature data animation
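Building on the quantile_ci sketch above, one rough way to reproduce the animated MORE plot is to recompute the same summaries over successive time windows and redraw the display for each window; the window size of 500 observations and the plotting layout are arbitrary illustrative choices.

```r
# Sketch of the animated MORE plot: reuse quantile_ci() and temps from the
# sketch above, recompute the MORE summaries over successive windows of 500
# observations, and redraw; the redraw loop is what produces the animation.
window_size <- 500
starts <- seq(1, length(temps) - window_size + 1, by = window_size)
for (s in starts) {
  w  <- temps[s:(s + window_size - 1)]
  lo <- quantile_ci(w, 0.05)
  hi <- quantile_ci(w, 0.95)
  plot(1, mean(w), xlim = c(0.5, 1.5), ylim = range(temps), xaxt = "n",
       xlab = "", ylab = "Casting temperature", pch = 19,
       main = paste("Observations", s, "to", s + window_size - 1))
  points(1, median(w), pch = 4)                              # median
  segments(1, lo["estimate"], 1, hi["estimate"])             # quantile span
  rect(0.95, lo["lower"], 1.05, lo["upper"], col = "grey")   # CI for 5th percentile
  rect(0.95, hi["lower"], 1.05, hi["upper"], col = "grey")   # CI for 95th percentile
  Sys.sleep(0.5)                                             # crude animation
}
```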

4.2. The histogram


The histogram graphically displays information regarding the distribution of quantitative variables, including shape, location, and scale. The histogram can also indicate the presence of potential outliers and multiple modes within the data. As with the box plot, there are opportunities for making the histogram more informative by incorporating additional information. For example, several
statistical software packages allow users to fit a distribution to the histogram to better characterize the shape of the distribution, as seen
in Figure 10. In many cases, we do not find this feature to be effective, and sometimes, it can be misleading. For example, it is not clear
whether the shape of the histogram matches the normal fit in Figure 10. A discussion on how traditional histograms can be improved
can be found in Potter et al.,26 who created the summary plot to combine information from the box plot and the histogram.
Similar to the box plot, the histogram only provides information about the data distribution at a static point in time. In our QVT, we
have provided an animated histogram that allows the practitioner to see the binning of the data develop over time. Understanding
the temporal behavior of a quality metric provides valuable insight into what might be causing certain types of quality problems or defects, information on the rates at which certain defects change relative to others, and a much clearer understanding of what happened in the process during a certain period. An animated histogram allows the user to glean additional knowledge about their process by giving clues to the cause of problems or changes. The three snapshots in Figure 11 show a histogram of metal product weight in the midst of animation. The practitioner would expect to see the bins fill up in a random fashion, tending to normalize over time. However, in this application, there is a process shift around two-thirds of the way through the animation. This could indicate tool wear, a shift in casting temperature, or some other quality variable that merits investigation. Detecting the exact time a shift takes place in this specific example could yield many new insights into what caused the change: a sudden shift in the elemental composition of the pour mixture, a wear problem with the pour mechanism, a shift change (operator error), and so on. Insights such as these can
minimize the time the practitioner spends on exploratory data analysis, leading to faster diagnosis and corrective action.
In addition to the animated histogram, our QVT provides the user the option to overlay confidence intervals of selected quantiles of
the distribution onto a static histogram. This is similar to the use of the MORE box plot but with the visual depiction of the histogram.
Figure 12 shows an example histogram of the cast weight data overlaid with 95% confidence intervals of the fifth and 95th quantiles and
the mean of the distribution. In order to further explore the temporal nature of the histogram, we have also supplemented the MORE
histogram with a cumulative frequency plot for each histogram bin. The cumulative frequency plot of the bins provides an analysis similar to that of the animated histogram, allowing the practitioner to see how the frequency in each bin develops over time, plotting each bin's rate of development as a line. We do not provide a figure here, as we have provided a similar plot in Figure 3 along with analysis.
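As a sketch of this idea, the code below fixes the histogram bins from the full data range and plots each bin's running count against observation order; the simulated weights include a late shift in the mean to mimic the pattern discussed above.

```r
# Sketch of the cumulative frequency plot for histogram bins: observations are
# assumed to be in time order, bins are fixed from the overall range, and each
# bin's running count is drawn as a line.
set.seed(3)
weights <- c(rnorm(900, 50, 1), rnorm(444, 52, 1))   # simulated process shift
breaks  <- pretty(range(weights), n = 10)
bin     <- cut(weights, breaks = breaks, include.lowest = TRUE)

cum_counts <- sapply(levels(bin), function(b) cumsum(bin == b))
matplot(seq_along(weights), cum_counts, type = "l", lty = 1,
        col = seq_len(ncol(cum_counts)),
        xlab = "Observation order (time)", ylab = "Cumulative bin frequency")
legend("topleft", legend = levels(bin), lty = 1,
       col = seq_len(ncol(cum_counts)), cex = 0.6)
```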


Figure 10. A histogram of 100 normally distributed points N(10, 1), with a fitted distribution

Figure 11. Snapshots of the animated MORE histogram of casting weight

Figure 12. Overlaid percentiles with confidence intervals to a histogram of 1344 casting operation weights

5. Visualizing the relationships among quality variables


In this section, we discuss additional visualization tools to better understand relationships among variables. The metal casting dataset
will be used to demonstrate the power of classical visual data mining techniques (classification and regression trees) when used in
conjunction with the fishbone diagram.


5.1. Scatter plot


A scatter plot is a standard technique to show potential relationships between two variables. Recently, Xie27 suggested that incorporating animation can facilitate the identification of patterns in 2D plots. It is worth noting that Xie27 developed an R package, animation, which contains about 30 functions related to statistical analysis and simulation modeling. The reader is encouraged to read Xie's work on a computer because some animations are embedded in the PDF. In addition to Xie,27 the animated 'motion bubble chart' made famous by Rosling28,29 makes possible the visualization of multidimensional data, including a time dimension (Figure 13). Similar to a basic scatter plot, points are plotted on the X- and Y-axes of a chart as bubbles. The size of the bubbles defines the third dimension, and the colors of the bubbles may indicate separate data points or may be categorized according to a fourth grouping variable. The fifth and final dimension contained in an animated bubble chart is time. Animation provides the ability to visualize the way
the relationships among variables change over time. In Figure 13, we perform exploratory data analysis on the metal casting operation
using the motion bubble chart. For confidentiality reasons, we cannot publish the identities of the element ratios. Comparing
the ratio of two elements (x- and y-axes) over time and temperature (size) allows the user to draw inferences about the cause of failures
in the casting operation. Although we cannot present the animation directly in the paper, a snapshot from one time point is provided.
Rosling28,29 has many examples of such charts on his website, www.gapminder.org. For quality engineering applications, such
charts can be used to visualize the variation among several variables. This type of chart allows one to understand general trends
among multiple variables on a 2D chart. We do not include the motion bubble chart in our QVT because there are numerous freely
available applications to create these charts. Figure 13 was created using the free Google spreadsheet application. For a detailed introduction on the use of that application, the reader is referred to Jacks.30

Figure 13. An example of a motion bubble chart comparing two element ratios over time, using temperature as the size parameter. The colors of the bubbles correspond to a pass/fail state
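A single time slice of such a bubble chart can also be sketched in base R (the full animation is obtained by looping over time slices); the variable names and simulated values below are hypothetical.

```r
# Sketch of one time slice of a motion bubble chart (cf. Figure 13): two
# element ratios on the axes, temperature mapped to bubble size, and the
# pass/fail state mapped to colour. Looping over time slices gives the animation.
set.seed(4)
slice <- data.frame(
  ratio_a     = runif(30, 0.8, 1.2),
  ratio_b     = runif(30, 1.5, 2.5),
  temperature = runif(30, 1430, 1470),
  pass        = sample(c(TRUE, FALSE), 30, replace = TRUE, prob = c(0.8, 0.2))
)
size <- 1 + 3 * (slice$temperature - min(slice$temperature)) /
              diff(range(slice$temperature))        # rescale temperature to cex
plot(slice$ratio_a, slice$ratio_b, cex = size, pch = 19,
     col = ifelse(slice$pass, "steelblue", "firebrick"),
     xlab = "Element ratio A", ylab = "Element ratio B")
legend("topright", legend = c("pass", "fail"), pch = 19,
       col = c("steelblue", "firebrick"))
```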

5.2. Fishbone (cause-and-effect) diagram


The fishbone diagram was developed as a tool for analyzing potential root causes of a quality problem. Typically, the fishbone diagram is used as a brainstorming tool by a quality improvement team in the analyze and improve steps of the define, measure, analyze, improve, and control process3 [p. 210]. There are currently several automated tools that allow practitioners to generate computer-based fishbone diagrams. For such tools, the practitioner decides what goes into each of the bones, that is, the potential causes under the machine, materials, methods, measurement, and man groupings. Once the diagram is constructed,
in-depth discussion helps the group to reach consensus on the most likely cause of the quality problem and then investigate which
of the underlying factors led to the occurrence of the problem.
Although widely used by practitioners and quite effective in many situations, the fishbone diagram may not be useful in all cases. When the process is complex or the quality team is substantially biased towards unsubstantiated causes of
problems, it would be more beneficial to turn to the data for answers. Modern data analytic and computing tools allow the use of
machine learning methods15,31 such as classification and regression trees (CART) for fault identification and diagnosis. With data of
sufficient quality, these methods can be used to assign likelihood values for each possible root cause and may help to uncover
new insights into the source of quality problems more effectively than a fishbone diagram alone can.
Over the last few years, many software packages allowing high-powered data mining have become available and are beginning to
be used in data environments (manufacturing, health care, financial institutions, etc.). However, ‘big data’ and ‘data mining’ are
growing rapidly and many people are unfamiliar with the techniques. The ability to create a predictive model (using classification
and regression tree algorithms) to predict a dependent quality variable for a process based on several independent variables
presents a significant opportunity in the fields of quality, SPC, and data visualization. Understanding what combinations of
independent variables influence a dependent quality variable is beneficial in several ways: it deepens process understanding and offers the potential to reduce inspection costs, because the process data management/capture system can make predictions from the model in the inspector's place. A fundamental understanding of data mining is required to create such models with the currently available
software packages. Examples of this type of software include IBM’s SPSS Modeler, StatSoft’s Statistica, XLSTAT’s Excel Add-in, JMP
Pro, and SAS Enterprise Miner.
To illustrate the visualization power and intuitive appeal of CART methods, we used Statistica to generate a CART model for
chemistry data from the metal casting operation. The model predicts a categorical dependent variable with two possible values: pass
or fail. Chemical composition values obtained from spectroscopy were used as the 15 continuous predictor variables. It is important to
note here that CART does not limit the practitioner to a categorical response and can handle continuous responses also. Figure 14
shows a CART model that can help predict/distinguish pass or fail performance of tubular metal products.
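Although we used Statistica to produce Figure 14, an analogous tree can be grown in R with the rpart package, as sketched below; the element columns and the simulated pass/fail rule are hypothetical stand-ins for the confidential chemistry data. Restricting maxdepth keeps the displayed tree small, mirroring the reduced number of nodes in Figure 14.

```r
# Sketch of a pass/fail classification tree analogous to Figure 14, grown with
# the rpart package. The 15 element columns and the rule driving the simulated
# outcome (mainly element 12) are hypothetical.
library(rpart)
set.seed(5)
n <- 1000
chem <- as.data.frame(matrix(rnorm(n * 15), ncol = 15,
                             dimnames = list(NULL, paste0("element_", 1:15))))
chem$outcome <- factor(ifelse(chem$element_12 > 0 |
                              (chem$element_3 > 1 & chem$element_7 < 0),
                              "pass", "fail"))
tree <- rpart(outcome ~ ., data = chem, method = "class",
              control = rpart.control(maxdepth = 3))  # keep the tree small
plot(tree, margin = 0.1)                              # rule-based flow chart
text(tree, use.n = TRUE)                              # node labels with counts
```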
In Figure 14, one can see that the ‘rule-based flow-chart’ type of analysis provided by the tree is very intuitive to understand. From
the 15 elements used to predict a pass/fail outcome for a tubular metal product, it is instantly obvious that element 12 is extremely
important to the final quality outcome of the product. The first level of the tree splits based on element 12 having a high or low value.

Figure 14. An example of the output from a classification and regression tree analysis. This tree contains far fewer nodes than usual for demonstration purposes; as many splits and nodes as desired can be displayed with most software packages


For the high value, the CART model predicts almost entirely ‘pass’ outcomes. For the low value, the model predicts a more equal ratio
of pass versus fail. As the user traces further down the tree, it indicates various levels of elements that, in combination, lead to a
predicted 'fail' outcome. The level of classification and the number of nodes present in a CART model are flexible, so more pinpointed
classification is possible if desired. While the fishbone diagram identifies potential causes and effects of problems, the CART model
identifies the values of the independent variables that are related to the different quality classifications and gives the practitioner
information about which variables are most important to quality ratings. This gives the practitioner a simple, visual, and objective
method for identifying potential root causes using data-driven approaches.
Similar data-driven methods are currently deployed in several large manufacturing corporations. For example, in the 1990s,
General Electric and Snecma S.A. developed the CASSIOPEE troubleshooting system to diagnose and predict problems in the Boeing
737 and Airbus A340s.31,32 Their system was based on deriving families of faults using clustering methodologies. It proved to be very
successful; it was adopted by three major European airlines31 and received the European first prize for innovative applications.33 For
an introduction on using machine learning approaches in fault diagnosis, we refer the reader to Isermann.34,35 It is important to note
that the CASSIOPEE troubleshooting system is made possible by computerized data management systems. We expect the impact of data management systems on fault identification and diagnosis to continue to grow; this remains an important area for statistics research, as highlighted by Nair et al.36

6. Advice to practitioners
In this paper, we discuss potential improvements to the use of simple tools and graphics in SPC. Graphics remain a valuable exploratory data analysis toolset, but practitioners should use caution in making decisions based on graphical tools alone. For example, any of the spatiotemporal visualizations should be used to gain insight into potential factors of interest. However,
analysis based on observational data should not be taken to infer a causal relationship among variables. Conclusions indicating causal
relationships can only be obtained with carefully designed and controlled experiments, and even then, causal inference is difficult to
confirm. For an excellent discussion on the criteria required to infer causality in epidemiological studies, see Hill.37 Although written
primarily for epidemiological studies, many of the criteria relate to more general research domains.
To use many of the methods described in this paper, practitioners may need to explore using multiple software packages simultaneously. A positive aspect of the field of visual data mining is that development and updates occur quite frequently and that the field is heavily dependent on open-source software. Because of cost and other limitations, a significant portion of the SPC and statistics research community has migrated to the open-source R environment for computations and data analyses. We
believe that R and similar open-source technologies can allow for more flexibility in deploying state-of-the-art research into
practice. That being said, any migration of data management systems should be carried out with caution to avoid problems.
There exist several 'top' lists of visualization software online; see, for example, Suda.38 These lists are useful for generating insights and learning about the wide variety of tools that are available online. However, the choice of software (or data management systems) remains application dependent. For the tools developed/discussed in this paper, we chose Microsoft Excel because of its widespread popularity among engineering/business practitioners and because it was the application of choice of our customer in the case study.

7. Concluding remarks
In this paper, we provided an overview of visual data mining and the role it can play in transforming our understanding of different
data types (continuous, discrete, multivariate, and relationship among different variables). We also developed an Excel-based toolkit
that provides several enhancements over the basic quality tools. We encourage more research and case studies in how statistical
graphics can be used to explore SPC data and/or to communicate the results from statistical models and experiments to practitioners.
Work is needed on developing strategies for generating insights in big data analytics applications. In particular, additional methods are
necessary to diagnose control chart signals. The combination of visual data mining with control charting principles may prove helpful
in fault identification and diagnosis, as shown in Wells et al.5 Bersimis et al.39 provided an overview of the visualization tools used in
interpreting out-of-control signals in multivariate SPC charts. There remains significant work to be done in this area, including
integrating these methodologies into open-source software, better understanding the limitations of these approaches, and ensuring
that these approaches can be generalized to multiple domains.

Acknowledgements
The work of the first two authors was partially supported by the NIOSH Deep South Center for Occupational Safety and Ergonomics,
grant number G00007701. The authors would like to thank Mohammad Ansari, Ashkan Negahban, and Prof. Jeffrey S. Smith for their
valuable insights regarding the MORE plots. We would also like to thank Ali Dag for his valuable discussions on machine learning with
the first author. Prof. Jeffrey S. Smith is the Joe W. Forehand Jr. Professor of Industrial and Systems Engineering at Auburn University,
and the other aforementioned three researchers are PhD students at Auburn University.


Supplemental material
The QVT is located at: http://www.eng.auburn.edu/users/fmm0002/QVT_Smith_Program.zip. The contents of this link will be
continuously updated based on users’ feedback.

References
1. Stapenhurst T. Mastering Statistical Process Control: A Handbook for Performance Improvement Using Cases. 2005; Amsterdam; Boston: Elsevier
Butterworth-Heinemann. xxxv.
2. Big Data - What is it? | SAS. 2013. Available from: http://www.sas.com/big-data/. Last accessed on: 5/8/2013.
3. Montgomery DC. Introduction to Statistical Quality Control. 7th ed. 2013; Hoboken, NJ: Wiley. 754.
4. Megahed FM, Woodall WH, Camelio JA. A review and perspective on control charting with image data. Journal of Quality Technology 2011; 43(2):83–98.
5. Wells LJ, Megahed FM, Camelio JA, Woodall WH. A framework for variation visualization and understanding in complex manufacturing systems.
Journal of Intelligent Manufacturing 2012; 23(5):2025–2036.
6. Wells LJ, Megahed FM, Niziolek CB, Camelio JA, Woodall WH. Statistical process monitoring approach for high-density point clouds. Journal of
Intelligent Manufacturing 2013; 24(6):1267–1279.
7. What is Big Data? | SAS. 2014. Available from: http://www.sas.com/en_us/insights/big-data/what-is-big-data.html. Last accessed on 07/17/2013.
8. Megahed FM, Jones-Farmer LA. Statistical perspectives on ‘Big Data’. In XIth International Workshop on Intelligent Statistical Quality Control. 2013;
Sydney, Australia: Physica-Verlag Heidelberg.
9. Pyzdek T, Keller PA. The Six Sigma Handbook: A Complete Guide for Green Belts, Black Belts, and Managers at all Levels. 3rd ed. 2010; New York:
McGraw-Hill Companies. xii.
10. Spann MS, Adams M, Rahman M, Czarnecki H, Schroer BJ. Transferring Lean Manufacturing to Small Manufacturers: The Role of NIST-MEP. 1999;
University of Alabama in Huntsville. Available from: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.201.6147&rep=rep1&type=pdf. Last
accessed on 07/10/2013.
11. Hoerl RW, Snee RD. Closing the gap. Quality Progress 2010; 43(5):52–53.
12. Keim DA, Müller W, Schumann H. Visual data mining. State of the art report. In Eurographics’ 2002. 2002; Saarbruecken, Germany: Eurographics Association.
13. Simoff S, Böhlen MH, Mazeika A, Visual Data Mining: Theory, Techniques and Tools for Visual Analytics. Vol. 4404. 2008; Springer.
14. Greitzer, FL, Noonan CF, Franklin LR. Cognitive Foundations for Visual Analytics. 2011, Pacific Northwest National Laboratory. Available from: http://
www.pnl.gov/main/publications/external/technical_reports/PNNL-20207.pdf. Last accessed on 08/01/2013.
15. Han J, M Kamber, Data Mining: Concepts and Techniques. 3rd ed. 2011; Burlington, MA: Elsevier. 703.
16. Rajaraman A, Leskovec J, Ullman JD. Mining of Massive Datasets. 2012, Cambridge University Press: New York. Available from: http://i.stanford.edu/
~ullman/mmds/book.pdf. Last accessed on 08/01/2013.
17. Wickham H. Graphical criticism: some historical notes. Journal of Computational and Graphical Statistics 2013; 22(1):38–44. DOI: 10.1080/
10618600.2012.761140
18. Tufte ER. The Visual Display of Quantitative Information. 1983, Cheshire, Conn. (Box 430, Cheshire 06410): Graphics Press.
19. Mackinlay J. Automating the design of graphical presentations of relational information. ACM Transactions on Graphics 1986; 5(2):110–141. DOI:
10.1145/22949.22950
20. Steiner S, MacKay RJ. Effective monitoring of processes with parts per million defective. A hard problem!. In Frontiers in Statistical Quality Control 7,
H-J Lenz, P-T Wilrich, Editors. 2004, Physica-Verlag HD, 140–149.
21. Hansen MH, Nair VN, Friedman DJ. Monitoring wafer map data from integrated circuit fabrication processes for spatially clustered defects.
Technometrics 1997; 39(3):241–253. DOI: 10.2307/1271129
22. Cunningham SP, MacKinnon S. Statistical methods for visual defect metrology. IEEE Transactions on Semiconductor Manufacturing 1998; 11(1):48–53.
DOI: 10.1109/66.661284
23. NIST/SEMATECH e-handbook of statistical methods. 2012. Available from: http://www.itl.nist.gov/div898/handbook/eda/section3/boxplot.htm. Last
accessed on 03/10/2014.
24. Nelson BL. The MORE plot: displaying measures of risk & error from simulation output. In Proceedings of the 40th Conference on Winter Simulation.
2008. Winter Simulation Conference.
25. Banks J, Carson JS, Nelson BL, Nicol DM. Discrete-event system simulation. In Prentice-Hall International Series in Industrial and Systems
Engineering. 4th ed. 2005; Upper Saddle River, NJ: Pearson Prentice Hall. xvi.
26. Potter K, Kniss J, Riesenfeld R, Johnson CR. Visualizing summary statistics and uncertainty. In Computer Graphics Forum. 2010; Wiley Online Library.
27. Xie YH. Animation: an R package for creating animations and demonstrating statistical methods. Journal of Statistical Software 2013; 53(1):1–27.
28. Rosling H, Zhang Z. Health advocacy with Gapminder animated statistics. Journal of Epidemiology and Global Health 2011; 1(1):11–14. DOI: 10.1016/j.jegh.2011.07.001.
29. Rosling H. Hans Rosling: the best stats you have ever seen. Filmed February 2006. Available from: http://www.ted.com/talks/
hans_rosling_shows_the_best_stats_you_ve_ever_seen.html. Last accessed on: 08/20/2013.
30. Jacks J. Visualization: Student Visas in the US 2000-2008. 2013; Auburn University: Auburn, AL, USA. Available from: http://auburnbigdata.blogspot.
com/2013/02/visualization-student-visas-in-us-2000.html. Last accessed on 08/01/2013.
31. Fayyad U, Piatetsky-Shapiro G, Smyth P. From data mining to knowledge discovery in databases. AI Magazine 1996; 17(3):37–54.
32. Esprit Project 21522 – CASSIOPEE: improving methodologies for efficiently designing decision support software for aircraft maintenance. 1996
Available from: http://cordis.europa.eu/esprit/src/21522.htm. Last accessed on: 8/13/2013.
33. Manago M, Auriol E. Mining for OR: case-based reasoning and data mining techniques show their mettle in a number of real-world applications.
ORMS Today (Special Issue on Data Mining) 1996; 23(1):28–32.
34. Isermann R. Supervision, fault-detection and fault-diagnosis methods – An introduction. Control Engineering Practice 1997; 5(5):639–652.
35. Isermann R. Model-based fault-detection and diagnosis – status and applications. Annual Reviews in Control 2005; 29(1):71–85.
36. Nair V, Hansen M, Shi J. Statistics in advanced manufacturing. Journal of the American Statistical Association 2000; 95(451):1002–1005. DOI: 10.2307/
2669486
37. Hill AB. The environment and disease: association or causation? Proceedings of the Royal Society of Medicine 1965; 58:295–300.
38. Suda B. The top 20 data visualization tools. 09/17/2012. Available from: http://www.netmagazine.com/features/top-20-data-visualisation-tools.
Last accessed on: 8/15/2013.

39. Bersimis S, Panaretos J, Psarakis S. Multivariate statistical process control charts and the problem of interpretation: a short overview and some
applications in industry. In Proceedings of the 7th Hellenic European Conference on Computer Mathematics and its Applications. 2005. Athens, Greece.


Authors' biographies
Huw D. Smith is a graduate student in the Department of Industrial and Systems Engineering and is also pursuing a dual MBA degree
at Auburn University. This research was completed, while he was an undergraduate research student in the Department of Industrial
and Systems Engineering at Auburn University. His research interests include visual analytics, SPC, image monitoring, lean
manufacturing, big data, business analytics, and simulation.
Fadel M. Megahed is an Assistant Professor in the Department of Industrial and Systems Engineering at Auburn University. He
received his PhD and MS in Industrial and Systems Engineering from Virginia Tech, and his BS in Mechanical Engineering from the
American University in Cairo. He is the recipient of the Mary G. and Joseph Natrella Scholarship (2012) from the American Statistical
Association. His research interests are in the areas of data analytics, data visualization, statistical quality control, and reliability. His
work in these areas has been funded by the NIOSH Deep South Center for Occupational Safety and Ergonomics, Proctor and Gamble
(P&G) Fund of the Greater Cincinnati Foundation, Amazon Web Services (AWS), Windows Azure (Microsoft), and the National Science
Foundation.
L. Allison Jones-Farmer is the C&E Smith Associate Professor of analytics and statistics in the Raymond J. Harbert College of Business
at Auburn University. Her main research interests include business analytics, statistical surveillance methodologies, and multivariate
statistical methods. Currently, Professor Jones-Farmer is studying methods for monitoring multiple social media streams and data
quality. She is the founding director of the Auburn University Business Analytics Lab (AUBAL). Dr. Jones-Farmer has published her
work in several scholarly journals including Technometrics, Journal of Quality Technology, Quality and Reliability Engineering
International, and International Journal of Logistics Management. She served as an Associate Editor for Technometrics from
2001–2005 and is on the editorial review board for Journal of Quality Technology.
Mark Clark has a dual appointment in the Raymond J. Harbert College of Business at Auburn University. He serves as a Management
Scientist in the Auburn Technical Assistance Center, and he also serves as a visiting Assistant Professor in the Aviation and Supply
Chain Management Department. Dr Clark has a PhD in Industrial Engineering and specializes in the area of production and operations
management. He has 8 years of experience in the pulp and paper industry and served as a process engineer for approximately 3 years
before joining the management team at one of the International Paper production facilities. Dr Clark has co-authored numerous
articles accepted in Annals of Operations Research, Computers and Operations Research, and Forest Science. Dr Clark balances his
time between the classroom and his activities in the outreach office. For the last 6 years, Dr Clark has focused, primarily, on
implementing the six sigma process improvement methodologies in manufacturing companies located in Alabama. In the classroom,
Dr Clark teaches quality management, project management, and quantitative methods at both the graduate and undergraduate
levels. He is also an instructor in the Executive MBA Program. Dr Clark has been in his current role since 1998.


