Lecture 1 - Data Mining and Analytics
Copyright © 2020 by Jones & Bartlett Learning, LLC an Ascend Learning Company. www.jblearning.com.
Objectives
Introduction
Define and describe data mining
Define and describe machine learning
Define and describe data visualization
Locate, search, and use common dataset repositories
Define and describe data quality
Define and describe the common data mining and machine learning
applications: clustering, classification, predictive analytics, and association
Introduction – Subject Learning Outcomes
organisations
b) Critically evaluate and recommend different data preparation
methods and strategies
c) Apply various data mining methods and models to provide results to
enhance business decision making
d) Design a predictive model using data, text, and web mining
techniques
e) Research and evaluate ethical issues related to data mining
Introduction – Assessments
Understanding Data Mining
Data mining is the process of identifying patterns that exist within data. With
the patterns in hand, data analysts can apply them to other data sets
Think of the actual “mining” as the search for the data patterns, as opposed
to the subsequent use of the patterns
The data mining process may involve the use of statistics, database queries,
visualization tools, traditional programming, and machine learning
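The distinction between mining a pattern and later using it can be illustrated with a small sketch. It relies on traditional programming only: it counts which pairs of items appear together in past shopping baskets (the mining step) and then applies the frequent pairs to a new basket (the use step). The basket data, the function names mine_pairs and suggest, and the min_count threshold are hypothetical choices made for illustration; they are not taken from the lecture.

```python
from collections import Counter
from itertools import combinations


def mine_pairs(baskets, min_count=2):
    """Mining step: count item pairs that occur together and keep the frequent ones."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) gives each pair a single canonical ordering
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


def suggest(patterns, basket):
    """Use step: apply the mined pairs to a basket that was not in the original data."""
    items = set(basket)
    suggestions = set()
    for a, b in patterns:
        if a in items and b not in items:
            suggestions.add(b)
        elif b in items and a not in items:
            suggestions.add(a)
    return sorted(suggestions)


# Hypothetical historical baskets (assumed data, for illustration only)
history = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "cereal"],
    ["bread", "butter", "jam"],
    ["milk", "cereal", "bread"],
]

patterns = mine_pairs(history)
print(patterns)
print(suggest(patterns, ["butter"]))
```

Dedicated data mining tools automate this kind of search at much larger scale; the sketch only shows the two phases side by side.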
Understanding Machine Learning
analysis, and data association without the need for explicit step-by-step
programming instructions to tell the algorithm how to perform tasks
Excel
Visualization tools such as Tableau
Business intelligence tools such as Microsoft Power BI
Programming language solutions such as Python
Data mining tools such as RapidMiner, Orange, and Weka
Common Machine Learning Tools
Visual-programming tools, such as RapidMiner and Orange
Data mining is the process of identifying patterns that exist within data
Data science is the use of statistics, programming, scientific methods, and
machine learning to extract knowledge from a data set
The definitions of data mining and data science are very similar. In fact, the two
terms are often used interchangeably
A data scientist is an individual who analyzes and interprets data
The terms data scientist and data analyst are also quite similar. Both will use
data mining tools to gain insights into one or more datasets
Data Mining Versus Statistics
Data mining is the process of identifying patterns that exist within data
Statistics is the collection, analysis, modeling, and presentation of data
Statistics is one component of data mining, meaning it is one tool in the data
analyst’s tool kit
Having knowledge and understanding of statistics will help a data analyst better
understand the behind-the-scenes processing of many of the data mining and
machine learning algorithms
The good news is that you don’t have to be a statistician to use the tools
Excel remains one of the most widely used data analytics tools and has many
built-in statistical functions
Many data analysts find success with only a basic understanding of statistical
processes
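As a rough illustration of how far built-in statistical functions can take an analyst, the sketch below uses Python's standard statistics module to summarize a small list of values. The sales figures and variable names are hypothetical; Excel offers comparable functions such as AVERAGE, MEDIAN, and STDEV.S.

```python
import statistics

# Hypothetical monthly sales figures (assumed data, for illustration only)
monthly_sales = [120, 135, 150, 110, 160, 145]

mean_sales = statistics.mean(monthly_sales)      # arithmetic average
median_sales = statistics.median(monthly_sales)  # middle value of the sorted data
spread = statistics.stdev(monthly_sales)         # sample standard deviation

print(f"mean={mean_sales:.1f} median={median_sales} stdev={spread:.1f}")
```

Knowing what each function measures is usually enough to get started; the underlying formulas matter more when interpreting the output of a mining algorithm.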
Data Visualization
the exception but now it is the
norm
data-driven dashboards
Depending on the information the analyst must convey, they will often create
click-through dashboards, which first display high-level, often aggregated data,
upon which the user can click in order to drill deeper into the underlying
specifics
Time-based comparison charts, which represent how one or more sets of values
change over time
Category-based comparison charts, which represent how two or more categories of
values compare
Composition charts, which represent how one or more values relate to a larger whole
Correlation charts, which represent how two or more variables relate
Dashboard charts, which represent key performance indicators that companies use to
track initiatives
Distribution charts, which represent the frequency of values within a data set
Geocharts, which represent how the values from one location compare to values in a
different location
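To make one of these chart types concrete, the following sketch builds the frequency table that sits behind a distribution chart and prints it as a plain-text histogram. The ages, the bin width of 10, and the function name frequency_table are assumptions chosen for illustration; a charting tool would render the same counts graphically.

```python
from collections import Counter


def frequency_table(values, bin_width):
    """Group values into fixed-width bins and count how many fall into each bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical customer ages (assumed data, for illustration only)
ages = [23, 27, 31, 35, 36, 41, 44, 45, 52, 29]
bin_width = 10

table = frequency_table(ages, bin_width)
for start, n in table.items():
    # One '#' per value in the bin gives a rough text histogram
    print(f"{start}-{start + bin_width - 1}: {'#' * n}")
```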
Google Charts
learning solutions
Business Intelligence
Business intelligence is the use of tools (data mining, machine learning, and
visualization) to convert data into actionable business insights and
recommendations
Business intelligence often leverages click-through dashboards, in which users
can click on items to display greater levels of detail
Business intelligence systems often include decision support tools that help
users make better decisions
Using historical data, such tools can describe what has happened and,
potentially, why
Using predictive analytics, such tools can predict what is likely to happen in
the future, and they may prescribe choices the user should make
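The step from describing historical data to predicting a future value can be sketched with a simple linear trend. The example fits a least-squares line to four quarters of revenue and extends it one quarter ahead. The revenue figures and the function name fit_line are hypothetical, and a straight-line trend is only one of many possible predictive models; it is not presented here as what any particular business intelligence tool does internally.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: revenue per quarter (assumed, for illustration only)
quarters = [1, 2, 3, 4]
revenue = [100.0, 110.0, 120.0, 130.0]

# Descriptive step: summarize what has happened
slope, intercept = fit_line(quarters, revenue)
print(f"Revenue changed by about {slope:.1f} per quarter")

# Predictive step: extend the trend to the next quarter
forecast_q5 = slope * 5 + intercept
print(f"Forecast for quarter 5: {forecast_q5:.1f}")
```

A prescriptive tool would go one step further and recommend an action based on such a forecast.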
Business Intelligence Tools
Beyond visualization, the tools can perform many data mining and machine
learning tasks
Microsoft Power BI
Tableau
Orange
Solver (previously known as XLMiner)
RapidMiner
Excel