Lecture 1 - Data Mining and Analytics

ICT731 – Week 1

Data Mining and Analytics
(See Textbook Chapter 1)

Copyright © 2020 by Jones & Bartlett Learning, LLC an Ascend Learning Company. www.jblearning.com.
Objectives

• Introduction
• Define and describe data mining
• Define and describe machine learning
• Define and describe data visualization
• Locate, search, and use common dataset repositories
• Define and describe data quality
• Define and describe the common data mining and machine learning applications: clustering, classification, predictive analytics, and association
Introduction – Subject Learning Outcomes

a) Analyse the value, rationale and applications of data mining for organisations
b) Critically evaluate and recommend different data preparation methods and strategies
c) Apply various data mining methods and models to provide results to enhance business decision making
d) Design a predictive model using data, text, and web mining techniques
e) Research and evaluate ethical issues related to data mining

Introduction – Assessments
Understanding Data Mining

• Data mining is the process of identifying patterns that exist within data. With the patterns in hand, data analysts can apply them to other data sets
• Think of the actual “mining” as the search for the data patterns, as opposed to the subsequent use of the patterns
• The data mining process may involve the use of statistics, database queries, visualization tools, traditional programming, and machine learning
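
To make the split between "mining" and "using" patterns concrete, here is a minimal Python sketch. It assumes a tiny, made-up list of shopping baskets; the function names `mine_frequent_pairs` and `suggest` are hypothetical. The first function searches the data for item pairs that occur together at least `min_count` times (the mining); the second applies those mined pairs to a new basket (the subsequent use).

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, used only to illustrate the idea.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]

def mine_frequent_pairs(transactions, min_count=2):
    """Mining step: search the data for item pairs that co-occur often."""
    counts = Counter()
    for basket in transactions:
        # sorted() gives each pair a stable order, e.g. ("bread", "milk").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

def suggest(basket, patterns):
    """Use step: apply the mined pairs to a new, previously unseen basket."""
    suggestions = set()
    for a, b in patterns:
        if a in basket and b not in basket:
            suggestions.add(b)
        elif b in basket and a not in basket:
            suggestions.add(a)
    return suggestions

patterns = mine_frequent_pairs(baskets)
print(suggest({"beer"}, patterns))  # {'chips'}
```

The mined `patterns` dictionary can be reused on any number of later baskets without searching the original data again.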
Understanding Machine Learning

• Machine learning is the use of data pattern-recognition algorithms that allow a program to solve problems, such as clustering, categorization, predictive analysis, and data association, without the need for explicit step-by-step programming instructions that tell the algorithm how to perform the task
• In this way, machine learning solutions can solve complex problems by using data to drive discovery, often with only a few lines of code
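
As an illustration of categorization driven by data rather than hand-written rules, the sketch below uses k-nearest neighbours, one of many possible algorithms and chosen here only for brevity. The measurements, labels, and the function name `predict` are hypothetical. Notice that there is no `if height > ...` rule anywhere: the label is decided by a vote among the stored examples closest to the new point.

```python
import math
from collections import Counter

# Hypothetical labelled examples: (height_cm, weight_kg) -> size label.
training = [
    ((150, 50), "small"),
    ((155, 55), "small"),
    ((180, 85), "large"),
    ((185, 90), "large"),
]

def predict(point, examples, k=3):
    """Classify `point` by a majority vote of its k nearest labelled examples."""
    # No hand-written rules for "small" vs "large": the decision comes
    # entirely from the stored data and the distance to each example.
    nearest = sorted(examples, key=lambda ex: math.dist(point, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(predict((152, 52), training))  # small
print(predict((182, 88), training))  # large
```

Adding more labelled rows to `training` changes the behaviour without changing a line of the algorithm, which is the sense in which the data, not the code, drives the result.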
Common Data Mining Tools

• Databases such as MySQL or MongoDB
• Excel
• Visualization tools such as Tableau
• Business intelligence tools such as Microsoft Power BI
• Programming language solutions such as Python
• Data mining tools such as RapidMiner, Orange, and Weka
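
Database queries are often the first data mining tool an analyst reaches for. The sketch below uses SQLite through Python's built-in `sqlite3` module as a stand-in for a server database such as MySQL; the `sales` table, its columns, and its rows are hypothetical. A single `GROUP BY` query aggregates the raw rows and surfaces a simple pattern: which region and product combination brings in the most revenue.

```python
import sqlite3

# SQLite (bundled with Python) stands in for a server database such as MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "laptop", 1200.0),
        ("North", "phone", 800.0),
        ("South", "laptop", 1100.0),
        ("South", "laptop", 1300.0),
        ("South", "phone", 700.0),
    ],
)

# Aggregate the raw rows so the strongest region/product combinations rise to the top.
rows = conn.execute(
    """
    SELECT region, product, COUNT(*) AS orders, SUM(amount) AS total
    FROM sales
    GROUP BY region, product
    ORDER BY total DESC
    """
).fetchall()

for row in rows:
    print(row)
conn.close()
```

The same SQL would run largely unchanged against MySQL; only the connection code differs.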
Common Machine Learning Tools

• Python and R programming solutions
• Visual-programming tools, such as RapidMiner and Orange
• Excel third-party add-ins, such as Solver

Data Mining Versus Data Science

• Data mining is the process of identifying patterns that exist within data
• Data science is the use of statistics, programming, scientific methods, and machine learning to extract knowledge from a data set
• The definitions of data mining and data science are very similar. In fact, the two terms are often used interchangeably
• A data scientist is an individual who analyzes and interprets data
• The terms data scientist and data analyst are also quite similar. Both use data mining tools to gain insights into one or more datasets
Data Mining Versus Statistics

• Data mining is the process of identifying patterns that exist within data
• Statistics is the collection, analysis, modeling, and presentation of data
• Statistics is one component of data mining, meaning it is one tool in the data analyst’s tool kit
• Having knowledge and understanding of statistics will help a data analyst better understand the behind-the-scenes processing of many of the data mining and machine learning algorithms
• The good news is that you don’t have to be a statistician to use the tools
• Excel remains one of the most widely used data analytics tools and has many built-in statistical functions
• Many data analysts find success with only a basic understanding of statistical processes
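
The basic descriptive statistics an analyst would compute in Excel are also one import away in Python. This sketch uses the standard-library `statistics` module on a hypothetical week of daily order counts. The comments name the Excel functions that, as far as we know, play the same role (AVERAGE, MEDIAN, MODE.SNGL, STDEV.S). Comparing the mean with the median is a quick way to spot the kind of pattern the bullets describe: here one unusually busy day pulls the mean above the median.

```python
import statistics

# Hypothetical daily order counts for one week; 30 is an unusually busy day.
orders = [12, 15, 11, 30, 14, 13, 15]

mean = statistics.mean(orders)      # roughly Excel's AVERAGE
median = statistics.median(orders)  # roughly Excel's MEDIAN
mode = statistics.mode(orders)      # roughly Excel's MODE.SNGL
stdev = statistics.stdev(orders)    # sample standard deviation, like STDEV.S

print(f"mean={mean:.2f} median={median} mode={mode} stdev={stdev:.2f}")

# A mean well above the median hints at a high outlier worth investigating.
if mean > median:
    print("Mean exceeds median: check for unusually large values")
```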
Data Visualization

• A few years ago, big data was the exception, but now it is the norm
• One of the first steps data analysts perform to identify patterns within data is to represent the data visually using charts and graphs

Used with permission of Google


Dashboards

• To communicate data trends and findings, developers often create data-driven dashboards
• Depending on the information the analyst must convey, they will often create click-through dashboards, which first display high-level, often aggregated, data that the user can click to drill down into the underlying details

Used with permission of TABLEAU SOFTWARE
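
The two levels of a click-through dashboard can be thought of as two queries over the same data. The minimal sketch below assumes hypothetical sales records and hypothetical function names: `summary` produces the aggregated, high-level view a dashboard would show first, and `drill_down` returns the store-level detail a user would see after clicking one region. A real tool such as Tableau or Power BI handles this interactively; the sketch only shows the underlying data operations.

```python
from collections import defaultdict

# Hypothetical sales records: (region, store, revenue).
records = [
    ("North", "Store A", 500),
    ("North", "Store B", 300),
    ("South", "Store C", 450),
    ("South", "Store D", 250),
    ("South", "Store E", 200),
]

def summary(rows):
    """Top level of the dashboard: total revenue aggregated by region."""
    totals = defaultdict(int)
    for region, _store, revenue in rows:
        totals[region] += revenue
    return dict(totals)

def drill_down(rows, region):
    """What a click on one region reveals: the store-level detail behind the total."""
    return {store: revenue for r, store, revenue in rows if r == region}

print(summary(records))              # {'North': 800, 'South': 900}
print(drill_down(records, "South"))  # {'Store C': 450, 'Store D': 250, 'Store E': 200}
```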


Common Visualization Charts

• Time-based comparison charts, which represent how one or more sets of values change over time
• Category-based comparison charts, which represent how two or more categories of values compare
• Composition charts, which represent how one or more values relate to a larger whole
• Correlation charts, which represent how two or more variables relate
• Dashboard charts, which represent key performance indicators that companies use to track initiatives
• Distribution charts, which represent the frequency of values within a data set
• Geocharts, which represent how the values from one location compare to values in a different location
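
A distribution chart is built from a frequency count: group the values into bins, then draw each bin's size. The sketch below does exactly that and renders the result as a text histogram, so it runs without a charting library. It assumes integer values and equal-width bins; the scores and the function name `text_histogram` are hypothetical. A tool such as Google Charts or Tableau would draw the same counts as bars.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Group integer values into equal-width bins and draw one '#' per value."""
    # Each value maps to the lower edge of its bin, e.g. 74 -> 70 for width 10.
    bins = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(bins):
        label = f"{start}-{start + bin_width - 1}"
        lines.append(f"{label:>7} | {'#' * bins[start]}")
    return lines

# Hypothetical exam scores.
scores = [55, 62, 67, 71, 74, 75, 78, 83, 88, 91]
for line in text_histogram(scores, 10):
    print(line)
```

Empty bins are simply omitted here; a charting tool would normally show them as zero-height bars.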
Google Charts

Used with permission of Google


Correlation - Understanding How Variables Relate
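
Correlation charts are usually read alongside a single number that summarizes how strongly two variables move together. A common choice is the Pearson correlation coefficient r, which ranges from -1 (perfect negative relationship) through 0 (no linear relationship) to +1 (perfect positive relationship). The sketch below computes r from its definition; the advertising and sales figures are hypothetical. Python 3.10 and later also offer `statistics.correlation`, which should give the same value.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient r for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance term and the two spreads (sums of squared deviations).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (sx * sy)

# Hypothetical data: weekly advertising spend vs. units sold.
ad_spend = [1, 2, 3, 4, 5]
units_sold = [3, 5, 7, 9, 11]
r = pearson(ad_spend, units_sold)
print(f"r = {r:.3f}")
```

A value of r near +1 here only says the two series rise together; it does not show that advertising causes the sales.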
Visual Programming

• For years, programmers have made extensive use of programming languages, such as Python, to create data mining and machine learning solutions
• The problem with creating programs to perform data mining and machine learning tasks is that someone must know the programming language and know how to code
• To eliminate the need for such statement-based programming, visual-programming environments such as RapidMiner (discussed in this subject) are emerging
Titanic - Predicting Who Lived and Died with Python
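
The slide's original Python listing is not reproduced here. As a stand-in, the sketch below shows one very simple way to predict survival in Python: learn the survival rate of each (sex, passenger class) group from training rows, then predict "lived" for groups whose rate is at least 0.5. This group-rate baseline is our own choice, not necessarily the method on the slide, and the rows are made up in the shape of the Titanic passenger data; they are not real records.

```python
from collections import defaultdict

# Made-up rows shaped like the Titanic passenger data (NOT real records):
# (sex, passenger_class, survived) where survived is 1 = lived, 0 = died.
train = [
    ("female", 1, 1), ("female", 1, 1), ("female", 3, 0), ("female", 3, 1),
    ("male", 1, 1), ("male", 1, 0), ("male", 3, 0), ("male", 3, 0),
]

def fit(rows):
    """Learn the survival rate of each (sex, class) group from the training rows."""
    counts = defaultdict(lambda: [0, 0])  # group -> [survivors, total]
    for sex, pclass, survived in rows:
        counts[(sex, pclass)][0] += survived
        counts[(sex, pclass)][1] += 1
    return {group: s / n for group, (s, n) in counts.items()}

def predict(model, sex, pclass, default=0.0):
    """Predict 1 (lived) when the group's learned survival rate is at least 0.5."""
    # Groups never seen in training fall back to `default` (predict "died").
    return int(model.get((sex, pclass), default) >= 0.5)

model = fit(train)
print(predict(model, "female", 1))  # 1
print(predict(model, "male", 3))    # 0
```

In a visual-programming tool, the same steps would be built by connecting operators for reading the data, training a model, and applying it, rather than by writing these functions.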
Titanic - Predicting Who Lived and Died with Visual Programming

Used with permission of RapidMiner
Business Intelligence

• Business intelligence is the use of tools (data mining, machine learning, and visualization) to convert data into actionable business insights and recommendations
• Business intelligence often leverages click-through dashboards, in which users can click on items to display greater levels of detail
• Business intelligence systems often include decision support tools that help users make better decisions
• Using historical data, such tools can describe what has happened and, potentially, why
• Using predictive analytics, such tools can predict what is likely to happen in the future, and they may also prescribe choices the user should make
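
The difference between describing the past and predicting the future can be shown on a short series. In the sketch below, the descriptive part reports the best month and overall growth; the predictive part fits an ordinary least-squares straight line to the month index and extends it one month ahead. The revenue figures and the helper name `linear_trend` are hypothetical, and a straight-line trend is only one of many forecasting techniques a BI tool might use.

```python
def linear_trend(ys):
    """Least-squares line through points (0, y0), (1, y1), ...; returns (slope, intercept)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    xs = range(n)
    mx = (n - 1) / 2  # mean of 0, 1, ..., n-1
    my = sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Hypothetical monthly revenue (in thousands).
revenue = [100, 110, 120, 130, 140]

# Descriptive: what happened?
best_month = revenue.index(max(revenue))  # 0-based month index
growth = revenue[-1] - revenue[0]

# Predictive: what is likely to happen next month if the trend continues?
slope, intercept = linear_trend(revenue)
forecast = slope * len(revenue) + intercept

print(f"best month index: {best_month}, growth: {growth}")
print(f"next-month forecast: {forecast:.1f}")  # next-month forecast: 150.0
```

The forecast is only as good as the assumption that the past trend continues, which is why such tools describe their predictions as estimates.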
Business Intelligence Tools

• Business intelligence software provides extensive visualization (charting) capabilities
• Beyond visualization, many of these tools can also perform data mining and machine learning tasks
• Microsoft Power BI
• Tableau
• Orange
• Solver (previously known as XLMiner)
• RapidMiner
• Excel
