Lecture 1 - Introduction
Definition
• Data mining is the process of sorting through
large data sets to identify patterns and
relationships that can help solve business
problems through data analysis. Data mining
techniques and tools enable enterprises to
predict future trends and make more-
informed business decisions.
Why is data mining important?
• Data mining is a crucial component of
successful analytics initiatives in organizations.
The information it generates can be used in
business intelligence (BI) and advanced
analytics applications that involve analysis of
historical data, as well as
real-time analytics applications that examine
streaming data as it's created or collected.
• Effective data mining aids in various aspects of
planning business strategies and managing
operations. That includes customer-facing
functions such as marketing, advertising, sales
and customer support, plus manufacturing,
supply chain management, finance and HR. Data
mining supports fraud detection, risk
management, cybersecurity planning and many
other critical business use cases. It also plays an
important role in healthcare, government,
scientific research, mathematics, sports and
more.
Data Mining Process
• Data gathering. Relevant data for an analytics
application is identified and assembled. The
data may be located in different source systems,
a data warehouse or a data lake, an increasingly
common repository in big data environments
that contain a mix of structured and
unstructured data. External data sources may
also be used. Wherever the data comes from, a
data scientist often moves it to a data lake for
the remaining steps in the process.
Data Mining Process (cont)
• Data preparation. This stage includes a set of
steps to get the data ready to be mined. It
starts with data exploration, profiling and pre-
processing, followed by data cleansing work
to fix errors and other data quality issues.
Data transformation is also done to make data
sets consistent, unless a data scientist is
looking to analyze unfiltered raw data for a
particular application.
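The preparation steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the records, the field names (`customer`, `age`, `spend`) and the choice of median filling and min-max scaling are all invented for the example, and real projects usually pick these per data set.

```python
from statistics import median

# Hypothetical raw records; field names and values are invented for illustration.
raw_records = [
    {"customer": " Alice ", "age": "34", "spend": "120.50"},
    {"customer": "bob", "age": "", "spend": "80"},
    {"customer": "Alice", "age": "34", "spend": "120.50"},  # duplicate after cleanup
    {"customer": "Carol", "age": "41", "spend": "n/a"},     # unparseable spend
]


def to_number(text):
    """Return text as a float, or None when it is missing or unparseable."""
    try:
        return float(text)
    except ValueError:
        return None


def prepare(records):
    # Cleansing: trim and normalise names, parse numbers, drop exact duplicates.
    seen, cleaned = set(), []
    for rec in records:
        row = (rec["customer"].strip().title(),
               to_number(rec["age"]),
               to_number(rec["spend"]))
        if row not in seen:
            seen.add(row)
            cleaned.append(row)

    # Fill missing values with the column median (one simple, assumed choice).
    age_fill = median(r[1] for r in cleaned if r[1] is not None)
    spend_fill = median(r[2] for r in cleaned if r[2] is not None)
    filled = [
        (name,
         age if age is not None else age_fill,
         spend if spend is not None else spend_fill)
        for name, age, spend in cleaned
    ]

    # Transformation: rescale spend to the range [0, 1] for consistency.
    low = min(spend for _, _, spend in filled)
    high = max(spend for _, _, spend in filled)
    span = (high - low) or 1.0
    return [
        {"customer": name, "age": age, "spend_scaled": (spend - low) / span}
        for name, age, spend in filled
    ]


prepared = prepare(raw_records)
```

Here `prepared` holds three records: the duplicate is dropped, the missing age and spend are filled in, and spend is rescaled.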
Data Mining Process (cont)
• Mining the data. Once the data is prepared, a
data scientist chooses the appropriate data
mining technique and then implements one or
more algorithms to do the mining. In machine
learning applications, the algorithms typically
must be trained on sample data sets to look
for the information being sought before
they're run against the full set of data.
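As a rough sketch of "train on a sample, then run against the full data set", the example below uses a nearest-centroid classifier. The technique is chosen only because it is short; the slide does not prescribe it. The features (`monthly_visits`, `avg_basket_value`), the labels and all values are hypothetical.

```python
from math import dist

# Hypothetical labelled sample: (monthly_visits, avg_basket_value) -> segment.
training_sample = [
    ((2.0, 15.0), "occasional"),
    ((3.0, 20.0), "occasional"),
    ((12.0, 60.0), "loyal"),
    ((14.0, 70.0), "loyal"),
]


def train(sample):
    """Learn one centroid (mean point) per label from the labelled sample."""
    grouped = {}
    for features, label in sample:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(column) / len(points) for column in zip(*points))
        for label, points in grouped.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Step 1: train on the small labelled sample.
model = train(training_sample)

# Step 2: run the trained model against the (here tiny) full, unlabelled data set.
full_data = [(1.0, 12.0), (13.0, 65.0), (4.0, 25.0)]
labels = [predict(model, row) for row in full_data]
```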
Data Mining Process (cont)
• Data analysis and interpretation. The data
mining results are used to create analytical
models that can help drive decision-making
and other business actions. The data scientist
or another member of a data science team
also must communicate the findings to
business executives and users, often through
data visualization and the use of data
storytelling techniques.
The knowledge discovery in databases
(KDD) process
This is commonly defined with the following stages:
1. Selection
2. Pre-processing
3. Transformation
4. Data mining
5. Interpretation/evaluation
TYPES OF DATA MINING
• Predictive Data Mining Analysis
• Descriptive Data Mining Analysis
Predictive Data Mining Analysis
• Classification Analysis
• Regression Analysis
• Time series Analysis
• Prediction Analysis
Descriptive Data Mining
• Clustering Analysis
• Summarization Analysis
• Association Rules Analysis
• Sequence Discovery Analysis
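To make one descriptive technique concrete, here is a minimal sketch of the two standard measures behind association rules, support and confidence. The market-basket transactions are invented for illustration.

```python
# Hypothetical market-basket transactions, invented for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)


# Rule under inspection: customers who buy bread also buy butter.
rule_support = support({"bread", "butter"})
rule_confidence = confidence({"bread"}, {"butter"})
```

In this toy data, bread and butter appear together in 3 of 5 baskets, and 3 of the 4 baskets with bread also contain butter, so the rule has support of about 0.6 and confidence of about 0.75.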
Benefits of data mining
• More effective marketing and sales. Data
mining helps marketers better understand
customer behavior and preferences, which
enables them to create targeted marketing
and advertising campaigns. Similarly, sales
teams can use data mining results to improve
lead conversion rates and sell additional
products and services to existing customers.
• Better customer service. Thanks to data
mining, companies can identify potential
customer service issues more promptly and
give contact center agents up-to-date
information to use in calls and online chats
with customers.
• Improved supply chain management.
Organizations can spot market trends and
forecast product demand more accurately,
enabling them to better manage inventories
of goods and supplies. Supply chain managers
can also use information from data mining to
optimize warehousing, distribution and other
logistics operations.
• Increased production uptime. Mining
operational data from sensors on
manufacturing machines and other industrial
equipment supports predictive maintenance
applications to identify potential problems
before they occur, helping to avoid
unscheduled downtime.
• Stronger risk management. Risk managers
and business executives can better assess
financial, legal, cybersecurity and other risks
to a company and develop plans for managing
them.
• Lower costs. Data mining helps drive cost
savings through operational efficiencies in
business processes and reduced redundancy
and waste in corporate spending.
Application of Data Mining
• Retail. Online retailers mine customer data
and internet clickstream records to help them
target marketing campaigns, ads and
promotional offers to individual shoppers.
Data mining and predictive modeling also
power the recommendation engines that
suggest possible purchases to website visitors,
as well as inventory and supply chain
management activities.
• Financial services. Banks and credit card
companies use data mining tools to build
financial risk models, detect fraudulent
transactions and vet loan and credit
applications. Data mining also plays a key role
in marketing and in identifying potential
upselling opportunities with existing
customers.
• Insurance. Insurers rely on data mining to aid
in pricing insurance policies and deciding
whether to approve policy applications,
including risk modeling and management for
prospective customers.
• Manufacturing. Data mining applications for
manufacturers include efforts to improve
uptime and operational efficiency in
production plants, supply chain performance
and product safety.
• Entertainment. Streaming services do data
mining to analyze what users are watching or
listening to and to make personalized
recommendations based on people's viewing
and listening habits.
• Healthcare. Data mining helps doctors
diagnose medical conditions, treat patients
and analyze X-rays and other medical imaging
results. Medical research also depends heavily
on data mining, machine learning and other
forms of analytics.
• Lie detection. Apprehending a suspect is often
the easier part; getting the truth out of them is
far more challenging. Law enforcement may use
data mining techniques to investigate offenses,
monitor suspected terrorist communications, etc.
This includes text mining, which seeks
meaningful patterns in data that is usually
unstructured text. Information collected from
previous investigations is compared, and a
model for lie detection is constructed.
• Fraud detection. Billions of dollars are lost to
fraud. Traditional methods of fraud detection are
time-consuming and complex. Data mining finds
meaningful patterns and turns data into
information. An ideal fraud detection system
should protect the data of all users. Supervised
methods start from a collection of sample
records that are classified as fraudulent or
non-fraudulent. A model is constructed from this
data and is then used to identify whether a new
record is fraudulent or not.
• Customer service. Customer satisfaction may be
built (or destroyed) for a variety of reasons.
Imagine a company that ships goods. A customer
may become unhappy with shipping time, shipping
quality, or communication about shipment
expectations. That
same customer may become frustrated with long
telephone wait times or slow e-mail responses.
Data mining gathers operational information
about customer interactions and summarizes
findings to determine weak points as well as
highlights of what the company is doing right.
• Human resources. HR often has a wide range of
data available for processing including data on
retention, promotions, salary ranges,
company benefits and utilization of those
benefits, and employee satisfaction surveys.
Data mining can correlate this data to get a
better understanding of why employees leave
and what entices recruits to join.
Challenges of Implementation in Data
mining
Limitations of Data Mining
• The complexity of data mining is one of the
largest disadvantages of the process. Data
analytics often requires technical skill sets and
certain software tools. Some smaller
companies may find this barrier to entry too
difficult to overcome.
• Data mining doesn't always guarantee results.
A company may perform statistical analysis,
make conclusions based on strong data,
implement changes, and not reap any
benefits. Because of inaccurate findings, market
changes, model errors, or inappropriate data
populations, data mining can only guide
decisions; it cannot ensure outcomes.
• There is also a cost component to data
mining. Data tools may require ongoing costly
subscriptions, and some data may be
expensive to obtain. Security and privacy
concerns can be addressed, though the additional
IT infrastructure needed may be costly as well. Data
mining may also be most effective when using
huge data sets; however, these data sets must
be stored and require heavy computational
power to analyze.
Why is data mining important?
• Sift through all the chaotic and repetitive
noise in data.
• Understand what is relevant and then make
good use of that information to assess likely
outcomes.
• Accelerate the pace of making informed
decisions.
Advantages of Data Mining
• The Data Mining technique enables organizations to obtain
knowledge-based data.
• Data mining enables organizations to make profitable
adjustments to operations and production.
• Compared with other statistical data applications, data
mining is cost-efficient.
• Data Mining helps the decision-making process of an
organization.
• It facilitates the automated discovery of hidden patterns as
well as the prediction of trends and behaviors.
• It can be introduced into new systems as well as existing
platforms.
• It is a quick process that makes it easy for new users to
analyze enormous amounts of data in a short time.
Disadvantages of Data Mining
• Organizations may sell customers' data to other
organizations for money. American Express, for example,
has reportedly sold its customers' credit card purchase
data to other organizations.
• Much data mining software is difficult to operate and
requires advanced training to use.
• Different data mining instruments operate in distinct ways
due to the different algorithms used in their design.
Therefore, the selection of the right data mining tools is a
very challenging task.
• Data mining techniques are not perfectly precise, so they
may lead to severe consequences in certain conditions.
DISCUSSION QUESTION
• What are the differences between data
mining and data analytics?
DATA ANALYTICS
• Data analytics is the collection,
transformation, and organization of data in
order to draw conclusions, make predictions,
and drive informed decision making.
• Data analytics is the process of examining data
sets in order to find trends and draw conclusions
about the information they contain. Increasingly,
data analytics is done with the aid of specialized
systems and software.
• Data analytics is the process of analyzing raw
data in order to draw out meaningful,
actionable insights, which are then used to
inform and drive smart business decisions.
DISCUSSION QUESTION
• What are the applications of Data Analytics?
Types of data analytics
applications
• Exploratory Data Analysis (EDA): aims to find
patterns and relationships in data
• Confirmatory Data Analysis (CDA): applies
statistical techniques to determine whether
hypotheses about a data set are true or false
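The difference between the two can be illustrated with a short sketch: an exploratory look at the gap between two groups, followed by a confirmatory permutation test of the hypothesis that the gap is due to chance. The permutation test is just one possible statistical technique, and the order values for the two page designs are invented.

```python
import random
from statistics import mean

# Hypothetical order values for two page designs, invented for illustration.
design_a = [20.1, 22.4, 19.8, 21.0, 23.5, 20.7]
design_b = [24.9, 26.1, 23.8, 25.5, 27.0, 24.2]

# Exploratory step: look at the data before committing to a hypothesis.
observed_gap = mean(design_b) - mean(design_a)


def permutation_p_value(a, b, rounds=2000, seed=0):
    """Share of random relabellings whose gap is at least as large as observed."""
    rng = random.Random(seed)
    pooled = a + b
    target = abs(mean(b) - mean(a))
    extreme = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        gap = abs(mean(pooled[len(a):]) - mean(pooled[:len(a)]))
        if gap >= target:
            extreme += 1
    return extreme / rounds


# Confirmatory step: test the null hypothesis that the design makes no difference.
p_value = permutation_p_value(design_a, design_b)
```

A small `p_value` suggests the observed gap would rarely arise from random relabelling alone, which is evidence against the null hypothesis.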
How data analytics is used
• Data is everywhere, and people use data
every day, whether they realize it or not. Daily
tasks such as measuring coffee beans to make
your morning cup, checking the weather
report before deciding what to wear, or
tracking your steps throughout the day with a
fitness tracker can all be forms of analyzing
and using data.
How data analytics is used
• Data analytics is important across many
industries, as many business leaders use data
to make informed decisions. A sneaker
manufacturer might look at sales data to
determine which designs to continue and
which to retire, or a health care administrator
may look at inventory data to determine the
medical supplies they should order.
Types of Data Analytics
• Descriptive analytics describes what has
happened over a given period of time.
• Diagnostic analytics tells us why something
happened. This involves more diverse data
inputs and a bit of hypothesizing.
• Predictive analytics tells us what will likely
happen in the future.
• Prescriptive analytics suggests a course of
action, telling us how to act.
Data Analytics Techniques
• Regression analysis entails analyzing the
relationship between a dependent variable and
one or more independent variables to determine
how a change in one may affect the other.
• Factor analysis entails taking a large data set
and reducing it to a smaller set of underlying
factors. The goal is to discover hidden trends
that would otherwise be more difficult to see.
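For the regression technique above, a minimal sketch of simple linear regression with ordinary least squares follows. The variable names (`ad_spend`, `sales`) and the data points are hypothetical, and only one independent variable is used to keep the arithmetic visible.

```python
from statistics import mean

# Hypothetical data: advertising spend (independent) and sales (dependent).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    covariance = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    variance = sum((xi - x_bar) ** 2 for xi in x)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


slope, intercept = fit_line(ad_spend, sales)

# Estimate the dependent variable for an unseen spend level.
predicted_sales = intercept + slope * 6.0
```

The slope estimates how much the dependent variable changes for each unit change in the independent variable; for this toy data it comes out at roughly 1.97.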
Data Analytics Techniques (cont)
• Cohort analysis is the process of breaking a
data set into groups of similar data, often
by customer demographic. This allows data
analysts and other users of data analytics to
dive further into the numbers relating to a
specific subset of data.
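A cohort breakdown can be sketched as a simple group-and-summarise step. The example assumes customers are grouped by signup month, which is only one possible cohort definition; the records are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical customers: (signup_month, total_spend). Values are invented.
customers = [
    ("2024-01", 120.0),
    ("2024-01", 80.0),
    ("2024-02", 200.0),
    ("2024-02", 150.0),
    ("2024-02", 100.0),
    ("2024-03", 60.0),
]


def cohort_summary(rows):
    """Group customers by signup month and summarise each cohort."""
    cohorts = defaultdict(list)
    for signup_month, spend in rows:
        cohorts[signup_month].append(spend)
    return {
        month: {"customers": len(spends), "avg_spend": mean(spends)}
        for month, spends in sorted(cohorts.items())
    }


summary = cohort_summary(customers)
```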
Data Analytics Techniques (cont)
• Monte Carlo simulations model the
probability of different outcomes happening.
Often used for risk mitigation and loss
prevention, these simulations incorporate
multiple values and variables and often have
greater forecasting capabilities than other
data analytics approaches.
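A minimal Monte Carlo sketch follows: it estimates the probability that a hypothetical project ends with a loss by drawing many random scenarios. The input distributions (normal revenue, uniform cost) and their parameters are assumptions made purely for illustration.

```python
import random


def simulate_loss_probability(trials=20_000, seed=42):
    """Estimate the chance that a hypothetical project ends with a loss."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    losses = 0
    for _ in range(trials):
        # Assumed input distributions, chosen only for illustration.
        revenue = rng.gauss(100.0, 15.0)
        cost = rng.uniform(70.0, 110.0)
        if revenue - cost < 0:
            losses += 1
    return losses / trials


loss_probability = simulate_loss_probability()
```

Under these assumed inputs the estimate should land near 0.3; changing the distributions or adding more uncertain variables changes the result, which is the point of the technique.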
Data Analytics Techniques (cont)
• Time series analysis tracks data over time and
establishes the relationship between the value
of a data point and the time at which it
occurs. This data analysis technique is usually
used to spot cyclical trends or to project
financial forecasts.
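Two basic time series operations, smoothing with a moving average and averaging by position in the cycle to expose a seasonal pattern, can be sketched as below. The monthly sales figures are invented, and a 12-month cycle is assumed.

```python
from statistics import mean

# Hypothetical monthly sales for two years, with a repeating yearly pattern.
monthly_sales = [
    10, 12, 15, 20, 26, 30, 31, 27, 21, 16, 12, 11,
    12, 14, 17, 22, 28, 32, 33, 29, 23, 18, 14, 13,
]


def moving_average(series, window):
    """Trailing mean over `window` points; smooths out short-term swings."""
    return [
        mean(series[i - window + 1 : i + 1])
        for i in range(window - 1, len(series))
    ]


def seasonal_profile(series, period):
    """Average value at each position in the cycle (e.g. each calendar month)."""
    return [mean(series[pos::period]) for pos in range(period)]


trend = moving_average(monthly_sales, window=12)
profile = seasonal_profile(monthly_sales, period=12)
peak_month_index = profile.index(max(profile))  # 0 = first month of the cycle
```

The 12-month moving average removes the yearly cycle and leaves the underlying trend, while the seasonal profile shows which point in the cycle tends to peak.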