
BI Module 4

Data mining is a multidisciplinary field that extracts useful patterns from large datasets, utilizing tools and techniques from statistics, AI, and machine learning. The data mining process consists of six steps: understanding the business, understanding and preparing the data, building and testing models, and deploying the model. It has applications across various industries, including CRM, banking, retail, and healthcare, aiding in decision-making under certainty, uncertainty, and risk.

Uploaded by

layappa2004
Copyright
© All Rights Reserved

Module 04

Data mining is not a completely new field, but rather a combination of different subjects like statistics, artificial
intelligence, machine learning, databases, and management science. It focuses on finding useful patterns and
information from large amounts of data.

Here are the main features and goals of data mining:

• Large Data Storage: Data is often stored in huge databases that may contain information from many years. Sometimes, this
data is cleaned and organized into a data warehouse.
• Modern Systems: Data mining usually works on a client/server system or a web-based platform.
• Advanced Tools: Special tools, including visualization software, help find hidden patterns in data stored in company
records or public files. Some tools can even analyze unstructured text from emails, websites, and business networks.
• User-Friendly: Many data mining tools allow users (even those without programming skills) to explore data and ask
questions to get quick answers.
• Unexpected Discoveries: Sometimes, valuable insights come from unexpected results, so users need to think creatively
while analyzing the findings.
• Easy Integration: Data mining tools can be easily combined with spreadsheets and other software, making it simple to
analyze and use the results.
• Fast Processing: Since data mining deals with a huge amount of data, parallel processing (using multiple processors at once)
is often needed to speed up the process.

Overall, data mining helps businesses and researchers find useful patterns in data, leading to better decisions
and insights.

Data mining process

The Data Mining Process in Simple Words

Data mining is like solving a big puzzle using data. It helps businesses find useful information from large
amounts of data. The process involves six main steps:

Step 1: Understanding the Business

Before starting data mining, we need to understand why we are doing it. Businesses need answers to questions
like:

• Why are customers leaving for competitors?
• Which customers are the most valuable?

Once the goal is clear, a plan is created. This plan includes assigning tasks, collecting data, analyzing data, and
reporting the findings. A budget is also set.

Step 2: Understanding the Data

Now, we find and collect the right data. Different business problems need different types of data. For example,
a clothing store might look at customer age, income, and purchase history to understand shopping habits.

The analyst must know:

• Where the data is stored
• How it was collected (manually or automatically)
• Whether the data is reliable

To understand the data better, analysts use tools like graphs, tables, and statistics.

Step 3: Preparing the Data

Real-world data is often messy. It may have missing values, errors, or duplicate records. Cleaning and
organizing the data is usually the most time-consuming step: by a commonly cited estimate, it takes about 80% of the
total project time.

This step makes sure the data is ready for analysis.
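
As a rough illustration of this step, the sketch below removes duplicate records and fills in missing values. The field names (customer_id, age, income) and the choice to fill gaps with the average income are assumptions made for the example, not a method prescribed by these notes.

```python
from statistics import mean

def prepare(records):
    """Drop exact duplicate records, then fill missing incomes with the mean."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["customer_id"], rec["age"], rec["income"])
        if key not in seen:          # keep only the first copy of each record
            seen.add(key)
            unique.append(dict(rec))

    # Average of the incomes that are actually present (assumed fill strategy).
    known = [r["income"] for r in unique if r["income"] is not None]
    fill_value = mean(known) if known else None
    for r in unique:
        if r["income"] is None:
            r["income"] = fill_value
    return unique

# Hypothetical raw data with one duplicate and one missing value.
raw = [
    {"customer_id": 1, "age": 34, "income": 52000},
    {"customer_id": 1, "age": 34, "income": 52000},
    {"customer_id": 2, "age": 45, "income": None},
    {"customer_id": 3, "age": 29, "income": 48000},
]
clean = prepare(raw)
```

Filling with the mean is only one option. Depending on the business problem, an analyst might instead drop incomplete records or use the median.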

Step 4: Building the Model

Now, we apply different techniques to find useful patterns in the data. There are different types of models:

• Prediction models (e.g., predicting customer purchases)
• Clustering models (e.g., grouping similar customers)
• Association models (e.g., finding products that are often bought together)

Different models are tested to find the best one for the business problem.
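
To make the association idea concrete, here is a minimal sketch that counts how often pairs of products appear in the same basket. The baskets and the function name pair_support are made up for illustration. Real association tools use more scalable algorithms, but the basic measure (the share of baskets containing both items, often called support) is the same.

```python
from collections import Counter
from itertools import combinations

def pair_support(transactions):
    """Return the fraction of baskets in which each pair of items occurs together."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) gives each pair a single, order-independent key
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items()}

# Hypothetical shopping baskets.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]
support = pair_support(baskets)
best_pair = max(support, key=support.get)  # pair bought together most often
```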

Step 5: Testing and Evaluating the Model

Once the models are built, we check how well they work. The goal is to see if they provide useful and accurate
results.

If the models do not perform well, they may need to be improved or replaced. Businesses also look for any
unexpected insights that can help them make better decisions.
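
One simple way to check a prediction model is to compare its predictions with what actually happened on data it has not seen. The sketch below assumes a yes/no prediction such as "will this customer leave?" (1 = yes, 0 = no). The sample lists are invented, and accuracy, precision, and recall are standard measures rather than ones named in these notes.

```python
def evaluate(actual, predicted):
    """Compute accuracy, precision, and recall for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # correctly flagged
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false alarms
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # missed cases
    correct = sum(1 for a, p in pairs if a == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical held-out results: what happened vs. what the model predicted.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
scores = evaluate(actual, predicted)
```

Which measure matters most depends on the business goal. For example, if missing a customer who is about to leave is costly, recall deserves more weight than overall accuracy.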

Step 6: Deploying the Model


After testing, the model is ready to be used. The results are presented in reports, dashboards, or automated
systems that help businesses make decisions.

SEMMA data mining process

KDD data mining process

Quantitative model

Types of Variables in Decision Making

When making decisions, different types of variables influence the outcome. Let's break them down in simple
terms:

1. Result (Outcome) Variables

These variables show how successful a system is. They are the final results of a process.

Example:

• A company wants to improve customer satisfaction. The result variable is the customer satisfaction score.
• In a business, profit is a result variable: it shows how well the company is doing.

These are dependent variables because they depend on other factors like decisions made and external
conditions.

2. Decision Variables

These are the choices we can control. Decision-makers use these variables to take action.

Example:

• A business decides how much money to invest in stocks or bonds. The investment amount is a decision variable.
• A company schedules work shifts for employees. People, work hours, and schedules are decision variables.

Since decision-makers control these, they directly affect the results.

3. Uncontrollable Variables (or Parameters)

These are factors that affect the results but cannot be controlled by decision-makers.

Example:

• Interest rates set by the government
• Tax laws that businesses must follow
• Weather conditions affecting crop production

These variables set limits and constraints on what decisions can be made.

4. Intermediate Result Variables

These are middle steps that affect the final result. They help understand how one factor leads to another.

Example:

• A factory wants to maximize profit. Spoilage (wasted materials) is an intermediate result that affects total profit.
• A company pays high employee salaries → Employees feel satisfied → Productivity increases. Here, employee satisfaction
is an intermediate result before the final result (higher productivity).

Final Thoughts

• Result variables show success.
• Decision variables are choices we can control.
• Uncontrollable variables are external factors we can't change.
• Intermediate result variables are steps that lead to the final outcome.

These concepts help businesses and organizations make better decisions!
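
The four variable types can be seen together in a tiny quantitative model. The sketch below is an illustrative assumption, not a model taken from these notes: units_produced and price are decision variables, demand and unit_cost are uncontrollable variables, spoilage is an intermediate result, and profit is the result variable.

```python
def profit_model(units_produced, price, demand, unit_cost):
    """Toy model: profit depends on decisions and on factors outside our control."""
    units_sold = min(units_produced, demand)   # cannot sell more than is demanded
    spoilage = units_produced - units_sold     # intermediate result: unsold units
    revenue = units_sold * price
    total_cost = units_produced * unit_cost    # spoiled units still cost money
    return {"spoilage": spoilage, "profit": revenue - total_cost}

# Hypothetical numbers: the firm chooses to make 120 units priced at 10.0,
# while the market only demands 100 and each unit costs 6.0 to make.
result = profit_model(units_produced=120, price=10.0, demand=100, unit_cost=6.0)
```

Changing only the decision variables (for example, producing 100 units instead of 120) and rerunning the model shows how a choice flows through the intermediate result to the final outcome.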

Data mining application


Easy Explanation of Data Mining Applications

Data mining is used in many industries to solve problems and find new opportunities. Here are some examples
of how different sectors use data mining:

Customer Relationship Management (CRM)

1. Helps businesses understand their customers' needs and preferences.
2. Identifies customers who are most likely to buy new products.
3. Finds reasons why customers leave and how to keep them.
4. Suggests the best products for each customer to increase sales.

Banking

1. Helps approve or reject loan applications by predicting who might not repay.
2. Detects fraud in credit card and online banking transactions.
3. Identifies products that customers are most likely to buy.
4. Predicts how much cash is needed at ATMs and bank branches.

Retail & Logistics


1. Predicts sales at stores to manage inventory better.
2. Finds relationships between products to improve store layout and promotions.
3. Forecasts seasonal demand to manage logistics.
4. Tracks the movement of perishable goods to reduce waste.

Manufacturing & Production

1. Predicts when machines might break down to prevent failures.
2. Finds problems in production to improve efficiency.
3. Improves product quality by identifying patterns in production data.

Stock Market & Trading

1. Predicts stock price movements.
2. Analyzes the impact of news and events on the market.
3. Detects fraudulent activities in stock trading.

Insurance

1. Forecasts insurance claim amounts.
2. Helps set better pricing for policies.
3. Identifies customers likely to buy special insurance plans.
4. Detects fraudulent claims.

Technology & Cybersecurity

1. Predicts hard drive failures before they happen.
2. Filters out spam emails and harmful websites.
3. Detects and prevents hacking attempts.
4. Identifies security risks in software.

Government & Defense

1. Predicts the cost of moving military equipment.
2. Analyzes enemy movements to develop better strategies.
3. Forecasts resource needs for budgeting.
4. Identifies lessons from past military operations.

Travel & Hospitality

1. Predicts demand for flights, hotel rooms, and rental cars.
2. Helps set optimal pricing to maximize revenue.
3. Identifies high-value customers for personalized services.
4. Reduces employee turnover by finding reasons for staff leaving.

Healthcare & Medicine

• Finds reasons why some people don’t have health insurance.
• Identifies the best treatment options for patients.
• Predicts demand for healthcare services.
• Helps match organ donors with patients in need.
• Analyzes links between symptoms, illnesses, and treatments.

Entertainment Industry

• Decides which TV shows to air during peak hours.
• Predicts the success of movies before production.
• Helps plan events based on audience demand.
• Sets ticket prices to maximize revenue.

Law Enforcement & Security

• Identifies patterns in criminal behavior.
• Helps solve crimes faster by analyzing past data.
• Detects potential terrorist threats.
• Protects critical digital infrastructures from cyberattacks.

Sports ⚾

• Helps sports teams improve performance by analyzing player stats.
• Predicts game outcomes using past match data.
• Helps managers build winning teams with limited budgets.

Data mining helps businesses and organizations make better decisions by analyzing large amounts of data to
find useful patterns. ✅

Decision making under certainty, uncertainty, and risk

Decision-Making Under Certainty

Here, the decision-maker knows exactly what will happen for each choice. This means there is complete
information about the outcomes.

Example: Investing in U.S. Treasury bills is a decision for which complete information about the future return on
investment is available.

This type of decision-making is common in structured problems and short-term decisions (up to 1 year).

Decision-Making Under Uncertainty

Here, there are multiple possible outcomes, but the decision-maker does not know their probability.

Example:

• A company launching a new product doesn’t know how customers will respond.

Managers try to reduce uncertainty by gathering more information. If that's not possible, they have to make a
decision with limited knowledge.
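
When probabilities are unknown, one common textbook approach (not described in these notes, so treat it as an assumption) is to compare choices by their worst or best possible outcome. The sketch below uses an invented payoff table. The pessimistic rule (maximin) picks the choice whose worst outcome is the least bad, and the optimistic rule (maximax) picks the choice with the best possible outcome.

```python
# Hypothetical payoffs (in thousands) for each choice under each market response.
payoffs = {
    "launch_big":   {"strong": 300, "weak": -150},
    "launch_small": {"strong": 120, "weak": -20},
    "no_launch":    {"strong": 0,   "weak": 0},
}

def maximin(table):
    """Pessimistic rule: choose the option with the best worst-case payoff."""
    return max(table, key=lambda option: min(table[option].values()))

def maximax(table):
    """Optimistic rule: choose the option with the best best-case payoff."""
    return max(table, key=lambda option: max(table[option].values()))

cautious_choice = maximin(payoffs)
optimistic_choice = maximax(payoffs)
```

The two rules can disagree, which reflects the point above: without probabilities, the decision depends heavily on the decision-maker's attitude.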

Decision-Making Under Risk (Risk Analysis)

Here, the decision-maker knows the possible outcomes and their probabilities. This helps in calculating
expected risk before making a decision.

Example:

• A company can estimate the probability of profit or loss when launching a new product based on past market trends.

Most big business decisions are made under assumed risk, where risks are analyzed and calculated before
deciding.
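
A simple way to use known probabilities is to compute the expected value of each choice: multiply each payoff by its probability and add the results. The probabilities and payoffs below are invented for illustration, and the function name expected_value is our own.

```python
def expected_value(outcomes):
    """Sum of probability * payoff over all (probability, payoff) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must add up to 1")
    return sum(p * payoff for p, payoff in outcomes)

# Hypothetical product launch: 60% chance of a 200,000 profit,
# 40% chance of a 100,000 loss. Not launching yields 0 for certain.
launch = [(0.6, 200_000), (0.4, -100_000)]
no_launch = [(1.0, 0)]

ev_launch = expected_value(launch)
ev_no_launch = expected_value(no_launch)
better_choice = "launch" if ev_launch > ev_no_launch else "no_launch"
```

Expected value summarizes the average result over many similar decisions. A cautious decision-maker may still reject a choice with a high expected value if its possible loss is too large to absorb.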
