Data Science Module 1 Notes

The document provides an overview of data mining, including its definitions, techniques, and applications. It distinguishes between data and information, outlines the data mining process, and discusses the architecture and goals of data mining. Additionally, it compares data mining with related concepts like KDD, DBMS, and OLAP, highlighting their differences and roles in data analysis.

SHREE MEDHA DEGREE COLLEGE, BALLARI

FUNDAMENTALS OF DATA SCIENCE


MODULE 1- DATA MINING
Topics:

• Introduction to data mining
• Data Mining Definitions
• Knowledge Discovery in Databases (KDD) vs. Data Mining
• DBMS vs. Data Mining
• DM techniques
• Problems, Issues and Challenges in DM
• DM applications
What is Data?
Data is distinct pieces of information, usually formatted in a special way. Data can be measured, collected, reported, and analyzed, whereupon it is often visualized using graphs, images, or other analysis tools. Raw data (“unprocessed data”) may be a collection of numbers or characters before it has been “cleaned” and corrected by researchers.
What is Information?
Information is data that has been processed, organized, or structured in a way that makes it meaningful, valuable, and useful.
Categories of Data
Data can be categorized into two main parts:
Structured Data: This type of data is organized into a specific format, making it easy to search, analyze, and process. Structured data is found in relational databases and includes information such as numbers, dates, and categories.
Unstructured Data: Unstructured data does not conform to a specific structure or format. It may include text documents, images, videos, and other data that is not easily organized or analyzed without additional processing.

What is Data Mining ?


Definition: Data mining is the process of analyzing large datasets to discover patterns,
relationships, correlations, or meaningful insights that can help in making informed decisions and
predictions.
Purpose: The primary purpose of data mining is to extract valuable knowledge and information
from large volumes of data that might be hidden or not readily apparent. It involves using
advanced statistical and machine learning techniques to identify patterns and trends.
Functions: Data mining algorithms and techniques are applied to the data to identify
associations, clusters, classifications, and anomalies. It helps in understanding customer behavior,
predicting trends, detecting fraud, and making data-driven business decisions.
Usage: Data mining is widely used in areas such as marketing analysis, customer segmentation,
recommendation systems, fraud detection, healthcare research, and financial forecasting.


Goals of Data Mining:


• The goal of data mining is to extract useful information from large datasets and use it to make
predictions or inform decision-making.
• Data mining is important because it allows organizations to uncover insights and trends in their
data that would be difficult or impossible to discover manually.
• This can help organizations make better decisions, improve their operations, and gain a
competitive advantage.
Data Mining History and Origins
One of the earliest and most influential pioneers of data mining was Dr. Herbert Simon, a Nobel laureate in economics who is widely considered one of the fathers of artificial intelligence. In the
1950s and 1960s, Simon and his colleagues developed a number of algorithms and techniques
for extracting useful information and insights from data, including clustering, classification, and
decision trees.
In the 1980s and 1990s, the field of data mining continued to evolve, and new algorithms and
techniques were developed to address the challenges of working with large and complex data
sets. The development of data mining software and platforms, such as SAS, SPSS, and RapidMiner,
made it easier for organizations to apply data mining techniques to their data.
In recent years, the availability of large data sets and the growth of cloud computing and big data
technologies have made data mining even more powerful and widely used. Today, data mining is
a crucial tool for many organizations and industries and is used to extract valuable insights and
information from data sets in a wide range of domains.
Tasks of Data Mining
1. Classification: Categorizing data into predefined classes.
2. Clustering: Grouping similar data points together.
3. Regression: Predicting numerical values based on data relationships.
4. Association Rule Mining: Discovering interesting relationships between variables.
5. Anomaly Detection: Identifying unusual patterns in data.
6. Text Mining: Extracting insights from unstructured text data.
7. Prediction and Forecasting: Predicting future trends based on historical data.
8. Pattern Mining: Identifying recurring patterns in sequential data.
9. Feature Selection and Dimensionality Reduction: Identifying relevant features and
reducing dataset complexity.


Architecture of Data Mining

Data mining architecture typically consists of several components:


1. Data Sources: These are the repositories of data where the raw information
resides. Sources can include databases, data warehouses, websites, and more.
2. Data Cleaning and Integration: This stage involves preprocessing the data to
ensure its quality and compatibility for mining. It includes tasks like removing
noise, handling missing values, and integrating data from different sources.
3. Data Selection and Transformation: Here, relevant data subsets are selected for
analysis based on the mining goals. The selected data may also undergo
transformation to better suit the mining algorithms.
4. Data Mining Engine: This is the core component where various data mining
algorithms are applied to the prepared data to discover patterns, trends, and
insights.
5. Pattern Evaluation: Once patterns are discovered, they need to be evaluated for
their relevance, validity, and usefulness. This step often involves statistical
techniques and domain expertise.
6. Knowledge Presentation: Finally, the discovered knowledge is presented to users
in a comprehensible format, such as reports, visualizations, or dashboards, to aid
in decision making.
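As an illustrative sketch of how these components fit together, the following Python example (assuming pandas and scikit-learn are available; the function names and the toy customer table are purely hypothetical) wires the stages into one small pipeline.

import pandas as pd
from sklearn.cluster import KMeans

# Data source: a toy in-memory table standing in for a database or data warehouse.
raw = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 4],
    "age": [25, None, 47, 33, 33],
    "annual_spend": [1200, 800, 5600, 2100, 2100],
})

def clean_and_integrate(df):
    # Data cleaning and integration: drop duplicate rows, fill missing values.
    df = df.drop_duplicates()
    return df.fillna(df.mean(numeric_only=True))

def select_and_transform(df):
    # Data selection and transformation: keep relevant columns, rescale spend.
    subset = df[["age", "annual_spend"]].copy()
    subset["annual_spend"] = subset["annual_spend"] / 1000.0  # spend in thousands
    return subset

def mine(df):
    # Data mining engine: group customers into two clusters.
    return KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(df)

def evaluate_and_present(df, labels):
    # Pattern evaluation and knowledge presentation: summarize each cluster.
    print(df.assign(cluster=labels).groupby("cluster").mean())

prepared = select_and_transform(clean_and_integrate(raw))
evaluate_and_present(prepared, mine(prepared))

Each function stands for one architectural component, so any stage (for example, reading from a real warehouse instead of an in-memory table) could be swapped out without changing the others.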


Data Mining Process

The data mining process typically involves several key stages:


1. Understanding the Business Problem: The first step is to clearly understand the
business problem or objective that data mining aims to address. This involves
collaborating closely with domain experts to identify key questions and goals.
2. Data Collection: In this stage, relevant data is gathered from various sources such as
databases, data warehouses, spreadsheets, or even web scraping. The data collected
should be comprehensive and representative of the problem domain.
3. Data Preprocessing: Raw data often requires preprocessing to ensure its quality and
suitability for analysis. This includes tasks such as cleaning data to remove errors and
inconsistencies, handling missing values, and transforming data into a suitable format
for analysis.
4. Exploratory Data Analysis (EDA): EDA involves examining the collected data to
understand its characteristics, identify patterns, and detect outliers or anomalies.
Techniques such as descriptive statistics, data visualization, and clustering may be
used during this stage.
5. Feature Selection and Engineering: Feature selection involves identifying the most
relevant variables (features) that will be used for analysis, while feature engineering
may involve creating new features or transforming existing ones to enhance the
predictive power of the model.
6. Model Selection and Training: Based on the nature of the problem and the available data, suitable data mining algorithms or models are selected. These may include techniques such as decision trees, neural networks, support vector machines, or clustering algorithms. The selected models are then trained on the prepared data.
7. Model Evaluation: Trained models need to be evaluated to assess their performance and generalization ability. This involves using evaluation metrics such as accuracy, precision, recall, or F1-score, and techniques such as cross-validation to ensure robustness.
8. Model Deployment: Once a satisfactory model is obtained, it is deployed into
production to make predictions or generate insights on new, unseen data. This may
involve integrating the model into existing systems or workflows.
9. Monitoring and Maintenance: Deployed models should be regularly monitored to
ensure they continue to perform effectively over time. This may involve monitoring
for concept drift (changes in the underlying data distribution) and updating the model
or its parameters as necessary.
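The short scikit-learn sketch below compresses steps 2 to 7 into a few lines using the library's bundled Iris data set; the decision tree model and the chosen metrics are only examples, not a prescribed choice.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

# Data collection: load a small, already clean sample data set.
X, y = load_iris(return_X_y=True)

# Preprocessing and feature selection are trivial here because the data is numeric and complete.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Model selection and training: a decision tree classifier.
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)

# Model evaluation: accuracy on held-out data plus 5-fold cross-validation.
y_pred = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
print("Cross-validation scores:", cross_val_score(model, X, y, cv=5))

# Deployment would put model.predict behind an application or service, and
# monitoring would track its accuracy on new data over time (concept drift).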

Classification of Data Mining Systems


Classification Based on the Databases Mined
A data mining system can be classified based on the types of databases that are mined. A database system can be further segmented based on distinct principles, such as data models, types of data, etc., which further assist in classifying a data mining system.
For example, if we classify a database based on the data model, we may have relational, transactional, object-relational, or data warehouse mining systems.


Classification Based on the type of Knowledge Mined


A data mining system categorized based on the kind of knowledge mined may have the following functionalities:
1. Characterization
2. Discrimination
3. Association and Correlation Analysis
4. Classification
5. Prediction
6. Outlier Analysis
7. Evolution Analysis

Classification Based on the Techniques Utilized


A data mining system can also be classified based on the type of techniques that are incorporated. These techniques can be assessed based on the degree of user interaction involved or the methods of analysis employed.
Classification Based on the Applications Adapted
Data mining systems classified based on the applications adapted are as follows:
• Finance
• Telecommunications
• DNA
• Stock Markets
• E-mail

What is KDD (Knowledge Discovery in Databases)?

KDD is a computer science field specializing in extracting previously unknown and interesting
information from raw data. KDD is the whole process of trying to make sense of data by
developing appropriate methods or techniques. The following steps are included in the KDD process:
Data Cleaning
Data cleaning is defined as the removal of noisy and irrelevant data from the collection.
• Cleaning in case of missing values.
• Cleaning noisy data, where noise is a random or variance error.
• Cleaning with data discrepancy detection and data transformation tools.
Data Integration
Data integration is defined as combining heterogeneous data from multiple sources into a common source (data warehouse). Data integration uses data migration tools, data synchronization tools, and the ETL (Extract-Transform-Load) process.
Data Selection
Data selection is defined as the process where data relevant to the analysis is decided and retrieved from the data collection. For this we can use neural networks, decision trees, naive Bayes, clustering, and regression methods.


Data Transformation
Data transformation is defined as the process of transforming data into the appropriate form required by the mining procedure. Data transformation is a two-step process:
• Data mapping: assigning elements from the source base to the destination to capture transformations.
• Code generation: creation of the actual transformation program.
Data Mining
Data mining is defined as the application of techniques to extract potentially useful patterns. It transforms task-relevant data into patterns and decides the purpose of the model, using classification or characterization.
Pattern Evaluation
Pattern evaluation is defined as identifying strictly increasing patterns representing knowledge based on given measures. It finds an interestingness score for each pattern and uses summarization and visualization to make the data understandable to the user.
Knowledge Representation
This involves presenting the results in a way that is meaningful and can be used to make decisions.
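A small pandas illustration of the cleaning, integration, selection, and transformation steps is given below; the two source tables and all column names are invented purely for demonstration.

import pandas as pd

# Two heterogeneous sources that data integration will combine.
sales = pd.DataFrame({"cust_id": [1, 2, 2, 3], "amount": [100.0, None, 250.0, 80.0]})
customers = pd.DataFrame({"cust_id": [1, 2, 3], "region": ["north", "south", "south"]})

# Data cleaning: drop duplicate rows and fill missing amounts with the column mean.
sales = sales.drop_duplicates()
sales["amount"] = sales["amount"].fillna(sales["amount"].mean())

# Data integration: merge both sources into one table (a tiny "warehouse").
warehouse = sales.merge(customers, on="cust_id")

# Data selection: keep only the columns relevant to the analysis.
relevant = warehouse[["region", "amount"]]

# Data transformation: map regions to codes (data mapping) and rescale amounts.
transformed = relevant.assign(
    region_code=relevant["region"].map({"north": 0, "south": 1}),
    amount=relevant["amount"] / 100.0,
)

# The mining, pattern evaluation, and knowledge representation steps would then
# work on this prepared table; here a per-region summary stands in for them.
print(transformed.groupby("region_code")["amount"].mean())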


Difference between KDD and Data Mining


Parameter: Definition
KDD: KDD refers to a process of identifying valid, novel, potentially useful, and ultimately understandable patterns and relationships in data.
Data Mining: Data mining refers to a process of extracting useful and valuable information or patterns from large data sets.

Parameter: Objective
KDD: To find useful knowledge from data.
Data Mining: To extract useful information from data.

Parameter: Techniques Used
KDD: Data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge representation and visualization.
Data Mining: Association rules, classification, clustering, regression, decision trees, neural networks, and dimensionality reduction.

Parameter: Output
KDD: Structured information, such as rules and models, that can be used to make decisions or predictions.
Data Mining: Patterns, associations, or insights that can be used to improve decision-making or understanding.

Parameter: Focus
KDD: The focus is on the discovery of useful knowledge, rather than simply finding patterns in data.
Data Mining: The focus is on the discovery of patterns or relationships in data.

Parameter: Role of Domain Expertise
KDD: Domain expertise is important in KDD, as it helps in defining the goals of the process, choosing appropriate data, and interpreting the results.
Data Mining: Domain expertise plays a smaller role in data mining, which relies more heavily on algorithms to discover patterns automatically.

What is the difference between DBMS and Data mining?

Main Difference: DBMS is the infrastructure for storing and managing data, while data mining is
a process of analyzing and extracting knowledge from the data stored in the DBMS.
Scope: DBMS focuses on efficiently managing and storing data, ensuring data integrity and
security. Data mining, on the other hand, focuses on analyzing data to discover meaningful
patterns and insights.
Purpose: DBMS is used for data storage, retrieval, and management. Data mining is used for
knowledge discovery and gaining insights from the data.
Functionality: DBMS provides functionalities for data storage, retrieval, and manipulation. Data
mining employs algorithms and statistical techniques to identify patterns and relationships within
the data.
Role: DBMS serves as the foundation for data storage and retrieval, enabling efficient data
handling. Data mining is a process that builds on top of the data stored in the DBMS to extract
valuable information.


What is OLAP?
OLAP stands for Online Analytical Processing. It is a computing method that allows users to
extract useful information and query data in order to analyze it from different angles. For
example, OLAP business intelligence queries usually aid in financial reporting, budgeting, sales forecasting, trend analysis, and other purposes. It enables the user to analyze database information from different database systems simultaneously. OLAP data is stored in multidimensional databases.
OLAP and data mining look similar since they both operate on data to gain knowledge, but the major difference is how they operate on the data. OLAP tools provide multidimensional data analysis and a summary of the data.
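As a rough feel for the multidimensional summaries OLAP produces, the pandas pivot table below (with invented sales figures, standing in for a real OLAP server) aggregates one measure across two dimensions.

import pandas as pd

# Fact table: one row per sale, with two dimensions (region, quarter) and one measure.
sales = pd.DataFrame({
    "region":  ["north", "north", "south", "south", "south", "north"],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q2", "Q1"],
    "revenue": [120, 150, 90, 200, 110, 80],
})

# OLAP-style "slice and dice": summarize revenue by region and quarter.
cube = sales.pivot_table(index="region", columns="quarter", values="revenue", aggfunc="sum")
print(cube)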

Key features of OLAP

• It supports complex calculations
• Time intelligence
• It has a multidimensional view of data
• Business-focused calculations
• Flexible and self-service reporting

Applications of OLAP
• Database marketing
• Marketing and sales analysis

Data Mining Vs. OLAP

• Data mining refers to the field of computer science which deals with the extraction of data, trends, and patterns from huge data sets, while OLAP is a technology of immediate access to data with the help of multidimensional structures.
• Data mining works on detailed, transaction-level data, whereas OLAP deals with data summaries.
• Data mining is discovery-driven; OLAP is query-driven.
• Data mining is used for future data prediction; OLAP is used for analyzing past data.
• Data mining handles huge numbers of dimensions; OLAP has a limited number of dimensions.
• Data mining follows a bottom-up approach; OLAP follows a top-down approach.
• Data mining is an emerging field; OLAP is widely used.

Data Mining as a Whole Process


The whole process of Data Mining consists of three main phases:
Data Pre-processing – data cleaning, integration, selection, and transformation take place.
Data Extraction – the actual data mining takes place.
Data Evaluation and Presentation – the results are analyzed and presented.


What are the Data Mining Techniques?

Data mining techniques are algorithms and methods used to extract information and insights
from data sets.

1. Regression
Regression is a data mining technique that is used to model the relationship between a
dependent variable and one or more independent variables. In regression analysis, the goal is to
fit a mathematical model to the data that can be used to make predictions or forecasts about the
dependent variable based on the values of the independent variables.
There are many different types of regression models, including linear regression, logistic
regression, and non-linear regression. In general, regression models are used to answer questions
such as:
• What is the relationship between the dependent and independent variables?
• How well does the model fit the data?
• How accurate are the predictions or forecasts made by the model?
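A minimal example of fitting a linear regression with scikit-learn follows; the advertising-spend and sales numbers are made up for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression

# Independent variable: advertising spend; dependent variable: sales (toy values).
X = np.array([[10], [20], [30], [40], [50]])
y = np.array([25, 45, 62, 85, 105])

model = LinearRegression().fit(X, y)

# The coefficients describe the relationship, R^2 shows how well the model fits,
# and predict() gives forecasts for new values of the independent variable.
print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("R^2:", model.score(X, y))
print("forecast for spend = 60:", model.predict([[60]])[0])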


2. Classification
Classification is a data mining technique that is used to predict the class or category of an item or instance based on its characteristics or attributes. There are many different types of classification models, including decision trees, k-nearest neighbours, and support vector machines. In general, classification models are used to answer questions such as:
• What is the relationship between the classes and the attributes?
• How well does the model fit the data?
• How accurate are the predictions made by the model?
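For instance, a k-nearest neighbours classifier can be trained in a few lines with scikit-learn (the Iris data set and the choice of k = 5 are only illustrative).

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Predict the class (species) of each flower from its measurements.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))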
3. Clustering
Clustering is a data mining technique that is used to group items or instances in a data set into
clusters or groups based on their similarity or proximity. In clustering analysis, the goal is to
identify and explore the natural structure or organization of the data, and to uncover hidden
patterns and relationships.
There are many different types of clustering algorithms, including k-means clustering, hierarchical
clustering, and density-based clustering. In general, clustering is used to answer questions such
as:
• What is the natural structure or organization of the data?
• What are the main clusters or groups in the data?
• How similar or dissimilar are the items in the data set?
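A short k-means sketch with scikit-learn is shown below; the synthetic blob data is generated only so that three natural groups are easy to find.

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Synthetic 2-D points with three natural groups.
X, _ = make_blobs(n_samples=150, centers=3, random_state=42)

km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print("cluster sizes:", [int((km.labels_ == c).sum()) for c in range(3)])
print("cluster centres:\n", km.cluster_centers_)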

4. Association rule mining


Association rule mining is a data mining technique that is used to identify and explore
relationships between items or attributes in a data set. In association rule mining, the goal is to
identify patterns and rules that describe the co-occurrence or occurrence of items or attributes
in the data set and to evaluate the strength and significance of these patterns and rules.


There are many different algorithms and methods for association rule mining, including the
Apriori algorithm and the FP-growth algorithm. In general, association rule mining is used to
answer questions such as
• What are the main patterns and rules in the data?
• How strong and significant are these patterns and rules?
• What are the implications of these patterns and rules for the data set and the domain?
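The tiny hand-rolled sketch below computes support and confidence for item pairs over four invented shopping baskets; a real analysis would use an Apriori or FP-growth implementation, but the arithmetic is the same.

from itertools import combinations
from collections import Counter

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

item_counts = Counter(item for t in transactions for item in t)
pair_counts = Counter(frozenset(p) for t in transactions for p in combinations(sorted(t), 2))

n = len(transactions)
for pair, count in pair_counts.items():
    a, b = sorted(pair)
    support = count / n                      # fraction of baskets containing both items
    confidence = count / item_counts[a]      # confidence of the rule a -> b
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")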

5. Dimensionality Reduction

Dimensionality reduction is a data mining technique that is used to reduce the number of
dimensions or features in a data set while retaining as much information and structure as
possible. There are many different methods for dimensionality reduction, including principal
component analysis (PCA), independent component analysis (ICA), and singular value
decomposition (SVD). In general, dimensionality reduction is used to answer questions such as:
• What are the main dimensions or features in the data set?
• How much information and structure can be retained in a lower-dimensional space?
• How can the data be visualized and analyzed in a lower-dimensional space?
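For example, principal component analysis with scikit-learn reduces the four Iris features to two components while reporting how much variance is retained.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Reduce the four original features to two principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print("reduced shape:", X_2d.shape)
print("variance retained:", pca.explained_variance_ratio_.sum())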


6. Anomaly Detection: Anomaly detection identifies outliers or anomalies in data that deviate from normal patterns. It is used for detecting fraud, network intrusions, and equipment failures. Techniques include statistical methods, clustering-based approaches, and machine learning algorithms such as isolation forests and one-class SVM (Support Vector Machine).
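A brief isolation forest sketch with scikit-learn follows; the "normal" points and the two obvious outliers are synthetic.

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))     # typical observations
outliers = np.array([[6.0, 6.0], [-7.0, 5.0]])             # clearly unusual points
X = np.vstack([normal, outliers])

# IsolationForest labels inliers as 1 and anomalies as -1.
labels = IsolationForest(contamination=0.01, random_state=0).fit_predict(X)
print("points flagged as anomalies:", int((labels == -1).sum()))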

7. Sequential Pattern Mining: Sequential pattern mining discovers patterns that occur sequentially or temporally in data. It is used in applications such as analyzing customer behavior over time or identifying patterns in sequences of events. Examples include the PrefixSpan algorithm and the GSP (Generalized Sequential Pattern) algorithm.
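The toy sketch below counts how often one event is followed by another across a few invented click sequences; it is a drastic simplification of what PrefixSpan or GSP do, but it shows the idea of sequential support.

from collections import Counter

# Each list is one customer's ordered sequence of page visits.
sequences = [
    ["home", "search", "product", "checkout"],
    ["home", "product", "checkout"],
    ["home", "search", "product"],
]

# Count ordered pairs (a occurs before b, not necessarily adjacently), once per sequence.
pair_support = Counter()
for seq in sequences:
    seen = set()
    for i, a in enumerate(seq):
        for b in seq[i + 1:]:
            seen.add((a, b))
    pair_support.update(seen)

for (a, b), count in pair_support.most_common(3):
    print(f"{a} -> {b} appears in {count} of {len(sequences)} sequences")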

8. Text Mining: Text mining techniques extract useful information from unstructured text data. This includes tasks such as sentiment analysis, topic modeling, named entity recognition, and document classification. Techniques such as natural language processing (NLP) and machine learning algorithms are commonly used in text mining.
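As a small illustration, the sketch below turns a handful of invented reviews into TF-IDF features and trains a naive Bayes classifier for sentiment-style document classification.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny labelled corpus (made-up reviews) for document classification.
docs = ["great product, works well", "terrible, broke after a day",
        "really happy with this", "awful quality, very disappointed"]
labels = ["positive", "negative", "positive", "negative"]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)        # unstructured text -> numeric feature matrix

clf = MultinomialNB().fit(X, labels)
print(clf.predict(vec.transform(["happy with the quality"])))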

Benefits of Data Mining


• Improved decision-making: Data mining can provide valuable insights that can help
organizations make better decisions by identifying patterns and trends in large data sets.
• Increased efficiency: Data mining can automate repetitive and time-consuming tasks, such
as data cleaning and preparation, which can help organizations save time and resources.


• Enhanced competitiveness: Data mining can help organizations gain a competitive edge by
uncovering new business opportunities and identifying areas for improvement.
• Improved customer service: Data mining can help organizations better understand their
customers and tailor their products and services to meet their needs.
• Fraud detection: Data mining can be used to identify fraudulent activities by detecting
unusual patterns and anomalies in data.
• Predictive modeling: Data mining can be used to build models that can predict future
events and trends, which can be used to make proactive decisions.
• New product development: Data mining can be used to identify new product opportunities
by analyzing customer purchase patterns and preferences.
• Risk management: Data mining can be used to identify potential risks by analyzing data on
customer behavior, market conditions, and other factors.
Challenges and Issues in Data Mining
1. Data Quality
The quality of data used in data mining is one of the most significant challenges. The accuracy,
completeness, and consistency of the data affect the accuracy of the results obtained. The
data may contain errors, omissions, duplications, or inconsistencies, which may lead to
inaccurate results.
To address these challenges, data mining practitioners must apply data cleaning and data preprocessing techniques to improve the quality of the data.
2. Data Complexity
Data complexity refers to the vast amounts of data generated by various sources, such as
sensors, social media, and the internet of things (IoT). The complexity of the data may make it
challenging to process, analyze, and understand. In addition, the data may be in different
formats, making it challenging to integrate into a single dataset.
To address this challenge, data mining practitioners use advanced techniques such as
clustering, classification, and association rule mining.
3. Data Privacy and Security
Data privacy and security is another significant challenge in data mining. As more data is
collected, stored, and analyzed, the risk of data breaches and cyber-attacks increases. The data
may contain personal, sensitive, or confidential information that must be protected.
Moreover, data privacy regulations such as GDPR, CCPA, and HIPAA impose strict rules on how
data can be collected, used, and shared.
To address this challenge, data mining practitioners must apply data anonymization and data
encryption techniques to protect the privacy and security of the data. Data anonymization
involves removing personally identifiable information (PII) from the data, while data
encryption involves using algorithms to encode the data to make it unreadable to
unauthorized users.
4. Scalability
Data mining algorithms must be scalable to handle large datasets efficiently. As the size of the
dataset increases, the time and computational resources required to perform data mining
operations also increase.


To address this challenge, data mining practitioners use distributed computing frameworks
such as Hadoop and Spark.
5. Interpretability
Data mining algorithms can produce complex models that are difficult to interpret. This is
because the algorithms use a combination of statistical and mathematical techniques to
identify patterns and relationships in the data.
Data Mining Applications

Scientific Analysis: Scientific simulations generate large volumes of data every day. This includes data collected from nuclear laboratories, data about human psychology, etc. Data mining techniques are capable of analyzing this data, and today we can capture and store new data faster than we can analyze the data already accumulated. Examples of scientific analysis:
• Sequence analysis in bioinformatics
• Classification of astronomical objects
• Medical decision support

Intrusion Detection: A network intrusion refers to any unauthorized activity on a digital network. Network intrusions often involve stealing valuable network resources. Data mining techniques play a vital role in intrusion detection by searching for network attacks and anomalies. These techniques help in selecting and refining useful and relevant information from large data sets and help classify relevant data for an Intrusion Detection System, which generates alarms about foreign invasions in the network traffic. For example:
• Detect security violations
• Misuse detection
• Anomaly detection


Business Transactions: Every transaction in the business industry is recorded for perpetuity. Such transactions are usually time-related and can be inter-business deals or intra-business operations. The effective and timely use of this data for competitive decision-making is one of the most important problems to solve for businesses that struggle to survive in a highly competitive world. Data mining helps to analyze these business transactions, identify marketing approaches, and support decision-making. Examples:
• Direct mail targeting
• Stock trading
• Customer segmentation
• Churn prediction (churn prediction is one of the most popular big data use cases in business)

Market Basket Analysis: Market basket analysis is a technique that carefully studies the purchases made by a customer in a supermarket. It identifies the patterns of items that customers frequently purchase together. This analysis can help companies promote deals, offers, and sales, and data mining techniques help to achieve this analysis task. Examples:
• Data mining concepts are used in sales and marketing to provide better customer service, to improve cross-selling opportunities, and to increase direct mail response rates.
• Customer retention, in the form of pattern identification and prediction of likely defections, is possible with data mining.
• Risk assessment and fraud detection also use data mining concepts to identify inappropriate or unusual behavior.

Education: For analyzing the education sector, data mining uses Educational Data Mining (EDM) methods. These methods generate patterns that can be used both by learners and educators. Using EDM we can perform educational tasks such as:
• Predicting students' admission into higher education
• Student profiling
• Predicting student performance
• Evaluating teachers' teaching performance


• Curriculum development
• Predicting student placement opportunities

Research: Data mining techniques can perform predictions, classification, clustering, association, and grouping of data with precision in the research area, and the rules generated by data mining help to find results. In most technical research in data mining, we create a training model and a testing model. The train/test approach is a strategy to measure the accuracy of the proposed model: the data set is split into two sets, a training data set and a testing data set. The training data set is used to build the model, whereas the testing data set is used to evaluate it. Examples:
• Classification of uncertain data
• Information-based clustering
• Decision support systems
• Web mining
• Domain-driven data mining
• IoT (Internet of Things) and cybersecurity
• Smart farming with IoT

Healthcare and Insurance: The pharmaceutical sector can examine its recent sales force activity and its outcomes to improve the targeting of high-value physicians and determine which promotional activities will have the greatest effect in the coming months, whereas in the insurance sector, data mining can help predict which customers will buy new policies, identify behavior patterns of risky customers, and identify fraudulent behavior. Examples:
• Claims analysis, i.e. which medical procedures are claimed together.
• Identifying successful medical therapies for different illnesses.
• Characterizing patient behavior to predict office visits.

Transportation: A diversified transportation company with a large direct sales force can apply data mining to identify the best prospects for its services. A large consumer goods organization can apply data mining to improve its sales process to retailers.
• Determine the distribution schedules among outlets.
• Analyze loading patterns.

Financial/Banking Sector: A credit card company can leverage its vast warehouse of customer
transaction data to identify customers most likely to be interested in a new credit product.
• Credit card fraud detection.
• Identify ‘Loyal’ customers.
• Extraction of information related to customers.
• Determine credit card spending by customer groups.
