Data mining

Data mining, also known as KDD, is the process of discovering knowledge from large datasets; it aims to extract hidden patterns and meaningful information. It draws on technologies such as statistics, artificial intelligence, and machine learning, and is used in applications like market analysis, fraud detection, and weather forecasting. While data mining offers advantages like efficient data analysis and informed decision-making, it also faces challenges including complex tools, privacy concerns, and the need for large databases.

Uploaded by

Daniyal Sajid
Copyright
© All Rights Reserved

What is data mining?

● Process of discovering (mining) knowledge from large amounts of data
● Another term for data mining is KDD (Knowledge Discovery from Data)
● Attempts to extract hidden patterns from large databases
● Supports automatic exploration of data

Need for data mining:

● Databases are very large, so analyzing the data manually is impractical;
data mining provides the automatic analysis that is needed.
● Finding hidden information in the database.
● Also called exploratory data analysis, data-driven discovery, and inductive
learning.
● Extracting meaningful information.

Evolution of data mining:

● Introduced in the 1990s

Mining Technologies:

1) Statistics:
Regression analysis, cluster analysis, and measures such as standard
deviation lay the foundation of data mining.

2) Artificial Intelligence:
Applies human-like thought processes to the processing of data.

3) Machine Learning:
The union of statistics and AI.
Concerned with software that learns from data.
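
To make the statistics item above more concrete, here is a minimal sketch of two of the measures it names: standard deviation and a simple regression line fitted by ordinary least squares. The data (`spend`, `sales`) and the helper `fit_line` are hypothetical and invented purely for illustration; they do not come from the notes.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: advertising spend versus units sold.
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

slope, intercept = fit_line(spend, sales)
spread = pstdev(sales)  # population standard deviation of the sales figures

print(f"sales = {slope:.2f} * spend + {intercept:.2f}")
print(f"standard deviation of sales = {spread:.3f}")
```

The fitted line can then be used to estimate sales for a spend value that was not observed, which is the sense in which regression supports prediction.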

Data mining process:

The data mining process can be broken down into four primary stages.

Applications of data mining:

1. Shopping Market Analysis: Market basket analysis is a modeling approach
based on the idea that a customer who purchases one set of products is more
likely to purchase another set of items. This strategy may help a retailer
understand a buyer's purchasing habits. Using differential analysis, data
from different businesses and from consumers in different demographic groups
may be compared.
2. Weather Forecasting Analysis: Weather forecasting systems rely on massive
amounts of historical data for prediction. Because so much data is being
processed, an appropriate data mining approach must be used.
3. Stock Market Analysis: The stock market generates a massive amount of data
to be analyzed. Data mining techniques are used to model this data so that
it can be analyzed.
4. Intrusion Detection: Data mining can help improve intrusion detection by
focusing on anomaly detection. It assists an analyst in distinguishing
unusual network activity from normal network activity.
5. Fraud Detection: Traditional techniques of fraud detection are time-consuming
and difficult because of the amount of data. Data mining aids in the discovery
of relevant patterns and the transformation of data into information.
6. Surveillance: Video surveillance is used practically everywhere in everyday
life for security purposes. Because it produces a huge volume of captured
data, data mining is employed to analyze it.
7. Financial Banking: In computerized banking, each new transaction adds to a
massive amount of data. By identifying patterns, causalities, and correlations
in business data, data mining may help solve business challenges in banking
and finance.
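
The market basket idea from item 1 can be sketched in a few lines. This is a minimal illustration, not a full association rule miner: it only considers pairs of items, and the baskets, the `pair_rules` helper, and the `min_support` threshold are all assumptions made up for the example. Support is the fraction of baskets containing both items; confidence is the fraction of baskets containing the first item that also contain the second.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    ["phone", "case"],
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["case"],
]

for a, b, support, confidence in pair_rules(baskets):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

A retailer would read a high-confidence rule as a hint about which products to place or promote together; real systems typically use algorithms such as Apriori to handle larger item sets efficiently.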

Advantages of Data Mining:

● It helps companies gather reliable information
● It's an efficient, cost-effective solution compared to other data applications
● It helps businesses make profitable production and operational adjustments
● It works with both new and legacy systems
● It helps businesses make informed decisions
● It helps detect credit risks and fraud
● It helps data scientists analyze enormous amounts of data quickly
● Data scientists can use the information to detect fraud, build risk models, and
improve product safety
● It helps data scientists quickly initiate automated predictions of behaviors and
trends and discover hidden patterns.

Disadvantages of Data Mining:

Data mining is not perfect. These are its major issues:

● Many data analytics tools are complex and challenging to use. Data scientists
need the right training to use the tools effectively.
● Speaking of the tools, different ones work with varying types of data mining,
depending on the algorithms they employ. Thus, data analysts must be sure to
choose the correct tools.
● Data mining techniques are not infallible, so there’s always the risk that the
information isn’t entirely accurate. This obstacle is especially relevant if there’s
a lack of diversity in the dataset.
● Companies can potentially sell the customer data they have gleaned to other
businesses and organizations, raising privacy concerns.
● Data mining requires large databases, making the process hard to manage.

Challenges of Implementation in Data Mining:

● Distributed Data

Real-world data is stored on several platforms, such as databases, individual systems,
or the Internet, and is difficult to move to a centralized repository.

● Complex Data

Processing large amounts of complicated data takes a great deal of time and money.
Real-world data comes in structured, unstructured, semi-structured, and heterogeneous
forms, including multimedia such as photos, music, video, and natural language text.

● Domain Knowledge

Extracting information is easier with domain expertise; without it, collecting
useful information from data can be difficult.

● Data Visualization

Data visualization is the first point at which results are presented to the client, so
it must present them accurately. The information should be conveyed in a way that is
meaningful for its intended use.

● Incomplete Data

Large amounts of data may be imprecise or unreliable owing to problems with measurement
equipment. Customers who refuse to disclose their personal information leave the data
incomplete, and system failures may alter stored values, producing noisy data. All of
this makes the data mining procedure difficult.

● Security and Privacy

Decision-making requires secure data sharing among individuals, organizations, and
government. Private and sensitive information about individuals is gathered for
customer profiles in order to better understand user activity trends. Illegal access
and the confidentiality of the information are significant issues here.

● Higher Costs

The expense of purchasing and maintaining powerful servers, software, and hardware
for handling massive amounts of data can be prohibitive.

● Performance Issues

The performance of a data mining system depends on the methods and techniques it
uses. Large database volumes, the flow of data, and the difficulty of mining tasks
motivate the development of parallel and distributed data mining methods.

Techniques of Data Mining:

● Association Rule Learning
This technique, often applied as market basket analysis, searches for
relationships among dataset variables. For example, association rule learning
can determine which products are frequently purchased together (e.g., a
smartphone and a protective case).
● Clustering
This process partitions datasets into a set of meaningful sub-classes, known
as clusters. The process helps users understand the natural structure or
grouping within the data.
● Classification
This technique assigns particular items in a dataset to different target
categories or classes. The goal is to develop accurate predictions within the
target class for each case in the data.
● Data Analytics
The data analytics process enables professionals to evaluate digital
information and turn it into useful business intelligence.
● Data Cleansing and Preparation
This technique transforms the data into a form optimal for further analysis
and processing. Preparation includes activities such as identifying and
removing errors and missing or duplicate data.
● Data Warehousing
Data warehousing consists of an extensive collection of business data that
businesses use to help them make decisions. Warehousing is a fundamental
and necessary component of most large-scale data mining efforts.
● Machine Learning
Related to the AI technology mentioned earlier, machine learning is a
technique that uses statistical methods to give computers the ability to
learn from data without being explicitly programmed for each task.
● Regression
The regression technique predicts numeric values, such as sales, stock
prices, or even temperature. The predictions are based on the information
found in a particular data set.
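
As one concrete instance of the techniques listed above, the following is a minimal sketch of clustering using the k-means procedure (Lloyd's algorithm) on one-dimensional data. The notes do not name a specific clustering algorithm, so k-means is an assumption chosen for illustration, as are the `purchases` values, the starting centers, and the helper names `assign` and `kmeans_1d`.

```python
def assign(values, centers):
    """Put each value into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        clusters[nearest].append(v)
    return clusters


def kmeans_1d(values, centers, max_iter=100):
    """Alternate assignment and center updates until the centers stop moving."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = assign(values, centers)
        # New center = mean of its cluster; keep the old center if the cluster is empty.
        updated = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    return centers, assign(values, centers)


# Hypothetical purchase amounts forming two visible groups.
purchases = [10, 12, 11, 50, 52, 51]
centers, clusters = kmeans_1d(purchases, centers=[10, 50])

print("centers:", centers)
print("clusters:", clusters)
```

The result depends on the starting centers, which is why practical implementations usually run the procedure several times from different starting points and keep the best grouping.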
