
SEMINAR

on
DATA MINING: CONCEPTS AND ITS TECHNIQUES

Submitted by: Arun Bhagat, M.Lib, Roll No. 05
Submitted to: Miss Sakshi Malhotra

Department of Library and Information Science
University of Jammu

Introduction

We are living in what is often called the information age. A huge amount of data is available in the information industry, but this data is of no use until it is converted into useful information. It is therefore necessary to analyze this huge amount of data and extract useful information from it.

Extracting information is not the only process involved. Data mining also includes other processes such as data cleaning, data integration, data transformation, pattern evaluation, and data presentation.

Once all these processes are complete, we can use this information in many applications, such as fraud detection, market analysis, and scientific exploration.

Concept

Data mining is the process of discovering patterns in large data sets involving
methods at the intersection of machine learning, statistics, and database
systems. Data mining is an interdisciplinary subfield of computer science with an
overall goal to extract information (with intelligent methods) from a data set and
transform the information into a comprehensible structure for further use. Data
mining is the analysis step of the "knowledge discovery in databases" process, or
KDD. The term "data mining" is in fact a misnomer, because the goal is the
extraction of patterns and knowledge from large amounts of data, not the extraction
(mining) of data itself.

Definitions

1.“Data mining is the process of analyzing hidden patterns of data according to different perspectives for categorization into useful information, which is collected and assembled in common areas, such as data warehouses, for efficient analysis, data mining algorithms, facilitating business decision making and other information requirements to ultimately cut costs and increase revenue”.

2.“The process of collecting, searching through, and analyzing a large amount of data in a database, as to discover patterns or relationships”

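Many people treat data mining as a synonym for Knowledge Discovery in Data (KDD). Others view it as one essential step in the KDD process, which consists of the following steps: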

1.Data cleaning (to remove noise and inconsistent data)

2.Data integration (where multiple data sources may be combined)

3.Data selection (where data relevant to the analysis task are retrieved from the database)

4.Data transformation (where data are transformed and consolidated into forms appropriate for mining by performing summary or aggregation operations)

5.Data mining (an essential process where intelligent methods are applied to extract data patterns)

6.Pattern evaluation (to identify the truly interesting patterns representing knowledge based on interestingness measures)

7.Knowledge presentation (where visualization and knowledge representation techniques are used to present mined knowledge to users)
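The seven KDD steps above can be sketched as a chain of small functions. This is a minimal illustration under stated assumptions, not a real KDD system. The two store sources, the field names `item`, `qty` and `store`, and the thresholds are all hypothetical. The "mining" step is a plain frequency count that stands in for a real mining algorithm.

```python
from collections import Counter


def clean(records):
    """Data cleaning: drop records with missing values and exact duplicates."""
    seen, cleaned = set(), []
    for record in records:
        key = tuple(sorted(record.items()))
        if None in record.values() or key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [record for source in sources for record in source]


def select(records, fields):
    """Data selection: keep only the attributes relevant to the task."""
    return [{field: record[field] for field in fields} for record in records]


def transform(records):
    """Data transformation: aggregate quantities per item."""
    totals = Counter()
    for record in records:
        totals[record["item"]] += record["qty"]
    return totals


def mine(totals, top_n):
    """Data mining (stand-in): the most frequently sold items."""
    return totals.most_common(top_n)


def evaluate(patterns, min_qty):
    """Pattern evaluation: keep patterns that pass an interestingness threshold."""
    return [(item, qty) for item, qty in patterns if qty >= min_qty]


def present(patterns):
    """Knowledge presentation: a plain-text report."""
    return "\n".join(f"{item}: {qty}" for item, qty in patterns)


# Hypothetical raw data: store A contains a missing value and a duplicate row.
store_a = [
    {"item": "beer", "qty": 3, "store": "A"},
    {"item": "crisps", "qty": None, "store": "A"},
    {"item": "beer", "qty": 3, "store": "A"},
]
store_b = [
    {"item": "crisps", "qty": 4, "store": "B"},
    {"item": "bread", "qty": 1, "store": "B"},
]

records = integrate(clean(store_a), clean(store_b))
totals = transform(select(records, ["item", "qty"]))
report = present(evaluate(mine(totals, top_n=3), min_qty=2))
print(report)
```

Each function corresponds to one step of the list. In practice every step is far richer, and the process is usually iterative rather than a single pass.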

TECHNIQUES

Several core techniques are used in data mining. Each one describes a type of mining and data recovery operation. The main techniques are the following:

1.Association

Association is one of the best-known data mining techniques. In association, a pattern is discovered based on a relationship between items in the same transaction, which is why the association technique is also known as the relation technique. The association technique is used in market basket analysis to identify sets of products that customers frequently purchase together. Retailers use association techniques to research customers’ buying habits. For example, historical sales data might show that customers often buy crisps when they buy beer. Retailers can then put beer and crisps next to each other to save customers time and increase sales.
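Association rules are commonly scored by two measures:

- **Support:** the share of transactions containing an itemset.
- **Confidence:** how often the right-hand side appears when the left-hand side does.

The sketch below computes both for the beer and crisps example. It also lists item pairs that meet a minimum support, which resembles the first pass of Apriori-style algorithms. The baskets and the 0.4 threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def frequent_pairs(transactions, min_support):
    """Item pairs bought together in at least min_support of all transactions."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical sales transactions: each set is the content of one basket.
baskets = [
    {"beer", "crisps"},
    {"beer", "crisps", "bread"},
    {"beer", "milk"},
    {"bread", "milk"},
    {"crisps"},
]

print(frequent_pairs(baskets, min_support=0.4))
print(confidence(baskets, {"beer"}, {"crisps"}))
```

Here the rule "beer → crisps" holds in two of the three beer baskets (confidence of about 0.67). This is why the prose speaks of customers who *often* buy both, rather than always.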

2.Classification

Classification is a classic data mining technique based on machine learning. It is used to assign each item in a data set to one of a predefined set of classes or groups. Classification methods make use of mathematical techniques such as decision trees, linear programming, neural networks, and statistics. In classification, we develop software that can learn how to classify data items into groups. For example, we can apply classification to the following task: “given all records of employees who left the company, predict who will probably leave the company in a future period.” In this case, we divide the employee records into two groups named “leave” and “stay”. We then ask our data mining software to classify the employees into these groups.

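The text does not fix a particular classification algorithm. As a minimal sketch, the snippet below uses one of the simplest classifiers, a nearest-centroid rule. It is swapped in here for illustration and is not one of the methods named above. The employee records and their two features (years at the company and weekly overtime hours) are hypothetical.

```python
from math import dist


def centroids(training):
    """Mean feature vector of each class, e.g. {"leave": (...), "stay": (...)}."""
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in training.items()
    }


def classify(model, features):
    """Assign the class whose centroid is closest (Euclidean distance)."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical labelled history: (years at company, overtime hours per week).
training = {
    "leave": [(1, 12), (2, 15), (1, 10)],
    "stay": [(6, 2), (8, 4), (5, 3)],
}
model = centroids(training)
print(classify(model, (2, 11)), classify(model, (7, 3)))
```

The "learning" step is computing one average point per class. A new employee is then placed in whichever group their features most resemble.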
3.Clustering

Clustering is a data mining technique that automatically groups objects with similar characteristics into meaningful or useful clusters. The clustering technique defines the classes itself and puts objects into each class, whereas classification assigns objects to predefined classes. To make the concept clearer, take book management in a library as an example. A library holds a wide range of books on various topics, and the challenge is to arrange them so that readers can find several books on a particular topic without hassle. Using the clustering technique, we can keep books that share some similarities in one cluster, or on one shelf, and label it with a meaningful name. Readers who want books on that topic then only have to go to that shelf instead of searching the entire library.
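A common clustering algorithm is k-means, shown here as a minimal sketch. It repeatedly assigns each object to its nearest cluster centre, then moves each centre to the mean of its members. The points are hypothetical "books" described by two invented topic scores. The naive initialisation (the first k points) is also an assumption; real implementations usually use smarter seeding.

```python
from math import dist


def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans(points, k, max_iter=100):
    """Plain k-means: assign points to the nearest centroid, then move centroids."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        updated = [mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return centroids, clusters


# Hypothetical books scored on two topics (e.g. "science" vs "history").
books = [(1, 1), (9, 9), (1.5, 2), (8, 8), (1, 0.5), (9, 8.5)]
centres, shelves = kmeans(books, k=2)
print(shelves)
```

Unlike the classification example, no labels are supplied. The two "shelves" emerge from the data alone.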

4.Prediction

Prediction, as its name implies, is a data mining technique that discovers the relationships among independent variables and between dependent and independent variables. For instance, prediction analysis can be used in sales to predict future profit. If we treat sales as the independent variable, profit can be the dependent variable. Based on historical sales and profit data, we can then draw a fitted regression curve and use it for profit prediction.
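The simplest fitted regression "curve" is a straight line found by ordinary least squares. The sketch below fits profit against sales and extrapolates to a new sales figure. The sales and profit history is hypothetical, and a real analysis would also check whether a straight line suits the data at all.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical history: sales (independent) and profit (dependent).
sales = [100, 150, 200, 250]
profit = [12, 18, 24, 30]
slope, intercept = fit_line(sales, profit)
forecast = slope * 300 + intercept  # predicted profit if sales reach 300
print(round(forecast, 2))
```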

5.Regression

Regression, used primarily as a form of planning and modeling, estimates the value or likelihood of a certain variable given the presence of other variables. For example, you could use it to project a certain price based on other factors such as availability, consumer demand, and competition. More specifically, regression’s main focus is to help you uncover the relationship between two (or more) variables in a given data set.
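With several explanatory variables, the price example becomes a multiple linear regression. The minimal sketch below solves the least-squares normal equations with a small Gaussian elimination routine. The observations are hypothetical, and the prices are generated from known coefficients so that the fit can be checked against them. For real data, a numerical library's least-squares solver is more robust than forming the normal equations directly.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_regression(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + list(features) for features in X]  # prepend intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical observations: (availability, demand, competition).
X = [(1, 2, 3), (2, 1, 0), (0, 4, 1), (3, 3, 2), (1, 0, 5), (4, 2, 1)]
# Synthetic prices generated from known coefficients so the fit can be checked.
y = [50 - 2 * a + 3 * d - 1.5 * c for a, d, c in X]

coefs = fit_regression(X, y)
print([round(b, 3) for b in coefs])
```

Each fitted coefficient estimates how much the price changes per unit change in one factor, with the other factors held fixed.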

6.Sequential patterns

Sequential pattern analysis is a data mining technique that seeks to discover or identify similar patterns, regular events, or trends in transaction data over a business period.

In sales, businesses can use historical transaction data to identify sets of items that customers buy together at different times of the year. They can then use this information to recommend those items to customers, with better deals, based on their past purchasing frequency.
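Below is a minimal, simplified sketch of sequential pattern mining. It counts ordered pairs ("bought X, later bought Y") across customers' purchase histories and keeps the pairs shared by enough customers. Full algorithms such as GSP or PrefixSpan handle longer sequences and itemsets, which this sketch does not. The purchase histories and the threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_sequences(histories, min_customers):
    """Count ordered pairs (earlier_item, later_item) across customer histories.

    A pair is counted at most once per customer; pairs supported by at least
    min_customers customers are returned.
    """
    counts = Counter()
    for history in histories:
        pairs = {
            (history[i], history[j])
            for i, j in combinations(range(len(history)), 2)
            if history[i] != history[j]
        }
        counts.update(pairs)
    return {pair: n for pair, n in counts.items() if n >= min_customers}


# Hypothetical purchase histories, each ordered by purchase date.
histories = [
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["laptop", "phone", "case"],
    ["case", "phone"],
]
print(frequent_sequences(histories, min_customers=2))
```

A result such as ("phone", "charger") suggests offering a charger deal to customers who have recently bought a phone.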

7.Decision trees

A decision tree is one of the most commonly used data mining techniques because its model is easy for users to understand. In the decision tree technique, the root of the tree is a simple question or condition that has multiple answers. Each answer then leads to a further set of questions or conditions that narrow down the data until we can make a final decision.
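The text describes how a tree is read but not how it is built. One common approach is an ID3-style learner, sketched minimally below. At each node it asks the question (attribute) with the highest information gain, that is, the largest drop in label entropy. The "overtime" and "satisfaction" attributes and the records, which echo the leave/stay example, are hypothetical. The classifier assumes that every attribute value it meets was seen during training.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Reduction in entropy obtained by splitting on attribute attr."""
    remainder = 0.0
    for value in {row[attr] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    """ID3-style tree: a leaf is a label, an inner node asks about one attribute."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining
        )
    return {"attr": best, "branches": branches}


def classify(tree, row):
    """Follow the answers from the root question down to a leaf label."""
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["attr"]]]
    return tree


# Hypothetical employee records for the "leave"/"stay" example.
rows = [
    {"overtime": "yes", "satisfaction": "low"},
    {"overtime": "yes", "satisfaction": "high"},
    {"overtime": "no", "satisfaction": "low"},
    {"overtime": "no", "satisfaction": "high"},
    {"overtime": "yes", "satisfaction": "low"},
    {"overtime": "no", "satisfaction": "low"},
]
labels = ["leave", "stay", "stay", "stay", "leave", "stay"]
tree = build_tree(rows, labels, ["overtime", "satisfaction"])
print(tree["attr"], classify(tree, {"overtime": "yes", "satisfaction": "low"}))
```

With this data the root question is "overtime?". Only the "yes" answer needs a follow-up question about satisfaction.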

8.Tracking patterns

One of the most basic techniques in data mining is learning to recognize patterns in your data sets. This usually means recognizing some aberration in your data that happens at regular intervals, or the ebb and flow of a certain variable over time. For example, you might see that sales of a certain product spike just before the holidays, or notice that warmer weather drives more people to your website.

9.Outlier detection

In many cases, simply recognizing the overarching pattern cannot give you a clear understanding of your data set. You also need to be able to identify anomalies, or outliers, in your data. For example, suppose your purchasers are almost exclusively male, but during one strange week in July there is a huge spike in female purchasers. You will want to investigate the spike and see what drove it, so that you can either replicate it or better understand your audience in the process.
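The two techniques above can be sketched together:

- **Tracking patterns:** average each calendar month across years to expose a recurring pre-holiday peak.
- **Outlier detection:** flag values whose z-score (distance from the mean in standard deviations) exceeds a threshold.

Both data sets are hypothetical. The threshold of 2 is an arbitrary assumption, and the z-score rule is only one simple outlier test among many.

```python
from statistics import mean, pstdev


def seasonal_profile(observations):
    """Tracking patterns: average value per period (e.g. month) across years."""
    by_period = {}
    for period, value in observations:
        by_period.setdefault(period, []).append(value)
    return {period: mean(values) for period, values in by_period.items()}


def zscore_outliers(values, threshold=2.0):
    """Outlier detection: indices whose z-score exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical monthly sales for two years, with a December (month 12) spike.
monthly_sales = [(m, 100 + (80 if m == 12 else 0)) for m in range(1, 13)] + [
    (m, 110 + (80 if m == 12 else 0)) for m in range(1, 13)
]
profile = seasonal_profile(monthly_sales)
peak_month = max(profile, key=profile.get)

# Hypothetical weekly counts of female purchasers; index 7 is the odd spike.
weekly_female_buyers = [3, 4, 2, 3, 5, 4, 3, 40, 4, 3]
spikes = zscore_outliers(weekly_female_buyers)
print(peak_month, spikes)
```

The first result describes the regular pattern. The second points to the single week that breaks it and deserves investigation.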

APPLICATION OF DATA MINING

Market Basket Analysis

• Market basket analysis is a modeling technique based on the theory that if you buy a certain group of items, you are more likely to buy another group of items.
• This information may help retailers understand buyers’ needs and improve the store’s layout.

Bioinformatics

• Mining biological data helps extract useful knowledge from the massive datasets gathered in biology and in related life science areas.
• Applications of data mining to bioinformatics include gene finding, protein function inference, disease diagnosis, and disease treatment.

Education

• An institution can use data mining to make accurate decisions and to predict students’ results.
• Students’ learning patterns can be captured and used to develop techniques for teaching them.

Customer Relationships Management (CRM)

• To maintain a proper relationship with a customer, a business needs to collect data and analyze the information.
• With data mining technologies, the collected data can be used for analysis.

CONCLUSION

Data mining is more than running some complex queries on the data stored in your database. You must work with your data, reformat it, or restructure it, whether you are using SQL databases, document-based databases, Hadoop, or simple flat files. The format of the information you need depends on the technique and the analysis you want to perform. Once the information is in that format, you can apply the different techniques (individually or together) regardless of the underlying data structure or data set.
