Data Mining-Introduction
Data mining is a process used to extract valuable information from huge sets of data. Data
mining is also called Knowledge Discovery in Databases (KDD).
● The process of extracting information from huge sets of data to identify patterns, trends,
and useful data that allow a business to make data-driven decisions is called data mining.
● Data mining is carried out by a person, in a specific situation, on a particular data set,
with an objective.
● This process includes various types of services such as text mining, web mining, audio and
Relational Database:
Data warehouses:
A data warehouse is a technology that collects data from various sources within an
organization to provide meaningful business insights. The data, often in huge volumes, comes
from multiple places such as Marketing and Finance. The extracted data is used for analytical
purposes and supports decision-making in a business organization. A data warehouse is
designed for the analysis of data rather than for transaction processing.
Data Repositories:
A data repository generally refers to a destination for data storage. However, many IT
professionals use the term more specifically to refer to a particular kind of setup within an
IT structure, for example, a group of databases in which an organization keeps various kinds
of information.
Object-Relational Database:
One of the primary objectives of the object-relational data model is to close the gap between
relational databases and the object-oriented modeling practices commonly used in programming
languages such as C++, Java, and C#.
Transactional Database:
A transactional database refers to a database management system (DBMS) that can undo (roll
back) a database transaction if it is not completed correctly. Although this was a unique
capability long ago, today most relational database systems support transactional database
activities.
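The undo behavior described above can be sketched with Python's built-in sqlite3 module. This
is a minimal illustration, not a reference to any particular system in these notes: the
accounts table, the names, and the balances are hypothetical, and the CHECK constraint is
simply a convenient way to make the second update fail.

```python
import sqlite3

# Hypothetical schema and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; undo both updates if either one fails."""
    try:
        # Credit first, then debit, so a failed debit leaves a partial change to undo.
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint: roll back the credit as well.
        conn.rollback()
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer(conn, "alice", "bob", 500) fails on the debit, and the rollback also discards
the credit already applied to bob, so the table is left exactly as it was before the call.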
o It is a quick process that makes it easy for new users to analyse enormous amounts of data in
a short time.
Data mining is primarily used by organizations with intense consumer demands, such as retail,
communication, financial, and marketing companies, to determine prices, consumer preferences,
product positioning, and the impact on sales, customer satisfaction, and corporate profits.
Data mining enables a retailer to use point-of-sale records of customer purchases to develop
products and promotions that help the organization attract customers.
Data mining is widely used in the following areas:
Data mining in healthcare has excellent potential to improve the health system. It uses data
and analytics to gain better insights and to identify best practices that enhance health care
services and reduce costs. Analysts use data mining approaches such as machine learning,
multi-dimensional databases, data visualization, soft computing, and statistics. Data mining
can be used to forecast the number of patients in each category. These procedures help ensure
that patients get intensive care at the right place and at the right time. Data mining also
enables healthcare insurers to recognize fraud and abuse.
Market basket analysis is a modeling method based on the hypothesis that if you buy a specific
group of products, then you are more likely to buy another group of products. This technique
may enable the retailer to understand the purchase behavior of a buyer. This information may
assist the retailer in understanding the requirements of the buyer and altering the store's
layout accordingly. Analytical comparisons of results between different stores, or between
customers in different demographic groups, can also be made.
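The hypothesis behind market basket analysis is commonly quantified with two measures for a
rule "A -> B": support (the share of all baskets containing both A and B) and confidence (the
share of baskets containing A that also contain B). The sketch below assumes a tiny,
hypothetical list of baskets; rule_stats is an illustrative helper, not part of any library.

```python
# Hypothetical transactions; each set is one customer's basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Baskets that contain every item of the antecedent.
    n_ante = sum(1 for b in baskets if antecedent <= b)
    # Baskets that contain every item of both sides of the rule.
    n_both = sum(1 for b in baskets if (antecedent | consequent) <= b)
    support = n_both / len(baskets)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


support, confidence = rule_stats(baskets, {"bread"}, {"butter"})
```

Here bread appears in four of the five baskets and bread together with butter in three, so
the rule "bread -> butter" has support 3/5 and confidence 3/4.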
Data mining in Education:
Educational data mining (EDM) is a newly emerging field concerned with developing techniques
that discover knowledge from the data generated in educational environments. The objectives of
EDM include predicting students' future learning behavior, studying the impact of educational
support, and advancing learning science. An institution can use data mining to make precise
decisions and also to predict students' results. With these results, the institution can
concentrate on what to teach and how to teach it.
Knowledge is the best asset possessed by a manufacturing company. Data mining tools can be
useful for finding patterns in a complex manufacturing process. Data mining can be used in
system-level design to obtain the relationships between product architecture, product
portfolio, and the data needs of customers. It can also be used to forecast the product
development period, cost, and expectations, among other tasks.
Customer Relationship Management (CRM) is all about acquiring and retaining customers,
enhancing customer loyalty, and implementing customer-oriented strategies. To maintain a good
relationship with its customers, a business organization needs to collect data and analyze it.
With data mining technologies, the collected data can be used for analytics.
Billions of dollars are lost to fraud. Traditional methods of fraud detection are
time-consuming and complex. Data mining provides meaningful patterns and turns data into
information. An ideal fraud detection system should protect the data of all users. Supervised
methods start from a collection of sample records, each classified as fraudulent or
non-fraudulent. A model is constructed using this data, and the model is then used to identify
whether a new record is fraudulent or not.
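The supervised approach described above can be sketched as follows. The notes do not name a
specific algorithm, so this example swaps in a nearest-centroid classifier purely because it
is short; the features (amount, transactions in the last hour), the sample records, and the
helper names are all hypothetical.

```python
from math import dist

# Hypothetical labeled records: (amount in dollars, transactions in the last hour).
training = [
    ((20.0, 1), "legit"),
    ((35.0, 2), "legit"),
    ((50.0, 1), "legit"),
    ((900.0, 8), "fraud"),
    ((1200.0, 10), "fraud"),
    ((1000.0, 9), "fraud"),
]


def fit_centroids(records):
    """Build the model: average the feature vectors of each class."""
    sums, counts = {}, {}
    for features, label in records:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(s / counts[label] for s in acc) for label, acc in sums.items()}


def predict(centroids, features):
    """Label a new record with the class whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


model = fit_centroids(training)
```

With this toy data, predict(model, (40.0, 1)) falls near the "legit" centroid and
predict(model, (950.0, 7)) near the "fraud" centroid. The features are left unscaled here, so
the dollar amount dominates the distance; a realistic model would normalize them and be
validated on held-out records.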
Apprehending a criminal may be straightforward, but getting the truth out of them is a very
challenging task. Law enforcement may use data mining techniques to investigate offenses,
monitor suspected terrorist communications, and so on. These techniques include text mining,
which seeks meaningful patterns in data that is usually unstructured text. Information
collected from previous investigations is compared, and a model for lie detection is
constructed.
The digitalization of the banking system generates an enormous amount of data with every new
transaction. Data mining techniques can help bankers solve business-related problems in
banking and finance by identifying trends, causalities, and correlations in business
information and market costs that are not instantly evident to managers or executives because
the data volume is too large or is generated too rapidly to be screened by experts. Managers
may use this information to better target, acquire, retain, segment, and maintain profitable
customers.
Although data mining is very powerful, it faces many challenges during its execution. These
challenges may relate to performance, data, methods, and techniques. The process of data
mining becomes effective when the challenges or problems are correctly recognized and
adequately resolved.
Incomplete and noisy data:
● The process of extracting useful data from large volumes of data is data mining.
● These problems may occur because of faulty data-measuring instruments or because of human
errors.
● Suppose a retail chain collects the phone numbers of customers who spend more than $500,
and the accounting employees enter the information into their system. An employee may
mistype a digit when entering a phone number, which results in incorrect data.
● Some customers may also be unwilling to disclose their phone numbers, which results in
incomplete data. The data could also get changed due to human or system error. All these
problems (noisy and incomplete data) make data mining challenging.
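A simple screening step for the phone-number example above might look like the sketch below.
The customer records, the ten-digit expectation, and the classify_phone helper are assumptions
made for illustration. Note that a check like this only catches numbers with the wrong number
of digits; a mistyped digit that keeps the length intact would still pass.

```python
import re

# Hypothetical customer records; None marks a phone number the customer withheld.
customers = [
    {"name": "A", "phone": "555-010-1234"},
    {"name": "B", "phone": None},
    {"name": "C", "phone": "555-010-12"},  # digits dropped during data entry
    {"name": "D", "phone": "5550105678"},
]


def classify_phone(phone, expected_digits=10):
    """Flag a phone value as 'missing' (incomplete), 'noisy' (malformed), or 'ok'."""
    if phone is None or not phone.strip():
        return "missing"
    digits = re.sub(r"\D", "", phone)  # keep digits only
    return "ok" if len(digits) == expected_digits else "noisy"


report = {c["name"]: classify_phone(c["phone"]) for c in customers}
```

Records flagged as missing or noisy can then be corrected, excluded, or handled separately
before any mining is attempted.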
Data Distribution:
environment.
● In practice, it is quite difficult to bring all the data into a centralized data repository.
● For example, various regional offices may have their own servers to store their data.
● It is not feasible to store all the data from all the offices on a central server.
Therefore, data mining requires the development of tools and algorithms that allow
the mining of distributed data.
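One common way to mine distributed data without centralizing it is to have each site compute a
small local summary and to merge only the summaries. The sketch below illustrates that idea
with item counts; the office names, the sales logs, and both helper functions are
hypothetical, and real systems would run local_summary on each office's own server.

```python
from collections import Counter

# Hypothetical per-office sales logs; in practice each would live on its own server.
office_sales = {
    "north": ["tea", "coffee", "tea"],
    "south": ["coffee", "coffee", "juice"],
    "east": ["tea", "juice"],
}


def local_summary(items):
    """Runs at each office: reduce the raw records to item counts."""
    return Counter(items)


def merge_summaries(summaries):
    """Runs centrally: combine the small summaries, never the raw records."""
    total = Counter()
    for summary in summaries:
        total.update(summary)  # Counter.update adds counts together
    return total


global_counts = merge_summaries(local_summary(items) for items in office_sales.values())
```

Only the per-office counts cross the network, which keeps the transferred volume small and
leaves the raw records where they were collected.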
Complex Data:
● Real-world data is heterogeneous, and it could be multimedia data, including audio
and video, images, complex data, spatial data, time series, and so on.
● Managing these various types of data and extracting useful information is a tough
task.
● Most of the time, new technologies, new tools, and methodologies would have to be
Performance:
● The data mining system's performance relies primarily on the efficiency of the algorithms
and techniques used.
● If the designed algorithms and techniques are not up to the mark, then the efficiency of
the data mining process suffers.
Data mining can lead to serious issues in terms of data security, governance, and privacy.
For example, if a retailer analyzes the details of purchased items, the analysis reveals data
about the buying habits and preferences of customers without their permission.
Data Visualization:
● In data mining, data visualization is a very important process because it is the primary
● The extracted data should convey the exact meaning of what it intends to express.
● But many times, representing the information to the end-user in a precise and easy
way is difficult.
● The input data and the output information being complicated, very efficient, and