
IT Unit 4

Data mining is the process of extracting useful patterns and knowledge from large amounts of data. It can be performed on different types of data like relational databases, data warehouses, data repositories, object-relational databases, and transactional databases. Some key applications of data mining include healthcare, market basket analysis, education, manufacturing, customer relationship management, fraud detection, and banking. However, there are also challenges to implementing data mining effectively, such as dealing with distributed data sources, complex data types, and ensuring adequate performance of data mining algorithms and systems.

Uploaded by

Shreesti
Copyright
© All Rights Reserved

Unit-4- Knowledge Management Concepts & Business Intelligence
What is Data Mining?
Data mining is the process of extracting information from huge sets of data to identify patterns, trends, and useful data that allow a business to make data-driven decisions.

Types of Data Mining


Data mining can be performed on the following types of data:
Relational Database:
A relational database is a collection of multiple data sets formally organized by tables, records, and columns, from which data can be accessed in various ways without having to
reorganize the database tables. Tables convey and share information, which facilitates data searchability, reporting, and organization.
Data warehouses:
A data warehouse is a technology that collects data from various sources within the organization to provide meaningful business insights. The huge amount of data
comes from multiple places such as Marketing and Finance. The extracted data is used for analytical purposes and helps in decision-making for a business organization. The
data warehouse is designed for the analysis of data rather than transaction processing.
Data Repositories:
A data repository generally refers to a destination for data storage. However, many IT professionals use the term more specifically to refer to a particular kind of setup within an IT
structure, for example, a group of databases in which an organization keeps various kinds of information.
Object-Relational Database:
A combination of an object-oriented database model and relational database model is called an object-relational model. It supports Classes, Objects, Inheritance, etc.
One of the primary objectives of the Object-relational data model is to close the gap between the Relational database and the object-oriented model practices frequently
utilized in many programming languages, for example, C++, Java, C#, and so on.
Transactional Database:
A transactional database refers to a database management system (DBMS) that can undo a database transaction if it is not performed appropriately. Although
this was a distinctive capability long ago, most relational database systems today support transactional database activities.
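
The "undo" behaviour described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not how any particular DBMS is implemented: the accounts table, the transfer function, and the sample balances are all hypothetical names invented for the example.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts; undo every step if one fails."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def demo():
    # Hypothetical sample data held in an in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    ok = transfer(conn, "alice", "bob", 30)       # succeeds and is committed
    failed = transfer(conn, "alice", "bob", 500)  # fails and is rolled back
    balances = dict(conn.execute("SELECT name, balance FROM accounts"))
    conn.close()
    return ok, failed, balances
```

In the failed transfer, the first UPDATE has already run when the error is raised, yet the stored balance is unchanged afterwards because the whole transaction is undone.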
Advantages of Data Mining

o Data mining enables organizations to obtain knowledge-based data.
o Data mining enables organizations to make profitable modifications in operation and production.
o Compared with other statistical data applications, data mining is cost-efficient.
o Data mining helps the decision-making process of an organization.
o It facilitates the automated discovery of hidden patterns as well as the prediction of trends and behaviors.
o It can be introduced in new systems as well as existing platforms.
o It is a quick process that makes it easy for new users to analyze enormous amounts of data in a short time.

Disadvantages of Data Mining

o There is a possibility that organizations may sell useful customer data to other organizations for money. According to one report, American Express has sold its customers'
credit card purchase data to other organizations.
o Much data mining analytics software is difficult to operate and requires advanced training to use.
o Different data mining tools operate in distinct ways because of the different algorithms used in their design. Therefore, selecting the right data mining
tool is a very challenging task.
o Data mining techniques are not perfectly precise, which may lead to severe consequences in certain conditions.

Data Mining Applications


Data mining is primarily used by organizations with intense consumer demands, such as retail, communication, financial, and marketing companies, to determine prices, consumer preferences,
product positioning, and the impact on sales, customer satisfaction, and corporate profits. Data mining enables a retailer to use point-of-sale records of customer purchases to
develop products and promotions that help the organization attract customers.

The following are areas where data mining is widely used:
Data Mining in Healthcare:
Data mining in healthcare has excellent potential to improve the health system. It uses data and analytics to gain better insights and to identify best practices that enhance
health care services and reduce costs. Analysts use data mining approaches such as machine learning, multi-dimensional databases, data visualization, soft computing, and
statistics. Data mining can be used to forecast the number of patients in each category, which helps ensure that patients get intensive care at the right place and at the right time. Data
mining also enables healthcare insurers to recognize fraud and abuse.
Data Mining in Market Basket Analysis:
Market basket analysis is a modeling method based on the hypothesis that if you buy a specific group of products, you are more likely to buy another group of products. This
technique may enable the retailer to understand the purchase behavior of a buyer. This information may assist the retailer in understanding the requirements of the buyer and altering
the store's layout accordingly. Differential analysis can also be used to compare results between various stores or between customers in different demographic groups.
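
The hypothesis behind market basket analysis is usually quantified with three standard measures: support, confidence, and lift. The sketch below computes them in plain Python; the sample baskets and function names are made up for illustration and do not come from the notes.

```python
def count_containing(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in transactions if itemset <= set(basket))


def support(transactions, itemset):
    """Fraction of all transactions that contain the itemset."""
    return count_containing(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the baskets containing `antecedent`, the share that also contain `consequent`."""
    base = count_containing(transactions, antecedent)
    if base == 0:
        return 0.0
    both = count_containing(transactions, set(antecedent) | set(consequent))
    return both / base


def lift(transactions, antecedent, consequent):
    """Confidence relative to how common `consequent` is overall (above 1 suggests association)."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )


# Hypothetical point-of-sale baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "jam"},
]

if __name__ == "__main__":
    print(support(baskets, {"bread", "butter"}))
    print(confidence(baskets, {"bread"}, {"butter"}))
    print(lift(baskets, {"bread"}, {"butter"}))
```

A lift above 1 for "bread, then butter" would support placing the two products near each other, which is the kind of layout decision the paragraph describes.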
Data mining in Education:
Educational data mining (EDM) is a newly emerging field concerned with developing techniques that explore knowledge from the data generated in educational environments. EDM
objectives include predicting students' future learning behavior, studying the impact of educational support, and advancing learning science. An institution can use
data mining to make precise decisions and also to predict students' results. With the results, the institution can concentrate on what to teach and how to teach.
Data Mining in Manufacturing Engineering:
Knowledge is the best asset possessed by a manufacturing company. Data mining tools can help find patterns in a complex manufacturing process. Data mining can
be used in system-level design to obtain the relationships between product architecture, product portfolio, and the data needs of customers. It can also be used to forecast
the product development period, cost, and expectations, among other tasks.
Data Mining in CRM (Customer Relationship Management):
Customer Relationship Management (CRM) is all about obtaining and retaining customers, enhancing customer loyalty, and implementing customer-oriented strategies. To
build a good relationship with the customer, a business organization needs to collect data and analyze it. With data mining technologies, the collected data can be used
for analytics.
Data Mining in Fraud detection:
Billions of dollars are lost to fraud. Traditional methods of fraud detection are time-consuming and complex. Data mining finds meaningful
patterns and turns data into information. An ideal fraud detection system should protect the data of all users. Supervised methods start from a collection of sample
records, each classified as fraudulent or non-fraudulent. A model is constructed using this data, and the model is then used to identify whether a new record is
fraudulent or not.
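
The supervised approach described above can be sketched with a deliberately simple model. The example below uses a nearest-centroid classifier, chosen here only because it fits in a few lines; the notes do not name a specific algorithm. The two features (transaction amount in hundreds, number of transactions in the last hour) and all sample records are hypothetical.

```python
from math import dist


def train_centroids(records, labels):
    """Build the model: the average feature vector of each labelled class."""
    sums, counts = {}, {}
    for features, label in zip(records, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Assign a new record to the class whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical labelled sample records: (amount in hundreds, transactions in last hour).
# Real systems would scale features and use far more data.
sample_records = [(0.2, 1), (0.5, 2), (0.3, 1), (9.0, 8), (7.5, 6), (8.0, 9)]
sample_labels = ["legitimate"] * 3 + ["fraudulent"] * 3

if __name__ == "__main__":
    model = train_centroids(sample_records, sample_labels)
    print(classify(model, (0.4, 1)))
    print(classify(model, (8.5, 7)))
```

Production fraud models are far richer, but the shape is the same: labelled history in, a fitted model out, and a decision for each new record.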
Data Mining in Lie Detection:
Apprehending a criminal is often the easier part; bringing out the truth from a suspect is a very challenging task. Law enforcement may use data mining techniques to investigate
offenses, monitor suspected terrorist communications, and so on. This includes text mining, which seeks meaningful patterns in data that is usually unstructured text.
The information collected from previous investigations is compared, and a model for lie detection is constructed.
Data Mining in Financial Banking:
The digitalization of the banking system generates an enormous amount of data with every new transaction. Data mining can help bankers
solve business-related problems in banking and finance by identifying trends, causalities, and correlations in business information and market costs that are not instantly
evident to managers or executives because the data volume is too large or is generated too rapidly for experts to screen. Managers may use this information for better
targeting, acquiring, retaining, segmenting, and maintaining profitable customers.
Challenges of Implementation in Data mining
Although data mining is very powerful, it faces many challenges during its execution. These challenges can relate to performance, data, methods, techniques, and so on.
The process of data mining becomes effective when the challenges or problems are correctly recognized and adequately resolved.

Data Distribution:
Real-world data is usually stored on various platforms in a distributed computing environment. It might be in a database, on individual systems, or even on the internet. In practice,
it is quite difficult to bring all the data into a centralized data repository, mainly due to organizational and technical concerns. For example, various regional offices may have
their own servers to store their data, and it may not be feasible to store all the data from all the offices on a central server. Therefore, data mining requires the development of tools and
algorithms that allow the mining of distributed data.
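
One common way to mine distributed data without centralizing it is to let each site compute a small summary locally and send only that summary to be merged. The sketch below illustrates the idea with item counts; the office data and function names are hypothetical, and real distributed mining algorithms are considerably more involved.

```python
from collections import Counter


def local_counts(transactions):
    """Run at each regional office: count how many baskets contain each item."""
    counts = Counter()
    for basket in transactions:
        counts.update(set(basket))  # set() so an item counts once per basket
    return counts


def merge_counts(per_site_counts):
    """Run centrally: combine the small summaries, never the raw records."""
    total = Counter()
    for counts in per_site_counts:
        total.update(counts)
    return total


# Hypothetical raw data that stays on each office's own server.
office_a = [["tea", "milk"], ["tea"]]
office_b = [["milk"], ["tea", "sugar"]]

if __name__ == "__main__":
    merged = merge_counts([local_counts(office_a), local_counts(office_b)])
    print(merged.most_common())
```

Only the per-item totals travel over the network, which sidesteps both the volume problem and some of the organizational concerns mentioned above.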
Complex Data:
Real-world data is heterogeneous: it could be multimedia data (including audio, video, and images), complex data, spatial data, time series, and so on. Managing these
various types of data and extracting useful information is a tough task. Most of the time, new technologies, tools, and methodologies have to be developed or refined to obtain
specific information.
Performance:
The data mining system's performance relies primarily on the efficiency of algorithms and techniques used. If the designed algorithm and techniques are not up to the mark,
then the efficiency of the data mining process will be affected adversely.
Data Privacy and Security:
Data mining can lead to serious issues in terms of data security, governance, and privacy. For example, if a retailer analyzes the details of purchased items, it reveals
data about the buying habits and preferences of customers without their permission.

Data Mining Process


Generally, the process can be divided into the following steps:
1. Define the problem: Determine the scope of the business problem and the objectives of the data exploration project.
2. Explore the data: This step includes the exploration and collection of data that will help solve the stated business problem.
3. Prepare the data: Clean and organize collected data to prepare it for further modeling procedures.
4. Modeling: Create a model using data mining techniques that will help solve the stated problem.
5. Interpretation and evaluation of results: Draw conclusions from the data model and assess its validity. Translate the results into a business decision.
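
Steps 3 to 5 above can be sketched end to end on a toy problem. Everything here is assumed for illustration: the business question (will a customer make a repeat purchase, given the number of store visits), the sample rows, and the one-feature threshold model, which stands in for whatever technique a real project would choose.

```python
def prepare(rows):
    """Step 3: clean the data by dropping rows with missing values."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def fit_threshold(rows):
    """Step 4: a one-feature model, the midpoint between the two class means.

    Assumes class 1 tends to have larger feature values than class 0.
    """
    positives = [x for x, y in rows if y == 1]
    negatives = [x for x, y in rows if y == 0]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


def predict(threshold, x):
    """Predict class 1 when the feature reaches the threshold."""
    return 1 if x >= threshold else 0


def accuracy(threshold, rows):
    """Step 5: evaluate the model on data it was not fitted on."""
    correct = sum(1 for x, y in rows if predict(threshold, x) == y)
    return correct / len(rows)


# Hypothetical rows: (store visits, made a repeat purchase: 1 or 0).
raw_training = [(1, 0), (2, 0), (None, 1), (3, 0), (8, 1), (9, 1), (10, None), (7, 1)]
holdout = [(2, 0), (6, 1), (4, 1), (9, 1)]

if __name__ == "__main__":
    threshold = fit_threshold(prepare(raw_training))
    print(threshold, accuracy(threshold, holdout))
```

The evaluation on held-out rows is what lets the final step say something about validity before the result is turned into a business decision.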

Data Mining Techniques

The most used techniques in the field include:

1. Detection of anomalies: Identifying unusual values in a dataset.
2. Dependency modelling: Discovering existing relationships within a dataset. This frequently involves regression analysis.
3. Clustering: Identifying structures (clusters) in unstructured data.
4. Classification: Generalizing a known structure and applying it to new data.
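
Two of the techniques listed above are small enough to sketch directly. The example below flags anomalies with a z-score rule and fits a least-squares line as a basic form of regression analysis. The threshold of 2 standard deviations and the sample values are assumptions made for the illustration, not values from the notes.

```python
from statistics import mean, pstdev


def find_anomalies(values, z_threshold=2.0):
    """Anomaly detection: values more than `z_threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > z_threshold]


def fit_line(xs, ys):
    """Dependency modelling: least-squares slope and intercept for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


if __name__ == "__main__":
    # Hypothetical daily order counts with one unusual day.
    print(find_anomalies([10, 11, 9, 10, 12, 10, 11, 50]))
    # Hypothetical advertising spend versus sales.
    print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))
```

The z-score rule is sensitive to the very outliers it looks for, so more robust statistics are often preferred on real data; it is used here because it is the easiest rule to read.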

Data Mining Tool


What is a Data Mining Tool?
In today's world, a large amount of data is generated within seconds. To handle this data, we should have some knowledge of different techniques and tools. Data mining tools
are a set of methodologies used to analyse this large amount of data and the relationships within it.

List of Data Mining Tools

Here is a list of a few notable data mining tools that are helpful for analysing data:
1. RapidMiner
It is developed by the company RapidMiner, hence the name of the tool. It is written in Java. RapidMiner can be used for predictive analysis,
business applications, education and research, commercial applications, etc. It speeds up delivery because it follows a template framework, and it
also reduces errors during data transformation. There are three RapidMiner products: RapidMiner Studio, RapidMiner Server, and RapidMiner Radoop.

• RapidMiner Studio: Workflow design, prototyping, validation, etc., are done in this module.

• RapidMiner Server: This module is used for operating predictive data models.

• RapidMiner Radoop: For simplification of predictive analysis, this module executes a process in Hadoop.

2. Orange
It is open-source software written in Python. Orange is well suited to data analysis and machine learning. Its components are called widgets. These widgets
are used for reading data, analysing components, allowing users to select features, and showing the data. With Orange, formatting data and moving it between
widgets becomes fast and easy.
3. Weka
Weka is developed by the University of Waikato. It is open-source software used for predictive modelling and analysis of data. Weka has a GUI that provides easy and
interactive access to users. It supports SQL, allowing a user to connect to a database and perform operations by running queries. It stores data in a flat-file format.
4. KNIME
It is an open-source platform developed by KNIME.com AG and used for data analytics. It is built by combining data mining and machine learning components. It has been used for
pharmaceutical research, business intelligence, and financial analysis.
5. Sisense
It is not open-source software; it is licensed software, and a license must be purchased to use it. Small and large organizations use Sisense to handle their data. As it also
supports widgets, like Orange, it is easy to move data and create reports by dragging and dropping. Even non-technical people can work with Sisense, as it is GUI-based. With the
help of widgets, Sisense generates reports in the form of bar charts, pie charts, line charts, etc.
6. Apache Mahout
It is developed by the Apache Software Foundation. Apache Mahout aims to provide algorithms for machine learning and focuses on regression, clustering, and classification of data. As it is written in a
well-known language, Java, and contains Java libraries that support mathematical operations, it is used for statistical analysis.
7. SSDT
SSDT is short for SQL Server Data Tools. It extends database development into Visual Studio. It is widely used for data analysis and provides solutions to
business intelligence problems. SSDT provides a table designer to perform table operations such as creating a table and adding, deleting, or modifying table
data. It allows a user to connect to the database as it supports SQL.
8. Rattle
Rattle is an open-source tool developed using the R language. It provides a GUI. Its built-in Log tab records the corresponding R code for every activity performed in the GUI.
9. DataMelt
It is also known as DMelt. It is used to analyze and visualize data. It is designed for students, engineers, and scientists. It is platform-independent, which means it can run on any
operating system that has a JVM (Java Virtual Machine). It is used to create 2D or 3D plots and to work with random numbers, mathematical operations, and algebraic equations.
10. SAS
It is developed for managing large amounts of data. It allows a user to modify data and to store data from different locations in one place. As it provides a GUI, a
non-technical person can also use it quickly and handle their data efficiently.

Myths and Blunders in Data Mining


Data mining is a great analytics tool. It helps many managers understand customer behaviour. The results of data mining are used to increase revenue, decrease production costs,
and discover fraud. Data mining is often surrounded by myths; below are some of them.
• Data mining gives instant results. This is not true, because data mining is a step-by-step process that needs much consideration.

• Data mining is not yet ready for business. In fact, it is well suited to implementation in a business environment.

• Data mining needs a separate database. In fact, it can use the available databases.

• Only those with advanced technology can use it.

• It is only for big companies.


Many people think that data mining itself causes many problems in industry, but the real problems are:
• selecting the wrong problem to solve with data mining.

• ignoring what data mining can and cannot do.

• not giving it the right amount of time.

• not following the right procedure.

• believing anything that data mining gives you.

Data Mining vs Text Mining vs Web Mining:


Comparison Table
| Base for Comparison | Data Mining | Text Mining | Web Mining |
|---|---|---|---|
| Concept | Data mining is the statistical technique of processing raw data into a structured form. | Text mining is the subset of data mining that involves processing unstructured text documents into a structured format. | Web mining is a subset of data mining that involves processing data related to the Web. It can be Web logs, Web structure data, or Web content data. |
| Data Retrieval | Data is mined and then stored in the data warehouse. The data stored in databases and spreadsheets is used to gather information and perform analysis. | Text data is stored in text documents, emails, and logs and then processed to gather high-quality information. | Web data can be in the form of structure, content, and usage data and is later converted into useful information. |
| Types of Data | Knowledge is discovered from structured data, which is homogeneous and easy to access. | Text mining involves data from text documents, emails, logs, PDFs, etc. | Web mining mainly deals with three types of data, i.e., Web structure data, Web content data, and Web usage data. |
| Application | Data mining is used in fields like medicine, marketing, healthcare, etc. | Text mining is used in fields like customer profile analysis, bioscience, etc. | Web mining is used to extract information from the web, analyze weblogs, etc. |
| Data Format | In data mining, the data is stored in a structured format. | In text mining, the data is stored in an unstructured format. | In web mining, the data is structured as well as unstructured. The data format depends upon the type of mining method. |
| Skills Required | To retrieve meaningful data through data mining, one must be aware of data cleansing techniques, machine learning algorithms, statistics, and probability. | Text mining requires pattern recognition techniques and natural language processing to enrich the meaning of the text. | In web mining, application-level knowledge, data engineering, statistics, and probability are required to successfully retrieve information from weblogs. |
| Techniques Used | Statistical techniques are most helpful in analyzing data. | In text mining, computational linguistic principles are used to evaluate the meaning of the text. | In web mining, sequential pattern, clustering, and associative mining principles are used. |
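
The core idea of text mining in the comparison above, turning unstructured text into a structured format, can be sketched as a term-document matrix. This is a minimal bag-of-words illustration; the tokenizer rule and the sample messages are assumptions made for the example, and real text mining adds steps such as stop-word removal and stemming.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def term_document_matrix(documents):
    """Return (vocabulary, rows): one row of term counts per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    rows = [[doc_counts[term] for term in vocabulary] for doc_counts in counts]
    return vocabulary, rows


if __name__ == "__main__":
    # Hypothetical customer messages.
    vocab, matrix = term_document_matrix(
        ["Refund not received", "Received the refund, thanks"]
    )
    print(vocab)
    for row in matrix:
        print(row)
```

Once text is in this tabular form, the same statistical and machine learning techniques used on structured data can be applied to it.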

Business Performance Management

Business performance management (BPM) is an approach to measuring an organisation's overall progression towards
its objectives. When a company uses performance management, it collects and analyses data to evaluate its
business operations. This is a valuable technique that helps the organisation collect quantitative data, such as
the number of sales made in a month or the company's current cash flow. Management teams evaluate the
performance of individual employees and entire departments to make beneficial decisions. Along with
analysing the financial aspects of a business, BPM also considers employee and customer satisfaction.

Benefits Of Performance Management

Performance management is a beneficial way to evaluate employees and company progress. An organisation
that uses BPM considers crucial data and progress records to analyse its performance. The following are
important aspects of BPM:

Aligns with business goals

Performance management considers how well a company aligns with predetermined objectives. Business
goals serve as motivation and provide a clear objective for all employees to achieve. With performance
management, you can evaluate the rate at which the organisation achieves its milestones and make any
additional changes to help you advance. Performance management allows the management team to create
company-wide business objectives and monitor their progress throughout the year.

Evaluates alternatives

Considering alternatives is important when the business's initial approach produces unexpected results.
BPM's key benefit is that it invites new ideas and encourages innovative thinking among employees. Diverse
viewpoints can lead to a better approach because they consider additional data and help the management
team learn from previous experience.

Improves accountability
When a management team implements performance management, the company holds its employees
accountable. Since managers and supervisors evaluate employees' performance, employees consider company goals
more frequently. Employees understand their responsibilities better when held to account by their employers.
Performance management also makes the company's assessment of itself more transparent.

Sets expectations for staff

A proper structure for managing business performance allows an organisation to set clear expectations for
employees and supervisors. It allows management to create a list of employee expectations based on current
performance. Clear and achievable expectations are likely to produce consistent results.

Improves communication

Communication quality can influence your performance management system's success. BPM promotes a
culture of clear communication, greater team engagement and coherence between personal and company
goals. It encourages companies to engage in one-on-one conversations to provide consistent feedback, foster
skill development, integrate team building and promote collaboration.

Leads to training programmes

An effective performance management system is important for identifying employees' skill gaps and
providing a training system to close them. A training plan boosts employee morale because it demonstrates
the organisation values them. Besides acting as a potent talent magnet, professional development
opportunities also increase employee retention rates. To achieve these objectives, an organisation may create
a training budget and determine how specific skills help to maximise their return on investment.

Components Of Performance Management

Goal selection

Goal selection is when the business decides on short- and long-term goals. Typically, several
members of the management team think of these goals, which are often realistic and reflect the
business's trajectory. The company can focus on specific goals while postponing others. This
prioritisation allows employees to dedicate time, energy and resources to select objectives.

Information consolidation

Information consolidation involves gathering data about the company. The management team
does this to evaluate and direct decision-making. When a business uses information consolidation,
it aims to provide accurate and reliable information for the team's reference.

Management intervention

Management interventions are steps managers take to enhance the business's operations. They
determine these interventions using the data they collect during information consolidation. Their
actions consider the company's mission and established goals. For example, a supervisor might
check in with an employee weekly rather than bi-weekly. This approach provides an additional
opportunity to ask questions.
IT tool

Cell References in Excel - javatpoint

Nested IF function example - Excel formula | Exceljet

Goal Seek in Excel (Examples) | How to Use Goal Seek in Excel? (educba.com)
