Notes DATA MINING MBA III
Data mining is one of the most useful techniques for helping entrepreneurs,
researchers, and individuals extract valuable information from huge sets of
data. Data mining is also called Knowledge Discovery in Databases (KDD). The
knowledge discovery process includes Data cleaning, Data integration, Data
selection, Data transformation, Data mining, Pattern evaluation, and Knowledge
presentation.
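The seven steps listed above can be sketched as a small pipeline. This is only a minimal illustration, not a standard implementation: the sales records, the function names, and the choice of "items bought together" as the pattern to mine are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical raw records from two sources (all values are made up).
store_sales = [
    {"customer": "A", "item": "bread", "qty": 2},
    {"customer": "B", "item": "milk", "qty": None},  # incomplete record
    {"customer": "C", "item": "bread", "qty": 1},
]
online_sales = [
    {"customer": "A", "item": "milk", "qty": 1},
    {"customer": "C", "item": "milk", "qty": 3},
    {"customer": "D", "item": "eggs", "qty": 12},
]


def clean(records):
    """Data cleaning: drop records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(*sources):
    """Data integration: merge several sources into one list."""
    return [r for source in sources for r in source]


def select(records, fields):
    """Data selection: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    """Data transformation: build one basket (set of items) per customer."""
    baskets = {}
    for r in records:
        baskets.setdefault(r["customer"], set()).add(r["item"])
    return baskets


def mine(baskets):
    """Data mining: count how many baskets contain each pair of items."""
    pairs = Counter()
    for items in baskets.values():
        ordered = sorted(items)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                pairs[(first, second)] += 1
    return pairs


def evaluate(pairs, min_count):
    """Pattern evaluation: keep only pairs seen at least min_count times."""
    return {pair: n for pair, n in pairs.items() if n >= min_count}


def run_pipeline(min_count=2):
    cleaned = [clean(store_sales), clean(online_sales)]
    integrated = integrate(*cleaned)
    selected = select(integrated, ["customer", "item"])
    baskets = transform(selected)
    return evaluate(mine(baskets), min_count)


# Knowledge presentation: report the surviving patterns in readable form.
for (first, second), count in run_pipeline().items():
    print(f"{first} and {second} were bought together by {count} customers")
```

With the sample data, the only pattern that survives evaluation is that bread and milk were bought together by two customers.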
In other words, data mining is the process of examining data from various
perspectives to uncover hidden patterns and categorize them into useful
information. That information is collected and assembled in particular areas
such as data warehouses, where it supports efficient analysis, data mining
algorithms, and decision making, and ultimately helps to cut costs and generate
revenue.
Data mining is the act of automatically searching large stores of information to
find trends and patterns that go beyond simple analysis procedures. Data mining
utilizes complex mathematical algorithms to segment data and evaluate the
probability of future events. As noted above, it is also known as Knowledge
Discovery in Databases (KDD).
Data Mining is a process used by organizations to extract specific data from huge
databases to solve business problems. It primarily turns raw data into useful
information.
Relational Database:
Data warehouses:
A data warehouse is a technology that collects data from various sources
within the organization to provide meaningful business insights. The huge amount
of data comes from multiple places such as Marketing and Finance. The extracted
data is used for analytical purposes and helps in decision-making for a business
organization. The data warehouse is designed for the analysis of data rather than
transaction processing.
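As a rough sketch of this idea, the example below loads rows from two departmental sources into one analytical table and runs a summary query over it. It uses an in-memory SQLite database only for convenience; the table layout, the column names, and all figures are assumptions made for illustration, and a real warehouse would use a dedicated platform and a much richer schema.

```python
import sqlite3

# Hypothetical rows exported from two departmental systems (made-up values):
# (region, measure, amount)
marketing_rows = [
    ("north", "campaign_spend", 1200.0),
    ("south", "campaign_spend", 800.0),
]
finance_rows = [
    ("north", "revenue", 5000.0),
    ("south", "revenue", 2000.0),
    ("north", "revenue", 1000.0),
]


def build_warehouse():
    """Collect data from both sources into a single table for analysis."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE facts (source TEXT, region TEXT, measure TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO facts VALUES ('marketing', ?, ?, ?)", marketing_rows
    )
    conn.executemany(
        "INSERT INTO facts VALUES ('finance', ?, ?, ?)", finance_rows
    )
    conn.commit()
    return conn


def revenue_and_spend_by_region(conn):
    """Analytical query: total revenue and campaign spend per region."""
    query = """
        SELECT region,
               SUM(CASE WHEN measure = 'revenue' THEN amount ELSE 0 END),
               SUM(CASE WHEN measure = 'campaign_spend' THEN amount ELSE 0 END)
        FROM facts
        GROUP BY region
        ORDER BY region
    """
    return conn.execute(query).fetchall()


warehouse = build_warehouse()
for region, revenue, spend in revenue_and_spend_by_region(warehouse):
    print(region, revenue, spend)
```

The query reads and aggregates many rows at once instead of updating individual records, which is the kind of workload a warehouse is designed for.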
Data Repositories:
The term data repository generally refers to a destination for data storage.
However, many IT professionals use the term more specifically to refer to a
particular kind of setup within an IT structure, for example, a group of
databases in which an organization keeps various kinds of information.
Object-Relational Database:
One of the primary objectives of the object-relational data model is to close the
gap between the relational database and the object-oriented modeling practices
frequently used in many programming languages, for example, C++, Java, C#,
and so on.
Data mining is widely used in the following areas:
Fraud Detection:
Billions of dollars are lost to fraud. Traditional methods of fraud detection
are time-consuming and complex. Data mining finds meaningful patterns and turns
data into information. An ideal fraud detection system should protect the data
of all users. Supervised methods start from a collection of sample records, each
classified as fraudulent or non-fraudulent. A model is constructed from this
data and is then used to identify whether a new record is fraudulent or not.
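The supervised approach described above can be sketched in a few lines. The example uses a nearest-centroid classifier, chosen here only because it is short; the notes do not name a specific algorithm. The two features (transaction amount and number of transactions in the last hour) and all sample values are hypothetical.

```python
# Hypothetical labeled sample records:
# ((amount, transactions_in_last_hour), label). All values are made up.
training = [
    ((20.0, 1), "non-fraudulent"),
    ((35.0, 2), "non-fraudulent"),
    ((50.0, 1), "non-fraudulent"),
    ((900.0, 9), "fraudulent"),
    ((1200.0, 12), "fraudulent"),
]


def train(samples):
    """Construct the model: one mean feature vector (centroid) per class."""
    sums, counts = {}, {}
    for features, label in samples:
        totals = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            totals[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [total / counts[label] for total in totals]
        for label, totals in sums.items()
    }


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(model, features):
    """Label a new record with the class whose centroid is closest."""
    return min(model, key=lambda label: squared_distance(model[label], features))


model = train(training)
print(classify(model, (1000.0, 10)))  # fraudulent
print(classify(model, (40.0, 1)))     # non-fraudulent
```

A practical system would scale the features, use far more sample records, and validate the model on data it has not seen; the sketch shows only the train-then-classify structure.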
Lie Detection:
Apprehending a criminal is not the hardest part; getting the truth out of them
is far more challenging. Law enforcement may use data mining techniques to
investigate offenses, monitor suspected terrorist communications, etc. These
techniques include text mining, which seeks meaningful patterns in data that is
usually unstructured text. The information collected from previous
investigations is compared, and a model for lie detection is constructed.
Data mining is the process of extracting useful information from large volumes
of data. Real-world data is heterogeneous, incomplete, and noisy. Data in huge
quantities is often inaccurate or unreliable. These problems may be caused by
faulty measuring instruments or by human error, and data may also be altered by
human or system errors. All of these issues (noisy and incomplete data) make
data mining challenging.
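One common way to cope with noisy and incomplete data is to clean it before mining. The sketch below fills missing values with the median and replaces readings that lie far from the median, measured in units of the median absolute deviation (MAD). The readings, the threshold of 5, and the decision to replace outliers rather than drop them are assumptions made for illustration.

```python
import statistics

# Hypothetical sensor readings: None marks a missing value, and 250.0 is an
# implausible spike of the kind a faulty instrument might produce.
readings = [21.0, 22.5, None, 21.8, 250.0, 22.1, None, 21.4]


def clean_readings(values, threshold=5.0):
    """Return a copy with missing values and outliers replaced by the median."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    # Median absolute deviation: a spread estimate that the spike itself
    # cannot distort much, unlike the standard deviation.
    mad = statistics.median(abs(v - median) for v in present)

    cleaned = []
    for v in values:
        if v is None:
            cleaned.append(median)  # incomplete: impute the median
        elif mad > 0 and abs(v - median) / mad > threshold:
            cleaned.append(median)  # noisy: replace the outlier
        else:
            cleaned.append(v)
    return cleaned


print(clean_readings(readings))
```

Whether to impute, replace, or discard such values depends on the application; the point is only that these defects have to be handled explicitly before patterns are mined.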
2. Data Distribution:
3. Complex Data:
4. Performance:
6. Data Visualization:
There are many more challenges in data mining in addition to the problems mentioned above.
More problems come to light as the actual data mining process begins, and the success of data
mining relies on overcoming all these difficulties.