1. Data Mining Functionalities
5. Data Cleaning:
Data cleaning is one of the most important parts of data mining and plays a
significant part in building a model. The success or failure of a project often
depends on proper data cleaning.
With a well-cleaned dataset, even simple algorithms can often achieve good
results, which is especially beneficial in terms of computation when the
dataset is large.
Different types of data require different types of cleaning. However, a
systematic approach can always serve as a good starting point.
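As an illustration of what a systematic starting point might look like, here is
a minimal sketch in Python. The record layout (name, age), the function name
clean_records and the choice of the median as the fill value are assumptions
made for this example; they do not come from the notes.

```python
from statistics import median


def clean_records(records):
    """Return a cleaned copy of `records` (a list of dicts with 'name' and 'age')."""
    # Step 1: normalize text fields so that " Alice " and "alice" compare equal.
    normalized = [
        {"name": rec["name"].strip().lower(), "age": rec["age"]}
        for rec in records
    ]

    # Step 2: drop exact duplicates, keeping the first occurrence.
    seen = set()
    unique = []
    for rec in normalized:
        key = (rec["name"], rec["age"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)

    # Step 3: fill missing ages with the median of the known ages
    # (an assumed strategy; other data may call for a different one).
    known_ages = [rec["age"] for rec in unique if rec["age"] is not None]
    fill_value = median(known_ages) if known_ages else None
    for rec in unique:
        if rec["age"] is None:
            rec["age"] = fill_value
    return unique


raw = [
    {"name": " Alice ", "age": 30},
    {"name": "alice", "age": 30},   # duplicate once normalized
    {"name": "Bob", "age": None},   # missing value
    {"name": "Carol", "age": 40},
]
cleaned = clean_records(raw)
for rec in cleaned:
    print(rec)
```

The order of the steps matters here: normalizing before de-duplicating lets the
two spellings of the same name collapse into one record.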
5. Complex Data:
Real-world data is heterogeneous. It may include multimedia data (images,
audio and video), complex data, temporal data, spatial data, time series,
natural language text, etc. It is difficult to handle these various kinds of
data and extract the required information. New tools and methodologies are
being developed to extract relevant information.
(i) Complex data types: The database can include complex data elements,
objects with graphical data, spatial data, and temporal data. Mining all these
kinds of data on a single device is not practical.
(ii) Mining from Varied Sources: The data is gathered from different sources
on a network. The data sources may be of different kinds depending on how the
data is stored: structured, semi-structured or unstructured.
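To make the three kinds of sources concrete, the following sketch converts a
structured source (CSV), a semi-structured source (JSON) and an unstructured
source (free text) into one common record format before mining. The field
names, the sample data and the "<customer> bought <item>" text pattern are all
hypothetical and chosen only for illustration.

```python
import csv
import io
import json
import re


def from_structured(csv_text):
    # Structured data: fixed columns, one record per row.
    return [
        {"source": "structured", "customer": row["customer"], "item": row["item"]}
        for row in csv.DictReader(io.StringIO(csv_text))
    ]


def from_semi_structured(json_text):
    # Semi-structured data: nested, and fields may be missing.
    records = []
    for order in json.loads(json_text):
        for item in order.get("items", []):
            records.append(
                {"source": "semi-structured", "customer": order["customer"], "item": item}
            )
    return records


def from_unstructured(text):
    # Unstructured data: a naive pattern stands in for real text mining.
    pattern = re.compile(r"(\w+) bought (\w+)")
    return [
        {"source": "unstructured", "customer": m.group(1).lower(), "item": m.group(2).lower()}
        for m in pattern.finditer(text)
    ]


csv_text = "customer,item\nann,milk\n"
json_text = '[{"customer": "bob", "items": ["bread", "eggs"]}]'
note_text = "Carol bought tea. Later Dan bought jam."

records = (
    from_structured(csv_text)
    + from_semi_structured(json_text)
    + from_unstructured(note_text)
)
for rec in records:
    print(rec)
```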
6. Performance:
The performance of a data mining system depends on the efficiency of the
algorithms and techniques it uses. Algorithms and techniques that are not
well designed degrade the performance of the data mining process.
(i) Efficiency and Scalability of the Algorithms: The data mining algorithm
must be efficient and scalable to extract information from huge amounts of
data in the database.
(ii) Improvement of Mining Algorithms: Factors such as the enormous size of
the database, the wide distribution of the data and the complexity of data
mining methods motivate the development of parallel and distributed data
mining algorithms.
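The idea behind parallel and distributed mining can be sketched as "split the
data, mine each partition independently, then merge the partial results". The
example below counts item occurrences this way. The function names and sample
transactions are assumptions for illustration. Threads are used only to keep
the sketch simple; in CPython a real CPU-bound workload would more likely use
processes or separate machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_partition(transactions):
    """Count, for one partition, how many transactions contain each item."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # set(): count an item once per transaction
    return counts


def parallel_item_counts(transactions, n_partitions=2):
    # Split the data into roughly equal partitions.
    partitions = [transactions[i::n_partitions] for i in range(n_partitions)]

    # Mine each partition independently.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partial_counts = list(pool.map(count_partition, partitions))

    # Merge the partial results into the global counts.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


transactions = [
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread", "eggs", "milk"],
    ["eggs"],
]
totals = parallel_item_counts(transactions)
print(dict(totals))
```

Because the merge step only adds counts, the result does not depend on how the
transactions are split across partitions.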
8. Architecture of Data Mining:
Data mining refers to the detection and extraction of new patterns from already
collected data. It combines the fields of statistics and computer science,
aiming to discover patterns in very large datasets and then transform them
into a comprehensible structure for later use.
Architecture of Data Mining:
Basic Working:
1. It all starts when the user submits data mining requests. These
requests are then sent to the data mining engines for pattern evaluation.
2. These engines try to answer the query using the existing
database.
3. The metadata extracted is then sent for analysis to the data mining
engine, which sometimes interacts with the pattern evaluation modules to
determine the result.
4. The result is then sent to the front end and presented in an easily
understandable manner through a suitable interface.
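The four steps above can be sketched as a small pipeline. This is only a toy
model under stated assumptions: the in-memory DATABASE list, the request
fields (region, min_count) and all function names are hypothetical stand-ins
for the real components.

```python
from collections import Counter

# Hypothetical stand-in for the existing database.
DATABASE = [
    {"region": "north", "item": "milk"},
    {"region": "north", "item": "milk"},
    {"region": "north", "item": "bread"},
    {"region": "south", "item": "eggs"},
]


def database_server(request):
    # Step 2: retrieve the data relevant to the user's request.
    return [row for row in DATABASE if row["region"] == request["region"]]


def mining_engine(rows):
    # Step 3a: analyze the retrieved data (here: count items).
    return Counter(row["item"] for row in rows)


def pattern_evaluation(patterns, min_count):
    # Step 3b: keep only the patterns considered interesting.
    return {item: n for item, n in patterns.items() if n >= min_count}


def front_end(result):
    # Step 4: present the result in an easily readable form.
    return ", ".join(f"{item} ({n})" for item, n in sorted(result.items()))


def handle_request(request):
    # Step 1: the user's request enters the system here.
    rows = database_server(request)
    patterns = mining_engine(rows)
    result = pattern_evaluation(patterns, request["min_count"])
    return front_end(result)


report = handle_request({"region": "north", "min_count": 2})
print(report)
```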
The parts of the data mining architecture are described in detail below:
1. Data Sources:
Databases, the World Wide Web (WWW) and data warehouses are data sources.
The data in these sources may be in the form of plain text, spreadsheets or
other forms of media like photos or videos. The WWW is one of the biggest
sources of data.
2. Database Server:
The database server contains the actual data ready to be processed. It
performs the task of handling data retrieval as per the request of the user.
3. Data Mining Engine:
It is one of the core components of the data mining architecture. It
applies data mining techniques such as association, classification,
characterization, clustering, prediction, etc.
4. Pattern Evaluation Modules:
They are responsible for finding interesting patterns in the data, and
sometimes they also interact with the database servers to produce the
results of user requests.
5. Graphical User Interface:
Since the user cannot fully understand the complexity of the data mining
process, the graphical user interface helps the user to communicate
effectively with the data mining system.
6. Knowledge Base:
The knowledge base is an important part of the data mining engine and is
quite beneficial in guiding the search for result patterns. The data mining
engine may also sometimes get inputs from the knowledge base. The knowledge
base may contain data from user experiences. Its objective is to make the
result more accurate and reliable.
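One way to picture how the knowledge base guides the search is shown below. In
this sketch the mining engine proposes item pairs with their support, and the
pattern evaluation step uses the knowledge base to discard pairs that are too
rare or already known. The contents of KNOWLEDGE_BASE (a minimum support
threshold and a set of known patterns), the function names and the sample
transactions are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical knowledge base: a threshold plus patterns already known
# from user experience, which are therefore not interesting to report again.
KNOWLEDGE_BASE = {
    "min_support": 0.5,
    "known_patterns": {("bread", "butter")},
}


def mine_pairs(transactions):
    """Mining engine: compute the support of every item pair."""
    counts = Counter()
    for transaction in transactions:
        for pair in combinations(sorted(set(transaction)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: count / n for pair, count in counts.items()}


def evaluate(patterns, knowledge_base):
    """Pattern evaluation: keep frequent pairs that are not already known."""
    return {
        pair: support
        for pair, support in patterns.items()
        if support >= knowledge_base["min_support"]
        and pair not in knowledge_base["known_patterns"]
    }


transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "milk"],
]
candidates = mine_pairs(transactions)
interesting = evaluate(candidates, KNOWLEDGE_BASE)
print(interesting)
```

Keeping the threshold and the known patterns outside the mining code means the
same engine can return different results as the knowledge base is updated.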