Data mining is the process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.
It is the analysis of observational datasets to discover unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner. Applying data mining to scientific data involves several key areas, which are as follows −
Data warehouses and data preprocessing − Data warehouses are essential for information exchange and data mining. In the area of geospatial data, however, no true geospatial data warehouse exists today.
Creating such a warehouse requires finding ways to resolve geographic and temporal data incompatibilities, such as reconciling semantics, referencing systems, geometry, accuracy, and precision.
For scientific applications in general, methods are needed for integrating data from heterogeneous sources (including data covering different time periods) and for identifying events. For climate and ecosystem data, for instance (which are both spatial and temporal), the problem is that there are too many events in the spatial domain and too few in the temporal domain.
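As a minimal sketch of integrating sources that cover different time periods, the Python below assumes two hypothetical inputs: source A reports daily readings and source B reports monthly values. It aggregates A to monthly means and keeps only the months that both sources cover. The data, the choice of a monthly grid, and the function names `to_monthly` and `integrate` are illustrative assumptions, not a method from the text; real integration would also have to reconcile semantics, units, and referencing systems.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical source A: daily readings as (date, value) pairs.
daily_a = [
    (date(2020, 1, 1), 2.0),
    (date(2020, 1, 2), 4.0),
    (date(2020, 2, 1), 6.0),
    (date(2020, 3, 1), 8.0),
]

# Hypothetical source B: monthly values keyed by (year, month).
monthly_b = {(2020, 1): 3.5, (2020, 2): 5.0, (2020, 4): 9.0}


def to_monthly(daily):
    """Aggregate daily (date, value) pairs to monthly means keyed by (year, month)."""
    buckets = defaultdict(list)
    for day, value in daily:
        buckets[(day.year, day.month)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def integrate(daily, monthly):
    """Align both sources on a common monthly grid and keep only the months they share."""
    a = to_monthly(daily)
    shared = sorted(a.keys() & monthly.keys())
    return [(key, a[key], monthly[key]) for key in shared]


if __name__ == "__main__":
    for (year, month), a_val, b_val in integrate(daily_a, monthly_b):
        print(f"{year}-{month:02d}: A={a_val:.2f} B={b_val:.2f}")
```

Months present in only one source (March in A, April in B) are dropped here; whether to drop, interpolate, or flag such gaps is a design decision that depends on the analysis.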
Mining complex data types − Scientific data sets are heterogeneous in nature and typically contain semi-structured and unstructured data, including multimedia data and georeferenced stream data. Robust methods are needed for handling spatiotemporal data, related concept hierarchies, and complex geographic relationships (e.g., non-Euclidean distances).
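One concrete case of a non-Euclidean distance is distance on the Earth's surface. The sketch below, which assumes a spherical Earth with a mean radius of 6371 km, contrasts a naive Euclidean distance on raw latitude/longitude degrees with the great-circle (haversine) distance. The function names are hypothetical helpers, not part of any library.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def euclidean_degrees(p, q):
    """Naive straight-line distance on raw (lat, lon) degrees; ignores curvature."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


if __name__ == "__main__":
    pairs = {
        "equator, 10 deg apart": ((0.0, 0.0), (0.0, 10.0)),
        "80N, 10 deg apart": ((80.0, 0.0), (80.0, 10.0)),
        "across the 180th meridian": ((0.0, 179.0), (0.0, -179.0)),
    }
    for name, (p, q) in pairs.items():
        print(f"{name}: degrees={euclidean_degrees(p, q):.1f} km={haversine_km(p, q):.1f}")
```

The first two pairs are identical in raw degrees, yet the pair at 80°N is far closer on the ground; the third pair looks 358 degrees apart to the naive measure although the two points are only about 2 degrees of arc apart.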
Graph-based mining − It is often difficult or impossible to model many physical phenomena and processes because of the limitations of existing modeling approaches. Alternatively, labeled graphs can be used to capture many of the spatial, topological, geometric, and other relational characteristics present in scientific data sets.
In graph modeling, each object to be mined is represented by a vertex in a graph, and edges between vertices represent relationships between objects. For example, graphs can be used to model chemical structures and data generated by numerical simulations, such as fluid-flow simulations.
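A minimal sketch of such a labeled graph in Python follows, using a chemical structure as the example. The `LabeledGraph` class and the hydrogen-suppressed acetaldehyde molecule (two carbon vertices and one oxygen vertex, joined by a single and a double bond) are hypothetical illustrations; vertex labels stand for atoms and edge labels for bond types.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledGraph:
    """Undirected graph whose vertices and edges both carry labels (hypothetical helper)."""

    vertex_labels: dict = field(default_factory=dict)  # vertex id -> vertex label
    adjacency: dict = field(default_factory=dict)  # vertex id -> {neighbor id: edge label}

    def add_vertex(self, v, label):
        self.vertex_labels[v] = label
        self.adjacency.setdefault(v, {})

    def add_edge(self, u, v, label):
        # Record the edge from both endpoints because the graph is undirected.
        self.adjacency[u][v] = label
        self.adjacency[v][u] = label

    def neighbors(self, v):
        return sorted(self.adjacency[v])

    def edges(self):
        """Yield each undirected edge once as (u, v, label) with u < v."""
        for u, nbrs in self.adjacency.items():
            for v, label in nbrs.items():
                if u < v:
                    yield (u, v, label)


def build_acetaldehyde():
    """Hydrogen-suppressed acetaldehyde (CH3-CHO): C-C single bond, C=O double bond."""
    mol = LabeledGraph()
    for v, element in enumerate(["C", "C", "O"]):
        mol.add_vertex(v, element)
    mol.add_edge(0, 1, "single")
    mol.add_edge(1, 2, "double")
    return mol


if __name__ == "__main__":
    mol = build_acetaldehyde()
    for u, v, bond in mol.edges():
        print(mol.vertex_labels[u], bond, mol.vertex_labels[v])
```

Storing adjacency as a dictionary of dictionaries keeps neighbor lookups cheap and lets each edge carry its own label, which is what relational mining over such graphs needs.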
The success of graph modeling, however, depends on improvements in the scalability and efficiency of many classical data mining tasks, such as classification, frequent pattern mining, and clustering.
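To make frequent pattern mining over graphs more concrete, the sketch below computes only the first level of a frequent-subgraph search: the support of single labeled edges, that is, the number of graphs in a small database that contain each (vertex label, edge label, vertex label) pattern. The three-molecule database, the `min_support` threshold, and the function names are assumptions for illustration. Complete miners, such as Apriori-style or pattern-growth algorithms, extend frequent edges into larger subgraphs and need subgraph-isomorphism or canonical-labeling machinery, which is where the scalability concerns arise.

```python
from collections import Counter

# Hypothetical graph database: (vertex id -> element, [(u, v, bond label), ...]).
GRAPHS = [
    ({0: "C", 1: "C", 2: "O"}, [(0, 1, "single"), (1, 2, "double")]),  # acetaldehyde
    ({0: "C", 1: "C", 2: "O"}, [(0, 1, "single"), (1, 2, "single")]),  # ethanol
    ({0: "C", 1: "O"}, [(0, 1, "single")]),  # methanol
]


def edge_patterns(vertex_labels, edges):
    """Return the set of canonical single-edge patterns occurring in one graph."""
    patterns = set()
    for u, v, edge_label in edges:
        a, b = vertex_labels[u], vertex_labels[v]
        # Order the endpoint labels so C-O and O-C count as the same pattern.
        if a > b:
            a, b = b, a
        patterns.add((a, edge_label, b))
    return patterns


def frequent_edges(graphs, min_support):
    """Return {pattern: support} for edge patterns found in at least min_support graphs."""
    support = Counter()
    for vertex_labels, edges in graphs:
        # Using a set per graph means each graph adds at most 1 to a pattern's support.
        support.update(edge_patterns(vertex_labels, edges))
    return {p: n for p, n in support.items() if n >= min_support}


if __name__ == "__main__":
    for pattern, count in sorted(frequent_edges(GRAPHS, min_support=2).items()):
        print(pattern, count)
```

Support is counted per graph rather than per occurrence, which is the usual convention for transaction-style graph databases; counting occurrences instead would change which patterns pass the threshold.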
Visualization tools and domain-specific knowledge − High-level graphical user interfaces and visualization tools are needed for scientific data mining systems. These must be integrated with existing domain-specific information systems and database systems to guide researchers and general users in searching for patterns, interpreting and visualizing discovered patterns, and using discovered knowledge in their decision making.