
What are the various issues related to data mining?


Data mining is the process of discovering useful new correlations, patterns, and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies together with statistical and mathematical techniques. It is the analysis of observational datasets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.

The main issues related to data mining are as follows −

  • Privacy issues − This is a fundamental issue, and it is social rather than business or technological in nature. It concerns individual privacy. Data mining makes it possible to analyze routine business transactions and gather a significant amount of information about individuals' buying habits and preferences.

  • Data integrity issues − A key implementation challenge is integrating conflicting or redundant information from multiple sources. For instance, a bank may maintain credit card accounts in several different databases, and the address of an individual cardholder may be different in each. Software must translate data from one system to another and select the address most recently entered.

  • Relational or multidimensional database structure − A technical issue is whether it is better to set up a relational database structure or a multidimensional one. In a relational structure, data is stored in tables, enabling ad hoc queries. In a multidimensional structure, sets of cubes are arranged in arrays, with subsets created according to category. While multidimensional structures facilitate multidimensional data mining, relational structures have so far performed better in client/server environments.

  • Cost − The more powerful the data mining queries, the greater the utility of the information gleaned from the data, and the greater the pressure to increase the amount of data being collected and maintained. That in turn increases the pressure for faster, more powerful data mining queries, and therefore for larger, faster systems, which are more expensive.

  • Data quality − This is one of the biggest challenges for data mining. Data quality refers to the accuracy and integrity of the data. It can also be affected by the structure and consistency of the data being analyzed. The presence of duplicate records, the absence of data standards, the timeliness of updates, and human error can significantly reduce the effectiveness of the more complex data mining techniques.

  • Interoperability − This refers to the ability of a computer system or data to work with other systems or data using common standards and processes. For data mining, interoperability of databases and software is important to enable the search and analysis of several databases simultaneously and to ensure the compatibility of data mining activities across multiple agencies.
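
The data integrity issue above, where a cardholder's address differs between databases and the most recently entered one should win, can be sketched in Python. This is only an illustration under stated assumptions: the AddressRecord fields, the source names (cards_db_a, cards_db_b), and the sample rows are all hypothetical and do not come from any real banking system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AddressRecord:
    """One cardholder address as held by a single source system (hypothetical schema)."""
    cardholder_id: str
    source: str        # name of the database the record came from
    address: str
    entered_on: date   # date the address was entered in that system


def normalize(address: str) -> str:
    """Collapse case and whitespace so trivially different spellings compare equal."""
    return " ".join(address.lower().split())


def latest_addresses(records):
    """Return {cardholder_id: AddressRecord}, keeping the most recently entered record."""
    latest = {}
    for rec in records:
        current = latest.get(rec.cardholder_id)
        # On equal dates the record seen first is kept.
        if current is None or rec.entered_on > current.entered_on:
            latest[rec.cardholder_id] = rec
    return latest


def conflicting_cardholders(records):
    """Return the ids of cardholders whose sources disagree after normalization."""
    seen = {}
    for rec in records:
        seen.setdefault(rec.cardholder_id, set()).add(normalize(rec.address))
    return {cid for cid, addresses in seen.items() if len(addresses) > 1}


# Made-up sample data: two source databases holding the same two cardholders.
records = [
    AddressRecord("C001", "cards_db_a", "12 Oak Street", date(2019, 3, 1)),
    AddressRecord("C001", "cards_db_b", "48 Elm Avenue", date(2022, 7, 15)),
    AddressRecord("C002", "cards_db_a", "7 Pine Road", date(2021, 1, 9)),
    AddressRecord("C002", "cards_db_b", "7  pine road", date(2020, 5, 2)),
]

merged = latest_addresses(records)
conflicts = conflicting_cardholders(records)

for cid, rec in sorted(merged.items()):
    flag = "conflict resolved" if cid in conflicts else "sources agree"
    print(cid, rec.address, f"(from {rec.source}, {flag})")
```

Here the two C002 entries differ only in case and spacing, so they count as redundant rather than conflicting, while C001 is a genuine conflict settled by the entry date. A production system would likely need stronger address matching and a rule for records with equal or missing dates.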