The document describes data mining and the steps involved in the knowledge discovery process. It then discusses a proposed architecture for a university course database data mining system, including components like a database, data mining engine, and graphical user interface. Finally, it compares approaches for integrating a data mining system with a database/data warehouse system from no coupling to loose, semitight, and tight coupling, stating that tight coupling is most desirable as it provides an efficient, integrated information processing environment.
Questions in Data Mining
Sura rahim
1. What is data mining? Describe the steps involved in data mining when viewed as a process of knowledge discovery.
Data mining is the process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses, or other information repositories. The steps involved in data mining when viewed as a process of knowledge discovery are as follows:
   1. Data cleaning (to remove noise and inconsistent data)
   2. Data integration (where multiple data sources may be combined)
   3. Data selection (where data relevant to the analysis task are retrieved from the database)
   4. Data transformation (where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations, for instance)
   5. Data mining (an essential process where intelligent methods are applied in order to extract data patterns)
   6. Pattern evaluation (to identify the truly interesting patterns representing knowledge based on some interestingness measures)
   7. Knowledge presentation (where visualization and knowledge representation techniques are used to present the mined knowledge to the user)
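The seven steps can be read as stages of a small pipeline. The following is a minimal Python sketch rather than a definitive implementation: the record layout, the two sample sources, the GPA threshold, the support threshold, and every function name are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical raw records from two sources: (student, status, gpa).
registrar = [("ann", "undergraduate", 3.6), ("bob", "graduate", None), ("cy", "graduate", 3.9)]
department = [("dee", "undergraduate", 2.1), ("ann", "undergraduate", 3.6), ("eli", "graduate", 3.7)]

def clean(rows):
    # 1. Data cleaning: drop records with a missing GPA.
    return [r for r in rows if r[2] is not None]

def integrate(*sources):
    # 2. Data integration: combine sources, skipping duplicate records.
    merged = []
    for src in sources:
        for row in src:
            if row not in merged:
                merged.append(row)
    return merged

def select(rows):
    # 3. Data selection: keep only the attributes relevant to the task.
    return [(status, gpa) for _, status, gpa in rows]

def transform(rows, threshold=3.5):
    # 4. Data transformation: discretize GPA into "high" or "low".
    return [(status, "high" if gpa >= threshold else "low") for status, gpa in rows]

def mine(rows):
    # 5. Data mining: count how often each (status, GPA level) pair occurs.
    return Counter(rows)

def evaluate(patterns, min_support=2):
    # 6. Pattern evaluation: keep patterns that meet a minimum support count.
    return {p: n for p, n in patterns.items() if n >= min_support}

def present(patterns):
    # 7. Knowledge presentation: render the patterns as readable lines.
    return [f"{status} students with {level} GPA: {n}"
            for (status, level), n in sorted(patterns.items())]

data = integrate(clean(registrar), clean(department))
patterns = evaluate(mine(transform(select(data))))
report = present(patterns)
print(report)
```

With this sample data, only the (graduate, high) pair reaches the support threshold of 2, so it is the single pattern presented to the user.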
2. Suppose your task as a software engineer at Big-University is to design a data mining system to examine their university course database, which contains the following information: the name, address, and status (e.g., undergraduate or graduate) of each student, the courses taken, and their cumulative grade point average (GPA). Describe the architecture you would choose. What is the purpose of each component of this architecture?
A data mining architecture that can be used for this application would consist of the following major components:
   1. A database, data warehouse, or other information repository, which consists of the set of databases, data warehouses, spreadsheets, or other kinds of information repositories containing the student and course information.
   2. A database or data warehouse server, which fetches the relevant data based on the users' data mining requests.
   3. A knowledge base that contains the domain knowledge used to guide the search or to evaluate the interestingness of resulting patterns. For example, the knowledge base may contain concept hierarchies and metadata (e.g., describing data from multiple heterogeneous sources).
   4. A data mining engine, which consists of a set of functional modules for tasks such as classification, association, cluster analysis, and evolution and deviation analysis.
   5. A pattern evaluation module that works in tandem with the data mining modules by employing interestingness measures to help focus the search towards interesting patterns.
   6. A graphical user interface that provides the user with an interactive approach to the data mining system.
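To make the division of responsibilities concrete, the six components can be mocked up in a few lines of Python. This is only a sketch under stated assumptions: an in-memory SQLite database stands in for the repository and server, a plain function stands in for the graphical interface, and the table, column, and function names and the GPA threshold are all hypothetical.

```python
import sqlite3

# 1-2. Repository and database server: an in-memory SQLite database stands in
# for the university course database (table and column names are hypothetical).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE student (name TEXT, status TEXT, course TEXT, gpa REAL)")
db.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        ("Ann", "undergraduate", "CS101", 3.8),
        ("Bob", "undergraduate", "CS101", 3.6),
        ("Cy", "graduate", "CS500", 3.9),
        ("Dee", "graduate", "CS101", 2.9),
    ],
)

def fetch(status):
    """Database server: fetch only the data relevant to a mining request."""
    cur = db.execute("SELECT course, gpa FROM student WHERE status = ?", (status,))
    return cur.fetchall()

# 3. Knowledge base: a concept hierarchy mapping courses to a higher-level concept.
CONCEPT_HIERARCHY = {"CS101": "introductory", "CS500": "advanced"}

def mine_average_gpa(rows):
    """4. Data mining engine: one simple module that averages GPA per course level."""
    totals = {}
    for course, gpa in rows:
        level = CONCEPT_HIERARCHY[course]
        total, count = totals.get(level, (0.0, 0))
        totals[level] = (total + gpa, count + 1)
    return {level: total / count for level, (total, count) in totals.items()}

def evaluate(patterns, min_gpa=3.5):
    """5. Pattern evaluation: keep patterns that pass an interestingness threshold."""
    return {level: avg for level, avg in patterns.items() if avg >= min_gpa}

def user_interface(status):
    """6. User interface: a plain function standing in for the GUI."""
    patterns = evaluate(mine_average_gpa(fetch(status)))
    return [f"{status} / {level}: average GPA {avg:.2f}"
            for level, avg in sorted(patterns.items())]

result = user_interface("undergraduate")
print(result)
```

Each function touches only its neighbor in the chain, which mirrors how the server, engine, and evaluation module hand data to one another in the architecture described above.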
3. Describe the differences between the following approaches for the integration of a data mining system with a database or data warehouse system: no coupling, loose coupling, semitight coupling, and tight coupling. State which approach you think is the most popular, and why.
A good system architecture will enable the data mining system to make the best use of the software environment, accomplish data mining tasks in an efficient and timely manner, and exchange information with other information systems. A critical question in the design of a data mining (DM) system is how to integrate or couple the DM system with a database (DB) system and/or a data warehouse (DW) system. If a DM system works as a stand-alone system, there are no DB or DW systems with which it has to communicate. This simple scheme is called no coupling. When a DM system works in an environment that requires it to communicate with other information system components, such as DB and DW systems, the possible integration schemes are loose coupling, semitight coupling, and tight coupling. The four schemes are as follows:
1-No coupling: No coupling means that a DM system will not utilize any function of a DB or DW system. It may fetch data from a particular source (such as a file system), process data using some data mining algorithms, and then store the mining results in another file. Such a system suffers from several drawbacks. First, a DB system provides flexibility and efficiency at storing, accessing, and processing data. Without using a DB/DW system, a DM system may spend a substantial amount of time finding, collecting, and transforming data. Second, there are many tested, scalable algorithms implemented in DB and DW systems. Without any coupling to such systems, a DM system will need to use other tools to extract data, making it difficult to integrate such a system into an information processing environment. Thus, no coupling represents a poor design.
2-Loose coupling: Loose coupling means that a DM system will use some facilities of a DB or DW system. Loose coupling is better than no coupling because the DM system can fetch any portion of the data stored in databases or data warehouses by using query processing, indexing, and other system facilities. However, it is difficult for loose coupling to achieve high scalability and good performance with large data sets.
3-Semitight coupling: Semitight coupling is a compromise between loose and tight coupling. Semitight coupling means that, besides linking a DM system to a DB/DW system, efficient implementations of a few essential data mining primitives are provided in the DB/DW system. Moreover, some frequently used intermediate mining results can be precomputed and stored in the DB/DW system. Because these intermediate mining results are either precomputed or can be computed efficiently, this design will enhance the performance of a DM system.
4-Tight coupling: Tight coupling means that a DM system is smoothly integrated into the DB/DW system. DM, DB, and DW systems will evolve and integrate together as one information system with multiple functionalities. This will provide a uniform information processing environment. This approach is highly desirable because it facilitates efficient implementations of data mining functions, high system performance, and an integrated information processing environment.
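The four schemes can be contrasted on one toy task, counting enrollments per course. The sketch below is illustrative only and rests on several assumptions: SQLite stands in for the DB/DW system, an in-memory text buffer stands in for a flat file, the table and function names are hypothetical, and the tight coupling case is only approximated by pushing the whole task into a query, since SQLite has no built-in mining functions.

```python
import csv
import io
import sqlite3
from collections import Counter

# Hypothetical enrollment data: (student, course).
ROWS = [("ann", "CS101"), ("bob", "CS101"), ("cy", "CS500"), ("ann", "CS500"), ("dee", "CS101")]

def no_coupling(flat_file):
    # No coupling: read a flat file and count in application code; no DB involved.
    reader = csv.reader(flat_file)
    return Counter(course for _, course in reader)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE enrollment (student TEXT, course TEXT)")
db.executemany("INSERT INTO enrollment VALUES (?, ?)", ROWS)

def loose_coupling(conn):
    # Loose coupling: use the DB only to fetch data; mine it in application memory.
    rows = conn.execute("SELECT course FROM enrollment").fetchall()
    return Counter(course for (course,) in rows)

def semitight_coupling(conn):
    # Semitight coupling: the DB supplies a primitive (aggregation) and stores
    # the intermediate result so later mining runs can reuse it.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS course_count AS "
        "SELECT course, COUNT(*) AS n FROM enrollment GROUP BY course"
    )
    return Counter(dict(conn.execute("SELECT course, n FROM course_count").fetchall()))

def tight_coupling(conn, min_support):
    # Tight coupling (approximated): the whole task, including the support
    # threshold, is expressed as a single query the DB can optimize.
    query = "SELECT course, COUNT(*) FROM enrollment GROUP BY course HAVING COUNT(*) >= ?"
    return dict(conn.execute(query, (min_support,)).fetchall())

flat_file = io.StringIO("\n".join(",".join(r) for r in ROWS))
counts_file = no_coupling(flat_file)
counts_loose = loose_coupling(db)
counts_semi = semitight_coupling(db)
frequent = tight_coupling(db, min_support=3)
print(counts_file, counts_loose, counts_semi, frequent)
```

All four variants agree on the counts; what changes is how much of the work the database does, from none (file in, counts out) to all of it (a single query returning only the frequent courses).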