A data mining system consists of a data mining engine and a repository to store mining artifacts like models. Data is obtained from a database or file system. The data mining process involves deciding learning objectives, preparing data, choosing algorithms, building models, testing and refining models, and reporting or applying results. Common machine learning algorithms include supervised, unsupervised, semi-supervised, reinforcement, and transduction learning. Pattern recognition aims to classify data using prior knowledge or statistics.
Data Mining System
A typical data-mining system consists of:
--a data-mining engine
--a repository that persists the data-mining artifacts, such as the models, created in the process.
The actual data is obtained via a database connection or via a file-system API.

Building a data-mining model
1. Decide what you want to learn.
2. Select and prepare your data.
3. Choose mining tasks and configure the mining algorithms.
4. Build your data-mining model.
5. Test and refine the models.
6. Report findings or predict future outcomes.

Data Mining Process
Figure 2. Data mining steps.

Using data model and results
Once you've created a model, you can test that model and then apply it to additional data. Building, testing, and applying the model to additional data is an iterative process that, ideally, yields increasingly accurate models. Those models can then be saved in the repository (the MOR) and used either to explain data or to predict the outcome of new data in relation to your data-mining objective.

Data Mining
--Knowledge-Discovery in Databases (KDD)
--Searching large volumes of data for patterns.
--The nontrivial extraction of implicit, previously unknown, and potentially useful information from data.
--The science of extracting useful information from large data sets or databases.
--Uses computational techniques from statistics, machine learning, and pattern recognition.

Descriptive Statistics
--Collect data
--Classify data
--Summarize data
--Present data
Make inferences to draw conclusions
--Point and interval estimation
--Hypothesis testing
--Prediction

Machine Learning
--Concerned with the development of techniques which allow computers to "learn".
--Concerned with the algorithmic complexity of computational implementations.
--Many inference problems turn out to be NP-hard or harder.

Common Machine Learning Algorithms
--Supervised learning: prior knowledge
--Unsupervised learning: statistical regularity of the patterns
--Semi-supervised learning
--Reinforcement learning
--Transduction
--Learning to learn

Pattern Recognition
--The act of taking in raw data and taking an action based on the category of the data.
--Aims to classify data patterns based on prior knowledge or on statistical information.
--Based on availability of a training set: supervised and unsupervised learning.
--Two approaches: statistical (decision theory) and syntactic (structural).
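The distinction between supervised and unsupervised pattern recognition can be illustrated with a minimal Python sketch. It is not taken from the slides: the function names (fit_centroids, classify, two_means) and the sample values are hypothetical, and a nearest-centroid rule and a two-cluster mean split are used only as simple stand-ins for the two settings. The supervised part uses labels as prior knowledge; the unsupervised part relies only on the statistical regularity of the values.

```python
from statistics import mean


def fit_centroids(values, labels):
    """Supervised: use known labels to compute one centroid per category."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: mean(members) for label, members in groups.items()}


def classify(value, centroids):
    """Assign the category whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def two_means(values, iterations=10):
    """Unsupervised: split 1-D values into two clusters without any labels."""
    low, high = min(values), max(values)  # initial centers: the two extremes
    if low == high:
        raise ValueError("need at least two distinct values")
    for _ in range(iterations):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(low_group), mean(high_group)
    return low, high


# Hypothetical sample data, invented for illustration only.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ["small", "small", "small", "large", "large", "large"]

centroids = fit_centroids(values, labels)   # training set available
print(classify(1.5, centroids))             # prints "small"
print(two_means(values))                    # cluster centers found without labels
```

With labels, the model can name the category of a new value. Without labels, it can only report that the data falls into two groups and where their centers lie.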
Supervised Techniques
Classification:
--k-Nearest Neighbors
--Naive Bayes
--Classification Trees
--Discriminant Analysis
--Logistic Regression
--Neural Nets

Supervised Techniques
Prediction (Estimation):
--Regression
--Regression Trees
--k-Nearest Neighbors

Unsupervised Techniques
--Cluster Analysis
--Principal Components
--Association Rules
--Collaborative Filtering
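
k-Nearest Neighbors appears under both classification and prediction in the lists above. The following Python sketch shows why, under stated assumptions: the helper names (nearest, knn_classify, knn_predict), the Euclidean distance, the default k=3, and the sample rows are all hypothetical choices made for illustration, not something the slides specify. The same neighbor search feeds a majority vote for classification and an average for prediction.

```python
from collections import Counter
from statistics import mean


def nearest(train, query, k):
    """Return the k training rows closest to query (Euclidean distance)."""
    def distance(row):
        features = row[0]
        return sum((a - b) ** 2 for a, b in zip(features, query)) ** 0.5
    return sorted(train, key=distance)[:k]


def knn_classify(train, query, k=3):
    """Classification: majority vote among the k nearest labels."""
    votes = Counter(target for _, target in nearest(train, query, k))
    return votes.most_common(1)[0][0]


def knn_predict(train, query, k=3):
    """Prediction (estimation): average of the k nearest numeric targets."""
    return mean(target for _, target in nearest(train, query, k))


# Hypothetical training rows: (features, target). Invented for illustration.
labeled = [
    ((1.0, 1.0), "A"), ((1.2, 0.9), "A"), ((0.8, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.1, 4.8), "B"), ((4.9, 5.2), "B"),
]
numeric = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]

print(knn_classify(labeled, (1.1, 1.0)))   # prints "A"
print(knn_predict(numeric, (2.0,)))        # prints 20.0
```

In practice k and the distance measure are tuned during the "test and refine" step, typically by checking accuracy on data held out from training.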