Micro Project

Uploaded by

frankycoc667
Copyright
© All Rights Reserved
What is data mining?

Data mining is the extraction of previously unknown, valid, novel and understandable information or patterns from data held in repositories or sources such as:

> Databases
> Text files
> Social networks
> Computer simulations

The information obtained should be such that it can be used by any organization or enterprise for business decision making.

Why data mining

* Large volumes of data are collected and stored (warehoused) within organizations such as banks and e-commerce stores. The need may arise to explore this data and find possible solutions to known problems. These solutions may take the form of patterns found in previous data, and the knowledge obtained can improve decision making in organizations. This is why data mining is needed.

Applications of data mining

* Promotion analysis
* Image, video, speech

Components of data mining

* Knowledge discovery
Concrete information gleaned from known data: information you may not have known before but which is supported by recorded facts.

* Knowledge prediction
Uses known data to forecast future trends and events, for example stock market predictions.

Steps in data mining

1. Data integration
This involves combining data residing in different sources and providing users with a unified view of that data.

2. Data selection
This is the process of determining the appropriate data type and source, as well as suitable instruments for collecting the data.

3. Data cleaning
Data cleaning is the process of detecting and correcting corrupt or inaccurate records in a record set, table or database. It involves identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying or deleting the dirty data.

4. Data transformation
Data transformation converts a set of data values from the data format of a source system into the data format of a destination system.

5. Data mining
Here techniques are applied to extract the data or patterns of interest on which decisions will be based.

6. Pattern evaluation
In pattern evaluation, patterns are identified and analyzed against given measures.

7. Knowledge presentation
This is the final phase, in which the discovered knowledge is represented visually to the user. It uses understandable techniques to help users interpret the data mining results.

[Figure: Data mining diagram based on Knowledge Discovery in Databases]

Advantages of data mining

* Marketing or retailing
Data mining helps marketing companies build models from historical data to predict, for example, who will respond to a new marketing campaign. With these results, marketers can take an appropriate approach to selling profitable products to targeted customers. Appropriate production arrangements can be made based on the marketing analysis, so that customers can buy products frequently.

* Banking or finance
Data mining gives financial institutions information about loans and credit reporting. By building a model from historical customer data, a bank or other financial institution can distinguish good loans from bad ones. Data mining also helps banks detect fraudulent credit card transactions and so protect credit card owners.

* Manufacturing
By applying data mining to operational engineering data, manufacturers can detect faulty equipment and determine optimal control parameters.

* Governments
Data mining helps government agencies dig through and analyze records of financial transactions to build patterns that can detect money laundering or other criminal activities.

Disadvantages of data mining

* Privacy issues
The use of the internet through social networks, e-commerce, forums, blogs and so on raises many privacy concerns. People are afraid that their personal information will be collected and used in an unethical way that could cause them trouble.

* Security issues
Businesses hold information about their employees and customers, including social security numbers, birthdays and payroll details. If hackers access and steal this data, the amount of personal information exposed may lead to an unsafe environment, especially if the information involves finances.

* Misuse of information
Information may be exploited by unethical people or businesses to take advantage of vulnerable people or to discriminate against a group of people. Data mining techniques are also not perfectly accurate, so if inaccurate information is used for decision making it may cause serious consequences.

Current research

* Super computer data mining
The aim of the project is to produce a super-computing data mining resource for use by the United Kingdom academic community. It utilizes a number of advanced machine learning and statistical algorithms, and the ensemble machine approach will be used to exploit the large-scale parallelism possible in super-computing. This purpose is embodied in the following objectives:

* To develop a massively parallel approach for commonly used statistical and machine learning techniques for exploratory data analysis.
* To develop a massively parallel approach to the use of evolutionary computing techniques for feature creation and selection.
* To develop a massively parallel approach to the use of evolutionary computing techniques for data modeling.
* To develop a massively parallel approach to the use of ensemble machines for data modeling, consisting of many well-known machine learning algorithms.
* To develop an appropriate super-computing infrastructure to support the use of such advanced machine learning techniques with large datasets.

* Medical data mining
It is estimated that 150 million people have diabetes worldwide, and that this number may double by 2025. There is no cure for diabetes. However, the condition can be managed, and early treatment can minimize its complications. A key factor in providing early treatment is identifying those most at risk of complications at an early stage.
The data mining group at the University of East Anglia has been working in this area for some time on a collaborative project with St. Thomas Hospital, London.

* St. Thomas Hospital, London has stored patient information in a computerized clinical records system since 1973.
* In their research they identified factors associated with early mortality. Current research and teaching on outcomes in people with diabetes identifies cardiac risk factors as the most likely indicators of early mortality. The data mining study ran in parallel with an independent analysis of a cohort of 1000 patients with diabetes who were re-examined after 10 years. This analysis also identified peripheral neuropathy as the most important risk factor for premature death.

* Time series data mining of electricity usage patterns
The rollout is set to take place over the next decade and will result in over 27 million households being equipped with intelligent metering systems that can monitor electricity consumption at 15-minute intervals and facilitate easy communication of usage data.

Future research

In future it is highly likely that data mining will become predictive analysis. Data mining applications that will enrich human life in fields such as business, education, medicine, science and politics include:

* Data mining in security and privacy preservation. For example, recordings of electronic communication such as email logs and web logs have captured human processes.
* Challenges in mining financial data. For example, investors use models of asset prices to gain bigger profits.
* Detecting ecosystem disturbances.
* Distributed data mining. Distributed algorithms are developed for tasks such as association analysis and parallel decision tree construction.
* Text mining. An example is opinion or questionnaire mining, where the objective is to obtain useful information.
* Image mining. An example is the classification of retinal image data and magnetic resonance imaging scan data to identify disorders.
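The association analysis mentioned in the list above can be made concrete with a small example. The sketch below is only an illustration and is not taken from this report: the shopping baskets and the function names `support` and `confidence` are hypothetical. It computes two standard measures of an association rule, namely how often a set of items appears together (support) and how often the consequent appears when the antecedent does (confidence).

```python
from typing import FrozenSet, Iterable, List

# Hypothetical shopping baskets, invented purely for illustration.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    items = frozenset(itemset)
    hits = sum(1 for basket in transactions if items <= basket)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Support of both sides together divided by support of the antecedent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never applies, so report zero confidence
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / base


if __name__ == "__main__":
    print(support({"bread", "milk"}, TRANSACTIONS))
    print(confidence({"bread"}, {"milk"}, TRANSACTIONS))
```

With these four baskets, bread and milk appear together in two of them, and two of the three baskets containing bread also contain milk. A distributed version would split the transactions across machines and combine the counts, which is an assumption about how such an algorithm might be organized rather than a description of any specific system.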
Conclusion

Information extracted through data mining is valuable to organizations in many industries, such as the health sector, logistics, marketing, finance and engineering. Through it businesses become information brokers: they can weed out fraud and bad customers while targeting good business customers, promising markets and cross-selling.
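The seven steps listed under "Steps in data mining" can be sketched end to end on a toy dataset. This is a minimal illustration under stated assumptions, not an implementation described in this report: the two record sources, the field names and every function name are hypothetical, and the "mining" step is deliberately reduced to a simple count.

```python
from collections import Counter

# Two hypothetical data sources, invented for illustration only.
STORE_A = [
    {"customer": "ann", "amount": "120", "region": "north"},
    {"customer": "bob", "amount": "", "region": "south"},
]
STORE_B = [
    {"customer": "cat", "amount": "80", "region": "north"},
    {"customer": "dan", "amount": "300", "region": "south"},
    {"customer": "eve", "amount": "250", "region": "south"},
]


def integrate(*sources):
    """Step 1: combine records from several sources into one view."""
    return [dict(row) for source in sources for row in source]


def select(rows, fields):
    """Step 2: keep only the fields relevant to the question."""
    return [{field: row[field] for field in fields} for row in rows]


def clean(rows):
    """Step 3: drop records whose amount is missing or not numeric."""
    return [row for row in rows if row["amount"].strip().isdigit()]


def transform(rows):
    """Step 4: convert amounts from text to integers."""
    return [{**row, "amount": int(row["amount"])} for row in rows]


def mine(rows, threshold):
    """Step 5: count high-value purchases per region (a very simple pattern)."""
    return Counter(row["region"] for row in rows if row["amount"] >= threshold)


def evaluate(pattern, min_count):
    """Step 6: keep only regions whose count meets the chosen measure."""
    return {region: n for region, n in pattern.items() if n >= min_count}


def present(findings):
    """Step 7: render the findings as readable lines."""
    return [f"{region}: {n} high-value purchases"
            for region, n in sorted(findings.items())]


def run_pipeline():
    rows = integrate(STORE_A, STORE_B)
    rows = select(rows, ["amount", "region"])
    rows = transform(clean(rows))
    return present(evaluate(mine(rows, threshold=100), min_count=2))


if __name__ == "__main__":
    for line in run_pipeline():
        print(line)
```

Each function maps to one step so that the order of the process stays visible. In practice each step would be far more involved, and the threshold and minimum count used here are arbitrary values chosen for the example.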
