
Data Mining: What is Data Mining?

Overview
Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

Continuous Innovation
Although data mining is a relatively new term, the technology is not. Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost.

Example
For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyze local buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer. Further analysis showed that these shoppers typically did their weekly grocery shopping on Saturdays. On Thursdays, however, they only bought a few items. The retailer concluded that they purchased the beer to have it available for the upcoming weekend. The grocery chain could use this newly discovered information in various ways to increase revenue. For example, they could move the beer display closer to the diaper display. And, they could make sure beer and diapers were sold at full price on Thursdays.

Data, Information, and Knowledge


Data
Data are any facts, numbers, or text that can be processed by a computer. Today, organizations are accumulating vast and growing amounts of data in different formats and different databases. This includes:

operational or transactional data, such as sales, cost, inventory, payroll, and accounting
nonoperational data, such as industry sales, forecast data, and macroeconomic data
meta data - data about the data itself, such as logical database design or data dictionary definitions

Information
The patterns, associations, or relationships among all this data can provide information. For example, analysis of retail point of sale transaction data can yield information on which products are selling and when.

Knowledge
Information can be converted into knowledge about historical patterns and future trends. For example, summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behaviour. Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts.

Data Warehouses
Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term, although the concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository of all organizational data. Centralization of data is needed to maximize user access and analysis. Dramatic technological advances are making this vision a reality for many companies. And, equally dramatic advances in data analysis software are allowing users to access this data freely. The data analysis software is what supports data mining.
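As a minimal illustration of the step from raw data to information, the following Python sketch summarizes point-of-sale records into units sold per product per weekday. The transaction records and field layout are made up for the example.

```python
from collections import defaultdict

# Hypothetical point-of-sale records: (weekday, product, quantity).
transactions = [
    ("Thu", "diapers", 2), ("Thu", "beer", 1),
    ("Sat", "diapers", 1), ("Sat", "beer", 2),
    ("Sat", "milk", 3), ("Mon", "milk", 1),
]

# Summarize raw data into information: units sold per (weekday, product).
sales = defaultdict(int)
for day, product, qty in transactions:
    sales[(day, product)] += qty

print(sales[("Sat", "beer")])  # 2
```

The same aggregation, read in light of promotions or seasons, is what the text calls converting information into knowledge.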

What can data mining do?


Data mining is primarily used today by companies with a strong consumer focus - retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics. And, it enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables them to "drill down" into summary information to view detail transactional data.

With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments. For example, Blockbuster Entertainment mines its video rental history database to recommend rentals to individual customers. American Express can suggest products to its cardholders based on analysis of their monthly expenditures.

Wal-Mart is pioneering massive data mining to transform its supplier relationships. Wal-Mart captures point-of-sale transactions from over 2,900 stores in 6 countries and continuously transmits this data to its massive 7.5-terabyte Teradata data warehouse. Wal-Mart allows more than 3,500 suppliers to access data on their products and perform data analyses. These suppliers use this data to identify customer buying patterns at the store display level. They use this information to manage local store inventory and identify new merchandising opportunities. In 1995, Wal-Mart computers processed over 1 million complex data queries.

The National Basketball Association (NBA) is exploring a data mining application that can be used in conjunction with image recordings of basketball games. The Advanced Scout software analyzes the movements of players to help coaches orchestrate plays and strategies. For example, an analysis of the play-by-play sheet of the game played between the New York Knicks and the Cleveland Cavaliers on January 6, 1995 reveals that when Mark Price played the Guard position, John Williams attempted four jump shots and made each one! Advanced Scout not only finds this pattern, but explains that it is interesting because it differs considerably from the average shooting percentage of 49.30% for the Cavaliers during that game. By using the NBA universal clock, a coach can automatically bring up the video clips showing each of the jump shots attempted by Williams with Price on the floor, without needing to comb through hours of video footage. Those clips show a very successful pick-and-roll play in which Price draws the Knicks defence and then finds Williams for an open jump shot.

How does data mining work?


While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Several types of analytical software are available: statistical, machine learning, and neural networks. Generally, any of four types of relationships are sought:

Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.

Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.

Associations: Data can be mined to identify associations. The beer-diaper example is an example of associative mining.

Sequential patterns: Data is mined to anticipate behaviour patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.
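The support and confidence behind an association such as beer-and-diapers can be sketched in a few lines of Python. The baskets below are made up for illustration; this is not any particular mining product's algorithm.

```python
# Hypothetical market baskets: the set of items bought on each shopping trip.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
]

def support(itemset):
    """Fraction of baskets that contain every item in the itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also
    contain the consequent: support(A and C) / support(A)."""
    return support(antecedent | consequent) / support(antecedent)

print(round(support({"diapers", "beer"}), 2))       # 0.5
print(round(confidence({"diapers"}, {"beer"}), 2))  # 0.67
```

A rule like "diapers => beer" is reported when both its support and its confidence clear thresholds chosen by the analyst.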

Data mining consists of five major elements:


Extract, transform, and load transaction data onto the data warehouse system.
Store and manage the data in a multidimensional database system.
Provide data access to business analysts and information technology professionals.
Analyze the data by application software.
Present the data in a useful format, such as a graph or table.
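The five elements can be sketched end to end as a toy pipeline. The records, field names, and the dictionary standing in for the multidimensional store are all illustrative assumptions.

```python
from collections import defaultdict

# Raw transaction lines, as they might arrive from an operational system.
raw = ["2024-01-05,beer,4.00", "2024-01-05,diapers,9.50", "2024-01-06,beer,4.00"]

# 1. Extract, transform, and load into the "warehouse".
warehouse = [dict(zip(("date", "product", "price"), line.split(","))) for line in raw]

# 2. Store and manage: index revenue by the (date, product) dimensions.
cube = defaultdict(float)
for row in warehouse:
    cube[(row["date"], row["product"])] += float(row["price"])

# 3-4. Provide access and analyze: roll up to total revenue per product.
revenue = defaultdict(float)
for (_, product), amount in cube.items():
    revenue[product] += amount

# 5. Present the data in a useful format.
for product, amount in sorted(revenue.items()):
    print(f"{product}: {amount:.2f}")
```

A real deployment replaces each step with dedicated tooling (an ETL job, an OLAP store, a reporting front end), but the flow of the five elements is the same.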

Different levels of analysis are available:

Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.

Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.

Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID). CART and CHAID are decision tree techniques used for classification of a dataset. They provide a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. CART segments a dataset by creating 2-way splits while CHAID segments using chi square tests to create multi-way splits. CART typically requires less data preparation than CHAID.

Nearest neighbour method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k >= 1). Sometimes called the k-nearest neighbour technique.

Rule induction: The extraction of useful if-then rules from data based on statistical significance.

Data visualization: The visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.
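A minimal sketch of the nearest neighbour method, assuming squared-Euclidean distance and a made-up historical dataset of (age, monthly spend) records labelled with a customer segment:

```python
from collections import Counter

def k_nearest_neighbour(history, point, k=3):
    """Classify `point` by majority vote among the k most similar records."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(history, key=lambda rec: dist(rec[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical historical dataset: ((age, monthly_spend), segment).
history = [
    ((25, 40), "budget"), ((30, 55), "budget"),
    ((45, 300), "premium"), ((50, 280), "premium"), ((48, 310), "premium"),
]

print(k_nearest_neighbour(history, (47, 290)))  # premium
```

In practice the features would be scaled first, since raw units (years vs. dollars) otherwise dominate the distance.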

What technological infrastructure is required?


Today, data mining applications are available on all size systems for mainframe, client/server, and PC platforms. System prices range from several thousand dollars for the smallest applications up to $1 million a terabyte for the largest. Enterprise-wide applications generally range in size from 10 gigabytes to over 11 terabytes. NCR has the capacity to deliver applications exceeding 100 terabytes. There are two critical technological drivers:

Size of the database: the more data being processed and maintained, the more powerful the system required. Query complexity: the more complex the queries and the greater the number of queries being processed, the more powerful the system required.

Relational database storage and management technology is adequate for many data mining applications less than 50 gigabytes. However, this infrastructure needs to be significantly enhanced to support larger applications. Some vendors have added extensive indexing capabilities to improve query performance. Others use new hardware architectures such as Massively Parallel Processors (MPP) to achieve order-of-magnitude improvements in query time. For example, MPP systems from NCR link hundreds of high-speed Pentium processors to achieve performance levels exceeding those of the largest supercomputers.

The data mining process


Data mining is an iterative process that typically involves the following phases:

Problem definition
A data mining project starts with the understanding of the business problem. Data mining experts, business experts, and domain experts work closely together to define the project objectives and the requirements from a business perspective. The project objective is then translated into a data mining problem definition. In the problem definition phase, data mining tools are not yet required.

Data exploration
Domain experts understand the meaning of the metadata. They collect, describe, and explore the data. They also identify quality problems in the data. A frequent exchange with the data mining experts and the business experts from the problem definition phase is vital. In the data exploration phase, traditional data analysis tools, for example, statistics, are used to explore the data.

Data preparation
Domain experts build the data model for the modeling process. They collect, cleanse, and format the data because some of the mining functions accept data only in a certain format. They also create new derived attributes, for example, an average value. In the data preparation phase, data is adjusted multiple times in no prescribed order. Preparing the data for the modeling tool by selecting tables, records, and attributes is a typical task in this phase. The meaning of the data is not changed.

Modeling
Data mining experts select and apply various mining functions because you can use different mining functions for the same type of data mining problem. Some of the mining functions require specific data types. The data mining experts must assess each model. In the modeling phase, a frequent exchange with the domain experts from the data preparation phase is required. The modeling phase and the evaluation phase are coupled. They can be repeated several times to change parameters until optimal values are achieved. When the final modeling phase is completed, a model of high quality has been built.

Evaluation
Data mining experts evaluate the model. If the model does not satisfy their expectations, they go back to the modeling phase and rebuild the model by changing its parameters until optimal values are achieved. When they are finally satisfied with the model, they can extract business explanations and evaluate the following questions: Does the model achieve the business objective? Have all business issues been considered? At the end of the evaluation phase, the data mining experts decide how to use the data mining results.

Deployment
Data mining experts use the mining results by exporting the results into database tables or into other applications, for example, spreadsheets.

The Intelligent Miner products assist you in following this process. You can apply the functions of the Intelligent Miner products independently, iteratively, or in combination. The following figure shows the phases of the Cross Industry Standard Process for Data Mining (CRISP-DM) process model.

Figure 1. The CRISP-DM process model
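The coupled modeling and evaluation phases amount to a loop: change parameters, rebuild, score, repeat. The threshold model, scoring function, and validation data below are purely illustrative, not part of any Intelligent Miner workflow.

```python
def evaluate(threshold, validation):
    """Score a hypothetical one-parameter model: predict True when the
    input value is at or above the threshold; return fraction correct."""
    return sum((x >= threshold) == label for x, label in validation) / len(validation)

# Made-up hold-out cases: (model input value, true label).
validation = [(0.9, True), (0.8, True), (0.3, False), (0.6, True), (0.2, False)]

best_param, best_score = None, 0.0
for param in (0.2, 0.4, 0.5, 0.7):      # modeling: vary the parameter
    score = evaluate(param, validation)  # evaluation: assess each model
    if score > best_score:
        best_param, best_score = param, score

print(best_param, best_score)  # the model carried forward to deployment
```

Once the score satisfies the business objective, the loop ends and the chosen model moves to the deployment phase.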

IM Modeling helps you to select the input data, explore the data, transform the data, and mine the data. With IM Visualization you can display the data mining results to analyze and interpret them. With IM Scoring, you can apply the model that you have created with IM Modeling.

E-CRM

Introduction
In today's world, a company can survive only if it can manage to keep its customers happy. Promising the latest and top-class services to customers, building a customer-friendly environment, and using other means to maintain customer attention have now become the top priorities for any company that wants to make it big in the market. As technology advances, more people all over the world have started buying and selling over the Internet. As a consequence, companies also have to give customers a good and easy online environment. The result is nothing but E-CRM.

Definition
E-CRM provides companies with the means to conduct interactive, personalized, and relevant communication with customers across both electronic and traditional channels. It utilizes a complete view of the customer to make decisions about messaging, offers, and channel delivery. It synchronizes communication across otherwise disjointed customer-facing systems.

Framework of E-CRM

[Framework diagram: a communication strategy is planned, delivered, and optimized in a two-way dialogue that synchronizes the channels - point of sale, sales force automation, call centre, direct mail, web, and e-mail - around a central customer data warehouse, with recommendations delivered in batch.]

Key Features of E-CRM

1. Driven by a data warehouse: A data warehouse is a large database that can store huge amounts of data. The whole concept of E-CRM is based on managing this data and utilizing it properly.

2. Focused on consistent metrics to assess customer actions across channels: It uses complex mathematical models to find out more about the customer across the different channels that he uses - web, call centre, etc.

3. Built to accommodate the new market dynamics that place the customer in control: E-CRM is capable of utilizing information about changing market conditions and customer expectations to deliver a better customer experience.

4. Focused on the most profitable customers: E-CRM is structured to identify a customer's profitability and to determine effective investment allocation decisions accordingly, so that the most profitable customers can be identified and retained.
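Feature 4 can be sketched as a simple profitability ranking over hypothetical per-customer revenue and cost-to-serve figures; real E-CRM systems compute far richer metrics, but the allocation logic starts here.

```python
# Hypothetical per-customer revenue and cost-to-serve records.
customers = {
    "C001": {"revenue": 1200.0, "cost": 300.0},
    "C002": {"revenue": 400.0,  "cost": 350.0},
    "C003": {"revenue": 900.0,  "cost": 200.0},
}

# Profitability per customer, then rank to guide investment allocation.
profit = {cid: rec["revenue"] - rec["cost"] for cid, rec in customers.items()}
ranked = sorted(profit, key=profit.get, reverse=True)

print(ranked)  # most profitable customers first
```

Marketing spend is then directed at the customers at the top of the ranking, the ones worth identifying and retaining.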

Six Es in E-CRM
Businesses must address the six Es in E-CRM to optimise the value of the relationship between companies and their customers. They are:

Electronic: New electronic channels such as the web and personalized e-messaging have become the medium for fast, interactive, and economic customer communication, challenging companies to keep pace with this increased velocity.

Enterprise: Through E-CRM, a company gains the means to touch and shape a customer's experience across the entire organization, reaching beyond just the bounds of marketing to sales, services, and corner offices whose occupants need to understand and assess customer behaviour. It relies heavily on the construction and maintenance of a data warehouse that provides consolidated, detailed views of individual customers, cross-channel customer behaviour, and communication history.

Empowerment: E-CRM strategies must be structured to accommodate customers who have the power to decide when and how to communicate with the company, and through which channel. With the ability to opt in or opt out, the consumer decides which firms earn the privilege. In light of this new consumer empowerment, an E-CRM solution must be structured to deliver timely, pertinent, valuable information that a consumer accepts in exchange for his or her attention.

Economics: Too many companies execute customer communication strategies with little effort or ability to understand the economics of customer relationships and channel delivery choices. Yet customer economics drives smart asset allocation decisions, directing dollars and efforts at the individuals likely to provide the greatest return on customer communication initiatives.

Evaluation: Understanding customer economics relies on a company's ability to attribute customer behaviour to marketing programs, evaluate customer interactions along the various customer touch-point channels, and compare anticipated ROI (Return on Investment) against actual returns through customer analytic reporting. Evaluation of results allows companies to continuously refine and improve efforts to optimize the relationships between companies and their customers.

External Information: The use of consumer-sanctioned external information can be employed to further understand customer needs. This information can be gained from such sources as third-party information networks and web page profiler applications, under the condition that companies adhere to consumer opt-in rules and privacy concerns.

Similarities between CRM and E-CRM:

Characteristics        CRM & E-CRM
Objective              They make the companies closer to the customer.
Levels of Interaction  They provide the best interaction between marketing, sales, services, and support.
Media Usage            The communication media are phone, web, e-mail, fax, mail, etc.
Focus                  They eliminate and reduce the disconnections between customer and company relationships. They both improve the reality and perception of personalization.

Architecture Diagram:

[Architecture diagram: legacy systems and external sources feed customer analytic software and a customer database holding metrics, campaign history, and responses; campaign management and campaign optimising software delivers recommendations across cross-channel customer touch points (call centre, computer telephone, e-messaging, fax, web); Internet response management returns optimising information to the customer database.]

EDI Manager: This is useful for systems that use EDI rather than XML. When an incoming EDI transaction is recognized, it is translated to a PeopleSoft business document and then processed.

Open Query: This is a tool that allows third-party applications to communicate with PeopleSoft. This is a classic representation of a well-functioning internet architecture. It is mature, ready to work with third-party systems, and as a whole sophisticated enough for any sized enterprise.

Occupied Real Estate: The pure internet application normally rests on the server with the browser as a zero-code client. The web-enabled content application needs applets and applications downloaded to the desktop to carry out a specific function.
