
Data Mining with Microsoft SQL Server 2008

This document describes creating data mining models in SQL Server 2008. It includes steps to create views of an existing database, set up a data mining project, create decision tree and naive bayes models, and view model accuracy charts and predictions.

Uploaded by

Leandro Tg
Copyright
© Attribution Non-Commercial (BY-NC)

Microsoft Virtual Labs

Data Mining with Microsoft SQL Server 2008

Deploying System Center Configuration Manager (SCCM) 2007

Table of Contents
Data Mining with Microsoft SQL Server 2008
Exercise 1 Creating Data Mining Models
Exercise 2 Viewing Mining Accuracy Charts
Exercise 3 Creating a Prediction Query
Exercise 4 Creating a Time Series Model


Data Mining with Microsoft SQL Server 2008


Objectives
After completing this lab, you will be better able to:
- Create Decision Tree and Naive Bayes data-mining models
- View mining accuracy charts
- Create a prediction query
- Create a Time Series model

Scenario

Adventure Works Cycles, a bicycle manufacturing company, uses business analytics to better understand its customer base. The company plans to analyze and improve the performance of its bicycle retail sector. Over time, the company has collected information about past customers and sales. It now wants to use this information to gain insights about its customers.

Estimated Time to Complete This Lab: 90 Minutes

Computers used in this Lab: SQL2008

The password for the Administrator account on all computers in this lab is: Pa$$w0rd


Exercise 1 Creating Data Mining Models


Scenario
In this exercise, you will develop an Analysis Services solution by using the Microsoft Business Intelligence Development Studio environment. Business Intelligence Development Studio is an environment based on the Microsoft Visual Studio environment. Business Intelligence Development Studio provides an integrated development environment for designing, testing, editing, and deploying projects to an Analysis Services instance. You will create and view a data mining structure with Decision Trees and Naive Bayes data mining models.

Complete the following tasks on: SQL2008

1. Create views in the AdventureWorksDW database
a. Click Start, and then click Computer.
b. Browse to the C:\SQLHOLS\Data Mining\Starter folder.
c. Double-click Setup.cmd.
d. Wait for the Command Prompt window to close before proceeding to the next procedure.

2. Create an Analysis Services project
a. Click Start, point to All Programs, click Microsoft SQL Server 2008, right-click SQL Server Business Intelligence Development Studio, and click Run as administrator. When prompted, click Continue.
b. If prompted to choose a default environment setting, choose Business Intelligence Settings, and then click Start Visual Studio.
c. On the File menu, point to New, and then click Project.
d. In the New Project dialog box, in the Project Types pane, click Business Intelligence Projects.
e. In the Templates pane, click Analysis Services Project.
f. In the Name box, type DM Exercise 1
g. In the Location box, type C:\SQLHOLS\Data Mining\Starter\
h. Clear the Create directory for Solution checkbox, and then click OK.
Note: The project is created in a new solution. A solution is the largest unit of management in the Business Intelligence Development Studio environment. Each solution contains one or more projects. An Analysis Services project is a group of related files that contain the XML code for all of the objects in an Analysis Services database. You can view the solution and its projects in the Solution Explorer window on the right-hand side in Business Intelligence Development Studio. If Solution Explorer is not visible, you can view it by selecting the View, Solution Explorer menu item (or the keyboard shortcut CTRL+ALT+L).
i. In Solution Explorer, right-click the DM Exercise 1 project, and then click Properties.
j. In the DM Exercise 1 Property Pages dialog box, under Configuration Properties, click Deployment.
k. In the right pane, in the Deployment Mode drop-down list, click Deploy All, and then click OK.

Note: You can configure the build, debugging, and deployment properties of an Analysis Services project.

3. Create a data source
a. In Solution Explorer, under the DM Exercise 1 project, right-click the Data Sources folder, and then click New Data Source.
b. In the Data Source Wizard dialog box, on the Welcome to the Data Source Wizard page, click Next.
c. On the Select how to define the connection page, ensure that the Create a data source based on an existing or new connection option is selected, and then click New.
d. In the Connection Manager dialog box, in the Provider drop-down list at the top of the page, click SqlClient Data Provider in the .Net Providers folder, and then click OK.
e. In the Server name box, type (local)
f. Under Log on to the server, click Use Windows Authentication.
g. In the Select or enter a database name drop-down list, click AdventureWorksDW.
h. Click Test Connection, and then click OK to dismiss the message box.
i. In the Connection Manager dialog box, click OK.
j. In the Data Source Wizard dialog box, on the Select how to define the connection page, verify that (local).AdventureWorksDW is selected, and then click Next.
k. On the Impersonation Information page, select Use the service account, and then click Next.
l. On the Completing the Data Source Wizard page, leave the default data source name Adventure Works DW unchanged, and then click Finish.
m. You have now set up the connection information for the database. You must now define the schema information that you want to use in the solution. You do this by creating a Data Source View.

4. Create a Data Source View
a. In Solution Explorer, under the DM Exercise 1 project, right-click the Data Source Views folder, and on the shortcut menu, click New Data Source View.
b. In the Data Source View Wizard dialog box, on the Welcome to the Data Source View Wizard page, click Next.
c. On the Select a Data Source page, in the Relational data sources pane, verify that Adventure Works DW is selected, and then click Next.
Note: At this point, Analysis Services may take a few moments to read the database schema.
d. In this project, the Data Source View is not based on a table; instead, it is based on a view. On the Select Tables and Views page, double-click vDMLabCustomerTrain to add this view to the Included objects list, and then click Next.
Note: You may need to expand the Name column, or the entire dialog box, to be able to select vDMLabCustomerTrain.
e. On the Completing the Wizard page, in the Name box, type Customers and then click Finish. The Data Source View Designer will open. The Data Source View Designer is a graphical representation of the data schema that you have defined.
f. Right-click the vDMLabCustomerTrain table, and then click Explore Data.
Note: Analysis Services may take a few moments to read the data.

g. This opens a new window in which you can view the data for the table. If you want, you can make the tab into a dockable floating window instead. To do this, right-click the window tab, and then click Floating or Dockable.
h. In the Explore vDMLabCustomerTrain Table window, click the Chart tab.
Note: The values in the charts show the relationship between the number of records sampled for different properties. These values are based on the sampling that was taken when you clicked Explore Data. You can configure whether the sampling should come from the first rows returned or from random rows. You can also configure how many rows you want to sample.
i. In the Explore vDMLabCustomerTrain Table window, click the Table tab, scroll to view the data, and then close the Explore vDMLabCustomerTrain Table window.

5. Create a Data Mining Structure with a Decision Trees mining model
a. In Solution Explorer, under the DM Exercise 1 database, right-click the Mining Structures folder, and then click New Mining Structure.
b. In the Data Mining Wizard, on the Welcome to the Data Mining Wizard page, click Next.
Note: The Data Mining Wizard is the starting point for all data mining operations.
c. On the Select the Definition Method page, select From existing relational database or data warehouse, and then click Next.
d. On the Create the Data Mining Structure page, under Create mining structure with a mining model option, verify that Microsoft Decision Trees is selected, and then click Next.
e. On the Select Data Source View page, in the Available data source views pane, verify that the Customers data source view is selected, and then click Next.
f. On the Specify Table Types page, in the Input tables pane, in the vDMLabCustomerTrain row, verify that the Case check box is selected, and then click Next.
g. On the Specify the Training Data page, in the Mining model structure pane, select or deselect each cell by selecting or clearing the check box as shown in the figure below, and then click Next.

h. Because CustomerKey is the primary key of the source table, the Data Mining Wizard has automatically selected it as the key. The key identifies the cases in the mining model. The attributes selected as Input are analyzed to determine their relationship and influence on the attribute selected as Predictable.
Note: The CustomerKey, FirstName, and LastName columns must not be selected as Input or Predictable columns.
i. On the Specify Columns Content and Data Type page, review the Content Type column for all numeric rows, and then click Detect.
j. When the detection is complete, notice that the NumberCarsOwned and NumberChildrenAtHome fields have been changed from Continuous to Discrete, and then click Next.
Note: The Content and Data Type page shows the Data Type determined from the source data. When you click Detect, Analysis Services scans numeric fields to determine if they are continuous or discrete data. After the detection has occurred, the interface provides you with the flexibility to manually edit both the Data Type and Content Type fields.
k. On the Split data into training and testing sets page, ensure that Percentage of testing data is set to 30%, and then click Next.
Note: SQL Server 2008 Analysis Services enables you to partition your input data into training and testing sets of data. The training data will be used by the mining model algorithm to determine patterns and relations. A randomly selected portion of the data will be held to test the accuracy of the data mining models created by comparing model predictions to actual values. All mining models associated with this mining structure will use the training and testing sets defined in this step.
l. On the Completing the Wizard page, in the Mining Structure Name box, type Customers, select the Allow drill through check box, and then click Finish. The Mining Structure designer will open.
Note: A data mining structure can contain multiple data mining models. Each data mining model uses a subset of the data referenced by the data mining structure. When the data mining structure is processed, the source data is queried once and then all of the data mining models are processed in parallel.

6. Modify the mining model
a. Select the Mining Models tab to view information about the model.
b. In the Mining Models grid, right-click the second column's heading, and then click Properties.
c. In the Properties window, in the Name property box, type Customers DT to rename the mining model, and then press ENTER.
Note: Step c renames the Decision Tree mining model but does not rename the mining model structure.

7. Create a filtered mining model
a. In the Mining Models tab, click anywhere in the Customers DT model.
b. Click the Create a related mining model button on the toolbar in the Mining Models tab.
c. In the New Mining Model dialog box, in the Model name box, type Australia Customers DT and click OK.
d. Verify that all of the columns for the Australia Customers DT model are set to Input except for the Bike Buyer and Customer Key columns. Bike Buyer must be set to PredictOnly and Customer Key must be set to Key.
e. Right-click the Australia Customers DT model, and then click Set Model Filter.
f. On the [Australia Customers DT] Model Filter page, in the first row, in the Mining Structure Column box, select English Country Region Name, verify that the Operator is =, and in the Value column, type Australia. Then click OK.

8. Modify the mining structure
a. In the Customers.dmm designer, click the Mining Structure tab.
b. In the Mining Structure tree view on the left side of the designer window, right-click the Customers structure, and then click Add a Column.
c. On the Select a Column page, click Age, and then click OK. In the Microsoft Visual Studio warning box that appears, click Yes to continue.
d. In the Mining Structure tree view on the left side of the designer window, right-click the Customers structure, and then click Add a Column.
e. On the Select a Column page, click Yearly Income, and then click OK. If the Microsoft Visual Studio warning box appears, click Yes to continue.
f. In the Mining Structure tree view on the left side of the designer window, right-click the Age 1 column, and then click Properties.
g. In the Basic section, change the Name to Age Discretized and then press ENTER. In the Microsoft Visual Studio warning box that appears, click Yes to change the name in all related mining model columns.
h. In the Data Type section, in the Content box, select Discretized, and then verify that the DiscretizationMethod is set to Automatic.
i. In the Mining Structure tree view on the left side of the designer window, right-click the Yearly Income 1 column, and then click Properties.
j. In the Basic section, change the Name to Income Discretized and then press ENTER. In the Microsoft Visual Studio warning box that appears, click Yes to change the name in all related mining model columns.
k. In the Data Type section, in the Content box, select Discretized, and then verify that the DiscretizationMethod is set to Automatic.
l. In the Mining Structure tree view on the left side of the designer window, right-click the Number Cars Owned column, and then click Properties.
m. On the Properties page, in the Content box, verify that Discrete is selected.

n. In the Mining Structure tree view on the left side of the designer window, right-click the Number Children At Home column, and then click Properties.
o. On the Properties page, in the Content box, verify that Discrete is selected.
Note: You can also discretize the original Age and Yearly Income columns. This change would be reflected in all models including the Decision Trees model created earlier in the exercise. Discretizing data may cause the model predictions to change. You must carefully review your data and test multiple approaches to determine the most effective approach.

9. Create a Naive Bayes mining model
a. In the Customers.dmm designer, click the Mining Models tab.
b. Click the Create a related mining model button on the toolbar in the Mining Models tab.
c. In the New Mining Model dialog box, in the Model Name box, type Customers NB
d. In the New Mining Model dialog box, in the Algorithm Name drop-down list, click Microsoft Naive Bayes, and then click OK.
e. When the alert appears to confirm that you want to use the Microsoft Naive Bayes algorithm and that some columns will be ignored, click Yes to approve and dismiss the dialog box.
Note: The Naive Bayes algorithm does not use continuous columns. Therefore, the Yearly Income and Age continuous columns will be ignored in this mining model, but they will continue to be utilized in the Decision Tree model. You will use the Age Discretized and Income Discretized columns for this model.
f. In the Customers NB column, click in the Age Discretized cell (the content is currently Ignore), and in the cell drop-down list, select Input.
g. In the Customers NB column, click in the Income Discretized cell (the content is currently Ignore), and in the cell drop-down list, select Input.

10. Deploy the Analysis Services solution
a. On the Build menu, click Deploy DM Exercise 1.
b. Observe the deployment progress shown in the Deployment Progress pane (normally on the right side of Business Intelligence Development Studio). The Deployment Progress pane gives you detailed information about what happens during deployment.
Note: Analysis Services might take a while to process the data mining models.
c. In the previous procedures, various wizards and editors have created XML code based on your input. Deployment sends the XML code to the Analysis Server and then processes the Analysis Services database.

11. View the Decision Trees mining model decision tree
a. Click the Mining Model Viewer tab.
Note: If an alert appears indicating that changes have been made, click No.
b. In the Mining Model drop-down list, click Customers DT.
c. On the View menu, click Full Screen (or press SHIFT+LEFT-ALT+ENTER) to display the designer window in full screen view. Repeat this process to return to normal view.
Note: If you accidentally close the Mining Model Viewer of the Mining Model Designer, you can re-open it. Select the View, Solution Explorer menu item. In the Solution Explorer window, under the Mining Structures folder, right-click Customers.dmm, and on the shortcut menu, click Browse.
d. In the Tree drop-down list, ensure that Bike Buyer is selected.
e. If the entire decision tree is not visible, the Navigation pane can be used to select the area of the tree that you want to view. Click and hold the small + button in the lower-right corner of the Mining Model Viewer. The mouse pointer will change to a cross-arrow icon and the Navigation window will appear. You may drag the mouse to navigate within the Mining Model Viewer. The figure below shows the location of the navigation button (it is highlighted in a circle). You may need to use the scroll bars (highlighted in a rectangle) to see the + button.
Note: The Mining Legend window on the right side of the display may be relocated and resized to improve the display of the decision tree. If you accidentally close the Mining Legend window, click the Refresh icon next to the Viewer box and the Mining Legend window will return.
f. On the Show Level slider, drag the pointer to the left so that only one level of the decision tree is displayed.
g. Click the All node. This node contains a histogram with blue representing bike buyers and red representing non-bike buyers.
h. Information about all customers is displayed in the Mining Legend window. Review the values showing how many customers are bike buyers. Compare the percentage of bike buyers to non-bike buyers. (You may need to widen the Mining Legend window to be able to see the percentages.)
i. On the Show Level slider, drag the pointer to the right so that two levels of the decision tree are displayed. Note that Number Cars Owned is most predictive of a customer's bike buying behavior.
j. Click each node of level 2. The Mining Legend window displays detailed information for each node.
k. In the Background drop-down list, click Yes.
l. The shade of each node indicates the concentration of the value in the Background drop-down list. The dark blue color tells you that the greatest number of bike buyers have no cars.
m. Click the + to the right of the Number Cars Owned = 0 box. Again, the dark blue coloring shows that most bike buyers are under the age of 43. Expand and contract nodes in the diagram to investigate the predicting factors for each group.

12. View the Decision Trees mining model dependency network
a. In the Mining Model Viewer, click the Dependency Network tab.
b. The Dependency Network viewer displays the strength of the relationships between the attributes in a decision tree model.
c. On the links slider, drag the pointer to the bottom. As the threshold for links becomes higher, dependencies are removed from the chart.
Note: The strongest link is shown to be Number Cars Owned as previously shown on the Decision Tree tab.
d. In the Dependency Network diagram, click the Bike Buyer node.
e. The color of each node indicates that attribute's relationship to the Bike Buyer attribute.
f. On the links slider, slowly drag the pointer up to the top. As you drag the pointer upward, the relationships within the data are displayed according to the strength of the dependency.

13. Review the filtered mining model
a. In the Mining Model drop-down list, click Australia Customers DT.
b. Use the steps documented in the previous two procedures to review the Decision Tree and Dependency Network information for the Australia Customers DT model.

14. Review the Naive Bayes mining model
a. In the Mining Model drop-down list, click Customers NB to view the Naive Bayes mining model.
b. Select the Attribute Profiles tab.
c. In the Predictable drop-down list, ensure that Bike Buyer is selected.
d. If necessary, move the Mining Legend window to view and explore the attribute profiles. The Attribute Profiles tab displays the attributes that impact the state of the predictable value selected.
e. Click the bar chart where the Yes column and the Commute row intersect. The Mining Legend window now displays the distribution of commute miles among those cases where the value of Bike Buyer is Yes.
f. Click the Attribute Characteristics tab.
g. In the Attribute drop-down list, ensure that Bike Buyer is selected, and in the Value drop-down list, click Yes.
h. The characteristics of bike buyers, ordered by their frequency, are displayed.
i. In the Value drop-down list, click No.
Note: The characteristics of non-bike buyers are different from the characteristics of bike buyers. On the Characteristics tab, a high percentage of probability does not necessarily mean a strong correlation. It is the difference in probability between the various possible values of each attribute that indicates the correlation. Marital status has only two options (married or single). For bike buyers, the probability of 50.627 percent for married versus 49.373 percent for single is not a significant factor. However, for non-bike buyers, marital status is more significant: 56.940 percent of non-bike buyers were married and 43.060 percent were single.
j. Click the Attribute Discrimination tab.
k. In the Attribute drop-down list, ensure that Bike Buyer is selected.
l. In the Value 1 drop-down list, click Yes.
m. In the Value 2 drop-down list, click No. The attribute values that impact a customer's bike-buying decision are displayed. The attribute values are ordered by how strongly they favor bike buyers or non-bike buyers. For example, people from Australia are more likely to be bike buyers; however, people who have to commute over 10 miles to work are more likely to be non-bike buyers.
n. Select different bars under Favors Yes and Favors No, and then review the values in the Mining Legend.
o. Click the Dependency Network tab.
p. On the links slider, drag the pointer to the bottom.
q. In the Dependency Network diagram, click the Bike Buyer node. The color of each node indicates that attribute's relationship to the Bike Buyer attribute.
r. On the links slider, slowly drag the pointer up to the top. As you drag the pointer upward, the relationships within the data are displayed.
s. On the File menu, click Close Solution. If prompted to save changes, click Yes.
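
The marital status note in task 14 can be checked with a little arithmetic. The following Python sketch is not part of the lab or of Analysis Services; it only takes the four percentages quoted in that note and computes, for each Bike Buyer state, the gap between the most likely and least likely value. The function name `spread` and the dictionary layout are illustrative assumptions.

```python
# Attribute Characteristics percentages quoted in the task 14 note.
marital_status = {
    "Yes": {"Married": 50.627, "Single": 49.373},  # bike buyers
    "No": {"Married": 56.940, "Single": 43.060},   # non-bike buyers
}


def spread(distribution):
    """Gap, in percentage points, between the most and least likely values."""
    return max(distribution.values()) - min(distribution.values())


# A small gap means the attribute says little about that group;
# a large gap means one value clearly dominates within the group.
spreads = {state: round(spread(dist), 3) for state, dist in marital_status.items()}

for state, gap in spreads.items():
    print(f"Bike Buyer = {state}: spread of {gap} percentage points")
```

The gap is about 1.3 points for bike buyers and about 13.9 points for non-bike buyers, which is why the note treats marital status as more informative for the second group.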


Exercise 2 Viewing Mining Accuracy Charts


Scenario
In this exercise, you will validate mining models by using the Mining Accuracy Chart view and a cross-validation report. These validation tools enable you to determine which model will provide you with the most accurate view of your data.

Complete the following tasks on: SQL Server 2008 HOLs

1. Open and deploy an existing project
a. Open Business Intelligence Development Studio as Administrator if it is not already open.
b. On the File menu, point to Open, and then click Project/Solution.
c. In the Open Project dialog box, browse to C:\SQLHOLS\Data Mining\Starter\DM Exercise 2, click DM Exercise 2.sln, and then click Open.
d. On the Build menu, click Deploy DM Exercise 2.
e. Observe the deployment progress shown in the Deployment Progress pane. The Deployment Progress pane gives you detailed information about what happens during deployment.
Note: Analysis Services might take a while to process the data mining models.
f. When deployment is complete, you can close the Deployment Progress window if you want.

2. Define the data and models to test
a. In Solution Explorer, expand the Mining Structures folder, and then double-click Customers.dmm.
Note: If Solution Explorer is not visible, click View, and then click Solution Explorer.
b. Click the Mining Accuracy Chart tab.
c. Verify that the Synchronize Prediction Columns and Values box is selected.
Note: You should only clear the Synchronize Prediction Columns and Values box if you know that two mining structure columns derive from the same underlying relational or multidimensional source and the columns contain the same states or have been discretized in the same way. In this scenario, you will enable Analysis Services to synchronize the columns and values because all columns may not be discretized in the same way.
d. Verify that the Show check box is selected for both the Customers DT and Customers NB mining models.
e. In the Predictable Column Name column, verify that Bike Buyer is selected for both mining models.
Note: In the Predictable Column Name drop-down lists, the mining model column names are restricted to columns that have their usage type set to Predict or Predict Only.
f. In the Select data set to be used for Accuracy Chart area, select Specify a different data set, and then click the ellipsis (...) button. The Specify Column Mapping page opens. You will use the Column Mapping page to design a Prediction Query that will be run to compare the mining model's predicted values with the validation data set's actual values.
g. On the Column Mapping page, in the Select Input Table(s) window, click Select Case Table.
h. In the Select Table window, verify that Customers is selected in the Data Source drop-down list, click the vDMLabCustomerValidate table, and then click OK.
i. Relationships between the mining structure and the input table are automatically created between columns with the same name. Relationships can be added or deleted by the user.
j. On the Specify Column Mapping page, click Close.
k. In the Predict Value column, in the drop-down list, click Yes for both mining models.

3. View a lift chart
a. Click the Lift Chart tab. The prediction query designed on the Column Mapping page will be run. The prediction query will return a prediction for each case in the validation data set. The lift chart enables you to compare the validity of models side-by-side, as well as to the ideal model (which represents the actual values) and the random-guess model.
Note: The mining model will have greater confidence in its prediction for some cases than for others. For each case, the prediction query will also return the probability that the prediction is correct. The cases are sorted by the probability that the prediction is correct, and the percentages of correct predictions are then displayed on the lift chart.
b. Point to any of the lines on the chart to display a tooltip showing information relevant to the location on the line.
Note: The Mining Legend window opens automatically when the Lift Chart tab is selected. Values in the legend change when you click different areas within the lift chart.
c. Click the area of the lift chart where the Customers DT model has a Target Population of 90%.
d. Notice that at the top of the mining legend, it states that the Population percentage is 70.00%. This means that the Customers DT model needs to select only approximately 71% of the customers to identify 90% of the bike buyers.
e. Repeat this process for the Customers NB, Ideal, and Random Guess models.
f. The Ideal model needs to select only approximately 45% of the customers to identify 90% of the bike buyers.
g. The Customers NB model needs to select approximately 85% of the customers to identify 90% of the bike buyers.
h. The Random Guess model needs to select approximately 90% of the customers to identify 90% of the bike buyers.
Note: Because the Customers DT mining model needs to select fewer cases to identify a specified percentage of bike buyers, it would be deemed more accurate than the Customers NB model.

4. Perform cross-validation
a. Click the Cross Validation tab. Cross-validation provides another method that you can use to test and view the accuracy of your mining models. With cross-validation, your data is partitioned into cross-sections that are used to iteratively train and test models against each cross-section. One portion of the data is used to test the model, and the remaining data is used to train the model.
b. On the Cross Validation page, set the Fold Count to 2, Max Cases to 500, and Target Attribute to Bike Buyer, and in the Target State box, type Yes and then click Get Results.
Note: The Fold Count defines the number of folds or partitions that the data is broken into for testing. The Max Cases value defines the maximum number of cases, or rows in a relational data set, across all folds that can be analyzed and tested. If the Max Cases value is set to 0, the entire dataset defined in the structure will be used. By increasing the Fold Count and Max Cases values, you can see how larger data sets will affect the accuracy of your models. Increasing the Fold Count and Max Cases values will increase the time and resources required to generate the report.
c. The Cross-Validation report provides numerous statistical values to provide you with information about the accuracy of the different models that you have chosen.
d. The first section of the report is for the Customers DT model. The first test results reported are for the Classification Test (the Test column includes the value Classification). For each fold, the report includes values representing whether the model correctly predicted the Target State in each case. The number of Partition Indexes reported will be equal to the Fold Count value defined for the report. The Partition Size will be the Max Cases defined divided by the Fold Count defined.
e. Because we included a Target State of Yes in our definition, the Measure column can be interpreted as follows:
- When the Test Case value is Yes, and the model prediction is Yes, the Measure is reported as a True Positive.
- When the Test Case value is Yes, and the model prediction is No, the Measure is reported as a False Negative.
- When the Test Case value is No, and the model prediction is Yes, the Measure is reported as a False Positive.
- When the Test Case value is No, and the model prediction is No, the Measure is reported as a True Negative.
f. On the File menu, click Close Solution. If prompted to save changes, click Yes.
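
The Classification measures and the Partition Size rule from the cross-validation task can be sketched in a few lines of Python. This is only an illustration of the rules stated in the steps, not the code Analysis Services runs; the function names `partition_size` and `measure` and the five sample cases are hypothetical.

```python
from collections import Counter


def partition_size(max_cases, fold_count):
    """Cases per fold, as the lab describes: Max Cases divided by Fold Count."""
    return max_cases // fold_count


def measure(actual, predicted, target_state="Yes"):
    """Name the Classification measure for one test case."""
    if actual == target_state:
        return "True Positive" if predicted == target_state else "False Negative"
    return "False Positive" if predicted == target_state else "True Negative"


# Hypothetical (actual, predicted) Bike Buyer values for one small fold.
fold = [("Yes", "Yes"), ("Yes", "No"), ("No", "Yes"), ("No", "No"), ("Yes", "Yes")]
counts = Counter(measure(actual, predicted) for actual, predicted in fold)

print(partition_size(500, 2))  # settings used in the lab: Max Cases 500, Fold Count 2
print(dict(counts))
```

With the lab's settings of Max Cases 500 and Fold Count 2, each partition holds 250 cases.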
Note: To deliver reliable and meaningful predictions, you should build multiple mining models and try different algorithms, parameters, and variables until you have identified the best model. This may require an iterative process of data preparation in which you add variables, transform values, and compare results by using the validation tools described in this exercise.
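The lift chart percentages quoted in this exercise (for example, the Ideal model selecting about 45% of the customers to reach 90% of the bike buyers) can be reproduced in principle with a short calculation. The Python sketch below is an assumption-laden illustration: the 20 customers, their scores, and the two "models" are made up, and the function is not part of any Analysis Services API.

```python
def percent_selected_for_target(scores, is_buyer, target_pct=90):
    """Percent of customers to contact, best score first, to reach target_pct of buyers."""
    ranked = sorted(zip(scores, is_buyer), key=lambda pair: pair[0], reverse=True)
    total_buyers = sum(1 for _, buyer in ranked if buyer)
    if total_buyers == 0:
        raise ValueError("the data contains no buyers")
    found = 0
    for selected, (_, buyer) in enumerate(ranked, start=1):
        if buyer:
            found += 1
        # Integer comparison avoids floating-point rounding at the threshold.
        if found * 100 >= target_pct * total_buyers:
            return 100.0 * selected / len(ranked)
    return 100.0


# Hypothetical population: 10 buyers followed by 10 non-buyers.
is_buyer = [True] * 10 + [False] * 10

# An ideal model scores every buyer above every non-buyer.
ideal_scores = [1.0] * 10 + [0.0] * 10

# A weaker, made-up model whose ranking alternates buyer, non-buyer.
weak_scores = [20 - 2 * i for i in range(10)] + [19 - 2 * j for j in range(10)]

print(percent_selected_for_target(ideal_scores, is_buyer))
print(percent_selected_for_target(weak_scores, is_buyer))
```

A model that needs a smaller percentage of the population to reach the same share of buyers sits closer to the Ideal line on the chart, which is the comparison the lab makes between Customers DT and Customers NB.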


Exercise 3 Creating a Prediction Query


Scenario
In this exercise, you will make predictions by using the Mining Model Prediction view of the Mining Structure Designer. This view will enable you to optimize the marketing plan by contacting those potential buyers who are most likely to make a purchase. The Prediction Query Designer enables you to create prediction queries through a graphical view or a SQL query view. The results of the prediction query created in this exercise will show the probability of each potential customer buying a bike. You can also use Reporting Services to provide a more robust report based on the prediction query created in this exercise.

Complete the following tasks on: SQL Server 2008 HOLs

1. Open and deploy an existing project
a. Open Business Intelligence Development Studio if it is not already open.
b. On the File menu, point to Open, and then click Project/Solution.
c. In the Open Project dialog box, browse to C:\SQLHOLS\Data Mining\DM Exercise 3, click DM Exercise 3.sln, and then click Open.
Note: The solution used in Exercise 3 is different from the solution created in Exercise 2.
d. On the Build menu, click Deploy DM Exercise 3.
e. Observe the deployment progress shown in the Deployment Progress window. The Deployment Progress pane gives you detailed information about what happens during deployment.
Note: Analysis Services may take a while to process the data-mining models.
f. When deployment is complete, you can close the Deployment Progress window if you want.

2. Select the mining model and input table for a prediction query
a. In Solution Explorer, in the Mining Structures folder, double-click Customers.dmm.
Note: If Solution Explorer is not visible, click View, and then click Solution Explorer.
b. In the Customers.dmm designer, click the Mining Model Prediction tab.
c. In the Mining Model window, click Select Model.
d. In the Select Mining Model dialog box, expand Customers, click Customers DT, and then click OK.
e. In the Select Input Table(s) window, click Select Case Table.
f. In the Select Table window, in the Data Source box, ensure that Customers is selected, click the vDMLabCustomerPredict table, and then click OK.
Note: Relationships between the mining structure and the input table are automatically created between columns with the same name. Relationships can be added or deleted by the user.
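The automatic relationships mentioned in the note above amount to matching columns by name. A minimal Python sketch of that idea follows; the exact-match rule and the Age column are assumptions for illustration, and the designer's real matching logic is not described in this lab.

```python
def auto_map_columns(model_columns, input_columns):
    """Pair each mining-structure column with the input column of the same name."""
    available = set(input_columns)
    # Exact, case-sensitive matching is an assumption made for this sketch.
    return {name: name for name in model_columns if name in available}


# Hypothetical column lists; only CustomerKey, FirstName, LastName, and
# Bike Buyer are named in the lab text.
model_columns = ["CustomerKey", "Age", "Bike Buyer"]
input_columns = ["CustomerKey", "FirstName", "LastName", "Age"]

mapping = auto_map_columns(model_columns, input_columns)
unmapped = [name for name in model_columns if name not in mapping]

print(mapping)
print(unmapped)
```

Columns left unmapped, such as the predictable Bike Buyer column in this made-up example, are the ones a user would review when adding or deleting relationships by hand.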

3. Build the Decision Tree prediction query
a. Enter the following values into the first row of the table at the bottom of the designer.

   Columns   Values
   Source    vDMLabCustomerPredicts
   Field     CustomerKey
   Show      (Checked)

Note: You can resize the columns of the table by dragging the dividing line between the column headings.
b. Enter the following values into the second row of the table.

   Columns   Values
   Source    vDMLabCustomerPredicts
   Field     FirstName
   Show      (Checked)

c. Enter the following values into the third row of the table.

   Columns   Values
   Source    vDMLabCustomerPredicts
   Field     LastName
   Show      (Checked)

Note: Adding the customer's name to the report makes the report more readable and useful for users. If the view or table defined in the Data Source View contained additional information, such as a phone number or e-mail address, those columns could also be added, giving the company a single-step report that includes all the contact information required by the marketing department.
d. Enter the following values into the fourth row of the table.

   Columns   Values
   Source    Customers DT mining model
   Field     Bike Buyer
   Show      (Checked)

Note: The value in the Source column will change from Customers DT mining model to Customers DT.
e. Enter the following values into the fifth row of the table.

   Columns              Values
   Source               Prediction Function
   Field                PredictProbability
   Alias                Confidence
   Show                 (Checked)
   Criteria/arguments   [Customers DT].[Bike Buyer]

f. On the Mining Model menu, click Query to view the Data Mining Extensions to SQL language (DMX) syntax for the query that you defined in the previous steps. DMX is designed to create, train, modify, and query data-mining models, providing a simple and familiar language for embedding prediction in applications.

4. Display the Decision Tree query results
a. On the Mining Model menu, click Result to view the results of the query that you defined in the previous steps.
b. The results of the prediction query are displayed as follows:
   The CustomerKey column identifies each record from the input table.
   The FirstName and LastName columns show the name of each customer.
   The Bike Buyer column contains the mining model's prediction of the customer's bike-buying behavior.
   The Confidence column contains a value representing how certain the mining model is of the prediction contained in the Bike Buyer column. Larger values in the Confidence column represent a greater certainty that the prediction is correct.
c. Adventure Works can use the results of the prediction query to promote its bikes to those individuals most likely to purchase a bike. Adventure Works' marketing expenses will be reduced because only potential customers who are likely to be bike buyers need to be contacted, so marketing campaigns can be targeted more efficiently.
d. On the File menu, click Close Solution. If prompted to save changes, click Yes.
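Once the query results are available, turning them into a contact list is a filter-and-sort step. The Python sketch below assumes the results have been exported as rows with the column names used in this exercise. The sample rows, the Yes/No values, and the 0.5 threshold are all hypothetical. Because Confidence describes certainty in whatever was predicted, a high value next to a No prediction marks a likely non-buyer, so the sketch filters on the prediction first.

```python
def contact_list(rows, min_confidence=0.5):
    """Return predicted bike buyers, most confident first."""
    likely_buyers = [
        row for row in rows
        if row["Bike Buyer"] == "Yes" and row["Confidence"] >= min_confidence
    ]
    return sorted(likely_buyers, key=lambda row: row["Confidence"], reverse=True)


# Hypothetical rows shaped like the query result columns; not real lab output.
results = [
    {"CustomerKey": 1, "FirstName": "Ana", "LastName": "Lee", "Bike Buyer": "Yes", "Confidence": 0.62},
    {"CustomerKey": 2, "FirstName": "Raj", "LastName": "Cole", "Bike Buyer": "No", "Confidence": 0.91},
    {"CustomerKey": 3, "FirstName": "Mia", "LastName": "Ortiz", "Bike Buyer": "Yes", "Confidence": 0.88},
    {"CustomerKey": 4, "FirstName": "Tom", "LastName": "Haas", "Bike Buyer": "Yes", "Confidence": 0.41},
]

for row in contact_list(results):
    print(row["CustomerKey"], row["FirstName"], row["LastName"], row["Confidence"])
```

Raising or lowering the threshold trades a shorter, cheaper campaign against the risk of missing some buyers.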


Exercise 4 Creating a Time Series Model


Scenario
In this exercise, you will create and view a Time Series data mining model. Data mining in SQL Server 2008 supports two time series algorithms: ARTXP, which was developed by Microsoft Research and is optimized to predict the next likely value in a series, and ARIMA, which is new in SQL Server 2008 and improves accuracy for long-term predictions.

Complete the following tasks on: SQL Server 2008 HOLs

1. Create an Analysis Services project
a. Open Business Intelligence Development Studio as Administrator if it is not already open.
b. On the File menu, point to New, and then click Project.
c. In the New Project dialog box, in the Project Types pane, click the Business Intelligence Projects folder.
d. In the Templates pane, click Analysis Services Project.
e. In the Name box, type DM Exercise 4
f. In the Location box, enter C:\SQLHOLS\Data Mining\Starter\
g. Clear the Create directory for Solution check box.
h. Click OK.
i. In Solution Explorer, right-click the DM Exercise 4 project, and then click Properties.
j. In the DM Exercise 4 Property Pages dialog box, under Configuration Properties, click Deployment.
k. In the right pane, in the Deployment Mode drop-down list, click Deploy All, and then click OK.

2. Create a data source
a. In the Solution Explorer window, under the DM Exercise 4 project, right-click the Data Sources folder, and on the shortcut menu, click New Data Source.
b. In the Data Source Wizard dialog box, on the Welcome to the Data Source Wizard page, click Next.
Note: If the Data connections pane already includes (local).AdventureWorksDW, skip to step j.
c. On the Select how to define the connection page, ensure that the Create a data source based on an existing or new connection option is selected, and then click New.
d. In the Connection Manager dialog box, in the Provider drop-down list at the top of the page, click the SqlClient Data Provider in the .Net Providers folder, and then click OK.
e. In the Server name box, type (local)
f. Under Log on to the server, click Use Windows Authentication.
g. In the Select or enter a database name drop-down list, click AdventureWorksDW.
h. Click Test Connection, and then click OK to dismiss the message box.



i. In the Connection Manager dialog box, click OK.
j. In the Data Source Wizard dialog box, on the Select how to define the connection page, verify that (local).AdventureWorksDW is selected, and then click Next.
k. On the Impersonation Information page, select Use the service account, and then click Next.
l. On the Completing the Data Source Wizard page, leave the default data source name Adventure Works DW unchanged, and then click Finish.

3. Create a Data Source View
a. In Solution Explorer, under the DM Exercise 4 project, right-click the Data Source Views folder, and click New Data Source View.
b. In the Data Source View Wizard dialog box, on the Welcome to the Data Source View Wizard page, click Next.
c. On the Select a Data Source page, in the Relational data sources pane, verify that Adventure Works DW is selected, and then click Next.
Note: At this point, Analysis Services might take a few moments to read the database schema.
d. In this project, your Data Source View is not based on a table; instead, it is based on a view. On the Select Tables and Views page, double-click vTimeSeries to add this view to the Included objects list, and then click Next.
e. On the Completing the Wizard page, in the Name box, type FutureSales and then click Finish. The Data Source View Designer will open. The Data Source View Designer is a graphical representation of the data schema that you have defined.
f. Right-click the vTimeSeries table, and then click Explore Data.
Note: Analysis Services might take a few moments to read the data.
g. This opens a new window in which you can view the data for the table. Review the data in the view and note that the data is stored on a monthly basis. You will use this information in a later procedure to inform the algorithm that the data pattern repeats itself every 12 periods (months). You will also rename the Quantity and Amount columns to clarify the meaning of the data when viewing the data mining model.
h. Close the Explore vTimeSeries Table window.
i. In the FutureSales.dsv design window, in the Tables pane, under vTimeSeries, right-click the Amount column, and then click Properties.
j. In the Properties pane, change the FriendlyName to Sales Amount.
k. In the FutureSales.dsv design window, in the Tables pane, right-click the Quantity column, and then click Properties.
l. In the Properties pane, change the FriendlyName to Units Sold.

4. Create a Data Mining Structure
a. In Solution Explorer, under the DM Exercise 4 project, right-click the Mining Structures folder, and then click New Mining Structure.
b. In the Data Mining Wizard, on the Welcome to the Data Mining Wizard page, click Next.
c. On the Select the Definition Method page, click From existing relational database or data warehouse, and then click Next.
d. On the Create the Data Mining Structure page, under the Create mining structure with a mining model option, in the Which data mining technique do you want to use list, click Microsoft Time Series, and then click Next.



e. On the Select Data Source View page, in the Available data source views pane, verify that the FutureSales data source view is selected, and then click Next.
f. On the Specify Table Types page, in the Input tables pane, in the vTimeSeries row, verify that the Case check box is selected, and then click Next.
g. On the Specify the Training Data page, in the Mining model structure pane, select the Key check boxes for the Time Index and Model Region columns, select the Input and Predictable check boxes for the Quantity column, and then click Next.
h. On the Specify Columns Content and Data Type page, review the default content and data types, and then click Next.
i. On the Completing the Wizard page, change the Mining structure name to SalesForecast, and then click Finish.

5. Modify a Mining Structure and Model
a. On the Mining Structure page, right-click Columns under the SalesForecast structure, and then click Add a Column.
b. On the Select a Column page, click Sales Amount as the Source column, and then click OK.
c. In the SalesForecast pane, under Columns, right-click Amount, and then click Properties.
d. In the Basic section, change the Name to Sales Amount and then press ENTER. In the Microsoft Visual Studio warning box that appears, click Yes to change the name in all related mining model columns.
e. In the SalesForecast pane, under Columns, right-click Quantity, and then click Properties.
f. In the Basic section, change the Name to Units Sold and then press ENTER. In the Microsoft Visual Studio warning box that appears, click Yes to change the name in all related mining model columns as well.
g. In the designer window, click the Mining Models tab.
h. Set the Sales Amount column to Predict for the vTimeSeries model. This sets the Sales Amount value to be used as both an input value and a predicted value.
i. Right-click the vTimeSeries model, and then click Set Algorithm Parameters.
j. On the Algorithm Parameters page, review the parameters and default values.
Note: The FORECAST_METHOD parameter defines which algorithm or algorithms will be used for forecasting. The default value is MIXED. With this setting, Analysis Services trains a model with each algorithm separately and then combines the results to yield the best prediction possible. The PREDICTION_SMOOTHING parameter can be used to define whether the MIXED FORECAST_METHOD favors one algorithm over the other. If you are looking for short-term predictions, you can specify the ARTXP method. For longer-range predictions, you can specify the ARIMA method. These parameters and others are shown in Figure 3.


k. Set the PERIODICITY_HINT Value to {12} and then click OK. The data in the data source is organized into monthly results. The PERIODICITY_HINT Value tells the algorithm that a pattern repeats itself every 12 periods (or months in this case).

6. Deploy the Analysis Services solution
a. On the Build menu, click Deploy DM Exercise 4.
b. Observe the deployment progress shown in the Deployment Progress pane. The Deployment Progress pane gives you detailed information about what happens during deployment.
Note: Analysis Services might take a while to process the data mining models.

7. View the Time Series mining model
a. Click the Mining Model Viewer tab.
b. Hide Solution Explorer and any other windows that are blocking the chart information by clicking the Auto Hide icon (the pushpin in the upper-right corner).
c. The chart shows the quantity and sales amount information for bike sales by region. The values from July 2001 to March 2004 are actual values and are shown as solid lines. The values for April 2004 and beyond are predicted values and are shown as dotted lines.
d. Point to any of the lines on the chart. A tooltip will appear and display information relevant to the location on the line.
e. Click in the chart where the chart background color changes to a darker shade of grey. The mining legend now shows the values related to that point in time. The timestamp at the top of the legend tells you the month that you are looking at. For example, Timestamp: 200406 is June 2004. The most recent data represented in the mining model input data is for June of 2004. The dashed data lines to the right of the line you clicked represent predicted future Sales Amounts and Units Sold.
f. To compare Sales Amount for three different product models in Europe, under Prediction steps, in the drop-down list, select only M200 Europe: Sales Amount, R250 Europe: Sales Amount, and T1000 Europe: Sales Amount. Then click OK.
g. Click in the chart where the chart background color changes to a darker shade of grey. Notice that the mining legend is updated for the new chart view. This chart makes it easy to see that in the European market, the outlook for the M200 model far exceeds that of the other models currently being sold.
h. On the File menu, click Exit. If prompted to save changes, click Yes.
i. Close Virtual PC and discard changes.
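Two details from this exercise lend themselves to a short Python sketch: reading a legend timestamp such as 200406, and what a 12-period pattern means for monthly data. The forecast function below is a seasonal-naive baseline (repeat the value from the same month one year earlier). It is a stand-in chosen for illustration and is not the ARTXP or ARIMA algorithm. The function names and the sample series are assumptions.

```python
MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def timestamp_label(timestamp):
    """Convert a YYYYMM legend timestamp such as 200406 into 'June 2004'."""
    year, month = divmod(timestamp, 100)
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in timestamp {timestamp}")
    return f"{MONTH_NAMES[month - 1]} {year}"


def seasonal_naive_forecast(history, steps, period=12):
    """Forecast each future period by repeating the value one period earlier.

    This baseline only illustrates a 12-month periodicity; it is not how
    the Microsoft Time Series algorithm produces its predictions.
    """
    if len(history) < period:
        raise ValueError("need at least one full period of history")
    last_cycle = history[-period:]
    return [last_cycle[i % period] for i in range(steps)]


# Hypothetical monthly units sold for two years (24 values).
units_sold = list(range(1, 25))

print(timestamp_label(200406))
print(seasonal_naive_forecast(units_sold, steps=3))
```

With period=12, the first forecast value repeats the observation from 12 months before the forecast month, which is the kind of yearly pattern that the PERIODICITY_HINT value of {12} tells the algorithm to look for.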
