IS5740 W05 Tutorial Note (Regression)
Variable   Role    Level      Description
Budget     Input   Interval   Total budget for Production, Meeting, and Casting Fees
Genre      Input   Nominal    Genre (i.e., Action, Comedy, Drama, and Thriller)
IS5740 Mgt. Support & BI Systems
6. Click the “Movie” File Import node, then click the button next to the “Import File” option.
Locate your movie.xlsx and import it.
7. Right-click the node and select “Edit Variables”.
9. Click on the 'Movie' data node and navigate to the property window. Change the
'Summarize' option to 'Yes'.
10. Let’s visualize the data using the Graph Explore node. Don’t forget to set the Size
option to “Max”.
b. Create a scatter plot matrix of all input variables, and then examine the matrix.
11. Select the Data Partition node icon in the Sample tab. Drag the node into the Diagram
Workspace. Connect the Movie data node to the Data Partition node. Click the “Data
Partition” node and put 50% for training and 50% for validation, 0% for test data in the
property window. Make sure that the total, train, and validation datasets are all
balanced.
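For readers who want to see what a 50/50 partition amounts to outside the tool, here is a minimal Python sketch. It assumes a plain random split with a fixed seed, which may differ from the partitioning method the Data Partition node actually uses; the `partition` function and the `movies` rows are hypothetical stand-ins for the contents of movie.xlsx.

```python
import random


def partition(rows, train_fraction=0.5, seed=12345):
    """Shuffle row indices reproducibly and split them into train/validation."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = round(len(rows) * train_fraction)
    train = [rows[i] for i in indices[:cut]]
    validation = [rows[i] for i in indices[cut:]]
    return train, validation


# Hypothetical rows standing in for the movie data (not the real file).
movies = [{"id": i, "budget": 10 + i} for i in range(20)]
train, validation = partition(movies)
print(len(train), len(validation))  # 10 10
```

Every row lands in exactly one of the two sets, and fixing the seed makes the split repeatable between runs.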
13. The Regression node handles both linear and logistic regression. Select the Model tab.
Drag a Regression tool into the diagram workspace. Connect the Data Partition node to the
Regression node.
a. Rename the Regression node as “Full Regression”.
b. Select the Regression node and examine the Property panel. By default, the
regression type is logistic, so we have to change it to “Linear Regression”.
16. In the pop-up window, select “VALIDATE” and click the “Browse...” button.
17. You can see the columns ‘Box Office’, ‘Predicted Box Office’, and ‘Residual Box
Office’.
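The three columns in step 17 are related by simple arithmetic. The sketch below assumes the usual convention that a residual is the actual value minus the predicted value; the numbers are made up for illustration and are not taken from the movie data.

```python
def residuals(actual, predicted):
    """Return actual minus predicted for each observation."""
    return [a - p for a, p in zip(actual, predicted)]


# Illustrative values only (hypothetical, not from movie.xlsx).
box_office = [120.0, 80.0, 45.5]
predicted_box_office = [110.0, 95.0, 45.5]
print(residuals(box_office, predicted_box_office))  # [10.0, -15.0, 0.0]
```

A positive residual means the model under-predicted the box office; a negative one means it over-predicted.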
18. Right-click the ‘Full Regression’ node and select ‘Results’.
- Score Rankings Matrix: the observations are sorted by the target variable in ascending
order. The Y-axis shows the target variable, and the X-axis shows the percentage of
observations used.
- Effects Plot: displays a bar graph of the absolute values of the coefficients in the
final model. The bars are color coded to indicate the algebraic signs of the coefficients.
a. Maximize the Output window. Check the R-square and the model significance. Also check
which variables have significant effects on the target variable.
b. Restore the Output window to its original size by double-clicking its title bar. Maximize the
Fit Statistics window.
If the focus is on predicted estimates, model fit can be assessed by the root mean squared
error (RMSE). There appears to be some discrepancy between the values of these two
statistics in the training and validation data.
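The fit statistics discussed in step 18 can be reproduced by hand from actual and predicted values. The sketch below computes R-square, average squared error (ASE), and RMSE for a training and a validation set. It divides the squared error by the number of observations; statistical software may divide by the error degrees of freedom instead, so reported values can differ slightly. The function name and all numbers are hypothetical.

```python
import math


def fit_statistics(actual, predicted):
    """Compute R-square, ASE, and RMSE from paired actual/predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # error sum of squares
    sst = sum((a - mean_actual) ** 2 for a in actual)           # total sum of squares
    ase = sse / n
    return {"r_square": 1 - sse / sst, "ase": ase, "rmse": math.sqrt(ase)}


# Made-up values for illustration only.
train_stats = fit_statistics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 31.0, 39.0])
valid_stats = fit_statistics([15.0, 25.0, 35.0, 45.0], [11.0, 29.0, 32.0, 49.0])
print(train_stats)
print(valid_stats)
```

In this made-up example the validation RMSE is clearly larger than the training RMSE, which is the kind of gap that suggests the model fits the training data better than it generalizes.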
b. Right-click the node and open “Edit Variables”. Set the Use property of the following
variables to “No”, then run the node again and check the results:
Lead_Actor_rating, Lead_Actress_rating, and Producer_rating
2. Backward
a. Add the “Regression” node in the Model tab. Rename it to “Backward Regression”.
b. On the Regression node Properties panel, set Selection Model to Backward.
c. Connect the “Data Partition” node to the “Backward Regression” node.
d. Run and check the results.
3. Forward
Add the “Regression” node in the Model tab. Rename it to “Forward Regression”. Repeat
the above steps, this time setting Selection Model to Forward.
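To make the idea of forward selection concrete, here is a minimal pure-Python sketch. It is a simplification, not the Regression node's actual procedure: it greedily adds the input that most reduces the validation average squared error and stops when no input helps, whereas statistical packages typically decide entry with a significance test. All function names and data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y, cols):
    """Least-squares fit of y on an intercept plus the chosen columns."""
    X = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def ase(rows, y, cols, beta):
    """Average squared error of the fitted model on the given rows."""
    preds = [beta[0] + sum(b * row[c] for b, c in zip(beta[1:], cols)) for row in rows]
    return sum((t - p) ** 2 for t, p in zip(y, preds)) / len(y)


def forward_select(train, y_train, valid, y_valid, n_cols):
    """Greedy forward selection scored by validation ASE (a simplification)."""
    selected = []
    best = ase(valid, y_valid, [], fit_ols(train, y_train, []))
    while len(selected) < n_cols:
        scored = []
        for c in (c for c in range(n_cols) if c not in selected):
            cols = selected + [c]
            scored.append((ase(valid, y_valid, cols, fit_ols(train, y_train, cols)), c))
        score, c = min(scored)
        if score >= best:  # no candidate improves validation error: stop
            break
        selected.append(c)
        best = score
    return selected, best


# Made-up data: the target depends only on column 0 (y = 2 * x0 + 1).
train = [[1, 5], [2, 3], [3, 8], [4, 1], [5, 9], [6, 2]]
y_train = [3, 5, 7, 9, 11, 13]
valid = [[1.5, 4], [2.5, 7], [3.5, 2], [4.5, 6]]
y_valid = [4, 6, 8, 10]
selected, best = forward_select(train, y_train, valid, y_valid, n_cols=2)
print(selected, best)
```

Backward elimination follows the same pattern in reverse: start with all inputs and repeatedly drop the one whose removal hurts the chosen criterion least.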
3. In the Output window, see the “Fit Statistics Model Selection based on Valid: Average Squared
Error”.
2) Validation dataset