Final Exam For SAS Enterprise Miner
b. 1) Set the model roles for the analysis variables as shown above.
2) The proportion of individuals who purchased organic products was 25% of the
sample.
3) The variable DemClusterGroup contains collapsed levels of the variable
DemCluster. Presume that, based on previous experience, you believe that
DemClusterGroup is sufficient for this type of modeling effort. Set the model role for
DemCluster to Rejected.
4) As noted above, only TargetBuy can be used for this analysis, and it should have a
role of Target.
Using TargetAmt is possible in principle, since anyone who spent money on organic food
necessarily purchased it. TargetAmt could therefore tell us whether someone purchased
organic food, but it is redundant with TargetBuy. We do not care how much they spent; we
only want to know whether they bought anything, so that we can decide whom to market to
in order to elicit further purchases. TargetAmt might be used indirectly to achieve the
same purpose.
5) Finish the Organics data source definition.
c. Add the AAEM.ORGANICS data source to the Organics diagram workspace.
d. Add a Data Partition node to the diagram and connect it to the Data Source node.
Assign 50% of the data for training and 50% for validation.
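The 50/50 split in step d can be illustrated outside Enterprise Miner with a minimal Python sketch. It assumes a simple random split with a fixed seed, which may differ from the partitioning method the Data Partition node actually applies. The function name `partition`, the seed, and the stand-in rows are hypothetical.

```python
import random


def partition(rows, train_fraction=0.5, seed=12345):
    """Randomly split rows into a training list and a validation list."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(rows)          # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for the ORGANICS cases: 1000 row identifiers.
rows = list(range(1000))
train, valid = partition(rows)
print(len(train), len(valid))
```

Every case lands in exactly one of the two sets, so the validation set can be used to assess models fitted on the training set.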
e. Add a Decision Tree node to the diagram and connect it to the Data Partition node.
2) The replacement variable of age was used to make the first split.
The values for the first split are greater than or equal to 44.5, or less than 44.5.
g. Add a second Decision Tree node to the diagram and connect it to the Data Partition
node.
1) In the Properties panel of the new Decision Tree node, change the maximum
number of branches from a node to 3 to enable three-way splits.
2) Create a decision tree model. Use average square error as the
model assessment statistic.
The average squared error for the maximal tree is 0.1413, which indicates that its
predictions are reasonably close to the actual values. The first tree had an average
squared error of 0.1508. Its predicted values are therefore also fairly accurate, but
less accurate than those of the maximal tree.
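To put the two values in context, average squared error can be sketched in Python as the mean squared difference between the 0/1 outcome and the predicted probability. This is an illustrative definition, not output from Enterprise Miner. The data below are hypothetical: 100 cases with a 25% purchase rate, every case scored at that base rate.

```python
def average_squared_error(actual, predicted):
    """Mean of squared differences between 0/1 outcomes and predicted probabilities."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


# Hypothetical data: 25 buyers out of 100, all scored at the base rate 0.25.
actual = [1] * 25 + [0] * 75
baseline = average_squared_error(actual, [0.25] * 100)
print(round(baseline, 4))  # 0.1875
```

Under these assumptions, a model that predicts 0.25 for everyone has an average squared error of 0.25 × 0.75 = 0.1875. Both trees improve on that reference point, and the maximal tree improves on it slightly more.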
Only 25% of the sample population we modeled purchased organic food. People who
buy organic food are often younger than 44 years old, have an affluence grade
greater than 9.5, and are predominantly female. This suggests that if we want to attract
purchasers for our organic goods, we should target moderately affluent women under the
age of 44. We would like to sell and market our organic food in areas of the city with
younger residents, and we would like to post advertisements in locations frequented by
younger women, such as shopping centers and internet sites popular with women.
2. Use the assigned data set for the final project in this course (Virtual Exchange
students, please pick any data set you prefer). Repeat the procedure and answer
the questions, except in step b. In step b, define the data set as AAEM.(assigned data
set's name).
Lesson 4. Regression Predictive Model
Predictive Modeling Using Regression
a. Return to the Lesson 3 Organics diagram. Attach the StatExplore tool to the ORGANICS
data source and run it.
b. In preparation for regression, is any missing value imputation needed? If yes, should
you do this imputation before generating the decision tree models? Why or why not?
Yes. Imputation is necessary because regression drops cases that have missing input
values, which can bias the model. Imputation replaces the missing values so that those
cases are kept.
c. Add an Impute node to the diagram and connect it to the Data Partition node. Set the
node to impute U for unknown class variable values and the overall mean for unknown
interval variable values. Create imputation indicators for all imputed inputs.
Type U Indicator-Role-Input
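The imputation described in step c can be sketched in plain Python as follows. This only illustrates the idea and does not reproduce the Impute node. The column names `Gender` and `Age`, the `M_` prefix for indicator columns, and the use of `None` for missing values are assumptions made for the example.

```python
from statistics import mean


def impute(rows, class_vars, interval_vars):
    """Fill missing class values with 'U' and missing interval values with the mean.

    A 0/1 indicator column (assumed prefix 'M_') records which values were imputed.
    """
    # Means are computed from the non-missing values of each interval variable.
    means = {v: mean(r[v] for r in rows if r[v] is not None) for v in interval_vars}
    result = []
    for r in rows:
        new = dict(r)
        for v in class_vars:
            new["M_" + v] = 1 if r[v] is None else 0
            if r[v] is None:
                new[v] = "U"
        for v in interval_vars:
            new["M_" + v] = 1 if r[v] is None else 0
            if r[v] is None:
                new[v] = means[v]
        result.append(new)
    return result


# Hypothetical rows; None marks a missing value.
rows = [
    {"Gender": "F", "Age": 40},
    {"Gender": None, "Age": None},
    {"Gender": "M", "Age": 50},
]
imputed = impute(rows, class_vars=["Gender"], interval_vars=["Age"])
print(imputed[1])
```

The indicator columns let the regression use the fact that a value was missing as an input in its own right.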
d. Add a Regression node to the diagram and connect it to the Impute node.
e. Choose Stepwise as the selection model and Validation Error as the selection criterion.
f. Run the Regression node and view the results. Which variables are included in the final
model? Which variables are important in this model? What is the validation ASE (average
squared error)?
Result
h. Disconnect the Impute node from the Data Partition node. -Done
i. Add a Transform Variables node to the diagram and connect it to the Data Partition
node. -Done
j. Connect the Transform Variables node to the Impute node.
m. Rerun the Regression node. Do the selected variables change? How about the
validation ASE?
-Validation ASE changed from 0.1371 to 0.1382
n. Create a full second-degree polynomial model. How does the validation average
squared error for the polynomial model compare to the original regression model?
The additional terms reduce the validation ASE slightly.
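To show what the "additional terms" are, the sketch below lists the terms of a full second-degree polynomial model: every linear term, every squared term, and every two-way product. It assumes interval inputs only, and the input names `Age`, `Affl`, and `Spend` are hypothetical placeholders rather than the variables the Regression node selected.

```python
from itertools import combinations_with_replacement


def second_degree_terms(inputs):
    """Return linear terms plus all squares and two-way products of the inputs."""
    linear = [(name,) for name in inputs]
    # Pairs with replacement give both squares (x, x) and interactions (x, z).
    quadratic = list(combinations_with_replacement(inputs, 2))
    return linear + quadratic


terms = second_degree_terms(["Age", "Affl", "Spend"])
for term in terms:
    print("*".join(term))
print(len(terms))  # 3 linear + 6 second-degree = 9
```

The number of terms grows quickly with the number of inputs, which is one reason to compare the polynomial model with the original regression on validation ASE rather than on training fit.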
2. Use the assigned data set for the final project in this course (Virtual Exchange
students, please pick any data set you prefer). Repeat the procedure and answer the
questions above.
c. Open the exported data from the Model Comparison node. Explore the RANK
data set. What is the number of event cases for each model at a selection depth
of 5%?
Decision Tree2: 469.0044
Decision Tree: 288.2516
Polynomial: 476
Regression: 479.4
Neural: 477
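The quantity reported above, the number of event cases at a selection depth of 5%, can be sketched as follows: rank the cases by predicted probability, keep the top 5%, and count the events among them. The function name and the data are hypothetical. The sketch returns whole counts and does not try to reproduce however the RANK data set arrives at fractional values.

```python
def events_at_depth(actual, scores, depth=0.05):
    """Count events (actual == 1) among the top `depth` fraction of cases by score."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    n_top = int(round(len(ranked) * depth))
    return sum(y for _, y in ranked[:n_top])


# Hypothetical data: 100 cases scored 0.00 to 0.99; the three highest are events.
scores = [i / 100 for i in range(100)]
actual = [1 if i >= 97 else 0 for i in range(100)]
print(events_at_depth(actual, scores))  # 3
```

A model that places more events in the top 5% is more useful for choosing whom to contact first, which is why the models are compared on this count.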