Tutorial7 ClassificationTrees
TUTORIAL #7
A recruitment agency for PAs (Personal Assistants to a Chief Executive Officer (CEO)) proposes to
develop a selection model to assist with the screening of applicants for PA positions in Commerce and Industry. The
following attributes are measured through a set of questions included in the written application form
which an applicant submits together with their CV. A recruitment officer assesses each application and
rates each applicant on these criteria.
An analyst used the data mining tool of Classification Trees to identify an appropriate decision rule to
classify future applicants for the ‘PA to the CEO’ position as suitable or not suitable. Those applicants
that are considered suitable are granted an interview, are placed on the company’s files, and are
recommended to clients who approach them for PA to the CEO applicants.
In the training data set, 200 previous PA applications were reviewed and a rating was assigned to each
of the four selection criteria above. In addition, the client companies who employed each of these 200
applicants were approached and asked to indicate the extent to which they were satisfied with these
placements. This resulted in the response category “suitable” or “unsuitable” (variable with the code
suitable).
The input data for the four selection criteria considered appropriate, and the relevant results from the
Classification Tree module of Statistica, are given below.
Questions:
1. Construct the classification tree from the relevant data. Annotate your tree diagram clearly,
labelling each split with the appropriate criterion and the levels of the split, the number of cases
in each split, and the number and percentage of cases in each node (both intermediate and root
nodes).
2. Interpret the Classification Tree and define an appropriate decision rule for selecting suitable
Personal Assistants from their application forms for further assessment through an interview.
3. What percentage of applicants is correctly identified as suitable by the chosen criteria? Justify.
Use this finding to comment on the reliability of the derived decision rule.
4. What is the likelihood of being considered suitable for a particular applicant who has little
relevant PA experience, but high interpersonal skills? Briefly explain your reasoning.
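Questions 3 and 4 both come down to reading counts off the annotated tree. The following is a minimal sketch of that arithmetic, assuming the likelihood for an applicant is estimated as the share of suitable cases in the terminal node the applicant falls into. The function names and the counts are hypothetical and for illustration only. They are not taken from the Statistica output.

```python
def node_proportion(n_suitable, n_unsuitable):
    """Estimated likelihood of 'suitable' in a node: its share of suitable cases."""
    total = n_suitable + n_unsuitable
    if total == 0:
        raise ValueError("node contains no cases")
    return n_suitable / total


def percent_suitable_correct(n_correct, n_actual):
    """Percentage of actually suitable applicants that the rule classifies as suitable."""
    if n_actual == 0:
        raise ValueError("no suitable applicants observed")
    return 100.0 * n_correct / n_actual


# Hypothetical counts for illustration only; substitute the counts from your own tree.
print(node_proportion(n_suitable=30, n_unsuitable=10))
print(percent_suitable_correct(n_correct=80, n_actual=100))
```

Replace the hypothetical counts with those in the relevant terminal node (Question 4) and in the classification matrix (Question 3).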
For a tennis match, we want to classify days as play or no play. The following attributes are
observed for 10 days:
Play: yes – no
Outlook: rainy – sunny – overcast
Temperature: cool – hot – mild
Humidity: normal – high
Windy: true – false
Questions:
1. Using the CART algorithm, construct a classification tree with a stopping rule of node size = 1.
Important annotations are: criteria for each split, levels of the split, total number of cases in
each node, and number and percentage of cases in each of the groups at each node.
2. Construct the classification matrix. What is the probability that a day chosen at random is
correctly classified by the decision rules? Justify.
3. Write down the set of decision rules that should be presented.
4. How does CHAID differ from CART in the way it chooses the split variables?
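When building the CART tree by hand, each candidate binary split is scored by how much it reduces node impurity. The sketch below assumes the Gini index as the impurity measure, which is a common choice for CART. Check which measure your course notes use. The function names and the label counts are hypothetical and are not the tutorial's 10 observed days.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_gain(parent, left, right):
    """Impurity reduction when 'parent' is split into 'left' and 'right' child nodes."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


# Hypothetical labels for illustration only (not the tutorial's data).
parent = ["yes"] * 6 + ["no"] * 4
left = ["yes"] * 5 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 3

print(gini(parent))
print(gini_gain(parent, left, right))
```

At each node, the candidate split with the largest impurity reduction is chosen, and splitting continues until the stopping rule (here, node size = 1) or a pure node is reached.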