Module 6 - CART - Inclassb
Module 6: CART
CART – Outcome of US Supreme Court
§ Using Classification and Regression Trees (Interpretable analytics method)
to predict the outcome of cases in the US Supreme Court
The American legal system
§ The legal system of the United States operates at the state level and at the
federal level
§ Federal courts deal with cases beyond the scope of state law (disputes
between states, violations of federal laws)
§ Federal courts are divided into:
• District Courts
• Makes initial decision
• Circuit Courts
• Hears appeals from the district courts (may change the decisions district courts made)
• Supreme Court
• Highest level – makes final decision
The Supreme Court of the United States
• Consists of nine judges ("justices"), appointed by the President
• Justices are distinguished judges, professors of law, state and federal attorneys
• The Supreme Court of the United States (SCOTUS) decides the most difficult and
controversial cases
• Often involve interpretation of the Constitution
• Significant social, political and economic consequences
The same nine justices served together from 1994 through 2005, the longest period
of time with the same set of justices in over a hundred and eighty years.
Famous SCOTUS Decisions
• Wickard v. Filburn, 1942
• Recognized the power of the federal government to regulate economic activity
Filburn was a farmer who was growing wheat for on-farm consumption; however,
the US had established limits on wheat production (to stabilize wheat prices and
supplies).
Filburn was exceeding those limits, so even though the extra wheat he was
producing was for his own use and he had no intention of selling it, he was forced
to destroy it.
Famous SCOTUS Decisions
• Brown v. Board of Education, 1954 (9–0 decision)
• Ruled that racial segregation in public schools is unconstitutional
The case began when the public school system in Kansas refused to enroll local black
resident Oliver Brown's daughter at the elementary school closest to their home.
The Browns filed a class-action lawsuit in U.S. federal court against the Kansas Board
of Education, alleging that its segregation policy was unconstitutional. A special three-
judge court of the U.S. District Court for the District of Kansas rendered a verdict
against the Browns.
The Browns then appealed the ruling directly to the Supreme Court. In May 1954, the
Supreme Court issued a unanimous 9–0 decision in favor of the Browns. The Court
ruled that "separate educational facilities are inherently unequal". The Court's decision
in Brown paved the way for integration and was a major victory of the civil rights
movement.
Famous SCOTUS Decisions
• Miranda v. Arizona, 1966 (5-4 decision)
• Prisoners must be advised of their rights before being questioned by police
In March 1963, Ernesto Miranda was arrested by the Phoenix Police Department,
based on evidence linking him to the kidnapping and rape of a woman. After two
hours of interrogation by police officers, Miranda signed a confession to the rape
charge on forms that included the typed statement: "I do hereby swear that I make
this statement voluntarily…."
However, at no time was Miranda told of his right to remain silent. His lawyer
filed Miranda's appeal to the Arizona Supreme Court, claiming that Miranda's
confession was not fully voluntary and should not have been admitted into the
court proceedings. The Arizona Supreme Court affirmed the trial court's decision
to admit the confession. The case then went to the U.S. Supreme Court.
On June 13, 1966, the Supreme Court issued a 5–4 decision in Miranda's favor
that overturned his conviction and remanded his case back to Arizona for retrial.
Famous SCOTUS Decisions
• Bush v. Gore, 2000 (5-4 decision)
• Stopped the Florida recount, effectively deciding the 2000 presidential election
The Bush campaign immediately asked the U.S. Supreme Court to stay the
decision and halt the recount.
In a 5–4 decision, the Court ruled, strictly on equal protection grounds, that the
recount be stopped. Specifically, the use of different standards of counting in
different counties violated the Equal Protection Clause of the U.S. Constitution.
Famous SCOTUS Decisions
• Roe v. Wade, 1973 (7-2 decision)
• Legalized abortion
The case was brought by "Jane Roe", who, in 1969, became pregnant with her third
child. Roe wanted an abortion, but she lived in Texas, where abortion was illegal.
Her attorneys filed a lawsuit in U.S. federal court alleging that Texas's abortion
laws were unconstitutional. A special three-judge court of the U.S. District Court
heard the case and ruled in her favor. The parties appealed this ruling to the
Supreme Court.
In January 1973, the Supreme Court issued a 7–2 decision protecting a pregnant
woman's right to an abortion.
In June 2022, the Supreme Court overruled Roe in Dobbs v. Jackson Women's
Health Organization on the grounds that the substantive right to abortion was not
"deeply rooted in this Nation's history or tradition".
Famous SCOTUS Decisions
• Roe v. Wade, 1973 (7-2 decision)
• Legalized abortion
When Amy Coney Barrett replaced Ruth Bader Ginsburg in late 2020, the Court's
ideological makeup shifted, creating a 6–3 conservative majority and providing an
opportunity to further limit, and ultimately overturn, Roe.
Ginsburg had generally been in the majority of past Supreme Court cases that
enjoined stricter abortion laws. Conversely, Barrett held anti-abortion views.
Predicting SCOTUS decisions
• Legal academics and political scientists regularly make predictions of SCOTUS
decisions from detailed studies of cases and individual justices
• Nonprofits, voters, and anyone interested in long-term planning can benefit from
knowing the outcomes of Supreme Court cases before they happen
• Political scientist Andrew Martin built a model to predict these decisions; together
with his colleagues, he decided to test this model against a panel of experts
Predicting SCOTUS decisions
• Martin used a method called Classification and Regression Trees (CART)
• Outcome is binary: will the Supreme Court affirm the case or reverse the case?
• How much data do you think Andrew Martin should use to build his model?
• Information from all cases with the same set of justices as those he is trying to
predict. Data from cases where the justices were different might just add noise to
our problem.
Data
• Cases from 1994 through 2001
Variables
• Dependent Variable: Did Justice Stevens vote to reverse the lower court
decision?
• 1 = Yes (reverse)
• 0 = No (affirm)
• Independent Variables??
Variables
• Dependent Variable: Did Justice Stevens vote to reverse the lower court
decision?
• 1 = Yes (reverse)
• 0 = No (affirm)
• Independent Variables: Properties of the case
• Circuit court of origin (1st – 11th , DC, FED)
• Issue area of case (e.g., civil rights, federal taxation)
• Type of petitioner, type of respondent (e.g., US, an employer)
• Ideological direction of lower court decision (conservative or liberal)
• Whether petitioner argued that a law/practice was unconstitutional
§ To collect this data, researchers read through all of the cases and coded the
information.
– Some of it, like the circuit court, is straightforward. But other information
required a judgment call, like the ideological direction of the lower court.
Logistic regression for Justice Stevens
• Some significant variables and their coefficients:
• Case is from 2nd circuit court: +1.66
• Case is from 4th circuit court: +2.82
• Lower court decision is liberal: -1.22
– What does this mean?
• The case being from the 2nd or 4th circuit courts is predictive of Justice
Stevens reversing the case. The lower court decision being liberal is
predictive of Justice Stevens affirming the case.
• It's difficult to understand which factors are more important due to things
like the scales of the variables.
Classification and Regression Trees (CART)
• Build a tree by splitting on values of the independent variables
• To predict the outcome for an observation, follow the splits in the tree and
at the end, predict the most frequent outcome
• Interpretable
Splits in CART
§ CART tries to split this data into subsets so that each subset is as pure or
homogeneous as possible.
§ A standard prediction made by a CART model is just the majority in each subset.
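One common way to quantify how pure a subset is (a sketch of the idea, not necessarily the exact splitting criterion used by the R package in the lecture) is Gini impurity, which is zero for a perfectly homogeneous subset and largest when the outcomes are evenly mixed:

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity = 1 - sum of squared class proportions."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini_impurity(["reverse"] * 10))                  # pure subset -> 0.0
print(gini_impurity(["reverse"] * 5 + ["affirm"] * 5))  # evenly mixed -> 0.5
```

CART greedily picks the split whose resulting subsets have the lowest (weighted) impurity, which is what "as pure or homogeneous as possible" means operationally.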
Final tree
• In this tree, and for the trees we'll
generate in R, a yes response is
always to the left and a no response is
always to the right.
Final tree
Quick question: For which data
observations should we predict "Red",
according to this tree?
When does CART stop splitting?
•There are different ways to control how many splits are generated
•One way is by setting a lower bound on the number of points in each
subset (in R, this is the minbucket parameter)
Predictions from CART
• In each subset of a CART tree, we have a bucket of observations, which
may contain both outcomes (i.e., affirm and reverse, red and gray)
CART model
•1 = Yes (reverse)
•0 = No (affirm)
ROC curve for CART
• Vary the threshold to obtain an ROC curve
• The area under the ROC curve evaluates the model
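A rough sketch of how the curve is traced out (synthetic scores and labels, not the lecture's data): for each threshold, classify every case whose predicted probability meets the threshold as "reverse" (the positive class), then record the false-positive and true-positive rates:

```python
def roc_points(scores, labels, thresholds):
    """Return (FPR, TPR) pairs, one per threshold, for binary labels 0/1."""
    points = []
    positives = sum(labels)
    negatives = len(labels) - positives
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]  # predicted P(reverse) per case
labels = [1, 1, 1, 0, 1, 0]              # 1 = reverse, 0 = affirm
print(roc_points(scores, labels, [0.0, 0.5, 1.0]))
# [(1.0, 1.0), (0.0, 0.75), (0.0, 0.0)]
```

Threshold 0 labels everything positive (top-right corner), threshold above every score labels nothing positive (bottom-left corner); the area under the resulting curve is the AUC used to evaluate the model.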
Quick question:
§ Suppose you have a subset of 20 observations, where 14 have outcome A
and 6 have outcome B. What proportion of observations have outcome A?
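A one-line check of the arithmetic: 14 of the 20 observations have outcome A.

```python
# Proportion of the 20 observations with outcome A.
print(14 / 20)  # -> 0.7
```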
Random Forest
• Designed to further enhance prediction accuracy of CART
Building many trees
• Each tree can split on only a random subset of the variables
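The two sources of randomness in a random forest can be sketched as follows (Python rather than the course's R; the variable names are illustrative, borrowed from the case properties earlier in the module): each tree is trained on a bootstrap sample of the rows (bagging), and at each split it may consider only a random subset of the variables.

```python
import random

def bootstrap_sample(rows, rng):
    """Sample n rows with replacement: some rows repeat, some are missed."""
    return [rng.choice(rows) for _ in range(len(rows))]

def random_variable_subset(variables, k, rng):
    """Each tree considers only k of the variables when splitting."""
    return rng.sample(variables, k)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
rows = list(range(10))
variables = ["circuit", "issue", "petitioner", "respondent", "direction"]
print(bootstrap_sample(rows, rng))
print(random_variable_subset(variables, 2, rng))
```

Because each tree sees different rows and different variables, the trees disagree in useful ways, and averaging their votes is what improves accuracy over a single CART tree.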
Random Forest parameters
•Minimum number of observations in a subset (like minbucket in CART)
•In R, this is controlled by the parameter nodesize
•A smaller value of nodesize leads to bigger trees, which take longer to build in R
•Random forests are much more computationally intensive than CART
•Number of trees
•In R, this is the parameter ntree
•Should not be too small, because the bagging procedure may miss observations
•More trees take longer to build
•In CART, the value of ’minbucket’ can affect the model’s out-of-sample accuracy
•if minbucket is too small, over-fitting might occur
•if minbucket is too large, the model might be too simple
•Use k−1 folds to estimate a model, and test model on remaining one fold
("validation set") for each candidate parameter value
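The k-fold procedure above can be sketched as follows (a minimal sketch with a placeholder scoring function; in practice, train_and_score would fit a CART model with the candidate minbucket value on the training folds and return its accuracy on the validation fold):

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds."""
    fold_size = n // k
    return [list(range(i * fold_size, (i + 1) * fold_size)) for i in range(k)]

def cross_validate(n, k, train_and_score):
    """Average the validation score over the k train/validation splits."""
    folds = k_fold_indices(n, k)
    scores = []
    for i, validation in enumerate(folds):
        # Train on the other k-1 folds, test on the held-out fold.
        training = [idx for j, f in enumerate(folds) if j != i for idx in f]
        scores.append(train_and_score(training, validation))
    return sum(scores) / k

# Dummy scorer just to show the plumbing: 10 observations, 5 folds.
print(cross_validate(10, 5, lambda train, val: len(val) / (len(train) + len(val))))
```

The candidate parameter value with the best average validation score is the one you would then use to fit the final model on all the training data.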
•Like Adjusted R-squared, the cp (complexity parameter) measures the trade-off
between model complexity and accuracy on the training set
An unusual property of the CART trees that Martin and his colleagues developed:
they use the predictions of some trees as independent variables for other trees. (The
first split is whether or not Justice Ginsburg's predicted decision is liberal.)
•Experts were only asked to predict within their area of expertise, and more than
one expert was assigned to each case
•A model that predicts these decisions is both more accurate and faster
than the experts
• A CART model based on very high-level details of a case beats experts
who can process much more detailed and complex information