Module 6 - CART - Inclassb

The document discusses using a Classification and Regression Tree (CART) model to predict outcomes of cases in the US Supreme Court. A group of academics built a CART model using data from cases between 1994-2001, when there were nine consistent justices. The model aimed to predict whether Justice Stevens would vote to reverse or affirm the lower court's decision. Variables in the model included the circuit court of origin where the case was originally heard. The academics hoped to test if the CART model could predict outcomes more accurately than a panel of legal experts.


IIMT 2641 Introduction to Business Analytics

Module 6: CART

1
CART – Outcome of US Supreme Court
§ Using Classification and Regression Trees (Interpretable analytics method)
to predict the outcome of cases in the US Supreme Court

§ In 2002, a group of political science and law academics wanted to test whether a
model could do better than a group of experts at predicting the decisions of the
Supreme Court

2
The American legal system
§ The legal system of the United States operates at the state level and at the
federal level
§ Federal courts deal with cases beyond the scope of state law (disputes
between states, violations of federal laws)
§ Federal courts are divided into:
• District Courts
• Makes initial decision
• Circuit Courts
• Hears appeals from the district courts (may change decisions the district courts made)
• Supreme Court
• Highest level – makes final decision

3
The Supreme Court of the United States
• Consists of nine judges ("justices"), appointed by the President
• Justices are distinguished judges, professors of law, state and federal attorneys
• The Supreme Court of the United States (SCOTUS) decides on most difficult and
controversial cases
• Often involve interpretation of Constitution
• Significant social, political and economic consequences

The same nine Supreme Court justices served from 1994 through 2005 (the longest
period of time with the same set of justices in over 180 years)
4
Famous SCOTUS Decisions

• Wickard (Secretary of Agriculture) v. Filburn, 1942
• Congress allowed to intervene in industrial/economic activity

This case recognized the power of the federal government to regulate economic
activity.

Filburn was a farmer who was growing wheat for on-farm consumption; however,
the US had established limits on wheat production (to stabilize wheat prices and
supplies).

Filburn was exceeding those limits, so even though the extra wheat he was
producing was for his own use and he had no intention of selling it, he was forced
to destroy it.

5
Famous SCOTUS Decisions

• Brown v. Board of Education, 1954 (9-0 decision)


• Separating black and white students in public schools is
unconstitutional

The case began when the public school system in Kansas refused to enroll local black
resident Oliver Brown's daughter at the elementary school closest to their home.

The Browns filed a class-action lawsuit in U.S. federal court against the Kansas Board
of Education, alleging that its segregation policy was unconstitutional. A special three-
judge court of the U.S. District Court for the District of Kansas rendered a verdict
against the Browns.

The Browns then appealed the ruling directly to the Supreme Court. In May 1954, the
Supreme Court issued a unanimous 9–0 decision in favor of the Browns. The Court
ruled that "separate educational facilities are inherently unequal”. The Court's decision
in Brown paved the way for integration and was a major victory of the civil rights
movement. 6
Famous SCOTUS Decisions
• Miranda v. Arizona, 1966 (5-4 decision)
• Prisoners must be advised of their rights before being questioned by police
In March 1963, Ernesto Miranda was arrested by the Phoenix Police Department,
based on evidence linking him to the kidnapping and rape of a woman. After two
hours of interrogation by police officers, Miranda signed a confession to the rape
charge on forms that included the typed statement: "I do hereby swear that I make
this statement voluntarily….”

However, at no time was Miranda told of his right to remain silent. His lawyer
filed Miranda's appeal to the Arizona Supreme Court, claiming that Miranda's
confession was not fully voluntary and should not have been admitted into the
court proceedings. The Arizona Supreme Court affirmed the trial court's decision
to admit the confession. The case further went to the Supreme Court.

On June 13, 1966, the Supreme Court issued a 5–4 decision in Miranda's favor
that overturned his conviction and remanded his case back to Arizona for retrial.
7
Famous SCOTUS Decisions

• Bush v. Gore, 2000 (5-4 decision)


• Decided outcome of presidential election!
On November 8, 2000, the Florida Division of Elections reported that Bush
won with 48.8% of the vote in Florida, a margin of victory of 1,784 votes. The
margin of victory was less than 0.5% of the votes cast, so a mandated
automatic machine recount occurred. On November 10, with the machine
recount, Bush's margin of victory had decreased to 327 votes. Gore requested
manual recounts in four Florida counties.

The Bush campaign immediately asked the U.S. Supreme Court to stay the
decision and halt the recount.

In a 5-4 decision, the Court ruled, strictly on equal protection grounds, that the
recount be stopped. Specifically, the use of different standards of counting in
different counties violated the Equal Protection Clause of the U.S. Constitution
8
Famous SCOTUS Decisions
• Roe v. Wade, 1973 (7-2 decision)
• Legalized abortion
The case was brought by "Jane Roe", who, in 1969, became pregnant with her third
child. Roe wanted an abortion, but she lived in Texas, where abortion was illegal.

Her attorneys filed a lawsuit in U.S. federal court alleging that Texas's abortion
laws were unconstitutional. A special three-judge court of the U.S. District Court
heard the case and ruled in her favor. The parties appealed this ruling to the
Supreme Court.

In January 1973, the Supreme Court issued a 7–2 decision protecting a pregnant
woman's right to an abortion.

In June 2022, the Supreme Court overruled Roe in Dobbs v. Jackson Women's
Health Organization on the grounds that the substantive right to abortion was not
"deeply rooted in this Nation's history or tradition”.
9
Famous SCOTUS Decisions
• Roe v. Wade, 1973 (7-2 decision)
• Legalized abortion

When Amy Coney Barrett replaced Ruth Bader Ginsburg in late 2020, the Court's
ideological makeup shifted, creating a 6–3 conservative majority and providing an
opportunity to additionally limit and overturn Roe.

Ginsburg had generally been in the majority of past Supreme Court cases that
enjoined stricter abortion laws. Conversely, Barrett held anti-abortion views.

10
Predicting SCOTUS decisions
• Legal academics and political scientists regularly make predictions of SCOTUS
decisions from detailed studies of cases and individual justices
• Nonprofits, voters, and anybody interested in long-term planning can benefit from
knowing the outcomes of Supreme Court cases before they happen

• In 2002, Andrew Martin, a professor of political science at Washington University
in St. Louis, decided to instead predict decisions using a statistical model built from
data

• Together with his colleagues, he decided to test this model against a panel of
experts

11
Predicting SCOTUS decisions
• Martin used a method called Classification and Regression Trees (CART)
• Outcome is binary: predict whether the Supreme Court will affirm or reverse the case

• Why not logistic regression?
• Logistic regression models are generally not interpretable
• Model coefficients indicate importance and relative effect of variables, but do not give a
simple explanation of how the decision is made

• How much data do you think Andrew Martin should use to build his model?
• Information from all cases with the same set of justices as those he is trying to
predict. Data from cases where the justices were different might just add noise to
our problem.

12
Data
• Cases from 1994 through 2001

• In this period, the same nine justices presided over SCOTUS


• Breyer, Ginsburg, Kennedy, O’Connor, Rehnquist (Chief Justice),
Scalia, Souter, Stevens, Thomas
• Rare data set: longest period of time with the same set of justices in
over 180 years

• We will focus on predicting Justice Stevens’ decisions


• Started out moderate, but became more liberal
• Self-proclaimed conservative

13
Variables
• Dependent Variable: Did Justice Stevens vote to reverse the lower court
decision?
• 1 = Yes (reverse)
• 0 = No (affirm)

• Independent Variables??

14
Variables
• Dependent Variable: Did Justice Stevens vote to reverse the lower court
decision?
• 1 = Yes (reverse)
• 0 = No (affirm)

• Independent Variables: Properties of the case


• Circuit court of origin (1st – 11th , DC, FED)
– The circuit court of origin is the circuit or lower court where the case came from. There
are 13 different circuit courts in the United States. The 1st through 11th and Washington,
DC courts are defined by region, and the federal court is defined by the subject matter of
the case.

• Issue area of case (e.g., civil rights, federal taxation)


• Type of petitioner, type of respondent (e.g., US, an employer)
• Ideological direction of lower court decision (conservative or liberal)
• Whether petitioner argued that a law/practice was unconstitutional
15
Variables
• Dependent Variable: Did Justice Stevens vote to reverse the lower court
decision?
• 1 = Yes (reverse)
• 0 = No (affirm)

• Independent Variables: Properties of the case


• Circuit court of origin (1st – 11th , DC, FED)
• Issue area of case (e.g., civil rights, federal taxation)
• Provides a category
• Type of petitioner, type of respondent (e.g., US, an employer)
• Ideological direction of lower court decision (conservative or liberal)
• Whether petitioner argued that a law/practice was unconstitutional

16
Variables
• Dependent Variable: Did Justice Stevens vote to reverse the lower court
decision?
• 1 = Yes (reverse)
• 0 = No (affirm)

• Independent Variables: Properties of the case


• Circuit court of origin (1st – 11th , DC, FED)
• Issue area of case (e.g., civil rights, federal taxation)
• Type of petitioner, type of respondent (e.g., US, an employer)
• The type of petitioner and type of respondent define two parties in the case. Some
examples are the United States, an employer, or an employee.
• Ideological direction of lower court decision (conservative or liberal)
• Whether petitioner argued that a law/practice was unconstitutional

17
Variables
• Dependent Variable: Did Justice Stevens vote to reverse the lower court
decision?
• 1 = Yes (reverse)
• 0 = No (affirm)

• Independent Variables: Properties of the case


• Circuit court of origin (1st – 11th , DC, FED)
• Issue area of case (e.g., civil rights, federal taxation)
• Type of petitioner, type of respondent (e.g., US, an employer)
• Ideological direction of lower court decision (conservative or liberal)
– The ideological direction of the lower court decision describes whether the lower court
made what was considered a liberal or a conservative decision.
• Whether petitioner argued that a law/practice was unconstitutional

18
Variables
• Dependent Variable: Did Justice Stevens vote to reverse the lower court
decision?
• 1 = Yes (reverse)
• 0 = No (affirm)
• Independent Variables: Properties of the case
• Circuit court of origin (1st – 11th , DC, FED)
• Issue area of case (e.g., civil rights, federal taxation)
• Type of petitioner, type of respondent (e.g., US, an employer)
• Ideological direction of lower court decision (conservative or liberal)
• Whether petitioner argued that a law/practice was unconstitutional
§ To collect this data, researchers read through all of the cases and coded the
information.
– Some of it, like the circuit court, is straightforward. But other information
required a judgment call, like the ideological direction of the lower court.

19
Logistic regression for Justice Stevens
• Some significant variables and their coefficients:
• Case is from 2nd circuit court: +1.66
• Case is from 4th circuit court: +2.82
• Lower court decision is liberal: -1.22
– What does this mean?
q The case being from the 2nd or 4th circuit courts is predictive of Justice
Stevens reversing the case. The lower court decision being liberal is
predictive of Justice Stevens affirming the case.
q It's difficult to understand which factors are more important due to things
like the scales of the variables.

20
Logistic regression for Justice Stevens
• Some significant variables and their coefficients:
• Case is from 2nd circuit court: +1.66
• Case is from 4th circuit court: +2.82
• Lower court decision is liberal: -1.22
– What does this mean?

• This is hardly interpretable...


• Difficult to understand which factors are more important
• Difficult to quickly evaluate what prediction is for a new case

21
Classification and Regression Trees (CART)
• Build a tree by splitting on values of the independent variables

• To predict the outcome for an observation, follow the splits in the tree and
at the end, predict the most frequent outcome

• Does not assume a linear model (different from linear/logistic regression)

• Interpretable

22
Splits in CART

§ CART tries to split this data into subsets so that each subset is as pure or
homogeneous as possible.
§ A standard prediction made by a CART model is just the majority in each subset.
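The idea of purity can be made concrete. The slides do not name a specific impurity measure, so as an illustration here is Gini impurity, a common choice in CART implementations, sketched in Python (the course itself uses R; this is not the rpart code):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a subset: 0 means perfectly pure,
    values near 0.5 (for two classes) mean maximally mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# A pure subset has impurity 0; a 50/50 split is maximally impure.
print(gini_impurity(["affirm"] * 10))                    # 0.0
print(gini_impurity(["affirm"] * 6 + ["reverse"] * 6))   # 0.5
```

CART chooses splits that make the resulting subsets as pure (low-impurity) as possible.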

23
Final tree
1. In this tree, and for the trees we’ll
generate in R, a yes response is
always to the left and a no response is
always to the right.

2. Make sure you always start at the top of the tree.
The x less than 85 split only counts for observations for which x is greater
than or equal to 60 and y is less than 20.

24
Final tree
Quick question: For which data
observations should we predict "Red",
according to this tree?

• If X is less than 60, and Y is any value.

• If X is greater than or equal to 60, and Y is greater than or equal to 20.

• If X is greater than or equal to 85, and Y is less than 20.

• If X is greater than or equal to 60 and less than 85, and Y is less than 20.

25
When does CART stop splitting?
•There are different ways to control how many splits are generated
•One way is by setting a lower bound for the number of points in each
subset

•In R, a parameter that controls this is minbucket


•The smaller it is, the more splits will be generated
•If it is too small, overfitting will occur (the model performs badly on the test set)
•If it is too large, the model will be too simple and accuracy will be poor
•A method for selecting this parameter will be introduced later
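To see how a minbucket-style lower bound controls the number of splits, here is a toy Python sketch (not R's rpart; the splitting rule is simplified to a median split on 1-D data, and the function name is invented for illustration):

```python
def count_leaves(xs, minbucket):
    """Toy recursive splitter: split the data at its midpoint,
    but stop whenever a child subset would be smaller than
    minbucket. Returns the number of leaf subsets produced."""
    mid = len(xs) // 2
    # Stop splitting if either child would violate the lower bound.
    if mid < minbucket or len(xs) - mid < minbucket:
        return 1
    return count_leaves(xs[:mid], minbucket) + count_leaves(xs[mid:], minbucket)

data = list(range(16))
print(count_leaves(data, minbucket=2))  # 8 leaves: small bound, many splits
print(count_leaves(data, minbucket=8))  # 2 leaves: large bound, few splits
```

The same trade-off applies in rpart: lowering minbucket grows the tree (risking overfitting), raising it shrinks the tree (risking underfitting).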

26
Predictions from CART
• In each subset of a CART tree, we have a bucket of observations, which
may contain both outcomes (i.e., affirm and reverse, red and gray)

• Compute the percentage of data of each type in a subset

• E.g., 10 affirm, 2 reverse; 10/(10+2) = 0.833

• Just like in logistic regression, we can use a threshold to obtain a prediction


• Threshold of 0.5 corresponds to picking most frequent outcome
(Taking the majority)
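The threshold rule can be sketched in a few lines of Python (the function name is invented for illustration; the course uses R):

```python
def predict_bucket(n_affirm, n_reverse, threshold=0.5):
    """Predict the outcome for one CART bucket: compute the
    proportion of 'affirm' observations and compare it to the
    threshold, just as with logistic regression probabilities."""
    p_affirm = n_affirm / (n_affirm + n_reverse)
    return "affirm" if p_affirm >= threshold else "reverse"

# A bucket with 10 affirm and 2 reverse has proportion 10/12 = 0.833
print(predict_bucket(10, 2))                 # affirm (majority vote)
print(predict_bucket(10, 2, threshold=0.9))  # reverse (stricter threshold)
```

With the default threshold of 0.5 this is exactly the majority vote; raising or lowering the threshold trades off the two error types, which is what generates the ROC curve on the next slide.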

27
CART model

•1 = Yes (reverse)
•0 = No (affirm)

28
ROC curve for CART
Vary the threshold to obtain an ROC curve
Area under ROC curve evaluates the model

29
Quick question:
§ Suppose you have a subset of 20 observations, where 14 have outcome A
and 6 have outcome B. What proportion of observations have outcome A?

§ If we set the threshold to 0.25 when computing predictions of outcome A,
will we predict A or B for these observations?

§ If we set the threshold to 0.5 when computing predictions of outcome A,
will we predict A or B for these observations?

§ If we set the threshold to 0.75 when computing predictions of outcome A,
will we predict A or B for these observations?

Since 70% of these observations have outcome A, we will predict A if the
threshold is below 0.7, and we will predict B if the threshold is above 0.7.

30
Random Forest
• Designed to further enhance prediction accuracy of CART

• Works by building a large number of CART trees

• But makes model less interpretable

• Tradeoff: Interpretability or Accuracy

• To make a prediction for a new observation, each tree "votes" on the
outcome, and we pick the outcome that receives the majority of the votes
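The voting step can be sketched as follows (a Python illustration, not the R randomForest package):

```python
from collections import Counter

def forest_predict(tree_predictions):
    """Combine the votes of many trees: the most frequently
    predicted outcome wins."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Five hypothetical trees vote on one new observation:
print(forest_predict(["reverse", "affirm", "reverse", "reverse", "affirm"]))
# reverse (3 votes to 2)
```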

31
Building many trees
1. Each tree can split on only a random subset of the variables

2. CART does not include randomness, while Random Forests do.

3. Each tree is built from a "bagged"/"bootstrapped" sample of the data
1. Select observations randomly with replacement
2. Example – original data: 1 2 3 4 5
3. New "data" (3 different trees):
1. 2 3 1 2 5
2. 3 1 4 5 1
3. 4 4 2 1 5
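Bootstrapped samples like the ones above can be generated with a short Python sketch (illustrative only; R's randomForest does this internally):

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) observations with replacement --
    the 'bagged' dataset used to build one tree."""
    return [rng.choice(data) for _ in data]

rng = random.Random(42)
original = [1, 2, 3, 4, 5]
for _ in range(3):
    # Each draw is the same size as the original data, but some
    # observations repeat and others are left out.
    print(bootstrap_sample(original, rng))
```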

32
Random Forest parameters
•Minimum number of observations in a subset (like minbucket in CART)
•In R, this is controlled by the parameter nodesize
•A smaller value of nodesize leads to bigger trees, which take longer to build in R
•Random Forests are much more computationally intensive than CART

•Number of trees
•In R, this is the parameter ntree
•Should not be too small, because bagging procedure may miss observations
•More trees take longer to build

•Default parameter settings are typically okay

•Not that sensitive to the parameter values

33


Parameter selection (CART)

•In CART, the value of minbucket can affect the model's out-of-sample accuracy
•If minbucket is too small, over-fitting might occur
•If minbucket is too large, the model might be too simple

•How should we set this parameter?



K-fold cross-validation

•Given training set, split into k pieces ("folds")

•Use k−1 folds to estimate a model, and test model on remaining one fold
("validation set") for each candidate parameter value

•Repeat for each of the k folds

•For each candidate parameter value (e.g., minbucket), average the accuracy
over the k folds, or validation sets
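The whole k-fold procedure can be sketched in Python (a generic illustration; here `fit` and `accuracy` are placeholders standing in for training a CART model with a candidate parameter value and measuring its accuracy on the held-out fold):

```python
def k_fold_accuracies(data, k, fit, accuracy):
    """Generic k-fold cross-validation: for each fold, train on
    the other k-1 folds and evaluate on the held-out fold, then
    average the k accuracies."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        validation = folds[i]
        training = [x for j, f in enumerate(folds) if j != i for x in f]
        model = fit(training)
        scores.append(accuracy(model, validation))
    return sum(scores) / k

# Toy use: the "model" is just the majority label of the training set.
data = ["A"] * 14 + ["B"] * 6
fit = lambda train: max(set(train), key=train.count)
accuracy = lambda label, fold: sum(x == label for x in fold) / len(fold)
print(k_fold_accuracies(data, k=5, fit=fit, accuracy=accuracy))  # 0.7
```

In practice you would run this once per candidate parameter value and pick the value with the best average validation accuracy.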
Cross-Validation in R
•Before, we limited our tree using minbucket

•When we use cross-validation in R, we'll use a parameter called cp instead
•cp = Complexity Parameter
•Like Adjusted R², it measures the trade-off between model complexity and
accuracy on the training set

•Smaller cp leads to a bigger tree (might overfit)


Martin’s Model

•Used 628 previous SCOTUS cases between 1994 and 2001

•Made predictions for the 68 cases that would be decided in October 2002,
before the term started

•Two-stage approach based on CART:
• First stage: one tree to predict a unanimous liberal decision, another tree
to predict a unanimous conservative decision
• If conflicting predictions or both predict no, move to the next stage
• 50% of Supreme Court cases result in a unanimous decision
• Second stage: predict the decision of each individual justice, and use the
majority decision as the prediction
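The two-stage logic can be sketched in Python; everything beyond the structure described above (function and argument names, the 1 = reverse / 0 = affirm vote encoding) is an assumption for illustration:

```python
def predict_case(unanimous_liberal, unanimous_conservative, justice_votes):
    """Sketch of the two-stage structure: stage 1 uses two trees that
    try to call a unanimous decision; if their predictions conflict or
    both say no, stage 2 takes the majority of the nine per-justice
    predictions (1 = reverse, 0 = affirm)."""
    if unanimous_liberal and not unanimous_conservative:
        return "unanimous liberal"
    if unanimous_conservative and not unanimous_liberal:
        return "unanimous conservative"
    # Conflicting predictions, or both predicted "no": use stage 2.
    reverse_votes = sum(justice_votes.values())
    return "reverse" if reverse_votes > len(justice_votes) / 2 else "affirm"

votes = {"Stevens": 1, "Ginsburg": 1, "Breyer": 1, "Souter": 1, "Kennedy": 1,
         "O'Connor": 0, "Rehnquist": 0, "Scalia": 0, "Thomas": 0}
print(predict_case(False, False, votes))  # reverse (5 of 9 votes)
```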
Tree for Justice O’Connor
Tree for Justice Souter

An unusual property of the CART trees that Martin and his colleagues developed:
they use the predictions of some trees as independent variables for other trees. (The
first split is whether or not Justice Ginsburg's predicted decision is liberal.)

In summary: liberal – liberal – Affirm (liberal); conservative – conservative –
Affirm (conservative).
The experts

•Martin and his colleagues recruited 83 legal experts
• 71 academics and 12 attorneys
• 38 had previously clerked for a Supreme Court justice, 33 were chaired
professors, and 5 were current or former law school deans

•Experts were only asked to predict within their area of expertise; more than
one expert was assigned to each case

•They were allowed to consider any source of information, but not allowed to
communicate with each other regarding predictions
The results

•68 cases in October 2002

•Overall case predictions:
• Model accuracy: 75%
• Experts' accuracy: 59%

•Individual justice predictions:
• Model accuracy: 67%
• Experts' accuracy: 68%
• For some justices, the model performed better, and for some justices, the
experts performed better.
Individual Justice Predictions
Takeaway messages

•Predicting Supreme Court decisions is very valuable to firms, politicians,
and non-governmental organizations

•A model that predicts these decisions is both more accurate and faster
than experts
• A CART model based on very high-level details of a case beats experts
who can process much more detailed and complex information
