Formal Methods in Philosophy of Language
Contents

A Constituents
B Constituency Tests
C Constituency Analysis
D Ambiguity
G Grammatical Categories
I Recursive Rules
J X-Bar Syntax
L Meanings as Functions
M Sentential Meanings
S Typing Trees
Y Three More Complicated Linguistic Examples
AW Resting On Our Laurels and Reflecting on How the Pieces Fit Together
BE Modals
BM S5
A Constituents
We call the substrings of a sentence that seem to belong together the constituents of the sentence.
B Constituency Tests
We can give tests for constituents that provide more precision than just intuitive
“is this a real part” judgments.
Test 1: Coordination
(2) From boyhood up this old boy had put the fear of death and the
hope of victory into me.
(3) #From boyhood up this and adolescence down that old boy had put
the fear of death into me.
Substrings that can be combined – coordinated – using “and” pass this con-
stituency test.
Test 2: Topicalization
Sometimes a substring of a sentence can be moved to the front and made the
topic of a sentence (topicalized):
(4) The fear of death, from boyhood up this old boy had put into
me.
(5) #Boyhood up this, from old man had put the fear of death into
me.
Test 3: Proform Substitution

Sometimes a substring of a sentence can be replaced with a single proform,
such as the pronoun it:

(6) From boyhood up this old man had put it into me.

Pronouns work (unsurprisingly) for nouns, but there are similar words for
other parts of speech:

(8) #From it/do/so/thus old man had put the fear of death into me.
Test 4: Parenthetical Insertion

Parenthetical insertions can be made in some places and not in others. Compare:
(11) From boyhood up this old man put the fear of death (luckily)
into me.
(12) #From boyhood up this (luckily) old man put the fear of death
into me.
According to this test, spots of permissible parenthetical insertion mark ends
of constituents.
Problem 4: What are the permissible locations of parenthetical in-
sertion in:
(13) The quick brown fox jumped over the lazy dog.
Assuming each permissible location marks the end of a constituent,
is this enough to tell us what the constituents of the sentence are? If
so, how? If not, what additional information or assumptions do we
need?
C Constituency Analysis
• This student will speak very slowly to that professor.
• Fred knows John knows Joe.
• John rang up his mother.
• He may have been writing a letter.
One of these sentences has a constituency structure with the form:

[ [ ] [ [ [ ] ] ] ]

And the other has a constituency structure with the form:

[ [ ] [ [ ] [ ] ] ]
Use the various constituency tests to determine which sentence
should be matched with which form.
D Ambiguity
Some sentences are ambiguous, allowing for more than one reading/interpretation.
Consider:
(15) a. Among the happy are both (i) very old men, and (ii) very old
women.
b. Among the happy are both (i) very old men, and (ii) women
(of any age).
(Try some constituency tests to see if you agree with this.) On one analysis,
call it (i), very old is a constituent modifying the constituent men and
women; on the other, call it (ii), very old is a constituent modifying only
the constituent men.

Also, it looks like in reading (15b) we take very old to describe just the men,
while in reading (15a) we take very old to describe the men and the women.
It's thus tempting to think that the two different constituency analyses explain
the ambiguity of (15), with analysis (i) corresponding to reading (15a) and
analysis (ii) corresponding to reading (15b). Constituency structure thus
might help us give a general theory of ambiguity.
E Brackets and Trees
We can display a constituency analysis either with brackets or with a tree
diagram:

[Tree diagram for the sentence The tall giraffe ate every leaf.]
Some terminology for tree diagrams:

Parent/Child: Node A is the parent of node B (and B is a child of A) if a
branch leads directly down from A to B.

Sibling: Nodes A and B are siblings if they have the same parent.

Dominating: Node A dominates node B if A is B's parent, or the parent of
B's parent, and so on.

C-commanding: Node A c-commands node B if A's sibling either is B or
dominates B.
Tree diagrams and bracket diagrams contain exactly the same information, and
it is straightforward to translate between the two methods of representation.
Problem 11: For each of the following bracket diagrams, give a cor-
responding tree diagram. For each of the following tree diagrams,
give a corresponding bracket diagram.
1. [ [ John ] [ [ gave ] [ [ the ] [ firefighter ] ] [ [ an ] [ axe ] ] ] ]
3. [tree diagram]

4. [Tree diagram for the man saw a dog in the park.]

5. [Tree diagram for the man saw a dog and a cat.]
Problem 12: From the trees in the previous problem, give examples
of pairs of nodes that stand in the parent, child, sibling, dominating,
and c-commanding relations.
Problem 13: Almost all of the trees considered above are binary-
branching trees. A tree is binary branching if no parent has more
than two children. Find the one place where a tree is not binary
branching. Is there a way we could plausibly have done the con-
stituency analysis so that the tree would have been binary branch-
ing?
F Mathematics of Constituents and Trees
Our discussion of constituents has tacitly assumed two facts about constituents.
One of them, call it Nesting, is that constituents never partially overlap: given
any two constituents of a sentence, either they share no words, or one is wholly
contained in the other.

Given Nesting, the constituent relation (the relation that holds between A and
B when A is a constituent of B) is a partial order. That means the constituent
relation has two features: it is transitive (if A is a constituent of B, and B is a
constituent of C, then A is a constituent of C), and it is asymmetric (if A is a
constituent of B, then B is not a constituent of A).

Problem 15: Explain what features nodes and edges in a tree have,
that guarantee that a tree models a constituent relation that is tran-
sitive and asymmetric.
We can then think of a tree as a collection of nodes with (a) some partial ordering
on them, and (b) some node acting as the root node, dominating all of the other
nodes in the tree.
• True or false: if any additional edge is added to a tree, a cycle
is formed. (There will be a sequence of edges that allow you
to travel in a circle through part of the graph.) If the claim is
true, give a proof of it. If it is false, give a counterexample to it.
G Grammatical Categories
First Proposal: Take the constituent tree for any grammatical sen-
tence:

[Tree diagram for The tall giraffe ate every leaf.]

Then remove the particular words on the leaves of the tree:

[The same tree diagram with empty leaves.]

Then place any words you want on the now-empty leaves:

[The same tree shape with its leaves filled by arbitrary words such as
almost and unless, yielding an ungrammatical string.]
That didn’t work so well. Maybe the problem is that we didn’t respect the
internal constituent structure. Every leaf was a constituent of the original
sentence, and marked as a constituent of the sentence in the tree, but we replaced
it with hybrid penultimately, which is not a constituent.
[A second attempt that respects the constituent structure, with leaves such as
quickly, quite, drink lemonade, distinctly, and ambitious: still ungrammatical.]
The basic problem is clear: we are replacing (for example) the adjective tall
in the original sentence with the noun sunset or the verb drink. We need to
restrict ourselves to substitution variants of the constituent tree that keep the
grammatical categories the same.
But to restrict in this way, we need to know what grammatical category each
word belongs to. But defining the various grammatical categories is very
difficult. The canonical source Schoolhouse Rock tells us:
• A noun is a person, place, or thing.
• Adjectives are words you use to really describe things.
• Verb! That’s what’s happening.
None of these “definitions” are very helpful. (Is “fortitude” a thing? Or
“asymptote”? Does “tiger” describe something?)
Instead of trying to give substantive definitions of the grammatical categories,
we can give a structural definition: two words belong to the same grammatical
category just in case substituting one for the other in a grammatical sentence
always yields another grammatical sentence.
If we replace giraffe in The tall giraffe ate every leaf with ambition,
we get The tall ambition ate every leaf. Is that grammatical? (We need
to carefully distinguish grammaticality from truth or sensibility.) If not, giraffe
and ambition belong to different grammatical categories. If it is, we have some
evidence that giraffe and ambition belong to the same grammatical category.
We can then hope that all of the familiar English nouns end up in the same
category, which we will then label “noun”. Similarly for all the verbs, all the
adjectives, and so on.
Problem 19: For each of the following pairs of words, give a sentence
that shows, using the substitution test, that the two words do not
belong to the same grammatical category.
• And, but
• Kill, laugh
• All, each
• Think, want
• In, out
• College, university
Each word in the starting "dictionary" is assigned a lexical category, such as
NAME (proper name), DET (determiner), N (noun), A (adjective), IV (intransi-
tive verb), or TV (transitive verb). We then have higher-level phrase categories:

• S (sentence)
• NP (noun phrase)
• VP (verb phrase)
Finally, we have phrase structure rules, which specify how the higher-level
phrase categories can be built out of lower-level phrase categories:
(R1) S → NP VP
(R2) NP → NAME
(R3) NP → DET N
(R4) NP → DET A N
(R5) VP → IV
(R6) VP → TV NP
Here’s a simple application. We start with a whole sentence, or S. Using (R1), S
decomposes into NP and VP. That gives us the beginning of a constituent tree
– or, as we’ll now call it, a phrase structure tree:
[ S [ NP ] [ VP ] ]
We can then apply rule (R2) to the NP node. (We could also apply (R3) or (R4)
instead.) That gives us:
[ S [ NP [ NAME ] ] [ VP ] ]
Then we can apply rule (R6) to the VP node:
[ S [ NP [ NAME ] ] [ VP [ TV ] [ NP ] ] ]
Then we can apply rule (R4) to the new NP node:
[ S [ NP [ NAME ] ] [ VP [ TV ] [ NP [ DET ] [ A ] [ N ] ] ] ]
Finally, we can pick words in each bottom-level category from our starting
“dictionary”:
[ S [ NP [ NAME Aristotle ] ] [ VP [ TV rescued ] [ NP [ DET ... ] [ A ... ] [ N ... ] ] ] ]
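None of (R1) through (R6) is recursive, so only finitely many sentences can be
generated from a finite dictionary. As a rough illustration, here is a minimal
Haskell sketch that enumerates everything the rules produce from a small toy
dictionary (the word lists are placeholders, not the text's actual starting
"dictionary"):

```haskell
-- Enumerate all sentences generated by (R1)-(R6) from a toy dictionary.
names, ivs, tvs, dets, adjs, nouns :: [String]
names = ["Aristotle", "Plato"]
ivs   = ["laughs"]
tvs   = ["rescued"]
dets  = ["the", "every"]
adjs  = ["tall"]
nouns = ["giraffe", "leaf"]

np :: [String]   -- (R2) NP -> NAME; (R3) NP -> DET N; (R4) NP -> DET A N
np = names
  ++ [unwords [d, n]    | d <- dets, n <- nouns]
  ++ [unwords [d, a, n] | d <- dets, a <- adjs, n <- nouns]

vp :: [String]   -- (R5) VP -> IV; (R6) VP -> TV NP
vp = ivs ++ [unwords [t, o] | t <- tvs, o <- np]

s :: [String]    -- (R1) S -> NP VP
s = [unwords [subj, v] | subj <- np, v <- vp]

main :: IO ()
main = print (length s)   -- 10 NPs and 11 VPs: 110 sentences, a finite number
```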
Problem 20: How many sentences are generated using rules (R1)
through (R6) above, given the starting “dictionary”? How many
grammatical sentences are there in English? Explain why there’s a
problem revealed by those two answers, and think about how that
problem might be fixed.
Problem 21: Are all of the sentences generated using rules (R1)
through (R6) and our starting “dictionary” in fact grammatical sen-
tences? If not, give a counterexample. Are there any grammatical
sentences built using only the words in the starting “dictionary”
that can’t be produced using rules (R1) through (R6)? If so, give an
example, and suggest an improvement or addition to the rule set
that will allow that sentence also to be generated. Does your change
to the rules allow any new ungrammatical strings to be generated?
I Recursive Rules
English and other natural languages have a finite number of words but an
infinite number of sentences. As we saw in Problem 20 above, that leads to a
difficulty for phrase structure grammars of the sort we’ve been considering –
the formation rules, given a finite starting vocabulary, will produce only a finite
number of sentences.
We can fix this difficulty by using recursive rules. In a recursive rule, one of
the output categories of the rule is the same as the input category. Previously
we allowed adjectival modification using the rule:
(R4) NP → DET A N
This rule allows only a single adjective, but that’s clearly not good enough
(“the tall menacing stranger”). We could add another rule for double adjective
modification:
(R4’) NP → DET A A N
But we’ll end up needing a lot of rules that way. Instead, we use a recursive rule:
(R7) N → A N
Problem 23: Use rule (R7) together with rule (R3) to produce trees
for the tall menacing stranger and the old dilapidated red
barn.
Rule (R7) lets us produce an infinite number of adjectivally modified nouns,
because we can re-apply the rule as many times as we want. Every time we
apply (R7) to analyze an N node, we produce another N node, and then we can
apply (R7) to that N node as well.
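To see the recursion concretely, here is a minimal Haskell sketch, again with
placeholder word lists, in which each application of (R7) stacks one more
adjective, so the outputs never run out:

```haskell
-- (R7) N -> A N, applied k times to a toy lexicon.
adjectives, nouns0 :: [String]
adjectives = ["tall", "menacing", "old"]
nouns0     = ["stranger", "barn"]

nBar :: Int -> [String]   -- all N's built with exactly k uses of (R7)
nBar 0 = nouns0
nBar k = [unwords [a, n] | a <- adjectives, n <- nBar (k - 1)]
-- nBar 2 contains "tall menacing stranger"; nBar k has 2 * 3^k members,
-- with no upper bound on k.
```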
Problem 25: We can also modify nouns after their occurrence using
prepositional phrases, such as man in the house. Let’s add new
categories of P (preposition) and PP (prepositional phrase) with the
rule:
(R8) PP → P NP
(R9) N → N PP
Show that with these rules, there is more than one phrase structure
tree that can be given for every vicious angry tiger with sharp
claws behind the door. Is it a problem that we can produce more
than one tree?
We can also use recursive rules to handle sentential connectives like and and
or. We introduce a new grammatical category CONN, and add the rule:
(R10) S → S CONN S
Since S is both the input and part of the output of the rule, the rule is recursive,
and can be applied repeatedly to make arbitrarily large sentences.
Problem 26: Use (R10) (together with the other rules) to produce
two trees for the sentence:
Explain how the two trees correspond to two readings of the sen-
tence.
Is that a problem? Can we produce:
If not, how might we fix the rules to allow for producing that sen-
tence? Are there other words that plausibly belong in the CONN
category that create similar problems?
Recursive rules thus give us a powerful tool for generating an infinite number
of sentences using a finite number of rules. Linguists often take recursive rule
structure to be one of the central defining features of the human linguistic
capacity.
Problem 28: Another source of the infinite stock of English sen-
tences is propositional attitude verbs:
• It is raining.
• John believes it is raining.
• John believes John believes it is raining.
• John believes John believes John believes it is raining.
• ...
(S1) X → Y Z
(S2) Y → Z X
J X-Bar Syntax
As we try to build a phrase structure grammar for a larger and larger fragment of
the language, the collection of rules threatens to become gigantic, idiosyncratic,
and ad hoc.
(18) Motty, the son, was about twenty-three, tall and thin
and meek-looking.
Linguists have thus been interested in trying to find a general format for phrase
structure rules that brings out a simpler underlying picture of how the language
works. One popular approach is X-Bar Theory.
In X-bar theory, we begin with the idea of the head of a phrase. In, for ex-
ample, a noun phrase (NP) like the tiger behind the door, the noun tiger
is the head. The thought then is that all heads produce surrounding phrases
in the same way. For each head, we have the possibility of specifiers and
complements. Roughly:
[ XP [ SPEC ] [ X′ [ X′ [ X′ [ X ] ] [ COMP ] ] [ COMP ] ] ]

Here "X" marks the category of the head, so "X" is "N" if we are producing a
noun phrase, "V" if we are producing a verb phrase, and so on. Notice that
we've introduced a new kind of node labeled X′ ("X bar"). We use this as an
intermediate step for producing the recursive introduction of COMPs. Sometimes
"XP" is instead written X″, or "X double bar".

We thus have the following rules, which can be used with any head:

(X1) XP → SPEC X′
(X2) X′ → X′ COMP
(X3) X′ → X
Problem 31: Use the basic X-bar framework to give trees for the
following phrases:
1. The tiger behind the door.
[Hints: let “tiger” head the main phrase. COMP is then a
prepositional phrase headed by behind. That prepositional
phrase then contains another noun phrase. Note that in some
cases either SPEC or COMP might need to be “empty”.]
2. Slowly run around the building.
3. Almost underneath the large threatening bulldozer.
Problem 33: Our rules always put SPEC to the left of the head and
COMP to the right of the head. Does that produce the right analysis?
If not, can we modify the rules to allow for SPEC and COMP to
appear in different positions? Are there interesting constraints on
when SPEC and COMP appear in which order?
Some linguists then propose an analysis in which there is a head type I, for
“inflection”, which carries tense and modal information about the sentence.
I then has the subject of the sentence as its SPEC and the verb phrase of the
sentence for its COMP. Thus we get examples like:
[ IP [ SPEC=NP [ SPEC=D the ] [ N′ [ N tiger ] ] ]
     [ I′ [ I may ]
          [ COMP=VP [ SPEC=AdvP [ Adv′ [ Adv ruthlessly ] ] ]
                    [ V′ [ V attack ]
                         [ COMP=NP [ SPEC=D the ] [ N′ [ N students ] ] ] ] ] ] ]
Notice that here “IP” plays the role that “S” previously played.
Problem 34: Give trees for the following sentences, assuming (i) in
each case the full sentence is an IP phrase, (ii) in various cases, vari-
ous of SPEC and COMP might be empty, and (iii) every basic lexical
item is in some grammatical category to which the X-bar structure
can be applied.
1. The student with long hair of ancient languages // the
student of ancient languages with long hair.
2. I like this student with long hair better than that one
with short hair. // I like this student of ancient languages
better than that one of physics.
3. I’d prefer a student with long hair to one with short
hair // I’d prefer a student of ancient languages to one
of physics.
Problem 36: Above we treated D (determiner) as the SPEC of N in
an NP. However, some linguists prefer to treat D as the head of a
phrase like the tall man, with NP then serving as the COMP of D.
Give a tree for the tall man that treats it in this way. If this is the
right analysis, we need two questions answered.
Consider next the pair:

(19) a. The student read which book?
     b. Which book did the student read?

For (19a), our rules deliver a tree along these lines:

[ IP [ SPEC=NP [ SPEC=D the ] [ N′ [ N student ] ] ]
     [ I′ [ I PAST ]
          [ COMP=VP [ V′ [ V read ]
                         [ COMP=NP [ SPEC=D which ] [ N′ [ N book ] ] ] ] ] ] ]
But this tree then makes (19b) puzzling. read should take a COMP (the book),
but in (19b) read is the end of the sentence, and doesn’t appear to take a COMP.
The student is the SPEC of I and should thus be at the front of the sentence,
but in (19b) it’s buried within the sentence. And a mysterious new verb did
has appeared. How can we make sense of all this?
Linguists have taken constructions like (19b) as evidence that the full grammar
of English, and other natural languages, includes both a phrase structure tree
component, building initial trees, and a movement component, re-arranging
parts of the trees after they are built. In (19b), we want some kind of move-
ment rule that moves “the book” from the end to the front of the sentence (and
changes it to “which book”).
Pre-movement:
[ IP [ SPEC=NP [ SPEC=D the ] [ N′ [ N student ] ] ]
     [ I′ [ I PAST ]
          [ COMP=VP [ V′ [ V read ]
                         [ COMP=NP [ SPEC=D which ] [ N′ [ N book ] ] ] ] ] ] ]
Post-movement:
[ IP [ SPEC1=NP [ SPEC=D which ] [ N′ [ N book ] ] ]
     [ IP [ SPEC2=I PAST ]
          [ IP [ SPEC=NP [ SPEC=D the ] [ N′ [ N student ] ] ]
               [ I′ [ I t2 ]
                    [ COMP=VP [ V′ [ V read ] [ COMP=NP t1 ] ] ] ] ] ] ]
The movement rule thus allows (i) a new IP node to be added to the top of the
tree, and (ii) a component of the tree to be moved to the top of the tree and
inserted as the SPEC of the new IP node, and (iii) a trace to be left behind in
place of the moved item. A trace is a covert silent element of the sentence.
Problem 37: Give pre- and post-movement trees for the following
two sentences:
(Make sure you see how (20b) is grammatical and what it means.)
Now note that in (20a), want to can be contracted to wanna, but in
(20b) want to cannot be contracted to wanna. Can we explain the
difference in availability of contraction by considering the differ-
ences in trees and movement?
(21) can be given a universal-existential reading, on which each book
is read by some student, but not necessarily the same student for
different books, and also an existential-universal reading, on which
one single student read every book.
L Meanings as Functions
Now consider ⟦the father of Annette⟧. Here are two tempting theoretical
starting points:
1. Compositionality: The meanings of complex expressions are determined
(in some way yet to be determined) by the meanings of their component
parts.
2. Names Name: The meaning of a name is just the object to which the name
refers. Thus:

• ⟦Annette⟧ = Annette
(To be clear, this says that the meaning of the word is the person, not that
the meaning of the word is the word.)
We know that Annette's father is Christopher, so it's tempting to think that we
should have ⟦the father of Annette⟧ = Christopher. The question then is
what we should use as ⟦the father of⟧ in order to get a compositional story.
(Ultimately we'd like to break things down further and have separate accounts
of ⟦the⟧, ⟦father⟧, and ⟦of⟧ from which we derive ⟦the father of⟧, but we'll
proceed in small steps.)
Whatever ⟦the father of⟧ is, it needs then to take us from Annette to
Christopher. Similarly it needs to take us from Elizabeth II to George VI, and
from George VI to George V. A convenient mathematical object for capturing
this idea is that of a function.
• f :=
    Annette → Christopher
    Elizabeth II → George VI
    George VI → George V

Here we have specified the function f by using rows on the table to indicate
which things are mapped to which other things. If we then let ⟦the father
of⟧ = f, then we've assigned a meaning to the father of that will be useful
in calculating ⟦the father of Annette⟧.
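Since f is just a finite table, we can model it directly as a partial function;
a minimal Haskell sketch using the rows above:

```haskell
type Entity = String

-- The table for f, row by row (a partial function: unlisted inputs are
-- undefined, just as in the text's table).
f :: Entity -> Entity
f "Annette"      = "Christopher"
f "Elizabeth II" = "George VI"
f "George VI"    = "George V"

-- Functional application: [[the father of Annette]] = f(Annette)
example :: Entity
example = f "Annette"   -- "Christopher"
```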
In the British science fiction show Red Dwarf, the character Dave
Lister is revealed in the episode "Ouroboros" to be his own father.
Add an appropriate row to the table for f to incorporate this fact.
Do we need yet another row to capture the fact that Dave Lister’s
father’s father is Dave Lister?
We have used a strategy of calculating semantic values of complex expressions
via functional application. The general hope is that whenever we have a
complex expression A B, we can calculate ⟦A B⟧ by discovering that one of ⟦A⟧
and ⟦B⟧ is a function, and then applying that function to the other of ⟦A⟧ and
⟦B⟧. That is:

Functional Application: ⟦A B⟧ is either (i) ⟦A⟧(⟦B⟧) or (ii) ⟦B⟧(⟦A⟧).

We may sometimes make explicit the tree structure of the expressions we are
evaluating, in order to keep track of that aspect of syntax. So we might say:

⟦[ • [ A ] [ B ] ]⟧ is either (i) ⟦A⟧(⟦B⟧) or (ii) ⟦B⟧(⟦A⟧).
Problem 41: The rule we've given of Functional Application al-
lows for two choices for what the meaning of a complex expres-
sion A B is. How do we know which one to use? (For example,
how do we know that ⟦the father of Annette⟧ is ⟦the father
of⟧(⟦Annette⟧) rather than ⟦Annette⟧(⟦the father of⟧)?) Can
you think of English constructions that might show that we want
both options in the disjunctive definition of Functional Appli-
cation? (That's a rather speculative question, given that we're just
getting started on building meaning theories.)

Problem 42: Give an input-output chart for another function g that
is a reasonable meaning for the mother of. Then use all of the
pieces so far assembled to show carefully that ⟦the father of the
mother of Annette⟧ and ⟦the mother of the father of Annette⟧
can be different from one another. Are we getting reasonable mean-
ings associated with those two phrases?
M Sentential Meanings

Consider the sentence The father of Annette laughs. Label the node for
the father of N1, the node for the father of Annette N2, and the node
for the whole sentence N3:

[ N3 [ N2 [ N1 the father of ] [ Annette ] ] [ laughs ] ]
Given what we’ve already said, we are committed to:
• ⟦N1⟧ = f
• ⟦N2⟧ = Christopher

We then know that we want to use Functional Application to combine ⟦N2⟧
with whatever ⟦laughs⟧ is to produce whatever ⟦N3⟧ (that is, ⟦the father of
Annette laughs⟧) is.
Suppose ⟦laughs⟧ is the following function h, which maps objects to truth
values (writing ⊤ for the true and ⊥ for the false):

• h :=
    Christopher → ⊤
    Harry the Hyena → ⊤
    Eeyore → ⊥
We can now get a detailed semantic analysis of The father of Annette laughs:

1. ⟦the father of Annette laughs⟧ = ⟦laughs⟧(⟦the father of Annette⟧)
2. ⟦laughs⟧ = h
3. ⟦the father of Annette laughs⟧ = h(⟦the father of Annette⟧) [substitution, 1 and 2]
4. ⟦the father of Annette⟧ = ⟦the father of⟧(⟦Annette⟧)
5. ⟦the father of⟧ = f
6. ⟦Annette⟧ = Annette
7. ⟦the father of Annette⟧ = f(⟦Annette⟧) [substitution, 4 and 5]
8. ⟦the father of Annette⟧ = f(Annette) [substitution, 6 and 7]
9. ⟦the father of Annette⟧ = Christopher
10. ⟦the father of Annette laughs⟧ = h(Christopher) [substitution, 3 and 9]
11. ⟦the father of Annette laughs⟧ = ⊤
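The whole derivation is repeated functional application, which we can replay
mechanically; a minimal Haskell sketch using the finite tables for f and h
given above:

```haskell
import qualified Data.Map as Map

type Entity = String

-- The function f from the text, as a finite lookup table.
fTable :: Map.Map Entity Entity
fTable = Map.fromList
  [ ("Annette", "Christopher")
  , ("Elizabeth II", "George VI")
  , ("George VI", "George V") ]

-- The function h from the text: maps each object to a truth value.
hTable :: Map.Map Entity Bool
hTable = Map.fromList
  [ ("Christopher", True)
  , ("Harry the Hyena", True)
  , ("Eeyore", False) ]

-- [[the father of Annette laughs]] = h(f(Annette))
sentenceValue :: Bool
sentenceValue = hTable Map.! (fTable Map.! "Annette")   -- True
```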
Problem 43: Use the above derivation to give a full labelling of the
syntactic tree for The father of Annette laughs, showing what
the meaning of each node in the tree is, and indicating how that
meaning is derived via functional application from the meanings of
its child nodes.
• f :=
    Paris → Canada
    London → New Zealand
    Beijing → Jordan
    Kuala Lumpur → Jordan
    Dakar → Canada
These tables we've been giving for functions are one way of displaying inputs
and outputs of the function. Another way we could present the same informa-
tion is via a list of input-output pairs. Thus the function f from above could
also be given as:

• f = {⟨Paris, Canada⟩, ⟨London, New Zealand⟩, ⟨Beijing, Jordan⟩,
⟨Kuala Lumpur, Jordan⟩, ⟨Dakar, Canada⟩}
Problem 48: Can a function contain an ordered pair of the form ⟨x, x⟩
for any object x? If so, give an example of an English expression
whose semantic value might plausibly be a function containing such
an ordered pair.
Problem 49: The functions we have been giving have small do-
mains, and aren't very plausible as real semantic values for English
expressions. The function we gave earlier for the father of, for
example, had only a few people in its domain. As a result, ⟦the
father of Elizabeth I⟧ is undefined – the function that we've
given won't accept Elizabeth I as an input.
Special Case: Sets and Characteristic Functions: Recall the function we used
for ⟦laughs⟧:

• h :=
    Christopher → ⊤
    Harry the Hyena → ⊤
    Eeyore → ⊥
Another natural way to think about the semantic value of laughs is by giving
a set: namely, the set of all and only things that laugh. Had we gone that route,
we would have said:

• ⟦laughs⟧ = {Christopher, Harry the Hyena}

But if we do things this way, we have to change our rule for combining expres-
sions to make complex expressions. We can't say:

• ⟦the father of Annette laughs⟧ = ⟦laughs⟧(⟦the father of Annette⟧)

because after filling in the values we're committed to, this becomes:

• ⟦the father of Annette laughs⟧ = {Christopher, Harry the Hyena}(Christopher)

and a set is not a function that can be applied to an argument.
• Set Membership: When neither ⟦A⟧ nor ⟦B⟧ is a function, then
⟦A B⟧ = ⊤ if either ⟦A⟧ ∈ ⟦B⟧ or ⟦B⟧ ∈ ⟦A⟧; otherwise ⟦A B⟧ = ⊥.
Problem 50: This is a bit of a fussy point, but the qualifying clause
"when neither ⟦A⟧ nor ⟦B⟧ is a function" in the previous rule doesn't
get things quite right. Give a case in which this clause leads to the
wrong result. Is there an easy way to fix things?
We could make everything work proceeding in this way. Sometimes we would
use the rule of Functional Application and sometimes we would use the rule of
Set Membership. We’d then have to be careful to be clear at each point about
which rule should be used. However, there is an easier solution.
Problem 52: If A is a set with 10 members, how many characteristic
functions are there on A?
If it’s ever easier to work with set extensions than with functions, then, we can
do so – but then we can use characteristic functions to transform that work on
set extensions back into work with functions.
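Moving between a set and its characteristic function is mechanical; a minimal
Haskell sketch:

```haskell
-- The characteristic function of a (finite) set, and recovering the set.
charFunc :: Eq a => [a] -> (a -> Bool)
charFunc set = \x -> x `elem` set

extension :: [a] -> (a -> Bool) -> [a]   -- needs a domain to search through
extension domain f = filter f domain

-- charFunc ["Christopher", "Harry the Hyena"] "Eeyore"  ==>  False
```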
The natural thing to say is that killed, unlike laughs, lets us correlate two
objects, rather than a single object, with a truth value. Killed correlates the pair
of Jones and Smith with ⊤ in the same way that laughs correlates Christopher
with ⊤. We can capture this idea using two-place functions, functions that take
two inputs and produce a single output.
We can specify two-place functions using input-output tables the same way we
did with one-place functions. Thus we might say:

• ⟦killed⟧ :=
    Jones, Smith → ⊤
    Brutus, Caesar → ⊤
    Caesar, Napoleon → ⊥
    Brutus, Brutus → ⊤

This function captures the facts that Jones killed Smith, Brutus killed both
Caesar and himself, and Caesar didn't kill Napoleon. We could also specify the
same function using ordered triples:

⟦killed⟧ = {⟨Jones, Smith, ⊤⟩, ⟨Brutus, Caesar, ⊤⟩, ⟨Caesar, Napoleon, ⊥⟩,
⟨Brutus, Brutus, ⊤⟩}
Or we could specify the same collection of ordered triples descriptively rather
than by enumeration:

⟦killed⟧ = {⟨x, y, z⟩ : either x killed y and z = ⊤, or x did not kill y
and z = ⊥}
Problem 54: Give partial specifications for reasonable functions for
the following transitive verbs:
1. admires
2. is taller than
3. became
4. recognized
5. weighs
(Try doing some of the partial specifications in input-output list
format and some in ordered triple format.)
Problem 55: Using the semantic value given above for killed,
give an appropriate function to use for was killed by. Describe
the general form of the relation between the function for killed
and the function for was killed by. Will this general form give
us a universal procedure for deriving semantic values of passive
constructions? If so, how can we explain that resemble has no
passive form?
Problem 56: Can you find an example of a transitive verb in English
that ought to have the ordered triple ⟨⊤, ⊤, ⊤⟩ in it? What about
⟨⊥, ⊥, ⊥⟩?
Problem 57: If we use two-place functions for giving meanings of
transitive verbs like killed, what should we do with ditransitive
verbs like gives that take both a direct object and an indirect object?
Give a partial specification of a potential semantic value for gives.
How should we handle the fact that ditransitive verbs can often
accept direct and indirect objects in either order:
1. Jones gave Smith the poison.
2. Jones gave the poison to Smith.
Two-place functions are tempting as semantic values for transitive verbs, but
unfortunately the tempting idea won't quite work. Consider again Jones killed
Smith. Plausibly this sentence has the following syntactic structure:

[ [ Jones ] [ [ killed ] [ Smith ] ] ]
Suppose we then make the following assumptions:
• ⟦Jones⟧ = Jones
• ⟦Smith⟧ = Smith
• ⟦killed⟧ = the two-place function given above
But now we hit a snag. We're trying to apply the two-place function ⟦killed⟧
to the single argument Smith. And that can't be done. Two-place functions
require two arguments, and can't sensibly be applied to a single argument. We
can apply addition to a pair of objects, as in 3 + 5. But we can't apply addition
to a single object, as in the absurd +3.
To see how to get around the snag, consider first the two-place multiplication
function × on the numbers 1 through 3:

• × :=
    1, 1 → 1
    1, 2 → 2
    1, 3 → 3
    2, 1 → 2
    2, 2 → 4
    2, 3 → 6
    3, 1 → 3
    3, 2 → 6
    3, 3 → 9
From this starting point, we will build a new function that maps a single input
number to an output function that then gives the product of that number with
a further input:

• new-× :=
    1 → [ 1 → 1, 2 → 2, 3 → 3 ]
    2 → [ 1 → 2, 2 → 4, 3 → 6 ]
    3 → [ 1 → 3, 2 → 6, 3 → 9 ]

The new-× function, for example, takes 3 as an input and produces as output
the function new-×(3) = [ 1 → 3, 2 → 6, 3 → 9 ]. That function then takes 2 as
input and produces as output 6. So whereas before we simply calculated:
• ×(3, 2) = 6
using the two-place function ×, now we calculate:
• new-×(3)(2) = 6
using a one-place function that maps the first multiplicand to another one-place
function that maps the second multiplicand to the product.
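This transformation of a two-place function into nested one-place functions is
exactly what functional programmers call currying; a minimal Haskell sketch
of × and new-×:

```haskell
-- A two-place multiplication function and its "Schönfinkeled" counterpart.
times :: (Int, Int) -> Int
times (x, y) = x * y

newTimes :: Int -> (Int -> Int)    -- one input in, a one-place function out
newTimes x = \y -> x * y

-- times (3, 2) == 6 and newTimes 3 2 == 6; Haskell's standard curry
-- performs exactly this transformation, so curry times behaves as newTimes.
```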
We can now Schönfinkel this function to produce:

    Smith → [ Jones → ⊤, Brutus → ⊥ ]
    Caesar → [ Brutus → ⊤ ]
    Brutus → [ Brutus → ⊤ ]
    Napoleon → [ Caesar → ⊥ ]
Problem 60: Give both the left and right Schönfinkelization of the
following function:

• f :=
    2, 3 → 10
    2, 4 → ⊤
    2, 5 → 101
    3, 3 → ⊥
    3, 5 → Paris
    5, 3 → ⊥
    Paris, 4 → 18
Now we are ready to put all the pieces together and give a full semantic analysis
of Jones killed Smith. We start with:
1. ⟦Jones⟧ = Jones

2. ⟦Smith⟧ = Smith

3. ⟦killed⟧ =
    Smith → [ Jones → ⊤, Brutus → ⊥ ]
    Caesar → [ Brutus → ⊤ ]
    Brutus → [ Brutus → ⊤ ]
    Napoleon → [ Caesar → ⊥ ]

(Thus we are taking the semantic value of killed to be the left Schönfinkelization
of the natural two-place killed function.)
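Read as Haskell, the left Schönfinkelization means killed takes the object
first and the subject second; a minimal sketch with the toy facts above (pairs
not listed come out false here, where the text's table instead leaves them
undefined):

```haskell
type Entity = String

-- Left Schönfinkelization of the two-place killed function:
-- first input the object, second input the subject.
killed :: Entity -> (Entity -> Bool)
killed object subject =
  (subject, object) `elem`
    [ ("Jones", "Smith"), ("Brutus", "Caesar"), ("Brutus", "Brutus") ]

-- [[Jones killed Smith]] = killed "Smith" "Jones"  ==>  True
```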
" #
Jones → >
12. (Jones) = >
Brutus → ⊥
Consider next the sentence The winner celebrated, with structure:

[ [ the winner ] [ celebrated ] ]
We could treat this exactly like earlier cases of intransitive verb sentences if we
could assign an object to ⟦the winner⟧. And it seems like ⟦the winner⟧ ought
to be some object or other – that's what we use definite descriptions of the form
The F for: to pick out some object.

But to have a full story, we need to figure out how to build up ⟦the winner⟧
from ⟦the⟧ and ⟦winner⟧. Let's start with winner. A starting thought is that the
role of the word winner is to split the world into two categories – the winners
and the losers (‘second place is first place loser’). If that’s right, we can then
associate winner with a set: the set of all the things that make the cut. Usain Bolt
goes in the set; Beck goes out of the set. Furthermore, this looks like a plausible
story about the semantic function of common nouns in general. Giraffe sepa-
rates the world into two categories – the giraffes and the non-giraffes. Thus we
can associate giraffe with a set: the set of all the giraffes. And so on.
Using tools developed earlier, we can then move from these sets to functions by
taking the characteristic functions. Suppose we are discussing the 100m finals
in the London olympics. Then we would have:
⟦winner⟧ =
    Ryan Bailey → ⊥
    Yohan Blake → ⊥
    Usain Bolt → ⊤
    Justin Gatlin → ⊥
    Tyson Gay → ⊥
    Churandy Martina → ⊥
    Asafa Powell → ⊥
    Richard Thompson → ⊥
Problem 65: Give functional semantic values for the common nouns
country, prime number, and noun. (You can either give partial
specifications using input-output tables, or full specifications by
descriptive identification of a set of ordered pairs.)
Now we need a story about ⟦the⟧. We have two constraints:

1. We want ⟦the winner⟧ to be an object.
2. We've decided to make ⟦winner⟧ be a function from objects to truth
values.

Given these constraints, we know what kind of thing ⟦the⟧ needs to be:

• ⟦the⟧ is a function that (i) takes as input a function from objects to truth
values, and (ii) produces as output an object. That is, ⟦the⟧ is a function
from functions from objects to truth values to objects.
Problem 66: Which of the following is a function from functions
from objects to truth values to objects?

1. [ 1 → ⊤, 2 → ⊥ ]

2. [ 1 → [ ⊤ → 2, ⊥ → 1 ],
     2 → [ ⊤ → 1, ⊥ → 2 ] ]

3. [ [ ⊤ → 2, ⊥ → 1 ] → ⊤,
     [ ⊤ → 1, ⊥ → 2 ] → ⊥ ]

4. [ [ 1 → ⊥, 2 → ⊥ ] → 2,
     [ 1 → ⊤, 2 → ⊤ ] → 2 ]
Knowing that ⟦the⟧ is a function from functions from objects to truth values
to objects is only half the battle, though. We also need to know which function
from functions from objects to truth values to objects it is. As we've seen in the
previous problem, there are a lot of such functions, so we need to make sure
we pick the right one.

We know that ⟦the winner⟧ (in our London olympics scenario) is Usain Bolt.
And we also know that ⟦the winner⟧ = ⟦the⟧(⟦winner⟧). Thus we know that
whatever function from functions from objects to truth values to objects ⟦the⟧
is, it must be a function such that:
• ⟦the⟧( Ryan Bailey → ⊥
         Yohan Blake → ⊥
         Usain Bolt → ⊤
         Justin Gatlin → ⊥
         Tyson Gay → ⊥
         Churandy Martina → ⊥
         Asafa Powell → ⊥
         Richard Thompson → ⊥ ) = Usain Bolt
It's then not hard to make a specific proposal. Usain Bolt is the winner because
he is the only one who wins – he, and he alone, is mapped to ⊤ by ⟦winner⟧.
Thus we have:

• Given any function f from objects to truth values, ⟦the⟧ is the function
that maps f to the unique object o such that f(o) = ⊤.
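As a function, ⟦the⟧ is partial: defined only when exactly one object is
mapped to ⊤. A minimal Haskell sketch over an assumed finite domain:

```haskell
type Entity = String

-- A toy domain of discourse (an assumption for illustration).
domain :: [Entity]
domain = ["Usain Bolt", "Yohan Blake", "Tyson Gay"]

-- [[the]]: takes a function from objects to truth values and returns the
-- unique object mapped to True; undefined if there is no such unique object.
the :: (Entity -> Bool) -> Entity
the f = case filter f domain of
  [o] -> o                                       -- exactly one such object
  _   -> error "no unique object mapped to true" -- none, or more than one

-- the (== "Usain Bolt")  ==>  "Usain Bolt"
```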
Problem 68: Give a function f such that ⟦the⟧(f) = Barack Obama.
What is:
Given this definition of ⟦the⟧, there are two problem cases we need to deal
with:

1. Consider the 1969 Best Actress Academy awards. There were five nom-
inees for the award – Katharine Hepburn, Patricia Neal, Vanessa Red-
grave, Barbra Streisand, and Joanne Woodward. The award was then
given jointly to Katharine Hepburn and Barbra Streisand. So for this
situation, it looks like we should have:

• ⟦winner⟧ =
    Katharine Hepburn → ⊤
    Patricia Neal → ⊥
    Vanessa Redgrave → ⊥
    Barbra Streisand → ⊤
    Joanne Woodward → ⊥

Since this function maps two objects to ⊤, there is no unique object that it
maps to ⊤, and so ⟦the⟧ is undefined on it.

2. Or consider a competition that simply wasn't decided, such as the 1904
World Series, which was never played. Then it looks like we should have:

• ⟦winner⟧ =
    New York Giants → ⊥
    Boston Americans → ⊥

Here no object at all is mapped to ⊤, so again there is no unique object
mapped to ⊤, and ⟦the⟧ is undefined.

Thus in these circumstances, ⟦the winner⟧ doesn't get a semantic value, and
is meaningless. We'll return later to some further examination of what to say
about the role of this sort of meaningless expression in our theory of meaning.
Problem 69: Given this choice for ⟦the⟧, many expressions using
the will be meaningless. There are many books, so ⟦book⟧ maps
many objects to ⊤. Thus ⟦the book⟧ is undefined. How could we
fix this problem? Is there a better choice of function for ⟦the⟧?
• winner = win + -er
Problem 71: Picking up where the previous problem left off, sup-
pose we think that the winner isn’t meant to occur by itself, but
only in conjunction with a subsequent of phrase, as in:
1. Option 1: [ the [ [ win [ of X ] ] -er ] ]

2. Option 2: [ the [ [ win -er ] [ of X ] ] ]
Pick one of these two syntactic structures and, using it, make pro-
posals for both ⟦-er⟧ and ⟦of⟧ that will produce reasonable results
for expressions like the winner of the 2016 Olympic 100 meter
dash.
Given the theory we adopted in the previous section for the, a definite descrip-
tion of the form the F fails to have a semantic value if there is more than one
F (or if there are no Fs). Because almost any common noun will apply to more
than one object, that theory has problems with simple definite descriptions of
the form:
• [ [ the ] [ N ] ]

for a common noun N.
But we can get better results if we consider definite descriptions built out of
adjectively modified nouns, such as the bald man. Suppose that the bald
man has the following syntactic structure:
[ NP [ D the ] [ N [ A bald ] [ N man ] ] ]
In fact, we've really been needing, and implicitly making use of, that
principle all along. Strictly speaking, our semantic value function
⟦·⟧ assigns semantic values to nodes in a syntactic tree. So when we
talk about ⟦Jones killed Smith⟧, we really more properly mean:
[ S [ NP [ NAME Jones ] ] [ VP [ V killed ] [ NP [ NAME Smith ] ] ] ]
So it’s the S node that really gets the semantic value. But that’s
annoying to write out, so we usually just discuss the semantic value
of a particular node by listing out the words dominated by that node.
We’ve been just as happy to obscure that difference, because (a)
it’s annoying to keep track of, and (b) it doesn’t actually matter for
anything we’re doing – we want all three of those nodes to have the
same semantic value. But to make that happen, we need the rule
that any non-branching node simply inherits the semantic value of
its child.
To calculate the semantic value of the bald man using the tree:
[ NP [ D the ] [ N [ A bald ] [ N man ] ] ]
we need a semantic value for bald. There are two important constraints on
⟦bald⟧:

1. ⟦bald⟧ needs to combine with ⟦man⟧ by functional application. Since
⟦man⟧ is a function from objects to truth values, ⟦bald⟧ needs to be one
of:

(a) An object, so that it can be an input to ⟦man⟧.
(b) A function that takes as input a function from objects to truth values,
so that ⟦man⟧ can be the input to ⟦bald⟧.

2. When ⟦bald⟧ and ⟦man⟧ combine, they need to produce as output some-
thing that can combine with ⟦the⟧ by functional application. Since ⟦the⟧
is a function from functions from objects to truth values to objects, the
combination of ⟦bald⟧ and ⟦man⟧ needs to be one of:

(a) A function from objects to truth values, so that it can be an input to
⟦the⟧.
(b) A function that takes as input a function from functions from objects
to truth values to objects, so that ⟦the⟧ can be the input to the
combination of ⟦bald⟧ and ⟦man⟧.

Looking over these choices, if we go with option (1a) and make ⟦bald⟧ an
object, then ⟦bald man⟧ will be a truth value, but a truth value doesn't meet
either of the conditions (2a) or (2b). So we can't use (1a). That means we need
to go with (1b), so ⟦bald⟧ must be a function that takes as input a function
from objects to truth values.

That leaves the question of what the output of the ⟦bald⟧ function should be.
Given (2a) and (2b), we have two choices:
1. The output of ⟦bald⟧ is a function from objects to truth values. Then
⟦bald man⟧ will be a function from objects to truth values. That function
can then serve as the input to ⟦the⟧, producing an object as output. Thus
⟦the bald man⟧ will be an object, which will combine nicely with an
intransitive verb like laughs, since ⟦laughs⟧ is a function from objects to
truth values, so ⟦the bald man laughs⟧ will be a truth value, as desired.
So on this option, ⟦bald⟧ is a function from functions from objects to truth
values to functions from objects to truth values.

2. The output of ⟦bald⟧ is a function from functions from functions from
objects to truth values to objects to something. In this case, ⟦bald man⟧ will
be a function from functions from functions from objects to truth values
to objects to something. That function will take as input ⟦the⟧'s function
from functions from objects to truth values to objects, and produce as
output something. Whatever that something is, it will be the semantic
value of the bald man. Given that ⟦the bald man⟧ needs to combine
with ⟦laughs⟧, something needs to be either an object or a function from
functions from objects to truth values to truth values. The result is two
choices for ⟦bald⟧:

(a) A function from functions from objects to truth values to functions
from functions from functions from objects to truth values to objects
to objects.

(b) A function from functions from objects to truth values to functions
from functions from functions from objects to truth values to objects
to functions from functions from objects to truth values to truth
values.

We're almost there. A function from functions from objects to truth values to functions
from functions from functions from objects to truth values to objects to functions from
functions from objects to truth values to truth values sounds like pretty desperate
territory – it would be nice to avoid quite that much complexity if we can. So
let's see if we can make the first option work, and have ⟦bald⟧ be a function
from functions from objects to truth values to functions from objects to truth
values.
This choice can then handle sentences containing multiple modified descrip-
tions, such as the bald king met the bald prince:

[ S [ NP [ D the ] [ N [ A bald ] [ N king ] ] ]
    [ VP [ TV met ] [ NP [ D the ] [ N [ A bald ] [ N prince ] ] ] ] ]
The remaining question is: which function from functions from objects to truth
values to functions from objects to truth values is ⟦bald⟧?
Problem 74: Suppose there are ten objects. How many functions
from functions from objects to truth values to functions from objects
to truth values are there?
We can make this question easier by recalling that a function from objects to
truth values can be thought of as another way to present a set of objects. (The
function from objects to truth values is the characteristic function of a set.) So if
⟦bald⟧ is a function from functions from objects to truth values to functions
from objects to truth values, ⟦bald⟧ can also be thought of as a function from
one set to another set.

Put in that way, the problem is much easier. Consider ⟦bald man⟧. ⟦bald⟧ needs
to map one set to another set. The input set, of course, is the set of men. (⟦man⟧
is the characteristic function of that set.) And ⟦bald man⟧ should be the set of
bald men. So how does ⟦bald⟧ take us from the set of men to the set of bald men?

Here's an obvious suggestion. bald, like man, is associated with a set of objects.
man is associated with the set of men, and bald is associated with the set of bald
things. Let's use small capitals to name these associated sets, so that BALD is
the set of bald things. Then ⟦bald⟧ is, roughly, the function that maps an input
set (such as the set MAN) to the intersection of that set with BALD.
More precisely (now we translate back from the talk of sets to talk of character-
istic functions of sets), we have:

• ⟦bald⟧ is the unique function f such that for any function g that is the
characteristic function of some set G, f(g) is the characteristic function of
the set G ∩ BALD.
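This intersective treatment is a one-liner in code: conjoin the input
characteristic function with that of BALD. A minimal Haskell sketch, with a
placeholder extension for BALD:

```haskell
type Entity = String

-- Characteristic function of the set BALD (a placeholder extension).
inBald :: Entity -> Bool
inBald x = x `elem` ["Patrick Stewart"]

-- [[bald]]: maps the characteristic function of a set G to the
-- characteristic function of G ∩ BALD.
bald :: (Entity -> Bool) -> (Entity -> Bool)
bald g = \x -> g x && inBald x
```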
1. ⟦man⟧
2. ⟦bird⟧
3. ⟦bald man⟧
4. ⟦bald bird⟧
5. ⟦the bald man⟧
6. ⟦the bald bird⟧
7. ⟦bald⟧
Problem 76: Give semantic values for the words angry, hungry, and
tiger such that:
Problem 77: Suppose we treat large the same way we treated bald
above. We identify a set LARGE, and then we define ⟦large⟧ to be
the function that maps the characteristic function of any set G to the
characteristic function of G ∩ LARGE.
Explain why it would be a consequence of that treatment that
if ⟦large mouse⟧ maps Mickey to ⊤, then ⟦large mammal⟧ maps
Mickey to ⊤. Is that a desirable or undesirable result?
Problem 78: Explain why the method we used for giving the se-
mantic value of bald won't work for the adjective alleged. Give a
new proposal for specifying ⟦alleged⟧ and show that your proposal
produces a reasonable interpretation of the alleged murderer.
The functions we have been considering as semantic values for adjectives have
been getting rather complicated. "Functions from functions from objects to truth
values to functions from objects to truth values" takes a while to say, and
actually seeing clearly what kind of function is meant by that phrase can require
drawing a careful diagram. And things get even worse if we need to consider
"functions from functions from objects to truth values to functions from func-
tions from functions from objects to truth values to objects to functions from
functions from objects to truth values to truth values."
We’ll now introduce a more convenient and compact method for describing
functions. We start with things that are not functions. The non-functions we’ve
been making use of fall into two categories:
1. Truth values: > and ⊥ are not functions, but do get used as semantic
values (in particular, as semantic values of whole sentences).
2. Ordinary objects: Paris and London, Socrates and Aristotle, and the
many other things that make up the world are not functions, but do get
used as semantic values (in particular, as semantic values of names and
of definite descriptions).
We will use e as a name for the collection of ordinary objects, and t for the
collection of truth values. We call e and t, as well as the other collections we’ll
go on to define, types.
Types e and t can then be used to build names for various types of functions.
For example:
1. (e, t) names the collection of functions from ordinary objects (members of
e) to truth values (members of t).
2. (e, e) names the collection of functions from ordinary objects to ordinary
objects.
3. (t, t) names the collection of functions from truth values to truth values.
4. ((e, t), e) names the collection of functions from functions from ordinary
objects to truth values (members of (e, t)) to ordinary objects.
There is a general pattern that lies behind these names for different types.
Type Formation: Given any types α and β, we use (α, β) as a name
for the collection of functions that take as input things in α and
produce as output things in β.
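Haskell's function types mirror this type formation rule exactly: the type
(α, β) corresponds to the function type α -> β. A minimal sketch fixing
representations for e and t:

```haskell
-- Types e and t, and some function types built from them.
type E = String          -- type e: ordinary objects, modelled here as names
type T = Bool            -- type t: truth values

type ET  = E -> T        -- (e, t): e.g. [[laughs]], [[man]]
type EET = E -> ET       -- (e, (e, t)): e.g. [[killed]]
type ETE = ET -> E       -- ((e, t), e): e.g. [[the]]
type ADJ = ET -> ET      -- ((e, t), (e, t)): e.g. [[bald]]
```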
If two nodes in a phrase structure tree are going to combine successfully using
functional application, one of them needs to be of some type α and the other
needs to be of type (α, β) for that same α and some β. Thus consider the follow-
ing tree:
[A worked typing tree followed here, with leaves carrying types such as e,
(e, (t, t)), ((e, t), (e, t)), (t, e), and ((t, e), (e, t)), combined pairwise
by functional application toward a root of type t.]
Next, ((e, t), (e, t)) can combine with (e, t) to produce (e, t):
[The next stage of the typing tree followed here, with the resulting (e, t)
node combining with the remaining nodes to yield type t at the root.]

[A tree diagram with four unlabeled leaves.]
Find ways to assign types to the leaves of this tree to satisfy each of
the following conditions:
Problem 81: Suppose there are three objects a, b, and c in type e. For
each of the following types, determine how many members there
are of the type, and give an example of a member of the type.
1. (e, t)
2. (t, t)
3. (e, e)
4. (e, (e, t))
5. ((e, t), t)
6. ((e, t), (e, t))
7. ((e, t), ((e, t), t))
Problem 82: In each of the following trees, fill in the types on all
nodes that don’t already have types specified. In some cases you
will need to reason both up and down the tree to determine all of
the nodes.
1. [typing tree omitted]

2. [A typing tree with root of type t and leaves of type e.]

3. [A partially labelled typing tree with nodes of types t, (t, t), (e, t),
e, (t, ((t, t), (t, t))), and (e, (t, t)).]

[Note: There is more than one way to complete the type la-
belling in this tree. Find the simplest completion.]
Three observations about the interaction between the type theory and the syn-
tax:
1. Certain grammatical categories look like they are systematically linked to
particular types. For example:
(a) Expressions of grammatical category NAME are of type e.
(b) Expressions of grammatical category S are of type t.
(c) Expressions of grammatical category N are of type (e, t).
(d) Expressions of grammatical category IV are of type (e, t).
(e) Expressions of grammatical category TV are of type (e, (e, t)).
(f) Expressions of grammatical category A are of type ((e, t), (e, t)).
So far it’s just a speculative generalization that these grammatical category
- type associations will hold up robustly. We might at any time encounter
tricky cases that require us to assign (for example) specific common nouns
a type other than (e, t).
2. The correlation between types and grammatical categories then suggests
that we might be able to explain the grammatical rules in terms of the se-
mantic values. Why does the grammar allow a name and an intransitive
verb to be combined into a sentence? Because the type e of the name can
serve as input to the type (e, t) of the intransitive verb, producing a type
t semantic value, the right type for a sentence. Why does the grammar
not allow Aristotle admired as a sentence?

The transitive verb admired is of type (e, (e, t)). In this case, that semantic
value can combine with the type e of ⟦Aristotle⟧, but the result is of
type (e, t). Since we are trying to use Aristotle admired as an entire
sentence, and since entire sentences are of type t, the typing does not
work out, and we get an explanation for the inability to make Aristotle
admired a sentence.
3. Nevertheless, there are substantial obstacles to deriving the grammatical
properties of the language directly from the types of the semantic values
of expressions. Two difficulties:
(a) Intransitive verbs like laughs and common nouns like man are both
being treated as of type (e, t). But of course intransitive verbs
and common nouns aren’t of the same grammatical category – they
clearly are not intersubstitutable:
We can’t use the type of the semantic values to determine the gram-
matical structure if we ever allow there to be two expressions of the
same semantic type but in different grammatical categories.
(b) Some words in English can have semantic values of more than one
type. For example, many transitive verbs can also be used intransi-
tively:
i. Jones ate breakfast earlier today.//Jones ate earlier today.
ii. Smith spilled the wine.//The wine spilled.
But since transitive verbs are of type (e, (e, t)) and intransitive verbs
are of type (e, t), we won't be able to assign a single type to these
verbs that fully explains their range of grammatical uses.
S Typing Trees
Once we have worked out the right type for the semantic values of some words,
we can often use the type theory to work out the type of the semantic values of
other words. Consider some examples:
First, consider Aristotle laughed loudly, with tree:

[ [ Aristotle ] [ [ laughed ] [ loudly ] ] ]
We know that ⟦Aristotle⟧ is of type e and ⟦laughed⟧ is of type (e, t); the
new word is loudly. If ⟦Aristotle⟧ is of type e, the only way to make the
whole sentence be of type t is to have the semantic value of laughed loudly
be of type (e, t).

We can then see that ⟦loudly⟧ must be of type ((e, t), (e, t)), so that it can
take as input the type (e, t) semantic value of laughed and produce as
output the desired type (e, t) for laughed loudly.
Next, consider the sentence Aristotle kicked the man beneath the house,
with tree:

[ [ Aristotle ]
  [ [ kicked ]
    [ [ the ] [ [ man ] [ [ beneath ] [ [ the ] [ house ] ] ] ] ] ] ]

We've already built into our theory commitments about the type of each
of these words other than beneath:

• ⟦Aristotle⟧ is of type e.
• ⟦kicked⟧ is of type (e, (e, t)).
• ⟦the⟧ is of type ((e, t), e).
• ⟦man⟧ is of type (e, t).
• ⟦house⟧ is of type (e, t).

We can then reason down the tree:

(a) ⟦kicked the man beneath the house⟧ needs to be of type (e, t), so
that it can combine with type e ⟦Aristotle⟧ to produce the type t of the
whole sentence.
(b) Therefore ⟦the man beneath the house⟧ needs to be of type e in
order to combine with type (e, (e, t)) ⟦kicked⟧ to produce that type (e, t).

Problem 85: Actually, there is a second more complicated
choice for the type of ⟦the man beneath the house⟧. What
is the other choice?
(c) Therefore ⟦man beneath the house⟧ needs to be of type (e, t) in
order to combine with type ((e, t), e) ⟦the⟧ to produce that type e.

Problem 86: Again there is another choice for the type of
⟦man beneath the house⟧. Give the other type that would
work.
Continuing in this way, we can type the entire tree, and ⟦beneath⟧ comes out
as type (e, ((e, t), (e, t))):

[ [ Aristotle : e ]
  [ [ kicked : (e, (e, t)) ]
    [ [ the : ((e, t), e) ]
      [ [ man : (e, t) ]
        [ [ beneath : (e, ((e, t), (e, t))) ] [ the house : e ] ] ] ] ] ]

Suppose we had instead given the sentence a syntactic tree on which beneath
the house attaches to kicked the man rather than to man:

[ [ Aristotle ] [ [ [ kicked ] [ the man ] ] [ [ beneath ] [ the house ] ] ] ]
What effect would this different syntactic tree have on the final
type assignment to ~beneath? Is there any semantic reason to
prefer one syntactic tree to the other?
The general strategy, then, is to work out semantic value types for new words
by seeing how those semantic values are constrained by the types of other
semantic values we've already built into the theory. As we've seen, in some
cases this won't be enough to fix semantic types uniquely, but it's often enough
to limit things down to a few choices.
Consider the sentence Aristotle laughed and Plato cried, given the
binary-branching tree:

[ S [ S Aristotle laughed ] [ [ and ] [ S Plato cried ] ] ]

We give this tree rather than the more obvious:

[ S [ S Aristotle laughed ] [ and ] [ S Plato cried ] ]
What type should ⟦and⟧ have to make the typing work out for
the first tree? Describe the particular function from that type that
should be used for ⟦and⟧. It then looks like ⟦or⟧ should be of the
same semantic type as ⟦and⟧. What function should be used for
⟦or⟧?
Problem 90: And can be used to join expressions other than entire
sentences:

1. Aristotle and Plato laughed.
2. Aristotle laughed and cried.

Give binary branching syntactic trees for each of those two sentences
along the lines used in the previous problem. What should then be
the type of ⟦and⟧ in Aristotle and Plato laughed? What should
be the type of ⟦and⟧ in Aristotle laughed and cried? Is there
any prospect of giving a single type to ⟦and⟧ that will work for all
uses of and?
Once you have determined types for these two uses of and, what
particular functions within those types should be used for ⟦and⟧?
Problem 91:
Negation would be easy to incorporate into our semantic theory if
it always occurred in a position c-commanding an entire sentence:
[ [ not ] [ S Aristotle laughed ] ]
Unfortunately, this isn’t how negation typically works in English.
(Except, perhaps, for the rather artificial it is not the case that.)
Suppose we instead have:
[ [ Aristotle ] [ [ did ] [ [ not ] [ laugh ] ] ] ]

How can we assign types to ⟦not⟧ and ⟦did⟧ to make the typing
of this sentence work out? (You might consider making ⟦did⟧ an
identity function that simply passes the input value of its sibling
node up to the parent.) What particular function should be assigned
to ⟦not⟧?
Problem 92: Consider the sentence the very tall man laughed, with
tree:

[ [ [ the ] [ [ [ very ] [ tall ] ] [ man ] ] ] [ laughed ] ]

How should we assign a semantic type to ⟦very⟧ to make the typing
of that sentence work out?
Should we analyze the very very tall man laughed with the tree:

[ [ [ the ] [ [ [ [ very ] [ very ] ] [ tall ] ] [ man ] ] ] [ laughed ] ]

or:

[ [ [ the ] [ [ [ very ] [ [ very ] [ tall ] ] ] [ man ] ] ] [ laughed ] ]
We've seen reason earlier to think that adjectives are typically of type ((e, t), (e,
t)). But different adjectives will, of course, pick out different functions within
that type. For example:

1. ⟦bald⟧ is the function that maps any (e, t) function f to the characteristic
function of the intersection of the set BALD with the set of objects that f
maps to ⊤.

2. ⟦large⟧ is the function that maps any (e, t) function f to the function that
(i) maps any object o to ⊤ if the size of o is substantially greater than the
average size of objects that f maps to ⊤ and (ii) otherwise maps o to ⊥.
Complex descriptions of functions such as these bring out that we could use a
compact and clear terminology for naming specific functions, just as the type
theory gives us a compact and clear terminology for naming categories of func-
tions.
The notation of the lambda calculus provides a standard way for naming
functions. Consider first how functions are named in mathematics. In the
simplest cases, we write things like:
• f(x) = x²

• g(x) = sin(x + 1)

• h(x) = 3x + 1 if x ≤ 0; x³ + 5 if x > 0

• j(x) = 1

• k(x, y) = xy + x + y + 1
In each case we specify a function by doing two things:

1. We indicate what the input to the function is. That's the role of the
variables that accompany the function name. Thus f – or as we sometimes
say, f(x) – is a function that takes x as an input, and k(x, y) is a function
that takes both x and y as inputs.

2. We specify the output of the function in terms of the input to the function.
That's the role of the expression to the right of the '=' sign. Thus f is a
function that for any input x produces x² – that is, the square of the input
– as output. And j(x) is a function that, given any input, produces the
number 1 as output.
This common mathematical practice provides an easy way to name simple
functions, but it becomes more cumbersome with more complicated functions.
Consider an expression like 2x. It can play two roles: it can stand for a
particular number, the result of doubling some given x, or it can stand for
the doubling function itself. Our simple way of naming functions doesn't give
us any good tools for distinguishing between these two roles that 2x can be
playing. The lambda notation will correct that.
Let's start with a simple case. In the lambda notation, we name the function
f(x) = x² by writing:

• λx.x²
The expression λx.x² has two parts, one before the period and one after the
period.

1. The part before the period (sometimes called the lambda abstract) indi-
cates that the expression names a function, and also specifies what the
input variable to the function is. In λx.x², then, the input variable is x.
But in λy.y², the input variable is y.

2. The part after the period specifies the output of the function. For any
particular value of the input variable, the output of the function is the
semantic value of the part of the lambda term after the period given that
value of the variable. Thus the output of λx.x² for the input x = 3 is 9,
because x² is 9 when x = 3.
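Haskell's lambda syntax is a direct transcription of this notation; for
instance:

```haskell
-- λx.x²: input variable x, output the square of the input.
square :: Int -> Int
square = \x -> x ^ (2 :: Int)

-- λx.1: ignores its input and always outputs 1 (the function j above).
constantOne :: Int -> Int
constantOne = \_ -> 1
```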
Problem 93: Give lambda term names for each of the following
functions:

1. f(x) = x² + 2x + 1
2. g(x) = 0
3. h(x) = e^(2x+1)
Nothing requires the inputs and outputs of our functions to be numbers. Thus:

• λx.Aristotle

is a function specified with a lambda term. λx.Aristotle is the constant function
that maps every input to Aristotle.
Similarly, we can have:

• λx.(the father of x)

This is the function that takes an object x as input and outputs the father of x.
Thus we can have:

• ⟦the father of⟧ = λx.(the father of x)

Departing a bit more from standard mathematical practice, we can have:

• λx.(x laughs)

For any given object o, this function maps o to the semantic value of x laughs
when x is assigned to o. Thus when the input to the function is Aristotle, the
output is ⟦Aristotle laughs⟧, which is then ⊤ if Aristotle does indeed laugh.
One special case of this is that inputs for lambda terms don't have to be of type
e. For example, we can have:

• λx_t.(¬x)_t

This is the function that takes a truth value (something from type t) as input
and produces an output also of type t. It's thus a function of type (t, t). In
particular, we plausibly have:

• ⟦not⟧ = λx_t.(¬x)_t
Problem 96: How many functions of type (t, t) are there? Give
lambda terms that pick out each of those functions.
V Lambda Notation and Higher-Typed Functions
The real strength of the lambda notation is in its ability to name more compli-
cated functions of higher-order types – functions that take functions as inputs
or produce functions as outputs (or both).
Consider, to start, the function of adding 1:
• f(x) = x + 1
or in lambda notation:
• λx.x + 1
There is also the function of adding 2, which we can write in the lambda notation
as:
• λx.x + 2
In fact, for any number n, there is the function of adding n:
• λx.x + n
That means that there is a function that maps each number n to the function
λx.x + n. We can name that function with a lambda term as follows:
• λy.λx.x + y
To see more clearly what is going on with this lambda term, let’s add type
specifications. Both x and y are numbers to be added. They are thus both of
type e:
• λy_e.λx_e.x_e + y_e
The inner lambda expression λx_e.x_e + y_e is thus a function taking a type e as
input (namely, the variable x_e) and producing a type e as output (namely, the
sum x_e + y_e). That means that it is a type (e, e) function. We can also make that
explicit in our full term:
• λy_e.(λx_e.x_e + y_e)_(e, e)
Finally, we see that the full lambda term names a function that takes a type e
as input (namely, the variable y_e) and produces a type (e, e) as output (namely,
the function (λx_e.x_e + y_e)_(e, e)). That means it is a type (e, (e, e)) function:
• (λy_e.(λx_e.x_e + y_e)_(e, e))_(e, (e, e))
(Of course, we will rarely want to mark types as thoroughly as we have done
here. But if we get confused about how things are working, we can always fall
back on thorough type marking.)
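For comparison, here is the same (e, (e, e)) adder as a Haskell sketch, where curried function types mirror our type notation (the names are illustrative, and Double stands in for type e):

  -- λy.λx.x + y : type (e, (e, e))
  addN :: Double -> (Double -> Double)
  addN = \y -> \x -> x + y

  main :: IO ()
  main = print ((addN 3) 4)   -- 7.0: first fix y = 3, then apply to x = 4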
First Example: Consider the function that takes as input any function f from
numbers to numbers and produces as output a function g from numbers to
numbers whose value for any input is twice f ’s value. (So our function maps
f (x) = x to g(x) = 2x, and maps f (x) = x2 − 3 to g(x) = 2x2 − 6, and so on).
Since this function takes a type (e, e) input, its lambda term needs to begin:
• λx_(e, e).
Following the period, we need another expression of type (e, e) to name the
output function. To create an expression of type (e, e), we make another lambda
term of the form:
• λy_e.E_e
using some expression E of type e.
The expression E of type e then needs to give the output of the output “doubled”
function for the input y. So for any given value of y, the value of E for that
value of y needs to be twice the value of the input function for y. The input
function is given by the variable x_(e, e). The value of that function for the input
y is then given by x(y) – the functional application of the x function to the y
input. We want twice that value, so we want 2x(y). Thus the lambda term for
the desired function is:
• λx.λy.2x(y)
Or, to make all of the typing explicit:
• (λx_(e, e).(λy_e.(2x_(e, e)(y_e))_e)_(e, e))_((e, e), (e, e))
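A Haskell sketch of the same doubler (illustrative names; Double stands in for type e):

  -- λx.λy.2x(y) : takes an (e, e) function, returns its doubled version
  doubleOutput :: (Double -> Double) -> (Double -> Double)
  doubleOutput x = \y -> 2 * x y

  main :: IO ()
  main = do
    print (doubleOutput (\n -> n) 5)          -- 10.0: f(x) = x becomes g(x) = 2x
    print (doubleOutput (\n -> n * n - 3) 4)  -- 26.0: x² − 3 becomes 2x² − 6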
Second Example: Consider the function that takes as input a number n and
produces as output a function that maps any function f to the function f + n
(that is, the function each of whose outputs is n greater than f ’s output for the
same input). This function has an input of type e and produces an output of
type ((e, e), (e, e)).
A lambda term for this function thus begins with λx_e. The output of the
function needs to be of type ((e, e), (e, e)), so we need a lambda term of the
form:
• λx_e.E_((e, e), (e, e))
Now we just need to build the appropriate term E_((e, e), (e, e)). To build that
term, we want to start with an input of type (e, e) and then produce an output
that is also of type (e, e). Thus we start with λy_(e, e), and follow it with a term
of type (e, e):
• λy_(e, e).F_(e, e)
That leaves us with the task of building the right term F_(e, e). That term will
have the form:
• λz_e.G_e
To get clear on what Ge should be, let’s think carefully about the various
functions involved.
1. There is the final function we are trying to build, the one that maps an
input number to a function that maps functions to functions shifted by
that number. This function will be named by the overall lambda term
λx_e.E_((e, e), (e, e)).
2. There is the function that is the output of this function. This is the function
that shifts input functions by some specific amount. This function will be
named by the lambda term λy_(e, e).F_(e, e) (which is then identical to the
expression E in the larger lambda term).
3. There is the input to that specific shifter function. This function is picked
out by the variable y_(e, e).
4. And there is the output shifted function. This function is named by the
lambda term λz_e.G_e (which is then identical to the expression F in the
previous lambda term).
G, then, gives the output of the shifted function for some specific input z. To
get the output of the shifted function, we need two things:
1. The output of the pre-shifted function for the input z. The pre-shifted
function is given by the variable y, so the output of that function for the
input z is y(z).
2. The shifting amount. That amount is given by the variable x.
Thus G needs to be y(z) + x. Assembling the pieces, we have:
• λx.λy.λz.(y(z) + x)
or, with explicit typing:
• (λx_e.(λy_(e, e).((λz_e.((y_(e, e)(z_e))_e + x_e)_e)_(e, e))_((e, e), (e, e)))_(e, ((e, e), (e, e)))
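And the shifter, sketched the same way in Haskell (again with illustrative names):

  -- λx.λy.λz.(y(z) + x) : type (e, ((e, e), (e, e)))
  shiftBy :: Double -> (Double -> Double) -> (Double -> Double)
  shiftBy x y = \z -> y z + x

  main :: IO ()
  main = print (shiftBy 1 (\z -> 2 * z - 3) 10)  -- 18.0 = (2·10 − 3) + 1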
Problem 97: Give lambda terms for each of the following functions:
1. The function that takes as input a function f from numbers to
numbers and outputs a function g from numbers to numbers
whose value for any input n is the same as f ’s output for 2n.
2. The function that takes as input a function f from truth values
to truth values and outputs the function that results from
applying f twice to any input truth value.
3. The function that takes as inputs two functions f and g and
produces as output the function f − g. (Hint: because this
function is described as taking two inputs, you'll need to apply
Schönfinkelization to specify it with a lambda expression.)
We’ve already seen how to use lambda terms to specify semantic values such
as:
1. ~laughs = λx.x laughs
2. ~the father of = λx.the father of x
Using higher-order functions, we can give semantic values for more expres-
sions.
For example, here are two candidate semantic values for the transitive verb kicks:
1. λx.λy.x kicks y
2. λx.λy.y kicks x
Problem 100: Give explicit type marking for each of those two
lambda terms to confirm that they both name functions of type (e,
(e, t)). What type should the variables x and y be?
But which candidate is correct? Let’s consider the sentence Socrates kicked
Aristotle and work through the details with both candidates. We start by
assuming:
• ~Socrates = Socrates
• ~Aristotle = Aristotle
• {⟨x, y⟩ : x kicked y} = {⟨Socrates, Aristotle⟩, ⟨Socrates, Plato⟩, ⟨Plato, Aristotle⟩}
Now consider the function picked out by each of our lambda term options:
1. λx.λy.x kicks y: This is the function that, given any input x, produces as
output the function that maps any input y to ⊤ if x kicked y and to ⊥ if x
didn't kick y. Given our starting collection of kicking ordered pairs, that's
the following function:
• Socrates → [Aristotle → ⊤, Plato → ⊤]
• Aristotle → [Socrates → ⊥, Plato → ⊥]
• Plato → [Aristotle → ⊤, Socrates → ⊥]
2. λx.λy.y kicks x: This is the function that, given any input x, produces as
output the function that maps any input y to ⊤ if y kicked x and to ⊥ if y
didn't kick x. Given our starting collection of kicking ordered pairs, that's
the following function:
• Socrates → [Aristotle → ⊥, Plato → ⊥]
• Aristotle → [Socrates → ⊤, Plato → ⊤]
• Plato → [Aristotle → ⊥, Socrates → ⊤]
Problem 101: Notice that the two functions just given are perfect
opposites of each other: the second function has ⊥ as an output
everywhere the first function has ⊤, and ⊤ as an output everywhere
the first function has ⊥. Is that an inevitable feature of the functions
picked out by λx.λy.x kicks y and λx.λy.y kicks x? If so, why? If
not, give an alternative collection of kicking ordered pairs for which
the two functions are not perfect opposites.
We can now work out twice over the semantic analysis of:
[Socrates [kicked Aristotle]]
1. Using λx.λy.x kicks y: First:
• ~kicked Aristotle = ~kicked(~Aristotle)
• = (λx.λy.x kicks y)(Aristotle)
• = λy.Aristotle kicks y
• = [Socrates → ⊥, Plato → ⊥]
And then:
• ~Socrates kicked Aristotle = ~kicked Aristotle(~Socrates)
• = [Socrates → ⊥, Plato → ⊥](Socrates)
• = ⊥
So we end up concluding incorrectly that Socrates kicked Aristotle
is false.
2. Using λx.λy.y kicks x: First:
• ~kicked Aristotle = ~kicked(~Aristotle)
• = (λx.λy.y kicks x)(Aristotle)
• = [Socrates → [Aristotle → ⊥, Plato → ⊥], Aristotle → [Socrates → ⊤, Plato → ⊤], Plato → [Aristotle → ⊥, Socrates → ⊤]](Aristotle)
• = [Socrates → ⊤, Plato → ⊤]
And then:
• ~Socrates kicked Aristotle = ~kicked Aristotle(~Socrates)
" #
Socrates → >
= (Socrates)
Plato → >
=>
Thus λx.λy.y kicks x is the right semantic value to use for ~kicked. This makes
sense if we think through the steps of functional application. λx.λy.y kicks x
takes x as its first input and y as its second input. Because it uses y kicks x,
it thus puts the first input in the kicked position and the second input in the
kicker position. But the structure of the tree for Socrates kicked Aristotle
guarantees that kicked will functionally combine with the kicked first (Aristotle)
and then with the kicker second (Socrates). So that’s the order we want the
variables in the lambda term to have.
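The argument-order point can be made concrete with a small Haskell sketch (the data type and names are ours; True and False play the roles of ⊤ and ⊥):

  data E = Socrates | Plato | Aristotle deriving (Eq, Show)

  -- the kicking facts assumed in the text, as (kicker, kickee) pairs
  kickedPairs :: [(E, E)]
  kickedPairs = [(Socrates, Aristotle), (Socrates, Plato), (Plato, Aristotle)]

  candidate1, candidate2 :: E -> E -> Bool
  candidate1 x y = (x, y) `elem` kickedPairs  -- λx.λy.x kicks y
  candidate2 x y = (y, x) `elem` kickedPairs  -- λx.λy.y kicks x

  -- the tree feeds the verb its object (Aristotle) first, subject second:
  main :: IO ()
  main = do
    print (candidate1 Aristotle Socrates)  -- False: the wrong verdict
    print (candidate2 Aristotle Socrates)  -- True: Socrates kicked Aristotle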
Problem 102: Give appropriate lambda terms for the semantic val-
ues of each of the following:
1. kills
2. was killed by
3. resembles
4. resembles Superman
5. gives
6. gives the book
7. gives to Socrates
1. The: ~the is of type ((e, t), e). Its semantic value thus needs to be a
lambda term of the form:
• λx_(e, t).E_e
A suitable term is:
• ~the = λx_(e, t).(the unique object y such that x(y) = ⊤)
To see this value at work, suppose that ~winner is the following (e, t)
function:
• ~winner = [Ryan Bailey → ⊥, Yohan Blake → ⊥, Usain Bolt → ⊤, Justin Gatlin → ⊥, Tyson Gay → ⊥, Churandy Martina → ⊥, Asafa Powell → ⊥, Richard Thompson → ⊥]
That function serves as input to the lambda term, setting the value of x.
The only value for y for which x(y) is ⊤ is thus Usain Bolt, so ~the winner
is Usain Bolt, as desired.
2. Bald: ~bald is of type ((e, t), (e, t)). Recall that the intent is that bald,
when combined with some common noun N, produce as semantic value
the intersection of bald with the set of things satisfying N. (More care-
fully, ~bald N is the characteristic function of the intersection of bald
with the set that ~N is the characteristic function of.) The lambda term
for ~bald should thus be of the general form:
• λx_(e, t).E_(e, t)
The variable x will then be filled by the input (e, t) semantic value, which
will be provided by the common noun that bald modifies. To make E an
expression of type (e, t), we further assume that it has the form:
• λy_e.F_t
where:
• F_t = x(y) ∧ y ∈ bald
The first conjunct, x(y), requires that y satisfy the common noun (because
the semantic value x of the common noun maps y to ⊤). The second
conjunct requires that y also be in the set of bald things.
Putting the pieces together, we have:
• ~bald = λx_(e, t).λy_e.(x(y) ∧ y ∈ bald)
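Sketched in Haskell (names ours; entities are plain strings here), the intersective pattern looks like this:

  -- λx.λy.(x(y) ∧ y ∈ bald) : an ((e, t), (e, t)) adjective meaning
  baldSem :: [String] -> (String -> Bool) -> (String -> Bool)
  baldSem bald noun = \y -> noun y && y `elem` bald

  main :: IO ()
  main = print (baldSem ["Albert"] (\_ -> True) "Albert")  -- True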
Problem 103: Test the semantic value just given for bald by
building a case with four people: Albert, Beatrice, Charles, and
Dorothy. Give a function to use as the semantic value of man and
a set to use as bald. Then calculate ~bald man. Use that function to
derive ~the bald man, and see if the final result is plausible.
Problem 104: Suppose we want to treat ~bald as the character-
istic function of bald (and hence of type (e, t)). We thus suggest
that bald man in fact has the following structure:
[[bald INTERSECT] man]
where INTERSECT is a covert term that transforms the (e, t)
bald into a semantic value that can appropriately interact with
~man by functional application. What type is ~INTERSECT?
Give an appropriate lambda term for ~INTERSECT.
Problem 105: Red is sometimes used to describe the color of the
exterior of an object, as in:
Give suitable lambda terms along these lines for ~large, ~tall,
and ~good. One of these three terms seems to function seman-
tically a bit differently from the other two. Which one? Does
this difference call for a difference in the lambda term?
[[large RELATIVE] mouse]
Propose a suitable semantic value for RELATIVE.
• former president
• fake diamond
• alleged criminal
Consider next the sentence Aristotle does not laugh, with the typed tree:
[Aristotle_e [does [not_((e, t), (e, t)) laugh_(e, t)]]]
For simplicity, let's assume that ~does is an identity function that will
simply pass the (e, t) value of ~not laugh up to the next node. Thus
~does is λx_(e, t).x. Then ~not needs to be a function that will map
~laugh to a new (e, t) function – in particular, a function that maps an
object to ⊤ if ~laugh maps it to ⊥, and that maps an object to ⊥ if ~laugh
maps it to ⊤.
We thus need:
• ~not = λx_(e, t).λy_e.¬x(y)
Now consider the sentence:
[Bolt [ran quickly]]
Then ~quickly needs to be of type ((e, t), (e, t)). What is wrong
with the following lambda term for ~quickly:
• λx_(e, t).x quickly
What about:
(In both cases there are precise technical problems for the proposed
lambda terms.) Try to give a lambda term for ~quickly that does
work properly. (This isn’t an easy task.) Does your proposal for
~quickly also work when quickly is combined with a transitive
verb, as in:
Lambda terms are useful for giving clear and succinct specifications of com-
plicated functions. They are also useful as a tool for calculating the values of
functions. Consider some simple examples:
1. (λx.x²)(3) = 3² = 9
2. (λy.3y − 5)(7) = 3 · 7 − 5 = 21 − 5 = 16
3. (λz.z³ − 4z + 1)(0) = 0³ − 4 · 0 + 1 = 1
1. Remove the initial λ+variable portion of the lambda term (the lambda
abstract).
2. Replace the variable with the value that is being used as input to the
function.
3. Simplify the resulting expression to determine what number it names.
Now consider a calculation with two inputs: (λx.λy.x + y)(3)(4). This calculation
uses the procedure for simple lambda terms, but uses it twice. We start with
the complex lambda term λx.λy.x + y. We apply this function to
the input 3. We thus remove the outermost lambda abstract – the λx portion of
the term – and replace the corresponding variable with the input 3. The result
is λy.3 + y. Note that we don’t replace the variable y with 3, because we are
evaluating the λx portion of the term, and so are only replacing the variable x
with the input.
We’ve thus learned that (λx.λy.x + y)(3) = λy.3 + y. Put into words, that is:
the function that maps any first number to the function that maps any second
number to the sum of the first and second numbers, when applied to the input
3, produces as output the function that maps any number to the sum of 3 and
that number.
We then apply that function to the input 4. That is, we evaluate (λy.3 + y)(4).
Again we follow our procedure. We remove the lambda abstract – this time, the
y abstract λy. We replace the corresponding variable – namely, y – with the
input 4. That gives us 3 + 4. Then we use some arithmetic to simplify 3+4 to 7.
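The two-step evaluation is exactly how curried application works in Haskell – a quick sketch (names ours):

  addCurried :: Int -> Int -> Int
  addCurried = \x -> \y -> x + y    -- λx.λy.x + y

  step1 :: Int -> Int
  step1 = addCurried 3              -- λy.3 + y

  main :: IO ()
  main = print (step1 4)            -- 7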
Calculate the values of each of the following:
1. (λx.2x)(5)
2. (λy.y³ − y)(10)
3. (λx.λy.x − y)(3)(5)
4. (λx.λy.x − y)(5)(3)
5. (λx.3x + 5)((λy.y²)(4))
6. (λx.λy.(y/x))((λz.z³ − 6)(2))((λw.4^(w+1))(2))
7. (λx.x²)((λy.5y + 1)((λz.2z)(−2)))
We can use the same calculation method for lambda terms that aren’t just
straightforward mathematical functions. Consider:
1. (λx.x laughs)(Socrates). Following our procedure, we remove the lambda
abstract and replace the variable x with the input to the function, which is
Socrates. The result is:
• Socrates laughs
But what does it mean for this to be the output? λx.x laughs should
name an (e, t) function, so it should take an entity (Socrates) as input and
produce a truth value as output. When we say:
• (λx.x laughs)(Socrates) = Socrates laughs
we thus mean that the output of λx.x laughs for the input Socrates is the
truth value named by the sentence Socrates laughs. If Socrates does in
fact laugh, then that truth value is ⊤, so (λx.x laughs)(Socrates) = ⊤.
2. Now let’s work in detail through a full sentence. Consider the sentence
Aristotle admires Socrates:
[Aristotle [admires Socrates]]
Suppose we have the following semantic values for the individual words:
(a) ~Aristotle = Aristotle
(b) ~admires = λx.λy.y admires x
(c) ~Socrates = Socrates
We can place these semantic values on the leaves of the tree:
[Aristotle [admires Socrates]], with Aristotle, λx.λy.y admires x, and Socrates at the three leaves.
We know that the higher nodes are determined by functional application,
so we can do an initial completion of the tree:
[Aristotle [admires Socrates]], with (λx.λy.y admires x)(Socrates) at the admires Socrates node and (λx.λy.y admires x)(Socrates)(Aristotle) at the top node.
Now we just need to calculate the values of those higher node lambda
terms:
(a) (λx.λy.y admires x)(Socrates) = λy.y admires Socrates. (As usual,
we remove the initial lambda abstract λx and replace the variable x
with the input to the function, which is Socrates.)
(b) (λx.λy.y admires x)(Socrates)(Aristotle), by the previous, is equal to
(λy.y admires Socrates)(Aristotle). And (λy.y admires Socrates)(Aristotle)
= Aristotle admires Socrates. (Again, we remove the lambda ab-
stract, in this case λy, and replace the variable y with the input to
the function, which is Aristotle.)
Thus the tree for Aristotle admires Socrates, fully evaluated, has λy.y
admires Socrates at the admires Socrates node and Aristotle admires
Socrates at the top node.
As usual, this can look uninformative, since it seems to tell us that the final
semantic value of the sentence Aristotle admires Socrates is Aristotle
admires Socrates. But the labelling of the top node with “Aristotle admires
Socrates” is in fact a labelling of the top node with the truth value named
by “Aristotle admires Socrates”, and so is either ⊤ or ⊥ depending on
Aristotle's particular pattern of admiration. (Presumably ⊤.)
Applying Lambda Terms to Lambda Terms: We have been considering simple
cases so far, in which the inputs to lambda terms are themselves simple objects.
Thus we’ve been considering only lambda terms of type (e, α) for some type
α. But of course this doesn’t exhaust the range of lambda terms. We can in the
same way calculate the values of lambda terms of type (t, t), for example:
• (λx_t.¬x)(⊥) = ¬⊥ = ⊤
The more complicated cases occur when a lambda term is used as an input to
another lambda term, as in:
• (λx_(e, e).λy_e.x(y) + 1)(λz_e.2z − 3)
In this lambda term, the (e, e) function λz.2z − 3 serves as input to the ((e, e),
(e, e)) function λx.λy.x(y) + 1.
But in fact we can use the same procedure in these more complicated cases. We
remove the lambda abstract, and then replace occurrences of the corresponding
variable with the input to the function. The only difference is that the input to
the function is itself a function, named by a lambda term of its own, rather than
just an object.
1. Remove the lambda abstract, which in this case is λx. That leaves us with
λy.x(y) + 1.
2. Replace the corresponding variable – in this case x – with the input to the
function, which in this case is λz.2z − 3:
λy.x(y) + 1, with the input λz.2z − 3 substituted for the variable x:
⇓
λy.(λz.2z − 3)(y) + 1
λy.(λz.2z − 3)(y) + 1 can then be further simplified by calculating out the interior
piece (λz.2z − 3)(y). This simplifies to 2y − 3 in the usual way. Thus λy.(λz.2z −
3)(y) + 1 simplifies to λy.2y − 3 + 1 or λy.2y − 2. The final upshot, then, is
that our starting (λx.λy.x(y) + 1)(λz.2z − 3) simplifies to a function that maps
any input to twice that input minus 2. That’s because λx.λy.x(y) + 1 is itself a
function that maps an input function to an output function whose values are
always one more than the values of the input function, and that higher-order
function is then applied to the (e, e) function that maps each input to twice the
input minus 3.
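Here is that calculation as a Haskell sketch (illustrative names), confirming that the result behaves like λy.2y − 2:

  plusOneAfter :: (Double -> Double) -> (Double -> Double)
  plusOneAfter x = \y -> x y + 1    -- λx.λy.x(y) + 1

  g :: Double -> Double
  g = \z -> 2 * z - 3               -- λz.2z − 3

  main :: IO ()
  main = print (plusOneAfter g 10)  -- 18.0 = 2·10 − 2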
4. (λx.λy.λz.(x(y) − x(z)))(λw.w³ + 4)(3)(5)
5. λy.((λx.(x(y)/x(x(y))))(λz.z + 1))((λu.u((λv.v²)(3)))(λt.2t))
We now work through a few more linguistic examples that also involve apply-
ing lambda terms to lambda terms.
1. Consider the sentence:
• The bald man killed Belicoff.
We start with semantic values for the individual words:
(a) ~the = λx.(the unique object y such that x(y) = ⊤)
(b) ~bald = λx.λy.(x(y) ∧ y ∈ bald)
(c) ~man = λx.x is a man
(d) ~killed = λx.λy.y killed x
(e) ~Belicoff = Belicoff
The tree for the sentence with the initial lexical meanings is:
[[the [bald man]] [killed Belicoff]]
Combining the leaves up the tree by functional application, the top node
gives us:
• ~the bald man killed Belicoff
• = (the unique object y such that (y is a man and y ∈ bald)) killed Belicoff
We now have a fully decorated tree:
The truth value of the whole sentence The bald man killed Belicoff
is thus the same as the truth value of the claim that there is a unique
object which is both a man and a member of the set bald, and which
killed Belicoff. That truth value will be > if there is a unique bald man
and he killed Belicoff, ⊥ if there is a unique bald man and he did not kill
Belicoff, and undefined if there is no unique bald man.
2. Consider the sentence:
• The large mouse squeaked and the small elephant trumpeted.
Suppose the sentence has the tree:
[[[the [large mouse]] squeaked] [and [[the [small elephant]] trumpeted]]]
The leaf semantic values include ~squeaked = λx.x squeaked, ~trumpeted = λx.x trumpeted, and ~and = λx.λy.y ∧ x.
First combine the two adjectives each with the corresponding common
noun:
(a) ~large mouse:
• ~large(~mouse)
• = (λx.λy.(x(y) ∧ size(y) > Σ_{z : x(z) = ⊤} size(z) / |{z : x(z) = ⊤}|))(λx.x is a mouse)
• = λy.((λx.x is a mouse)(y) ∧ size(y) > Σ_{z : (λx.x is a mouse)(z) = ⊤} size(z) / |{z : (λx.x is a mouse)(z) = ⊤}|)
• (λx.x is a mouse)(z) = z is a mouse; (λx.x is a mouse)(y) = y is a mouse
• So we simplify to: λy.(y is a mouse ∧ size(y) > Σ_{z : z is a mouse = ⊤} size(z) / |{z : z is a mouse = ⊤}|)
• But the sentence:
– z is a mouse = ⊤
is equivalent to the simpler:
– z is a mouse
• So we can replace the former with the latter to obtain:
– λy.(y is a mouse ∧ size(y) > Σ_{z : z is a mouse} size(z) / |{z : z is a mouse}|)
(b) ~small elephant:
• ~small(~elephant)
• = (λx.λy.(x(y) ∧ size(y) < Σ_{z : x(z) = ⊤} size(z) / |{z : x(z) = ⊤}|))(λx.x is an elephant)
• = λy.((λx.x is an elephant)(y) ∧ size(y) < Σ_{z : (λx.x is an elephant)(z) = ⊤} size(z) / |{z : (λx.x is an elephant)(z) = ⊤}|)
• (λx.x is an elephant)(z) = z is an elephant; (λx.x is an elephant)(y) = y is an elephant
• So we simplify to: λy.(y is an elephant ∧ size(y) < Σ_{z : z is an elephant = ⊤} size(z) / |{z : z is an elephant = ⊤}|)
• But the sentence:
– z is an elephant = ⊤
is equivalent to the simpler:
– z is an elephant
• So we can replace the former with the latter to obtain:
– λy.(y is an elephant ∧ size(y) < Σ_{z : z is an elephant} size(z) / |{z : z is an elephant}|)
Next the two phrases large mouse and small elephant can each be
combined with the:
(a) ~the large mouse
• ~the(~large mouse)
• = (λx.(the unique object y such that x(y) = ⊤))(λy.(y is a mouse ∧ size(y) > Σ_{z : z is a mouse} size(z) / |{z : z is a mouse}|))
• = (λx.(the unique object y such that x(y) = ⊤))(λw.(w is a mouse ∧ size(w) > Σ_{z : z is a mouse} size(z) / |{z : z is a mouse}|)) [to avoid variable clash]
• = the unique object y such that (λw.(w is a mouse ∧ size(w) > Σ_{z : z is a mouse} size(z) / |{z : z is a mouse}|))(y) = ⊤
• = the unique object y such that (y is a mouse ∧ size(y) > Σ_{z : z is a mouse} size(z) / |{z : z is a mouse}|) = ⊤
• = the unique object y such that y is a mouse ∧ size(y) > Σ_{z : z is a mouse} size(z) / |{z : z is a mouse}|
(b) ~the small elephant
• ~the(~small elephant)
• = (λx.(the unique object y such that x(y) = ⊤))(λy.(y is an elephant ∧ size(y) < Σ_{z : z is an elephant} size(z) / |{z : z is an elephant}|))
• = (λx.(the unique object y such that x(y) = ⊤))(λw.(w is an elephant ∧ size(w) < Σ_{z : z is an elephant} size(z) / |{z : z is an elephant}|)) [to avoid variable clash]
• = the unique object y such that (λw.(w is an elephant ∧ size(w) < Σ_{z : z is an elephant} size(z) / |{z : z is an elephant}|))(y) = ⊤
• = the unique object y such that (y is an elephant ∧ size(y) < Σ_{z : z is an elephant} size(z) / |{z : z is an elephant}|) = ⊤
• = the unique object y such that y is an elephant ∧ size(y) < Σ_{z : z is an elephant} size(z) / |{z : z is an elephant}|
Next we combine the two noun phrases the large mouse and the small
elephant with their respective intransitive verbs squeaked and trumpeted:
(a) ~the large mouse squeaked
• = ~squeaked(~the large mouse)
• = (λx.x squeaked)(the unique object y such that y is a mouse ∧ size(y) > Σ_{z : z is a mouse} size(z) / |{z : z is a mouse}|)
• = the unique object y such that y is a mouse ∧ size(y) > Σ_{z : z is a mouse} size(z) / |{z : z is a mouse}| squeaked
(b) ~the small elephant trumpeted
• = ~trumpeted(~the small elephant)
• = (λx.x trumpeted)(the unique object y such that y is an elephant ∧ size(y) < Σ_{z : z is an elephant} size(z) / |{z : z is an elephant}|)
• = the unique object y such that y is an elephant ∧ size(y) < Σ_{z : z is an elephant} size(z) / |{z : z is an elephant}| trumpeted
And we add these two semantic values to the tree.
The last two steps are straightforward. First we combine ~the small
elephant trumpeted with ~and:
• ~and the small elephant trumpeted
• = ~and(~the small elephant trumpeted)
• = (λx.λy.y ∧ x)(the unique object y such that y is an elephant ∧ size(y) < Σ_{z : z is an elephant} size(z) / |{z : z is an elephant}| trumpeted)
• = λw.(w ∧ the unique object y such that y is an elephant ∧ size(y) < Σ_{z : z is an elephant} size(z) / |{z : z is an elephant}| trumpeted)
Then we combine ~the large mouse squeaked with ~and the small
elephant trumpeted:
• ~the large mouse squeaked and the small elephant trumpeted
• = ~and the small elephant trumpeted(~the large mouse squeaked)
• = (λw.(w ∧ the unique object y such that y is an elephant ∧ size(y) < Σ_{z : z is an elephant} size(z) / |{z : z is an elephant}| trumpeted))(the unique object y such that y is a mouse ∧ size(y) > Σ_{z : z is a mouse} size(z) / |{z : z is a mouse}| squeaked)
• = the unique object y such that y is a mouse ∧ size(y) > Σ_{z : z is a mouse} size(z) / |{z : z is a mouse}| squeaked ∧ the unique object y such that y is an elephant ∧ size(y) < Σ_{z : z is an elephant} size(z) / |{z : z is an elephant}| trumpeted
Adding these semantic values to our tree, we get our full analysis: the top
node of the tree carries the truth value just computed.
3. Let’s do one more example. Consider the sentence:
• The tiger behind the tree roared.
Suppose we have the following syntactic structure:
[[the [tiger [behind [the tree]]]] roared]
The preposition behind should be of type (e, ((e, t), (e, t))) to make the
typing work: behind combines with the tree (type e) to give behind the
tree of type ((e, t), (e, t)); that combines with tiger (type (e, t)) to give
tiger behind the tree, again of type (e, t); the (type ((e, t), e)) then yields
a type e noun phrase, which combines with roared (type (e, t)) to give a
type t sentence.
We then calculate ~the tree in the usual way:
• ~the tree
• = ~the(~tree)
• = (λx.(the unique object y such that x(y) = ⊤))(λx.x is a tree)
• = the unique object y such that (λx.x is a tree)(y) = ⊤
• = the unique object y such that y is a tree = ⊤
• = the unique object y such that y is a tree
Next we calculate ~behind the tree:
• ~behind the tree = ~behind(~the tree)
• = λw.λz.(w(z) ∧ z is behind the unique object y such that y is a tree)
Combining that with ~tiger gives:
• ~tiger behind the tree = λz.(z is a tiger ∧ z is behind the unique object y such that y is a tree)
And combining that with ~the gives:
• ~the tiger behind the tree
• = the unique object y such that y is a tiger ∧ y is behind the unique
object w such that w is a tree
And finally we calculate ~the tiger behind the tree roared:
• ~the tiger behind the tree roared
• = ~roared(~the tiger behind the tree)
• = (λx.x roared)(the unique object y such that y is a tiger ∧ y is behind
the unique object w such that w is a tree)
• = the unique object y such that y is a tiger ∧ y is behind the unique
object w such that w is a tree roared
The fully decorated tree, node by node:
• the tiger behind the tree roared: the unique object y such that y is a tiger ∧ y is behind the unique object w such that w is a tree roared
• the tiger behind the tree: the unique object y such that y is a tiger ∧ y is behind the unique object w such that w is a tree
• roared: λx.x roared
• the: λx.(the unique object y such that x(y) = ⊤)
• tiger behind the tree: λz.z is a tiger ∧ z is behind the unique object y such that y is a tree
• tiger: λx.x is a tiger
• behind the tree: λw.λz.w(z) ∧ z is behind the unique object y such that y is a tree
4. Calculate the semantic values up the tree by combining lambda
terms.
Then test your theory by seeing what it predicts for:
• The very very tall man laughed.
Does it matter which of the following tree structures is given to very
very tall:
1. [very [very tall]]
2. [[very very] tall]
Are your predicted truth conditions for The very very tall man
laughed reasonable?
Problem 112: Combine your proposed value for ~very from the
previous problem with ~bald to calculate ~very bald. Does the com-
bination make sense? Should it make sense?
We’ve been working throughout on the assumption that syntactic trees are al-
ways binary branching: each parent node has exactly two children node, Binary
branching is a syntactic assumption, but we haven;’t given any direct syntactic
argument in favor of it. (There has been important syntactic work arguing in
favor of binary branching, though.) Instead, our reliance on binary branching
has been driven by the fact that binary branching trees let us run a semantic
theory on which the semantic values of complex expressions are always deter-
mined by Functional Application. When all the branches in our trees have the
form:
[α [β] [γ]]
we can have the general principle ~α = ~β(~γ) or ~γ(~β). But if we have a
trinary branching node:
[α [β] [γ] [δ]]
we can’t give a straightforward functional application story. With two child
nodes, we can use one as function and one as argument, but with three child
nodes, we have more nodes than we have roles to distribute.
However, there are expressions that at least look like they ought to have a non-
binary structure. Two examples we’ve already encountered are ditransitive
verbs and sentential connectives like conjunction. With ditransitive verbs, it’s
tempting to think that direct and indirect object both link to the verb at a single
triple-branching node:
[The rat [gave [the villagers] [the plague]]]
rather than in one of the two available binary structures:
1. [The rat [gave [[the villagers] [the plague]]]]
2. [The rat [[gave [the villagers]] [the plague]]]
Similarly, it’s tempting to think that when two sentences are joined by and, they
join with and at a single triple-branching node:
[[Aristotle laughed] [and] [Socrates smiled]]
This ternary branching structure seems to capture the symmetry of conjunctions
in a way that’s lost if we have to choose between the two available binary
branching constructions:
1. [[[Aristotle laughed] [and]] [Socrates smiled]]
2. [[Aristotle laughed] [and [Socrates smiled]]]
A similar issue arises with resultative constructions like The cold snap froze
the lake solid, where a ternary node can join the verb, the direct object, and
the resulting state:
[[the [cold snap]] [froze [the lake] [solid]]]
And again the symmetry of the ternary structure avoids the need to impose an
unsatisfactory asymmetry by associating either the verb with the direct object
or the direct object with the resulting state specification:
1. [[the [cold snap]] [[froze [the lake]] [solid]]]
2. [[the [cold snap]] [froze [[the lake] [solid]]]]
Ternary branching is also familiar from arithmetic, where an operator and its
two arguments naturally form a single node, with values computed at each node:
1. [5 [2] [+] [3]]
2. [12 [5 [2] [+] [3]] [+] [7 [3] [+] [4]]]
If we want to start using functions of more than one argument, we need to adjust
our type theory. In our current notation, a type (α, β) contains functions from
members of type α to members of type β. The notation thus presupposes that
we use only functions of one argument. For functions of multiple arguments,
we will use the notation:
• (⟨α₁, . . . , αₙ⟩, β)
to name the type of n-place functions that take as inputs members of the types
α₁ through αₙ, and produce as output a member of type β.
For example, (⟨e, e⟩, t) is the type of two-place functions that take two objects
(two members of type e) as input and produce a truth value (a member of type
t) as output. If type e contains two objects a and b, then one member of type
(⟨e, e⟩, t) is the two-place function:
• [⟨a, a⟩ → ⊤, ⟨a, b⟩ → ⊥, ⟨b, a⟩ → ⊤, ⟨b, b⟩ → ⊤]
In general, we will write λ⟨x₁, . . . , xₙ⟩.E for a lambda term naming an n-place
function of type (⟨τ₁, . . . , τₙ⟩, σ).
Problem 116: Suppose that f and g are both functions from the real
numbers to the real numbers, such that:
1. f = λx.F(x)
2. g = λx.G(x)
We can then define a two-place function h(x, y) that maps two real
numbers to a real number by setting h(x, y) = f (x) + g(y). Write a
lambda term for the two-place h(x, y) function.
Problem 117: Write a lambda term for a function that takes as input
a pair of integers and produces as output a new function that itself
takes as input a pair of integers and produces as output ⊤ if the two
input integers are both strictly between the earlier pair of integers
and ⊥ if the two input integers are not both strictly between the
earlier pair of integers. What is the type of this lambda term?
Once we write out lambda terms (in this new notation) for functions of multiple
arguments, it should be clear how multiple-argument functions are connected
to single-argument functions via Schönfinkelization. Consider two ways to
think about addition:
1. We can treat addition as a two-place function of type (⟨e, e⟩, e). In this
case, we have:
• ~+ = λ⟨x, y⟩.x + y
2. We can treat addition as a one-place function from a number to a one-
place function from a number to the sum of those two numbers. We thus
treat addition as being of type (e, (e, e)), and have:
• ~+ = λx.λy.x + y
The second approach is just the Schönfinkelization of the first approach. And
there is a simple algorithm for Schönfinkeling a lambda term involving the
⟨·⟩ bracket notation for multiple-argument functions: we simply remove the
brackets and add lambdas for each variable. The general form is:
• The Schönfinkelization of λ⟨x, y⟩.E is λx.λy.E.
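Schönfinkelization is what functional programming languages call currying; Haskell's standard curry and uncurry functions move between the two forms (a sketch, with names of our own):

  addPair :: (Int, Int) -> Int     -- λ⟨x, y⟩.x + y
  addPair (x, y) = x + y

  addCurried :: Int -> Int -> Int  -- λx.λy.x + y
  addCurried = curry addPair       -- Schönfinkelization

  addPair' :: (Int, Int) -> Int
  addPair' = uncurry addCurried    -- and back again

  main :: IO ()
  main = print (addPair (2, 3), addCurried 2 3)  -- (5,5)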
Problem 120: Explain how we can use Schönfinkelization twice to
move between the three-place function:
• λ⟨x, y, z⟩.E(x, y, z)
and the single-argument (type (e, (e, (e, e)))) function:
• λx.λy.λz.E(x, y, z)
Consider both directions of transition (from three-place function
to single-argument function and from single-argument function to
three-place function). If you are careful about the details, you will
notice that we need some assumptions about the relation between
⟨a, ⟨b, c⟩⟩ and ⟨a, b, c⟩. Try to state the needed assumptions as clearly
as possible. Is there a way to think about what ordered pairs and
ordered triples are that will help justify those assumptions?
Problem 121: For each of the following lambda terms using the
multi-argument function notation, give an equivalent Schönfinkeled
expression using only functions of a single argument.
1. λ⟨x, y⟩.x^y
2. λ⟨x, y, z⟩.(x + y)/z
3. λ⟨x, y, z, w⟩.x admires y more than w admires z
4. λ⟨x, y⟩.λz.(x + y) − z
5. λ⟨x_e, y_e⟩.λz_(⟨e, e⟩, e).z(x, y)
Problem 122: For each of the following lambda terms using only
functions of a single argument, give an equivalent un-Schönfinkeled
expression using functions of multiple arguments.
1. λx.λy.y − x
2. λx.λy.λz.z > yx
3. λx.λy.y admires x
4. λx.λy.λz.z(y(x))
AB Truth-Functional Connectives
We noted earlier that the natural syntactic structure for a connective like and
makes it part of a trinary branching structure:
[[Aristotle laughed] [and] [Socrates smiled]]
Now that we can incorporate multiple-argument functions in our semantics,
we can respect that natural trinary structure for conjunction. We want ~and to
be a function that takes two truth values as input (in this case, the truth values of
Aristotle laughed and Socrates smiled) and produces a single truth value
as output:
A B | A and B
⊤ ⊤ | ⊤
⊤ ⊥ | ⊥
⊥ ⊤ | ⊥
⊥ ⊥ | ⊥
Or in an alternative truth table format:
and | ⊤ ⊥
⊤ | ⊤ ⊥
⊥ | ⊥ ⊥
In any of these formats, we are specifying the function that takes two truth
values as input, and produces ⊤ as output when the two inputs are both ⊤, but
produces ⊥ as output if either input is ⊥.
There are 16 different functions in type (⟨t, t⟩, t). (A function in type (⟨t, t⟩, t)
takes a pair of truth values as input. There are four pairs of truth values. Each
input pair can be mapped to one of two truth values as output. Thus there are
2⁴ = 16 functions available.) ~and is thus one of 16 members of its type. We
can easily make a list of all 16 members:
A B | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
⊤ ⊤ | ⊤ ⊤ ⊤ ⊤ ⊤ ⊤ ⊤ ⊤ ⊥ ⊥ ⊥ ⊥ ⊥ ⊥ ⊥ ⊥
⊤ ⊥ | ⊤ ⊤ ⊤ ⊤ ⊥ ⊥ ⊥ ⊥ ⊤ ⊤ ⊤ ⊤ ⊥ ⊥ ⊥ ⊥
⊥ ⊤ | ⊤ ⊤ ⊥ ⊥ ⊤ ⊤ ⊥ ⊥ ⊤ ⊤ ⊥ ⊥ ⊤ ⊤ ⊥ ⊥
⊥ ⊥ | ⊤ ⊥ ⊤ ⊥ ⊤ ⊥ ⊤ ⊥ ⊤ ⊥ ⊤ ⊥ ⊤ ⊥ ⊤ ⊥
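The counting argument can be checked mechanically – a Haskell sketch (names ours) that enumerates the 16 output columns:

  inputs :: [(Bool, Bool)]
  inputs = [(True, True), (True, False), (False, True), (False, False)]

  -- every way of choosing one of two outputs for each of the four inputs
  allColumns :: [[Bool]]
  allColumns = sequence (replicate 4 [True, False])

  main :: IO ()
  main = do
    print (length allColumns)          -- 16
    print (map (uncurry (&&)) inputs)  -- [True,False,False,False]: column 8, "and"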
~And is item 8 on this list. Other English connectives can also be modelled
using items from the list. Consider or. There are two plausible semantic values
for or among the 16 members of (⟨t, t⟩, t):
1. Inclusive Or: Item 2 on the list provides one plausible semantic value for
or:
A B | A or B
⊤ ⊤ | ⊤
⊤ ⊥ | ⊤
⊥ ⊤ | ⊤
⊥ ⊥ | ⊥
2. Exclusive Or: Item 10 on the list provides the other plausible semantic
value:
A B | A xor B
⊤ ⊤ | ⊥
⊤ ⊥ | ⊤
⊥ ⊤ | ⊤
⊥ ⊥ | ⊥
Both Inclusive Or and Exclusive Or agree that the truth of one disjunct is
sufficient for the truth of a disjunction. They disagree on the question of
whether the truth of the disjunction requires the truth of exactly one disjunct:
Exclusive Or imposes the exactness requirement, while Inclusive Or allows
a disjunction to be true when both disjuncts are true. There is longstanding
disagreement over whether the English word or expresses Inclusive Or or
Exclusive Or. Perhaps there are two homophonic and homographic words in
English, one for Inclusive Or and one for Exclusive Or. We could write them,
for disambiguation purposes, as ior and xor. But we will, as our default
assumption, treat English or as inclusive.
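In Haskell terms (a sketch; note that (/=) restricted to Bool behaves as exclusive or):

  andSem, iorSem, xorSem :: Bool -> Bool -> Bool
  andSem = (&&)   -- item 8
  iorSem = (||)   -- item 2: inclusive, our default reading of "or"
  xorSem = (/=)   -- item 10: exclusive

  main :: IO ()
  main = mapM_ print [ (a, b, iorSem a b, xorSem a b)
                     | a <- [True, False], b <- [True, False] ]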
Problem: For each of the following sentences, consider whether the
or is most naturally read as inclusive or as exclusive. (Note that in
several of the examples or does not join full sentences, but instead
joins noun phrases, verb phrases, or other subsentential constituents.
We'll return to this feature of connectives later.)
1. None of the students did the Kant reading or the Hegel
reading.
2. You may have soup or salad with your dinner.
3. If you can run a mile in under five minutes or run a 5k
race in under seventeen minutes, you can run a marathon
in under three hours.
4. I doubt you’ll enjoy the apple pie or the peach cobbler.
5. The department administrator or the librarian can help
you order that book.
Consider the hypothesis that the either ...or construction is the
marker of exclusive disjunction. Does this seem right? Present some
data either for or against the hypothesis.
Another English connective that can be modelled using the members of type
(⟨t, t⟩, t) is if. Here we can use item 5 on our chart of options, so that:
A B if A, B
> > >
> ⊥ ⊥
⊥ > >
⊥ ⊥ >
Or equivalently:
if | ⊤ ⊥
⊤ | ⊤ ⊥
⊥ | ⊤ ⊤
(Note that if, unlike and and or, is non-commutative. A and B is equiv-
alent to B and A, and A or B is equivalent to B or A, but if A, B is not
equivalent to if B, A. So when we give the truth table in the alterna-
tive rectangular form, we need to be clear about how row and column
correspond to the two positions in the if construction. We have put the
antecedent (the A position in if A, B) on the column, and the consequent
(the B position in If A, B on the row.)
Problem 126: Using the truth table above for ~if, which of the
following claims about the logic of if sentences is correct?
1. A; if A,B ⊨ B
2. B; if A,B ⊨ A
3. if A,B; if B,C ⊨ if A,C
4. if A,B; not B ⊨ not A
5. B ⊨ if A,B
6. A ⊨ if A,B
7. not A ⊨ if A,B
8. if A,if B,C ⊨ if A and B,C
9. if A,C ⊨ if A,B and if A,C
10. if A,if A,B ⊨ if A,B
11. if A,B
12. if A,B ⊨ if A,(B and C)
13. if A,B ⊨ if A,(B or C)
14. if A,B ⊨ if (A or C),B
15. ⊨ if A,B or if B,A
Consider whether each of the following English connectives can also
be modelled by a member of type (⟨t, t⟩, t):
2. but
3. only if
4. even if
5. whenever
6. because
7. nor
We noted earlier that ambiguous sentences often admit of more than one syn-
tactic analysis. We gave as an example the sentence Very old men and women
are happy, which has the two trees:
1. [[[[very old] men] and women] [are happy]]
2. [[[very old] [men and women]] [are happy]]
But it remains to be seen whether the different trees that we assign to an am-
biguous sentence can then produce different meanings for that sentence that
correspond to the disambiguated readings of the sentence.
Unfortunately, we can’t give a semantic analysis to any version of Very old men
and women are happy yet. We don’t have an analysis of very, we don’t know
what to do with plural nouns like men and women, we don’t have a workable
treatment of predicative adjectives like happy in the verb phrase are happy,
and we don’t have a treatment of and that allows it to join things other than
sentences. (So, more or less, there are no parts of this sentence that our current
theory can handle.) But we can deal with a simpler case.
• Socrates smiled and Plato pouted or Aristotle arrived.
Assume that both and and or give rise to trinary branching trees using the
phrase structure rule:
• S → S CONN S
1. Tree 1:
[S [S [S [NP [NAME Socrates]] [VP [IV smiled]]] [CONN and] [S [NP [NAME Plato]] [VP [IV pouted]]]] [CONN or] [S [NP [NAME Aristotle]] [VP [IV arrived]]]]
2. Tree 2:
[S [S [NP [NAME Socrates]] [VP [IV smiled]]] [CONN and] [S [S [NP [NAME Plato]] [VP [IV pouted]]] [CONN or] [S [NP [NAME Aristotle]] [VP [IV arrived]]]]]
We’ve had a syntactic theory that assigned these two trees for a while now,
but now we have a semantic theory that can interpret both trees. We’ll use the
following lexicon:
• ~Socrates = Socrates
• ~Plato = Plato
• ~Aristotle = Aristotle
• ~smiled = λx.x smiled
• ~pouted = λx.x pouted
• ~arrived = λx.x arrived
We can then add these semantic values to the tree and add semantic types for
each node. (We’ll remove the syntactic categories for simplicity, and also strip
off the non-branching nodes):
1. Tree 1: The top node (type t) combines ~Socrates smiled and Plato
pouted (type t) with ~or (type (⟨t, t⟩, t), the function [⊤,⊤ → ⊤; ⊤,⊥ → ⊤;
⊥,⊤ → ⊤; ⊥,⊥ → ⊥]) and ~Aristotle arrived (type t, built from
~Aristotle = Aristotle of type e and ~arrived = λx.x arrived of type
(e, t)). The embedded conjunction node combines ~Socrates smiled with
~and ([⊤,⊤ → ⊤; ⊤,⊥ → ⊥; ⊥,⊤ → ⊥; ⊥,⊥ → ⊥]) and ~Plato pouted.
2. Tree 2: The top node combines ~Socrates smiled (type t) with ~and and
~Plato pouted or Aristotle arrived (type t); the embedded disjunction
node combines ~Plato pouted with ~or and ~Aristotle arrived.
Let’s then assume that Socrates doesn’t smile, Plato does not pout, and Aristotle
does arrive. We then have:
1. ~Socrates smiled = ~smiled(~Socrates) = (λx.x smiled)(Socrates) = ⊥
2. ~Plato pouted = ~pouted(~Plato) = (λx.x pouted)(Plato) = ⊥
3. ~Aristotle arrived = ~arrived(~Aristotle) = (λx.x arrived)(Aristotle) = ⊤
1. Tree 1: The embedded conjunction node gets ~and(⊥, ⊥) = ⊥, and the
top node then gets ~or(⊥, ⊤) = ⊤. On Tree 1, then, the sentence is true.
2. Tree 2: The embedded disjunction node gets ~or(⊥, ⊤) = ⊤, and the top
node then gets ~and(⊥, ⊤) = ⊥. On Tree 2, then, the sentence is false.
Tree 1 is true, more generally, whenever either (i) both Socrates smiled and
Plato pouted, or (ii) Aristotle arrived. And Tree 2 is true whenever both
(i) Socrates smiled, and (ii) either Plato pouted or Aristotle arrived. That
matches the two natural readings of the ambiguous sentence Socrates smiled
and Plato pouted or Aristotle arrived, so our semantic theory provides
a good analysis.
Earlier we treated definite descriptions like the king of Norway as being of
type e. That treatment had two advantages:
1. It made intuitive sense, given that there is a specific individual picked out
by the king of Norway.
2. It made definite descriptions of the same semantic type as proper names,
which are also type e, thus explaining why names and definite descrip-
tions can appear in the same syntactic positions.
However, there are other reasons to be unhappy with typing the king of
Belgium as e. Consider a range of syntactically similar expressions:
• the king
• a king
• some king
• every king
• each king
• no king
• most kings
• few kings
• many kings
• all but one king
Suppose some king is of type e. Then there is some specific entity that it
picks out – call that entity Karl. Since Some king laughs and Some king
doesn't laugh are presumably both true, ~laughs needs to map Karl both
to ⊤ and to ⊥. At best, then, Karl both laughs and doesn't laugh – a
peculiar kind of king (or any other object). At worst, we've got an outright
contradiction in our theory.
So if phrases of the form some N are of type e and pick out objects, those
must be strange objects. They are what are sometimes called glutty
objects – objects that have too many properties. Whatever object some
tiger picks out, it needs to be an object that is hungry and not hungry, in
Africa and in Asia, and also not in Africa and not in Asia.
Suppose that every king is of type e. Then again there is some specific
entity that it picks out – call that entity Keegan. We can similarly learn
things about Keegan.
Suppose no king is of type e. Then once more there is some specific entity
that it picks out – call that entity Kieran. Consider some truths involving
the expression n king:
• No king lives on the moon
• No king is a giraffe
• No king is a prime number
Then ~lives on the moon, ~is a giraffe, and ~is a prime number
must all map Kieran to ⊤. If no king picks out an entity, it's a moon-
dwelling entity that’s simultaneously a long-necked mammal and an ab-
stract mathematical object. Furthermore, notice that the following claim
is false:
• No king exists
But the entity Kieran presumably does exist, so if no king picks out
Kieran, No king exists ought to come out true.
We need an alternative to the default view that some king, every king, and so
on are of type e. Consider the basic typing puzzle:
[? [some king_?] [laughs_(e, t)]]
If ~laughs is going to take ~some king as argument, then ~some king needs
to be of type e, which we want to avoid. But there is another option: ~some
king could take ~laughs as argument. In that case, ~some king would need
to be of type ((e, t), t), so that it could take the type (e, t) ~laughs as input and
produce type t for the whole sentence as output:
[t [some king_((e, t), t)] [laughs_(e, t)]]
(~some would then be of type ((e, t), ((e, t), t)), but we'll come back to that later.)
What does type ((e, t), t) look like? These are functions from functions from
objects to truth values, to truth values.
But we can also think of sets as being basically equivalent to properties. The set
of all red things – {x : x is red} – plays more or less the same role as the property
of being red. So we can also think of members of (e, t) as being properties.
In that case, ((e, t), t) contains functions from properties to truth values. But
once again we can think of those functions as being characteristic functions of
sets, so that ((e, t), t) can be thought of as containing sets of properties. And
once again sets and properties are roughly equivalent, so we can think of ((e, t),
t) as containing properties of properties. Properties of properties are sometimes
called second-order properties. (As opposed to properties of objects, which
are called first-order properties.)
Equivalently, we can also think of things in type ((e, t), t) as being sets of sets
of things in type e. Suppose f is a function of type ((e, t), t). Then f↓ is the
corresponding set – the extension of f , or the set whose characteristic function
is f . f↓ is thus a set of things in (e, t). But since the members of f↓ are themselves
characteristic functions, we can also talk about f↓↓ , which will be the set of the
sets that are the extensions of the characteristic functions in f↓ .
For each of the following second-order properties, give an object that
has a first-order property that has the given second-order property.
1. Is a property.
2. Is a property had by more people in the northern hemisphere
than in the southern hemisphere.
3. Is a property that the Eiffel Tower doesn’t have.
• ~Some king laughs = ~some king(~laughs)
• = (λx.({y : x(y) = ⊤} ∩ {y : y is a king} ≠ ∅))(λx.x laughs)
• = {y : (λx.x laughs)(y) = ⊤} ∩ {y : y is a king} ≠ ∅
• = {y : y laughs = ⊤} ∩ {y : y is a king} ≠ ∅
• = {y : y laughs} ∩ {y : y is a king} ≠ ∅
Notice that we don’t end up with the kind of very simple truth conditions we’re
used to from earlier cases:
• ~Some king laughs = some king laughs
We can analyze every king using the same tools. Again we ask what second-
order property every king should express such that Every king Fs is true if
the F property has that second-order property and false if the F property lacks
that second-order property. What we want is:
• Being a property that is had by every king
Or equivalently:
• Being a property such that the set of things having the property of being
a king is a subset of the set of things having that property.
We can then give a lambda term for an ((e, t), t) function corresponding to that
second-order property:
• λx.({y : y is a king} ⊆ {y : x(y) = ⊤})
• ~Every king laughs = ~every king(~laughs)
• = (λx.({y : y is a king} ⊆ {y : x(y) = ⊤}))(λx.x laughs)
• = {y : y is a king} ⊆ {y : (λx.x laughs)(y) = ⊤}
• = {y : y is a king} ⊆ {y : y laughs = ⊤}
• = {y : y is a king} ⊆ {y : y laughs}
Again a little thought will show that these are appropriate truth conditions for
Every king laughs.
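Both quantifier meanings can be prototyped as ((e, t), t) functions in Haskell – a sketch over a four-element domain (all names ours):

  data E = A | B | C | D deriving (Eq, Show, Enum, Bounded)

  domain :: [E]
  domain = [minBound .. maxBound]

  king :: E -> Bool        -- suppose a, b, and c are kings
  king e = e /= D

  someKing, everyKing :: (E -> Bool) -> Bool
  someKing p  = any (\y -> king y && p y) domain       -- non-empty overlap
  everyKing p = all (\y -> not (king y) || p y) domain -- kings ⊆ p

  main :: IO ()
  main = print (someKing (== A), everyKing (== A))  -- (True, False)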
Problem 138: Give a suitable lambda term for no king. Then verify
that it produces appropriate truth conditions by calculating ~No
king laughs.
Problem 139: Give a suitable lambda term for most kings. Then
verify that it produces appropriate truth conditions by calculating
~Most kings laugh. (Most kings is trickier than no king, and
you may need to make some somewhat arbitrary decisions about
exactly what is required for the truth of Most kings laugh.)
Problem 142: To combine with the type (e, t) king to produce a type
((e, t), t) some king, some needs to be of type ((e, t), ((e, t), t)). Give
an appropriate lambda term for a type ((e, t), ((e, t), t)) function to
use as ~some.
Suppose that e contains exactly four objects. We'll call them a, b, c, and d. Then:
1. (e, t) contains 2⁴ = 16 items. To specify a function from e to t, we need
for each input from e to specify an output in t. There are two things in t
(⊤ and ⊥), so for each object in e we have two choices about where to
map it. There are four objects in e, so we have a total of 2 · 2 · 2 · 2 = 2⁴
choices for how to specify a function.
2. ((e, t), t) contains 2¹⁶ = 65536 items. Since (e, t) contains 16 items, a
function in ((e, t), t) needs for each of those 16 inputs to pick one of two
possible outputs in t. Thus we make 16 independent choices from two
options, which gives a total of 2¹⁶ functions.
To help get a better sense of what the contents of ((e, t), t) are like, we’ll start
with a picture of (e, t). The sixteen members of (e, t) can, as usual, be thought
of as subsets of e = {a, b, c, d}. Here’s a diagram showing those sixteen subsets:
{a, b, c, d}
{b, c, d} {a, c, d} {a, b, d} {a, b, c}
{c, d} {b, d} {b, c} {a, d} {a, c} {a, b}
{d} {c} {b} {a}
∅
Members of ((e, t), t) can then be thought of as characteristic functions of subsets
of those 16 subsets of e. The following diagram, for example, picks out three
members of ((e, t), t):
[diagram: the sixteen subsets of e, with a blue line drawn around {a, c, d}, {a, b, d}, and {b, c}]
Inside the blue line is the subset {{a, c, d}, {a, b, d}, {b, c}}, which corresponds to
the ((e, t), t) function that maps the following three (e, t) functions to >:
1. [a → ⊤, b → ⊥, c → ⊤, d → ⊤]
2. [a → ⊥, b → ⊤, c → ⊤, d → ⊥]
3. [a → ⊤, b → ⊤, c → ⊥, d → ⊤]
Each of the 65536 members of the ((e, t), t) category corresponds to some subset
of the sixteen members of (e, t) shown in the diagram above. Suppose that of
the four members a, b, c, and d of e, a, b, and c are kings, but d is not. How
should we mark ~some king on the diagram?
[diagram: the sixteen subsets of e, with every subset containing at least one of a, b, and c marked as ~some king]
~Some king maps an (e, t) function to ⊤ just in case that (e, t) function picks
out a set with a non-empty intersection with the set of kings. Given that a, b,
and c are all kings, any subset of e that contains at least one of a, b, and c has a
non-empty intersection with the set of kings. Thus only {d} and ∅ fail to overlap
the set of kings, and hence fail to be mapped to ⊤ by ~some king.
Next consider ~every king. ~Every king maps an (e, t) function to ⊤ just in
case that function picks out a set that contains the set of kings as a subset.
That gives us:
[diagram: the sixteen subsets of e, with the supersets of {a, b, c} – namely {a, b, c} and {a, b, c, d} – marked as ~every king]
Finally, consider ~no king. ~No king is given by the following lambda term:
• ~no king = λx.({y : x(y) = ⊤} ∩ {y : y is a king} = ∅)
Given that a, b, and c are kings, ~no king maps to ⊤ only subsets of e that
contain none of a, b, and c. Thus:
[diagram: the sixteen subsets of e, with only ∅ and {d} marked as ~no king]
Problem 143: Suppose again that type e has four objects a, b, c, and
d, and suppose that c and d are linguists. In a diagram similar to the
ones used above, mark appropriate subsets for:
1. ~some linguist
2. ~every linguist
3. ~no linguist
Suppose further that c and d are linguists, that a is the only philosopher, and
that a, b, c, and d are all people. Then we can mark ~some king (in blue),
~some linguist (in red), ~some philosopher (in green), and ~some person
(in purple) on our diagram:
[diagram: the sixteen subsets of e, with ~some king in blue, ~some linguist in red, ~some philosopher in green, and ~some person in purple]
All four of these semantic values are upward closed:
• Let X be of type ((e, t), t). X is upward closed if given any Y and Z of type
(e, t) such that Y↓ is a subset of Z↓ , if Y↓ ∈ X↓↓ , then Z↓ ∈ X↓↓ .
For example, ~some king (more carefully, ~some king↓↓) is the following set
of subsets of e:
• {a}, {b}, {c}, {a, b}, {a, c}, {a, d}, {b, c}, {b, d}, {c, d}, {a, b, c}, {a, b, d}, {a, c, d}, {b, c, d}, {a, b, c, d}
Notice that the small set {a} is in ~some king, and so are all of the
expansions of it – {a, b}, {a, c}, {a, d}, {a, b, c}, {a, b, d}, {a, c, d}, and {a, b, c, d}.
That’s what upward closure requires. Similarly the small set {b} is in
~some king, and so is every expansion of it. And the small set {c} is in
~some king, as is every expansion of it.
Problem 145: Prove that, once we’ve decided what objects are
in type e, there is only one upward closed member of ((e, t), t)
that contains ∅.
Problem 146: Suppose f is some function in ((e, t), t), and f↓↓
is the corresponding set of sets of entities. Call a set X in f↓↓
minimal if there is no Y in f↓↓ such that Y ⊂ X.
Suppose that for each minimal set X in f↓↓ , for every Y such that
X ⊆ Y, Y is also in f↓↓ . Prove that f is then upward closed.
2. ~Some philosopher is upward closed. Using the results of the previous
problem: since a is the only philosopher, {a} is the only minimal set in ~some
philosopher. And every expansion of {a} is also in ~some philosopher –
all of those expansions fall within the green region.
It’s easy to find members of ((e, t), t) that are not upward closed. Thus consider:
[diagram: four sets of subsets of e, marked in blue, green, red, and purple]
1. The blue set {{c}, {d}, {b, c}, {b, d}} is not upward closed. For example, it
contains {c}, and {c} is a subset of {a, b, c, d}, but it does not contain {a, b, c, d}.
2. The green set {{b}, {a, c}} is not upward closed. For example, it contains
{b}, and {b} is a subset of {a, b, c, d}, but it does not contain {a, b, c, d}.
3. The red set {{a, d}, {a, b, d}, {a, b, c, d}} is not upward closed. It contains {a, d},
and {a, d} is a subset of {a, c, d}, but it does not contain {a, c, d}.
4. The purple set {∅, {a}, {b, d}, {c, d}, {a, b, c}, {a, b, d}} (which is scattered in mul-
tiple bubbles in the diagram) is not upward closed. For example, it con-
tains ∅, and ∅ is (trivially) a subset of {c}, but it does not contain {c}.
AG Sources of Upward Closure
We’ve now seen (i) that all of ~some king, ~some linguist, ~some philosopher,
~some people are upward closed (in our little model with four people), and
that (ii) not all members of ((e, t), t) are upward closed. (Indeed, if you’ve solved
the previous problem, you’ve seen that upward closed members of ((e, t), t) are
quite rare. It would be nice to have an explanation for why all of these some
phrases are upward closed, and perhaps some insight into whether phrases of
the form some N are always upward closed.
As a first step, we need a semantic value for some. The typing of some can be
settled easily. The basic constraints are given in a simple tree:
[[some] [N_(e, t)]]_((e, t), t)
Some thus needs to take an input of type (e, t) and produce an output of type
((e, t), t), so it must be of type ((e, t), ((e, t), t)).
Next we need an appropriate lambda term for the specific semantic value of
some in type ((e, t), ((e, t), t)). A natural choice is:
• ~some = λz.λx.({y : x(y) = ⊤} ∩ {y : z(y) = ⊤} ≠ ∅)
Applying this to ~king = λw.w is a king, we get:
• ~some king = ~some(~king)
• = λx.({y : x(y) = ⊤} ∩ {y : (λw.w is a king)(y) = ⊤} ≠ ∅)
• = λx.({y : x(y) = ⊤} ∩ {y : y is a king = ⊤} ≠ ∅)
• = λx.({y : x(y) = ⊤} ∩ {y : y is a king} ≠ ∅)
Problem 149: Give lambda terms (of type ((e, t), ((e, t), t))) for each
of the following determiners:
• every
• no
• most
• only
Using ~some = λz.λx.({y : x(y) = ⊤} ∩ {y : z(y) = ⊤} ≠ ∅), we can then prove that
some N is upward closed for any noun N:
Proof: Take some noun N, and let N be the set ~N↓ (that is, the set
of objects of which N is true). Then ~some N is the set of sets of
entities that have a non-empty intersection with N. (More carefully,
~some N↓↓ is that set of sets, but we’ll speak loosely.) Now suppose
X is in ~some N and Y is some set of entities such that X ⊆ Y.
Since X ∈~some N, X has a non-empty intersection with N. But
since X ⊆ Y, everything that is in X is also in Y, so Y also has a
non-empty intersection with N. But then Y ∈~some N. Thus ~some
N is upward closed.
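The proof can also be checked by brute force over our four-element model – a Haskell sketch (names ours), representing (e, t) functions as subsets of the domain:

  import Data.List (subsequences, isSubsequenceOf)

  domain :: [Int]
  domain = [1, 2, 3, 4]

  kings :: [Int]
  kings = [1, 2, 3]

  someKing :: [Int] -> Bool   -- an ((e, t), t) value: set ↦ truth value
  someKing s = any (`elem` kings) s

  -- check: whenever q holds of y and y ⊆ z, q also holds of z
  -- (subsequences of a sorted list are sorted, so isSubsequenceOf tests ⊆)
  upwardClosed :: ([Int] -> Bool) -> Bool
  upwardClosed q = and [ q z | y <- subsequences domain
                             , z <- subsequences domain
                             , y `isSubsequenceOf` z
                             , q y ]

  main :: IO ()
  main = print (upwardClosed someKing)  -- True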
Some phrases are not the only ones that are upward closed. Recall our earlier
setup:
• a, b, and c are kings.
Here are ~every king (in blue), ~every linguist (in red), ~every philosopher
(in green), and ~every person (in purple):
[diagram: the sixteen subsets of e, with ~every king in blue, ~every linguist in red, ~every philosopher in green, and ~every person in purple]
Examining the diagram shows that each one of these collections is upward
closed. We can see that every N phrases are always upward closed by giving
an appropriate semantic value for every:
• ~every = λz.λx.({y : z(y) = ⊤} ⊆ {y : x(y) = ⊤})
Using this semantic value, we can then prove that every N is upward closed
for any noun N:
Proof: Take some noun N, and let N be the set ~N↓ . Then ~every N
is the set of all supersets of N – that is, the set of all sets of which N is a
subset. Suppose X is in ~every N and Y is some set of entities such
that X ⊆ Y. Because X ∈~every N, we know N ⊆ X. Combining
that with X ⊆ Y, we conclude N ⊆ Y. But then Y ∈~every N. Thus
~every N is upward closed.
The upward closed nature of some king and every king has a helpful linguistic
consequence. Consider the following two verb phrases:
• owns a car
• owns a red car
Anyone who owns a red car owns a car. Thus the set of red car owners is a
subset of the set of car owners. That is, ~owns a red car↓ ⊆ ~owns a car↓ .
(We don’t actually have the tools yet to calculate ~owns a car from its compo-
nent parts, but just by understanding the language we can see that this subset
relation must be correct.)
Because some king is upward closed, if ~owns a red car↓ is in ~some king↓↓ ,
then ~owns a car↓ is in ~some king↓↓ . Therefore:
• Some king owns a red car logically implies Some king owns a car.
If Some king owns a red car is true, then Some king owns a car is
also true.
The same holds for every:
• Every king owns a red car logically implies Every king owns a car.
But this doesn’t work for all noun phrases. Consider:
1. no king
2. all but one king
3. most kings
4. many kings
5. few kings
6. only kings
7. exactly two kings
8. Finitely many kings
9. A prime number of kings
10. Usually kings
Which of the following determiners are left monotone up?
1. no
2. all but one
3. most
4. many
5. few
6. only
7. exactly two
8. Finitely many
9. A prime number of
10. Usually
Earlier we marked ~no king on our diagram. Looking back at the diagram,
we can immediately see that ~no king is not upward closed. ~No king con-
tains the sets ∅ and {d}, but fails to contain lots of larger sets of which these
two sets are subsets. And this isn’t an accidental feature of ~no king. Con-
sider a diagram marking all of ~no king (in blue), ~no linguist (in red), ~no
philosopher (in green), and ~no person (in purple):
[Diagram: the sixteen subsets of {a, b, c, d} again, with each of the four no collections marked in its color.]
Every one of these sets fails to be upward closed. No phrases are systematically
not upward closed. That’s not surprising given our diagnostic. The inference
from:
from:
• No N owns a red car
to:
• No N owns a car.
won’t be valid for any choice of N. There is always the possibility that some N
owns a car that is not red.
But there is another interesting pattern among the no phrases. All of them are
downward closed:
• Let X be of type ((e, t), t). X is downward closed if given any Y and Z of
type (e, t) such that Y↓ is a subset of Z↓ , if Z↓ ∈ X↓↓ , then Y↓ ∈ X↓↓ .
Roughly: an ((e, t), t) expression is downward closed just in case whenever it
maps a subset of e to >, it also maps any smaller subset of e to >.
Proof: Take some noun N, and let N be the set ~N↓ . Then ~no N
is the set of sets that are disjoint from N – whose intersection with
N is empty. Suppose X is in ~no N and Y is a subset of X. Then
X ∩ N = ∅. But since Y ⊆ X, we also have Y ∩ N = ∅. Thus Y is in
~no N, and ~no N is downward closed.
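This too can be checked mechanically, continuing the Python sketch above (reusing powerset, E, KING, and upward_closed; the names remain our own):

    def no(noun):
        """~no noun, viewed setwise: the VP-sets disjoint from the noun set."""
        return {vp for vp in powerset(E) if not (noun & vp)}

    def downward_closed(gq):
        """Whenever X is in gq and Y is a subset of X, Y is in gq too."""
        return all(Y in gq
                   for X in gq
                   for Y in powerset(E) if Y <= X)

    print(downward_closed(no(KING)))   # True
    print(upward_closed(no(KING)))     # False: {d} is in, {c, d} is not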
We can also apply the car//red car diagnostic to downward closed expressions,
but in a slightly different way. We’ve already noted that:
• No king owns a red car.
does not imply:
• No king owns a car.
The implication instead runs in the reverse direction, which suggests a general
diagnostic:
Diagnostic: A noun phrase NP of type ((e, t), t) is downward closed
if and only if given any two verb phrases V1 and V2 such that
everything that V1s also V2s, the sentence:
• NP V2
logically implies the sentence:
• NP V1
Problem 155: Use the diagnostic to determine whether each of the
following is downward closed:
1. Few kings
2. Both kings
3. At most seven kings
4. At least seven kings
5. Only kings
Problem 156: We can extend our diagnostic to impose a four-fold
distinction on determiners. We’ll run the diagnostic with car//red
car, although we could generalize it to any two expressions, one of
which is more restrictive than the other. Say that a determiner D is
left monotone up just in case D kings who own a red car laugh
implies D kings who own a car laugh, and left monotone down just
in case D kings who own a car laugh implies D kings who own a
red car laugh. Say that D is right monotone up just in case D kings
own a red car implies D kings own a car, and right monotone down
when the reverse implication holds. A determiner that is, for example,
left monotone down and right monotone up is written ↓ mon↑, and one
that is left monotone down and right monotone down is written ↓ mon↓.
Classify the determiners from the earlier list under this four-fold scheme.
Finally, consider exactly two kings. Again on the assumption that a, b, and c
are kings, we can mark off ~exactly two kings:
[Diagram: the sixteen subsets of {a, b, c, d}, arranged by size from {a, b, c, d} down to ∅, with ~exactly two kings marked.]
~Exactly two kings is not upward closed. It contains {a, b}, and {a, b} is a
subset of {a, b, c}, but it doesn’t contain {a, b, c}. Similarly, ~exactly two kings
is not downward closed. It contains {a, b}, and {a} is a subset of {a, b}, but it
doesn’t contain {a}. Exactly two kings is thus non-monotonic. It’s neither
right monotone up nor right monotone down.
However, although exactly two kings is non-monotonic, it’s still closely re-
lated to monotonic expressions. In particular, ~exactly two kings is the
intersection of an upward closed set and a downward closed set. Note that:
[Diagram: the sixteen subsets of {a, b, c, d} again, with the exactly two kings region marked in blue.]
The blue exactly two kings region contains all and only the subsets of e that
are in both of:
1. The upward closed collection {{a, b}, {a, c}, {b, c}, {a, b, c}, {a, b, d}, {a, c, d}, {b, c, d}, {a, b, c, d}} (the sets containing at least two kings)
2. The downward closed collection {∅, {a}, {b}, {c}, {d}, {a, b}, {a, c}, {b, c}, {a, d}, {b, d}, {c, d}, {a, b, d}, {a, c, d}, {b, c, d}} (the sets containing at most two kings)
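The intersection claim can be checked in the same mechanical style, continuing the sketch above (again reusing powerset, E, and KING; the helper names are ours):

    def exactly_two(noun):
        """~exactly two noun, setwise: VP-sets containing exactly two nouns."""
        return {vp for vp in powerset(E) if len(noun & vp) == 2}

    at_least_two = {vp for vp in powerset(E) if len(KING & vp) >= 2}  # upward closed
    at_most_two = {vp for vp in powerset(E) if len(KING & vp) <= 2}   # downward closed

    print(exactly_two(KING) == at_least_two & at_most_two)  # True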
[Coming soon.]
Earlier we analyzed definite descriptions like the winner of the race as type
e, with the definite article the of type ((e, t), e) so that it can combine with a
type (e, t) noun to pick out an object.
But this analysis has the disadvantage of treating definite descriptions quite
differently from other syntactically similar noun phrases. Compare:
• Every linguist: analyzed as ((e, t), t)
• Some linguist: analyzed as ((e, t), t)
Suppose that Usain Bolt is the winner of the race. What kind of property (that
is, what kind of (e, t) value) does laughed need to pick out in order for The
winner of the race laughed to be true? What’s needed is that laugh pick
out a property that Usain Bolt has. More generally, for:
• The F is G
to be true, we need is G to pick out a property had by whatever object is the
unique F object.
So we want the winner of the race to pick out the second order property:
• Being a property that Usain Bolt has.
and more generally, we want the F to pick out the second order property:
• Being a property that the unique object having the property F has.
Or, if we want to avoid using the word the in giving the semantic value of the,
we can rewrite this as:
• ~the N = λx(e, t) .(∃y(~N(y) = > ∧ ∀z(~N(z) = > → z = y) ∧ x(y) = >))
Problem 160: Should we care about whether we use the word the
in giving the semantic value of the word the? This isn’t a feature
we’ve avoided in other cases, as can be seen in values like:
• ~laughed = λx.x laughed
• ~linguist = λx.x is a linguist
Is there something helpful gained in the second clause above for
~the N that uses the logical quantifiers ∃ and ∀ rather than the
English the? If so, should we be trying to gain similar helpful
things in other cases by, for example, not using the word laughed
in giving the semantic value of laughed?
Problem 161: Return to the example of The bald man killed Belicoff
from section Y. Give another full computation of the semantic value
of this sentence and its parts using the new treatment of the as type
((e, t), ((e, t), t)).
We’ve now considered two analyses of the winner of the race: one on
which it is of type e and picks out a specific object, and one on which it is
of type ((e, t), t), and picks out a second-order property (roughly, the second-
order property of being a property had by the object that was the semantic value
on the type e analysis). In many cases these two analyses end up producing
all of the same results when the winner of the race is used in a sentence.
However, there are cases in which the two analyses diverge in their larger
predictions:
1. Suppose no one wins the race. (Perhaps no one even finishes the race, as
in the 2019 Barkley Marathons.) Then when the is treated as type ((e,
t), e), and thus the winner of the race is of type e, there is no object
for the winner of the race to pick out. Thus ~the winner of the
race is undefined, and sentences containing that phrase are semantically
defective. However, when the winner of the race is treated as type
((e, t), t), it is still defined.
2. Suppose more than one person wins the race. (There is a tie result, as in
the tie between Allyson Felix and Jeneba Tarmoh in the 100 meter dash at the
2012 Olympic qualification trials.) Then when the is treated as type ((e,
t), e), and thus the winner of the race is of type e, there is no unique object
for the winner of the race to pick out. Thus ~the winner of the
race is undefined, and sentences containing that phrase are semantically
defective. However, when the winner of the race is treated as type
((e, t), t), it is still defined.
The type e treatment of definite descriptions made them semantically defective
when the restricting noun wasn’t satisfied by exactly one object. What happens
with the type ((e, t), t) treatment in these cases?
Let’s work through a case carefully. Suppose type e contains three objects a, b,
and c, and that noun N is true of a and b, but not of c. What is ~the N? We
have:
• ~N = [a → >, b → >, c → ⊥]
Therefore:
• ~the N
• = ~the(~N)
• = λw(e, t) .λx(e, t) .(∃y(w(y) = > ∧ ∀z(w(z) = > → z = y) ∧ x(y) = >))([a → >, b → >, c → ⊥])
• = λx(e, t) .(∃y([a → >, b → >, c → ⊥](y) = > ∧ ∀z([a → >, b → >, c → ⊥](z) = > → z = y) ∧ x(y) = >))
But notice that ∀z([a → >, b → >, c → ⊥](z) = > → z = y) is equivalent to
∀z((z = a ∨ z = b) → z = y). The truth of that condition for any value of y
requires both that y = a and that y = b. We can’t have y identical both to a and
to b, so ∀z([a → >, b → >, c → ⊥](z) = > → z = y) is false for every value of y.
Since that universally quantified conjunct is false for every value of y, there is
no suitable witness for y, so the existential quantification ∃y(. . . ) comes out
false, no matter what x is.
As a result, ~the N maps any input (e, t) value to ⊥. That means ~the N is
the empty second-order property: the second-order property that is not had by
any first-order property. And as a result of that, we conclude that:
• The N VPs
is false for any verb phrase.
Notice the difference between the two accounts. Suppose there is more than
one winner of the race. Then:
1. When ~the is type ((e, t), e): The winner of the race laughed is se-
mantically defective: it does not have a semantic value and is neither true
nor false.
2. When ~the is type ((e, t), t): The winner of the race laughed is false.
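To see the two treatments come apart concretely, we can model both in the same style as before. A minimal sketch, with a deliberately two-winner domain; the names and the None-for-defective convention are our own choices:

    from itertools import combinations

    def powerset(domain):
        return [frozenset(c) for r in range(len(domain) + 1)
                for c in combinations(sorted(domain), r)]

    E = {'a', 'b', 'c'}

    def the_as_e(noun):
        """((e, t), e) treatment: defined only when exactly one thing is noun."""
        if len(noun) != 1:
            return None                  # semantically defective
        (unique,) = noun
        return unique

    def the_as_gq(noun):
        """((e, t), t) treatment: the VP-sets true of the unique noun, if any."""
        return {vp for vp in powerset(E)
                if len(noun) == 1 and next(iter(noun)) in vp}

    WINNER = frozenset({'a', 'b'})        # uniqueness fails: two winners
    LAUGHED = frozenset({'a', 'b'})

    print(the_as_e(WINNER))               # None: no semantic value at all
    print(LAUGHED in the_as_gq(WINNER))   # False: the sentence is simply false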
Problem 163: So far we have seen the ((e, t), e) and the ((e, t),
t) analyses diverge in cases in which the first analysis makes a
sentence semantically defective while the second analysis makes
that sentence false. Are there cases in which the first analysis makes
a sentence semantically defective while the second analysis makes
that sentence true? If so, give such a case. How plausible is the
result?
Problem 164: We’ve seen that the ((e, t), e) analysis of the builds in
assumptions of existence and uniqueness – if there is no N, or more than
one N, the phrase the N lacks a semantic value and is semantically
defective. The ((e, t), e) and the ((e, t), t) cases thus come apart when
existence or uniqueness fail. But there are other cases in which the
two analyses differ. Consider sentences such as:
1. Every philosopher admires the linguist he learned semantics
from.
2. No linguist agreed with the paper she read.
Explain why these definite descriptions cause problems for the ((e,
t), e) approach. We don’t yet have tools in place for giving adequate
analyses of such sentences using the ((e, t), t) approach either, but
see if you can say anything about how that approach might provide
profitable routes for development.
Treating definite descriptions as ((e, t), t) rather than ((e, t), e) thus allows us
to trade out semantic defects for simple falsehood. Whether that’s good or bad
depends on what we think about sentences with definite descriptions when the
description isn’t satisfied by a unique object.
Consider, for example:
1. The king of France is bald.
2. The first man on the moon was the king of France.
3. The king of France read Anna Karenina.
4. The king of France wrote Anna Karenina.
5. If France has a king, then the king of France probably
lives in Paris.
We can give the indefinite article a a parallel treatment:
• ~a N = λx(e, t) .(an object y such that ~N(y) = > is such that x(y) = >)
• ~a = λw(e, t) .λx(e, t) .(a y such that w(y) = > is such that x(y) = >), or
equivalently:
• ~a = λw(e, t) .λx(e, t) .(∃y(w(y) = > ∧ x(y) = >))
A linguist is thus semantically the same as some linguist – the two expres-
sions will be interchangeable in all contexts.
Problem: Consider the following sentences:
1. Aristotle became a philosopher.
2. Obama became the president.
Explain carefully why the ((e, t), t) semantic values for ~a and
~the create difficulties here. (You may need to say something
about the semantic value for ~became in order to tell the story in
detail.) Do the ((e, t), e) semantic values work any better? Is there
another semantic story for ~a philosopher and ~the president
that handles this data well?
In the previous section we showed that we could stop treating definite descrip-
tions as type e, and instead unify them with their grammatical relatives such
as every linguist and no philosopher under a general treatment as type ((e,
t), t). Once we do this, we’re left with only proper names in category e.
Proper names don’t share the same syntactic similarity that every linguist, no
philosopher, and the king share – proper names aren’t formed by attaching
a determiner such as every, some, or no to a noun phrase. But proper names
do have the same syntactic distribution that quantified noun phrases have –
anywhere a proper name occurs, a quantified noun phrase could be put in its
place, and everywhere that a quantified noun phrase occurs, a proper name
could be put in its place.
There are, however, wrinkles in this claim about distribution:
1. Quantified noun phrases can’t be combined with (additional)
determiners:
• #A thief stole three many paintings from the museum.
• #There are few the tallest boy in my class this year.
2. Some quantified noun phrases can be used in the restrictor of
another quantified noun phrase when put in the genitive/possessive:
• All of the linguists attended the lecture.
• Few of the students did well on the exam.
But proper names can’t be used in these same positions:
• #All of Chomsky attended the lecture.
• #Few of Aristotle did well on the exam.
3. Some quantified noun phrases can be combined with collective
verbs such as met and surrounded:
• Several students met in the park.
• The protestors surrounded the house.
But proper names can’t be used in these same positions:
• #Several Plato met in the park.
• #Socrates surrounded the house.
Not all of these data points are equally convincing. Which one do
you think presents the strongest challenge to the claim that proper
names and quantified noun phrases have the same syntactic distri-
bution? Are there any lessons to be learned about the semantics
either of proper names or of quantified noun phrases from that data
point?
A conjunction of two proper names can be handled by typing and as (e, (e, e)):
[Tree: and (type (e, (e, e))) combines with Plato (type e) to give and Plato (type (e, e)), which combines with another type e name to give a type e conjunction.]
And we can account for the conjunction of two quantified noun phrases as
follows:
[Tree: two quantified noun phrases of type ((e, t), t), such as some philosopher, are conjoined by and to give a type ((e, t), t) conjunction.]
But what do we do with the conjunction of a proper name and a noun phrase:
[Tree: Aristotle (type e) conjoined by and with some linguist (type ((e, t), t), built from some of type ((e, t), ((e, t), t)) and linguist of type (e, t)); no type is assigned to and.]
There are ways to make the typing work here, but they’re not theoretically
pretty. We have to give up the attractive idea that and is always of type (α, (α,
α)) for some type α.
Problem 170: Give two typings for and that allow the typing of
Aristotle and some linguist to work. What happens if we con-
sider some linguist and Aristotle instead?
Fortunately, it is possible to give proper names semantic type ((e, t), t) as well.
Suppose Aristotle laughed is true. Then we want to use as ~Aristotle a
second-order property that is possessed by the first-order property of laugh-
ing. Suppose Aristotle cried is false. Then we want to use as ~Aristotle
a second-order property that is not possessed by the first-order property of
crying.
What we want, then, is the second-order property being a property that Aristotle
has. In the lambda notation, we want:
• ~Aristotle = λx(e, t) .x(Aristotle)
We can check that this delivers the right results:
• ~Aristotle laughed = ~Aristotle(~laughed)
• = (λx.x(Aristotle))(λy.y laughed)
• = (λy.y laughed)(Aristotle)
• = Aristotle laughed
This idea generalizes. Take any proper name N. We want to distinguish the old
e type semantic value for N and the new ((e, t), t) type semantic value for N. Just
as a notational convenience, we’ll use ~Ne and ~N((e,t),t) to pick out these two
semantic values. Then we can give the following general rule for ~N((e,t),t) :
• ~N((e,t),t) = λx.x(~Ne )
Each proper name is thus semantically associated with the second-order prop-
erty of being a property had by the (ordinary, e-type) referent of the name.
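The rule ~N((e,t),t) = λx.x(~Ne ) is directly expressible once we model (e, t) values as predicates. A minimal Python sketch; the particular names and predicates are illustrative:

    def lift_name(referent):
        """Map a type e value to the ((e, t), t) value λx.x(referent)."""
        return lambda prop: prop(referent)

    # A toy (e, t) value for laughed:
    laughed = lambda y: y == 'Aristotle'

    aristotle_gq = lift_name('Aristotle')   # ~Aristotle of type ((e, t), t)
    print(aristotle_gq(laughed))            # True:  Aristotle laughed
    print(lift_name('Plato')(laughed))      # False: Plato laughed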
Problem 171: When proper names are treated as ((e, t), t) quantifiers,
are they upward closed, downward closed, or neither? Are they
positive strong, negative strong, or neither?
When names are of type ((e, t), t), we have an easy story about how names and
quantified noun phrases can link together via conjunction:
[Tree: Aristotle (type ((e, t), t)) combines with and some linguist (type (((e, t), t), ((e, t), t))), where and is of type (((e, t), t), (((e, t), t), ((e, t), t))) and some linguist (type ((e, t), t)) is built from some (type ((e, t), ((e, t), t))) and linguist (type (e, t)).]
What exactly this typing analysis predicts for the semantic value of these con-
junctions, though, depends on what we use as ~and – which particular member
of the rather complicated type (((e, t), t), (((e, t), t), ((e, t), t))) we assign to it. We’ll
defer that question until later when we take a more careful look at sentential
connectives.
Going forward, we’ll usually for simplicity continue to treat proper names as
type e, but we’ll keep available the tool of moving to quantified noun phrase
values of type ((e, t), t) as something to try when the type e approach is creating
problems.
AL Type Lifting
In the previous section we set out a general method for taking a semantic value
of type e and creating via it a new semantic value of type ((e, t), t). That general
method can be applied to starting points other than e. Suppose, for example,
that we have some expression E of type t. Type (t, t) can then be thought of as
the type of properties of truth values.
Problem 173: Type (t, t) has four members. Say what those four
members are. So when we think of (t, t) as being the type of proper-
ties of truth values, we end up with four properties of truth values.
What are the resulting four properties? Should there be more than
four properties of truth values?
Given that (t, t) is the type of properties of truth values, ((t, t), t) is the type of
properties of properties of truth values, or of second-order properties of truth values.
Just as we set ~Aristotle((e,t),t) to be the second-order property of being a
property had by ~Aristotlee , we can assign E a higher-typed property of
being a property had by the t-type value of E. Thus we have:
• ~E((t,t),t) = λx(t, t) .x(~Et )
If, for example, ~E = >, the ((t, t), t) value associated to E by this rule is (viewed
setwise) the set:
• { [> → >, ⊥ → >], [> → >, ⊥ → ⊥] }
This idea can be generalized to any expression E of any type α. The type (α,
t) is then the type of properties of α’s, and the type ((α, t), t) is the type of
second-order properties of α’s. If ~Eα is the original type-α semantic value of
E, we then define a new semantic value of E:
• ~E((α, t), t) = λx(α, t) .x(~Eα )
We’ll refer to this new semantic value for E as the type-lifted semantic value,
and will use ~E+ to refer to type-lifted values. Any expression can be given a
type-lifted value using this procedure.
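Because the definition never mentions what α is, a single higher-order function performs the lift for every type. A sketch, with examples of our own choosing:

    def type_lift(value):
        """~E+ = λx.x(~E), for a starting value of any type."""
        return lambda prop: prop(value)

    # Lifting a type t value, matching the ((t, t), t) example above:
    lifted_true = type_lift(True)
    print(lifted_true(lambda v: v))       # identity maps > to >: True
    print(lifted_true(lambda v: not v))   # negation maps > to ⊥: False

    # The very same function lifts a type e value:
    print(type_lift('Aristotle')(lambda y: y == 'Aristotle'))  # True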
Problem 176: Suppose that e contains the three objects a, b, and c
and that a and b are philosophers but c is not. Determine each of
the following:
1. ~philosopher+
2. ~some philosopher+
3. ~no philosopher+
Crash and lift allows the following story for Aristotle and some linguist.
We start with ~Aristotle of type e. That gives us:
[Tree: Aristotle (type e) sits where a type ((e, t), t) subject is needed, alongside and some linguist (type (((e, t), t), ((e, t), t))), with and of type (((e, t), t), (((e, t), t), ((e, t), t))) and some linguist (type ((e, t), t)) built from some (type ((e, t), ((e, t), t))) and linguist (type (e, t)).]
The type e ~Aristotle then won’t functionally combine with the type (((e, t),
t), ((e, t), t)) ~and some linguist, so we type-lift ~Aristotle:
[Tree: the same structure, but with Aristotle now assigned the lifted type ((e, t), t).]
The type-lifted value ~Aristotle+ will combine with ~and some linguist,
so now we get a successful computation of ~Aristotle and some linguist.
Crash and lift lets us combine the simplicity of the lower-typed semantic values,
such as simple objects as semantic values of proper names, with the increased
functional flexibility of higher-typed values like ((e, t), t) by bringing the higher-
typed values into play only as a repair strategy when the lower-typed values
won’t work out.
The semantic machinery we have been developing is centered around the idea
that sentences are of semantic type t, and thus have truth values as their se-
mantic values. There is then a puzzle about how to fit into this framework a
sentence such as:
• I am Greek.
This sentence is true when uttered by Aristotle, but false when uttered by
Abraham Lincoln. So what should we use as ~I am Greek? Neither > nor ⊥
seems right – > isn’t faithful to the use of the sentence by Lincoln, and ⊥ isn’t
faithful to the use of the sentence by Aristotle.
Problem 177: What is wrong with the following proposal for the
semantic value of I am Greek:
We should have:
• ~I am Greek=I am Greek
just as we have:
• ~Aristotle is Greek=Aristotle is Greek.
• ~Lincoln is Greek=Lincoln is Greek.
And it’s easy enough to see how we’ll get this result.
We’ll just have ~I=I, in the same way that we have
~Aristotle=Aristotle and ~Lincoln=Lincoln.
There are various problems that can be raised for the proposal.
Consider, for example, the question of who is making the proposal.
The basic problem is clear: the sentence I am Greek isn’t the kind of sentence
that gets a truth value. That’s because the sentence contains the word I, and
the word I picks out different people depending on who is speaking. When
Aristotle is speaking, I picks out Aristotle. Aristotle’s utterance of I am Greek
thus says the same thing as an utterance of Aristotle is Greek, and hence is
true. But when Lincoln is speaking, I picks out Lincoln. Lincoln’s utterance
of I am Greek thus says the same thing as an utterance of Lincoln is Greek,
and hence is false.
So there’s no such thing as what is said by the sentence I am Greek. That’s why
we can’t assign a truth value to the sentence – it’s the kind of sentence that says
different things as used by different speakers.
I isn’t the only word that creates this effect of varying in semantic value from
use to use. Consider other similar words such as you, now, here, and today.
Now picks out different times with different uses. That’s why we can’t hope
to have a semantic theory simply assign a truth value to a sentence such as
Chomsky is laughing now – that sentence can be true as used at one time and
false as used at another time. In this way now is like I. But there’s also an
important difference between now and I:
1. To determine the semantic value of a use of now, we need to know what
time the use occurred.
2. To determine the semantic value of a use of I, we need to know who the
speaker was for that use.
To handle all of these different use-variable words, let’s introduce the idea of
a context. Informally, a context is a situation in which a sentence is used.
But formally it will be easier to treat contexts as ordered sequences of bits of
information about the situation of use that are then useful in interpreting use-
variable words. In particular, we will treat a context as an ordered quadruple:
• ⟨Speaker, Audience, Time, Place⟩
When Aristotle speaks to Plato in 350 B.C. in Athens, he speaks in the context
⟨Aristotle, Plato, 350 B.C., Athens⟩. When Lincoln speaks to Hannibal Hamlin
in 1862 in the White House, he speaks in the context ⟨Lincoln, Hamlin, 1862,
White House⟩. The words I, you, now, and here can then have different semantic
values in these different contexts.
To make use of contexts, we will relativize semantic values to a context.
So instead of introducing a single semantic value ~I into our theory, we will
introduce many semantic values for I – one for each context. For any context
c, ~Ic is the semantic value of I in, or relative to, the context c. Thus:
• ~I⟨Aristotle, Plato, 350 B.C., Athens⟩ = Aristotle
• ~I⟨Lincoln, Hamlin, 1862, White House⟩ = Lincoln
It’s not just the semantic value of I that we want relativized to a context.
To capture our starting observation that I am Greek is sometimes true (for
example, when spoken by Aristotle) and sometimes false (when spoken by
Lincoln), we want the semantic value of the entire sentence to be relativized to
context, so that we can have:
• ~I am Greek⟨Aristotle, Plato, 350 B.C., Athens⟩ = >
• ~I am Greek⟨Lincoln, Hamlin, 1862, White House⟩ = ⊥
How are such relativized values to be computed? Our old rule was:
• ~I am Greek = ~am Greek(~I)
But this won’t work any more. It doesn’t give us a context-relativized semantic
value for I am Greek, and it appeals to an unrelativized semantic value for I
that we’re not trying to provide.
We’ve already seen that ~I⟨Aristotle, Plato, 350 B.C., Athens⟩ = Aristotle, but
what should we make of ~am Greek⟨Aristotle, Plato, 350 B.C., Athens⟩? For
simplicity, let’s assume that the semantic value ~am Greek that we would
have used (before turning our eye to matters of context-sensitivity) in analyzing
~Aristotle is Greek is the function [Aristotle → >, Lincoln → ⊥]. What function
should we then use for the semantic value of am Greek when relativized to a
particular context, such as the context of Aristotle speaking to Plato in Athens
in 350 B.C.? The natural answer is: the same function, whatever the context,
since am Greek isn’t sensitive to who is speaking:
• ~am Greek⟨Aristotle, Plato, 350 B.C., Athens⟩ = ~am Greek⟨Lincoln, Hamlin, 1862, White House⟩ = ~am Greekc = [Aristotle → >, Lincoln → ⊥], for any context c.
We can thus distinguish between context sensitive and context insensitive ex-
pressions. An expression E is context sensitive if there are any two contexts
c1 and c2 such that ~Ec1 ≠ ~Ec2 . Otherwise, E is context insensitive. Context
insensitive expressions thus have the same semantic value relative to every
context. In effect, the relativization of semantic values to contexts makes no
difference for context insensitive expressions, and is done just to give a uni-
form presentation to the semantic machinery that allows us to use relativized
functional application throughout.
Problem 179: Calculate in detail ~Aristotle is Greek⟨Aristotle, Plato, 350 B.C., Athens⟩.
Then calculate in detail ~Aristotle is Greek⟨Lincoln, Hamlin, 1862, White House⟩,
and compare the results. Will we have ~Aristotle is Greekc1 = ~Aristotle
is Greekc2 for any two contexts c1 and c2 ?
Now we can give general rules for some context-sensitive terms. I, for example,
always picks out the speaker in the context. Given the way we have defined
contexts as ordered quadruples, that means I always picks out the first member
of the quadruple that is the context. That is:
• For any context c, ~Ic = c(1)
(where c(j) for any j picks out the jth element of c, if it has at least j elements).
Similarly we can say:
• ~youc = c(2)
• ~nowc = c(3)
• ~herec = c(4)
Let’s consider a test case for these semantic values. Consider an utterance of I
admire you, made by Aristotle to Plato in 350 B.C. in Athens. We are thus inter-
ested in determining ~I admire you⟨Aristotle, Plato, 350 B.C., Athens⟩. For
convenience, let’s use c as a name for the context ⟨Aristotle, Plato, 350 B.C., Athens⟩.
Then we have:
• ~I admire youc
• = ~admire youc (~Ic )
• = ~admire youc (c(1))
• = ~admire youc (Aristotle)
• = (~admirec (~youc ))(Aristotle)
• = (~admirec (c(2)))(Aristotle)
• = (~admirec (Plato))(Aristotle)
• = ((λx.λy.y admires x)(Plato))(Aristotle)
• = (λy.y admires Plato)(Aristotle)
• = Aristotle admires Plato
We thus get the (desirable) result that an utterance of I admire you in context c,
spoken by Aristotle addressing Plato, is equivalent to an utterance of Aristotle
admires Plato.
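The calculation can be mirrored directly, with contexts as ordered tuples and c(j) as 1-indexed access. A Python sketch; the admiring facts are stipulated for illustration:

    # A context is ⟨Speaker, Audience, Time, Place⟩.
    c = ('Aristotle', 'Plato', '350 B.C.', 'Athens')

    def sem_I(ctx):       # ~I relative to ctx: the first element, c(1)
        return ctx[0]

    def sem_you(ctx):     # ~you relative to ctx: the second element, c(2)
        return ctx[1]

    ADMIRES = {('Aristotle', 'Plato')}   # pairs (admirer, admired)

    def i_admire_you(ctx):
        """~I admire you relative to ctx, by relativized application."""
        return (sem_I(ctx), sem_you(ctx)) in ADMIRES

    print(i_admire_you(c))  # True: equivalent to Aristotle admires Plato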
Problem 181: Our semantic rules for I, you, now, and here tell
us how the meanings of these words are related to the context in
which they are uttered. But we haven’t yet said anything about how
the context, when considered as an ordered quadruple ⟨α, β, γ, δ⟩ is
related to the speech situation in which an utterance is produced.
It’s tempting to think that contexts are connected to utterances in
the following way:
Utterances to Contexts: Utterance u is made in (and
should be evaluated relative to) context c = ⟨α, β, γ, δ⟩,
where:
1. α is the speaker in u
2. β is the audience in u
3. γ is the time of production of u
4. δ is the place of production of u
Consider each of the following problem cases for the Utterances to
Contexts thesis. In each case, explain why it is a problem (be specific
about which component of Utterances to Contexts is challenged by
the case), and consider what might be done in response to that
problem. (Should Utterances to Contexts be modified in some way
to handle the problem case? Does the problem case show that the
underlying idea of Utterances to Contexts needs to be given up
altogether?)
1. Aristotle, speaking to Plato and Socrates, says I admire you.
2. Professor X, leaving his office, puts a note on the door saying
I am not here now -- be back soon.
3. The waiter arrives with everyone’s orders, and James says I am
the ham sandwich. Later, leaving the restaurant James says I
am parked down the alley.
4. Questioned about how unusual it is to have a South American
pope, Francis says Usually I’m Italian.
5. Members of Jefferson Smith’s re-election campaign design and
arrange for billboards with a picture of Smith and a cap-
tion below reading I promise to fight against graft in
Congress.
6. Socrates says Only I know myself to be ignorant.
7. Sarah, answering the phone, says Oh, I thought you were
my mother calling.
As one possible (but not the only) test, you might consider whether
there is any change of meaning when the pronoun is replaced by an
appropriate proper name (for example, the name of the speaker).
3. In a context c = ⟨α, β, γ, δ⟩, ~todayc = the period from the
midnight, according to the time zone of δ, prior to γ, to the
midnight, according to the time zone of δ, after γ.
4. Add a fifth element ε to contexts, so that a context c takes the
form ⟨α, β, γ, δ, ε⟩, and then have ~todayc = c(5) = ε.
How might we decide among these proposed semantic values?
Some cases that might be useful in thinking about the decision
process:
1. If we were on Venus, today would last another 116 days.
2. If we were on Venus, the Superbowl would be played today.
(uttered on January 1)
3. If the earth rotated faster, today would already be over.
4. I’ll get the proposal to you later today (spoken by some-
one on one side of the International Date Line to someone on
the other side of the International Date Line.)
(None of these cases is meant to be uncontroversial in its interpre-
tation and evaluation.)
Today and this day don’t behave exactly the same. To see this,
compare:
• This day is Christmas (said pointing to December 25 on a
calendar)
• Today is Christmas (said pointing to December 25 on a cal-
endar)
What lessons for the possible semantic value of to- should be drawn
from this contrast?
We have shifted from our original framework, in which a single function ~·
assigned each expression a semantic value outright, to a relativized framework
in which we had a function ~·c that assigns each expression a semantic value
relative to each context. But it’s not inevitable that we make this shift and start
relativizing semantic values. Instead, we can treat contexts as a new basic type
c, and give I the unrelativized semantic value λc.c(1) – a function from contexts
to objects, of type (c, e). Similarly, you, now, and here can be given unrelativized
semantic values:
• ~you = λc.c(2)
• ~now = λc.c(3)
• ~here = λc.c(4)
But of course it’s not just single words such as I and you that need to be
context-sensitive. In the relativizing approach, a sentence such as I laughed
gets a truth value only relative to a context. So if we are going to treat I as hav-
ing an unrelativized semantic value of type (c, e), we need to treat I laughed
as having an unrelativized semantic value of type (c, t).
We could then complicate semantic values in the same way throughout our
entire system. Intransitive verbs, rather than being simply (e, t), would be (c,
(e, t)). Transitive verbs would shift from (e, (e, t)) to (c, (e, (e, t))). Quantified
noun phrases would shift from ((e, t), t) to (c, ((e, t), t)). And so on.
But there is another complication that comes with this approach. The compli-
cated version of ~laughed, for example, is going to be:
• ~laughed=λc.λx.x laughed
But now consider the calculation of ~I laughed:
• ~I laughed
• = ~laughed(~I)
• = (λc.λx.x laughed)(λc.c(1))
But here the computation crashes. λc.λx.x laughed requires a context (a member
of c) as input, but what it’s getting as an input is λc.c(1), which is not a context.
The source of the crash is clear in the typing. Laughed is type (c, (e, t)) and I
is type (c, e). But these two types won’t functionally combine – each of them
takes type c as input, and neither is type c, so neither can take the other as input.
We can fix this by changing to a fancier version of functional application. The
key thought is this: before we started adding the new semantic type c to our
system, we would have two expressions of types α and (α, β), and we would
combine them using functional application. After we add type c to the system,
these two expressions become types (c, α) and (c, (α, β)). So we now need to
be able to combine two expressions of type (c, α) and (c, (α, β)). To do this, we
want to take an arbitrary context c, calculate the α value of the first expression
applied to c, calculate the (α, β) value of the second expression applied to c, and
then apply the resulting (α, β) value to the resulting α value. Then we want to
generalize that whole procedure for all choices of c.
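That whole procedure is itself a single higher-order function. A minimal sketch of c-functional application under these assumptions; the name c_apply is ours:

    def c_apply(f_char, a_char):
        """Combine (c, (α, β)) with (c, α): the character λc.f_char(c)(a_char(c))."""
        return lambda ctx: f_char(ctx)(a_char(ctx))

    # ~I of type (c, e), and a toy stand-in for ~laughed = λc.λx.x laughed:
    sem_I = lambda ctx: ctx[0]
    sem_laughed = lambda ctx: (lambda x: x == 'Aristotle')   # context-insensitive

    sem_I_laughed = c_apply(sem_laughed, sem_I)              # type (c, t)
    print(sem_I_laughed(('Aristotle', 'Plato', '350 B.C.', 'Athens')))   # True
    print(sem_I_laughed(('Lincoln', 'Hamlin', '1862', 'White House')))   # False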
Problem 184: Suppose we have two contexts c1 and c2 , and ~I = [c1 →
Aristotle, c2 → Plato]. Let f be the function [Aristotle → >, Plato → ⊥],
and let ~laughed = [c1 → f, c2 → f]. (We thus represent laughed as
context-insensitive.) Use c-functional application to calculate ~I laughed.
Semantic values of this complicated sort – values that are functions from con-
texts to our earlier simpler semantic values – are called characters. The character
of I is thus a function from contexts to the ordinary referent of I in contexts. The
ordinary referent of I varies from context to context – in one context, I picks
out Aristotle, and in another context, I picks out Plato. Thus if we want ordi-
nary referents as semantic values for I, semantic values need to be relativized
to contexts. But the character of I doesn’t vary from context to context. No
matter what context we are in, the character of I is the function from contexts
to speakers in (that is, first elements of) those contexts.
What we have seen, then, is that we can avoid relativization of semantic values
to contexts by running our entire system at the level of characters. There are
two prices that we pay when we do this. First, as we’ve already seen, we have
to shift from simple functional application to the somewhat more complicated
c-functional application. Second, because everything is done at the level of
characters, we never actually assign truth values to any sentences. When we
work with relativized semantic values, we do end up saying things like:
• ~Aristotle admires Platoc = >, for every context c.
But when we work with characters, we instead say:
• ~Aristotle admires Plato = λc.>
That is, Aristotle admires Plato is assigned the character that maps every
context to the true – but isn’t directly assigned a truth value.
Here is one more option for incorporating context sensitivity into our semantic
machinery. We can build context sensitivity into our starting semantic types.
So:
• Instead of type e being the type of objects, type e – or what we might
call type e∗ to avoid confusion – is the type of functions from contexts to
objects. Type e∗ is thus the type of object characters.
• Instead of type t being the type of truth values, type t – or what we might
call type t∗ to avoid confusion – is the type of functions from contexts to
truth values. Type t∗ is thus the type of truth value characters.
So far this is just a notational variant on our previous approach. On the pre-
vious approach, expressions that we originally had put in category e were in
category (c, e), and expressions that we had originally put in category t were
in category (c, t). But e∗ just is (c, e) – both are the collection of functions from
contexts to objects. And t∗ just is (c, t) – both are the collection of functions from
contexts to truth values.
But from here, the two approaches diverge. Once we set e∗ and t∗ as the basic
categories, we can (as in our original system) treat intransitive verbs as type
(e∗ , t∗ ). That’s unlike the previous option, which treated intransitive verbs as
type (c, (e, t)). The previous option, that is, assigned to intransitive verbs (e,
t) characters. But that’s not what we’re now doing – (e∗ , t∗ ) isn’t a character,
because it’s not a function from a context to anything.
We’ve now seen three different ways to add context-sensitivity to our semantic
theory. All three start with the use of contexts, represented as ordered quadru-
ples of speaker, audience, time, and location. But the three then make use of
contexts in different ways:
1. Relativized Semantic Values: Relativize all semantic values to contexts,
so that instead of using ~I or ~You laughed, we use ~Ic or ~You
laughedc .
2. Character Semantic Values: Use unrelativized semantic values, but change
the type of all expressions to add a context argument, so that where we
formerly used type (e, t) we now use type (c, (e, t)) and where we
formerly used type (t, t) we now use type (c, (t, t)).
3. Setwise Semantic Values: Use unrelativized semantic values, but change
the basic types from e (objects) and t (truth values) to e∗ (functions from
contexts to objects) and t∗ (functions from contexts to truth values). Then
build up semantic values of other expressions in the familiar way from
this different starting point, so that where we formerly used type (e, t) we
now use type (e∗ , t∗ ) and where we formerly used type (t, t) we now use
type (t∗ , t∗ ).
An overview comparison of the features of the three approaches:
[Comparison table omitted.]
Problem: Using character semantic values adds a context argument
to the type of every expression, whereas using setwise semantic
values replaces only the basic types
e and t with functions from contexts to old-style semantic values,
and then builds up complex types from there. We might then ex-
pect a difference in how well the two approaches deal with context-
sensitivity in expressions that are not (on the old approach) e type
or t type.
Suppose, then, that Sam’s mouth curvature is 60, and that c1 (5) = 50
and c2 (5) = 80. We then have:
• ~Sam smilesc1 = >
• ~Sam smilesc2 = ⊥
That’s the result when we implement context-sensitivity by rela-
tivizing semantic values to contexts. Let’s now see how things
work out on the other two approaches.
1. Suppose we implement context-sensitivity by using character
semantic values. Then smiles is type (c, (e, t)). Give an ap-
propriate lambda expression of this type for ~smiles, and
calculate the resulting ~Sam smiles. Is the result plausible?
2. Suppose we implement context-sensitivity by using setwise se-
mantic values. Then smiles is type (e∗ , t∗ ). Give an appropriate
lambda expression of this type for ~smiles, and calculate the
resulting ~Sam smiles. Is the result plausible?
What do we learn from all of this about the comparative abilities
of the character semantic value approach and the setwise semantic
value approach to deal with context-sensitive expressions across a
wider range of the language?
AO A Problem About Every Linguist
We now have an account that gives us a satisfactory typing for Every linguist
admires Aristotle. But consider instead Aristotle admires every linguist:
[Tree: Aristotle (type e) with admires every linguist (type ?), where admires is of type (e, (e, t)) and every linguist (type ((e, t), t)) is built from every (type ((e, t), ((e, t), t))) and linguist (type (e, t)).]
If we type each of Aristotle, admires, every, and linguist as before, we
encounter an immediate problem. The type (e, (e, t)) admires cannot combine
with the type ((e, t), t) every linguist. Admires takes type e as input, and
every linguist is not of type e, so every linguist can’t serve as input to
admires. And every linguist takes type (e, t) as input, and admires is not of
type (e, t), so admires can’t serve as input to every linguist. But those are
the only options for functional application, so the semantic composition crashes.
First Attempt: Suppose we keep Aristotle as type e and admires as type (e, (e,
t)), and ask what type every linguist would have to be for the whole sentence
to come out type t:
[Tree: Aristotle (type e) combines with admires every linguist to give type t, where admires is of type (e, (e, t)) and the type of admires every linguist is to be determined.]
Admires every linguist then needs to be type (e, t) in order to combine with
type e Aristotle. That gives us two options:
1. Every linguist is the input to admires, and thus is type e.
2. Admires is the input to every linguist, so every linguist takes an (e,
(e, t)) input and produces an (e, t) output. Thus every linguist is type
((e, (e, t)), (e, t)).
But we’ve already tried and abandoned the first approach. So instead we’ll
make every linguist be of type ((e, (e, t)), (e, t)). Given that linguist is type
(e, t), we then need every to be type ((e, t), ((e, (e, t)), (e, t))):
[Tree: Aristotle (type e) combines with admires every linguist (type (e, t)), where admires is of type (e, (e, t)) and every linguist (type ((e, (e, t)), (e, t))) is built from every (type ((e, t), ((e, (e, t)), (e, t)))) and linguist (type (e, t)).]
That makes the typing work out. The only typing change that we have made
is for every, so we now need a suitable specific semantic value for every. We
first note that we need ~every linguist to be:
• ~every linguist = λx.λy.{z : z is a linguist} ⊆ {z : x(z)(y) = >}
To achieve this effect, we need ~every to be:
• ~every = λw.λx.λy.{z : w(z) = >} ⊆ {z : x(z)(y) = >}
This semantic value for every gives us a workable theory that produces the
right result for Aristotle admires every linguist. But we’ve had to pay a
significant price for the theory – we now have two words every in the language,
one of type ((e, t), ((e, t), t)) and one of type ((e, t), ((e, (e, t)), (e, t))). And in fact
two versions of every won’t be enough:
1. If ditransitive verbs like give are of type (e, (e, (e, t))), then neither of
the above typings for every will allow every linguist to combine with
give to form give every linguist.
2. If prepositions like under are of type (e, ((e, t), (e, t))), then neither of the
above typings for every will allow every tree to combine with under to
form under every tree.
Problem 188: Give semantic types for two more versions of every
– one that will combine with ditransitive verbs and one that will
combine with prepositions. Then give a lambda expression for an
appropriate semantic value for each of those two versions of every.
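The object-position value ~every = λw.λx.λy.{z : w(z) = >} ⊆ {z : x(z)(y) = >} can also be modeled in Python. A sketch; the domain and admiring facts are stipulations of our own:

    LINGUISTS = {'Chomsky', 'Partee'}
    ADMIRES = {('Aristotle', 'Chomsky'), ('Aristotle', 'Partee')}

    def admires(z):
        """Type (e, (e, t)): admires(z)(y) is true iff y admires z."""
        return lambda y: (y, z) in ADMIRES

    def every_obj(noun):
        """every of type ((e, t), ((e, (e, t)), (e, t)))."""
        return lambda verb: lambda y: all(verb(z)(y) for z in noun)

    admires_every_linguist = every_obj(LINGUISTS)(admires)   # type (e, t)
    print(admires_every_linguist('Aristotle'))   # True: he admires both linguists
    print(admires_every_linguist('Chomsky'))     # False: Chomsky admires no one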
Second Attempt: Perhaps the difficulty here is that we haven’t fully integrated
an idea we discussed earlier. Let’s put together two thoughts:
1. Transitive verbs are type (e, (e, t)) in order to explain how a transitive
verb can combine with two names (as in Aristotle admires Plato),
given that the names are type e.
2. Once we’ve introduced the generalized quantifier framework, we can
treat names as special cases of generalized quantifiers, and thus give
names semantic values of type ((e, t), t).
If we change our treatment of names to make them ((e, t), t), perhaps we should
then change our treatment of verbs so that they take ((e, t), t) as input. We
would then have:
[Tree: Aristotle (type ((e, t), t)) combines with admires every linguist to give type t, where every linguist (type ((e, t), t)) is built from every (type ((e, t), ((e, t), t))) and linguist (type (e, t)), and admires now takes ((e, t), t) inputs.]
The basic idea is straightforward. We typed generalized quantifiers as ((e, t), t)
because we noticed that names like Aristotle and quantified noun phrases like
no linguist interacted differently with verbs. We could capture that difference
by sometimes using the verb as a function taking the subject as input (as with
Aristotle) and sometimes using the subject as a function taking the verb as
input (as with no linguist). But once we type-lift names to ((e, t), t), we’re no
longer trying to combine verbs sometimes with e and sometimes with ((e, t),
t), so we no longer need both modes of combination. Once we no longer need
both modes, we could set things up so that the verb is always taking the subject
as input, by changing verb inputs from e to ((e, t), t).
Saying that intransitive verbs are type (((e, t), t), t) and transitive verbs are type
(((e, t), t), (((e, t), t), t)) settles the typing, but it doesn’t address the question
of what the particular semantic values of verbs should be. Consider first the
intransitive case. When laughs is (e, t), we have:
• ~laughs = λx.x laughs
What should the corresponding (((e, t), t), t) value be?
1. First Option: One possibility is to stick as close as possible to our previous
strategy. We can’t use exactly the same semantic value as before, because
that value was (e, t). But maybe we can just change the typing of the
variable:
• ~laughs = λx((e, t), t) .x laughs
It’s a subtle question whether this semantic value is even sensible. Here’s
a worry. Suppose for simplicity that there is only one object a in type e.
Then (e, t) contains the two functions [a → >] and [a → ⊥]. A
sample member of ((e, t), t) is then:
• [[a → >] → ⊥, [a → ⊥] → >]
But if that’s a member of type ((e, t), t), we should be able to provide it as
input to our proposed type (((e, t), t), t) semantic value for laughs:
• (λx((e, t), t) .x laughs)([[a → >] → ⊥, [a → ⊥] → >])
• = [[a → >] → ⊥, [a → ⊥] → >] laughs
But that’s a curious final result. The function [[a → >] → ⊥, [a → ⊥] → >]
surely doesn’t laugh. Since x((e, t), t) is always going to be a function,
the proposed semantic value is always going to make sentences about
laughing depend on whether some function laughs. All such sentences
will be false, which isn’t right.
It’s not clear that this worry is decisive. After all, the original motivation
for the ((e, t), t) treatment of quantified noun phrases was that those noun
phrases had meanings that could take the meaning of intransitive verbs
like laugh as an input. So maybe we should just insist on this view in
the use of laugh in the lambda term specification of the type (((e, t), t), t)
semantic value for laugh. In that case, we can just interpret “x laughs”
in that term as the application of x to the (e, t) value of “laugh”. That’s
not entirely happy, since it involves thinking of laugh as type (e, t) within
the lambda term, and then using that typing of laugh in order to specify
a more complex semantic value to type laugh outside the lambda term
as (((e, t), t), t). But it’s perhaps workable. Fortunately, there’s another
alternative that avoids the whole issue.
2. Second Option: We can avoid the typing worries we were just consider-
ing by continuing to assume that “laughs”, as used in the specification of
lambda terms, allows us to make proper sense of λx.x laughs as an (e, t)
term, and then making use of that term to specify a (((e, t), t), t) term. We
thus have:
• ~laughs = λx((e, t), t) .x(λy.y laughs)
Here there is no typing worry. The variable x is, by stipulation, type ((e,
t), t), and we already know that λy.y laughs is type (e, t). Thus x can take
λy.y laughs as input, and it will produce type t as output. Thus the entire
lambda term for ~laughs is a function that takes type ((e, t), t) as input
and produces type t as output. That’s a function of type (((e, t), t), t), as
desired.
Problem 192: Suppose again that Aristotle and Plato are the only
members of e, and that Aristotle laughs and Plato does not laugh.
Using the specification ~laughs = λx((e, t), t) .x(λy.y laughs), work
out in full detail what ~laughs is. Then specify ~Aristotle, with
Aristotle treated as type ((e, t), t). What do these two semantic
values then predict as the truth value of Aristotle laughs?
Then specify ~Plato (again treated as ((e, t), t), and determine
the predicted truth value of Plato laughs. Finally, assume both
Aristotle and Plato are philosophers. Specify ~some philosopher.
(If you want, you can (i) specify ~philosopher as type (((e, t), t), t),
rather than as type (e, t), and (ii) give an appropriate semantic value
for some treating it as type ((((e, t), t), t), ((e, t), t)), and then use these
two semantic values to calculate ~some philosopher. But you can
also just directly specify ~some philosopher if you prefer.) What
truth value is then predicted for Some philosopher laughs.
• Fullness: Every (e, t) function is a possible semantic value of
an intransitive verb.
If we shift to treating intransitive verbs as type (((e, t), t), t), there is
an obvious analog Fullness∗ of Fullness:
• Fullness∗ : Every (((e, t), t), t) function is a possible semantic
value of an intransitive verb.
Consider the implications of Fullness∗ . Show that Fullness∗ com-
mits us to the possibility of adding to English some intransitive verb
(let’s call it gimble) such that both of the following sentences are true:
1. Some badger gimbles.
2. No badger gimbles.
(Be as thorough as possible in explaining this, detailing exactly what
semantic value gimble will have to produce these truth values.) To
what extent does this consequence generalize? Should we on this
basis reject Fullness∗ ? If we do reject Fullness∗ , what (if anything)
does this tell us about the plausibility of the (((e, t), t), t) treatment
of intransitive verbs?
Next let’s attempt to generalize. What type should the proposed
operator LIFT be? Propose a lambda expression as the specific
semantic value for ~LIFT. Then test your value by giving a simple
model with two objects Chomsky and Russell (where Chomsky is
a linguist and Russell is not), giving a (e, t) semantic value for
~laughs, and working out the truth value of both of:
1. Some linguist laughs
2. Russell laughs
in your model, on the assumption that these sentences have the
structures:
1. [some linguist] [LIFT laughs]
2. [Russell] [LIFT laughs]
Problem 195: Now let’s put all of the pieces together. First, give a
semantic value for admires. You can either:
1. Directly give a suitable (((e, t), t), (((e, t), t), t)) value for admires.
2. Give a traditional (e, (e, t)) value for admires, and then give
a generalized version of the LIFT operator from the previous
problem that will transform that (e, (e, t)) value into a suitable
(((e, t), t), (((e, t), t), t)) value.
Then give ((e, t), t) values for Aristotle and every linguist in the
usual way. Finally, calculate the full semantic value for Aristotle
admires every linguist via functional application, and check to
see that the resulting truth conditions are plausible.
By treating names and quantified noun phrases both as type ((e, t), t) and
by lifting verbs (and other associated expressions like adverbs, although we
haven’t focused on that aspect of things) to higher types such as (((e, t), t), t)
and (((e, t), t), (((e, t), t), t)) (replacing e with ((e, t), t) throughout), we can get a
workable typing for a sentence like Aristotle admires every linguist with
a quantified noun phrase in object position. And we’ve made some preliminary
stabs at picking out appropriate semantic values within those verb types. For
an intransitive verb like laughs, we’ve considered two approaches:
1. Directly specify ~laughs via the condition ~laughs = λx((e, t), t) .x(λy.y laughs)
2. Start with the unlifted (e, t) semantic value for laughs, and then apply a
LIFT operator of type ((e, t), (((e, t), t), t)) to produce a type (((e, t), t), t)
value for LIFT laughs.
But it’s not hard to see that the first approach goes wrong when we turn to
transitive verbs. The obvious generalization of the first approach would be
something like:
• ~admires = λx((e, t), t) .λy((e, t), t) .y(x(λu.λv.v admires u))
But now the typing does not work out. λu.λv.v admires u is type (e, (e, t)), and
x is type ((e, t), t). But x then cannot take λu.λv.v admires u as input. We can try
rearranging what is used as input to what, but it won’t help. A little inspection
shows that we’re back at our original problem – we can’t successfully combine
types ((e, t), t) and (e, (e, t)) in any order.
The typing will work once we get admires up to type (((e, t), t), (((e, t), t), t)) –
the puzzle is just how to get this higher type. Perhaps, then, a suitable LIFT
operator will do the job.
Consider first the details of LIFT for intransitive verbs. Suppose type e contains
three objects a, b, and c, and suppose ~laughs = [a → >, b → >, c → ⊥]. Then
LIFT laughs should pick out a collection of ((e, t), t) values appropriately
corresponding to ~laughs. That is, ~LIFT laughs should pick out a set of
subsets of the diagram:
[Diagram: the eight subsets of {a, b, c}, arranged by size from {a, b, c} at the top down to ∅ at the bottom.]
But which subsets of the diagram should be included in ~LIFT laughs?
So that we can consider specific sentences, let’s assume that Albert is a name
for a, Beatrice is a name for b, and Clarissa is a name for c, and that the
common noun linguist is true of all of a, b, and c. Then some observations:
1. Albert laughs should be true, so ~Albert = {{a}, {a, b}, {a, c}, {a, b, c}}
should be in ~LIFT laughs.
2. Beatrice laughs should be true, so ~Beatrice = {{b}, {a, b}, {b, c}, {a, b, c}}
should be in ~LIFT laughs.
3. Clarissa laughs should be false, so ~Clarissa = {{c}, {a, c}, {b, c}, {a, b, c}}
should not be in ~LIFT laughs.
4. Some linguist laughs should be true, so ~some linguist = {{a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}
should be in ~LIFT laughs.
5. No linguist laughs should be false, so ~no linguist = {∅} should not
be in ~LIFT laughs.
6. Every linguist laughs should be false, so ~every linguist = {{a, b, c}}
should not be in ~LIFT laughs.
7. Most linguists laugh should be true, so ~most linguists = {{a, b}, {a, c}, {b, c}, {a, b, c}}
should be in ~LIFT laughs.
8. At most one linguist laughs should be false, so ~at most one linguist
= {∅, {a}, {b}, {c}} should not be in ~LIFT laughs.
Looking for patterns, one striking observation here is that:
• The set of laughers – {a, b}, or ~laughs↓ – is a member of the semantic
value of every quantified noun phrase that should be in ~LIFT laughs
and is not a member of the semantic value of any quantified noun phrase
that should not be in ~LIFT laughs.
This observation suggests a hypothesis:
• ~LIFT laughs↓ = {X : X is a set of subsets of {a, b, c} and {a, b} ∈ X}
Or abstracting from the particulars of what individuals there are and which
individuals laugh:
• ~LIFT laughs↓ = {X : X is a set of subsets of e and ~laughs↓ ∈ X}
And from this we can extract a rule for ~LIFT laughs itself, rather than its
corresponding set:
• ~LIFT laughs = λx((e, t), t) .x(λy.y laughs)
Finally, from there we can give the semantic value of LIFT itself:
• ~LIFT = λy(e, t) .λx((e, t), t) .x(y)
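~LIFT is itself a one-line higher-order function. A sketch wiring it to the toy model just described; the subject values and the set of laughers are our stipulations:

    def LIFT(verb):
        """((e, t), (((e, t), t), t)): hand the (e, t) verb to the subject."""
        return lambda gq: gq(verb)

    laughs = lambda y: y in {'a', 'b'}     # ~laughs, with a and b the laughers

    # ((e, t), t) subjects, modeled as functions on predicates:
    albert = lambda p: p('a')
    some_linguist = lambda p: any(p(z) for z in {'a', 'b', 'c'})
    every_linguist = lambda p: all(p(z) for z in {'a', 'b', 'c'})

    lifted = LIFT(laughs)                  # type (((e, t), t), t)
    print(lifted(albert))           # True:  Albert laughs
    print(lifted(some_linguist))    # True:  Some linguist laughs
    print(lifted(every_linguist))   # False: c does not laugh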
AQ Quantifiers and Scope
We’ve now seen that by treating names and quantified noun phrases both as
type ((e, t), t) and by lifting verbs (and other associated expressions like adverbs,
although we haven’t focused on that aspect of things) to higher types such as
(((e, t), t), t) and (((e, t), t), (((e, t), t), t)) (replacing e with ((e, t), t) throughout),
we can get a workable typing for a sentence like Aristotle admires every
linguist with a quantified noun phrase in object position, and with a bit more
work we can even get plausible truth conditions for such a sentence.
But sentences with two quantified noun phrases raise a new issue. Every
philosopher admires some linguist has two readings: a ∀∃ reading, on
which each philosopher admires some linguist or other (possibly different
linguists for different philosophers), and a ∃∀ reading, on which there is a
single linguist whom every philosopher admires. We can bring out the difference
between these two readings using diagrams. Consider the following situation S1 :
[Diagram: philosophers Aristotle, Plato, and Socrates on the left; linguists Chomsky, Partee, and Kratzer on the right; arrows run from Aristotle to Partee, from Plato to Chomsky, and from Socrates to Kratzer.]
This situation has three philosophers (Aristotle, Plato, and Socrates) on the
left and three linguists (Chomsky, Partee, and Kratzer) on the right. The ar-
rows indicate the admiring relation, so Aristotle admires Partee, Plato admires
Chomsky, and Socrates admires Kratzer. This situation then makes true the ∀∃
reading of Every philosopher admires some linguist, since each of Aristo-
tle, Plato, and Socrates has a linguist they admire. But it does not make true
the ∃∀ reading of Every philosopher admires some linguist, since there is
no one linguist that is admired by every philosopher.
[Diagram: a second situation, in which the admiring arrows all point to a single linguist, making the ∃∀ reading (and hence also the ∀∃ reading) true.]
One natural first thought is that readings simply track word order.
This view would let us say that the scope reading of a sentence is
fully determined by the linear order of the quantified noun phrases
in the sentence. Thus:
1. In Every philosopher admires some linguist, the every noun
phrase precedes the some noun phrase, so the sentence has ∀∃
truth conditions.
2. In Some linguist is admired by every philosopher, the some
noun phrase precedes the every noun phrase, so the sentence
has ∃∀ truth conditions.
But consider the sentence:
• Someone is killed in St Louis every 48 hours.
This sentence naturally receives a ∀∃ interpretation. (Does it also
allow a ∃∀ interpretation? If so, why is the ∀∃ interpretation fa-
vored?) But the linear order of its quantified noun phrases has some
before every, so the linear order of the quantifiers can’t always de-
termine the scoping.
Give three more examples of sentences whose natural reading has a
scoping that does not match the linear order of the quantified noun
phrases in the sentence. (Try to make your examples as different as
possible from the St Louis example.) Can you give an example of
a sentence with three quantified noun phrases that is at least three
ways ambiguous due to multiple scope options?
The fact that the ∃∀ reading implies the ∀∃ reading can create a
worry about whether there are really two distinct readings of the
sentence. Perhaps there is only the ∀∃ reading, and what we have
been identifying as the ∃∀ reading is just a specific way of mak-
ing that ∀∃ reading true. (Compare the sentence The linguists
fought the philosophers. One way this sentence can be
made true is via a one-to-one pairing that has each linguist fight-
ing a unique philosopher and each philosopher fought by a unique
linguist. But it’s not obvious that we want to say that there is a
special one-to-one reading of the sentence, rather than just one kind
of scenario among many that satisfies the general requirement that
there be a lot of fighting between linguists and philosophers.)
To address this worry, we’ll show that with quantifiers other than
some and every, we can get two different scope readings such that
neither is equivalent to the other. For each of the following sen-
tences, give two scenarios, one of which makes one scoping true
and the other false and one of which makes the other scoping true
and the first scoping false.
1. Most philosophers admire most linguists.
2. At least three linguists admire at least two philosophers.
3. Few philosophers admire no philosophers.
4. All but one linguist admires all but one philosopher.
The problem is that once we fix a single syntactic tree:
[Tree: every philosopher combines with admires some linguist.]
and then assign semantic values to each of the leaves of the tree, it’s completely
settled by the rule of Functional Application what semantic values will be
assigned to all of the higher nodes of the tree, including the root node for the
entire sentence. There’s thus no opportunity to produce more than one reading
of the sentence.
[Skipped Stuff. Moving on to Variable-Relativized Semantics]
177
2. σ(1) ≠ Aristotle
3. σ(1) = σ(3), σ(2) = σ(4), but σ(1) ≠ σ(2)
4. σ(i) = σ(j) for all i, j.
5. σ(i) > i + 1 for all i
Problem 202: If there is only one object in type e, how many different
variable assignments are there? What if there are two objects in type
e?
So we need to rework our basic tools to make semantic values relative to variable
assignments. We’ll use two central ideas to do the reworking:
1. Assignment Insensitivity: Expressions that don’t contain variables won’t
be sensitive to the choice of variable assignment. That is, they will have
the same semantic value relative to every variable assignment. Given that
we previously said:
• ~laugh = λx.x laughs
and given that laugh does not contain any variables, we’ll now say:
• For any variable assignment σ, ~laughσ = λx.x laughs
Because σ isn’t mentioned anywhere in the expression λx.x laughs, it
doesn’t matter which variable assignment we pick. So suppose:
(a) σ1 = ⟨Aristotle, Socrates, Plato, . . . ⟩
(b) σ2 = ⟨Paris, London, Berlin, . . . ⟩
Then ~laughσ1 =~laughσ2 = λx.x laughs.
Now that we are relativizing semantic values to variable assignments,
there really isn’t any such thing as simple ~laugh any more. But since
~laughσ is the same no matter what σ is, we’ll nevertheless still some-
times just refer to ~laugh, where this is just shorthand for ‘~laughσ , but
we don’t care what σ is.’
2. Assignment Sensitivity: Variables are sensitive to the choice of assign-
ment function, in a very simple way. Given assignment function σ and
variable xi , we have:
• ~xi σ = σ(i)
So with σ1 and σ2 as before, we have:
• ~x1 σ1 = σ1 (1) = Aristotle
• ~x1 σ2 = σ2 (1) = Paris
• ~x2 σ1 = σ1 (2) = Socrates
• ~x2 σ2 = σ2 (2) = London
From this starting point, we can proceed to use functional application as before.
So consider x1 laughs. We have:
• ~x1 laughsσ
• = ~laughsσ (~x1 σ )
• = λx.x laughs(σ(1))
• = σ(1) laughs
Suppose that e contains only Aristotle, Plato, and Socrates, and that ~laughs
is the function:
• ~laughs =
    Aristotle → >
    Plato → ⊥
    Socrates → >
Now consider three variable assignments:
1. σ1 = ⟨Aristotle, Plato, Socrates, . . . ⟩
2. σ2 = ⟨Plato, Socrates, Aristotle, . . . ⟩
3. σ3 = ⟨Socrates, Plato, Aristotle, . . . ⟩
Then we obtain:
1. ~x1 laughsσ1 = σ1 (1) laughs = Aristotle laughs = >
2. ~x1 laughsσ2 = σ2 (1) laughs = Plato laughs = ⊥
3. ~x1 laughsσ3 = σ3 (1) laughs = Socrates laughs = >
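Because the domain here is tiny, the whole calculation can be checked mechanically. Here is a minimal executable sketch in Haskell – an illustration of mine, not part of the text's formal system, with invented names – treating a variable assignment as a function from positions to objects:

    data Entity = Aristotle | Plato | Socrates deriving (Eq, Show)

    type Assignment = Int -> Entity

    -- [[laughs]], per the table above
    laughs :: Entity -> Bool
    laughs Aristotle = True
    laughs Plato     = False
    laughs Socrates  = True

    -- [[x1 laughs]]^sigma = [[laughs]]([[x1]]^sigma) = [[laughs]](sigma 1)
    x1Laughs :: Assignment -> Bool
    x1Laughs sigma = laughs (sigma 1)

    -- sigma1, sigma2, sigma3 as in the text (positions beyond 3 just cycle)
    sigma1, sigma2, sigma3 :: Assignment
    sigma1 i = [Aristotle, Plato, Socrates] !! ((i - 1) `mod` 3)
    sigma2 i = [Plato, Socrates, Aristotle] !! ((i - 1) `mod` 3)
    sigma3 i = [Socrates, Plato, Aristotle] !! ((i - 1) `mod` 3)

    main :: IO ()
    main = print (map x1Laughs [sigma1, sigma2, sigma3])
    -- [True,False,True]: Aristotle laughs, Plato doesn't, Socrates does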
So we have:
• ~x1 admires x2 σ
• = (~admiresσ (~x2 σ ))(~x1 σ )
AT Variable Binding: The Single-Variable Case
Unfortunately, this creates a problem for our plan to deal with quantifier noun
phrases by using movement-altered trees. Consider a simple example such
as Some linguist laughs. We’re now considering the following movement-
altered tree for that sentence:
[Tree: [((e, t), t) [((e, t), ((e, t), t)) some1] [(e, t) linguist]] [t [e x1] [(e, t) laughs]]]
Unfortunately, the ((e, t), t) value provided by some linguist won’t func-
tionally combine with the t value provided by x1 laughs. Our second-order
property treatment of quantified noun phrases was built on the assumption
that these noun phrases would then combine with verb phrases of type (e, t),
but we’ve now given up that assumption by restructuring the syntactic trees.
We’re going to fix this by adding one more hidden bit of syntax. This last bit of
hidden syntax will convert the type t value of x1 laughs back into a type (e, t)
value suitable for combining with the type ((e, t), t) some linguist:
[Tree: [t [((e, t), t) [((e, t), ((e, t), t)) some1] [(e, t) linguist]] [(e, t) λ1 [t [e x1] [(e, t) laughs]]]]]
Now we need to explain how λ1 works. Here’s the basic idea. x1 laughs is of
type t, relative to a variable assignment σ. The job of the variable assignment
is, in this case, to provide a value to the variable x1 . Once that value has been
provided, we can work out the simple truth value of x1 laughs. So we have
the makings of a function from objects to truth values. Given any particular
object, we report back the truth value of x1 laughs when x1 is assigned to that
object. A bit more carefully, we have:
• λ1 applied to ~x1 laughsσ yields the function that maps any given object
o to the truth value of x1 laughs relative to the variable assignment σ′
which is just like σ except that it contains o in the first position instead of
whatever object σ contains in its first position.
Let’s go through this carefully. Suppose that the laughter facts are as we
considered earlier:
Aristotle → >
• ~laughs = → ⊥
Plato
Socrates → >
Suppose we have:
• σ = ⟨Aristotle, Plato, Socrates, . . . ⟩
We now apply λ1 to ~x1 laughsσ . We get a function that maps each object to
some truth value. To build that function, let’s go through the objects:
1. First, Aristotle. We make a new variable assignment σ′ that is just like σ
except that it puts Aristotle in the first position. (In this case σ already has
Aristotle in the first position, so σ′ = σ.) We then check ~x1 laughsσ′ . This is:
• ~laughsσ′ (~x1 σ′ )
• = λx.x laughs(σ′ (1))
• = λx.x laughs(Aristotle)
• = Aristotle laughs
• =>
Thus our new function maps Aristotle to >.
2. Second, Plato. We make a new variable assignment σ′ that is just like σ
except that it puts Plato in the first position. Thus:
• σ′ = ⟨Plato, Plato, Socrates, . . . ⟩
We then check ~x1 laughsσ′ . This is:
• ~laughsσ′ (~x1 σ′ )
• = λx.x laughs(σ′ (1))
• = λx.x laughs(Plato)
• = Plato laughs
• =⊥
Thus our new function maps Plato to ⊥.
3. Third, Socrates. We make a new variable assignment σ′ that is just like σ
except that it puts Socrates in the first position. Thus:
• σ′ = ⟨Socrates, Plato, Socrates, . . . ⟩
We then check ~x1 laughsσ′ . This is:
• ~laughsσ′ (~x1 σ′ )
• = λx.x laughs(σ′ (1))
• = λx.x laughs(Socrates)
• = Socrates laughs
• =>
Thus our new function maps Socrates to >.
Thus applying λ1 to ~x1 laughsσ produces a function that maps Aristotle and
Socrates to > and Plato to ⊥. That is, it produces the function:
• Aristotle → >
  Plato → ⊥
  Socrates → >
That function is, of course, exactly the function we started with as ~laughs.
Our very complicated bit of formal machinery has just enabled us to re-extract
that function from the variable-assignment relativized t value assigned higher
in the tree. That’s a lot of work, but it does make things function properly. The
application of λ1 transforms the t value of x1 laughs into an (e, t) value, and
now that (e, t) value can be the input to the ((e, t), t) some linguist, and we’ll
get the right result.
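The λ1 rule just used can also be given a small executable sketch (again mine, with invented names, assuming the same toy domain): abstraction over position 1 is just 'overwrite position 1, then evaluate'.

    data Entity = Aristotle | Plato | Socrates deriving (Eq, Show)
    type Assignment = Int -> Entity

    laughs :: Entity -> Bool
    laughs Aristotle = True
    laughs Plato     = False
    laughs Socrates  = True

    x1Laughs :: Assignment -> Bool
    x1Laughs sigma = laughs (sigma 1)

    -- sigma', the assignment just like sigma except with o in position n
    update :: Assignment -> Int -> Entity -> Assignment
    update sigma n o i = if i == n then o else sigma i

    -- lambda_n: turns an assignment-relative t value into an (e, t) value
    abstractN :: Int -> (Assignment -> Bool) -> Assignment -> (Entity -> Bool)
    abstractN n denot sigma = \o -> denot (update sigma n o)

    sigma :: Assignment
    sigma i = [Aristotle, Plato, Socrates] !! ((i - 1) `mod` 3)

    main :: IO ()
    main = print (map (abstractN 1 x1Laughs sigma) [Aristotle, Plato, Socrates])
    -- [True,False,True]: lambda_1 re-extracts [[laughs]]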
AU Variable Binding: The Multiple-Variable Case
We’ve just shown that applying λ1 to ~x1 laughsσ produces the function:
• Aristotle → >
  Plato → ⊥
  Socrates → >
We can now return to the movement-altered tree for Some linguist admires
every philosopher:
[Tree: [some1 linguist [λ1 [every2 philosopher [λ2 [x1 [admires x2]]]]]]]
In particular, we’ll focus here just on the subtree:
[Tree: [λ2 [x1 [admires x2]]]]
Answer: Good question. Roughly, the answer is that we’re prepar-
ing to join x1 admires x2 with every philosopher, and (as the ’2’
subscript on every indicates) every philosopher is meant to con-
nect to the variable x2 rather than the variable x1 . But to get a more
complete answer, we’ll need to see how the details work out.
To do this carefully, we’ll need a specific admires function to work with. Con-
tinuing with e containing Aristotle, Plato, and Socrates, we’ll use:
• ~admires =
    Aristotle → (Aristotle → ⊥, Plato → ⊥, Socrates → >)
    Plato → (Aristotle → >, Plato → >, Socrates → ⊥)
    Socrates → (Aristotle → >, Plato → ⊥, Socrates → >)
(Since ~admires = λx.λy.y admires x, the outer input is the person admired,
and the inner function says who admires that person.)
(Here σ1 , σ2 , and σ3 are variable assignments with σ1 (2) = Aristotle, σ2 (2) =
Plato, and σ3 (2) = Socrates.)
1. ~admires x2 σ1 = ~admiresσ1 (~x2 σ1 ) = (λx.λy.y admires x)(σ1 (2)) = λy.y
admires Aristotle. λy.y admires Aristotle is then the function (Aristotle → ⊥,
Plato → ⊥, Socrates → >).
2. ~admires x2 σ2 = ~admiresσ2 (~x2 σ2 ) = (λx.λy.y admires x)(σ2 (2)) = λy.y
admires Plato. λy.y admires Plato is then the function (Aristotle → >, Plato
→ >, Socrates → ⊥).
3. ~admires x2 σ3 = ~admiresσ3 (~x2 σ3 ) = (λx.λy.y admires x)(σ3 (2)) = λy.y
admires Socrates. λy.y admires Socrates is then the function (Aristotle → >,
Plato → ⊥, Socrates → >).
So we need to know what our variable assignment has in its first position. We
didn’t specify earlier, so now we’ll split each of our three variable assignments
into three subcases:
(σ₁¹, σ₂¹, and σ₃¹ are the versions of σ1 with Aristotle, Plato, and Socrates,
respectively, in the first position; similarly for σ2 and σ3.)
1. ~x1 admires x2 σ₁¹ = ~admires x2 σ₁¹ (~x1 σ₁¹ ) = (λy.y admires Aristotle)(σ₁¹ (1))
2. ~x1 admires x2 σ₂¹ = ~admires x2 σ₂¹ (~x1 σ₂¹ ) = (λy.y admires Aristotle)(σ₂¹ (1))
3. ~x1 admires x2 σ₃¹ = ~admires x2 σ₃¹ (~x1 σ₃¹ ) = (λy.y admires Aristotle)(σ₃¹ (1))
4. ~x1 admires x2 σ₁² = ~admires x2 σ₁² (~x1 σ₁² ) = (λy.y admires Plato)(σ₁² (1))
5. ~x1 admires x2 σ₂² = ~admires x2 σ₂² (~x1 σ₂² ) = (λy.y admires Plato)(σ₂² (1))
6. ~x1 admires x2 σ₃² = ~admires x2 σ₃² (~x1 σ₃² ) = (λy.y admires Plato)(σ₃² (1))
7. ~x1 admires x2 σ₁³ = ~admires x2 σ₁³ (~x1 σ₁³ ) = (λy.y admires Socrates)(σ₁³ (1))
8. ~x1 admires x2 σ₂³ = ~admires x2 σ₂³ (~x1 σ₂³ ) = (λy.y admires Socrates)(σ₂³ (1))
9. ~x1 admires x2 σ₃³ = ~admires x2 σ₃³ (~x1 σ₃³ ) = (λy.y admires Socrates)(σ₃³ (1))
1. First, let's apply λ2 to ~x1 admires x2 σ₁¹ . When we apply λ2 to ~x1
admires x2 σ₁¹ , we get the function that assigns to each object o the truth
value of x1 admires x2 relative to the variable assignment (σ₁¹)′ that is just
like σ₁¹ except that it contains o in its second position (that is, (σ₁¹)′(2) = o).
There are three choices for o: Aristotle, Plato, and Socrates. Consider each
in turn:
(a) When o is Aristotle, (σ₁¹)′ is the variable assignment that is just like
σ₁¹ except that (σ₁¹)′(2) = Aristotle. In fact, σ₁¹(2) is already Aristotle, so
(σ₁¹)′ = σ₁¹. From above, we see that ~x1 admires x2 σ₁¹ = ⊥, so our
new function maps Aristotle to ⊥.
(b) When o is Plato, (σ₁¹)′ is the variable assignment that is just like σ₁¹
except that (σ₁¹)′(2) = Plato. Thus (σ₁¹)′ = σ₁². From above, we see that
~x1 admires x2 σ₁² = >, so our new function maps Plato to >.
(c) When o is Socrates, (σ₁¹)′ is the variable assignment that is just like σ₁¹
except that (σ₁¹)′(2) = Socrates. Thus (σ₁¹)′ = σ₁³. From above, we see
that ~x1 admires x2 σ₁³ = >, so our new function maps Socrates to >.
Applying λ2 to ~x1 admires x2 σ₁¹ thus produces a function that maps
Aristotle to ⊥ and Plato and Socrates to >: the function (Aristotle → ⊥,
Plato → >, Socrates → >).
Notice that this function is (e, t), so once again the application of λ has
transformed a t value into an (e, t) value, which is what we want to make
the typing work out. Looking back at our original function assigned as
~admires, we see that this new (e, t) function we’ve obtained is just
λx.Aristotle admires x.
This function is not any of the three subfunctions that are produced as
output by applying ~admires to one of Aristotle, Plato, or Socrates.
Rather, it’s the function that can be found ‘hiding inside’ ~admires by
taking the first row of each of the three subfunctions.
2. Now let’s try applying λ2 to ~x1 admires x2 σ2 . When we apply λ2 to ~x1
1
admires x2 σ2 , we get the function that assigns to each object o the truth
1
(a) When o is Aristotle, (σ12 )0 is the variable assignment that is just like
σ12 except that (σ12 )0 (2)=Aristotle. Thus (σ12 )0 = σ11 . From above, we
see that ~x1 admires x2 σ1 = ⊥, so our new function maps Aristotle
1
to ⊥.
187
(b) When o is Plato, (σ₁²)′ is the variable assignment that is just like σ₁²
except that (σ₁²)′(2) = Plato. Thus (σ₁²)′ is just σ₁². From above, we see
that ~x1 admires x2 σ₁² = >, so our new function maps Plato to >.
(c) When o is Socrates, (σ₁²)′ is the variable assignment that is just like σ₁²
except that (σ₁²)′(2) = Socrates. Thus (σ₁²)′ = σ₁³. From above, we see
that ~x1 admires x2 σ₁³ = >, so our new function maps Socrates to >.
Applying λ2 to ~x1 admires x2 σ₁² thus produces a function that maps
Aristotle to ⊥ and Plato and Socrates to >, which is the function (Aristotle
→ ⊥, Plato → >, Socrates → >).
That's the same function that we got when we applied λ2 to ~x1 admires
x2 σ₁¹ . Once again, the function we've obtained is just λx.Aristotle admires x.
3. Let’s do one more, this time applying λ2 to ~x1 admires x2 σ3 . When
1
to ⊥.
(b) When o is Plato, (σ13 )0 is the variable assignment that is just like σ13
except that (σ13 )0 (2)=Plato. Thus (σ13 )0 = σ12 . From above, we see that
~x1 admires x2 σ2 = >, so our new function maps Plato to >.
1
(c) When o is Socrates, (σ13 )0 is the variable assignment that is just like σ13
except that (σ13 )0 (2)=Socrates. Thus (σ31 )0 is just σ13 . From above, we
see that ~x1 admires x2 σ3 = >, so our new function maps Socrates
1
to >.
Applying λ2 to ~x1 admires x2 σ3 thus produces a function that maps
1
Aristotle → ⊥
Aristotle to ⊥ and Plato and Socrates to >, which is the function
Plato → . > .
Socrates → >
We’ve gotten the same function again – this is the function that we got
when we applied λ2 to ~x1 admires x2 σ1 and to ~x1 admires x2 σ2 .
1 1
So λ2 produces the same result when applied to any of ~x1 admires x2 σ₁¹ , ~x1
admires x2 σ₁² , or ~x1 admires x2 σ₁³ . That's because σ₁¹, σ₁², and σ₁³ differ only
in what object they contain in their second position. But what λ2 does is allow
us to vary the object in the second position, so that it no longer matters what
object the original sequence put in that position.
When we apply λ2 to any of ~x1 admires x2 σ₁¹ , ~x1 admires x2 σ₁² , or ~x1
admires x2 σ₁³ , then, we get the same (e, t) function, λx.Aristotle admires x.
Let's now calculate the result of applying λ2 to ~x1 admires x2 σ₂¹ . When we
apply λ2 to ~x1 admires x2 σ₂¹ , we get the function that assigns to each object
o the truth value of x1 admires x2 relative to the variable assignment (σ₂¹)′ that
is just like σ₂¹ except that it contains o in its second position (that is, (σ₂¹)′(2) = o).
Once again we have three choices for o: Aristotle, Plato, and Socrates. Consid-
ering each in turn:
1. When o is Aristotle, (σ₂¹)′ is the variable assignment that is just like σ₂¹
except that (σ₂¹)′(2) = Aristotle. Thus (σ₂¹)′ is just σ₂¹ again. From above, we
see that ~x1 admires x2 σ₂¹ = ⊥, so our new function maps Aristotle to ⊥.
2. When o is Plato, (σ₂¹)′ is the variable assignment that is just like σ₂¹ except
that (σ₂¹)′(2) = Plato. Thus (σ₂¹)′ = σ₂². From above, we see that ~x1 admires
x2 σ₂² = >, so our new function maps Plato to >.
3. When o is Socrates, (σ₂¹)′ is the variable assignment that is just like σ₂¹
except that (σ₂¹)′(2) = Socrates. Thus (σ₂¹)′ is just σ₂³. From above, we see
that ~x1 admires x2 σ₂³ = ⊥, so our new function maps Socrates to ⊥.
What we end up with, then, is a function that maps Aristotle and Socrates to ⊥
and Plato to >, or (Aristotle → ⊥, Plato → >, Socrates → ⊥). With a little
inspecting of our original input-output chart for ~admires, we can recognize
this as λx.Plato admires x.
Summary: We won’t go through all the details for the remaining 5 variable
assignments. Here’s the final result:
1. When we apply λ2 to ~x1 admires x2 relative to any of σ₁¹, σ₁², or σ₁³ –
that is, relative to any variable assignment that has Aristotle in the first
position – we get the (e, t) function λx.Aristotle admires x.
2. When we apply λ2 to ~x1 admires x2 relative to any of σ₂¹, σ₂², or σ₂³ – that
is, relative to any variable assignment that has Plato in the first position –
we get the (e, t) function λx.Plato admires x.
3. When we apply λ2 to ~x1 admires x2 relative to any of σ₃¹, σ₃², or σ₃³ –
that is, relative to any variable assignment that has Socrates in the first
position – we get the (e, t) function λx.Socrates admires x.
So unlike applying λ1 to ~x1 laughs, the result of the λ application does de-
pend on what the relativizing variable assignment is. It just doesn’t depend too
much on what the relativizing variable assignment is. We get different outputs
from the λ application depending on what the relativizing variable assignment
has in its first position and assigns to x1 . Other than that, nothing about the
variable assignment matters. (In particular, what the variable assignment has
in its second position and assigns to x2 doesn’t matter to the outcome of the λ
application.)
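As a hedged sanity check on this claim, here is a Haskell sketch (names and encoding are mine) that computes the λ2 abstraction from every starting assignment ⟨a, b, . . . ⟩ and shows that the resulting (e, t) graph depends only on a:

    data Entity = Aristotle | Plato | Socrates deriving (Eq, Show)
    type Assignment = Int -> Entity

    -- (admirer, admired) pairs, read off the [[admires]] table above
    admiresPairs :: [(Entity, Entity)]
    admiresPairs = [ (Socrates, Aristotle)
                   , (Aristotle, Plato), (Plato, Plato)
                   , (Aristotle, Socrates), (Socrates, Socrates) ]

    x1AdmiresX2 :: Assignment -> Bool
    x1AdmiresX2 sigma = (sigma 1, sigma 2) `elem` admiresPairs

    update :: Assignment -> Int -> Entity -> Assignment
    update sigma n o i = if i == n then o else sigma i

    abstractN :: Int -> (Assignment -> Bool) -> Assignment -> (Entity -> Bool)
    abstractN n denot sigma = \o -> denot (update sigma n o)

    entities :: [Entity]
    entities = [Aristotle, Plato, Socrates]

    -- the graph of the (e, t) function lambda_2 yields from assignment <a, b, ...>
    graphAt :: Entity -> Entity -> [Bool]
    graphAt a b = map (abstractN 2 x1AdmiresX2 sigma) entities
      where sigma i = if i == 1 then a else b

    main :: IO ()
    main = mapM_ print [ ((a, b), graphAt a b) | a <- entities, b <- entities ]
    -- for fixed a, the graph is the same for every b: only position 1 matters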
At long last, we have all the pieces in place to work carefully through:
1. Tree ∃∀:
[Tree: [t [((e, t), t) some1 linguist] [(e, t) λ1 [t [((e, t), t) every2 philosopher] [(e, t) λ2 [t [e x1] [(e, t) admires x2]]]]]]]
2. Tree ∀∃:
[Tree: [t [((e, t), t) every2 philosopher] [(e, t) λ2 [t [((e, t), t) some1 linguist] [(e, t) λ1 [t [e x1] [(e, t) admires x2]]]]]]]
Next we want a scenario in which to evaluate these two sentences. To make
things easy, we’ll consider a scenario with just two linguists (Chomsky and
Partee) and just two philosophers (Aristotle and Plato). We can then draw a
picture of the scenario:
[Diagram: the linguists Chomsky and Partee on the left, the philosophers Aristotle and Plato on the right, with arrows Chomsky → Plato and Partee → Aristotle]
The arrows represent the admiring relation, so Chomsky admires Plato (but
not Aristotle) and Partee admires Aristotle (but not Plato). We can thus give
the input-output table for the ~admires function:
• ~admires =
    Chomsky → (Chomsky → ⊥, Partee → ⊥, Aristotle → ⊥, Plato → ⊥)
    Partee → (Chomsky → ⊥, Partee → ⊥, Aristotle → ⊥, Plato → ⊥)
    Aristotle → (Chomsky → ⊥, Partee → >, Aristotle → ⊥, Plato → ⊥)
    Plato → (Chomsky → >, Partee → ⊥, Aristotle → ⊥, Plato → ⊥)
(As before, the outer input is the person admired: nobody admires Chomsky
or Partee, Partee admires Aristotle, and Chomsky admires Plato.)
Finally, because we have just the four individuals Chomsky, Partee, Aristotle,
and Plato involved, the variable assignments we need to consider only involve
those four individuals. Because our sentence involves only the variables x1 and
x2 , we only care about the first two positions of the variable assignment. That
means that there are 16 variable assignments that matter. For convenience, we’ll
just refer to these as σ(X, Y), where X and Y are the two objects in the first and
second positions of the variable assignment. Thus, for example, σ(Chomsky,
Partee) is the variable assignment with Chomsky in the first position and Partee
in the second position.
Now we have all the pieces in place and we can start calculating final semantic
values for Tree ∃∀ and Tree ∀∃. The first step is the same for both of them:
we need to work out the semantic value of x1 admires x2 relative to different
variable assignments. Looking at the scenario we’ve set up, we can quickly see:
• ~x1 admires x2 is > relative to σ(Chomsky,Plato) and σ(Partee,Aristotle).
• ~x1 admires x2 is ⊥ relative to all the 14 other variable assignments.
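A quick mechanical check, as a hedged sketch (the encoding is mine): enumerating the sixteen σ(X, Y) pairs confirms that x1 admires x2 comes out > at exactly the two assignments just listed.

    data Entity = Chomsky | Partee | Aristotle | Plato deriving (Eq, Show)

    everyone :: [Entity]
    everyone = [Chomsky, Partee, Aristotle, Plato]

    -- the scenario: Chomsky admires Plato, Partee admires Aristotle
    admires :: Entity -> Entity -> Bool
    admires Chomsky Plato    = True
    admires Partee Aristotle = True
    admires _ _              = False

    -- [[x1 admires x2]] at sigma(X, Y): only the first two positions matter
    main :: IO ()
    main = print [ (x, y) | x <- everyone, y <- everyone, admires x y ]
    -- [(Chomsky,Plato),(Partee,Aristotle)]: true at 2 of the 16 assignments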
After this, things diverge between Tree ∃∀ and Tree ∀∃. So we’ll work through
the remainder of each tree separately.
Tree ∃∀:
1. The next step is to apply λ2 to ~x1 admires x2 . As we’ve seen above,
this produces an (e, t) function that depends only on the first object in the
variable assignment sequence. We get:
(a) Whenever σ has Chomsky in the first position (that is, we have
σ(Chomsky,·)), the application of λ2 to ~x1 admires x2 σ produces
the (e, t) function λx.Chomsky admires x.
(b) Whenever σ has Partee in the first position, (that is, we have σ(Partee,·))
the application of λ2 to ~x1 admires x2 σ produces the (e, t) function
λx.Partee admires x.
(c) Whenever σ has Aristotle in the first position, (that is, we have
σ(Aristotle,·)) the application of λ2 to ~x1 admires x2 σ produces
the (e, t) function λx.Aristotle admires x.
(d) Whenever σ has Plato in the first position (that is, we have σ(Plato,·)),
the application of λ2 to ~x1 admires x2 σ produces the (e, t) function
λx.Plato admires x.
2. In each case, we then need to provide that (e, t) function as input to
~every philosopher. Every philosopher is assignment-insensitive, so
no matter what σ is, ~every philosopherσ is the ((e, t), t) function:
• λx.{z : z is a philosopher} ⊆ {z : x(z) = >}
What happens when this functional application occurs depends on which
variable assignment we’re considering:
(a) For σ(Chomsky,·), we get:
• (λx.{z : z is a philosopher} ⊆ {z : x(z) = >})(λx.Chomsky ad-
mires x)
• = {z : z is a philosopher} ⊆ {z : Chomsky admires z}
• = ⊥, because Aristotle is a philosopher and Chomsky doesn't admire
Aristotle.
(b) For σ(Partee,·), we get:
• (λx.{z : z is a philosopher} ⊆ {z : x(z) = >})(λx.Partee admires x)
• = {z : z is a philosopher} ⊆ {z :Partee admires z}
• = ⊥, because Plato is a philosopher and Partee doesn't admire
Plato.
(c) For σ(Aristotle,·), we get:
• (λx.{z : z is a philosopher} ⊆ {z : x(z) = >})(λx.Aristotle admires
x)
• = {z : z is a philosopher} ⊆ {z :Aristotle admires z}
• = ⊥, because Plato is a philosopher and Aristotle doesn’t admire
Plato.
(d) For σ(Plato,·), we get:
• (λx.{z : z is a philosopher} ⊆ {z : x(z) = >})(λx.Plato admires x)
• = {z : z is a philosopher} ⊆ {z :Plato admires z}
• = ⊥, because Aristotle is a philosopher and Plato doesn’t admire
Aristotle.
3. We’ve thus learned that every2 philosopher λ2 x1 admires x2 is false
relative to every assignment function.
4. Next we need to apply λ1 to the variable-assignment-relative t values
we’ve just calculated for every2 philosopher λ2 x1 admires x2 . Doing
this will create an (e, t) function that maps an object o to the truth value of
every2 philosopher λ2 x1 admires x2 relative to a variable assignment
that puts o in the first position.
But since every2 philosopher λ2 x1 admires x2 is false relative to every
variable assignment, the resulting truth value will always be ⊥. Thus
applying λ1 produces a function that maps each object to ⊥:
• (Chomsky → ⊥, Partee → ⊥, Aristotle → ⊥, Plato → ⊥)
5. That function is the semantic value that results from applying λ1 to every2
philosopher λ2 x1 admires x2 . (Notice that the function we get, at this
point, no longer depends on what variable assignment we are relativizing
to. So we’ve achieved an assignment-insensitive semantic value.) Finally,
that function serves as input to ~some linguist. We have:
• ~some linguist = λx.{z : z is a linguist} ∩ {z : x(z) = >} ≠ ∅
So applying ~some linguist to our function yields:
• (λx.{z : z is a linguist} ∩ {z : x(z) = >} ≠ ∅)((Chomsky → ⊥, Partee → ⊥,
Aristotle → ⊥, Plato → ⊥))
• = {z : z is a linguist} ∩ {z : (Chomsky → ⊥, Partee → ⊥, Aristotle → ⊥,
Plato → ⊥)(z) = >} ≠ ∅
• = {z : z is a linguist} ∩ ∅ ≠ ∅
• = ⊥
So we conclude that, in our scenario, some linguist admires every philosopher
is false when given the existential-universal reading. This is a triumph, because
the sentence should be false in that scenario – we don’t have any one linguist
who admires all of (both of) the philosophers.
(Keep in mind that strictly speaking, some linguist admires every philosopher
in its Tree ∃∀ reading gets a truth value relative to a variable assignment. But the
final sentence is assignment-insensitive, so it gets the same truth value (namely,
⊥, in our scenario) relative to every variable assignment. Notice that there are
three t nodes in Tree ∃∀. The lowest t node gets a truth value in a way that in
sensitive to both the first and the second objects in the variable assignment. The
middle t node, which occurs after application of the λ2 operator, gets a truth
value in a way that is sensitive to the first object in the variable assignment
but not anymore to the second object in the variable assignment. And the top
t node, which occurs after application of the λ1 operator, isn’t sensitive to the
first object in the variable assignment, either.)
Tree ∀∃: Now we need to see what happens when we process things using the
other tree. As before, we know that ~x1 admires x2 is > relative to σ(Chomsky,
Plato) and σ(Partee, Aristotle), but ⊥ relative to all other variable assignments.
But this time the next step is to apply λ1 rather than λ2 . When we apply λ1 , we
make a new (e, t) function by seeing what truth value is produced by setting
a given input object to be the first value in our variable assignment. That
means it won’t matter what object our starting variable assignment has in its
first position (since that object will just be ‘overwritten’ as we change the first
position), but it will matter what object our starting variable assignment has
in its second position. So there are four cases to distinguish for the result of
applying λ1 to ~x1 admires x2 :
1. Variable assignments of the form σ(·,Chomsky): In these cases we check
whether the chosen object admires Chomsky, so the result of applying λ1 is
λx.x admires Chomsky, or (Chomsky → ⊥, Partee → ⊥, Aristotle → ⊥, Plato → ⊥).
2. Variable assignments of the form σ(·,Partee): In these cases we check
whether the chosen object admires Partee, so the result of applying λ1 is λx.x
admires Partee, or (Chomsky → ⊥, Partee → ⊥, Aristotle → ⊥, Plato → ⊥).
3. Variable assignments of the form σ(·,Aristotle): In these cases we check
whether the chosen object admires Aristotle, so the result of applying λ1 is λx.x
admires Aristotle, or (Chomsky → ⊥, Partee → >, Aristotle → ⊥, Plato → ⊥).
4. Variable assignments of the form σ(·,Plato): In these cases we check
whether the chosen object admires Plato, so the result of applying λ1 is λx.x
admires Plato, or (Chomsky → >, Partee → ⊥, Aristotle → ⊥, Plato → ⊥).
We’ve now created an (e, t) value for λ1 x1 admires x2 , with the specific (e, t)
function depending on which variable assignment we’re relativizing to.
The next step is to apply the ((e, t), t) function ~some linguist to the (e, t)
functions we’ve just derived. We have:
• ~some linguist = λx.{z : z is a linguist} ∩ {z : x(z) = >} , ∅
Now we go through our four cases:
1. When we have σ(·,Chomsky), then we apply λx.{z : z is a linguist} ∩ {z :
x(z) = >} ≠ ∅ to λx.x admires Chomsky:
• (λx.{z : z is a linguist} ∩ {z : x(z) = >} ≠ ∅)(λx.x admires Chomsky)
• = {z : z is a linguist} ∩ {z : z admires Chomsky} ≠ ∅
• = {z : z is a linguist} ∩ {z : (Chomsky → ⊥, Partee → ⊥, Aristotle → ⊥,
Plato → ⊥)(z) = >} ≠ ∅
• = {z : z is a linguist} ∩ ∅ ≠ ∅
• = ∅ ≠ ∅
• = ⊥
2. When we have σ(·,Partee), then we apply λx.{z : z is a linguist} ∩ {z : x(z) =
>} ≠ ∅ to λx.x admires Partee:
• (λx.{z : z is a linguist} ∩ {z : x(z) = >} ≠ ∅)(λx.x admires Partee)
• = {z : z is a linguist} ∩ {z : z admires Partee} ≠ ∅
• = {z : z is a linguist} ∩ {z : (Chomsky → ⊥, Partee → ⊥, Aristotle → ⊥,
Plato → ⊥)(z) = >} ≠ ∅
• = {z : z is a linguist} ∩ ∅ ≠ ∅
• = ∅ ≠ ∅
• = ⊥
3. When we have σ(·,Aristotle), then we apply λx.{z : z is a linguist} ∩ {z :
x(z) = >} ≠ ∅ to λx.x admires Aristotle:
• (λx.{z : z is a linguist} ∩ {z : x(z) = >} ≠ ∅)(λx.x admires Aristotle)
• = {z : z is a linguist} ∩ {z : z admires Aristotle} ≠ ∅
• = {z : z is a linguist} ∩ {z : (Chomsky → ⊥, Partee → >, Aristotle → ⊥,
Plato → ⊥)(z) = >} ≠ ∅
• = {z : z is a linguist} ∩ {Partee} ≠ ∅
• = {Chomsky, Partee} ∩ {Partee} ≠ ∅
• = {Partee} ≠ ∅
• = >
4. When we have σ(·,Plato), then we apply λx.{z : z is a linguist} ∩ {z : x(z) =
>} ≠ ∅ to λx.x admires Plato:
• (λx.{z : z is a linguist} ∩ {z : x(z) = >} ≠ ∅)(λx.x admires Plato)
• = {z : z is a linguist} ∩ {z : z admires Plato} ≠ ∅
• = {z : z is a linguist} ∩ {z : (Chomsky → >, Partee → ⊥, Aristotle → ⊥,
Plato → ⊥)(z) = >} ≠ ∅
• = {z : z is a linguist} ∩ {Chomsky} ≠ ∅
• = {Chomsky, Partee} ∩ {Chomsky} ≠ ∅
• = {Chomsky} ≠ ∅
• = >
The upshot of all of that is that some linguist λ1 x1 admires x2 is true rela-
tive to variable assignments that have Aristotle or Plato in their second position,
but false relative to variable assignments that have Chomsky or Partee in their
second position:
• ~some linguist λ1 x1 admires x2 σ(·,Chomsky) = ⊥
• ~some linguist λ1 x1 admires x2 σ(·,Partee) = ⊥
• ~some linguist λ1 x1 admires x2 σ(·,Aristotle) = >
• ~some linguist λ1 x1 admires x2 σ(·,Plato) = >
Next we apply λ2 to these assignment-relative t values. Since the truth value
depends only on what the assignment has in its second position, applying λ2
yields, for any starting assignment, the (e, t) function (Chomsky → ⊥, Partee
→ ⊥, Aristotle → >, Plato → >).
The final step is then to use that (e, t) function as input to the ((e, t), t) typed
~every philosopher. We have:
• ~every philosopher = λx.{z : z is a philosopher} ⊆ {z : x(z) = >}
So we do a last bit of calculation:
• (λx.{z : z is a philosopher} ⊆ {z : x(z) = >})((Chomsky → ⊥, Partee → ⊥,
Aristotle → >, Plato → >))
• = {z : z is a philosopher} ⊆ {z : (Chomsky → ⊥, Partee → ⊥, Aristotle → >,
Plato → >)(z) = >}
• = {Aristotle, Plato} ⊆ {Aristotle, Plato}
• = >
So the universal-existential reading Tree ∀∃ comes out true in our scenario.
That’s the right verdict, so it’s a complete triumph for our test run of the theory
of variable binding.
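Since both trees just amount to two nestings of ((e, t), t) quantifiers, the whole test run can be sketched in a few lines of Haskell (an illustration of mine, not the text's machinery; all names invented). The two nestings reproduce the two verdicts:

    data Entity = Chomsky | Partee | Aristotle | Plato deriving (Eq, Show)

    everyone :: [Entity]
    everyone = [Chomsky, Partee, Aristotle, Plato]

    linguist, philosopher :: Entity -> Bool
    linguist x    = x `elem` [Chomsky, Partee]
    philosopher x = x `elem` [Aristotle, Plato]

    admires :: Entity -> Entity -> Bool   -- admires x y: x admires y
    admires Chomsky Plato    = True
    admires Partee Aristotle = True
    admires _ _              = False

    -- ((e, t), t) quantifiers
    someLinguist, everyPhilosopher :: (Entity -> Bool) -> Bool
    someLinguist p     = any p (filter linguist everyone)
    everyPhilosopher p = all p (filter philosopher everyone)

    -- Tree exists-forall: some linguist scopes over every philosopher
    existsForall :: Bool
    existsForall = someLinguist (\x -> everyPhilosopher (\y -> admires x y))

    -- Tree forall-exists: every philosopher scopes over some linguist
    forallExists :: Bool
    forallExists = everyPhilosopher (\y -> someLinguist (\x -> admires x y))

    main :: IO ()
    main = print (existsForall, forallExists)   -- (False, True), as in the text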
Show that this sentence is ambiguous, specifying two readings of
it. Then propose two post-movement trees for the two readings.
Finally, attempt a detailed calculation of the semantic values for both
trees. One of the two trees should be reasonably straightforward,
but the other may create some complications. Consider whether
you can use the same value for ~in in both cases.
Problem 208: Suppose we add one more lambda term to the top of one
of our trees for Some linguist admires every philosopher:
[Tree: [t λk [t . . . admires x2 . . . ]]]
What happens at the new top node resulting from applying λk to
the result we calculated above? What type is the output semantic
value, and what specific semantic value within that type do we get?
How does the result vary as we vary the particular k that we use for
λk ?
AW Resting On Our Laurels and Reflecting on How the Pieces Fit To-
gether
With a lot of work, we've produced a very general theory of quantified noun
phrases. The three central ideas of this theory are:
1. Quantified noun phrases are always of type ((e, t), t).
2. Using syntactic movement rules, quantifier noun phrases are systemati-
cally moved to the tops of trees and adjoined to S nodes of type t.
3. λ terms convert the relativized behavior of type t nodes into new type (e,
t) nodes.
By putting all quantified noun phrases in type ((e, t), t), we get a rich standard
sandbox for characterizing quantified noun phrases. We can work out general
structural features of this category, such as the features of monotonicity and
strength and weakness we’ve explored above, and think about how different
quantified noun phrases are characterized by these structural features.
The price that’s paid for putting all of the quantified noun phrases in the
same typed sandbox is trouble in getting quantified noun phrases to combine
via functional application in a wide range of syntactic settings. Quantified
noun phrases can appear as subjects, as objects of transitive verbs, as objects
of prepositions, as indirect objects of ditransitive verbs, and so on. But when
those various syntactic contexts are built around an underlying core semantic
type e, they won’t all successfully interact with ((e, t), t).
The second central idea then pays that price. By using syntactic movement to
relocate quantified noun phrases to the tops of trees, adjoined to S nodes, we
put all quantified noun phrases in the same syntactic position, so that quanti-
fied noun phrases all need to interact with semantic values of whole sentences.
Thus we’re able to use the same semantic type for all quantified noun phrases.
The remaining difficulty is that the syntactic environment that we’ve put all
quantified noun phrases in doesn’t quite match the type-theoretic demands of
((e, t), t). By adjoining quantified noun phrases to S nodes, we've set them
up to receive type t inputs, which is not the right input for a ((e, t), t) function.
So the final central concept is the use of λ terms. These terms, together with
variable-assignment-relativized semantic values throughout, allow us to shift
the type t values of the S nodes back to (e, t) values that are suitable to serve as
inputs to type ((e, t), t).
So the resulting picture lets us freely deploy members of the nicely-explored ((e,
t), t) category in lots of different places in sentences, and to get a variety of scop-
ing interactions among those different members by using syntactic movement
to rearrange them in different orders at the tops of trees. It’s a very powerful
and elegant picture that gives us a rich theory of a useful fragment of natural
language.
Let’s consider a sketch of how this resulting picture can let us handle a compli-
cated example. Consider the sentence:
• Most linguists read few books in most libraries.
This sentence contains three quantified noun phrases: most linguists, few
books, and most libraries. We can situate all three of these quantified noun
phrases in our general theory by giving appropriate ((e, t), ((e, t), t)) semantic
values for most and few:
1. ~most = λx.λy.|x↓ ∩ y↓ | > |x↓ − y↓ |
These quantified noun phrases then occur in syntactic positions that were built
around an underlying core semantic type e, for example:
2. As object of the transitive verb read, which should be of type (e, (e, t)).
3. As object of the preposition in, which should be of type (e, ((e, t), (e, t))).
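As a hedged illustration of the ~most entry above (the domain, predicates, and names are all invented for the example; x↓ is rendered as the list that an (e, t) function picks out):

    data Entity = L1 | L2 | L3 | B1 | B2 deriving (Eq, Show)

    domain :: [Entity]
    domain = [L1, L2, L3, B1, B2]

    -- x-down: the set (here, list) that an (e, t) function picks out
    down :: (Entity -> Bool) -> [Entity]
    down p = filter p domain

    -- [[most]] = \x.\y. |x-down ∩ y-down| > |x-down − y-down|
    most :: (Entity -> Bool) -> (Entity -> Bool) -> Bool
    most x y = length (filter y (down x)) > length (filter (not . y) (down x))

    linguist, happy :: Entity -> Bool
    linguist e = e `elem` [L1, L2, L3]
    happy e    = e `elem` [L1, L2]

    main :: IO ()
    main = print (most linguist happy)   -- True: 2 happy linguists vs 1 not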
But our movement rules will pull all three quantified noun phrases to the top of
the tree, leaving type e variables behind that combine unproblematically with
all of those contexts. We get trees like:
[Tree: [most libraries [λ3 [[few books in x3] [λ2 [most linguists [λ1 [x1 read x2]]]]]]]]
Because there are three quantified noun phrases, there are in principle six dif-
ferent orders in which those three phrases can be moved to the top of the
tree. However, there's an extra constraint at work here. The phrase most
libraries leaves behind the variable x3 (given the way we're numbering the
variables). So we want most libraries to appear higher in the tree than λ3 ,
which in turn appears higher in the tree than x3 . But since x3 is the result of
the movement of most libraries out of few books in most libraries, we
therefore need most libraries to move higher in the tree than few books in
most libraries (which will then have become few books in x3 ).
That means there are really only three orders available for the three phrases (in
order from top to bottom):
1. most libraries, few books in most libraries, most linguists
2. most libraries, most linguists, few books in most libraries
3. most linguists, most libraries, few books in most libraries
[Tree: [most linguists [λ1 [most libraries [λ3 [[few books in x3] [λ2 [x1 read x2]]]]]]]]
What final semantic value do we get for the entire sentence with
this tree if we apply our rules?
[Tree diagram, partially recoverable: a starting tree for another movement order, built from λ2 , λ1 , most libraries, few books, λ3 , x1 , in x3 , and read x2 ]
What sort of semantic value results when we use one of these other
orders as the starting tree?
We’ve been a little cagey about how exactly these λ terms we have been using
work. In particular, we’ve carefully avoided putting a semantic typing on them.
It’s time to think more carefully about that issue. Consider a simple example
of the use of a lambda term:
[Tree: [t [((e, t), t) [((e, t), ((e, t), t)) some] [(e, t) linguist]] [λ [t [e x1] [(e, t) laughed]]]]]
If everything is going to proceed via functional application as normal, this tree
requires λ to be type (t, (e, t)), so that it can map the type t input it receives
from x1 laughed to the type (e, t) output that the type ((e, t), t) expression some
linguist requires.
But making λ of type (t, (e, t)) creates serious problems. Here’s a first-draft
statement of the problem:
• There are only two truth values in t: > and ⊥. That means there are only
two outputs that λ can produce, which means that there are only two (e,
t) values that can be provided as input to the quantified noun phrase. But
two different (e, t) values isn’t enough. Consider the following collection
of data:
1. First triad:
– Some cat is a reptile: ⊥
– Every cat is a reptile: ⊥
– No cat is a reptile: >
2. Second triad:
– Some cat is hungry: >
– Every cat is hungry: ⊥
– No cat is hungry: ⊥
3. Third triad:
– Some cat is a mammal: >
– Every cat is a mammal: >
– No cat is a mammal: ⊥
In each of these three triads, we use some expression to provide an (e, t)
input to three different quantified noun phrases: some cat, every cat,
and no cat. The inputs are provided by λ x1 is a reptile, λ x1 is
hungry, and λ x1 is a mammal.
But we get different patterns of outputs for each of the three inputs. When
we use λ x1 is a reptile to provide the input, some cat and every
cat produce ⊥ as output, but no cat produces > as output. However,
when we use λ x1 is hungry to provide the input, some cat produces
> as output while every cat and no cat produce ⊥ as output. That
means that λ x1 is a reptile and λ x1 is hungry can’t be producing
the same output – if they did, they’d produce the same result when
combined with any ((e, t), t) expression.
And then λ x1 is a mammal produces yet another pattern when com-
bined with the various quantified noun phrases. For the input provided
by λ x1 is a mammal, some cat and every cat produce > as output
while no cat produces ⊥ as output. But now we need λ to be producing
three different outputs. But that’s impossible. Since λ takes a t value as
input, and since there are only two different t values, λ can produce only
two different outputs.
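The counting point can be made concrete in a couple of lines (a sketch of mine; the choices of V1 and V2 are arbitrary): a function out of t has at most two distinct outputs, however we choose them.

    -- If lambda had type (t, (e, t)), its whole graph is fixed by two outputs:
    data Entity = C1 | C2 deriving (Eq, Show)

    lambdaT :: Bool -> (Entity -> Bool)
    lambdaT True  = \e -> e == C1   -- call this output V1
    lambdaT False = \_ -> False     -- call this output V2

    -- Whatever V1 and V2 are, every input yields one of just these two (e, t)
    -- values, so at most two distinct values ever reach a quantified noun
    -- phrase -- one too few for the three triads above.
    main :: IO ()
    main = mapM_ (\b -> print (map (lambdaT b) [C1, C2])) [True, False]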
The first-draft statement of the problem is on the right track, but it doesn’t
get things exactly right. It’s true that there are only two members of type t,
so that if x1 laughed is type t, there are only two possible inputs to a type (t,
(e, t)) λ term. But this overlooks the fact that x1 laughed can have different t
values relative to different variable assignments. That was the whole point of
introducing variable assignment relativized semantic values – it wouldn’t be a
surprise if ignoring the relativization got us into trouble in understanding the
typing of the λ terms.
So consider the λ application with the relativization made explicit:
• ~λ(~x1 laughedσ )
~x1 laughedσ can be different truth values for different choices of σ, so we can
get different outputs from the application of λ. That’s helpful, but it doesn’t fix
the underlying problem.
1. Suppose that C1 is a cat that is hungry and C2 is a cat that is not hungry,
and consider two variable assignments:
(a) σ1 assigns x1 to C1
(b) σ2 assigns x1 to C2
Then we will have:
(a) ~x1 is hungryσ1 = >
(b) ~x1 is hungryσ2 = ⊥
What then happens when we apply λ to these assignment-relative truth
values? The whole point of paying attention to the assignment-relativity
was to get different inputs to λ (relative to different variable assignments),
so if we’re going to get anything useful out of this, λ had better produce
different outputs from these different inputs. So let’s assume:
(a) ~λ(~x1 is hungryσ1 ) = ~λ(>) = some (e, t) value V1 picking out
a set that has a non-empty overlap with the set of cats.
(b) ~λ(~x1 is hungryσ2 ) = ~λ(⊥) = some (e, t) value V2 picking out
a set that has an empty overlap with the set of cats.
We then have:
(a) ~some cat is hungryσ1 :
• ~some cat is hungryσ1
• = ~some cat λ x1 is hungryσ1
• = ~some catσ1 (~λ x1 is hungryσ1 )
• = ~some cat(~λσ1 (~x1 is hungryσ1 ))
• = ~some cat(V1)
• =>
(b) ~some cat is hungryσ2 :
• ~some cat is hungryσ2
• = ~some cat λ x1 is hungryσ2
• = ~some catσ2 (~λ x1 is hungryσ2 )
• = ~some cat(~λσ2 (~x1 is hungryσ2 ))
• = ~some cat(V2)
• =⊥
But this is an undesirable result. We get the sentence some cat is hungry
to be true relative to one variable assignment and false relative to an-
other. However, some cat is hungry shouldn’t be assignment-sensitive.
Rather, it should be simply true (since there is, in fact, a cat C1 that is hun-
gry).
The central problem here is that if we are going to use the relativization of
semantic values to variable assignments to get a greater diversity of inputs
to λ, we have to accept the relativization of semantic values throughout the
entire system. And that means we need to relativize to σ both above and
below the λ term in the tree – we get a diversity of relativized inputs to λ,
but we thereby also get a diversity of relativized outputs from λ, and thus
a diversity of relativized final truth values for the sentence. But that’s
not what we wanted. We wanted a single, unrelativized, truth value for
the final sentence, which means we wanted λ to somehow seal off the
relativization to variable assignment that x1 is hungry correctly shows.
But nothing in our current type-theoretic tools allows λ to perform that
sealing off.
2. Inheriting relativization of semantic values all the way up to the level of
complete sentences isn’t the only problem we face when we pay careful
attention to the effect of relativization on treating λ terms as type (t, (e,
t)). In addition, we’ll still find that we don’t get enough diversity in the
outputs of λ terms to get reasonable results.
AY Setwise Types
One option is to build variable assignments into the type theory itself. Suppose
we introduce a type s whose members are the variable assignments. Variables
can then be of type (s, e) – functions from variable assignments to objects – as in:
• ~x1 = λxs .x(1)
Proper names could also be of type (s, e), but would be constant functions that
didn’t depend on the input variable assignment, as in:
• ~Aristotle = λxs .Aristotle
In the same way, whole sentences could be of type (s, t), as in:
• ~x1 laughed = λxs .x(1) laughed
And sentences without free variables could also be of type (s, t), but again using
constant functions:
• ~Aristotle laughed = λxs .Aristotle laughed
We’ve been considering a type ((e, t), t) semantic framework in which we can
handle a wide range of quantified noun phrases such as:
• some king
• no linguist
• most philosophers
Another construction that looks like it has the same syntactic form as these
quantified noun phrases is the possessive noun phrase:
• Chomsky read my book.
• Your car hit a deer.
• Aristotle’s objection stymied Plato.
• The unhappy linguist’s party depressed everyone.
Let’s see if we can use some of the same semantic tools to model the possessive
noun phrases.
We’ll start with Aristotle’s objection stymied Plato. First we need a syn-
tactic structure for the sentence. A first draft is:
[Tree: [S [Aristotle's objection] [stymied Plato]]]
In this structure, Aristotle’s objection is a quantified noun phrase with
determiner Aristotle’s and noun objection.
That is, there is some specific object – Aristotle’s objection – and the semantic
value of Aristotle’s objection is the second-order property of being a prop-
erty that that object has.
In each case, the restrictor answers the question: which man? (The one who
sold the world, the tall dark one, the one in the corner.) We have a model
for how adjectives and prepositional phrases restrict the quantification. These
expressions are type ((e, t), (e, t)), so they map the type (e, t) man to a restricted
(e, t) which then provides the domain of quantification for the. Can we tell a
similar story for relative clauses?
Let’s start by considering our initial typing constraints. We have the tree:
[Tree: [t [((e, t), t) [((e, t), ((e, t), t)) the] [(e, t) [(e, t) man] [((e, t), (e, t)) who [(e, t) [(e, (e, t)) sold] [e the world]]]]] [(e, t) laughed]]]
Problem 211: Using this semantic value for who as well as plausible
semantic values for the other words in the sentence, calculate the
final semantic value of The man who sold the world laughed. Is
the result plausible?
the semantics of the sentence. Does the more complicated semantic
treatment of the world have an effect on the appropriate semantic
value for who?
However, other cases of relative clauses don’t work out so nicely. Consider:
• The man who I admire laughed
Everything looks fine when we first consider the typing of the tree:
[Tree: [t [((e, t), t) [((e, t), ((e, t), t)) the] [(e, t) [(e, t) man] [((e, t), (e, t)) [((e, t), ((e, t), (e, t))) who] [(e, t) I admire]]]] [(e, t) laughed]]]
But trouble is lurking. A first sign of the lurking trouble is that this tree makes
I admire a constituent, but it shouldn’t be. Admire is a transitive verb, and
should combine with its object, not with the subject I, to form a constituent.
And the full trouble comes out when we consider the semantic consequences
of those constituency facts. Because the semantic value of admire is λx.λy.y
admires x, admires can combine with I, but the combination produces the (e,
t) property admiring me. And as a result, the entire sentence ends up meaning
that the man who admires me laughed, which is the wrong meaning.
Problem 213: Using the above tree for The man who I admire
laughed, calculate the final truth value for the sentence, and confirm
that it has the (incorrect) truth conditions that the man who admires
me laughed.
We can start to see what went wrong by noticing that, in a less colloquial register
of English, the who in The man who I admire laughed should have been whom.
That’s because the relative pronoun who/whom is, in some sense, playing the
role of direct object for admire. What we want is a movement-based syntactic
analysis something along the lines of:
[Tree: [[the [man [whom [I [admire whom]]]]] laughed], with the lower whom marking the pre-movement object position]
If we then assume that when whom moves, it leaves a variable of type e, we can
begin typing the tree to figure out how the moved whom functions:
[Tree: [t [((e, t), t) [((e, t), ((e, t), t)) the] [(e, t) [(e, t) man] [whom [t [e I] [(e, t) admire x1]]]]] [(e, t) laughed]]]
We need some method of transitioning from a t value (for I admire x1 ) to
something (e, t)-like. But this is a familiar problem, especially given the pres-
ence of a variable. We need to insert a lambda abstractor:
[Tree: [t [((e, t), t) [((e, t), ((e, t), t)) the] [(e, t) [(e, t) man] [((e, t), (e, t)) [((e, t), ((e, t), (e, t))) whom] [(e, t) λ1 [t [e I] [(e, t) admire x1]]]]]] [(e, t) laughed]]]
And now all the typing works out. Furthermore, we can use the intersective
semantic value for whom that we used above for who. Sketching the derivation,
we have:
• ~I admire x1 g = > if and only if I admire g(1).
• So ~λ1 I admire x1 is the (e, t) function that maps an object to > if I
admire that object.
• So The man whom I admire laughed is true if and only if the unique
thing that is both a man and something I admire laughed.
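Here is a hedged executable sketch of that derivation (the domain, names, and the particular rendering of the as a uniqueness-presupposing quantifier are all my illustrative assumptions):

    data Entity = TheMan | OtherGuy | Rock deriving (Eq, Show)

    everyone :: [Entity]
    everyone = [TheMan, OtherGuy, Rock]

    man, iAdmire, laughed :: Entity -> Bool
    man x     = x `elem` [TheMan, OtherGuy]
    iAdmire x = x == TheMan      -- the (e, t) value of lambda_1 I admire x1
    laughed x = x == TheMan

    -- Intersective whom: ((e, t), ((e, t), (e, t)))
    whom :: (Entity -> Bool) -> (Entity -> Bool) -> (Entity -> Bool)
    whom clause noun = \x -> noun x && clause x

    -- "the": defined only when the restrictor picks out a unique entity
    theQ :: (Entity -> Bool) -> (Entity -> Bool) -> Bool
    theQ noun vp = case filter noun everyone of
                     [x] -> vp x
                     _   -> error "presupposition failure: no unique satisfier"

    main :: IO ()
    main = print (theQ (whom iAdmire man) laughed)   -- True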
Determine whether the intersective ((e, t), ((e, t), (e, t))) treatment of
who given above will also work for these uses of relative pronouns.
In doing so, consider what the right syntactic analysis of The man
for whom the bell tolled died should be, give a speculative se-
mantic value for for, and determine what truth conditions for the
sentence are produced.
1. [Tree: [t [some [linguist [whom [λ1 [t [every philosopher] [λ2 [t x2 admires x1]]]]]]] [λ3 [t x3 laugh]]]]
2. [Tree: [t [every philosopher] [λ2 [t [some [linguist [whom [λ1 [t x2 admires x1]]]]] [λ3 [t x3 laugh]]]]]]
Calculate final semantic values for both of these trees. Are both
truth conditions available as readings of the English sentence Some
linguist whom every philosopher admires laughed? Consider
other quantified noun phrases embedded in relative clauses and
their scope interactions with other quantified noun phrases in the
same sentence. Are there any good generalizations on which scope
readings are and are not available?
BC Possible Worlds
The framework we’ve been developing is centered around the thesis that the
semantic type of a full sentence is t. The assumption that the root node of a
sentence’s tree is type t helps drive the typing of the other nodes. But there is
a disadvantage to typing sentences t.
Consider the two (true) sentences:
• Aristotle is a philosopher.
• Chomsky is a linguist.
But since sentences have semantic value of type t, we have:
• ~Aristotle is a philosopher = > = ~Chomsky is a linguist.
Thus Aristotle is a philosopher and Chomsky is a linguist have the
same semantic value. But that means that the two sentences are synonymous,
and that’s obviously an unacceptable conclusion.
More generally, the way things are currently set up, (i) every sentence has a
semantic value of type t, and (ii) there are only two truth values in t. Thus
each sentence has one of only two possible semantic values. All of the true
sentences are synonymous with one another, and all of the false sentences are
synonymous with one another.
To account for the semantic difference between Aristotle is a philosopher
and Chomsky is a linguist we’ll thus add tools for talking about the possible
truth values of sentences under various possible circumstances.
The central tools we need are possible worlds. A possible world is a maximally
specific state of affairs – a state of affairs that fully specifies everything about
the world, leaving nothing undetermined. For our purposes, we can think of
worlds as being given in two steps:
1. First we give a modelling language. The modelling language contains a
limited number of sentences that specify the basic features of a possible
world – roughly, enough features that we’ve fixed exactly how everything
is in the world, but not so many features that there are redundancies in
how we describe the world.
2. Second, we give a particular possible world by setting the truth value of
each sentence in the modelling language.
Suppose, for example, that the modelling language contains just the two sen-
tences Aristotle is a philosopher and Chomsky is a linguist. Given this
modelling language, there are then four possible worlds we can consider:
1. World w1 :
• Aristotle is a philosopher: >
• Chomsky is a linguist: >
2. World w2 :
• Aristotle is a philosopher: >
• Chomsky is a linguist: ⊥
3. World w3 :
• Aristotle is a philosopher: ⊥
• Chomsky is a linguist: >
4. World w4 :
• Aristotle is a philosopher: ⊥
• Chomsky is a linguist: ⊥
Notice that Aristotle is a philosopher is true only in w1 and w2 , while
Chomsky is a linguist is true only in w1 and w3 . That’s promising, because
it marks a difference between these two sentences that we were struggling to
differentiate using our previous semantic tools.
To make good on that promise, we’ll return to the idea of relativizing semantic
values that we investigated earlier. Previously we relativized semantic values
to variable assignments, but now we will relativize semantic values to possible
worlds. Instead of saying:
• ~Aristotle is a philosopher = >
we'll now say:
1. ~Aristotle is a philosopherw1 = >
2. ~Aristotle is a philosopherw2 = >
3. ~Aristotle is a philosopherw3 = ⊥
4. ~Aristotle is a philosopherw4 = ⊥
So we can introduce a new kind of semantic value for sentences. We'll use
||Aristotle is a philosopher|| to pick out the set of worlds in which Aristotle
is a philosopher is true:
• ||Aristotle is a philosopher|| = {w1 , w2 }
Two important notes about worldly semantic values:
1. Worldly semantic values, at least as we’ve designed them so far, are
features only of entire sentences. ||Aristotle||, for example, is just unde-
fined.
Problem 220: It isn’t actually true, given what we’ve said so far,
that ||Aristotle|| is undefined. If we just apply the definition,
what do we get for ||Aristotle||? Explain why, however, that’s
not a reasonable or helpful result to get, and why we should just
treat subsentential expressions like Aristotle as being outside
the domain of the ||·|| function.
Call two sentences S1 and S2 worldly equivalent if ||S1 || = ||S2 ||. Given
our modelling language with n sentences, what is the maximum number
of non-worldly-equivalent sentences?
Call the set of all of the possible worlds W. Then we can say:
• ||Aristotle is not a philosopher|| = W - ||Aristotle is a philosopher||
This feature generalizes to the interaction of every sentence with negation:
Claim: Given any sentence S, ||not S|| = W - ||S||.
Proof: We show two things: (i) any world in ||S|| is not in ||not S||,
and (ii) any world not in ||S|| is in ||not S||.
1. Suppose w ∈ ||S||. Then ~Sw = >. Since negation inverts truth
values, we have ~not Sw = ⊥. So w ∉ ||not S||.
2. Suppose w ∉ ||S||. Then ~Sw = ⊥. Since negation inverts truth
values, we have ~not Sw = >. So w ∈ ||not S||.
From this, we see that ||S||∩||not S|| = ∅ and ||S||∪||not S|| = W. Thus
||not S|| = W - ||S||.
Thus at the level of worldly values, the effect of negation is set-theoretic comple-
mentation – the worldly value of the negation of any sentence is the complement
of the worldly value of the original sentence.
Problem 223: What is the relation between ||S|| and ||not not S||?
We can find similar rules connecting other connectives to worldly truth values:
1. And: Suppose we have two sentences A and B, and we know ||A|| and
||B||. Then we can from that information determine ||A and B||:
Claim: ||A and B|| = ||A|| ∩ ||B||
Proof: We show two things: (i) any world in both ||A|| and ||B||
is also in ||A and B||, and (ii) any world in ||A and B|| is in both
||A|| and ||B||.
(a) Suppose w ∈ ||A|| and w ∈ ||B||. Then ~Aw = > and ~Bw =
>. But from this it follows that ~A and Bw = >. Thus w ∈
||A and B||.
(b) Suppose w ∈ ||A and B||. Then ~A and Bw = >. But from
this it follows that ~Aw = > and ~Bw = >. Thus w ∈ ||A||
and w ∈ ||B||.
Therefore ||A and B|| = ||A|| ∩ ||B||.
Thus at the level of worldly values, the effect of conjunction is intersection
– the worldly value of the conjunction of two sentences is the intersection
of the worldly values of the two original sentences.
Problem: Calculate, for an arbitrary sentence A:
(a) ||A and not A||
(b) ||not(A and not A)||
Problem 225: Prove that for any sentence S, ||S|| = ||S and S||.
2. Or: Suppose we have two sentences A and B, and we know ||A|| and ||B||.
Then we can from that information determine ||A or B||:
Claim: ||A or B|| = ||A|| ∪ ||B||. (The proof parallels the proof for and,
with unions in place of intersections.)
3. If: Suppose we give if A, B the structure
[Tree: [B [if A]]]
and then define ~if = λxt .λyt .(x = ⊥ ∨ y = >).
Now suppose we have two sentences A and B and we know ||A|| and ||B||.
Then we can from that information determine ||if A, B||:
Claim: ||if A, B|| = (W - ||A||) ∪ ||B||
Proof: We show two things: (i) any world that is in ||if A, B||
is either in ||B|| or not in ||A||, and (ii) any world that is either in
||B|| or not in ||A|| (or both) is in ||if A, B||.
(a) Suppose w ∈ ||if A, B||. Then ~if A, Bw = >. Therefore
either ~Aw = ⊥ or ~Bw = >. If ~Aw = ⊥ then w ∉ ||A||,
so w ∈ W - ||A||, so w ∈ (W - ||A||) ∪ ||B||. And if ~Bw = >, then
w ∈ ||B||, so w ∈ (W - ||A||) ∪ ||B||. So in either case, w ∈ (W -
||A||) ∪ ||B||.
(b) Suppose w ∈ W - ||A||. Then w ∉ ||A||. Thus ~Aw = ⊥, so
~if A, Bw = >. Thus w ∈ ||if A, B||. Suppose instead
that w ∈ ||B||. Then ~Bw = >, so ~if A, Bw = >. Thus
again w ∈ ||if A, B||. So in either case, w ∈ ||if A, B||.
Therefore ||if A, B|| = (W - ||A||) ∪ ||B||.
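The three set-theoretic rules – complementation for not, intersection for and, and (W - ||A||) ∪ ||B|| for if – can be sketched directly (an illustration of mine; the worlds are bare labels, and the two example worldly values are arbitrary):

    import Data.List ((\\), intersect, union)

    type World = Int

    allWorlds :: [World]   -- W in the text; four toy worlds
    allWorlds = [1, 2, 3, 4]

    -- Worldly values as lists of worlds
    wvA, wvB :: [World]
    wvA = [1, 2]   -- e.g. ||Aristotle is a philosopher||
    wvB = [1, 3]   -- e.g. ||Chomsky is a linguist||

    wvNot :: [World] -> [World]
    wvNot s = allWorlds \\ s            -- ||not S|| = W - ||S||

    wvAnd, wvOr, wvIf :: [World] -> [World] -> [World]
    wvAnd a b = a `intersect` b         -- ||A and B|| = ||A|| ∩ ||B||
    wvOr  a b = a `union` b             -- ||A or B||  = ||A|| ∪ ||B||
    wvIf  a b = wvNot a `union` b       -- ||if A, B|| = (W - ||A||) ∪ ||B||

    main :: IO ()
    main = mapM_ print [wvNot wvA, wvAnd wvA wvB, wvOr wvA wvB, wvIf wvA wvB]
    -- [3,4], [1], [1,2,3], [3,4,1]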
Problem 227: Show that ||A|| ⊆ ||B|| if and only if ||if A, B|| =
W.
BE Modals
So far the relativization of truth values to worlds is an idle wheel in our for-
malism. Having world-relativized truth values lets us distinguish between
sentences that have the same unrelativized truth value, and lets us make some
useful observations about the connection between truth-functional connectives
and operations on sets. But we don’t do anything with world-relativized truth
values = they don’t serve as input to any further mechanism in our formal
machinery.
Modals change that. Consider a sentence such as Necessarily tigers are
mammals, with the structure:
[Tree: [S Necessarily [S tigers are mammals]]]
And now we can quickly see why we need world-relativized truth values to
make sense of modals. We start with four observations:
1. Tigers are mammals is true.
2. Tigers are striped is true.
3. Necessarily tigers are mammals is true.
4. Necessarily tigers are striped is false.
If necessarily combined with non-relativized simple t values, then necessarily
would receive the same input when combined with either of Tigers are
mammals or Tigers are striped (namely, >). But if necessarily receives
the same input, it will produce the same output, so we would be forced to give
the same truth value to Necessarily tigers are mammals and Necessarily
tigers are striped. Since we want those two sentences to have different
truth values, rather than the same truth value, we need necessarily to take
something other than simple truth values as input.
World-relativized truth values will do the trick. Tigers are mammals and
Tigers are striped don’t have the same truth value relative to every world.
Precisely because Necessarily tigers are striped is false, we know there is
some possibility of non-striped tigers. Let w be a world in which tigers are not
striped. Since Necessarily tigers are mammals is true, tigers are mammals
in w. Thus we have:
• ~Tigers are mammalsw = >
• ~Tigers are stripedw = ⊥
So the two sentences receive different world-relativized values, and necessarily
can be sensitive to the difference. To put the idea to work, we'll give the sentence
a structure containing a λw term, parallel to our earlier variable binders:
[Tree: [Necessarily [λw [tigers are mammals]]]]
The two full sentence nodes for tigers are mammals and necessarily
tigers are mammals are of type t, but relativized to a choice of world:
[Tree: [t Necessarily [t λw [tigers are mammals]]]]
We aren’t worrying yet about how to incorporate world-relativity into
semantic values at the sub-sentential level, so we won’t worry here about
the typing of tigers, are, or mammals. λw will then convert the world-
relativized t of tigers are mammals into a property of worlds, and hence
a value of type (w, t). Finally, necessarily provides a second-order prop-
erty of worlds (just as quantified noun phrases provide a second-order
property of objects), and is of type ((w, t), t):
[Tree: [t [((w, t), t) Necessarily] [(w, t) λw [t tigers are mammals]]]]
Now we need to characterize λw and give a specific semantic value for
necessarily:
(a) λw takes as input a node with world-relativized truth-values and
outputs a (w, t) function. In particular, if N is the input node, then
(λw N)(w) = > if and only if ~Nw = >.
(b) Necessarily takes a (w, t) function as input, and produces > as
output if and only if the (w, t) input maps every world to >. Thus
~Necessarilyu = λx(w, t) .∀w x(w) = >. (For any world u – note
that while necessarily gets a world-relativized semantic value, it
is insensitive to the choice of world.)
To test this, suppose we have four worlds w1 , w2 , w3 , and w4 , and that
Tigers are mammals is true in each of those worlds, but Tigers are
striped is true only in w1 , w2 , and w3 . That is:
• ~Tigers are mammalsw1 =~Tigers are mammalsw2 =~Tigers are
mammalsw3 =~Tigers are mammalsw4 =~Tigers are stripedw1 =~Tigers
are stripedw2 =~Tigers are stripedw3 = >.
• ~Tigers are stripedw4 = ⊥.
Then we have:
• ~λw Tigers are mammals = (w1 → >, w2 → >, w3 → >, w4 → >)
• ~λw Tigers are striped = (w1 → >, w2 → >, w3 → >, w4 → ⊥)
We can then calculate:
• ~Necessarily tigers are mammalsu
• = ~Necessarilyu (~λw tigers are mammalsu )
• = (λx(w, t) .∀w x(w) = >)((w1 → >, w2 → >, w3 → >, w4 → >))
• = ∀w (w1 → >, w2 → >, w3 → >, w4 → >)(w) = >
• = >
So Necessarily tigers are mammals comes out true, as desired. (No-
tice that the final semantic value of Necessarily tigers are mammals is
> relative to every possible world, so that the sentence is world-insensitive
in its semantic value.) And then:
• ~Necessarily tigers are stripedu
• = ~Necessarilyu (~λw tigers are stripedu )
• = (λx(w, t) .∀w x(w) = >)((w1 → >, w2 → >, w3 → >, w4 → ⊥))
• = ∀w (w1 → >, w2 → >, w3 → >, w4 → ⊥)(w) = >
• = ⊥
So Necessarily tigers are striped comes out false, also as desired.
(Notice that again the final semantic value of Necessarily tigers are
striped is ⊥ relative to every possible world, so that the sentence is world-
insensitive in its semantic value. Tigers are striped, on the other hand,
is world-sensitive in its semantic value, since it is true relative to some
worlds (w1 , w2 , and w3 ) and false relative to other worlds (w4 ).)
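A hedged sketch of this test, with the four worlds encoded directly (names mine; the (w, t) value of a sentence is just a World -> Bool function):

    type World = Int

    allWorlds :: [World]
    allWorlds = [1, 2, 3, 4]

    tigersAreMammals, tigersAreStriped :: World -> Bool
    tigersAreMammals _ = True       -- true at w1..w4
    tigersAreStriped w = w /= 4     -- false at w4 only

    -- necessarily: type ((w, t), t), world-insensitive
    necessarily :: (World -> Bool) -> Bool
    necessarily p = all p allWorlds

    main :: IO ()
    main = print (necessarily tigersAreMammals, necessarily tigersAreStriped)
    -- (True, False), matching the two verdicts above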
Problem 230: Notice that both Tigers are mammals and Necessarily
tigers are striped are world-insensitive. But they are world-
insensitive for different reasons. Necessarily tigers are striped
is world-insensitive de jure – it’s a consequence of the rules of
the system (in particular, the rule for ~necessarily) that the
sentence comes out world-insensitive. Tigers are mammals,
on the other hand, is world-insensitive de facto – it’s a conse-
quence of substantive, non-semantic features of tigers that the
sentence comes out world-insensitive.
Do any of your examples call into question how clear the dis-
tinction is between de jure and de facto world-insensitive sen-
tences?
2. Second Approach: Alternatively, we write world-relativization into our
basic type theory, replacing type t with the type t∗ of functions from worlds
to truth values and setting ~Necessarily = λxt∗ .λvw .∀u ∈ w x(u) = >.
With these pieces, we can calculate ~Necessarily tigers are mammals
and ~Necessarily tigers are striped. First we have:
• ~Necessarily tigers are mammals
• = ~Necessarily(~Tigers are mammals)
• = (λxt∗ .λvw .∀u ∈ w x(u) = >)((w1 → >, w2 → >, w3 → >, w4 → >))
• = λvw .∀u ∈ w (w1 → >, w2 → >, w3 → >, w4 → >)(u) = >
• = λvw .>
~Necessarily tigers are mammals is thus the member of t∗ that maps
every world to the true.
Second we have:
• ~Necessarily tigers are striped
• = ~Necessarily(~Tigers are striped)
• = (λxt∗ .λvw .∀u ∈ w x(u) = >)((w1 → >, w2 → >, w3 → >, w4 → ⊥))
• = λvw .∀u ∈ w (w1 → >, w2 → >, w3 → >, w4 → ⊥)(u) = >
• = λvw .⊥
~Necessarily tigers are striped is thus the member of t∗ that maps
every world to the false. Notice that both necessity claims pick out one of
the constant members of t∗ , mapping each world to the same truth value.
Both of these semantic approaches agree that necessarily is, in effect, a uni-
versal quantifier over worlds, requiring that the matrix sentence be true with
respect to every possible world. The resulting complex sentence is no longer
world-sensitive – in the first framework, it has the same truth value relative
to every world, and in the second framework, its t∗ value is one of the two
constant functions that maps each world to the same truth value.
Summarizing, we have two possible semantic values for necessarily:

1. World-Relativized: ⟦Necessarily⟧^u = λx_{(w,t)}.∀w x(w) = ⊤
2. Set-Wise: ⟦necessarily⟧ = λx_{t∗}.λv_w.∀u ∈ w x(u) = ⊤
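For comparison, a parallel Python sketch (again with assumed names) of the set-wise value, on which necessarily maps one t∗ value to another:

    # A sketch of the set-wise approach: t* values are maps from worlds to
    # truth values, and necessarily returns a constant such map.
    WORLDS = ["w1", "w2", "w3", "w4"]

    def necessarily_setwise(x):
        # ⟦necessarily⟧ = λx_{t*}.λv_w.∀u ∈ w x(u) = ⊤
        return {v: all(x[u] for u in WORLDS) for v in WORLDS}

    striped = {"w1": True, "w2": True, "w3": True, "w4": False}
    print(necessarily_setwise(striped))  # maps every world to False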
BF Existential Modals
In addition to the universal modals such as necessarily and must, there are
various modals that have the effect of existential quantification over worlds.
Existential modals include possibly, may, might, and can. For simplicity, we’ll
focus on possibly. Let’s again work through two sentences:
1. Possibly tigers are striped.
2. Possibly tigers are reptiles.
We’ll work within the world-relativized approach first, starting from the tree:

[t Possibly [(w, t) λw [t tigers are striped]]]
As before, ⟦λw tigers are striped⟧ = [w1 → ⊤, w2 → ⊤, w3 → ⊤, w4 → ⊥]. We also have ⟦λw tigers are reptiles⟧ = [w1 → ⊥, w2 → ⊥, w3 → ⊥, w4 → ⊥].

With these pieces, we can calculate semantic values for the modalized
sentences.

1. First Approach: Using world-relativized semantic values, we have ⟦Possibly⟧^u = λx_{(w,t)}.∃w x(w) = ⊤. First we have:

• ⟦Possibly tigers are striped⟧^u
• = ⟦Possibly⟧^u(⟦λw tigers are striped⟧^u)
• = (λx_{(w,t)}.∃w x(w) = ⊤)([w1 → ⊤, w2 → ⊤, w3 → ⊤, w4 → ⊥])
• = ∃w [w1 → ⊤, w2 → ⊤, w3 → ⊤, w4 → ⊥](w) = ⊤
• = ⊤

And then we have:

• ⟦Possibly tigers are reptiles⟧^u
• = ⟦Possibly⟧^u(⟦λw tigers are reptiles⟧^u)
• = (λx_{(w,t)}.∃w x(w) = ⊤)([w1 → ⊥, w2 → ⊥, w3 → ⊥, w4 → ⊥])
• = ∃w [w1 → ⊥, w2 → ⊥, w3 → ⊥, w4 → ⊥](w) = ⊤
• = ⊥
2. Second Approach: We write world-relativization into our basic type
theory as before. t∗ is then the type of functions from worlds to truth
values, and ⟦tigers are striped⟧ = [w1 → ⊤, w2 → ⊤, w3 → ⊤, w4 → ⊥], while
⟦tigers are reptiles⟧ = [w1 → ⊥, w2 → ⊥, w3 → ⊥, w4 → ⊥].

We then have:

• ⟦Possibly⟧ = λx_{t∗}.λv_w.∃u ∈ w x(u) = ⊤

We can then calculate again both ⟦Possibly tigers are striped⟧ and
⟦Possibly tigers are reptiles⟧. First we have:

• ⟦Possibly tigers are striped⟧
• = ⟦Possibly⟧(⟦Tigers are striped⟧)
• = (λx_{t∗}.λv_w.∃u ∈ w x(u) = ⊤)([w1 → ⊤, w2 → ⊤, w3 → ⊤, w4 → ⊥])
• = λv_w.∃u ∈ w [w1 → ⊤, w2 → ⊤, w3 → ⊤, w4 → ⊥](u) = ⊤
• = λv_w.⊤

⟦Possibly tigers are striped⟧ is thus the member of t∗ that maps
every world to the true.

Second we have:

• ⟦Possibly tigers are reptiles⟧
• = ⟦Possibly⟧(⟦Tigers are reptiles⟧)
• = (λx_{t∗}.λv_w.∃u ∈ w x(u) = ⊤)([w1 → ⊥, w2 → ⊥, w3 → ⊥, w4 → ⊥])
• = λv_w.∃u ∈ w [w1 → ⊥, w2 → ⊥, w3 → ⊥, w4 → ⊥](u) = ⊤
• = λv_w.⊥

⟦Possibly tigers are reptiles⟧ is thus the member of t∗ that maps
every world to the false.
Each of these two approaches in its own way makes possibly into an existential
quantifier over worlds. (Note that both semantic values for possibly feature an
existential quantifier ∃ somewhere.)
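A sketch of the same idea in Python (assumed names as before), with the universal quantifier swapped for an existential one:

    # Existential modal: possibly returns the constant t* value obtained by
    # existentially quantifying over worlds.
    WORLDS = ["w1", "w2", "w3", "w4"]

    def possibly_setwise(x):
        # ⟦possibly⟧ = λx_{t*}.λv_w.∃u ∈ w x(u) = ⊤
        return {v: any(x[u] for u in WORLDS) for v in WORLDS}

    striped  = {"w1": True, "w2": True, "w3": True, "w4": False}
    reptiles = {"w1": False, "w2": False, "w3": False, "w4": False}
    print(possibly_setwise(striped))   # every world mapped to True
    print(possibly_setwise(reptiles))  # every world mapped to False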
Do any of the other quantified noun phrases from ((e, t), t) have
analogs among modal expressions? Are there modal terms that are,
in effect, most or few or many quantifiers over worlds?
BG Modal Flavors
Consider these two necessity claims:

1. Bachelors must be unmarried.
2. Everyone must pay their taxes by April 15.
The two claims seem to be attributing different kinds of necessity. The first claim
fits well with the model of the previous section. The distinctive thing about the
claim Bachelors are unmarried is that it is inevitably true. No matter how the
world is, bachelors are unmarried – there’s just no way for things to go that
results in unmarried bachelors.
But the second claim isn’t like that. Everyone pays their taxes by April
15 isn’t an inevitable truth. In fact, it’s unlikely even to be a truth – typi-
cally some people pay their taxes late. So in saying Everyone must pay their
taxes by April 15, we aren’t saying that timely tax-paying is something that
happens no matter how the world is. Rather, we are saying that timely tax-
paying is required by law.
5. Bouletic Necessity: I’ve had all I can take of your rudeness. You
must/need to leave immediately.
6. Capability Necessity: I can’t run a five minute mile/I must take
more than five minutes to run a mile.
That’s not meant to be an exhaustive list – just some indication of the range of
meanings (what are sometimes called modal flavors) that is available for words
like must.
Problem 234: Find three more modal flavors. To identify a modal
flavor, give a sentence using some modal word like must, might,
necessarily, or possibly in which the meaning of the modal word
doesn’t fit any of the categories we’ve considered so far. If it’s
helpful, you can give some surrounding context for the sentence
to help isolate the right meaning. Comment briefly on how your
modal flavor differs from the flavors considered above, and on the
right way to describe the new modal flavor.
Problem 235: The list we’ve given of modal flavors might encourage
the thought that, while the word must can express many flavors of
modality, once we’ve picked a particular sentence containing the
word must, we’ve done enough to fix the modal flavor. To show
that this isn’t true, give a sentence containing must that can be read
with more than one modal flavor. Give a collection of surrounding
contexts that bring out different modal readings of the sentence. See
how many modal flavors you can get from a single sentence.
Tigers must be mammals is true just in case Tigers are mammals is true in
every world. But You must pay your taxes by April 15, understood as an
expression of legal necessity, can’t require for its truth that You pay your taxes
by April 15 is true in every world. It’s true that you must pay your taxes by
April 15. But there are many ways the world can be on which you don’t pay
your taxes by April 15. (Perhaps the actual world is even a world in which you
don’t pay your taxes by April 15.) So the truth of this legal necessity claim can’t
require the truth of You pay your taxes by April 15 in every world.
However, we can retain the core idea that modals like must and necessarily
are universal quantifiers over possible worlds. The truth of a legal necessity
claim of the form Necessarily S doesn’t require the truth of S in every pos-
sible world, but it does require the truth of S in every legally possible world.
A world is legally possible if all of the (actual) laws are obeyed in that world.
Legally possible worlds can be quite different from the actual world – a world
inhabited entirely by talking purple kangaroos, but in which the kangaroos all
drive under the speed limit and pay their taxes on time, is legally possible. One
of the laws requires that taxes be paid by April 15. That law, then, must be
obeyed in a world for that world to count as legally possible. Thus taxes are
paid by April 15 in every legally possible world, which means that it is legally
necessary that taxes be paid by April 15.
w is the type of possible worlds. We can then let wL be the collection of legally
possible worlds. The legally possible worlds are a subset of all of the possible
worlds, so wL ⊆w. We can then give semantic values for both universal and
existential legal modals:
1. Legally necessarily:
(a) Using relativized semantic values:
⟦legally necessarily⟧^u = λx_{(w,t)}.∀w ∈ wL x(w) = ⊤
(b) Using setwise semantic values:
⟦legally necessarily⟧ = λx_{t∗}.λv_w.∀u ∈ wL x(u) = ⊤
2. Legally possibly:
(a) Using relativized semantic values:
⟦legally possibly⟧^u = λx_{(w,t)}.∃w ∈ wL x(w) = ⊤
(b) Using setwise semantic values:
⟦legally possibly⟧ = λx_{t∗}.λv_w.∃u ∈ wL x(u) = ⊤
4. wt is the set of teleologically possible worlds (the set of worlds in which
some relevant goal is achieved). mustt is the necessity modal quantify-
ing universally over wt , and mayt is the possibility modal quantifying
existentially over wt .
5. wb is the set of bouletically possible worlds (the set of worlds in which all
the desires of some relevant person are satisfied). mustb is the necessity
modal quantifying universally over wb , and mayb is the possibility modal
quantifying existentially over wb .
6. wcap is the set of capacity possible worlds (the set of worlds in which every
action performed by some relevant person lies within the capacities of that
person). mustcap is the necessity modal quantifying universally over wcap ,
and maycap is the possibility modal quantifying existentially over wcap .
7. wn is the set of nomologically possible worlds (the set of worlds in which
all of the actual laws of physics (or other sciences) are true). mustn is
the necessity modal quantifying universally over wn , and mayn is the
possibility modal quantifying existentially over wn .
8. wchess is the set of chess possible worlds (the set of worlds in which all of the
rules of chess are followed). mustchess is the necessity modal quantifying
universally over wchess , and maychess is the possibility modal quantifying
existentially over wchess .
On this approach, modal words like must and might are multiply ambiguous.
That is, there are really many words in English that look like must – the words
we’ve identified above as mustep , mustcirc , mustb , and so on. (We also need a
word must for the unrestricted modal that requires truth in all possible worlds
(what we called analytic necessity).)
Proliferating must words in English doesn’t make for a very elegant theory.
Another possibility is to treat must as a context-sensitive expression. Previously
we have modeled contexts as ordered quadruples of the form:

• ⟨speaker, audience, time, place⟩

Now we will expand our contexts so that they also include a set of worlds:

• ⟨speaker, audience, time, place, worlds⟩

We can then give the following context-sensitive semantic values for must and
may:
• ⟦must⟧^c = λx_{(w,t)}.∀w ∈ c(5) x(w) = ⊤
• ⟦may⟧^c = λx_{(w,t)}.∃w ∈ c(5) x(w) = ⊤

When must is used in a context c such that c(5) is the set of worlds compatible
with the relevant body of knowledge, must expresses an epistemic flavor of
modality. When must is used in a context c such that c(5) is the set of worlds
in which all the laws of physics are true, must expresses a nomological flavor
of modality. Because the context controls the modal flavor, we only need a
single word must in the language, rather than separate modals for each modal
flavor.
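Here is a minimal Python sketch of the context-sensitive treatment; the tuple layout and the example world set are illustrative assumptions, with the fifth coordinate playing the role of the text’s c(5):

    # Contexts as 5-tuples whose last coordinate is a set of worlds.
    def must(context):
        # ⟦must⟧^c = λx_{(w,t)}.∀w ∈ c(5) x(w) = ⊤
        worlds = context[4]
        return lambda x: all(x(w) for w in worlds)

    # A hypothetical epistemic context: c(5) = the worlds compatible with
    # what is known, here just w1 and w2.
    c = ("speaker", "audience", "time", "place", {"w1", "w2"})
    pays_on_time = lambda w: w in {"w1", "w2", "w3"}
    print(must(c)(pays_on_time))  # True: every c(5)-world is a paying world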
Problem: Let S be a sentence stating the (actual) rules of chess.
Here are two methods of specifying wchess:

1. wchess is the set of worlds in which the rules stated by S are
obeyed. That is, wchess = {w : ⟦S is obeyed⟧^w = ⊤}.
2. wchess is the set of worlds in which S states the rules of chess.
That is, wchess = {w : ⟦The rules of chess are that S⟧^w = ⊤}.

Consider how these two methods of specifying wchess differ. What
kind of worlds will be in one version of wchess but not the other?
What effect does that difference in worlds have on the truth value of
claims of chess necessity? Which method of specifying wchess gives
a better account of the truth values of chess necessity sentences?
BI Modal Modals
But chess wasn’t always played with rules allowing pawns to move two squares
– this was a change in the rules in the late medieval period. And since chess
wasn’t always played with a two-square pawn movement rule, there are pos-
sible worlds in which chess is not played with that rule. Let @ be the actual
world, the world we actually live in. And let w be a world in which the rules of
chess are different from the actual rules, and permit only single-square moves
by pawns at any point. Then we should have:
1. ~Pawns may move two squares on their first move@ = >
2. ~Pawns may move two squares on their first movew = ⊥
Unfortunately, we can’t get this result using the machinery we’ve developed
so far. We’ll use a simplified tree for Pawns may move two squares on their
first move:
[t May [(w, t) λw [t pawns move two squares on their first move]]]

Suppose we have just three worlds:
1. The actual world @, in which some pawns are moved one square on their
first move and some pawns are moved two squares on their first move.
2. World w, which doesn’t permit two-square moves by pawns, and in which
all pawns are moved one square on their first move.
3. World u, in which some pawns are moved two squares on their first move
and some pawns are moved three squares on their first move.
One thing we can extract from these three worlds is that ||Pawns move two
squares on their first move||={@, u}. (We aren’t worrying here about the
internal details of calculating the truth conditions of Pawns move two squares
on their first move, but we are assuming that those truth conditions are
existential, so that the claim is true relative to a world as long as at least some
pawns in that world move two squares. If we don’t like those existential truth
conditions, we could add a fourth world v in which pawns always move two
squares on their first move.)
We can also use these worlds to say exactly what wchess is. In world u, the (ac-
tual) rules of chess aren’t obeyed, since the (actual) rules of chess don’t allow
pawns ever to move three squares. Thus u ∉ wchess. But in the actual world @, the
rules of chess are obeyed (we idealize slightly here), so @ ∈ wchess. And in world
w, the (actual) rules of chess are also obeyed – the rules of chess say that both
single-square and double-square initial pawn moves are legal, so there’s no vio-
lation of the rules if, in fact, pawns always move just one square. Thus w ∈ wchess.

That gives us wchess = {@, w}. We can now use that specification of wchess to
calculate both ⟦Pawns may move two squares on their first move⟧^@ and
⟦Pawns may move two squares on their first move⟧^w.
• ⟦Pawns may move two squares on their first move⟧^@
• = ⟦may⟧^@(⟦λw pawns move two squares on their first move⟧^@)
• = ⟦may⟧^@([@ → ⊤, w → ⊥, u → ⊤])
• = (λx_{(w,t)}.∃v ∈ wchess x(v) = ⊤)([@ → ⊤, w → ⊥, u → ⊤])
• = (λx_{(w,t)}.∃v ∈ {@, w} x(v) = ⊤)([@ → ⊤, w → ⊥, u → ⊤])
• = ∃v ∈ {@, w} [@ → ⊤, w → ⊥, u → ⊤](v) = ⊤
• = ⊤, since [@ → ⊤, w → ⊥, u → ⊤](@) = ⊤

And:

• ⟦Pawns may move two squares on their first move⟧^w
• = ⟦may⟧^w(⟦λw pawns move two squares on their first move⟧^w)
• = ⟦may⟧^w([@ → ⊤, w → ⊥, u → ⊤])
• = (λx_{(w,t)}.∃v ∈ wchess x(v) = ⊤)([@ → ⊤, w → ⊥, u → ⊤])
• = (λx_{(w,t)}.∃v ∈ {@, w} x(v) = ⊤)([@ → ⊤, w → ⊥, u → ⊤])
• = ∃v ∈ {@, w} [@ → ⊤, w → ⊥, u → ⊤](v) = ⊤
• = ⊤, since [@ → ⊤, w → ⊥, u → ⊤](@) = ⊤
We thus get Pawns may move two squares on their first move coming out
true in both @ and w. But that’s the wrong result – the sentence should be true
in @ and false in w.
It’s not hard to see what has gone wrong. When we calculate both ⟦Pawns
may move two squares on their first move⟧^@ and ⟦Pawns may move two
squares on their first move⟧^w, the world relativization quickly disappears.
That’s because:

1. ⟦May⟧^@ = ⟦May⟧^w
2. ⟦λw pawns move two squares on their first move⟧^@ = ⟦λw pawns move
two squares on their first move⟧^w

So after we pass the first step of calculating by noting that:

• ⟦Pawns may move two squares on their first move⟧^{@/w} = ⟦May⟧^{@/w}(⟦λw
pawns move two squares on their first move⟧^{@/w})

the relativization of semantic value to worlds effectively drops out, and we’re
guaranteed to get the same final truth value relative to @ or relative to w.

We should have ⟦λw pawns move two squares on their first move⟧^@ = ⟦λw
pawns move two squares on their first move⟧^w. The whole point of the
λw expression is to convert a sentence, with its distribution of truth values
relative to worlds, into a (w, t) expression mapping each world to the truth
value of the sentence at that world. ⟦λw pawns move two squares on their
first move⟧ thus depends on the total behavior of pawns move two squares
on their first move at all worlds – it doesn’t matter what world we are rel-
ativizing to, if we’re reporting a global feature of the sentence’s truth values at
all worlds.

So if we’re going to get different truth values for Pawns may move two squares
on their first move relative to @ and w, we need to have ⟦May⟧^@ ≠ ⟦May⟧^w –
we need may to have a genuinely world-sensitive semantic value. The question
then is where to incorporate world-sensitivity. Right now we have:

• ⟦May⟧ = λx_{(w,t)}.∃w ∈ wchess x(w) = ⊤

There aren’t many options in that semantic value for adding some world-
sensitivity.
But there is one good option. The set wchess of chess-possible worlds should
vary depending on the world of evaluation. Recall:
1. In @, the rules of chess allow pawns to move one or two squares on their
first move.
2. In w, the rules of chess require pawns to move exactly one square on their
first move.
@ and w, but not u, obey all the rules of chess according to @. That’s why we
set wchess = {@, w}. But @ doesn’t obey the rules of chess according to w. In w,
the rules prohibit pawns from moving two squares, so @ isn’t a chess-possible
world according to w. From the point of view of w, wchess should be {w}, not
{@, w}. (World u isn’t chess-possible according to either @ or w.)
World-sensitivity, then, enters into the selection of wchess , the relevant set of
worlds for the modal may to quantify over. To build this into our machinery,
we replace the set of worlds wchess with a function gchess that maps worlds into
sets of worlds. Given any input world v, gchess (v) is the set of worlds that are
chess-possible according to v – that is, the set of worlds in which all of the rules
that are rules of chess as it is played in v are obeyed. Thus we have:
1. gchess (@) = {@, w}
2. gchess (w) = {w}
We then change the semantic value of may to make use of this function:
• ⟦May⟧^v = λx_{(w,t)}.∃w ∈ gchess(v) x(w) = ⊤
Let’s re-calculate the truth value of Pawns may move two squares on their
first move relative to both @ and w using this new semantic value for may:
First, relative to @:

• ⟦Pawns may move two squares on their first move⟧^@
• = (λx_{(w,t)}.∃v ∈ gchess(@) x(v) = ⊤)([@ → ⊤, w → ⊥, u → ⊤])
• = (λx_{(w,t)}.∃v ∈ {@, w} x(v) = ⊤)([@ → ⊤, w → ⊥, u → ⊤])
• = ∃v ∈ {@, w} [@ → ⊤, w → ⊥, u → ⊤](v) = ⊤
• = ⊤, since [@ → ⊤, w → ⊥, u → ⊤](@) = ⊤

Second, relative to w:

• ⟦Pawns may move two squares on their first move⟧^w
• = (λx_{(w,t)}.∃v ∈ gchess(w) x(v) = ⊤)([@ → ⊤, w → ⊥, u → ⊤])
• = (λx_{(w,t)}.∃v ∈ {w} x(v) = ⊤)([@ → ⊤, w → ⊥, u → ⊤])
• = ∃v ∈ {w} [@ → ⊤, w → ⊥, u → ⊤](v) = ⊤
• = ⊥, since [@ → ⊤, w → ⊥, u → ⊤](w) = ⊥
And now we have the claim true relative to @ and false relative to w, as desired.
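A sketch of the recalculation in Python; the text leaves gchess(u) unspecified, so the empty set below is an assumption:

    # World-sensitive may: the quantificational domain depends on the
    # evaluation world via the flavor function g_chess.
    g_chess = {"@": {"@", "w"}, "w": {"w"}, "u": set()}  # g_chess(u) assumed

    two_square = {"@": True, "w": False, "u": True}  # pawns move two squares

    def may(v, x):
        # ⟦May⟧^v = λx_{(w,t)}.∃w ∈ g_chess(v) x(w) = ⊤
        return any(x[w] for w in g_chess[v])

    print(may("@", two_square))  # True: @ ∈ g_chess(@) and two_square[@] = ⊤
    print(may("w", two_square))  # False: g_chess(w) = {w}, two_square[w] = ⊥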
Generalizing, we can associate with each modal flavor F a function gF that maps
each world to a set of worlds – the set of worlds that characterize the possibilities
of that flavor according to the input world. Thus we have functions gep , gb , gcirc ,
and so on. We then have a collection of modals in the language, with both
universal and existential modals of each flavor, using the rules:
1. ⟦mustF⟧^v = λx_{(w,t)}.∀w ∈ gF(v) x(w) = ⊤
2. ⟦mightF⟧^v = λx_{(w,t)}.∃w ∈ gF(v) x(w) = ⊤
Pick some modal flavor F, and let must express that modal flavor. Suppose Must
S is true, for some unspecified sentence S. What can we say about Must must S?
Before we made the shift, in the previous section, from modeling modal flavors
using a set wF of worlds to modeling modal flavors using a function gF that
assigns each world its own set of worlds for the flavor, the answer would have
been simple. Pre-shift, we can reason as follows. Suppose Must S is true. As
we observed earlier, when we use wF, Must S isn’t world-sensitive, so ⟦Must
S⟧^w = ⟦Must S⟧^u for any worlds w and u. Since Must S is true at the actual
world, it thus must be true at every world.
But if Must S is true at every world, Must must S will also be true. We can
check the details. We have:
[Must [λw [must [λw S]]]]
Thus:

• ⟦Must λw must λw S⟧^w = ⟦Must⟧^w(⟦λw must λw S⟧^w)
• Because Must S is true relative to every world, ⟦λw must λw S⟧^w is the
function that maps every world to ⊤. For simplicity, assume we have just
three worlds w1, w2, and w3. Then ⟦λw must λw S⟧^w = [w1 → ⊤, w2 → ⊤, w3 → ⊤].
• So ⟦Must λw must λw S⟧^w = ⟦Must⟧^w([w1 → ⊤, w2 → ⊤, w3 → ⊤])
• = (λx_{(w,t)}.∀w ∈ wF x(w) = ⊤)([w1 → ⊤, w2 → ⊤, w3 → ⊤])
• = ∀w ∈ wF [w1 → ⊤, w2 → ⊤, w3 → ⊤](w) = ⊤
• = ⊤

Thus the truth of Must S guarantees the truth of Must must S.
Matters are different with flavored functions. Suppose S is true at w1 and w2
but false at w3, so that ⟦λw S⟧ = [w1 → ⊤, w2 → ⊤, w3 → ⊥], and suppose that:

1. gF(w1) = {w1, w2}
2. gF(w2) = {w2, w3}

(It won’t matter to us what gF(w3) is.)

Then:

1. Must S is true at w1. The truth of Must S at w1 requires the truth of S at
every world in gF(w1). gF(w1) = {w1, w2}, and S is true at both w1 and w2.
2. Must S is false at w2. The truth of Must S at w2 requires the truth of S at
every world in gF(w2). But gF(w2) = {w2, w3}, and S is false in w3.
3. Must must S is false at w1. The truth of Must must S at w1 requires the
truth of Must S at every world in gF(w1). Since gF(w1) = {w1, w2}, and
Must S is (as we’ve just seen) true at w1 but false at w2, we don’t have
Must S true at every world in gF(w1). Thus Must must S is false at w1.
The switch from flavored worlds to flavored functions thus gives us a seman-
tics in which Must S does not imply Must must S. That means that the two
sentences Must S and Must must S don’t mean the same thing, so in our mod-
ified semantics, iterated modals aren’t redundant in meaning – adding another
modal can change the meaning of a sentence.
To get a clearer picture of the semantic effect of using flavored functions, and
how those functions impact the interpretation of iterated modals, let’s consider
another method of implementing the functions idea. Our starting picture as-
sociated with each modal flavor a distinguished set of worlds – the worlds
relevant to that modal flavor. So if type w contains some nine worlds:
[Diagram: the nine worlds w1–w9]

We can single out some of those worlds as the ones relevant to a given modal
flavor:

[Diagram: the nine worlds w1–w9, with the worlds of wF marked]

On the flavored-function picture, each world is instead assigned its own set of
worlds:

[Diagram: the nine worlds w1–w9, with arrows from each world to the worlds
in gF of that world]

Problem 241: Using the previous diagram, specify exactly what set
of worlds gF(w) is for each world w in {w1, . . . , w9}.

An assignment of arrows between worlds like this is an accessibility relation:

[Diagram: the nine worlds w1–w9 with accessibility arrows]
Notice that for any world w, we can find the set of all worlds that w has access
to. As a result, an accessibility relation associates each world with a set of
worlds. That means we can reproduce the effects of a modal flavor function
gF by using an accessibility relation. We get a simple translation procedure in
both directions:
1. Suppose we are given an accessibility relation RF. We can then define a
modal flavor function gF using RF:
• For any world w, gF(w) = {u : wRF u}, the set of worlds that w has
access to under RF.
2. Suppose we are given a modal flavor function gF. We can then define an
accessibility relation RF using gF:
• For any worlds w and u, w bears RF to u just in case u ∈ gF(w). For
example, in our earlier diagram of gF, we have w1 with access to w5
and w5 with access to w8, but w8 does not have access to w1.
As a result, we can easily move back and forth between characterizing modals
with flavor functions and characterizing modals with accessibility relations.
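The translation procedure is mechanical enough to state as a short Python sketch (function names are our own):

    # Converting between accessibility relations (sets of ordered pairs)
    # and modal flavor functions (maps from worlds to sets of worlds).
    def g_from_R(R, worlds):
        # g_F(w) = the set of worlds w has access to under R_F
        return {w: {u for u in worlds if (w, u) in R} for w in worlds}

    def R_from_g(g):
        # w R_F u just in case u ∈ g_F(w)
        return {(w, u) for w in g for u in g[w]}

    g = {"w1": {"w1", "w3"}, "w2": {"w2", "w4"},
         "w3": {"w1", "w3"}, "w4": {"w2", "w4"}}
    assert g_from_R(R_from_g(g), list(g)) == g  # the round trip recovers g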
Problem: For each of the following accessibility relations, give the
corresponding modal flavor function gF.

1. [Diagram: nine worlds w1–w9 with accessibility arrows]
2. [Diagram: four worlds w1–w4 with accessibility arrows]
3. [Diagram: four worlds w1–w4 with accessibility arrows]
4. [Diagram: nine worlds w1–w9 with accessibility arrows]
Problem 245: For each of the following modal flavor functions, give
a corresponding accessibility relation.
1. • gF (w1 ) = {w1 , w3 }
• gF (w2 ) = {w2 , w4 }
• gF (w3 ) = {w1 , w3 }
• gF (w4 ) = {w2 , w4 }
2. • gF (w1) = {w2}
• gF (w2) = {w1, w2, w3, w4, w5}
• gF (w3 ) = {w1 , w3 , w5 }
• gF (w4 ) = ∅
• gF (w5 ) = {w2 , w4 }
3. • gF (w1 ) = {w2 , w3 , w4 , w5 , w6 }
• gF (w2 ) = {w1 , w3 , w4 , w5 , w6 }
• gF (w3 ) = {w1 , w2 , w4 , w5 , w6 }
• gF (w4 ) = {w1 , w2 , w3 , w5 , w6 }
• gF (w5 ) = {w1 , w2 , w3 , w4 , w6 }
• gF (w6 ) = {w1 , w2 , w3 , w4 , w5 }
4. • gF (w1 ) = ∅
• gF (w2 ) = {w2 , w9 }
• gF (w3 ) = ∅
• gF (w4 ) = {w1 , w8 , w9 }
• gF (w5 ) = {w5 , w7 , w9 }
• gF (w6 ) = ∅
• gF (w7 ) = {w2 , w4 , w6 , w8 , w9 }
• gF (w8 ) = {w1 , w3 , w5 , w9 }
• gF (w9 ) = {w9 }
Problem 247: Suppose again that w contains ten worlds. How
many different accessibility relations are there on w? (An accessi-
bility relation is a binary relation on w, which means that it is a set of
ordered pairs of members of w. How many ordered pairs of mem-
bers of w are there? How can this number be used to determine the
total number of accessibility relations?)
We can now give revised semantic values for (flavored) modals using an acces-
sibility relation rather than a modal flavor function:

• ⟦mustF⟧^v = λx_{(w,t)}.∀w(vRF w → x(w) = ⊤)
• ⟦mightF⟧^v = λx_{(w,t)}.∃w(vRF w ∧ x(w) = ⊤)
6. ⟦Might might C⟧^{abc}
7. ⟦Must (might A or might B)⟧^{ABC}
8. ⟦Must not A⟧^{aBC}
9. ⟦Not must A⟧^{aBC}
10. ⟦Must not might not must not A⟧^{ABC}

Consider the following accessibility relation:

[Diagram: two worlds, A and a, with an arrow from a to A]

The world in which A is false has access to the world in which A is true,
but not vice versa, and neither world has access to itself. Then we can
observe:
• ⟦A⟧^a = ⊥, because world a is, by definition, a world making A false.
• ⟦Must A⟧^a = ⊤, because world a has access only to world A, and
⟦A⟧^A = ⊤.

But if Must A is true and A is false at world a, Must A does not imply A. So
with this accessibility relation, the argument:

• Must A; therefore A

is invalid.

Now consider instead:

[Diagram: two worlds, A and a; a has access to A, and each world also has
access to itself]

The world in which A is false has access to the world in which A is true,
and not vice versa. In addition, each world has access to itself. Then we
observe:

• ⟦A⟧^a = ⊥, as before.
• ⟦Must A⟧^a = ⊥, because world a now has access to itself, and ⟦A⟧^a = ⊥.
• ⟦Must A⟧^A = ⊤ and ⟦A⟧^A = ⊤.

So there is no world making Must A true and A false, and with this accessibility
relation the argument is valid.

The argument:

• Must A; therefore A

is thus valid for some accessibility relations and invalid for other accessibility rela-
tions. But we can do better than this – we can say what it is about an accessibility
relation that makes the argument valid.
Suppose that RF is reflexive, and that Must A is true at some world v. Then A
is true at every world accessible from v. But since RF is reflexive, v is acces-
sible from v. Thus A is true at v. As a result, every world that makes Must A
true also makes A true. This shows that (when RF is reflexive), Must A implies A.
Now suppose that RF is not reflexive. Then there is some world v that does
not have access to itself. Now let A be false at v and true at every world other
than v. Must A is then true at v, because A is true at every world accessible
from v. (The worlds accessible from v do not include v, and A is true at all other
worlds.) Thus v is a world making Must A true and A false. Because there is
such a world, Must A does not imply A.
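Since the frames here are small, the correspondence can even be checked by brute force. The following Python sketch (our own construction, not from the text) enumerates every accessibility relation on a two-world frame and confirms that Must A implies A at every world, under every valuation, exactly when the relation is reflexive:

    from itertools import product

    WORLDS = ["a", "b"]

    def must(R, A, w):
        # Must A is true at w iff A is true at every world w accesses
        return all(A[u] for u in WORLDS if (w, u) in R)

    def validates_T(R):
        # Is "Must A; therefore A" truth-preserving at every world,
        # for every valuation of A?
        for values in product([True, False], repeat=len(WORLDS)):
            A = dict(zip(WORLDS, values))
            if any(must(R, A, w) and not A[w] for w in WORLDS):
                return False
        return True

    pairs = [(w, u) for w in WORLDS for u in WORLDS]
    for bits in product([False, True], repeat=len(pairs)):
        R = {p for p, keep in zip(pairs, bits) if keep}
        reflexive = all((w, w) in R for w in WORLDS)
        assert validates_T(R) == reflexive  # holds for all 16 relations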
An accessibility relation RF is serial if every world has access to at least one
world:

• RF is serial: for every world w, there is some world u such that wRF u.

The following accessibility relation is serial:

[Diagram: four worlds; three of them form a cycle of accessibility, and the
fourth has access to itself]

Each world has access to another – three of the worlds form a cycle of
accessibility, and the fourth world has access to itself.

The following accessibility relation is not serial:

[Diagram: four worlds; the lower right world has no outgoing arrows]

The lower right world does not have access to any world.

Suppose RF is serial, and suppose MustF A is true at some world w. Then A
is true at every world accessible from w. Since RF is serial, there is at least one
world accessible from w, so A is true at some accessible world, and MightF A
is true at w. Thus when RF is serial, MustF A implies MightF A.
Now suppose RF is not serial. Then there is some world w that does not
have access to any world (including itself). MustF A is trivially true at w
– the truth of MustF A at a world requires the truth of A at all accessible
worlds, and if there are no accessible worlds, then trivially they all make
A true. But MightF A is false at w. The truth of MightF A at a world
requires the truth of A at some accessible world, so if a world has access to
no worlds, MightF A cannot be true at that world. So if RF is not serial,
there is a world at which MustF A is true and MightF A is false, showing
that MustF A does not imply MightF A.
(c) Might A
(d) Might B
Problem 251: A modal conflict is a situation in which two
sentences of the following form:
• MustF A
• Not mustF A
are both true. Show that if the accessibility relation RF for the
modal mustF is serial, then there can be no modal conflicts. Con-
sider five different modal flavors and give your best judgment
for each about whether modal conflicts are in fact possible for
those flavors.
Problem 252: Prove that if an accessibility relation RF is reflex-
ive, it is also serial. Can there be an accessibility relation that is
serial but not reflexive?
Recall that:
• If RF is reflexive, then the inference from MustF A to A is
valid.
• If RF is serial, then the inference from MustF A to MightF A
is valid.
Show that the validity of the inference from MustF A to A implies
the validity of the inference from MustF A to MightF A.
2. Transitivity: An accessibility relation RF is transitive if, whenever a first
world has access to a second and a second has access to the third, then
the first has access to the third:
• RF is transitive: for all worlds u, v, and w, if uRF v and vRF w, then
uRF w.
The following accessibility relation is transitive:

[Diagram: a transitive accessibility relation on four worlds in a row]

And the following accessibility relation is not transitive:

[Diagram: a non-transitive accessibility relation on four worlds in a row]

The second accessibility relation is not transitive because the second world
from the left has access to the third world from the left, and the third world
from the left has access to the fourth world from the left, but the second
world from the left does not have access to the fourth world from the left.

A modal mustF makes valid the inference from MustF A to MustF mustF
A if and only if its accessibility relation RF is transitive. Suppose RF is
transitive, and suppose MustF A is true at a world u. Consider any worlds v
and w with uRF v and vRF w. By transitivity, uRF w, so A is true at w. Thus
MustF A is true at every world v that u has access to, and so MustF mustF A
is true at u. This shows that when RF is transitive, MustF A implies MustF
mustF A.
Now suppose that RF is not transitive. Then there are three worlds u, v,
and w, such that uRF v, vRF w, but not uRF w. Suppose A is false at w, and
true at every other world. Because u does not have access to w, A is then
true at every world that u has access to. Therefore MustF A is true at u. But
because v does have access to w, and A is false at w, MustF A is false at v.
And because u has access to v, MustF mustF A is false at u. Therefore u is
a world at which MustF A is true and MustF mustF A is false. This shows
that when RF is not transitive, MustF A does not imply MustF mustF A.
3. Symmetry: An accessibility relation RF is symmetric if, whenever a first
world has access to a second, the second also has access to the first:

• RF is symmetric: for all worlds u and v, if uRF v, then vRF u.

The following accessibility relation is symmetric:

[Diagram: a symmetric accessibility relation, with every arrow matched by
an arrow in the reverse direction]

A modal mustF makes valid the inference from A to MustF mightF A if
and only if its accessibility relation RF is symmetric. Suppose RF is sym-
metric, and suppose A is true at a world u. Consider any world v with
uRF v. By symmetry, vRF u, so v has access to a world making A true, and
MightF A is true at v. Thus MightF A is true at every world u has access
to, and MustF mightF A is true at u.
Now suppose RF is not symmetric. Then there are are world u and v such
that uRF v, but not vRF u. Let A be true at u, but false at every other world.
Because v does not have access to u, v does not have access to any world
at which A is true. Thus MightF A is false at v. But since u has access to v, u
has access to a world at which MightF A is false. Therefore MustF mightF
A is false at u. So there is a world making A true and MustF mightF A
false, showing that A does not imply MustF mightF A.
• RF is dense: for any worlds u and v, if uRF v, then there is a
world w such that uRF w and wRF v.
Find an argument whose validity is tied to the density of the ac-
cessibility relation (so that the argument is valid if the accessibility
relation is dense and invalid if the accessibility relation is not dense).
BL A Family of Modals
Recall that any accessibility relation that is reflexive is also serial. If every
world has access to itself, then every world certainly has access to some world.
Therefore we have:

• Must A ⊨ A

valid in T. Contraposing, the truth of Not A guarantees the truth of Not must
A; and Not must A is equivalent to Might not A. So T also validates:

• Not A ⊨ Might not A

But A here is an arbitrary sentence, so we can replace A with Not A to get:

• Not not A ⊨ Might not not A

Since Not not A is equivalent to A, we can simplify this to:

• A ⊨ Might A

Thus T gives us both of:

• Must A ⊨ A
• A ⊨ Might A

Combining these, we get:

• Must A ⊨ Might A

This is the characteristic logical feature of D, so T proves everything that
D does. Since T also allows the inference from Must A to A, which D does
not, T is logically stronger than D.
KB, on the other hand, is not a logical strengthening of D. (Nor is D a logical
strengthening of KB – the two systems are logically incomparable.) There are
accessibility relations that are serial but not symmetric, such as:
[Diagram: a serial but not symmetric accessibility relation on two worlds]
And there are accessibility relations that are symmetric but not serial, such as:
[Diagram: a symmetric but not serial accessibility relation on two worlds]
We can give a graph of the logical relations among the modal sys-
tems we’ve been considering:
[Diagram: the graph of systems, with D, KB, and K4 each sitting above K]
We can also create additional modal systems by imposing more than one con-
straint on the accessibility relation. Suppose, for example, we require the
accessibility relation to be both reflexive and symmetric. Then we get a modal
that validates both of the inferences:
• Must A ⊨ A
• A ⊨ Must might A
(The resulting modal system is in fact exactly characterized by adding these
two validities, but showing this is non-trivial, and we won’t pursue that here.)
The modal system created by an accessibility relation that is both reflexive and
symmetric is called B. It is a strengthening of both T and KB, because it imposes
the accessibility constraints of both of those systems. So we can add to our graph
of systems:

[Diagram: the graph of systems, with B above T and KB; T above D; and D,
KB, and K4 above K]
We have considered four different structural features of the accessibility rela-
tion: seriality, reflexivity, symmetry, and transitivity. That means there are a
total of sixteen possible combinations of those features. However, we’ve al-
ready seen that any accessibility relation that is reflexive is also serial, so we
can’t have combinations of features that are +reflexive and -serial. That leaves
twelve combinations:
1. -serial, -reflexive, -symmetric, -transitive: With no constraints on the
accessibility relation, we get the modal system K. (Note that -serial means
that we do not require the accessibility relation to be serial, rather than that
the accessibility relation is not serial.)
2. -serial, -reflexive, -symmetric, +transitive: Requiring only transitivity, we
get the modal system K4.
3. -serial, -reflexive, +symmetric, -transitive: Requiring only symmetry, we
get the modal system KB.
4. -serial, -reflexive, +symmetric, +transitive: Requiring both symmetry and
transitivity, we get the modal system called KBE. (We’ll say more about
this system and its name soon.)
5. +serial, -reflexive, -symmetric, -transitive: Requiring only seriality, we
get the modal system D.
6. +serial, -reflexive, -symmetric, +transitive: Requiring both seriality and
transitivity, we get the modal system KD4.
7. +serial, -reflexive, +symmetric, -transitive: Requiring both seriality and
symmetry, we get the modal system KDB.
8. +serial, -reflexive, +symmetric, +transitive: This combination turns out
to be logically impossible.
9. +serial, +reflexive, -symmetric, -transitive: Requiring reflexivity (and
hence seriality), we get the modal system T.
10. +serial, +reflexive, -symmetric, +transitive: Requiring reflexivity and
transitivity, we get the modal system S4.
11. +serial, +reflexive, +symmetric, -transitive: Requiring reflexivity and
symmetry, we get the modal system B.
12. +serial, +reflexive, +symmetric, +transitive: Requiring reflexivity, sym-
metry, and transitivity, we get the modal system S5.

[Diagram: the graph of systems, from top to bottom: S5; then S4, B, KBE;
then T, KDB, KD4; then D, KB, K4]
BM S5
The strongest of the modal systems in our graph is the system S5, characterized
by an accessibility relation that is reflexive, symmetric, and transitive. A relation
that is reflexive, symmetric, and transitive is an equivalence relation. Consider
some examples of equivalence relations:
1. [Diagram: nine worlds, each having access only to itself]
2. [Diagram: nine worlds in four clusters – two clusters of three, one of two,
and one singleton – with every world having access to every world in its
cluster]
3. [Diagram: nine worlds forming a single cluster, with every world having
access to every world]
Notice that each of the above equivalence relations splits up the collection of
worlds into distinct clusters, so that within each cluster every world has access
to every other world. In the first example, each world is in its own cluster,
having access only to itself. In the second example, there are four clusters of
worlds – two clusters of three worlds each, one of two worlds, and one with a
single world. And in the third example, all of the worlds form a single cluster,
with every world having access to every other world.
[Three further diagrams of relations on nine worlds]

Equivalence relations correspond exactly to partitions – divisions of a set into
non-overlapping, exhaustive subsets:

1. Given an equivalence relation R on some set S, we can produce a partition
Π of S whose elements are the equivalence classes of R: for each a ∈ S, the
set {b ∈ S : aRb} is an element of Π.
2. Given a partition Π on some set S, we can produce an equivalence relation
R that relates two items just in case they are in the same partition element.
Thus aRb if and only if ∃π ∈ Π, a, b ∈ π.
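Both directions are easy to mechanize; here is a Python sketch using the six-element set of the next problem (helper names are ours):

    # Converting between partitions and equivalence relations.
    def relation_from_partition(partition):
        # a R b iff some partition element contains both a and b
        return {(a, b) for block in partition for a in block for b in block}

    def partition_from_relation(R, items):
        # Each item's equivalence class; duplicate classes collapse in the set
        return {frozenset(b for b in items if (a, b) in R) for a in items}

    P = [{"a", "e", "f"}, {"b"}, {"c", "d"}]
    R = relation_from_partition(P)
    print(sorted(map(sorted, partition_from_relation(R, "abcdef"))))
    # [['a', 'e', 'f'], ['b'], ['c', 'd']] -- the original partition back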
Problem 263: Let S be the set {a, b, c, d, e, f }. For each of the following
partitions of S, find the equivalence relation determined by that
partition.
1. {a, e, f }, {b}, {c, d}
2. {a, b, c, d, e, f }
3. {a}, {b}, {c}, {d}, {e}, { f }
4. {a, b, c, f }, {d, e}
For each of the following equivalence relations, find the partition
determined by that equivalence relation:
1. [Diagram: an equivalence relation on {a, b, c, d, e, f}]
2. [Diagram: an equivalence relation on {a, b, c, d, e, f}]
3. [Diagram: an equivalence relation on {a, b, c, d, e, f}]
The S5 modal logic is particularly simple, because worlds all have access to
each other (within a given cluster of worlds). As a result, in S5 there are no
interesting effects from iterated modals. In S5, all of the following are equivalent:
• Must A
• Must must A
• Might must A
• Must might must A
• Might might must must might must A
If Must A is true at a world w, then A is true at every world in its cluster. But
then every world in w’s cluster has access only to worlds in that cluster, so Must
A is true at every world in the cluster, so both Might must A and Must must
A are true at w. Since all the worlds in the cluster are symmetrically arranged
in accessibility, if Might must A and Must must A are true at w, they is true at
every world in the cluster. Therefore Must might must A, Must must must A,
Might might must A, and Might must must A are all true at w. And so on –
if Must A is true at w, then any sequence of must and might, ending in must,
applied to A is true at w. So there’s no real effect of adding more modals once
an initial must has been added. Similarly with might – if Might A is true at w,
then any sequence of must and might, ending in might, applied to A is true at
w.
One other constraint on the accessibility relation that is useful to consider when
thinking about S5 is euclideanness. An accessibility relation RF is euclidean if,
whenever a world has access to two worlds, those two worlds have access to
one another:

• RF is euclidean: for all worlds w, u, and v, if wRF u and wRF v, then uRF v
and vRF u.

The following accessibility relation is euclidean:

[Diagram: four worlds with a euclidean accessibility relation]
1. Suppose RF is symmetric and euclidean. We want to show that RF then
must be transitive. So consider some three worlds u, v, and w such that
uRF v and vRF w:
By symmetry, since uRF v, we also have vRF u. So v has access both to u and
to w, and by euclideanness it follows that uRF w and wRF u.
So, in particular, when u has access to v and v has access to w, u has access
to w. Thus RF is transitive.
2. Suppose RF is serial, symmetric, and transitive. We want to show that RF
must then be reflexive. Consider an arbitrary world v. By seriality, v has
access to some world w, and by symmetry, w has access back to v. Likewise
for any other world u with access to w:

[Diagram: worlds u and v each linked to w in both directions]
Since u has access to w and w has access to v, by transitivity, u has access
to v. Similarly, since v has access to w and w has access to u, v has access
to u:
[Diagram: worlds u and v with the derived arrows between them]

And since v has access to w and w has access back to v, transitivity gives v
access to v itself:
So our arbitrary world has access to itself, showing that the accessibility relation
must be reflexive. (By the same reasoning, we could show that u must also have
access to itself.)
Problem 264: Show that any equivalence relation must be serial,
symmetric, and euclidean (thus showing that the requirements (i)
reflexive, symmetric, and transitive, and (ii) serial, symmetric, and
euclidean are equivalent to each other).
[Diagram: the graph of systems, from top to bottom: S5; then S4, B, KBE;
then T, KDB, KD4; then D, KB, K4, KE]
We have added the modal system KE, which requires only that the accessibility
relation be euclidean. Notice that the first modal system that is above all of D
(serial), KB (symmetry), and KE (euclidean) is S5.
Our default assumption will be that modals are S5 modals, but we’ll also
consider cases where S5 doesn’t look like the right system.
BN Conditionals

We’ve given tools already for a simple treatment of a conditional sentence such
as Plato laughs if Socrates cries. We start with a syntactic analysis, either
binary branching:

[t [t Plato laughs] [(t, t) [(t, (t, t)) if] [t Socrates cries]]]

or trinary branching:

[t [(⟨t, t⟩, t) if] [t Socrates cries] [t Plato laughs]]
That analysis has some nice features. Most noticeably, it explains the validity
of the rule of modus ponens:

• A; if A, B ⊨ B

But the truth-functional analysis also has some less attractive logical features:

1. We have:

• not A ⊨ if A, B

or equivalently, any conditional with a false antecedent is true. But con-
ditionals with false antecedents don’t always appear true. Conditionals
known to have false antecedents typically strike people as bizarre and
inappropriate. But they rarely strike people as straightforwardly true, as
would be predicted by the truth-functional semantics for if. A better
semantics for if would allow conditionals with false antecedents to be
something other than true.
2. We have:

• B ⊨ if A, B

or equivalently, any conditional with a true consequent is true. But con-
ditionals with true consequents don’t always appear true.

3. We also have:

• not(if A, B) ⊨ A and not B

That is, on the truth-functional analysis, denying a conditional commits
one to the truth of its antecedent. Suppose Alex tells Beth:
examples:
• If you have the bubonic plague, your left big toe glows purple.
So let’s check your toe.
Beth, understandably, says:
• What? No, that’s not true.
Alex replies:

• So you have the bubonic plague!
Alex’s reply looks absurd, but if not(if A, B) does imply A and not
B, then Beth’s denial of If you have the bubonic plague, your left
big toe glows purple implies You have the bubonic plague and your
left big toe does not glow purple, and hence implies You have the
bubonic plague. Since that looks wrong, we would prefer a semantics
for the conditional that didn’t have the result that rejecting a conditional
requires affirming the antecedent of the conditional.
4. We have:

• ⊨ (If A, B) or (if B, C)

That is, given any three sentences A, B, and C, if there isn’t a conditional
connection from the first to the second, then there is a conditional con-
nection from the second to the third.
But again, there are examples that don’t seem to conform to this logical
feature.
(a) Consider the sentences It is raining, The streets are dry, and
Cars skid easily. Then we get the disjunction:
• Either (if it is raining then the streets are dry) or (if
the streets are dry, then cars skid easily).
But neither disjunct of this disjunction looks true.
(b) Consider the sentences Number N is prime, N is divisible by
4, and N is odd. Then we get the disjunction:
• Either (if N is prime, then N is divisible by 4) or (if
N is divisible by 4, then N is odd.)
But again, neither disjunct looks true.
So (If A, B) or (if B, C) does not appear to be a logical truth, and
it would be nice to have a semantics for if that didn’t make it a logical
truth.
5. We have:

• If A and B, C ⊨ (if A, C) or (if B, C)

But many examples seem to contradict this inference pattern. Smith is
attempting to defuse a bomb, and is told:

• If you cut the red wire and the green wire, the detonator
will be disconnected.

It surely doesn’t follow that either cutting the red wire alone, or cutting
the green wire alone, is enough to disconnect the detonator.
BO Modal Conditionals
We can improve our semantics for conditionals by using possible worlds re-
sources. The truth-functional semantic value for if makes ⟦if A, B⟧ depend
only on the truth values of A and B at the actual world, not on their truth values
at other possible worlds. But we can instead give a modal implementation for
if, by requiring a conditional to have the truth values of antecedent and con-
sequent properly coordinated at all worlds.

Implemented this way, if takes inputs of type (w, t). So, as with the world-
relativized implementation of modals, we’ll need intervening λ operators to
convert the world-relativized t values of sentences into (w, t) values. Thus
we’ll think of If A, B as having the full form if (λw A), (λw B).

Alternatively, we can use set-wise semantic values in giving a modal analysis
of if. Done this way, we have:

• ⟦if⟧ = λy_{(w,t)}.λz_{(w,t)}.λv_w.∀u ∈ w(y(u) = ⊥ ∨ z(u) = ⊤)

On either version of the modal analysis, the basic idea is that If A, B is true
(at a world w) if every world that makes A true also makes B true. This is then
equivalent to the requirement that ||A|| ⊆ ||B||.
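Since the strict conditional amounts to a subset requirement, a short Python sketch captures it (the two-world model is an assumption matching the counterexample discussed below):

    # Strict conditional as a subset test: If A, B is true iff ||A|| ⊆ ||B||.
    A = {"u", "v"}  # ||A||: the worlds where A is true
    B = {"u"}       # ||B||: the worlds where B is true

    def strict_if(antecedent, consequent):
        return antecedent <= consequent  # subset test

    print(strict_if(A, B))  # False: v makes A true but B false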
Problem 267: Let’s use the symbol ⊃ for the truth-functional version
of if, often called the material conditional, and reserve if for the
modal strict conditional analysis. Show that the following two
constructions are equivalent:
• Must(⊃ A, B)
• if A, B
(In order to get the equivalence to work out, we need to say some-
thing about the accessibility relation used for must. What accessi-
bility relation is needed?)
The strict conditional avoids many of the problems we encountered with the
truth-functional version of the conditional. For example, we do not have:

• B ⊨ If A, B

To see this, suppose we have two worlds u and v. At world u, A and B are both
true. At world v, A is true and B is false. Then ||A|| ⊈ ||B||: not every world
that makes A true makes B true. Thus If A, B is false. So at world u, B is true
and If A, B is false. This shows that B does not imply If A, B.
The simple modal implementation we’ve just given doesn’t make any use of
an accessibility relation. If we want to add a role for the accessibility relation,
we can say:
• ⟦if⟧^w = λx_{(w,t)}.λy_{(w,t)}.∀u ∈ w(wRu → (x(u) = ⊥ ∨ y(u) = ⊤))
This semantic value for if has the effect that If A, B is true at a world w just
in case every world accessible from w that makes A true also makes B true.
Once we add a role for the accessibility relation, we can ask what effect the
choice of accessibility relation, and thus the choice of modal logic, has on the
behavior of the conditional. As it turns out, the answer is: not very much. The
most important question is whether the accessibility relation is reflexive. If R
is reflexive, then modus ponens remains valid: if A and If A, B are both true
at w, then since wRw, w itself is an accessible world making A true, so it must
make B true as well.

Now suppose R is not reflexive. Let w be a world that does not have
access to itself. Let A be true at w and false at every other world,
and let B be false at w. Then If A, B is true at w, because A is false
at every world accessible from w. So at w, A is true and If A, B is
true, but B is false. This shows that A together with If A, B does
not imply B.
BP Subsentential Modality
We’ve been simplifying the addition of possible worlds to our semantic machin-
ery thus far by focusing only on what happens at the level of entire sentences.
It’s time to work out a possible worlds version of the semantic values for sub-
sentential expressions. Consider the sentence Aristotle laughs. Suppose
that we have four possible worlds w1, w2, w3, and w4, and that ||Aristotle
laughs|| = {w1, w2}. That is, Aristotle laughs is true relative to w1 and w2,
and false relative to w3 and w4. How can we give semantic values for Aristotle
and laughs that produce this result?
" #
Aristotle → ⊥
3. ~Aristotle laughs w3
= ~laughs (~Aristotle ) =
w3 w3
(Aristotle)
Plato → >
=⊥
" #
Aristotle → ⊥
4. ~Aristotle laughs w4
= ~laughs (~Aristotle ) =
w4 w4
(Aristotle)
Plato → ⊥
=⊥
So we get, as desired, that Aristotle laughs is true in w1 and w2 but false in
w3 and w4 , and hence that ||Aristotle laughs|| = {w1 , w2 }.
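A Python sketch of this option, with laughs taking a world argument and Aristotle denoting its bearer at every world (the laughs_at table encodes the four-world facts above):

    # World-relativized subsentential values.
    laughs_at = {"w1": {"Aristotle", "Plato"}, "w2": {"Aristotle"},
                 "w3": {"Plato"}, "w4": set()}

    def laughs(w):
        # ⟦laughs⟧^w = λx.x laughs in w
        return lambda x: x in laughs_at[w]

    def aristotle(w):
        # names pick out their bearers with respect to all worlds
        return "Aristotle"

    print([w for w in laughs_at if laughs(w)(aristotle(w))])  # ['w1', 'w2']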
3. We can convert a world into a canonical description, and then use con-
ditions that have that canonical description as an antecedent. Suppose
again we have our four worlds:
(a) In w1 , Aristotle laughs and Plato laughs.
(b) In w2 , Aristotle laughs and Plato doesn’t laugh.
(c) In w3 , Aristotle doesn’t laugh and Plato laughs.
(d) In w4 , Aristotle doesn’t laugh and Plato doesn’t laugh.
And suppose for simplicity that these features of the worlds fully describe
them. Then instead of saying λx.x laughs in w1, we can say:

(a) ⟦laughs⟧^{w1} = λx.If D(w1), then x laughs = λx.If Aristotle laughs and
Plato laughs, then x laughs.
(b) ⟦laughs⟧^{w2} = λx.If D(w2), then x laughs = λx.If Aristotle laughs and
Plato doesn’t laugh, then x laughs.
Giving the world-relative semantic value of laughs in this way doesn’t
require us to understand any esoteric world-specific terminology (like
laughing in a world or world w1 being actual), but it does require us to have
a canonical method for associating each world with a description of how
things are in that world.
We’ll primarily use the first of these options, adding argument places for worlds
to ordinary English verbs. Thus we can calculate:
• ⟦Aristotle laughs⟧^w = ⟦laughs⟧^w(⟦Aristotle⟧^w)
• = ⟦laughs⟧^w(Aristotle)
• = (λx.x laughs in w)(Aristotle)
• = Aristotle laughs in w
Now let’s try two more complicated examples:
1. Consider a sentence with a transitive verb:
• Plato admires Socrates
World-relativized semantic values can be given for Plato and Socrates
in the same way that we handled Aristotle above:

• ⟦Plato⟧^w = Plato
• ⟦Socrates⟧^w = Socrates

This is part of a general strategy of treating names as picking out their
bearers with respect to all worlds. For admires, we world-relativize along
the same lines as we did laughs:

• ⟦admires⟧^w = λx.λy.y admires x in w

Composing, we have:

• ⟦admires Socrates⟧^w = (λx.λy.y admires x in w)(Socrates) = λy.y admires Socrates in w
• ⟦Plato admires Socrates⟧^w = (λy.y admires Socrates in w)(Plato) = Plato admires Socrates in w

We thus discover that the semantic value of Plato admires Socrates is
⊤ relative to worlds in which Plato admires Socrates, and ⊥ relative to
worlds in which Plato does not admire Socrates.
2. Now let’s add a modal operator to the mix. Consider:
• Might(Aristotle admires Plato)
We start with a tree for the sentence (notice that we must add a λw-abstract
to the tree to allow for the modal to interact properly):

[Might [λw [Aristotle [admires Plato]]]]

As before:

• ⟦admires Plato⟧^w = λy.y admires Plato in w
• ⟦Aristotle admires Plato⟧^w = Aristotle admires Plato in w
• ⟦λw Aristotle admires Plato⟧ = λu.Aristotle admires Plato in u

Adding this to the tree and applying might, we have:

• ⟦Might λw Aristotle admires Plato⟧^w = ∃u ∈ w(wRu ∧ Aristotle admires Plato in u)
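A Python sketch of the modalized transitive-verb calculation; the two-world model and the total accessibility relation are assumptions for illustration:

    # might over a transitive-verb sentence.
    WORLDS = {"w1", "w2"}
    R = {(w, u) for w in WORLDS for u in WORLDS}     # assumed accessibility
    admires_at = {"w1": {("Aristotle", "Plato")}, "w2": set()}

    def admires(w):
        # ⟦admires⟧^w = λx.λy.y admires x in w
        return lambda x: lambda y: (y, x) in admires_at[w]

    def might(w, x):
        # ⟦might⟧^w = λx_{(w,t)}.∃u ∈ w(wRu ∧ x(u) = ⊤)
        return any((w, u) in R and x(u) for u in WORLDS)

    sentence = lambda u: admires(u)("Plato")("Aristotle")  # the λw-abstract
    print(might("w2", sentence))  # True: w2 accesses w1, where it holds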
2. Second Scenario: It is 1471. The War of the Roses has been raging for
years, and Henry VI and Edward IV have been alternating periods as
king. You’ve just seen Edward kill Henry, so you know that Edward
is alive and Henry is dead. However, you haven’t been tracking all the
political turmoil of the War of the Roses, so you aren’t sure which of Henry
and Edward was king at the time of the killing. You’re thus uncertain
whether the king is alive or dead. (Let’s assume that if Henry was king,
Edward doesn’t become king until some future coronation, so that Henry
remains the (dead) king.) You thus endorse The king might be dead.
The king might be dead is true in both of these scenarios, but it is true in
different ways in each scenario. In the first scenario, there is certainty about
who is king but doubt about whether that person is alive or dead – it’s doubt
about vitality that makes the might claim true. But in the second scenario, there
is certainty about who is alive and who is dead, but doubt about who is king –
it’s doubt about nobility that makes the might claim true.
Let’s start with the tree on which might takes widest scope:

[Might [λw [[the king] [is dead]]]]

(For simplicity we assume that is dead is a single intransitive verb, rather than
forming it from an adjective dead and an is of predication.) We already have
world-relativized semantic values available for most of the pieces:

• ⟦might⟧^w = λx_{(w,t)}.∃u ∈ w(wRu ∧ x(u) = ⊤)
• ⟦is dead⟧^w = λx.x is dead in w
• ⟦the⟧^w = λx_{(e,t)}.λy_{(e,t)}.∃z ∈ e(x(z) = ⊤ ∧ ∀u(x(u) = ⊤ ↔ u = z) ∧ y(z) = ⊤)
• ⟦king⟧^w = λx.x is king in w

Composing the king, we get:

• ⟦the king⟧^w = λy_{(e,t)}.∃z ∈ e(z is a king in w ∧ ∀u(u is a king in w ↔ u = z) ∧ y(z) = ⊤)

So we have:

• ⟦the king is dead⟧^w = ∃z ∈ e(z is a king in w ∧ ∀u(u is a king in w ↔ u = z) ∧ z is dead in w)
• ⟦λw the king is dead⟧ = λw.∃z ∈ e(z is a king in w ∧ ∀u(u is a king in w ↔ u = z) ∧ z is dead in w)

And finally, applying might:

• ⟦Might λw the king is dead⟧^w = ∃u ∈ w(wRu ∧ ∃z ∈ e(z is a king in u ∧ ∀y(y is a king in u ↔ y = z) ∧ z is dead in u))
Scenario 1 and Scenario 2 above correspond to what look like two different
scope readings of Might(the king is dead). On one reading (the Scenario 2
reading), the modal might has scope over the definite description the king.
On this reading we first pick a world w and then second pick out whoever is
king in that world w, and then check whether that person is dead or alive in w.
On another reading (the Scenario 1 reading), the definite description the king
has scope over the modal might. On this reading, we first pick out whoever is
king (in the actual world), and then pick out a world w and see whether that
person is dead or alive in w.
The tree we considered above gives might scope over the king. (We can see this
from the structure of the tree, since might c-commands the king, but not vice
versa.) And that tree, as we’ve seen, produces truth conditions for Might(the
king is dead) that are suitable for Scenario 2. To capture the other scope op-
tion, we need a different tree. To get this other tree, we assume that the king
is able to move above might to the top of the tree, producing:
[[the king] [λ1 [might [λw [x1 is dead]]]]]
Problem 273: We might also want a tree for the other scope reading
that uses movement, so that the king moves out of the initial sen-
tence position, leaving a variable x1 , and the remaining variable is
then bound by the king:
[might [λw [[the king] [λ1 [x1 is dead]]]]]
Give a full derivation of the semantic values for all nodes in this
tree, and show that the final result for the semantic value of the
entire sentence is the same as the semantic value we calculated in
the previous section.
We’ll now derive semantic values for this tree giving the king scope over
might, to see if the resulting truth conditions are suitable for Scenario 1. Be-
cause we are using variable binding, we need to relativize semantic values
both to worlds and to assignment functions. So we will calculate ⟦The king λ1
might λw x1 is dead⟧^{w,g}, for an arbitrary world w and assignment function g.
We start by adding lexical semantic values to the tree, and then compose:

• ⟦x1 is dead⟧^{w,g} = (λx.x is dead in w)(g(1)) = g(1) is dead in w
• ⟦λw x1 is dead⟧^{w,g} = λu.g(1) is dead in u

Might then combines with this (w, t) function:

• ⟦might λw x1 is dead⟧^{w,g} = ∃u ∈ w(wRu ∧ g(1) is dead in u)

We then λ-abstract ∃u ∈ w(wRu ∧ g(1) is dead in u) in the x1 position to get
λx.∃u ∈ w(wRu ∧ g[x/1](1) is dead in u), that is, λx.∃u ∈ w(wRu ∧ x is dead in u).

This (e, t) function is then combined with the ((e, t), t) value of the king:

• ⟦the king⟧^{w,g}(⟦λ1 might λw x1 is dead⟧^{w,g})
• = ∃z ∈ e(z is a king in w ∧ ∀u(u is a king in w ↔ u = z) ∧ ∃u ∈ w(wRu ∧ z is dead in u))
Compare the results we got from the two trees:
1. When might scopes over the king, the final truth conditions are ∃u ∈
w(wRu∧∃z ∈ e(z is a king in u∧∀y(y is a king in u ↔ y = z)∧z is dead in u)).
2. When the king scopes over might, the final truth conditions are ∃z ∈
e(z is a king in w∧∀u(u is a king in w ↔ u = z)∧∃u ∈ w(wRu∧z is dead in u)).
The crucial difference between the two truth conditions is:
1. When might scopes over the king, we find an individual who is both king
and dead in some merely possible world that we reach via accessibility
from the (actual) world of assessment.
2. When the king scopes over might, we find an individual who is king in
the (actual) world of assessment, and then check whether that individual
is dead in some merely possible world that is accessible from the actual
world.
So when the king scopes over might, the truth conditions amount to the re-
quirement that the person who is in fact king is such that he is dead in some
possible world. This matches the reading that we get in Scenario 1.
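The two truth conditions can be run side by side. The following Python sketch models Scenario 2 (the worlds, the choice of actual world, and the death facts are illustrative assumptions):

    # Two scope readings of "The king might be dead".
    PEOPLE = {"Henry", "Edward"}
    WORLDS = {"u1", "u2"}                          # epistemically possible worlds
    king_at = {"u1": {"Henry"}, "u2": {"Edward"}}  # doubt about who is king
    dead_at = {"u1": {"Henry"}, "u2": {"Henry"}}   # Henry dead, Edward alive

    def the_king(w, scope):
        # the unique z who is king in w, such that scope holds of z
        kings = [z for z in PEOPLE if z in king_at[w]]
        return len(kings) == 1 and scope(kings[0])

    # might > the king: pick a world, find its king there, check death there
    might_wide = any(the_king(u, lambda z: z in dead_at[u]) for u in WORLDS)

    # the king > might: find the actual king, then look for a world where
    # that very person is dead
    actual = "u2"  # suppose Edward is in fact king (an assumption)
    king_wide = the_king(actual,
                         lambda z: any(z in dead_at[u] for u in WORLDS))

    print(might_wide)  # True: at u1 the king there (Henry) is dead there
    print(king_wide)   # False: Edward, the actual king, is alive everywhere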
It’s time to address the fact that we’ve been cheating throughout this long
discussion of modals. We’ve been treating modals such as might and must as
if they acted on entire sentences, and thus have been making use of artificial
constructions such as:
• Might(the king is dead)
• Must(Socrates admires Plato)
But that’s not really how modals work in English. The real English sentences
are:
• The king might be dead
• Socrates must admire Plato
The king might be dead looks like it has a tree of the form:
We could try to build a fancier semantics for modals that made them some
kind of predicate modifier rather than sentential operator, and gave them some
type other than (w, t). But instead we’ll suggest that English (and other natural
languages) has a more complicated syntax than we’ve been assuming. We
begin by separating inflection from verb. To see the distinction, consider the
general availability of do-variants of sentences:
• The king rules the land // The king does rule the land
• Socrates taught Plato // Socrates did teach Plato
• Plato thought Aristotle refuted him // Plato did think Aristotle
refuted him
Notice that when do is inserted into the sentence, two things happen:
1. Any tense or person marking on the original verb disappears from that
verb.
2. The tense or person marking appears instead on do.
Thus in Socrates did teach Plato, the verb teach is not marked for third
person (it is not teaches) and is not marked for past tense (it is not taught). Do,
on the other hand, is marked for third person and past tense, and thus appears
as did.
This suggests that the verb itself can be separated from the inflection (tense and
person, at least) of the verb. Returning to our X-bar syntactic framework, we’ll
make four suggestions:
1. There is a category head I (for inflection)
2. The complement of I is VP.
3. The specifier of I is DP (the subject of the sentence)
4. The maximal projection IP of I is the entire sentence.
The tree for Socrates did teach Plato is thus:
[IP [DP Socrates] [I′ [I did] [VP [V teach] [DP Plato]]]]
Next we notice that there are other constructions that force verbs to appear
without the normal tense and person inflection information:
• Socrates can read the book. [not reads]
• Plato will teach Aristotle. [not teaches]
• Aristotle wanted to open the door. [not opened]
• Alexander saw the barbarian die. [not dies or died]
We won’t try to deal here with all of the complications that arise in these cases.
(For example, we won’t touch the question of when the uninflected verb shows
up simply without tense and number (teach) and when it shows up as an
infinitive without tense and number (to open).) But we can at least suggest that
modals such as may, must, might, and can are in category I, so that we get trees
of the form:
modals such as may, must, might, and can are in category I, so that we get trees
of the form:
[IP [DP Socrates] [I′ [I can] [VP [V read] [DP [D the] [NP [N book]]]]]]
[IP [DP [D the] [NP [N king]]] [I′ [I might] [VP [V be] [AP [A dead]]]]]
Notice that on this approach, the future tense will gets grouped with modals
like may and must. But past tense can’t be handled in quite the same way, or we
get unacceptable trees of the form:
[IP [DP [D the] [NP [N linguist]]] [I′ [I -ed] [VP [V walk] [PP [P across] [DP [D the] [NP [N street]]]]]]]
That’s unfortunate. As a clever fix, we’ll propose that the supposedly unac-
ceptable tree is in fact acceptable, but that the past tense marker -ed moves to
a different position in the tree:
[IP [DP [D the] [NP [N linguist]]] [I′ [I ∅] [VP [V walk-ed] [PP [P across] [DP [D the] [NP [N street]]]]]]]
We then owe a story about why the past tense inflection marker -ed moves
downward but the future tense will, as well as modals like must, do not move
downward. But we won’t worry about those details here.
We still haven’t solved the central problem – we still have modals scoping over
something other than an entire sentence. To get modals in a position in which
they can be of type (w, t), we need to consider the VP-Internal Subject Hy-
pothesis. The VP-Internal Subject Hypothesis says that subjects of verbs are
produced as specifiers of the VP, rather than (as we’ve been doing) as specifiers
of the TP. Again, we won’t worry about syntactic evidence for the VP-Internal
Subject Hypothesis (although see the next problem). So we get the following
tree for Plato must teach Aristotle:
[IP [I must] [VP [DP Plato] [V′ [V teach] [DP Aristotle]]]]
Notice that if we accept the VP-Internal Subject Hypothesis, the semantic type
for VP is t, which we can see if we add types to the tree:
[IP [I must] [VP:t [DP:e Plato] [V′:(e, t) [V teach] [DP Aristotle]]]]
But now we’re back where we started – we’ve got the modal scoping over the
entire sentence, but that structure doesn’t match normal English word order.
So we add another bit of movement. Here we require the subject to move to
the specifier of the IP phrase:
[IP [DP:e Plato] [I′ [I must] [VP:t [DP:e ?] [V′:(e, t) [V teach] [DP Aristotle]]]]]
Now we’ve got the word order right. But some final adjustment is needed to
get all of the syntactic typing to work out. First, we need to say what is left
behind when Plato moves to the specifier of IP. The obvious suggestion is that
a (type e) variable remains. But then we’ll need that variable to be bound,
which means we’ll need to treat Plato as a variable binder. Fortunately, we
know how to do this, by treating names as generalized quantifiers of type ((e,
t), t). So we have:
[IP [DP:((e, t), t) Plato] [I′ [I must] [VP:t [DP:e x1] [V′:(e, t) [V teach] [DP Aristotle]]]]]
Second, we need some lambda-abstractors to prepare for (i) the variable binding by Plato and (ii) the world-binding by must. Adding these, we can give full typing for the sentence:
[IP
  [DP:((e, t), t) Plato]
  [λ1:(e, t)
    [I′:t
      [I:((w, t), t) must]
      [λw:(w, t)
        [VP:t
          [DP:e x1]
          [V′:(e, t) [V teach] [DP Aristotle]]]]]]]
(If we are treating Plato as a type ((e, t), t) generalized quantifier, we probably
ought to do the same with Aristotle. But the extra complication won’t help
with anything we want to do here (although it also wouldn’t mess anything
up), so we’ll leave Aristotle as type e.)
Let’s quickly check that everything works out now. Adding semantic values,
we have:
[IP
  [DP:((e, t), t) Plato = λx(e, t).x(Plato)]
  [λ1:(e, t)
    [I′:t
      [I:((w, t), t) must = λx(w, t).∀u ∈ w(wRu → x(u) = ⊤)]
      [λw:(w, t)
        [VP:t
          [DP:e x1 = g(1)]
          [V′:(e, t)
            [V:(e, (e, t)) teach]
            [DP:e Aristotle]]]]]]]
We then start composing semantic values:
• ⟦teach Aristotle⟧w,g = ⟦teach⟧w,g(⟦Aristotle⟧w,g)
• = (λx.λy.y teaches x in w)(Aristotle)
• = λy.y teaches Aristotle in w
• ⟦x1 teach Aristotle⟧w,g = (λy.y teaches Aristotle in w)(g(1)) = g(1) teaches Aristotle in w
• ⟦λw x1 teach Aristotle⟧w,g = λu.g(1) teaches Aristotle in u
• ⟦must λw x1 teach Aristotle⟧w,g = ⟦must⟧w,g(⟦λw x1 teach Aristotle⟧w,g)
• = (λx(w, t).∀u ∈ w(wRu → x(u) = ⊤))(λu.g(1) teaches Aristotle in u)
• = ∀u ∈ w(wRu → g(1) teaches Aristotle in u)
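As a sanity check, we can run the same composition in a toy model. In the following Haskell sketch the worlds, the accessibility relation, the teaching facts, and the assignment g are all invented for illustration; the shape of the computation is what matters:

-- Toy model for the derivation above. All facts are stipulations.
data W = W1 | W2 deriving (Eq, Show, Enum, Bounded)
data E = Plato | Aristotle deriving (Eq, Show)

worlds :: [W]
worlds = [minBound .. maxBound]

acc :: W -> W -> Bool          -- wRu: is u accessible from w?
acc _ _ = True                 -- assume universal accessibility

-- [[teach]]^w = λx.λy. y teaches x in w (stipulated facts)
teach :: W -> E -> E -> Bool
teach _ x y = y == Plato && x == Aristotle

-- [[must]]^w = λx(w, t). ∀u ∈ w (wRu → x(u) = ⊤)
must :: W -> (W -> Bool) -> Bool
must w p = all (\u -> not (acc w u) || p u) worlds

-- [[must λw x1 teach Aristotle]]^{w,g}, with g(1) = Plato
sentence :: W -> Bool
sentence w = must w (\u -> teach u Aristotle (g 1))
  where g :: Int -> E
        g 1 = Plato
        g _ = Aristotle        -- arbitrary default for other indices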
So far, so good. But, while we’ve given a syntactic story that allows modals to
function as sentential operators, we don’t yet have everything we need to deal
with the scope ambiguity of The king might be dead. Right now we generate
only one tree for this sentence:
[IP [DP [D the] [NP [N king]]] [λ1 [I′ [I might] [λw [VP [DP x1] [V′ [V be] [AP [A dead]]]]]]]]
This tree results from producing the subject the king as the specifier of the VP
(following the VP-Internal Subject Hypothesis), and then moving it, as before,
to the specifier of IP. But (without going through all the details) we then have
the king scoped above might, so we will pick out whoever is king in the actual
world, and then check, in some accessible world u, whether that person is dead
in u. That gives us only the Scenario 1 reading of our sentence.
Once we have added the distinction between IP and VP phrases and the VP-
Internal Subject Hypothesis, how can we get a tree producing the reading on
which the king takes scope under might? It’s tempting to think we could just
leave the subject the king in its original position as the specifier of VP, so that
it will be scoped under might in the I position. But if we leave the king as
specifier of the VP, we’re again left without an explanation of the word order
of the English sentence. So instead we suggest that there is a second phase of
movement. First the king moves from the specifier of VP to the specifier of IP.
And then might moves from its original I position to a higher I position:
[IP
  [I:((w, t), t) might]
  [λw:(w, t)
    [IP
      [DP:((e, t), t) [D the] [NP [N king]]]
      [λ1:(e, t)
        [I′:t
          [I ∅]
          [VP:t
            [DP:e x1]
            [V′:(e, t) [V be] [AP:(e, t) [A dead]]]]]]]]]
The full picture of the derivation of the sentence is:
1. Initially, the subject the king is the specifier of the VP, and the modal
might is in the I position above the subject.
2. The subject then moves above might to the specifier of IP. The tree that
results from this movement is the tree that produces the visible form of
the sentence.
3. The modal might then moves above the (moved) subject to a higher I
position. The resulting final tree is the tree on which semantic processing
occurs.
(We owe a story about why the upward movement of the king is visible in
the final sentence on the page, while the subsequent upward movement of
might above the moved the king is not. But we’ll set aside that issue for now.)
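It may help to see the two scope orders side by side in a toy model. In the following Haskell sketch the worlds, the kingship facts, and the death facts are invented, and the definite the king is simplified to a world-relative choice of individual rather than a full generalized quantifier:

-- The two readings of "The king might be dead" in a toy model.
data W = Actual | U1 deriving (Eq, Show, Enum, Bounded)
data E = EdwardIV | HenryVI deriving (Eq, Show)

worlds :: [W]
worlds = [minBound .. maxBound]

acc :: W -> W -> Bool
acc _ _ = True

king :: W -> E                 -- who is king in each world
king Actual = EdwardIV
king U1     = HenryVI

dead :: W -> E -> Bool         -- who is dead in each world
dead U1 HenryVI = True
dead _  _       = False

might :: W -> (W -> Bool) -> Bool   -- ∃u (wRu ∧ p(u))
might w p = any (\u -> acc w u && p u) worlds

-- Scenario 1 reading (the king > might): fix the actual king, then
-- ask whether he is dead in some accessible world.
wideScope :: W -> Bool
wideScope w = might w (\u -> dead u (king w))

-- Scenario 2 reading (might > the king): in some accessible world,
-- whoever is king there is dead there.
narrowScope :: W -> Bool
narrowScope w = might w (\u -> dead u (king u))

-- wideScope Actual = False, narrowScope Actual = True:
-- the two readings come apart in this model.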
Problem 274: Give a full semantic derivation for this tree, and con-
firm that the resulting truth conditions match those of Might(the
king is dead), with the modal might having scope over the quanti-
fied noun phrase the king. What should we say about the seman-
tic value of the empty node ∅ left by the movement of might to the
higher I position?
Alexander the Great, flush with his victory over the Persian empire, announces:
• No army can defeat me.
Hephaestion, more cautious, replies:
• Some army could defeat Alexander.
We can now give two different syntactic trees for Some army could defeat
Alexander, depending on whether the modal could moves above the (moved)
subject:
1. [IP [DP [D some] [NP [N army]]] [λ1 [I′ [I could] [λw [VP [DP x1] [V′ [V defeat] [DP Alexander]]]]]]]
2. [IP [I could] [λw [IP [DP [D some] [NP [N army]]] [λ1 [I′ [I ∅] [VP [DP x1] [V′ [V defeat] [DP Alexander]]]]]]]]
We want a reading of Hephaestion’s claim on which the army that (possibly)
defeats Alexander is a non-actual army. So we don’t want the first tree, in
which some army scopes over could, since in that tree we pick an army in the
actual world, and then assess whether that army defeats Alexander in some
other world.
The second tree is better. It has some army scope under could, which allows us
to pick something that is an army only in the possible world under consideration. The second tree produces the truth conditions:
• ∃u ∈ w(wRu ∧ ∃x ∈ e(x is an army in u and x defeats Alexander in u))
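These truth conditions can be computed directly. In the Haskell sketch below the worlds, entities, and facts are invented; note that entities is a single, world-independent list – exactly the feature of type e that is about to cause trouble:

-- Direct computation of
--   ∃u ∈ w (wRu ∧ ∃x ∈ e (x is an army in u ∧ x defeats Alexander in u))
data W = W0 | U deriving (Eq, Show, Enum, Bounded)
data E = Macedonians | Persians deriving (Eq, Show, Enum, Bounded)

worlds :: [W]
worlds = [minBound .. maxBound]

entities :: [E]        -- one fixed, world-independent type e
entities = [minBound .. maxBound]

acc :: W -> W -> Bool
acc _ _ = True

army :: W -> E -> Bool              -- stipulated army facts
army U Persians = True
army _ _        = False

defeatsAlexander :: W -> E -> Bool  -- stipulated defeat facts
defeatsAlexander U Persians = True
defeatsAlexander _ _        = False

couldDefeat :: W -> Bool
couldDefeat w =
  any (\u -> acc w u &&
             any (\x -> army u x && defeatsAlexander u x) entities)
      worlds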
But there’s an important difference between the Scenario 2 reading we wanted
of The king might be dead and the reading we want of Some army could
defeat Alexander. In Scenario 2, we had two actual individuals – Edward IV
and Henry VI – but we wanted to consider both worlds in which Edward IV
was king and worlds in which Henry VI was king. It was important that we
pick out the possibly dead individual not by looking at who was actually king,
but rather by looking at who was king in various possible circumstances. But the
people who were kings in those possible circumstances were real people.
The case of Alexander and Hephaestion looks different. Perhaps, for example,
Hephaestion’s thought is this: the Etruscans don’t have an army, being a peaceful people. But they’re also a noble and determined people – had they formed
an army, it would have been a formidable one, capable of defeating Alexander.
What, in this scenario, is the member of type e which is, in some possible world,
an army that defeats Alexander? Not the army of the Etruscans – there is no
army of the Etruscans (although there could have been one), so that can’t be the
thing that is possibly an Alexander-defeating army. Perhaps it is the Etruscan
people? They are not an army, but perhaps they – the collection of them, that
very actually existing thing – could have been an army. But that might not
give us what we want. Perhaps the Etruscans are few in number, and they,
no matter how organized and trained, could never be an Alexander-defeating
army. But the Etruscans are long-range planners – had they chosen to confront
Alexander in the field of battle, they would first have had larger families for
many generations, producing many new Etruscans to form their mighty army.
That is indeed a situation in which there could have been an army capable of
defeating Alexander. But the actual thing that could have been that army isn’t
the Etruscan people, because that army isn’t composed of the actual Etruscans,
but rather of merely possible Etruscans.
In fact, there just isn’t anything suitable in type e, if we’re thinking of that as
the collection of all actual entities – nothing that could have been (but isn’t) an
army capable of defeating Alexander. The core issue here is that there could
have been things other than the things that actually are, so talking about what
might have been requires looking beyond the contents of type e.
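One natural formal response – offered here only as a sketch, not as anything we have yet argued for – is to let the domain of quantification vary from world to world, so that some worlds contain armies that do not actually exist. In the Haskell sketch below, all names and facts are invented:

-- World-relative domains: each world supplies its own inventory of
-- entities, so a merely possible army can exist in some world without
-- existing in the actual one.
data W = ActualW | EtruscanW deriving (Eq, Show, Enum, Bounded)
data E = Macedonians | EtruscanArmy deriving (Eq, Show)

worlds :: [W]
worlds = [minBound .. maxBound]

domain :: W -> [E]     -- the domain now varies with the world
domain ActualW   = [Macedonians]
domain EtruscanW = [Macedonians, EtruscanArmy]

acc :: W -> W -> Bool
acc _ _ = True

army :: W -> E -> Bool
army EtruscanW EtruscanArmy = True
army _ _                    = False

defeatsAlexander :: W -> E -> Bool
defeatsAlexander EtruscanW EtruscanArmy = True
defeatsAlexander _ _                    = False

-- ∃u (wRu ∧ ∃x ∈ domain(u) (x is an army in u ∧ x defeats Alexander in u))
couldDefeat :: W -> Bool
couldDefeat w =
  any (\u -> acc w u &&
             any (\x -> army u x && defeatsAlexander u x) (domain u))
      worlds

-- couldDefeat ActualW = True, even though nothing in domain ActualW
-- is possibly an Alexander-defeating army.

Whether such world-relative domains, and the merely possible entities they contain, are acceptable is of course exactly the question raised above.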