Boolean Search Logic


Volume 4, Issue 5, May 2012: Editor's Desk

Boolean Search Logic: the need for revisiting the basics


In a recent presentation, I presumed that almost everyone in the information industry knows the logical operators based on set theory, if not the limitations of and alternatives to the Boolean search model. To my surprise, I needed to use a board and explain them with Venn diagrams! Taking the same example of participants consuming ice-cream (i) and/or rosgulla (r) at lunch, the diagrams below illustrate the four possibilities of Boolean operators. Note that the rectangular box represents the universal set of participants of the meeting (i.e., the collection of documents in the case of an information retrieval system), and that the Boolean operators must be in ALL CAPS.

[Figure: Venn diagrams for OR [+], AND [* &], NOT [- !] and XOR]

1. Boolean OR: Those who took either ice-cream or rosgulla can be expressed as (i OR r), i.e., (i + r). OR is the logical sum (union of sets), represented by +. Note that this group is not those who took only ice-cream or only rosgulla, but those who took ice-cream or rosgulla or both. It is quite common to mistakenly equate the natural language "or" with Boolean OR.

2. Boolean AND: Those who took both ice-cream and rosgulla can be expressed as (i AND r), i.e., (i * r) or (i & r). AND is the logical product (intersection of sets), represented by * as well as &. Note that this group consists only of those who took both ice-cream and rosgulla. It is again quite easy to wrongly equate the natural language "and" with Boolean AND. For example, someone interested in "coffee and tea" is usually not looking only for documents that contain both coffee and tea, but would like all documents with either coffee or tea as well as both. In some systems, the symbol ** or && can be used in place of the operator AND when the word "and" is itself part of the text.

3. Boolean NOT: Use NOT to remove a word or phrase, i.e., NOT is the prohibit operator that excludes documents containing the term after the "-" symbol, as in ice-cream -rosgulla. The minus sign should appear immediately before the word to be excluded and should be preceded by a space. NOT is most useful to express those who took ice-cream but not rosgulla (i NOT r). That is, NOT is the logical difference (set difference, or negation), represented by - as well as !. Unfortunately, NOT is a rarely used operator. All those in the rectangular box outside the two circles took neither ice-cream nor rosgulla, and the same can be expressed as NOT (i OR r), i.e., -(i OR r).

4. Boolean XOR: Those who took either ice-cream or rosgulla, but not both, can be expressed as (i XOR r), i.e., {(i OR r) - (i AND r)}. XOR represents either of them but not both. It is little known and rarely used, as the same can be expressed using the other three operators (in this example, as the union of the two sets minus their intersection, i.e., (i OR r) NOT (i AND r)).
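
The four operators map directly onto set operations, so the Venn diagrams can be checked with a few lines of Python. This is only a minimal sketch: the participant names and the sets everyone, ice_cream and rosgulla are made up for illustration and do not come from the meeting described above.

```python
# Hypothetical participants; the names are illustrative only.
everyone = {"Asha", "Bala", "Chitra", "Dev", "Esha"}  # universal set (the rectangle)
ice_cream = {"Asha", "Bala", "Chitra"}                # i
rosgulla = {"Chitra", "Dev"}                          # r

i_or_r = ice_cream | rosgulla    # OR:  union, one or the other or both
i_and_r = ice_cream & rosgulla   # AND: intersection, both together
i_not_r = ice_cream - rosgulla   # NOT: difference, ice-cream but not rosgulla
i_xor_r = ice_cream ^ rosgulla   # XOR: symmetric difference, one but not both
neither = everyone - i_or_r      # NOT (i OR r): outside both circles

# XOR can be rebuilt from the other three operators.
assert i_xor_r == i_or_r - i_and_r

for label, group in [("OR", i_or_r), ("AND", i_and_r), ("NOT", i_not_r),
                     ("XOR", i_xor_r), ("NOT (i OR r)", neither)]:
    print(label, sorted(group))
```

Note that the OR group includes Chitra, who took both, which is exactly where Boolean OR differs from the everyday "either ... or".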

Lastly, use parentheses ( ) to group clauses into sub-queries, i.e., group operators to eliminate any confusion, because a query that mixes operators can give different results depending on how it is grouped (multiple clauses can also be grouped in field-directed searches). Such nested Boolean search allows more complex conditions on search terms with multiple operators, but nesting inside parentheses is a must to specify the order of execution of the Boolean operators. What appears inside parentheses is processed first, and the wide difference in the number of hits for a changed order of parentheses is shown in the example below.

M S Sridhar
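
As a rough illustration of how the position of parentheses changes the hits, the sketch below runs two groupings of the same three terms over a tiny collection. The documents, the terms and the helper hits() are all assumptions invented for this sketch; they are not the author's example, and a real retrieval system would parse the query string rather than use Python set operators.

```python
# Hypothetical mini-collection: document id -> words it contains.
docs = {
    1: {"coffee"},
    2: {"coffee", "sugar"},
    3: {"tea"},
    4: {"tea", "sugar"},
    5: {"sugar"},
}

def hits(term):
    """Return the set of document ids that contain the term."""
    return {doc_id for doc_id, words in docs.items() if term in words}

coffee, tea, sugar = hits("coffee"), hits("tea"), hits("sugar")

grouped_left = (coffee | tea) & sugar    # (coffee OR tea) AND sugar
grouped_right = coffee | (tea & sugar)   # coffee OR (tea AND sugar)

print(sorted(grouped_left))    # [2, 4]
print(sorted(grouped_right))   # [1, 2, 4]
```

The same three terms and the same two operators return two documents under one grouping and three under the other, which is why the grouping has to be stated explicitly.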
