Learning Guide Unit 1 - Home
1 of 10 12/10/2024, 12:01 PM
Learning Guide Unit 1 | Home https://my.uopeople.edu/mod/book/tool/print/index.php?id=443814
This course explores the key theories of information retrieval and puts these theories into practice: you will build a complete
information retrieval (IR) system in a series of four development projects. Information retrieval has its beginnings in a paper presented by
Vannevar Bush in 1945 (Bush, 1945), in which Bush describes a system capable of storing and retrieving large amounts of information.
Lesk (1995) describes information retrieval as a discipline that 'grew up' as a function of library science. Archiving and searching
library information was an important early application of information retrieval techniques. The growth of the internet and the introduction
of the World Wide Web in the 1990s significantly broadened the role and application of information retrieval techniques. Google has become a
technology leader by applying IR techniques to index and search the World Wide Web. One objective of this course is
to develop an understanding of the underlying theory of IR and the skills necessary to apply IR techniques.
The basic objective in information retrieval is to find specific information within a corpus through the use of a query. A corpus is
a collection of information, usually in the form of documents, although other forms of media are becoming increasingly commonplace.
Imagine that you have a collection of Shakespeare's plays and want to find just those that include 'Caesar' as a subject. One way to
accomplish this is to scan each work for the word 'Caesar'.
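The scanning approach described above can be sketched in a few lines of Python. This is only an illustration: the function name `linear_scan` and the toy `corpus` (one short line per play standing in for the full text) are assumptions made for the example, not part of the course materials.

```python
import re

def linear_scan(corpus, word):
    """Return the titles of all documents that contain `word` (case-insensitive)."""
    needle = word.lower()
    matches = []
    for title, text in corpus.items():
        # Split the text into lowercase alphabetic tokens so punctuation is ignored.
        tokens = re.findall(r"[a-z]+", text.lower())
        if needle in tokens:
            matches.append(title)
    return matches

# Toy corpus: a single line per play stands in for the complete work.
corpus = {
    "Julius Caesar": "I come to bury Caesar, not to praise him",
    "Hamlet": "To be, or not to be, that is the question",
    "Macbeth": "Is this a dagger which I see before me",
}

print(linear_scan(corpus, "Caesar"))
```

Every query reads every document from start to finish, so the cost grows with the size of the whole corpus. That is the limitation that index structures are meant to address.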
In our Unit 1 reading assignment, we begin to explore information retrieval (IR). The first concept that we are introduced to is the
Boolean retrieval model. The term Boolean refers to a simple two-state scheme: on/off, true/false, and, in our case, present/not present.
The Boolean retrieval model is based on the presence or absence of the search terms in a document. It is a very basic
model that does not rank results; it simply returns every document that satisfies the query.
One of the key topics introduced in Unit 1 is the inverted index. The inverted index, which is also called the postings
file, is a data structure that maps each word extracted from a document or set of documents to the documents that contain it, and
typically also records how often the word appears.
This structure allows specific terms to be looked up quickly to determine which documents contain them. The inverted index
supports the Boolean retrieval model, and it also enables other models such as the
ranked retrieval model.
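As a rough illustration of the structure described above, the following Python sketch builds a small in-memory inverted index that maps each term to the documents containing it, together with the term's frequency in each document. The function name `build_inverted_index`, the simple regular-expression tokenizer, and the three sample documents are assumptions made for this example.

```python
import re
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Very simple tokenization: lowercase runs of letters.
        for term in re.findall(r"[a-z]+", text.lower()):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)

# Hypothetical three-document corpus.
docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
}

index = build_inverted_index(docs)
print(index["july"])  # the postings for one term: doc IDs and frequencies
```

Looking up a term is now a single dictionary access, rather than a scan over every document.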
The ranked retrieval model differs from the Boolean model in that users issue free-text queries rather than queries in the precise language
of the Boolean model. In the Boolean model, a query follows a strict format built from the operators AND, OR, and NOT. With AND, both
terms must be present for a document to be returned; with OR, either term may be present; and with NOT, the term must not be present
in the document.
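The three Boolean operators can be sketched as operations on postings lists, that is, sorted lists of document IDs. The code below is a minimal illustration: the helper names `intersect`, `union`, and `negate`, the sample postings, and the document IDs are all assumptions made for the example.

```python
def intersect(p1, p2):
    """AND: walk two sorted postings lists in step, keeping IDs found in both."""
    result, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result

def union(p1, p2):
    """OR: every ID that appears in either postings list, in sorted order."""
    return sorted(set(p1) | set(p2))

def negate(p, all_doc_ids):
    """NOT: every document ID in the collection that is absent from the list."""
    excluded = set(p)
    return [doc_id for doc_id in all_doc_ids if doc_id not in excluded]

# Hypothetical postings lists over a five-document collection.
all_doc_ids = [1, 2, 3, 4, 5]
postings = {
    "brutus": [1, 2, 4],
    "caesar": [1, 2, 4, 5],
    "calpurnia": [2],
}

# brutus AND caesar AND NOT calpurnia
answer = intersect(
    intersect(postings["brutus"], postings["caesar"]),
    negate(postings["calpurnia"], all_doc_ids),
)
print(answer)
```

Note that the result is an unordered set of matches in the sense that no document is considered "better" than another; a document either satisfies the query or it does not.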
In the ranked retrieval model, queries are free text, and relevance is estimated by techniques such as the vector space model and
learned weights.
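To make the idea of ranking concrete, here is a minimal sketch of one common form of the vector space model: documents and the query are turned into tf-idf weighted vectors and compared by cosine similarity. The particular weighting (raw term frequency multiplied by log(N/df)), the function names, and the sample documents are assumptions chosen for illustration; the reading assignment may present other variants.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def rank(query, docs):
    """Return (doc_id, score) pairs sorted by descending cosine similarity."""
    doc_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for counts in doc_counts.values() for term in counts)
    idf = {term: math.log(n / df[term]) for term in df}

    def vector(counts):
        # tf-idf weight per term; query terms unseen in the corpus are dropped.
        return {t: tf * idf[t] for t, tf in counts.items() if t in idf}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
            sum(w * w for w in v.values())
        )
        return dot / norm if norm else 0.0

    q = vector(Counter(tokenize(query)))
    scores = [(doc_id, cosine(q, vector(c))) for doc_id, c in doc_counts.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical documents.
docs = {
    1: "caesar was ambitious and caesar was slain",
    2: "brutus is an honourable man",
    3: "the noble brutus told you caesar was ambitious",
}

for doc_id, score in rank("honourable man", docs):
    print(doc_id, round(score, 3))
```

Unlike a Boolean query, the free-text query here needs no operators, and every document receives a score, so the results can be presented in order of estimated relevance.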
In this first unit we are introduced to a number of concepts that may be quite new to you, including tokenization, stemming, biword
indexes, and positional indexes. Make sure that you spend some time understanding these concepts. As a reminder, each unit contains a
self-quiz. The self-quiz is not graded and carries no points; however, it is designed as a learning tool and is important to use in
conjunction with the reading assignment. You should begin each unit by completing the reading assignment, reviewing the unit overview,
and then taking the self-quiz. Every time you answer a question incorrectly, you should immediately go back and review the relevant
sections in the reading assignment or overview to ensure your understanding of the subject matter. This iterative process will aid your
learning and help you to prepare for the mid-term and final exams.
Bush, V. (1945). As We May Think. Atlantic Monthly. 176(1). 101-108. Retrieved June 10, 2011 here.
Lesk, M. (1995). The Seven Ages of Information Retrieval. UDT Occasional Paper # 5. Retrieved June 10, 2011 from http://archive.ifla.org/VI/5/op/udtop5/udtop5.htm
Manning, C.D., Raghavan, P., & Schütze, H. (2009). An Introduction to Information Retrieval (Online ed.). Cambridge, MA: Cambridge
University Press. Available at http://nlp.stanford.edu/IR-book/information-retrieval-book.html
• Boolean Retrieval
• Document
• Corpus
• Inverted Index
• Posting
• Intersection
• Ranked Retrieval
• Term Frequency
• Tokenization
• Document unit
• Stop words
• Normalization
• Stemming
• Lemmatization
• Skip pointer
• Biword index
• Positional index
In Unit 1, we are introduced to the inverted index as a fundamental technology in information retrieval systems. The
inverted index is essentially an index of words, known as terms, extracted from the document corpus; it can be searched to find
documents with the content that the user is looking for. Our text also introduces two extensions to the inverted index:
the biword index and the positional index.
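For orientation only, the sketch below shows one way the two extensions might be built in Python. A biword index treats each pair of consecutive words as a single term, while a positional index records the positions at which each term occurs in each document. The function names, the tokenizer, and the sample documents are assumptions made for this example, and the two-word `phrase_match` helper is a simplification of general phrase queries.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_biword_index(docs):
    """Map each pair of consecutive words ("w1 w2") to the set of doc IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        for first, second in zip(tokens, tokens[1:]):
            index[f"{first} {second}"].add(doc_id)
    return index

def build_positional_index(docs):
    """Map each term to {doc_id: [positions at which the term occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term][doc_id].append(position)
    return index

def phrase_match(index, first, second):
    """Return doc IDs where `second` occurs immediately after `first`."""
    hits = set()
    first_postings = index.get(first, {})
    second_postings = index.get(second, {})
    for doc_id in first_postings.keys() & second_postings.keys():
        following = set(second_postings[doc_id])
        if any(pos + 1 in following for pos in first_postings[doc_id]):
            hits.add(doc_id)
    return hits

# Hypothetical documents.
docs = {
    1: "Stanford University is in California",
    2: "the university near Stanford",
    3: "I visited Stanford University",
}

biword = build_biword_index(docs)
positional = build_positional_index(docs)
print(sorted(biword["stanford university"]))
print(sorted(phrase_match(positional, "stanford", "university")))
```

Document 2 contains both words but not as an adjacent pair, which is exactly the case a plain term-to-document index cannot distinguish.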
You must post your initial response before being able to review other students' responses. Once you have made your first response, you
will be able to reply to other students' posts. You are expected to make a minimum of 3 responses to your fellow students' posts.
Did the posting describe either the biword index or the positional index?
Did the description explain how the index is different from the inverted index?
Did the posting describe under what circumstances the index would be used?
Did the posting describe the advantage that the index has over the inverted index?
Your learning journal entry must be a reflective statement that considers the following questions:
• Describe what you did. This does not mean that you copy and paste from what you have posted or the assignments you have
prepared. You need to describe what you did and how you did it.
• Describe your reactions to what you did.
• Describe any feedback you received or any specific interactions you had. Discuss how they were helpful.
• Describe your feelings and attitudes.
• Describe what you learned.
The Self-Quiz gives you an opportunity to self-assess your knowledge of what you have learned so far.
The results of the Self-Quiz do not count towards your final grade, but the quiz is an important part of the University's learning process
and it is expected that you will take it to ensure understanding of the materials presented. Reviewing and analyzing your results will help
you perform better on future Graded Quizzes and the Final Exam.
Please access the Self-Quiz on the main course homepage; it will be listed inside the Unit.
Participate in the Discussion Assignment (post, comment, and rate in the Discussion Forum)