Search and Retrieval of Information
Information retrieval is the step that follows the determination of information needs.
Information can be retrieved through different tools: databases, the Internet, thesauri,
ontologies, maps, and so on. Knowing and using these tools contributes to quality retrieval.
Retrieval of information
The retrieval process is carried out through queries to the database where the structured
information is stored, using an appropriate query language. It is necessary to take into
account the key elements that make the search possible and determine its degree of
relevance and precision, such as indexes, keywords and thesauri, as well as the phenomena
that can occur in the process, such as documentary noise and silence. One of the problems
that arises when searching for information is whether what we retrieve is “a lot or a little”:
depending on the type of search, either a multitude of documents or only a very small
number may be retrieved. These phenomena are called documentary noise and documentary
silence.
Documentary silence: documents that are stored in the database but have not been
retrieved, because the search strategy was too specific or because the keywords
used were not appropriate to define the search.
Documentary noise: documents that are retrieved by the system but are not
relevant. This usually happens when the search strategy is defined too
generically.
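The two phenomena can be quantified once we know which documents were retrieved and which were actually relevant. The following is a minimal sketch, assuming noise is measured as the share of retrieved documents that are not relevant and silence as the share of relevant documents that were not retrieved; the function name and the document identifiers are hypothetical.

```python
def noise_and_silence(retrieved, relevant):
    """Return (noise, silence) ratios for a single search.

    retrieved: ids of the documents returned by the system
    relevant:  ids of the documents that actually answer the need
    """
    retrieved, relevant = set(retrieved), set(relevant)
    # Noise: fraction of retrieved documents that are not relevant.
    noise = len(retrieved - relevant) / len(retrieved) if retrieved else 0.0
    # Silence: fraction of relevant documents that were not retrieved.
    silence = len(relevant - retrieved) / len(relevant) if relevant else 0.0
    return noise, silence


# Hypothetical search: 4 documents retrieved, 5 relevant documents in the database.
noise, silence = noise_and_silence(
    {"d1", "d2", "d3", "d4"},
    {"d1", "d2", "d5", "d6", "d7"},
)
print(noise, silence)
```

A search strategy that is too generic pushes the first ratio up; one that is too specific pushes the second ratio up.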
Information retrieval is the process in which previously stored information is accessed,
using computer tools that allow specific search equations to be established. This
information must have been structured prior to its storage.
Essential components
Tools
Databases
Internet
Electronic magazines
Search engines. Search engines are tools that allow you to locate and retrieve
information stored on the Internet. They work much like databases: they store
pages together with certain characteristics (metadata) and later, when the user
enters some keywords, they return a list of the most relevant ones.
o General search engines
Google (http://www.google.com)
Alltheweb (http://www.alltheweb.com)
AltaVista (http://www.altavista.com)
Excite (http://www.excite.com)
Infoseek (http://www.infoseek.com)
Lycos (http://www.lycos.com)
Webcrawler (http://webcrawler.com)
HotBot (http://www.hotbot.com)
Directories. Directories are organized lists that allow us to access information in a
structured and hierarchical way. Entries are classified into categories, and the user
moves from the most general to the most specific.
o Recommended for searches in which the user does not know much about the
specific topic
The Google Directory (http://directory.google.com)
Ozu (http://categorias.ozu.es)
The index (http://www.elindice.com)
Yahoo (http://www.yahoo.com)
o Directory and specialized engines
Humbul http://www.humbul.ac.uk
Librarian Index to the Internet http://lii.org
Internet Public Library http://www.ipl.org
Scirus http://www.scirus.com
Search4Science http://www.search4science.com
Metasearch engines. These are search engines that do not search a single
database: when the search terms are entered, they query several databases at
once, so the breadth of results is greater.
o Vivisimo (http://www.vivisimo.com)
o Dogpile (http://www.dogpile.com)
o Kartoo (http://www.kartoo.com)
o Qbsearch (http://www.qbsearch.com)
o Metacrawler (http://www.metacrawler.com)
Selective search engines. They use a database specialized in a subject.
o Ask (http://www.ask.com)
o Teoma (http://www.teoma.com)
o Electric Library (http://www.elibrary.com)
o Hieros Gamos http://www.hg.org/index.html
Search programs
o Copernic (http://www.copernic.com)
Intelligent agents. Intelligent agents are tools that allow you to locate information
automatically. You only need to define a search profile and where it should be
launched (databases, websites, etc.), and they automatically present a report on the
new information that is found.
o BookWhere http://www.bookwhere.com
o BullsEye Pro http://www.intelliseek.com
o WebSeeker 5 http://www.bluesquirrel.com/
o WebFerret http://www.ferretsoft.com
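Among the tools listed above, the metasearch engine is the easiest to illustrate in a few lines: it sends one query to several engines and merges what comes back. The sketch below is only an illustration under stated assumptions; the two engine functions are hypothetical stand-ins that return fixed lists rather than calls to any real service, and the merging rule (more sources first, then best rank) is one possible choice, not the method of any engine named in this document.

```python
def metasearch(query, engines):
    """Send one query to several engines and merge their ranked result lists.

    engines: mapping of engine name -> function(query) returning a ranked list of URLs.
    """
    merged = {}
    for name, search in engines.items():
        for rank, url in enumerate(search(query), start=1):
            entry = merged.setdefault(url, {"sources": [], "best_rank": rank})
            entry["sources"].append(name)
            entry["best_rank"] = min(entry["best_rank"], rank)
    # Order: found by more engines first, then best individual rank, then URL.
    return sorted(
        merged,
        key=lambda u: (-len(merged[u]["sources"]), merged[u]["best_rank"], u),
    )


# Hypothetical stand-in engines (fixed answers, no network access).
def engine_a(query):
    return ["http://example.org/a", "http://example.org/b"]


def engine_b(query):
    return ["http://example.org/b", "http://example.org/c"]


results = metasearch("flowers", {"A": engine_a, "B": engine_b})
print(results)
```

Duplicates are collapsed into a single entry, which is where the greater breadth without repetition comes from.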
Indices.
List of standardized terms that represent the content of a resource. Some types are:
Subject index: terms ordered according to the subjects covered by the database, the
search engine, etc.
Alphabetical index: listing of terms alphabetically
KWIC Index: Type of permuted index in which the thematic content of a work is
represented by keywords from its title or another source of information in the
document.
KWOC Index: Type of permuted index that varies in presentation from the KWIC
index, in which keywords appear as a separate line heading. Under each heading
appear all the titles, complete or truncated, that contain the keyword in question.
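The difference between the two permuted indexes is easier to see with a small example. The sketch below builds both from a list of titles; it assumes a tiny hypothetical stop-word list and represents each KWIC entry as the title rotated so that the keyword comes first, which is only one of several possible KWIC layouts.

```python
# Hypothetical stop-word list: words that are not used as keywords.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}


def kwic(titles):
    """KWIC: one entry per keyword, with the title rotated around that keyword."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOPWORDS:
                continue
            tail = " ".join(words[i:])
            head = " ".join(words[:i])
            rotated = f"{tail} / {head}" if head else tail
            entries.append((word.lower(), rotated))
    return sorted(entries)


def kwoc(titles):
    """KWOC: each keyword is a heading, with the full titles listed under it."""
    index = {}
    for title in titles:
        for word in title.split():
            key = word.lower()
            if key not in STOPWORDS and title not in index.setdefault(key, []):
                index[key].append(title)
    return dict(sorted(index.items()))


titles = ["Retrieval of Information", "Information Needs"]
for keyword, line in kwic(titles):
    print(f"{keyword:12} {line}")
for keyword, under in kwoc(titles).items():
    print(keyword, under)
```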
Keywords.
A keyword is a meaningful term in natural language that represents the content of the
document. When searching for information, keywords are essential since they allow us to
narrow down and specify the search. The problem lies in defining the exact word that
represents the content, which is why it is convenient to use qualifiers. For example, if we
use the word flower in any search engine we may be looking for the nearest florist, an
image of flowers or a study about flowers in the different seasons of the year.
Meta keywords. Most search engines use the keywords of each web page to locate
resources. For this reason, it is essential that each page has a tag that includes the
keywords that define it. Describing each page accurately is also important, because
search engines rely on these keywords to locate a resource or not.
Thesauri
The main characteristic of a thesaurus is that its terms are arranged hierarchically,
allowing terminological precision in the search for information.
Components:
Admitted or preferred descriptors: normalized terms (that is, terms that have
undergone a clean-up process that removes plurals, avoids synonyms, etc.)
that the thesaurus considers suitable to be assigned to a document and that
subsequently facilitate retrieval.
Non-admitted descriptors: terms that, even though they are standardized,
are not considered appropriate for use (they are usually synonyms, terms not used in
the field in question, etc.).
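The role of the two kinds of descriptor can be sketched with a very small data structure. This is an illustrative sketch only: the terms, the `USE` and `NARROWER` tables and both function names are hypothetical and are not taken from any real thesaurus.

```python
# Hypothetical mini-thesaurus.
# Non-admitted descriptor -> admitted (preferred) descriptor.
USE = {"automobile": "car", "cars": "car", "motorcar": "car"}
# Hierarchy: admitted descriptor -> narrower admitted descriptors.
NARROWER = {"vehicle": ["car", "bicycle"]}


def normalize(term):
    """Replace a non-admitted descriptor with its preferred descriptor."""
    term = term.strip().lower()
    return USE.get(term, term)


def expand(term):
    """Return the preferred descriptor followed by its narrower terms, if any."""
    preferred = normalize(term)
    return [preferred] + NARROWER.get(preferred, [])


print(normalize("Automobile"))
print(expand("vehicle"))
```

Indexing and searching with the same preferred descriptor is what reduces silence caused by synonyms, while the hierarchy lets a query be broadened or narrowed in a controlled way.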
Relations:
Languages
Each retrieval system has its own query language, which allows the user to “speak” the
same language as the database. This language, like any other, has its own syntax, which
specifies the particular characteristics of the search and determines at all times the
relationship between the search elements. The grammatical rules of the query language are
its operators. There are no guidelines that tell us exactly how to carry out every search,
because each query is different. That is why it is convenient to define a basic working
procedure:
Simple equations
Composite equations
Operators
Logical or Boolean: They allow you to convert the words of the query into
mathematical sets, and operate with the words as if they were sets. The basic
operations are addition or union (OR), subtraction or difference (NOT) and
product or intersection (AND).
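Treating query words as sets can be shown directly with Python's set type. The sketch below is a minimal illustration, assuming a hypothetical three-document collection and a simple word-to-documents index; it is not the query syntax of any particular database.

```python
# Hypothetical collection: document id -> text.
docs = {
    1: "flower shop near the station",
    2: "study of flower seasons",
    3: "garden shop catalogue",
}


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


index = build_index(docs)
flower = index.get("flower", set())
shop = index.get("shop", set())

flower_and_shop = flower & shop  # AND: product (intersection)
flower_or_shop = flower | shop   # OR: addition (union)
flower_not_shop = flower - shop  # NOT: subtraction (difference)

print(flower_and_shop, flower_or_shop, flower_not_shop)
```

AND narrows the result and so reduces noise, OR widens it and so reduces silence, and NOT removes an unwanted sense of a word.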
Navigation allows you to consult and obtain information through hypertext systems.
Differences
The essential difference between the two concepts lies in the way information is obtained:
while in information retrieval it is obtained linearly, navigation obtains information
through hypertext. This means that knowledge is acquired gradually and, depending on the
user's interest, one subject or another is explored in greater depth through the
information nodes.
the page.                                   | relevant fields such as the title, keywords, etc.
They store the information through their    | They store information through directories,
own database.                               | classified into categories.
The search is performed in the database     | The search is carried out hierarchically
using the search equation.                  | according to the established categories.
The presentation of the results is          | The presentation of the results is carried out
established in order of relevance           | through a list of all the corresponding
according to criteria established in the    | documents in the category, without any
search equation.                            | presentation criteria.
Appropriate for locating specific           | Appropriate for locating general information on
information.                                | a topic.
Metadata
Metadata are used in navigation and information retrieval to detect relevant information
quickly and efficiently. Tags describe the content of the web resource, and search tools
then use them to locate and access the resource. It is mainly the keyword and title tags
that make it possible to locate the document.
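How a search tool might read these two tags can be sketched with Python's standard `html.parser` module. This is only an illustration: the `MetadataParser` class and the sample page are hypothetical, and real search engines apply their own, more elaborate rules.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collect the <title> text and the <meta name="keywords"> terms of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.keywords = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords = [k.strip() for k in content.split(",") if k.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical page used only for this example.
page = """<html><head><title>Flowers by season</title>
<meta name="keywords" content="flowers, seasons, botany"></head>
<body></body></html>"""

parser = MetadataParser()
parser.feed(page)
print(parser.title, parser.keywords)
```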
Retrieval quality
Below are some basic criteria for ensuring that retrieval is of good quality.
Skills and competencies
Formulation of a plan for searching for information: defining the subject or aspects
to be searched, using a list of appropriate keywords, delimiting the search according
to chronological and language criteria.
Knowledge of potential and actual sources of information
Skills in locating relevant printed and electronic resources in the context of the
information need
Ability to select the most appropriate search tool and formulate the most appropriate
strategy.
Mastery of advanced techniques for retrieving information on the Internet, using
engines, search directories, and intelligent agents.
Skills to evaluate the results of the search, reflecting on successes, failures and
alternative strategies.
Ability to determine the location of and access to information, respecting ethical and
legal principles.