KEN2570-5-Search and IR


KEN2570 Natural Language Processing
Information Retrieval / Search

Jerry Spanakis
http://dke.maastrichtuniversity.nl/jerry.spanakis
@gerasimoss

Public Service Announcements

• Lab 1 is due this Friday at 18.00
- How are we doing?
- Report any issues on Discord
- Wildcards can be used
• Project proposal
- Form your teams
- Think about your project idea

Agenda

• Intro to IR
• Retrieval models
- Boolean model
- Vector space model (BOW + TF-IDF)
• Evaluation

Information Retrieval / Search

• Retrieval models
• Search engines
• Part of Information Extraction (NER)
• Foundation for Question-Answering & Conversational Agents
Information Retrieval

• Information Retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).
- These days we frequently think first of web search, but there are many other cases:
- E-mail search
- Searching your laptop
- Corporate knowledge bases
- Legal information retrieval

The classic search model

• User task: "Get rid of mice in a politically correct way" (misconception?)
• Info need: "Info about removing mice without killing them" (misformulation?)
• Query: "how trap mice alive"
• The query is sent to the search engine, which returns results from the collection; query refinement loops back into a revised query.

What do people search?

• https://trends.google.com/trends/yis/2022/GLOBAL/
• https://trends.google.com/trends/yis/2022/US/
• https://trends.google.com/trends/yis/2022/NL/
Challenges & Characteristics

• Dynamically generated content
• New pages get added all the time
- The size of the web (or textual content in general) doubles every few minutes
• Users (usually) revise and revisit their queries
• Queries are not extremely long
- They used to be very short (up to 2 words)
• Probably a large number of typos
• A small number of popular queries
- A long tail of infrequent ones
• Almost no use of advanced query operators

Text Retrieval Is Hard!

• Under/over-specified queries
- Ambiguous: "buying CDs" (money or music?)
- Incomplete: what kind of CDs?
- What if "CD" is never mentioned in the documents?
• Vague semantics of documents
- Ambiguity: e.g., word-sense, structural
- Incomplete: inferences required
• Hard even for people!
- ~80% agreement in human judgments of relevant results

Text Retrieval Is "Easy"!

• Text retrieval CAN be easy in a particular case:
- Ambiguity in query/document is relative to the database
- So, if the query is specific enough, just one keyword may get all the relevant documents
• Perceived text retrieval performance is usually better than the actual performance
- Users can NOT judge the completeness of an answer

Retrieval models
First approach: Term-document matrices

• 1 if the play contains the word, 0 otherwise:

term        Antony and Cleopatra   Julius Caesar   The Tempest   Hamlet   Othello   Macbeth
Antony               1                   1              0           0        0         1
Brutus               1                   1              0           1        0         0
Caesar               1                   1              0           1        1         1
Calpurnia            0                   1              0           0        0         0
Cleopatra            1                   0              0           0        0         0
mercy                1                   0              1           1        1         1
worser               1                   0              1           1        1         0

• Example query: Brutus AND Caesar BUT NOT Calpurnia

Incidence vectors

• So, we have a 0/1 vector for each term.
• To answer the query: take the vectors for Brutus, Caesar and NOT Calpurnia and apply some logical rules:
110100 AND 110111 AND NOT 010000 = 110100 AND 110111 AND 101111 = 100100
→ the query is satisfied by "Antony and Cleopatra" and "Hamlet".
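A minimal Python sketch of answering this query straight from the incidence rows above (only the three query terms are included; the play names and 0/1 vectors are exactly those in the table):

```python
plays = ["Antony and Cleopatra", "Julius Caesar", "The Tempest",
         "Hamlet", "Othello", "Macbeth"]
incidence = {                      # one 0/1 entry per play, copied from the table
    "Brutus":    [1, 1, 0, 1, 0, 0],
    "Caesar":    [1, 1, 0, 1, 1, 1],
    "Calpurnia": [0, 1, 0, 0, 0, 0],
}

def matches(query_and, query_not):
    """Plays satisfying (AND of query_and) AND NOT (any of query_not)."""
    hits = []
    for i, play in enumerate(plays):
        if all(incidence[t][i] for t in query_and) and \
           not any(incidence[t][i] for t in query_not):
            hits.append(play)
    return hits

print(matches(["Brutus", "Caesar"], ["Calpurnia"]))
# -> ['Antony and Cleopatra', 'Hamlet']
```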

Problems with the Boolean model

• Storage & size
• Sparsity
• Requires exact match
• Others…?

Storage

• Consider N = 1 million documents, each with about 1000 words.
• Avg 6 bytes/word including spaces/punctuation
• What is the memory size required?
A. 6 MB
B. 60 MB
C. 1 GB
D. 6 GB
E. 60 GB
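For reference, the arithmetic behind the quiz:

10^6 \text{ documents} \times 10^3 \text{ words/document} \times 6 \text{ bytes/word} = 6 \times 10^9 \text{ bytes} \approx 6~\mathrm{GB}

That is the raw text alone (option D), before building any index structure on top of it.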
Sparsity

• Consider N = 1 million documents, each with about 1000 words.
• Say there are M = 500K distinct terms among these.
• Term-document matrix is 500K x 1M
• How many non-zero elements does it have?
A. 1 million
B. 1 billion
C. 0.5 trillion
D. 1 trillion
E. What's a trillion?

A potential improvement: Inverted Index

• For each term t, we must store a list of all documents that contain t.
• Given a query, we can fetch the lists for all query terms and work on the involved documents.
• Keep everything sorted! This gives you a logarithmic improvement in access.
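Since the collection holds at most 10^6 × 1000 = 10^9 tokens, the matrix can have at most roughly one billion non-zero cells out of 500K × 1M = 5 × 10^11, so it is overwhelmingly sparse. An inverted index stores only the non-empty part; a minimal sketch over a made-up toy corpus:

```python
from collections import defaultdict

docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
    4: "july new home sales rise",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():              # trivial tokenizer for the sketch
        index[term].add(doc_id)
postings = {t: sorted(ids) for t, ids in index.items()}   # keep postings sorted

def boolean_and(terms):
    """Intersect the postings lists of all query terms."""
    result = set(postings.get(terms[0], []))
    for t in terms[1:]:
        result &= set(postings.get(t, []))
    return sorted(result)

print(boolean_and(["home", "sales", "july"]))   # -> [2, 3, 4]
```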

Inverted index construction

Documents to be indexed: "Friends, Romans, countrymen."
→ Tokenizer → token stream: Friends Romans Countrymen
→ Linguistic modules → modified tokens: friend roman countryman
→ Indexer → inverted index:
friend → 2, 4
roman → 1, 2
countryman → 13, 16

Tokenizer + Linguistics = Vocabulary reduction

Remember all these steps…?
• Tokenization
- Cut character sequence into word tokens
- Deal with "John's", a state-of-the-art solution
• Normalization
- Map text and query term to same form
- You want U.S.A. and USA to match
• Lemmatization or Stemming
- We may wish different forms of a root to match
- authorize, authorization
• Stop words
- We may omit very common words (or not)
- the, a, to, of
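A rough Python sketch of this pipeline. The stemmer here is a toy suffix-stripper standing in for a real one (e.g. Porter), so unlike the slide's linguistic modules it leaves "countrymen" unchanged:

```python
import re

STOP_WORDS = {"the", "a", "to", "of"}

def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())       # cut into word tokens

def normalize(token):
    token = token.replace("'s", "")                       # e.g. "john's" -> "john"
    for suffix in ("ization", "izes", "ize", "s"):        # crude stemming
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    return [normalize(t) for t in tokenize(text) if t not in STOP_WORDS]

print(preprocess("Friends, Romans, countrymen."))
# -> ['friend', 'roman', 'countrymen']
```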
Exact match

• The Boolean retrieval model lets us ask a query that is a Boolean expression:
- Boolean queries use AND, OR and NOT to join query terms
- It is precise: a document either matches the condition or it does not.
- Perhaps the simplest model to build an IR system on
• Primary commercial retrieval tool for many decades.
• Many search systems you still(?) use are Boolean:
- Email, library catalog, Mac OS X Spotlight
• Many improvements/extensions: e.g. use bigrams

What the Boolean model doesn't tell us

• How to define/select the tokens/words (or the "basic concepts")
- What kind of pre-processing is needed
• How to assign weights
- Weight in the query indicates the importance of a term
- Weight in the doc indicates how well the term characterizes the document
• How to define a similarity/distance measure

What's a good "basic concept"?

1) Orthogonal
- Linearly independent basis vectors
- "Non-overlapping" in meaning
2) No ambiguity
3) Weights can be assigned automatically and - hopefully - accurately

• Many possibilities: words, stemmed words, phrases, "latent concepts", …

Why Is Being "Orthogonal" Important?

• Query = {laptop computer pc sale}
• Document 1 = {computer sale}
• Document 2 = {laptop computer pc}

Ambiguity is the killer:
• Query = {Jaguar band}
• Document 1 = {Jaguar car}
• Document 2 = {Jaguar cat}
How to Assign Weights?

• Very, very important!
• Why weighting?
- Query side: not all terms are equally important
- Doc side: some terms carry more information about content
• How?
- Two basic heuristics
- TF (Term Frequency) = within-document frequency
- IDF (Inverse Document Frequency)
- Many variations of these exist…

Weighting Is Very Important…

• Query = {text information management}
• Document 1 = {text information}
• Document 2 = {information management}
• Document 3 = {text management}
• How do we weight the multiple occurrences of a term in a document?
• How do we weight different terms across documents?

Term-document count matrices

• Consider the number of occurrences of a term in a document:
- Each document is a count vector in ℕ^|V|: a column below

term        Antony and Cleopatra   Julius Caesar   The Tempest   Hamlet   Othello   Macbeth
Antony              157                  73              0          0        0         0
Brutus                4                 157              0          1        0         0
Caesar              232                 227              0          2        1         1
Calpurnia             0                  10              0          0        0         0
Cleopatra            57                   0              0          0        0         0
mercy                 2                   0              3          5        5         1
worser                2                   0              1          1        1         0

Bag of words (BOW) model

• Vector representation doesn't consider the ordering of words in a document
• "John is quicker than Mary" and "Mary is quicker than John" have the same vectors
• This is called the bag of words model.
• Quite popular in many applications, including classification problems
• Can you think of other problems (other than the order of words)?
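A tiny bag-of-words sketch showing why the two sentences above end up with identical vectors:

```python
from collections import Counter

docs = ["John is quicker than Mary", "Mary is quicker than John"]
vocab = sorted({w.lower() for d in docs for w in d.split()})   # fixed vocabulary

def bow_vector(text):
    counts = Counter(w.lower() for w in text.split())
    return [counts[w] for w in vocab]          # word order is discarded

print(vocab)
print(bow_vector(docs[0]))   # identical vectors for both sentences
print(bow_vector(docs[1]))
```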
Term frequency (tf)

• The term frequency tf_{t,d} of term t in document d is defined as the number of times that t occurs in d.
• We want to use tf when computing query-document match scores. But how?
• Raw term frequency is not what we want:
- A document with 10 occurrences of the term is more relevant than a document with 1 occurrence of the term.
- But not 10 times more relevant.
• Relevance does not increase proportionally with term frequency.
NB: frequency = count in IR

tf log-weighting

• The log frequency weight of term t in d is

w_{t,d} = \begin{cases} 1 + \log_{10} \mathrm{tf}_{t,d} & \text{if } \mathrm{tf}_{t,d} > 0 \\ 0 & \text{otherwise} \end{cases}

0 → 0, 1 → 1, 2 → 1.3, 10 → 2, 1000 → 4, etc.

• Score for a document-query pair: sum over terms t in both q and d:

\mathrm{score}(q,d) = \sum_{t \in q \cap d} \left(1 + \log_{10} \mathrm{tf}_{t,d}\right)
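A direct transcription of the log weight and the query-document score (the sum over q ∩ d is implemented by treating absent terms as tf = 0):

```python
import math

def log_tf(tf):
    # 1 + log10(tf) for tf > 0, else 0, as defined above
    return 1 + math.log10(tf) if tf > 0 else 0.0

for tf in (0, 1, 2, 10, 1000):
    print(tf, round(log_tf(tf), 2))   # 0->0.0, 1->1.0, 2->1.3, 10->2.0, 1000->4.0

def score(query_terms, doc_tf):
    # sum of log-tf weights over terms occurring in both query and document
    return sum(log_tf(doc_tf.get(t, 0)) for t in query_terms)
```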

tf normalization

• Why?
- Document length variation
- "Repeated occurrences" are less informative than the "first occurrence"
• Two views of document length
- A doc is long because it uses more words
- A doc is long because it has more content
• Generally we want to penalize long documents, but avoid over-penalizing

tf normalization, cont.

• A vector can be (length-) normalized by dividing each of its components by its length - for this we use the L2 norm:

\|x\|_2 = \sqrt{\sum_i x_i^2}

• Dividing a vector by its L2 norm makes it a unit (length) vector:

\mathrm{tf}^{\,\mathrm{norm}}_{t,d} = \frac{\mathrm{tf}_{t,d}}{\sqrt{\sum_{t'} \mathrm{tf}_{t',d}^{\,2}}}

• Long and short documents now have comparable weights
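A small sketch of L2 length normalization on a sparse term-frequency vector:

```python
import math

def l2_normalize(vec):
    # divide every component by the vector's L2 norm (length)
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm > 0 else vec

print(l2_normalize({"text": 3, "information": 4}))
# -> {'text': 0.6, 'information': 0.8}   (unit length after normalization)
```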
Document frequency (df)

• Rare terms are more informative than frequent terms
- Recall stop words
• Consider a term in the query that is rare in the collection (e.g., anatidaephobia)
• A document containing this term is very likely to be relevant to the query anatidaephobia
• → We want a high weight for rare terms like anatidaephobia.

Document frequency, continued

• Frequent terms are less informative than rare terms
• Consider a query term that is frequent in the collection (e.g., high, increase, line)
• A document containing such a term might or might not be relevant
• Intuition: if all documents in the collection contain the term, it certainly cannot help us tell relevant documents from non-relevant ones
• We want lower weights for frequent terms (relative to rare terms)
• We will use document frequency (df) to capture this.

idf weight

• df_t is the document frequency of t: the number of documents that contain t
- df_t is an inverse measure of the informativeness of t
- df_t ≤ N
• We define the idf (inverse document frequency) of t by idf_t = log10(N / df_t)
- We use log(N / df_t) instead of N / df_t to "dampen" the effect of idf.
There is one idf value for each term t in a collection.
It turns out the base of the log is immaterial.

idf example, suppose N = 1 million

term              df_t         idf_t = log10(N / df_t)
anatidaephobia    1
animal            100
sunday            1,000
fly               10,000
under             100,000
the               1,000,000
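The empty idf column can be filled in with one line per term; with N = 1,000,000 the values come out to 6, 4, 3, 2, 1 and 0:

```python
import math

N = 1_000_000
for term, df in [("anatidaephobia", 1), ("animal", 100), ("sunday", 1_000),
                 ("fly", 10_000), ("under", 100_000), ("the", 1_000_000)]:
    print(term, round(math.log10(N / df), 2))
# -> 6.0, 4.0, 3.0, 2.0, 1.0, 0.0
```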
tf-idf weighting

• The tf-idf weight of a term is the product of its tf weight and its idf weight:

w_{t,d} = (1 + \log_{10} \mathrm{tf}_{t,d}) \times \log_{10}(N / \mathrm{df}_t)

• Best known weighting scheme in information retrieval
- Note: the "-" in tf-idf is a hyphen, not a minus sign!
- Alternative names: tf.idf, tf x idf, tfidf
• Increases with the number of occurrences within a document
• Increases with the rarity of the term in the collection

Final ranking of documents for a query

\mathrm{Score}(q,d) = \sum_{t \in q \cap d} \text{tf-idf}_{t,d}

Note that we can (and will) use tf.idf weights for other tasks like e.g. classification!
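Putting the two pieces together, a minimal tf-idf scorer (the counts and document frequencies in the example call are made up purely for illustration):

```python
import math

def tf_idf(tf, df, N):
    # w = (1 + log10(tf)) * log10(N / df), 0 if the term is absent
    if tf == 0 or df == 0:
        return 0.0
    return (1 + math.log10(tf)) * math.log10(N / df)

def score(query_terms, doc_tf, df, N):
    # Score(q, d): sum of tf-idf weights over terms shared by query and document
    return sum(tf_idf(doc_tf.get(t, 0), df.get(t, 0), N) for t in query_terms)

df = {"capricious": 10, "person": 100_000}     # hypothetical document frequencies
doc_tf = {"capricious": 2, "person": 3}        # hypothetical counts in one document
print(round(score(["capricious", "person"], doc_tf, df, N=1_000_000), 2))
# -> 7.98 (the rare term dominates the score)
```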

Effect of idf on ranking

\mathrm{Score}(q,d) = \sum_{t \in q \cap d} \text{tf-idf}_{t,d}

• Question: does idf have an effect on ranking for one-term queries, like "iPhone"?
• idf has no effect on ranking one-term queries
- idf affects the ranking of documents for queries with at least two terms
- For the query "capricious person", idf weighting makes occurrences of capricious count for much more in the final document ranking than occurrences of person.

Binary → Count → Weight matrix

term        Antony and Cleopatra   Julius Caesar   The Tempest   Hamlet   Othello   Macbeth
Antony              5.25                3.18            0          0        0        0.35
Brutus              1.21                6.1             0          1        0        0
Caesar              8.59                2.54            0          1.51     0.25     0
Calpurnia           0                   1.54            0          0        0        0
Cleopatra           2.85                0               0          0        0        0
mercy               1.51                0               1.9        0.12     5.25     0.88
worser              1.37                0               0.11       4.15     0.25     1.95

Each document is now represented by a real-valued vector of tf-idf weights ∈ ℝ^|V|
The Vector Space Model

• Now we have a |V|-dimensional vector space
• Terms are axes of the space
• Documents and queries are points or vectors in this space
• Very high-dimensional: tens of millions of dimensions when you apply this to a web search engine
• These are very sparse vectors - most entries are zero

Formalizing vector space proximity

• Distance measures we know, for x = (x_1, x_2, ..., x_V) and y = (y_1, y_2, ..., y_V):

d(x, y) = \left( \sum_{i=1}^{V} |x_i - y_i|^p \right)^{1/p}, \quad p > 0

- For p = 1 → Manhattan distance
- For p = 2 → Euclidean distance

• Cosine distance / angle of vectors:

\cos(x, y) = \frac{\sum_{i=1}^{V} x_i y_i}{\sqrt{\sum_{i=1}^{V} x_i^2}\,\sqrt{\sum_{i=1}^{V} y_i^2}}

Formalizing vector space proximity, cont.

• Let's compare Euclidean distances: distance(d1,d2), distance(d1,d3), distance(d2,d3)

     t1  t2  t3  t4  t5  t6  t7  t8  t9  t10
d1    1   0   0   0   0   0   0   0   0   0
d2    0   0   0   0   0   0   0   0   0   1
d3    1   0   0   0   1   0   0   0   0   1

Why distance is a bad idea

• The Euclidean distance between q and d2 is large even though the distribution of terms in the query q and the distribution of terms in the document d2 are very similar.
• But then, could I use angles instead?
cosine(query, document)

\cos(\vec q, \vec d) = \frac{\vec q \cdot \vec d}{|\vec q|\,|\vec d|} = \frac{\vec q}{|\vec q|} \cdot \frac{\vec d}{|\vec d|} = \frac{\sum_{i=1}^{V} q_i d_i}{\sqrt{\sum_{i=1}^{V} q_i^2}\,\sqrt{\sum_{i=1}^{V} d_i^2}}

• q_i is the tf-idf weight of term i in the query
• d_i is the tf-idf weight of term i in the document
• cos(q,d) is the cosine similarity of q and d … or, equivalently, the cosine of the angle between q and d.

cosine for length-normalized vectors

• For length-normalized vectors, cosine similarity is simply the dot product (or scalar product):

\cos(\vec q, \vec d) = \vec q \cdot \vec d = \sum_{i=1}^{V} q_i d_i

for q, d length-normalized.
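A minimal cosine-similarity sketch for sparse vectors stored as term → weight dictionaries (the example weights are made up):

```python
import math

def cosine(q, d):
    # dot product over shared terms, divided by the two vector lengths
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0

print(round(cosine({"jealous": 1.0, "gossip": 1.0},
                   {"jealous": 2.0, "gossip": 0.5, "affection": 1.0}), 2))
# -> 0.77 with these made-up weights
```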

Cosine similarity amongst 3 documents

How similar are the novels
SaS: Sense and Sensibility
PaP: Pride and Prejudice
WH: Wuthering Heights?

Term frequencies (counts):

term        SaS    PaP    WH
affection   115     58    20
jealous      10      7    11
gossip        2      0     6
wuthering     0      0    38

Note: To simplify this example, we don't do idf weighting.

Cosine similarity amongst 3 documents, cont.

Log frequency weighting:

term        SaS     PaP     WH
affection   3.06    2.76    2.30
jealous     2.00    1.85    2.04
gossip      1.30    0       1.78
wuthering   0       0       2.58

After length normalization:

term        SaS      PaP      WH
affection   0.789    0.832    0.524
jealous     0.515    0.555    0.465
gossip      0.335    0        0.405
wuthering   0        0        0.588

cos(SaS,PaP) ≈ 0.789 × 0.832 + 0.515 × 0.555 + 0.335 × 0.0 + 0.0 × 0.0 ≈ 0.94
cos(SaS,WH) ≈ 0.79
cos(PaP,WH) ≈ 0.69
Why do we have cos(SaS,PaP) > cos(SaS,WH)?
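The example above can be reproduced end to end: log tf weighting, L2 length normalization, then a plain dot product (no idf, as on the slide):

```python
import math

counts = {
    "SaS": {"affection": 115, "jealous": 10, "gossip": 2, "wuthering": 0},
    "PaP": {"affection": 58,  "jealous": 7,  "gossip": 0, "wuthering": 0},
    "WH":  {"affection": 20,  "jealous": 11, "gossip": 6, "wuthering": 38},
}

def log_weight(tf):
    return 1 + math.log10(tf) if tf > 0 else 0.0

def normalize(vec):
    w = {t: log_weight(tf) for t, tf in vec.items()}
    norm = math.sqrt(sum(v * v for v in w.values()))
    return {t: v / norm for t, v in w.items()}

def cosine(a, b):
    # both vectors are already unit length, so cosine = dot product
    return sum(v * b[t] for t, v in a.items())

vecs = {name: normalize(c) for name, c in counts.items()}
print(round(cosine(vecs["SaS"], vecs["PaP"]), 2))  # -> 0.94
print(round(cosine(vecs["SaS"], vecs["WH"]), 2))   # -> 0.79
print(round(cosine(vecs["PaP"], vecs["WH"]), 2))   # -> 0.69
```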
tf-idf weighting has many variants

[Table of tf-idf weighting variants not reproduced here; columns headed 'n' etc. are acronyms for the weight schemes.]

(Okapi) BM25: A strong baseline

• Goal: be sensitive to term frequency and document length while not adding too many parameters

\mathrm{RSV}^{BM25} = \sum_{i \in q} \log\frac{N}{\mathrm{df}_i} \cdot \frac{(k_1 + 1)\,\mathrm{tf}_i}{k_1\left((1-b) + b\,\frac{dl}{avdl}\right) + \mathrm{tf}_i}

where dl = document length (|d|) and avdl = average document length in the whole collection

• k_1 controls term frequency scaling
- k_1 = 0 is a binary model; large k_1 gives raw term frequency
• b controls document length normalization
- b = 0 is no length normalization; b = 1 is relative frequency (fully scaled by document length)
• Typically, k_1 is set around 1.2-2 and b around 0.75

(Robertson and Zaragoza 2009; Spärck Jones et al. 2000)
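A minimal BM25 scorer following the RSV formula above; the natural log is used here, which only rescales all scores by a constant factor and leaves the ranking unchanged. The numbers in the example call are made up:

```python
import math

def bm25(query_terms, doc_tf, doc_len, avg_doc_len, df, N, k1=1.2, b=0.75):
    score = 0.0
    for t in query_terms:
        tf = doc_tf.get(t, 0)
        if tf == 0 or df.get(t, 0) == 0:
            continue
        idf = math.log(N / df[t])
        # length-normalized tf saturation, as in the formula above
        denom = k1 * ((1 - b) + b * doc_len / avg_doc_len) + tf
        score += idf * (k1 + 1) * tf / denom
    return score

print(round(bm25(["cat"], {"cat": 3}, doc_len=100, avg_doc_len=100,
                 df={"cat": 10}, N=1000), 2))   # -> 7.24 with these made-up numbers
```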

Why BM25 might be better than VSM tf-idf?

• Example query: "president lincoln"
• "president" is in 40,000 documents in the collection (df_president = 40,000)
• "lincoln" is in 300 documents in the collection (df_lincoln = 300)
• The document length is 90% of the average length (dl/avdl = 0.9)
• Let's assume k_1 = 1.2, b = 0.75

tf_president,d   tf_lincoln,d   BM25 Score(q,d)
     15              25             20.66
     15               1             12.74
     15               0              5.00
      1              25             18.2
      0              25             15.66

The low-df term plays a bigger role.

Intelligent IR

• Taking into account the meaning of the words used
- Any external info, e.g. WordNet?
- Word vectors?
• Taking into account the order of words in the query
• Adapting to the user based on direct or indirect feedback
• Taking into account the authority of the source
Integrating multiple features to determine relevance

• Modern systems - especially on the Web - use a great number of features:
- Log frequency of query word in anchor text?
- Query word in color on page?
- # of images on page?
- # of (out) links on page?
- PageRank of page?
- URL length?
- URL contains "~"?
- Page edit recency?
- Page length?
• Google is using over 200 such features for the rankings and constantly updates the algorithm (SEO, paid advertisements, etc.)

How to combine features to assign a relevance score to a document?

• Given lots of relevant features…
• You can continue to hand-engineer retrieval scores
• Or, you can build a classifier to learn weights for the features
- Requires: labeled training data
- This is the "learning to rank" approach, which has become a hot area in recent years (esp. with deep models)
- We only provide an elementary introduction here

Simple example: Using classification for ad hoc IR

• Collect a training corpus of (q, d, r) triples
- Relevance r is here binary (but may be multiclass, with 3-7 values)
- Each document is represented by a feature vector x = (φ, ω), where φ is the cosine similarity and ω is the minimum query window size
- ω is the shortest text span that includes all query words
- Query term proximity is a very important new weighting factor!
- Train a machine learning model to predict the class r of a document-query pair

[Figure: relevant (R) and non-relevant (N) training examples plotted against term proximity ω and cosine score φ, with a linear decision surface separating the two classes; a toy version is sketched below.]
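A toy sketch of this classification setup, assuming scikit-learn is available; every number below is made up purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row is one (query, document) pair: [cosine score phi, proximity window omega].
X = np.array([[0.040, 2], [0.035, 3], [0.030, 2], [0.025, 4],   # relevant pairs
              [0.020, 5], [0.015, 4], [0.010, 5], [0.012, 3]])  # non-relevant pairs
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])                          # relevance labels r

model = LogisticRegression().fit(X, y)   # learns a linear decision surface

# Rank unseen documents for a query by the model's predicted relevance.
candidates = np.array([[0.028, 3], [0.018, 5]])
print(model.predict_proba(candidates)[:, 1])   # higher value -> ranked higher
```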
Evaluation of search engines

Measures for a search engine

• How fast does it index?
- Number of documents/hour
- (Average document size)
• How fast does it search?
- Latency as a function of index size
• Expressiveness of query language
- Ability to express complex information needs
- Speed on complex queries
• Uncluttered UI
• Is it free?

What about what users need, content-wise?

• An information need is translated into a query
• Relevance is assessed relative to the information need, not the query
• E.g., Information need: I'm looking for information on whether drinking red wine is more effective at reducing your risk of heart attacks than white wine.
• Query: wine red white heart attack effective
• You evaluate whether the doc addresses the information need, not whether it has these words

Evaluating results

• Evaluation of a result set:
- If we have
- a benchmark document collection
- a benchmark set of queries
- assessor judgments of whether documents are relevant to queries
then we can use Precision/Recall/F-measure as before
• Evaluation of ranked results:
- The system can return any number of results
- By taking various numbers of the top returned documents (levels of recall), the evaluator can produce a precision-recall curve
The Contingency Table

                 Retrieved              Not Retrieved
Relevant         Relevant Retrieved     Relevant Rejected
Not relevant     Irrelevant Retrieved   Irrelevant Rejected

Precision = Relevant Retrieved / Retrieved
Recall = Relevant Retrieved / Relevant

How to evaluate ranking?

Compute the precision/recall at every point (or, more often, at the position where each relevant document is retrieved). That is called Precision/Recall @ K.

Sim.   Doc.   Relevant?
0.98   d1     +     Recall = 0.25, Precision = 1
0.95   d2     +     Recall = 0.5,  Precision = 1
0.83   d3     -     Recall = 0.5,  Precision = 0.67
0.80   d4     +     Recall = 0.75, Precision = 0.75
0.76   d5     -
0.56   d6     -
0.34   d7     -
0.21   d8     +
0.21   d9     -     Recall = 1, Precision = 4/9

In this example we can say e.g. that R@1 = 25%, R@4 = 75% or R@9 = 100%
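The Precision@K / Recall@K values in the example can be recomputed directly from the relevance column:

```python
relevance = [1, 1, 0, 1, 0, 0, 0, 1, 0]       # d1..d9, 1 = relevant
total_relevant = sum(relevance)                # 4 relevant documents overall

def precision_at(k):
    return sum(relevance[:k]) / k

def recall_at(k):
    return sum(relevance[:k]) / total_relevant

for k in (1, 4, 9):
    print(k, round(recall_at(k), 2), round(precision_at(k), 2))
# -> R@1 = 0.25 (P = 1.0), R@4 = 0.75 (P = 0.75), R@9 = 1.0 (P = 0.44)
```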

PR curves

Plot a precision-recall (PR) curve from the (recall, precision) pairs.

[Figure: precision on the y-axis against recall on the x-axis, with the measured points traced as a curve.]

Other measures: R-precision

• R-precision
- If we have a known (though perhaps incomplete) set of relevant documents of size R, then calculate the precision of the top-R docs returned
- A perfect system could score 1.0.

Example:
o In a ranking: R N R N R (5 documents in total, 3 relevant)
o I retrieve the first 3 documents (because I know there are 3 relevant)
o R-Precision is then 2/3 = 0.67.
Other measures: MAP

• Mean average precision (MAP)
- AP: average of the precision values obtained for the top-k documents, each time a relevant doc is retrieved
- Avoids interpolation and the use of fixed recall levels
- Weights most heavily the accuracy of the top returned results
- MAP for a set of queries is the arithmetic average of the APs
- Macro-averaging: each query counts equally

Example:
In a ranking: R N R N R (5 documents in total, 3 relevant)

\mathrm{MAP} = \frac{1}{3}\left(\frac{1}{1} + \frac{2}{3} + \frac{3}{5}\right) \approx 0.76

Modern IR

[Figure: architecture of a modern IR system, with an indexing side and a query side.]
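The same example as a small average-precision routine (here there is a single query, so MAP equals the AP of that one ranking):

```python
def average_precision(relevance):
    # precision at the rank of each relevant document, averaged over relevant docs
    hits, precisions = 0, []
    for i, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(precisions) if precisions else 0.0

print(round(average_precision([1, 0, 1, 0, 1]), 2))   # R N R N R -> 0.76

def mean_average_precision(rankings):
    # MAP over a set of queries: arithmetic mean of the per-query APs
    return sum(average_precision(r) for r in rankings) / len(rankings)
```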

Modern IR, cont.

• BM25 is still a strong baseline!
• Modern IR systems will rely on "dense vectors" (embeddings) for capturing semantics ("dense retrieval")

Louis, Antoine, and Gerasimos Spanakis. "A Statutory Article Retrieval Dataset in French." ACL 2022.
Louis, Antoine, Gijs van Dijck, and Gerasimos Spanakis. "Finding the Law: Enhancing Statutory Article Retrieval via Graph Neural Networks." EACL 2023.

Bias in IR

• Let's google "professor style" and "teacher style" and see the images
- Is it OK for a system to be biased if it amplifies a bias in the world? What if it faithfully represents the world?
• Let's try "why coffee is good for you" vs. "why coffee is bad for you"
- Does a system have a responsibility to give us unbiased information when we ourselves are biased? What are the potential impacts of this kind of bias in search results?
• Bias in IR is unsolved (and complex)
- If you were the CEO of a search engine company and wanted to reduce bias, how would you modify the algorithm? Are there any other factors the algorithm should consider besides the similarity of the query to the retrieved document? How should it detect and handle opinionated queries or queries with potential for gender bias?
Privacy in IR

• Personalization is an important topic in information retrieval; after all, we'd like our search results to be relevant to us and our interests.
• Let's google "marguerite". What is the first search result? Would you expect another person - say, someone in the USA - to get the same search result?
- Think of other examples of personalization based on location, search and browsing history, or social media?
• What are potential benefits and risks of getting personalized searches? Is it okay that search engines are using our data to personalize our searches? Or is there a limit to what kind of data should be okay for search engines to use?
• In 2009, the French government signed the "Charter of good practices on the right to be forgotten on social networks and search engines."
- Do you think people should have the right to remove information about themselves from the web (the right to be forgotten)?
- Do you think Google should be required to remove information about an individual upon request?

Recap

• Information Retrieval Challenges
• Retrieval models
- Boolean
- TF-IDF
- BM25
• Evaluation metrics for retrieval tasks

References
• IR Chapters: 1, 2.1-2.2, 6.2-6.4.3, 8.1-8.5
• Bias in IR:
- https://www.theverge.com/2022/5/11/23064883/google-ai-skin-tone-measure-monk-scale-inclusive-search-results
- https://blogs.bing.com/search-quality-insights/february-2018/toward-a-more-intelligent-search-bing-multi-perspective-answers
- https://www.tandfonline.com/doi/abs/10.1080/00913367.1990.10673179
- https://journals.sagepub.com/doi/10.1177/002193479902900303
- https://psycnet-apa-org.stanford.idm.oclc.org/fulltext/2020-42793-001.html
- https://journals.sagepub.com/doi/abs/10.1177/1090198120957949

