Lecture - 7 MSDS

The document discusses various string similarity measures, focusing on representing strings as vectors using methods like Bag-of-Words, TF-IDF, and character-level vectors. It explains how to calculate term frequencies, normalize them, and evaluate the importance of words in documents. Additionally, it covers different distance measures such as Euclidean, Manhattan, and Cosine similarity for comparing strings or documents.

String Similarity Measures

Representing strings with vectors of words or characters
• to be or not to be a bee is the question, said the queen bee

Multiset: {a, be, be, bee, bee, is, not, or, queen, question, said, the, the, to, to}

Word       Raw count   Relative frequency
a          1           0.07
be         2           0.13
bee        2           0.13
is         1           0.07
not        1           0.07
or         1           0.07
queen      1           0.07
question   1           0.07
said       1           0.07
the        2           0.13
to         2           0.13
Total      15          1.00

• Bag-of-words or unigrams
More context?
• to be or not to be a bee is the question, said the queen bee

Multiset of word pairs: {to be, be or, or not, not to, to be, be a, a bee, bee is, is the, the question, question said, said the, the queen, queen bee}

Word pair       Raw count   Relative frequency
a bee           1           0.07
be a            1           0.07
be or           1           0.07
bee is          1           0.07
is the          1           0.07
not to          1           0.07
or not          1           0.07
queen bee       1           0.07
question said   1           0.07
said the        1           0.07
the queen       1           0.07
the question    1           0.07
to be           2           0.14
Total           14          1.00

• Bigrams
More context?
• to be or not to be a bee is the question, said the
queen bee
• Unigrams
• Bigrams
• Trigrams
• 4-grams, and so on… (see the counting sketch below)
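A minimal sketch of how the unigram and bigram tables above could be produced in Python. The function name and the simple comma-stripping, whitespace tokenization are my own choices, not from the slides:

```python
from collections import Counter

def ngram_frequencies(text, n=1):
    # Tokenize on whitespace, dropping the comma so "question," and "question" match.
    tokens = text.lower().replace(",", "").split()
    # Build the n-grams: consecutive runs of n tokens joined by a space.
    ngrams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    total = sum(counts.values())
    # Return raw counts and relative frequencies, as in the tables above.
    return {g: (c, round(c / total, 2)) for g, c in counts.items()}

sentence = "to be or not to be a bee is the question, said the queen bee"
print(ngram_frequencies(sentence, n=1))  # unigrams: 'be' -> (2, 0.13), 'a' -> (1, 0.07), ...
print(ngram_frequencies(sentence, n=2))  # bigrams: 'to be' -> (2, 0.14), ...
```

The same idea carries over to character-level vectors on the next slide: iterate over characters (or character pairs) instead of word tokens.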
Character-level vectors
• to be or not to be a bee is the question, said the
queen bee

Character frequencies

Character   Raw count   Relative frequency
a           2           0.04
b           4           0.09
d           1           0.02
e           11          0.24
h           2           0.04
i           3           0.07
n           3           0.07
o           5           0.11
q           2           0.04
r           1           0.02
s           3           0.07
t           6           0.13
u           2           0.04
Total       45          1.00

Character pair frequencies

Character pair   Raw count   Relative frequency
ab               1           0.02
ai               1           0.02
be               4           0.09
dt               1           0.02
ea               1           0.02
ee               3           0.07
ei               1           0.02
en               1           0.02
eo               1           0.02
eq               2           0.05
es               1           0.02
he               2           0.05
id               1           0.02
io               1           0.02
is               1           0.02
nb               1           0.02
no               1           0.02
ns               1           0.02
ob               2           0.05
on               1           0.02
or               1           0.02
ot               1           0.02
qu               2           0.05
rn               1           0.02
sa               1           0.02
st               2           0.05
th               2           0.05
ti               1           0.02
to               2           0.05
tt               1           0.02
ue               2           0.05
Total            44          1.00
Bag of Words
• A simplified representation
• Text (such as a sentence or a document) is
represented as the bag (multiset) of its words
– Disregarding grammar and even word order but keeping
multiplicity
• This representation only takes into account the frequency
of each word in the document, and not its position,
grammar, or context.
Bag of Words
• To create a bag of words representation, the text is first
preprocessed by removing stop words (common words like
"the" and "a"), punctuation, and any other irrelevant
information.

• Then, the remaining words are counted and their
frequencies are stored in a vector.

• the: 2, quick: 1, brown: 1, fox: 1, jumps: 1, over: 1, lazy: 1, dog: 1
• [2, 1, 1, 1, 1, 1, 1, 1]

• Each word may also be represented as a one-hot encoded
vector of size 1 x V, e.g. brown = [0 0 1 0 0 0 0 0]
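A minimal sketch of building the count vector and a one-hot vector for this sentence. Variable names are mine; the text is simply lowercased and split, and stop words are kept so the counts match the example above:

```python
from collections import Counter

sentence = "The quick brown fox jumps over the lazy dog"
tokens = sentence.lower().split()        # lowercasing folds "The" and "the" together

counts = Counter(tokens)                 # bag of words: word -> frequency
vocab = sorted(counts)                   # fixing a word order defines the vector dimensions

bow_vector = [counts[w] for w in vocab]  # count vector over the vocabulary
one_hot_brown = [1 if w == "brown" else 0 for w in vocab]  # 1 x V one-hot vector for "brown"

print(vocab)          # ['brown', 'dog', 'fox', 'jumps', 'lazy', 'over', 'quick', 'the']
print(bow_vector)     # [1, 1, 1, 1, 1, 1, 1, 2] -- same counts as the slide, different word order
print(one_hot_brown)  # [1, 0, 0, 0, 0, 0, 0, 0]
```

The ordering of the dimensions is arbitrary; it only has to be the same for every document being compared.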
Bag of Words
• Used in document classification where the
(frequency of) occurrence of each word is used as a
feature for training a classifier
• But some words are common in general and do not tell us
much about a particular document (e.g. the, is, a)
• TF-IDF
Bag of Words
For example, consider the following sentence:

• "The quick brown fox jumps over the lazy dog."

• The bag of words representation of this sentence
would be a vector with the following entries:
{"The": 1, "quick": 1, "brown": 1, "fox": 1, "jumps": 1, "over": 1, "the": 1, "lazy": 1, "dog": 1}

• Note that the words "the" and "The" are treated
as different words here, as they have different
capitalization.
Measures to normalize term-frequencies
• Raw frequency: the number of times that
term t occurs in document d,
– tf(t,d) = f_{t,d}
– We need to remove bias towards long or short
documents
• Normalize by document length
• Relative term frequency, i.e. tf adjusted for
document length:
– tf(t,d) = f_{t,d} / (number of words in d)
Other measures to normalize term-frequencies
• Next, we need ways to remove bias towards more
frequently occurring words.
– A word appearing 100 times in a document does not make it
100 times more representative of the document

• Boolean "frequency": tf(t,d) = 1 if t occurs in d and 0
otherwise

• Logarithmically scaled frequency:
tf(t,d) = log(1 + f_{t,d})

Thus, terms which occur 10 times in a document have tf ≈ 1, 100 times in a
document tf ≈ 2, 1000 times tf ≈ 3, ... (using base-10 logarithms)
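A small sketch of these term-frequency variants. Function names are mine, and a base-10 logarithm is assumed so that the 10/100/1000 example above works out:

```python
import math

def tf_raw(count):
    # raw frequency: f_{t,d}
    return count

def tf_relative(count, doc_length):
    # relative frequency: adjusted for document length
    return count / doc_length

def tf_boolean(count):
    # Boolean "frequency": 1 if the term occurs at all, else 0
    return 1 if count > 0 else 0

def tf_log(count):
    # logarithmically scaled frequency (base 10 assumed)
    return math.log10(1 + count)

for c in (10, 100, 1000):
    print(c, round(tf_log(c), 2))   # 10 -> 1.04, 100 -> 2.0, 1000 -> 3.0 (approximately 1, 2, 3)
```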
TF-IDF
https://en.wikipedia.org/wiki/Tf%E2%80%93idf

• To evaluate the importance of a word in a document.

• It takes into account not only the frequency of the word in the
document, but also the frequency of the word in the corpus
(i.e., the collection of all documents).

• TF = (number of times the word appears in the document) /
(total number of words in the document)

• Term frequency is often normalized or transformed in some
way to reduce the impact of common terms and increase the
weight of rare terms.

• TF-IDF = TF * IDF
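A minimal sketch of these two formulas. Names are mine, and the IDF value is assumed to come from the corpus, as computed a couple of slides further down:

```python
def tf(term, doc_tokens):
    # relative term frequency: occurrences of the term / total words in the document
    return doc_tokens.count(term) / len(doc_tokens)

def tf_idf(term, doc_tokens, idf_value):
    # TF-IDF = TF * IDF
    return tf(term, doc_tokens) * idf_value

doc = "the quick brown fox jumps over the lazy dog".split()
print(tf("the", doc))             # 2/9 ≈ 0.222
print(tf_idf("the", doc, 0.05))   # a small, made-up IDF for a common word keeps the weight low
```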
TF-IDF
https://en.wikipedia.org/wiki/Tf%E2%80%93idf

• TF-IDF increases proportionally to the number of times a word
appears in the document,
• but it is offset by the frequency of the word in the whole corpus of
documents.
• This helps adjust for the fact that some words appear more frequently in
general.
• A word that appears frequently in a document but infrequently in
the corpus is likely to be more important to that document than a
word that appears frequently in both the document and the corpus
TF-IDF
https://en.wikipedia.org/wiki/Tf%E2%80%93idf

• Give a higher weight to words that occur only in a few documents

• Terms that are limited to a few documents are useful for
discriminating those documents from the rest of the collection; terms
that occur frequently across the entire collection aren't as helpful.
• Because of the large number of documents in many collections, this
measure is usually squashed with a log function.
• Inverse document frequency, idf(t, D), where D is the corpus, is the
logarithmically scaled inverse fraction of the documents that contain the word

For example, if we have a corpus of 100 documents, and the word "apple" appears in 20 of
those documents, the IDF for "apple" would be: idf = log(100 / 20) = log(5) = 1.609
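A small sketch that reproduces the "apple" example and then computes a full TF-IDF weight over a toy corpus. Function names and the corpus are mine; the natural logarithm is used here because log(5) = 1.609 in the example above:

```python
import math

def idf(total_docs, docs_containing_term):
    # inverse document frequency: log(total documents / documents containing the term)
    return math.log(total_docs / docs_containing_term)

print(round(idf(100, 20), 3))   # apple example: log(100/20) = log(5) = 1.609

# TF-IDF over a tiny made-up corpus
corpus = [
    "the apple is red".split(),
    "the sky is blue".split(),
    "the grass is green".split(),
]
doc = corpus[0]
n_docs_with_apple = sum("apple" in d for d in corpus)          # 1 document contains "apple"
tfidf_apple = (doc.count("apple") / len(doc)) * idf(len(corpus), n_docs_with_apple)
print(round(tfidf_apple, 3))    # (1/4) * log(3/1) ≈ 0.275
```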
TF-IDF
https://en.wikipedia.org/wiki/Tf%E2%80%93idf

[Figure slides: a worked IDF example over a collection of plays; only these numbers are recoverable from the export]
Total plays: 37
IDF(Romeo) = log(37/1) = 1.57
Several ways to create vectors
• TF
– Levels: character, word, phrase. We have already seen
calculation variations of this (raw, normalized, Boolean,
smoothed, etc.)
• TF-IDF
• Word Embedding
Now that we have our vectors!
• How do we compare two strings or documents?
– Convert each into a vector
– Calculate the distance between the vectors
Similarity and distance
(https://www.cs.utah.edu/~jeffp/teaching/cs5955/L4-Jaccard+Shingle.pdf)

• A distance d(A, B) has the properties:


– it is small if objects A and B are close,
– it is large if they are far,
– it is (usually) 0 if they are the same, and
– it has value in [0, ∞].
• On the other hand, a similarity s(A, B) has the
properties:
– it is large if the objects A and B are close,
– it is small if they are far,
– it is (usually) 1 if they are the same, and
– it is in the range [0, 1].
• Often we can convert between the two as d(A, B) = 1 − s(A, B)
Several distance measures for vectors
• Euclidean distance
• Manhattan distance
• Chebyshev Distance
• Minkowski distance
• Cosine similarity
Euclidean Distance
• Good choice for numeric attributes
• When data is dense or continuous, this is a good proximity
measure
• The Pythagorean theorem gives this distance between two
points, p and q, each with an n-dimensional feature vector:

d(p, q) = d(q, p) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \cdots + (p_n - q_n)^2}
        = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}

• Downside: sensitive to extreme deviations in a single
attribute (as it squares the differences)
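A minimal sketch of this formula in plain Python (no external libraries; names are mine):

```python
import math

def euclidean_distance(p, q):
    # square root of the sum of squared coordinate differences
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(euclidean_distance([0, 0], [3, 4]))   # 5.0
```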
Manhattan Distance
• The distance between two points is the sum of the absolute
differences of their Cartesian coordinates.
– It is the total sum of the differences between the x-coordinates
and y-coordinates.
• Also known as Manhattan length, rectilinear distance, L1
distance or L1 norm, city block distance, snake distance,
or taxi-cab metric

d(p, q) = d(q, p) = |p_1 - q_1| + |p_2 - q_2| + \cdots + |p_n - q_n|
        = \sum_{i=1}^{n} |p_i - q_i|
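The same sketch style for the L1 distance:

```python
def manhattan_distance(p, q):
    # sum of absolute coordinate differences
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

print(manhattan_distance([0, 0], [3, 4]))   # 7
```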
Minkowski Distance
• The Minkowski distance is a generalized metric form of
Euclidean distance and Manhattan distance
d(p, q) = \left( \sum_{i=1}^{n} |p_i - q_i|^a \right)^{1/a}
• a = 1 is the Manhattan distance
• a = 2 is the Euclidean distance
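A sketch showing that the two special cases above fall out of one function (names mine):

```python
def minkowski_distance(p, q, a):
    # generalized metric: a = 1 gives Manhattan, a = 2 gives Euclidean
    return sum(abs(pi - qi) ** a for pi, qi in zip(p, q)) ** (1 / a)

print(minkowski_distance([0, 0], [3, 4], a=1))   # 7.0  (Manhattan)
print(minkowski_distance([0, 0], [3, 4], a=2))   # 5.0  (Euclidean)
```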
Chebyshev Distance
Effect of Different Distance Measures in Result of Cluster Analysis, Sujan Dahal

• For Chebyshev distance, the distance between two vectors
is the greatest of their differences along any coordinate
dimension
• Used when two objects are to be defined as "different" if they
are different in any one dimension
• Also called chessboard distance, maximum metric, or L∞
metric

d(p, q) = \max_i |p_i - q_i|
Chebyshev Distance
Effect of Different Distance Measures in Result of Cluster Analysis, Sujan Dahal

• A = [70, 40]
• B = [330, 228]

• d(A, B) = max {|70 − 330|, |40 − 228|}
          = max {260, 188}
          = 260
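A sketch reproducing this example:

```python
def chebyshev_distance(p, q):
    # the largest absolute difference along any single dimension
    return max(abs(pi - qi) for pi, qi in zip(p, q))

print(chebyshev_distance([70, 40], [330, 228]))   # 260
```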
Angles between vectors
Cosine Similarity

• The raw dot product, however, has a problem as a similarity metric: it favors
long vectors.
• The simplest way to modify the dot product to normalize for vector length is to divide
the dot product by the lengths of each of the two vectors.
• This normalized dot product turns out to be the same as the cosine of the angle between
the two vectors.
Cosine Similarity
• The cosine value ranges from 1 for vectors pointing in the
same direction, through 0 for vectors that are orthogonal,
to -1 for vectors pointing in opposite directions.
• But raw frequency values are non-negative, so the cosine
for these vectors ranges from 0 to 1.
Cosine Similarity

cos(x, y) = (x · y) / (||x|| ||y||)

x = [3, 2, 0, 5], y = [1, 0, 0, 0]
x · y = 3×1 + 2×0 + 0×0 + 5×0 = 3
||x|| = sqrt(3^2 + 2^2 + 0^2 + 5^2) = sqrt(38) ≈ 6.16
||y|| = sqrt(1^2 + 0^2 + 0^2 + 0^2) = 1
cos(x, y) = 3 / (6.16 × 1) ≈ 0.49
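A sketch of the same computation in plain Python (names mine):

```python
import math

def cosine_similarity(x, y):
    # dot product divided by the product of the vector lengths
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi ** 2 for xi in x))
    norm_y = math.sqrt(sum(yi ** 2 for yi in y))
    return dot / (norm_x * norm_y)

print(round(cosine_similarity([3, 2, 0, 5], [1, 0, 0, 0]), 2))   # 0.49
```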
All Distances
S1: to be or not to be a bee is the question, said the queen bee
S2: one needs to be strong in order to be a queen bee
Here f1 and f2 are the relative word frequencies of S1 and S2 over the combined vocabulary.

Words         # S1   # S2   f1       f2       (f1-f2)^2   |f1-f2|   f1*f2    f1^2     f2^2
a             1      1      0.0667   0.0833   0.0003      0.0167    0.0056   0.0044   0.0069
be            2      2      0.1333   0.1667   0.0011      0.0333    0.0222   0.0178   0.0278
bee           2      1      0.1333   0.0833   0.0025      0.0500    0.0111   0.0178   0.0069
in            0      1      0.0000   0.0833   0.0069      0.0833    0.0000   0.0000   0.0069
is            1      0      0.0667   0.0000   0.0044      0.0667    0.0000   0.0044   0.0000
needs         0      1      0.0000   0.0833   0.0069      0.0833    0.0000   0.0000   0.0069
not           1      0      0.0667   0.0000   0.0044      0.0667    0.0000   0.0044   0.0000
one           0      1      0.0000   0.0833   0.0069      0.0833    0.0000   0.0000   0.0069
or            1      0      0.0667   0.0000   0.0044      0.0667    0.0000   0.0044   0.0000
order         0      1      0.0000   0.0833   0.0069      0.0833    0.0000   0.0000   0.0069
queen         1      1      0.0667   0.0833   0.0003      0.0167    0.0056   0.0044   0.0069
question      1      0      0.0667   0.0000   0.0044      0.0667    0.0000   0.0044   0.0000
said          1      0      0.0667   0.0000   0.0044      0.0667    0.0000   0.0044   0.0000
strong        0      1      0.0000   0.0833   0.0069      0.0833    0.0000   0.0000   0.0069
the           2      0      0.1333   0.0000   0.0178      0.1333    0.0000   0.0178   0.0000
to            2      2      0.1333   0.1667   0.0011      0.0333    0.0222   0.0178   0.0278
Total         15     12     1        1

From the column totals:
• Euclidean distance  = sqrt(sum (f1-f2)^2) = 0.2828
• Manhattan distance  = sum |f1-f2|         = 1.0333
• Chebyshev distance  = max |f1-f2|         = 0.1333
• Dot product         = sum f1*f2           = 0.0667
• ||f1|| = sqrt(sum f1^2) = 0.3197, ||f2|| = sqrt(sum f2^2) = 0.3333
• Cosine similarity   = 0.0667 / (0.3197 × 0.3333) ≈ 0.6255
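A sketch that reproduces these numbers end to end. The tokenization and names are mine; punctuation is stripped so that "question," counts as "question":

```python
import math
from collections import Counter

s1 = "to be or not to be a bee is the question, said the queen bee"
s2 = "one needs to be strong in order to be a queen bee"

def rel_freq_vector(text, vocab):
    # relative word frequencies over a shared vocabulary
    tokens = text.lower().replace(",", "").split()
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in vocab]

vocab = sorted(set(s1.replace(",", "").split()) | set(s2.split()))
f1 = rel_freq_vector(s1, vocab)
f2 = rel_freq_vector(s2, vocab)

euclidean = math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))
manhattan = sum(abs(a - b) for a, b in zip(f1, f2))
chebyshev = max(abs(a - b) for a, b in zip(f1, f2))
dot = sum(a * b for a, b in zip(f1, f2))
cosine = dot / (math.sqrt(sum(a * a for a in f1)) * math.sqrt(sum(b * b for b in f2)))

print(round(euclidean, 4), round(manhattan, 4), round(chebyshev, 4),
      round(dot, 4), round(cosine, 4))   # ≈ 0.2828 1.0333 0.1333 0.0667 0.6255
```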
