7 Query Authentication
Solution:
When the service provider answers a query, it returns both
the query result and a proof of the result’s integrity,
i.e., the proof should let the user verify whether the result
is correct and whether it comes from the most up-to-date
database
Query Authentication: Motivation
[Figure: database D, with encrypt and decrypt steps]
Problems:
It does not prevent the service provider from dropping any
tuples
Each tuple needs to be stored twice: encrypted and
unencrypted
We need something better…
Towards An Improved Solution
Let’s focus on a specific type of query:
equality queries on one attribute
i.e., SELECT * FROM T
WHERE T.A = X
e.g., SELECT * FROM Employees
WHERE Age = 30
How can we outsource this type of query
and prevent the service provider from faking
or dropping results?
[Figure: tuples grouped by Age value (Age = 20: t1, t2, t3, …; Age = 21: t10, t11, t12, …), with each group passed through encryption]
t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
Suppose that we are to build a Merkle tree to support
range queries on Salary
First, sort all tuples by Salary
Second, build a binary tree on the sorted sequence
Merkle Tree: Example
Root: v7 = h(v5) ǁ h(v6)
Level 2: v5 = h(v1) ǁ h(v2); v6 = h(v3) ǁ h(v4)
Level 1: v1 = h(t1) ǁ h(t2); v2 = h(t3) ǁ h(t4); v3 = h(t5) ǁ h(t6); v4 = h(t7) ǁ h(t8)
Leaves: t1, t2, …, t8 with Salary = 1k, 2k, …, 8k
Third, materialize the non-leaf nodes in a bottom-up manner,
using a cryptographic hash function h
For each non-leaf node v, its content equals h(v_left) ǁ h(v_right), where v_left and
v_right are v’s left and right children, respectively, and ǁ denotes
concatenation
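The bottom-up construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SHA-256 stands in for h, the byte encoding of the tuples and the helper name build_levels are hypothetical, and the number of tuples is assumed to be a power of two.

```python
import hashlib

def h(data: bytes) -> bytes:
    """Cryptographic hash function h (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()

def build_levels(tuples):
    """Return the hash levels bottom-up.

    levels[0][i] is h(t_i); levels[-1][0] is h(root).
    Assumes len(tuples) is a power of two.
    """
    level = [h(t) for t in tuples]
    levels = [level]
    while len(level) > 1:
        # A non-leaf's content is h(left) || h(right); we store the hash of that content.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

# Hypothetical encoding of the eight tuples, already sorted by Salary.
tuples = [f"t{i}|salary={i}000".encode() for i in range(1, 9)]
levels = build_levels(tuples)
root_digest = levels[-1][0]  # h(v7) in the example
print(len(levels), root_digest.hex())
```

Here levels[1] holds h(v1), …, h(v4), levels[2] holds h(v5) and h(v6), and levels[3] holds h(v7).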
Finally, the data owner signs h(root), i.e., encrypts it using her private key
sk
In the above example, the root is v7
Then, the data owner sends the encrypted digest to the service
provider
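To make "encrypting the digest with the private key" concrete, here is a toy sketch using textbook RSA with tiny primes (61 and 53). The key values, the function names, and the reduction of the digest modulo N are illustrative assumptions; a real deployment would use a vetted signature scheme and library rather than anything like this.

```python
import hashlib

# Toy RSA key pair built from the primes 61 and 53. Insecure, illustration only.
N, E, D = 3233, 17, 2753  # public modulus, public exponent, private exponent (sk)

def digest_to_int(digest: bytes) -> int:
    """Map a digest into the range [0, N) so the toy key can process it."""
    return int.from_bytes(digest, "big") % N

def sign(digest: bytes) -> int:
    """Data owner: 'encrypt' the digest with the private exponent."""
    return pow(digest_to_int(digest), D, N)

def verify(digest: bytes, signature: int) -> bool:
    """User: 'decrypt' with the public exponent and compare with the recomputed digest."""
    return pow(signature, E, N) == digest_to_int(digest)

# Stand-in for h(v7); in the example it would come from the Merkle tree.
root_digest = hashlib.sha256(b"stand-in for the content of v7").digest()
signature = sign(root_digest)
print(verify(root_digest, signature))
```

Only the owner knows D, so the service provider can store and forward the signature but cannot produce a valid one for a different digest.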
Intuition:
The encrypted h(v7) ensures that the service
provider cannot make any change to the sorted
sequence t1, t2, …, t8 without being detected
Why?
Since h(v7) is signed by the data owner, the service provider will get caught if he
changes v7
Since v7 = h(v5) ǁ h(v6) cannot be changed, the service provider will get caught if he changes v5 or v6
(any such change would alter h(v5) or h(v6))
Since v5 and v6 cannot be changed, the service provider will get caught if he changes
v1, v2, v3, or v4, and so on, down to the tuples t1, …, t8
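The cascade argument can be checked experimentally. The sketch below assumes SHA-256 as h and a hypothetical tuple encoding; it recomputes h(root) after the provider alters a tuple or reorders two tuples and compares the result with the digest the owner signed.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def root_digest(tuples):
    """h(root) of the Merkle tree over `tuples` (length assumed to be a power of two)."""
    level = [h(t) for t in tuples]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

original = [f"t{i}|{i * 1000}".encode() for i in range(1, 9)]
signed_root = root_digest(original)  # the value the data owner signed

tampered = list(original)
tampered[5] = b"t6|6500"  # the provider alters t6

# The provider swaps t3 and t4.
swapped = original[:2] + [original[3], original[2]] + original[4:]

print(root_digest(tampered) == signed_root)
print(root_digest(swapped) == signed_root)
```

Both comparisons fail unless the provider finds a hash collision, which is what "cryptographic" hash function is meant to rule out.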
In other words, the service provider can answer any query as follows
Return the sorted sequence t1, t2, t3, …, t8, along with the signed h(v7)
Ask the user to verify the correctness of the sorted sequence, and then answer the query
herself using the sorted sequence
Problem: this approach returns too many irrelevant tuples
Solution: return some hash values instead of tuples
[Figure: the Merkle tree reduced to the four leaves t1, t2, t3, t4 with Salary = 1k, 2k, 3k, 4k]
Consider the smaller Merkle tree over t1, t2, t3, t4 only, whose root is v5
Reconsider the query “Salary = 3.5k”
Option 1:
The service provider could return t1, t2, t3, t4, as well as the encrypted h(v5)
The user could then compute h(t1), h(t2), h(t3), h(t4)
Based on that, she computes h(v1) and h(v2)
Then she can compute the hash of h(v1) ǁ h(v2) and verify it against the encrypted h(v5)
Then she can be sure that the data contains only t1, t2, t3, and t4, so no tuple has “Salary = 3.5k”
Merkle Tree: Example
Owner: h(v5)
Root: v5 = h(v1) ǁ h(v2)
Level 1: v1 = h(t1) ǁ h(t2); v2 = h(t3) ǁ h(t4)
Leaves: t1, t2, t3, t4 with Salary = 1k, 2k, 3k, 4k
Option 1: The user computes
h(t1), h(t2), h(t3), h(t4),
then h(v1) ǁ h(v2),
and then verifies its hash against h(v5)
Question: does the user really need t1 and t2?
No; she only needs h(v1)
That is, given t3, t4, and h(v1), the user can already verify the query result
against h(v5)
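A minimal sketch of this shortened check on the four-leaf tree follows. It assumes SHA-256 as h, a hypothetical tuple encoding, and a trusted h(v5) that the user has already recovered from the owner's signature; the function name verify_small is made up for the example.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

t1, t2, t3, t4 = (f"t{i}|{i * 1000}".encode() for i in range(1, 5))

# Owner side: digests of the four-leaf tree.
h_v1 = h(h(t1) + h(t2))
h_v2 = h(h(t3) + h(t4))
h_v5 = h(h_v1 + h_v2)  # hash of v5's content h(v1) || h(v2)

def verify_small(t3, t4, h_v1, trusted_h_v5):
    """User side: rebuild h(v5) from t3, t4 and h(v1) alone (t1 and t2 are never seen)."""
    h_v2 = h(h(t3) + h(t4))
    return h(h_v1 + h_v2) == trusted_h_v5

print(verify_small(t3, t4, h_v1, h_v5))
```

The provider's answer shrinks from four tuples to two tuples plus one hash, and the saving grows with the size of the subtree that a single hash replaces.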
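Now reconsider the query “Salary = 3.5k” on the full tree with root v7
The service provider could return the following:
h(v1), t3, t4, h(v6), and the encrypted h(v7)
Can anything else be replaced by a hash?
No; we definitely need t3, t4, and the encrypted h(v7)
(t3 and t4 are adjacent in the sorted sequence and enclose 3.5k, which proves that no tuple has that salary)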
Now consider a query on “Salary > 2.5k and Salary < 3.5k”
The service provider could return the following:
t1, t2, t3, t4, t5, t6, t7, t8, and the encrypted h(v7)
Any replacement possible?
t5, t6, t7, t8 could be replaced by h(v6)
Now consider a query on “Salary > 2.5k and Salary < 3.5k”
The service provider could return the following:
t1, t2, t3, t4, h(v6), and the encrypted h(v7)
Could we replace t1, t2 with h(v1)?
No; otherwise, the user cannot verify whether the service provider has hidden a
tuple with Salary = 2.6k
Now consider a query on “Salary > 2.5k and Salary < 3.5k”
The service provider could return the following:
t1, t2, t3, t4, h(v6), and the encrypted h(v7)
Could we replace t1 with h(t1)?
Yes; t2 (Salary = 2k) already bounds the result from below, so the user does not need to see t1 itself
Now consider a query on “Salary > 2.5k and Salary < 3.5k”
The service provider could return the following:
h(t1), t2, t3, t4, h(v6), and the encrypted h(v7)
Can anything else be replaced?
No; we definitely need t2, t3, t4 to prove correctness: t3 is the result, and t2 and t4 show that no other tuple falls in the range
Merkle Tree: General Algorithm
Consider a query on T.A in [x, y]
Among the tuples t with t.A < x, find the tuple tx whose A value is
the largest
Identify the path from tx to the root of the Merkle tree
For every “left branch” on the path (a subtree hanging off the path on its left side), collect the hash value of the branch
Among the tuples t with t.A > y, find the tuple ty whose A value is
the smallest
Identify the path from ty to the root of the Merkle tree
For every “right branch” on the path (a subtree hanging off the path on its right side), collect the hash value of the
branch
Return tx, ty, all tuples between them, all hash values
collected, and the encrypted Merkle root
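The provider's side of this algorithm can be sketched as follows. The array-of-levels layout, the helper names build_levels and range_proof, SHA-256 as h, and the tuple encoding are all assumptions made for the sketch; it also assumes a power-of-two number of tuples and that both boundary tuples tx and ty exist.

```python
import bisect
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaf_hashes):
    """levels[0] holds the leaf hashes; levels[-1][0] is h(root)."""
    levels = [leaf_hashes]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def range_proof(keys, levels, x, y):
    """Return (lo, hi, left_hashes, right_hashes) for a query on [x, y].

    lo is the index of tx, hi the index of ty; keys must be sorted.
    """
    lo = bisect.bisect_left(keys, x) - 1   # largest key < x
    hi = bisect.bisect_right(keys, y)      # smallest key > y
    if lo < 0 or hi >= len(keys):
        raise ValueError("this sketch assumes both boundary tuples exist")
    left, right = [], []
    i, j = lo, hi
    for level in levels[:-1]:              # walk both paths up to (not including) the root
        if i % 2 == 1:                     # node on tx's path is a right child: sibling is a left branch
            left.append(level[i - 1])
        if j % 2 == 0:                     # node on ty's path is a left child: sibling is a right branch
            right.append(level[j + 1])
        i //= 2
        j //= 2
    return lo, hi, left, right

keys = [1000 * i for i in range(1, 9)]
tuples = [f"t{i}|{k}".encode() for i, k in enumerate(keys, start=1)]
levels = build_levels([h(t) for t in tuples])
lo, hi, left, right = range_proof(keys, levels, 5500, 6500)
print([f"t{i + 1}" for i in range(lo, hi + 1)])  # ['t5', 't6', 't7']
```

For [x, y] = [5.5k, 6.5k] the collected left hash is h(v5) and the collected right hash is h(t8); the provider sends these together with t5, t6, t7 and the encrypted root.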
Consider a query on “Salary in [5.5k, 6.5k]”
i.e., A is Salary, and [x, y] = [5.5k, 6.5k]
“Among the tuples t with t.A < x, find the tuple tx whose A value
is the largest”
tx is t5
“Identify the path from tx to the root of the Merkle tree”
The path from t5 to v7
“For every left branch on the path, collect the hash value of the
branch”
Collected hash: h(v5)
“Among the tuples t with t.A > y, find the tuple ty whose A value
is the smallest”
ty is t7
“Identify the path from ty to the root of the Merkle tree”
The path from t7 to v7
“For every right branch on the path, collect the hash value of
the branch”
Collected hashes: h(t8)
“Return tx, ty, and all tuples between them, and all hash values
collected, as well as the encrypted Merkle root”
i.e., t5, t6, t7, h(v5), h(t8), and the encrypted h(v7)
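On the user's side, the returned items are enough to rebuild h(v7) and to check that nothing inside the range was dropped. The sketch below hard-codes the shape of this particular answer; SHA-256 as h, the tuple encoding, and the helper names salary and verify are assumptions, and trusted_root stands for the h(v7) that the user recovered from the owner's signature.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def salary(t: bytes) -> int:
    """Parse the Salary from the hypothetical encoding b'<id>|<salary>'."""
    return int(t.split(b"|")[1])

def verify(t5, t6, t7, h_v5, h_t8, trusted_root, x, y):
    # 1. Rebuild h(v7) bottom-up from the returned tuples and hashes.
    h_v3 = h(h(t5) + h(t6))
    h_v4 = h(h(t7) + h_t8)
    h_v6 = h(h_v3 + h_v4)
    if h(h_v5 + h_v6) != trusted_root:
        return False
    # 2. Boundary check: t5 lies below the range and t7 above it,
    #    so every tuple inside [x, y] must sit between them.
    return salary(t5) < x and salary(t7) > y

# Owner/provider side, needed only to set up the demo.
ts = [f"t{i}|{i * 1000}".encode() for i in range(1, 9)]
leaf = [h(t) for t in ts]
h_v = [h(leaf[i] + leaf[i + 1]) for i in range(0, 8, 2)]  # h(v1), ..., h(v4)
h_v5, h_v6 = h(h_v[0] + h_v[1]), h(h_v[2] + h_v[3])
root = h(h_v5 + h_v6)

ok = verify(ts[4], ts[5], ts[6], h_v5, leaf[7], root, 5500, 6500)
result = [t for t in (ts[4], ts[5], ts[6]) if 5500 <= salary(t) <= 6500]
print(ok, result)
```

After a successful check, the user keeps only the tuples that fall inside [x, y], here t6.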
Merkle Tree: Exercise
What if the tuples are updated?
E.g., the tuple with Salary = 6k is deleted
Option 1:
The data owner reconstructs the binary tree, re-computes h(root), signs it with a
timestamp, and sends it to the service provider
The data owner announces the timestamp to users
Merkle Tree: Updates
Problem with Option 1:
Reconstructing the whole binary tree once per update is time-consuming
Improved solution:
Use an update-friendly tree structure, e.g., a red-black tree
Any tree structure could be signed by the data owner like a Merkle tree
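The reason an update-friendly structure helps is that a change to one tuple only affects the hashes on the path from that tuple to the root. The sketch below illustrates just that idea on a fixed-shape tree; it does not implement a red-black tree, it handles only an in-place change that preserves the sort order, and the helper names, SHA-256 as h, and the tuple encoding are assumptions.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(tuples):
    """Full rebuild: levels[0] holds h(t_i), levels[-1][0] is h(root)."""
    levels = [[h(t) for t in tuples]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def update_leaf(levels, index, new_tuple):
    """Replace one tuple and recompute only the hashes on its leaf-to-root path.

    Returns the number of non-leaf hashes that were recomputed.
    """
    levels[0][index] = h(new_tuple)
    recomputed = 0
    for depth in range(1, len(levels)):
        index //= 2
        below = levels[depth - 1]
        levels[depth][index] = h(below[2 * index] + below[2 * index + 1])
        recomputed += 1
    return recomputed

tuples = [f"t{i}|{i * 1000}".encode() for i in range(1, 9)]
levels = build_levels(tuples)

tuples[5] = b"t6|6200"  # change t6's Salary; the sort order is preserved
n = update_leaf(levels, 5, tuples[5])
print(n, levels == build_levels(tuples))
```

With eight leaves, three non-leaf hashes are recomputed instead of seven; the owner then signs the new h(root) as in Option 1.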
Extension to Multi-Dimensional Data