0% found this document useful (0 votes)
8 views

7 Query Authentication

Uploaded by

pohkangyu
Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views

7 Query Authentication

Uploaded by

pohkangyu
Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 66

CS5322 Database Security

Query Authentication: Motivation


Database D

Users Service Owner


Provider

 A data owner passes a database D to a service


provider
 The service provider answers queries from users
 This setting is referred to as database
outsourcing
Query Authentication: Motivation
Database D

Users Service Owner


Provider
 This was the same scenario that we considered in our
discussions of encrypted databases
 But now, let's assume that we do NOT aim to ensure data
confidentiality against the service provider
 Reason: encrypted databases incurs a lot of overheads
 Instead, we want to ensure that the service provider answers
queries correctly
Query Authentication: Motivation
Database D

Users Service Owner


Provider
 "What do you mean by 'answering queries correctly'?"
 Possible ways to answer queries incorrectly:
 Return only a subset of the tuples in the query result (e.g., by
stopping the query early)
 Return some fake tuples
 Return the query result previously obtained from an outdated
version of the database
Query Authentication: Motivation
Database D

Users Service Owner


Provider
 "Why would the service provider do that? "
 Possible reasons:
 The service provider is overloaded with a lot of queries on a lot of
databases, and does not have enough resource to process all queries in
time
 The service provider wants to compromise the query result for
malicious purposes
 …
Query Authentication: Motivation
Database D

Users results Service Owner


+ Provider
proofs of correctness

 Solution:
 When the service provider answers a query, it returns both
the query result and a proof of the result’s integrity
 i.e., the proof should let the user verify whether the result
is correct and whether it is from the most updated
database
Query Authentication: Motivation
Database D

Users results Service Owner


+ Provider
proofs of correctness

 "I don't think service providers would really do that in


practice."
 Amazon has done something based on a similar idea
 https://aws.amazon.com/qldb/
 Microsoft also proposed something along a similar line
 https://dl.acm.org/doi/10.1145/3035918.3064030
Query Authentication: Motivation
Database D

Users results Service Owner


+ Provider
proofs of correctness

 "OK. But how?"


 Use public-key cryptography
Public-Key Cryptography
encrypt decrypt

Others public key private key


Owner

decrypt encrypt

 The data owner has two keys


 A private key sk, which she keeps to herself
 A public key pk, which she distributes to the public
 Messages encrypted by pk can only be decrypted by sk
 So anyone can use pk to send secret messages to the owner
 Messages that can be decrypted by pk must have been encrypted by sk
 So the owner can send authenticated messages to others
Naïve Solution
Database D

Users results Service Owner


+ Provider
proofs of correctness

 Let the owner sign each tuple in the database, i.e.,


 She encrypts each tuple with her private key sk
 She sends both the encrypted version and the original to the service
provider
 Whenever the service provider answers a query by returning
some tuples, he also returns the encrypted versions
Naïve Solution
Database D

Users results Service Owner


+ Provider
proofs of correctness
 Whenever the service provider answers a query by returning
some tuples, he also returns the encrypted versions
 For each tuple t and its encrypted version t*, the user can
decrypt t* using the owner’s public key, and see if the
decrypted tuple is the same as t
 This prevents the service provider from modifying or faking any
tuple
Naïve Solution
Database D

Users results Service Owner


+ Provider
proofs of correctness

 Problems:
 It does not prevent the service provider from dropping any
tuples
 Each tuple needs to be stored twice: encrypted and
unencrypted
 We need something better…
Towards An Improved Solution
 Let’s focus on a specific type of queries:
equality query on one attribute
 i.e., SELECT * FROM T
WHERE T.A = X
 e.g., SELECT * FROM Employees
WHERE Age = 30
 How can we outsource this type of queries
and prevent the service provider from faking
or dropping results?
Towards An Improved Solution
Age = 20 t 1, t 2, t 3, …

Encryption
Age = 21 t10, t11, t12, …

Owner Database D Age = 22 t20, t21, t22, … Service


Provider
… …
 Idea: We let the data owner
 Divide the tuples into groups according to their
Age values
 Encrypt each group separately using her private
key sk
 Send the encrypted groups to the service provider
Towards An Improved Solution
Age = 20 t 1, t 2, t 3, …

Encryption
Age = 21 t10, t11, t12, …

Owner Database D Age = 22 t20, t21, t22, … Service


Provider
… …
 Suppose that the user issues the following query:
 SELECT * FROM Employee WHERE Age = 21
 The service provider just returns the encrypted group of tuples for
Age = 21
 The user then decrypts the tuple group using the data owner’s
public key pk
 If the tuples have Age = 21, then it is guaranteed that there is no fake or
missing tuples
Towards An Improved Solution
Age = 20 t 1, t 2, t 3, …

Encryption
Age = 21

Owner Database D Age = 22 t20, t21, t22, … Service


Provider
… …
 Question:
 Suppose that there is no employee with Age = 21
 What does the data owner do?
 Option 1:
 Just omit the Age = 21 group
 Only create encrypted groups for those age values that have at least one
tuple
 Does this work?
Towards An Improved Solution
Age = 20 t 1, t 2, t 3, …

Encryption
Age = 21

Owner Database D Age = 22 t20, t21, t22, … Service


Provider
… …
 Option 1:
 Only create encrypted groups for those age values that have at least
one tuple
 Does this work?
 No
 If we do this, then the service provider can drop any tuple group from
the query result, by pretending that there is no tuple with that age
value
Towards An Improved Solution
Age = 20 t 1, t 2, t 3, …

Encryption
Age = 21

Owner Database D Age = 22 t20, t21, t22, … Service


Provider
… …
 Option 2:
 If Age = 21 does not have any tuple, then the data owner
encrypts a message “Age = 21 is empty” using her private key
 Do this for every empty Age group
 If a user asks for an Age group that is empty, just return
the corresponding encrypted message
Towards An Improved Solution
Age = 20 t 1, t 2, t 3, …

Encryption
Age = 21

Owner Database D Age = 22 t20, t21, t22, … Service


Provider
… …
 Option 2: Create an encrypted message for each
empty group
 Problem: There could be too many empty groups
 Suppose that the attribute to be queried is Salary
 Creating an encrypted group or message for each possible
Salary value is rather expensive in terms of time and space
Towards An Improved Solution
Age = 20 t 1, t 2, t 3, …

Encryption
Age = 21

Owner Database D Age = 22 t20, t21, t22, … Service


Provider
… …
 Option 3:
 For each non-empty group, we not only encrypt the tuples, but also create
an encrypted message to indicate which is the next non-empty group
 E.g., in the above example, we not only encrypt the tuple group with Age =
20, but also create an encrypted message: “After Age = 20, the next non-
empty group is Age = 22”
 The total number of encrypt messages equals the total number of non-
empty groups
Towards An Improved Solution
Age = 20 t 1, t 2, t 3, …

Encryption
Age = 21

Owner Database D Age = 22 t20, t21, t22, … Service


Provider
… …
 If the user queries a non-empty group, then returns the
corresponding encrypted group
 If the user queries an empty group, return the
encrypted message that can prove the empty results
 E.g., if the user enquires “Age = 21”, then return the encrypted
message “After Age = 20, the next non-empty group is Age =
22”
Towards An Improved Solution
Age = 20 t 1, t 2, t 3, …

Encryption
Age = 21

Owner Database D Age = 22 t20, t21, t22, … Service


Provider
… …
 Any problem?
 There could be too many non-empty groups to encrypt
 Think about Salary
 Only works well for equality query
 Does not work well for range queries, e.g., “Salary > 100k
and Salary < 200k”
Towards An Improved Solution
Age = 20 t 1, t 2, t 3, …

Encryption
Age = 21

Owner Database D Age = 22 t20, t21, t22, … Service


Provider
… …
 We will introduce a solution that only require the
data owner to create one encrypted “message”
 No need to encrypt any tuple at all
 And it can support range queries
 It is called the Merkle tree
Merkle Tree

 Invented by Ralph Merkle in 1979


 A tree structure that allows efficient
verification of its content
 Key ingredient: a cryptographic hash function
Cryptographic Hash Function

data cryptographic digest


hash function
 A function that maps data of arbitrary size to a bit string of a fixed
size, with the following properties:
 It is deterministic (so the same message always results in the same hash)
 It is efficient to compute the hash value for any given message
 It is infeasible to generate a message from its hash value except by trying
all possible messages
 A small change to a message should change the hash value so extensively
that the new hash value appears uncorrelated with the old hash value
 It is infeasible to find two different messages with the same hash value
Merkle Tree: Example

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Suppose that we are to build a Merkle tree to support
range queries on Salary
 First, sort all tuples by Salary
 Second, build a binary tree on the sorted sequence
Merkle Tree: Example
v7
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Third, materialize the non-leaf nodes in a bottom up manner,
using a cryptographic hash function h
 For each non-leaf v, its content equals h(vleft) ǁ h(vright), where vleft and
vright are v’s left and right children, respectively, and ǁ denotes
concatenation
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Finally, let the data owner encrypts h(root) using her private key
sk
 In the above example, the root is v7
 Then, the data owner sends the encrypted digest to the service
provider
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k

 Intuition:
 The encrypted h(v7) ensures that the service
provider cannot make any change to the sorted
sequence t1, t2, …, t8
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Why?
 Since h(v7) is signed by the data owner, the service provider will get caught if he
changes v7
 Since v7 cannot be changed, the service provider will get caught if he changes v5 or v6
 Since v5 and v6 cannot be changed, the service provider will get caught if he changes
v1, v2, v3, or v4, and so on…
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 In other words, the service provider can answer any query as follows
 Return the sorted sequence t1, t2, t3, …, t8, along with the signed h(v7)
 Ask the user to verify the correctness of the sorted sequence, and then answer the query
herself using the sorted sequence
 Problem: this approach returns too many irrelevant tuples
 Solution: return some hash values instead of tuples
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k

 Suppose that a user searches for “Salary = 3.5k”


 The service provider would return the following
 t3, t4, h(v1), h(v6), and the encrypted h(v7)
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k

 Suppose that a user searches for “Salary = 3.5k”


 The service provider would return the following
 t3, t4, h(v1), h(v6), and the encrypted h(v7)
 Why? We will explain using a simpler tree
Merkle Tree: Example h(v5)
v5
h(v1) ǁ h(v2) Owner
v1 v2
h(t1) ǁ h(t2) h(t3) ǁ h(t4)

t1 t2 t3 t4
Salary = 1k 2k 3k 4k
 Consider the above Merkle tree
 Re-consider the query on “Salary = 3.5k”
 Option 1:
 The service provider could return t1, t2, t3, t4, as well as the encrypted h(v5)
 The user could then compute h(t1), h(t2), h(t3), h(t4)
 Based on that, she computes h(v1) and h(v2)
 Then she can compute the hash of h(v1) ǁ h(v2) and verify it against the encrypted h(v5)
 Then she can be sure that the data has only t1, t2, t3, and t4; so no “Salary = 3.5k”
Merkle Tree: Example h(v5)
v5
h(v1) ǁ h(v2) Owner
v1 v2
h(t1) ǁ h(t2) h(t3) ǁ h(t4)

t1 t2 t3 t4
Salary = 1k 2k 3k 4k
 Option 1: The user computes
 h(t1), h(t2), h(t3), h(t4),
 and then h(v1) ǁ h(v2)
 and then verify it again h(v5)
 Question: does the user really need t1 and t2?
 No; She only needs h(v1)
 That is, given t3, t4, and h(v1), the user can already verify the query result
against h(v5)
Merkle Tree: Example h(v5)
v5
h(v1) ǁ h(v2) Owner
v1 v2
h(t1) ǁ h(t2) h(t3) ǁ h(t4)

t1 t2 t3 t4
Salary = 1k 2k 3k 4k

 So the service provider could simply return t3,


t4, and h(v1), as well as the encrypted h(v5)
 This is sufficient for the user to verify the answer
for “Salary = 3.5k”
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k

 Now reconsider the query on “Salary = 3.5k”


 The service provider could return the following:
 t1, t2, t3, t4, t5, t6, t7, t8, and the encrypted h(v7)
 But t5, t6, t7, t8 could be replaced by h(v6)
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k

 Now reconsider the query on “Salary = 3.5k”


 The service provider could return the following:
 t1, t2, t3, t4, h(v6), and the encrypted h(v7)
 But t1, t2 could be replaced by h(v1)
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Now reconsider the query on “Salary = 3.5k”
 The service provider could return the following:
 h(v1), t3, t4, h(v6), and the encrypted h(v7)
 Any more replacement?
 No; we definitely need t3, t4, and the encrypted h(v7)
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k

 Now reconsider the query on “Salary = 3.5k”


 Final answer: The service provider returns the
following:
 h(v1), t3, t4, h(v6), and the encrypted h(v7)
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Now consider a query on “Salary > 2.5k and Salary < 3.5k”
 The service provider could return the following:
 t1, t2, t3, t4, t5, t6, t7, t8, and the encrypted h(v7)
 Any replacement possible?
 t5, t6, t7, t8 could be replaced by h(v6)
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Now consider a query on “Salary > 2.5k and Salary < 3.5k”
 The service provider could return the following:
 t1, t2, t3, t4, h(v6), and the encrypted h(v7)
 Could we replace t1, t2 with h(v1)?
 No; otherwise, the user cannot verify whether the service provider has hidden a
tuple with Salary = 2.6k
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Now consider a query on “Salary > 2.5k and Salary < 3.5k”
 The service provider could return the following:
 t1, t2, t3, t4, h(v6), and the encrypted h(v7)
 Could we replace t1 with h(t1)?
 This is OK
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Now consider a query on “Salary > 2.5k and Salary < 3.5k”
 The service provider could return the following:
 h(t1), t2, t3, t4, h(v6), and the encrypted h(v7)
 Any more replacement?
 No; we definitely need t2, t3, t4 to prove correctness
Merkle Tree: General Algorithm
 Consider a query on T.A in [x, y]
 Among the tuples t with t.A < x, find the tuple tx whose A value is
the largest
 Identify the path from tx to the root of the Merkel tree
 For every “left branch” on the path, collect the hash value of the branch
 Among the tuples t with t.A > y, find the tuple ty whose A value is
the smallest
 Identify the path from ty to the root of the Merkel tree
 For every “right branch” on the path, collect the hash value of the
branch
 Return tx, ty, and all tuples between them, and all hash values
collected, as well as the encrypted Merkle root
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Consider a query on “Salary in [5.5k, 6.5k]”
 i.e., A is Salary, and [x, y] = [5.5k, 6.5k]
 “Among the tuples t with t.A < x, find the tuple tx whose A value
is the largest”
 tx is t5
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Consider a query on “Salary in [5.5k, 6.5k]”
 i.e., A is Salary, and [x, y] = [5.5k, 6.5k]
 “Among the tuples t with t.A < x, find the tuple tx whose A value
is the largest”
 tx is t5
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Consider a query on “Salary in [5.5k, 6.5k]”
 i.e., A is Salary, and [x, y] = [5.5k, 6.5k]
 “Identify the path from tx to the root of the Merkel tree”
 The path from t5 to v7
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Consider a query on “Salary in [5.5k, 6.5k]”
 i.e., A is Salary, and [x, y] = [5.5k, 6.5k]
 “Identify the path from tx to the root of the Merkel tree”
 The path from t5 to v7
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Consider a query on “Salary in [5.5k, 6.5k]”
 i.e., A is Salary, and [x, y] = [5.5k, 6.5k]
 “For every left branch on the path, collect the hash value of the
branch”
 Collected hash: h(v5)
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Consider a query on “Salary in [5.5k, 6.5k]”
 i.e., A is Salary, and [x, y] = [5.5k, 6.5k]
 “For every left branch on the path, collect the hash value of the
branch”
 Collected hash: h(v5)
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Consider a query on “Salary in [5.5k, 6.5k]”
 i.e., A is Salary, and [x, y] = [5.5k, 6.5k]
 “Among the tuples t with t.A > y, find the tuple ty whose A value
is the smallest”
 ty is t7
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Consider a query on “Salary in [5.5k, 6.5k]”
 i.e., A is Salary, and [x, y] = [5.5k, 6.5k]
 “Among the tuples t with t.A > y, find the tuple ty whose A value
is the smallest”
 ty is t7
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Consider a query on “Salary in [5.5k, 6.5k]”
 i.e., A is Salary, and [x, y] = [5.5k, 6.5k]
 “Identify the path from ty to the root of the Merkel tree”
 the path from t7 to v7
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Consider a query on “Salary in [5.5k, 6.5k]”
 i.e., A is Salary, and [x, y] = [5.5k, 6.5k]
 “Identify the path from ty to the root of the Merkel tree”
 the path from t7 to v7
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Consider a query on “Salary in [5.5k, 6.5k]”
 i.e., A is Salary, and [x, y] = [5.5k, 6.5k]
 “For every right branch on the path, collect the hash value of
the branch”
 Collected hashes: h(t8)
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Consider a query on “Salary in [5.5k, 6.5k]”
 i.e., A is Salary, and [x, y] = [5.5k, 6.5k]
 “For every right branch on the path, collect the hash value of
the branch”
 Collected hashes: h(t8)
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Consider a query on “Salary in [5.5k, 6.5k]”
 i.e., A is Salary, and [x, y] = [5.5k, 6.5k]
 “Return tx, ty, and all tuples between them, and all hash values
collected, as well as the encrypted Merkle root”
 i.e., t5, t7, and t6, and h(v5) and h(t8), and the encrypted h(v7)
Merkle Tree: Example h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Consider a query on “Salary in [5.5k, 6.5k]”
 i.e., A is Salary, and [x, y] = [5.5k, 6.5k]
 “Return tx, ty, and all tuples between them, and all hash values
collected, as well as the encrypted Merkle root”
 i.e., t5, t7, and t6, and h(v5) and h(t8), and the encrypted h(v7)
Merkle Tree: Exercise h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k

 Consider a query on “Salary = 4k”


 What should the service provider return?
Merkle Tree: Exercise h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k

 Consider a query on “Salary in (3k, 4k]”


 What should the service provider return?
Merkle Tree: Updates h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 What if the tuples are updated?
 E.g., the tuple with Salary = 6k is deleted
 Option 1:
 The data owner reconstructs the binary tree, re-computes h(root), signs it with a
timestamp, and sends it to the service provider
 The data owner announces the timestamp to users
Merkle Tree: Updates h(v7)
v7
Owner
h(v5) ǁ h(v6)
v5 v6
h(v1) ǁ h(v2) h(v3) ǁ h(v4)
v1 v2 v3 v4
h(t1) ǁ h(t2) h(t3) ǁ h(t4) h(t5) ǁ h(t6) h(t7) ǁ h(t8)

t1 t2 t3 t4 t5 t6 t7 t8
Salary = 1k 2k 3k 4k 5k 6k 7k 8k
 Problem with Option 1:
 Reconstructing the whole binary tree once per update is time-consuming
 Improved solution:
 Use an update-friendly tree structure, e.g., a red-black tree
 Any tree structure could be signed by the data owner like a Merkle tree
Extension to Multi-Dimensional Data

 The previous discussions focus on one-dimensional


queries
 How about multi-dimensional queries? e.g.,
 SELECT * FROM Employee
WHERE Age > 30 AND Salary > 10000
 Option 1:
 Build two Merkle trees on Age and Salary, respectively
 Problem:
 No benefit from using two trees simultaneously
 We need to choose either the tree on Age or the one on
Salary to answer a query
Extension to Multi-Dimensional Data

 The previous discussions focus on one-dimensional


queries
 How about multi-dimensional queries? e.g.,
 SELECT * FROM Employee
WHERE Age > 30 AND Salary > 10000
 Option 2:
 Use multi-dimensional indices
 E.g., an R-tree
R-tree

You might also like