Query Authentication in Outsourced Databases
(Query Answer Assurance)
Traditional Client-Server Arch.
[Figure: the DB Client sends a Query to the Owner and receives Results.]
Data Publishing
(Database-as-a-Service)
[Figure: the Owner hands the Database to a Third Party Server; the DB Client is shown alongside.]
Data Publishing
[Figure: the DB Client sends its Query to the Third Party Server, which returns the Results; the Owner is no longer on the query path.]
Data Publishing
• Pushes business logic and data processing from corporate data centers to
third-party servers at the “edge” of the network
– Distribution of (part of) the database to edge servers
– Edge servers perform query processing
• Why?
– Most organizations need DBMSs
– DBMSs are extremely complex to deploy, set up, and maintain
– Require skilled DBAs (at very high cost!)
• Advantages
– Cuts down network latency and produces faster responses
– Cheaper way to achieve scalability
– Lowers dependency on corporate data center (removes single point of failure)
– Reduced cost to client
• Get what you need, pay for what you use and not for: hardware, software infrastructure or
personnel to deploy, maintain, upgrade…
– Reduced overall cost
• cost amortization across users
– Better service
• leveraging experts
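The Challenge
[Figure: the DB Client sends its Query to the Third Party Server and receives Results; the Owner is not involved in answering.]
Third Party Server: Untrusted!
Results: The Truth? The Whole Truth? Nothing But The Truth?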
The Challenge
Query (DB Client to Server): SELECT * FROM Emp WHERE Sal < 5000

Emp (the Owner's relation, held at the Server):
ID  Name  Sal   Dept
5   A     2000  1
2   C     3500  2
1   D     8010  1
4   B     2200  3
3   E     7000  2

Result =
5   A     2000  1
2   C     3500  2
4   B     2200  3
Security Concerns
The DB Client sends its Query and receives Result’ from the Server. The correct answer over Emp (ID, Name, Sal, Dept) is:
Result =
5   A     2000  1
2   C     3500  2
4   B     2200  3

• Server is trustworthy! Result’ equals Result:
5   A     2000  1
2   C     3500  2
4   B     2200  3

• Server is malicious! Records are tampered:
5   A     3500  1
2   D     3500  2
4   B     2200  1

• Server is malicious! Answers are dropped (Incompleteness):
5   A     2000  1
2   C     3500  2

• Server is malicious! Spurious answers are added:
5   A     2000  1
2   C     3500  2
4   B     2200  3
1   D     1500  2
6   E     3400  1
Data Security Challenge:
Design Objectives:
• Authenticity: Every entry originated from the owner
• Completeness: No result entry is omitted from the answer
• Precision: Minimum information leakage
• Security: Computationally infeasible to cheat
• Efficiency: Polynomial proof
Collision-resistant (one-way) hash functions
• One-way: given x, easy to compute h(x); given h(x), difficult to determine x
• Collision-resistant: it is computationally hard to find x1 ≠ x2 s.t. h(x1) = h(x2)
• Computationally hard? Based on well-established assumptions such as discrete logarithms
• E.g., SHA, MD5 (practical collisions are now known for MD5, so it no longer meets this definition)
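To make the two properties concrete, here is a minimal sketch using SHA-256 from Python's standard library as a stand-in for h. The tuple encoding (fields joined by "|") is an assumption made only for this illustration.

```python
import hashlib

def h(x: bytes) -> bytes:
    """SHA-256 digest of x (a stand-in for the slide's hash function h)."""
    return hashlib.sha256(x).digest()

d1 = h(b"5|A|2000|1")
d2 = h(b"5|A|3500|1")   # the same tuple with a tampered salary

print(d1.hex())
print(d1 == h(b"5|A|2000|1"))  # True: hashing is deterministic
print(d1 == d2)                # False: the changed input gives a different digest
```

A client that holds a trusted copy of d1 can therefore detect the tampered tuple without storing the tuple itself.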
Public key digital signature schemes
Cryptographic tool for authenticating the signed message as well as its origin, e.g., RSA, DSA
• KeyGen produces (SK, PK); the Sender keeps SK
• The Sender sends m and σ = Sign(h(m), SK) to the Recipient over an Insecure Channel
• The Recipient runs Ver(m, PK, σ): valid?
– By checking: h(m) =? Sign^-1(PK, σ)
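The sign-then-verify flow can be sketched with textbook RSA. Everything below is an illustrative assumption: the two primes, the exponent, and the reduction of the digest modulo N are chosen to keep the arithmetic visible, and an unpadded toy key like this is not secure.

```python
import hashlib

# Toy textbook-RSA key built from two known Mersenne primes (illustration only).
P = 2**61 - 1
Q = 2**31 - 1
N = P * Q
E = 65537                              # public exponent (PK)
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (SK); needs Python 3.8+

def digest(m: bytes) -> int:
    """h(m), reduced mod N so that it fits the toy modulus."""
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % N

def sign(m: bytes) -> int:
    """Sign(h(m), SK): raise the digest to the private exponent."""
    return pow(digest(m), D, N)

def verify(m: bytes, sig: int) -> bool:
    """Ver(m, PK, sig): check that h(m) equals Sign^-1(PK, sig)."""
    return pow(sig, E, N) == digest(m)

sig = sign(b"5|A|2000|1")
print(verify(b"5|A|2000|1", sig))   # True
print(verify(b"5|A|3500|1", sig))   # expected False: the digest no longer matches
```

Only the holder of D can produce sig, while anyone with (N, E) can check it, which is what lets a client verify data it receives from an untrusted server.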
Authentic Publication Scheme
• DB Client (Trusted): sends a Query to the Edge Server and receives Result + Correctness proof over an Unsecured channel
• Edge Server (Untrusted): Does not certify data
(a) Untrusted
(b) Disclaim liability
• Central DBMS (Trusted): Certify data
(a) Ownership
(b) Liability
• The Central DBMS sends DB + Certification (Verification Objects) to the Edge Server and makes its Public key available
Naïve Scheme
Each attribute has a signed digest
Each tuple has a signed digest
Relation R: each tuple is stored as DT (A1, D1) … (Ai, Di) …
Naïve Scheme
Query: SELECT A3, A4, … FROM R
Result tuples: DT A3 A4 …
Filtered attributes (digests only): D1 D2 D5 …
Naïve Scheme (Example)
Tuple values   Attribute digests   Tuple signature
A1 B1 C1       a1 b1 c1            T1
A2 B2 C2       a2 b2 c2            T2
A3 B3 C3       a3 b3 c3            T3
T = sign(g(h(A)|h(B)|h(C)))
g and h are collision-resistant hash functions
ai = h(Ai), and likewise bi = h(Bi), ci = h(Ci)
Retrieve whole of first tuple:
Server returns A1, B1, C1, T1; Client can compute h(A1), h(B1) and
h(C1), and verify T1 from A1, B1 and C1
Issues??
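A minimal sketch of the client-side recomputation for a projection query, assuming SHA-256 for h and SHA-512 for g (both arbitrary choices) and made-up attribute values. The signature step is left out: the sketch stops at the value that T signs, since checking T against it is an ordinary signature verification.

```python
import hashlib

def h(x: bytes) -> bytes:
    """Attribute-level hash (SHA-256 assumed)."""
    return hashlib.sha256(x).digest()

def g(x: bytes) -> bytes:
    """Tuple-level hash (SHA-512 is an arbitrary choice for this sketch)."""
    return hashlib.sha512(x).digest()

def tuple_digest(attr_digests):
    """g(h(A) | h(B) | h(C)): the value that the owner signs as T."""
    return g(b"|".join(attr_digests))

# Owner side: one tuple (A1, B1, C1) with hypothetical values.
A1, B1, C1 = b"5", b"A", b"2000"
signed_value = tuple_digest([h(A1), h(B1), h(C1)])

# Query "SELECT A FROM R": the server returns A1 plus the digests b1, c1
# of the filtered attributes, never B1 or C1 themselves.
b1, c1 = h(B1), h(C1)
client_value = tuple_digest([h(A1), b1, c1])
print(client_value == signed_value)   # True

# A server that alters A1 cannot reproduce the signed value.
forged_value = tuple_digest([h(b"6"), b1, c1])
print(forged_value == signed_value)   # False
```

The sketch also hints at the cost: every returned tuple carries its own signature and one digest per filtered attribute, and nothing in it ties one tuple to the next, so dropped tuples go unnoticed.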
Using Merkle Hash Tree (MHT)
• For each tuple t, a tuple hash h(t) is computed
h(t) = h(h(t.A1) | h(t.A2) | … | h(t.An))
• Assume a total order on attribute A of a relation R
with |R| tuples (e.g., based on the primary key)
– MHT(R,A) is a binary tree with |R| leaf nodes and hash values h(i)
associated with node i
– If i is a leaf node, then h(i) = h(ti), where ti is the ith tuple in the order
– If i is an internal node, then h(i) = h(h(l) | h(r)), where l and r are the
left and right children of node i
– The root hash is the digest of all values in the Merkle hash tree MHT(R,A)
Merkle Hash Tree
N1234 = h(N12 | N34), signed by the owner as Sign(N1234, SK)
N12 = h(N1 | N2), N34 = h(N3 | N4)
N1 = h(d1), N2 = h(d2), N3 = h(d3), N4 = h(d4)
Edge server returns d2, N1, N34 and signed N1234 (and the structure)
Client computes N1234 = h(h(N1 | h(d2)) | N34) and verifies that the
signed value is correct
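The four-leaf example can be sketched as follows, assuming SHA-256 for h and plain byte concatenation for "|". The helper names (merkle_root, root_from_proof) and the leaf values are hypothetical, and checking the owner's signature on the root is left out.

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def merkle_root(leaves):
    """Root hash of a binary Merkle tree; len(leaves) must be a power of two."""
    level = [h(d) for d in leaves]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def root_from_proof(value, proof):
    """Recompute the root from one value and its sibling hashes.

    proof is a list of (sibling_hash, sibling_is_left) pairs, leaf to root.
    """
    node = h(value)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node

d1, d2, d3, d4 = b"d1", b"d2", b"d3", b"d4"
n1234 = merkle_root([d1, d2, d3, d4])      # the owner signs this value

# The edge server answers a query for d2 with N1 and N34.
n1 = h(d1)
n34 = h(h(d3) + h(d4))
proof = [(n1, True), (n34, False)]

print(root_from_proof(d2, proof) == n1234)         # True
print(root_from_proof(b"forged", proof) == n1234)  # False
```

The proof holds one hash per tree level, so its size grows with log |R| rather than |R|.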
Range Queries
[Figure: hash tree annotated with the query range q, GLB(q), LUB(q), LCA(q), and the path l.]
Example: Range queries
[Figure: hash tree marking the query range, the answer, and the digests.]
What are returned?
Proving Authenticity is Easy
[Figure: Certified Hash Tree with signed root s(hd) and internal hashes ha = h(h(2)|h(4)), hb, hc; hc is marked separately.]
Data: 2 4 6 8 10 12
Query: 5 ≤ r ≤ 7
Proving Completeness is Easy But …
[Figure: Certified Hash Tree with signed root s(hd) and internal hashes ha, hb, hc; hc is marked separately.]
Data: 2 4 6 8 10 12
Query: 5 ≤ r ≤ 7
Precision may be compromised!
[Figure: Certified Hash Tree with signed root s(hd) and internal hashes ha, hb, hc; hc is marked separately.]
Data: 2 4 6 8 10 12
Query: 5 ≤ r ≤ 7
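One way to read the three slides on this example: to show that 6 is the complete answer to 5 ≤ r ≤ 7, the server also hands over the neighbouring values 4 and 8, which lie outside the query. The sketch below assumes a tree shape the slides do not spell out (hb = h(h(6)|h(8)), hc = h(h(10)|h(12)), hd = h(ha|hb|hc)), SHA-256 for h, and a hypothetical verify_range helper. The signature check on hd is omitted.

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def leaf(v: int) -> bytes:
    return h(str(v).encode())

# Owner side. The shape of the tree above the leaves is an assumption.
ha = h(leaf(2) + leaf(4))
hb = h(leaf(6) + leaf(8))
hc = h(leaf(10) + leaf(12))
hd = h(ha + hb + hc)          # the owner signs hd

def verify_range(lo, hi, returned, h2, hc_digest, signed_root):
    """Client check for lo <= r <= hi, given (below, inside, above) values."""
    below, inside, above = returned
    ha_client = h(h2 + leaf(below))
    hb_client = h(leaf(inside) + leaf(above))
    authentic = h(ha_client + hb_client + hc_digest) == signed_root
    complete = below < lo <= inside <= hi < above
    return authentic and complete

# Server returns the values 4, 6, 8 plus the digests h(2) and hc.
print(verify_range(5, 7, (4, 6, 8), leaf(2), hc, hd))    # True
print(verify_range(5, 7, (4, 6, 10), leaf(2), hc, hd))   # False: root mismatch
```

The client ends up learning 4 and 8 even though neither satisfies the query, which is the precision loss named on the last of these slides.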
Signature Chain
• For each data value, there is an associated signature
– Computed from its own value, and that of its left and right neighbors
– $sig(r_i) = s(h(g(r_{i-1}) \mid g(r_i) \mid g(r_{i+1})))$
… $r_{i-1}$ $r_i$ $r_{i+1}$ $r_{i+2}$ …
• Owner stores the ($r_i$, $sig(r_i)$) pair in the server
• During querying, server returns (answer, signature) pairs and more … (verification objects) …
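A minimal sketch of how the chain exposes a dropped answer. It assumes a plain SHA-256 hash for g, a toy textbook-RSA key for s (not secure), and hypothetical domain bounds L = 0 and U = 100 whose hashes pad the two ends of the chain. It checks contiguity only; it does not prove that the left neighbour lies below the query bound.

```python
import hashlib

# Toy textbook-RSA key (illustration only, not secure).
P, Q, E = 2**61 - 1, 2**31 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def g(r: int) -> bytes:
    """Stand-in for the slide's g; a plain hash is assumed here."""
    return h(str(r).encode())

def chain_digest(left: bytes, mid: bytes, right: bytes) -> int:
    """h(g(r_{i-1}) | g(r_i) | g(r_{i+1})), reduced to fit the toy modulus."""
    return int.from_bytes(h(left + mid + right), "big") % N

def s(digest: int) -> int:                 # owner: sign with the private exponent
    return pow(digest, D, N)

def ver(digest: int, sig: int) -> bool:    # client: check with the public exponent
    return pow(sig, E, N) == digest

# Owner: sign every value together with its two neighbours.
L, U = 0, 100
values = [40, 50, 60, 70, 80]
padded = [h(str(L).encode())] + [g(r) for r in values] + [h(str(U).encode())]
sigs = {r: s(chain_digest(padded[i], padded[i + 1], padded[i + 2]))
        for i, r in enumerate(values)}

def contiguous(result, left_g, right_g, result_sigs) -> bool:
    """Client: every returned value must chain to the values next to it."""
    gs = [left_g] + [g(r) for r in result] + [right_g]
    return all(ver(chain_digest(gs[i], gs[i + 1], gs[i + 2]), result_sigs[i])
               for i in range(len(result)))

end = h(str(U).encode())
ok = contiguous([60, 70, 80], g(50), end, [sigs[60], sigs[70], sigs[80]])
dropped = contiguous([60, 80], g(50), end, [sigs[60], sigs[80]])
print(ok)        # True
print(dropped)   # False: sig(60) was computed with g(70) as right neighbour
```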
Signature Chain Ensures Contiguity
Query: 55 ≤ r
Server: … 60 75 80 90 … (Result Q)
User: … g(60) g(75) g(80) g(90) …
Signature Chain Ensures Terminal
Server: $r_a$ $r_{a+1}$ … $r_n$ (Result Q), $g(r_{n+1})$
User: $g(r_a)$ $g(r_{a+1})$ … $g(r_n)$, U; checks ver($H_{n+1}$, $sig(r_{n+1})$, PK)?
Create a fictitious record $r_{n+1}$ that is larger than the largest value but smaller than U
• $sig(r_{n+1}) = s(h(g(r_n) \mid g(r_{n+1}) \mid h(U)))$
• server returns $g(r_{n+1})$ instead of $r_{n+1}$
How to prove Origin (without revealing the boundary point)??
40 50 60 70 80 ….
Query: 55 ≤ r
How about this …
User: $g(r_a)$ $g(r_{a+1})$
Query: α ≤ r

The basic idea fails …
Cheat!
Server: 50 60 g(70) 80 90
User: $g(r_a)$ $g(r_{a+1})$
Query: 55 ≤ r
Private Boundary Proof Ensures Origin
Query: α ≤ r
Server: $h^{\alpha - r_{a-1} - 1}(r_{a-1})$ $r_a$ $r_{a+1}$
User: hashes the first item $U - \alpha$ times to obtain $g(r_{a-1})$; computes $g(r_a)$, $g(r_{a+1})$; checks ver($H_a$, $sig(r_a)$, PK)?
A collaborative hash scheme to compute the hash value:
$h^i(r) = h^{i-1}(h(r))$
$g(r) = h^{U-r-1}(r) = h^{U-\alpha}(h^{\alpha-r-1}(r))$
Back to our example
Query: 55 ≤ r
Cheat!
Server: $h^{\alpha-70-1}(70)$ 80 90
Require that $h^i$ for $i < 0$ (the inverse of h) be undefined
User: hash $U - 55$ times
Wrong! Undefined! With α = 55 the exponent is 55 − 70 − 1 = −16, so the server cannot produce this value, and the user cannot arrive at g(70), $g(r_a)$, $g(r_{a+1})$

Back to our example
Query: 55 ≤ r
Server: $h^{55-50-1}(50)$ 60 70 ….
User: hash $U - 55$ times
g(50) g(60) g(70)
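The collaborative hash in this example can be sketched directly. The sketch assumes SHA-256 for h, integer values encoded as decimal strings, and a hypothetical domain upper bound U = 100. The function names are illustrative.

```python
import hashlib

U = 100   # assumed upper bound of the domain (hypothetical value)

def h_iter(x: bytes, i: int) -> bytes:
    """h^i(x): apply SHA-256 i times; undefined (an error) for i < 0."""
    if i < 0:
        raise ValueError("h^i is undefined for i < 0")
    for _ in range(i):
        x = hashlib.sha256(x).digest()
    return x

def enc(r: int) -> bytes:
    return str(r).encode()

def g(r: int) -> bytes:
    """g(r) = h^(U-r-1)(r), computed by the owner, who knows r."""
    return h_iter(enc(r), U - r - 1)

def server_part(r: int, alpha: int) -> bytes:
    """h^(alpha-r-1)(r): only computable when r < alpha."""
    return h_iter(enc(r), alpha - r - 1)

def user_part(partial: bytes, alpha: int) -> bytes:
    """The user applies h another U - alpha times."""
    return h_iter(partial, U - alpha)

alpha = 55
honest = user_part(server_part(50, alpha), alpha)
print(honest == g(50))   # True: 4 + 45 = 49 = U - 50 - 1 applications

try:
    server_part(70, alpha)          # exponent 55 - 70 - 1 = -16
except ValueError as err:
    print("cheat rejected:", err)
```

The user never sees 50 itself, only a value that is already hashed. Because the user always adds U − α = 45 applications and g(70) needs just 29, no input the server can compute leads to g(70).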
Putting the Pieces Together
Query: α ≤ r
Distributor: $h^{\alpha - r_{a-1} - 1}(r_{a-1})$, $r_a$ $r_{a+1}$ … $r_n$ (Result Q), $g(r_{n+1})$
User: hash $U - \alpha$ times
Other cases
• α ≤ r
• r ≤ β (Result = {$r_a$, $r_{a+1}$, … $r_b$}, i.e., $r_a$, … $r_b$ ≤ β < $r_{b+1}$)
– Need to verify that $r_{b+1} > \beta$
– Define $g(r) = h^{r-L-1}(r) = h^{\beta-L}(h^{r-\beta-1}(r))$ where L is a
value below the minimum value of the domain
• So, we have α ≤ r ≤ β
• r = α ≡ α ≤ r ≤ α
• α < r < β ≡ α+1 ≤ r ≤ β−1
• α ≠ r ≡ ((L < r < α) ∨ (α < r < U))
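As a worked check of the two boundary definitions, the exponents of the server's part and the user's part add up to the exponent in g. The sketch assumes an integer domain with L and U strictly outside it, and that $h^i$ is undefined for $i < 0$.

```latex
% Lower bound (query \alpha \le r), applied to the record just below the range:
h^{U-\alpha}\bigl(h^{\alpha-r-1}(r)\bigr)
  = h^{(U-\alpha)+(\alpha-r-1)}(r)
  = h^{U-r-1}(r),
\qquad \text{defined only if } \alpha-r-1 \ge 0 \iff r < \alpha .

% Upper bound (query r \le \beta), applied to the record just above the range:
h^{\beta-L}\bigl(h^{r-\beta-1}(r)\bigr)
  = h^{(\beta-L)+(r-\beta-1)}(r)
  = h^{r-L-1}(r),
\qquad \text{defined only if } r-\beta-1 \ge 0 \iff r > \beta .
```

In both cases the server can supply the inner value only for a record that really lies outside the queried range, and that is what the user's check relies on.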
NULL Answers??
• Consider Q: α ≤ r.
• Q = ∅ because $r_n < \alpha$.
– Server returns $h^{\alpha - r_n - 1}(r_n)$, $g(r_{n+1})$, $sig(r_{n+1})$
– User computes $h^{U-\alpha}(h^{\alpha - r_n - 1}(r_n))$ and verifies
One More Vulnerability
• User can discover $r_{a-1}$ through brute-force
enumeration of numbers below $r_a$
• Solution:
– Record [K, A1, .., Am], K = ordering attribute
– $g(r_i.K \mid r_i.A_1 \mid \ldots \mid r_i.A_m)$
– Brute-force attack is no longer feasible
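Completeness Verification for Range Queries
Record $r_{a-1}$: [ K A1 A2 … AR ]
[Figure: the digest of record $r_{a-1}$ is built from three parts: $h^{U - r_{a-1}.K - 1}(r_{a-1}.K)$, $h^{r_{a-1}.K - L - 1}(r_{a-1}.K)$, and $h(r_{a-1}.A)$, the root of a Merkle Tree over $h(r_{a-1}.A_1)$ … $h(r_{a-1}.A_R)$.]
• Verify $r_{a-1}.K < \alpha$: the server supplies $h^{\alpha - r_{a-1}.K - 1}(r_{a-1}.K)$ and the user hashes it $U - \alpha$ times
• Then check ver($H_a$, $sig(r_a)$, PK)?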
What else?
– What about data freshness?
– More efficient scheme
– Ad-hoc joins
– Aggregates
– Multi-dimensional data
– Computation
– Complete (complex) queries
Summary
• Malicious service provider may cheat
• Users need assurance on their query
answers
• Merkle hash tree offers a good solution but
…
• Signature chain guarantees completeness
without violating access control policy