CheR: Cheating Resilience in the Cloud via Smart Resource Allocation

This document describes CheR, a system for detecting and mitigating cheating in cloud computing resource allocation. CheR models the problem as a linear programming problem to minimize costs while ensuring results are correct above a given confidence threshold and within a maximum time. It takes as input parameters about nodes, tasks, costs and times to generate an optimal assignment matrix distributing tasks to nodes. CheR has been implemented in Java using GLPK to solve the linear program and return an assignment that meets reliability and cost objectives.


CheR: Cheating Resilience in the Cloud via Smart Resource Allocation

Roberto Di Pietro, Flavio Lombardi, Fabio Martinelli, Daniele Sgandurra

Dept. of Mathematics and Physics, University Roma Tre, Italy
Institute for Informatics and Telematics, CNR, Italy
[email protected]

FPS 2013
22/10/2013
Outline

• Background
  Cloud Computing
  Use Case
• CheR
  Modeling the Problem
  Current Prototype
• Results
  Tests
• Conclusion and Future Work
Part I

Background
Computational outsourcing
• is needed when:
  fast, complex computations are required;
  very large data sets must be processed.
• is successful because cloud computing offers:
  cheap computing/storage resources;
  scalable computing resources.

Motivation
However, computational outsourcing does NOT guarantee:
• correctness of computations;
• availability of computations;
• timeliness of results.
In particular, cheating:
• allows computing nodes to save resources;
• if undetected, leads to erroneous results.

Use Case

• A sysadmin needs to rent some cloud resources to perform a parallel task over a large data set.
• Two main scenarios exist:
1. the computation needs to end reliably within a given timeframe, and the sysadmin is willing to spend as little as possible;
2. the sysadmin has a fixed maximum budget and has to reliably compute the function while minimizing the required time.
Use Case

Figure: Use Case, where node N2 is a cheater.

Possible Solution?

• Use only “trusted” nodes:
  no parallelization;
  not cost-effective.
• The main goal here is to guarantee timeliness, cost-effectiveness, and correctness of the computed results.
Part II

CheR

Generalizing the Problem (1)

• There are n cloud nodes that host the computation.
• At most k nodes are rational adversaries.
• The remaining (n − k) cloud nodes are non-malicious, well-behaved nodes.
• We model the adversary as a static cheater:
  it fakes its computations with a fixed probability.

Generalizing the Problem (2)

• Cloud nodes are required to compute a parallel function f over an input vector X of length m.
• The output is a vector of length m whose j-th element is f(x_j).
• Each of the n cloud nodes n_i can compute a subset (possibly overlapping) of the input vector.
• Each node n_i has an associated unit cost per operation c_i, and the time it takes to perform a unit operation is t_i.

Generalizing the Problem (3)

• The fraction of cheaters (CheaterRate) is k/n.
• The system administrator has to send multiple (possibly overlapping) subsets of X to the nodes to be confident that the output is correct.
• The confidence threshold is DetConf.
• The goal of the system administrator is to:
  minimize the total cost of the operations,
  given a maximum computation time Tmax,
  with all results correct with an error probability below DetConf.

Assignment Matrix

• To model the scenario we use an n × m assignment matrix M:
  M_{i,j} = 1 means that node n_i receives element x_j, on which it computes the function f.
• Row indexes correspond to nodes, with 1 ≤ i ≤ n;
• Column indexes correspond to elements of the input vector, with 1 ≤ j ≤ m.

An Example

       VM1  VM2  VM3  VM4  VM5
cost   c1   c2   c3   c4   c5
time   t1   t2   t3   t4   t5

Table: Cost and Time Vectors

       x1  x2  x3  x4  x5  x6  x7
VM1     1   0   1   0   0   1   0
VM2     0   1   0   1   1   0   1
VM3     1   0   0   1   0   0   1
VM4     0   1   0   1   1   0   1
VM5     0   1   1   0   1   1   0

Table: Assignment Matrix of the Example
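To make the example concrete, here is a minimal Java sketch that encodes the assignment matrix above. The costs and times stay symbolic in the slides, so the sketch only derives the two structural quantities the model reasons about: each node's workload (row sum, which drives C_i and T_i) and each element's replication count (column sum, which must satisfy Repl(j)):

```java
// Minimal sketch of the example assignment matrix above.
public class AssignmentExample {
    // Rows = VM1..VM5, columns = x1..x7 (1 = element assigned to node).
    static final int[][] M = {
        {1, 0, 1, 0, 0, 1, 0},  // VM1
        {0, 1, 0, 1, 1, 0, 1},  // VM2
        {1, 0, 0, 1, 0, 0, 1},  // VM3
        {0, 1, 0, 1, 1, 0, 1},  // VM4
        {0, 1, 1, 0, 1, 1, 0},  // VM5
    };

    // Number of elements assigned to node i (drives its cost C_i and time T_i).
    static int rowSum(int i) {
        int s = 0;
        for (int v : M[i]) s += v;
        return s;
    }

    // Number of replicas of element j (must be at least Repl(j)).
    static int colSum(int j) {
        int s = 0;
        for (int[] row : M) s += row[j];
        return s;
    }

    public static void main(String[] args) {
        for (int i = 0; i < M.length; i++)
            System.out.println("VM" + (i + 1) + " processes " + rowSum(i) + " elements");
        for (int j = 0; j < M[0].length; j++)
            System.out.println("x" + (j + 1) + " is replicated " + colSum(j) + " times");
    }
}
```

In this example every element is replicated either 2 or 3 times, so any Repl(j) ≤ 2 would already be satisfied.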


Modeling the Problem

The total cost C_i of the operations performed by the i-th node is:

    C_i = \sum_{j=1}^{m} c_i \cdot M_{i,j} = c_i \cdot \sum_{j=1}^{m} M_{i,j}    (1)

The total time T_i of the i-th node is:

    T_i = \sum_{j=1}^{m} t_i \cdot M_{i,j} = t_i \cdot \sum_{j=1}^{m} M_{i,j}    (2)

We can formulate the problem as a Linear Programming model:

    Minimize \sum_{i=1}^{n} \sum_{j=1}^{m} c_i \cdot M_{i,j}    (3)

subject to:

    \sum_{j=1}^{m} t_i \cdot M_{i,j} \le T_{max} \quad \forall i    (4)
Modeling the Problem

• Each input element can also be processed by a cheater node.
• To this end, we introduce the following constraint:

    \sum_{i=1}^{n} M_{i,j} \ge Repl(j) \quad \forall j    (5)

  where Repl(j) is the number of replicas required for input element j according to the confidence level DetConf.
• Replication lowers the chance of wrong results due to cheater nodes.
• The number of requested replicas is computed using the hypergeometric distribution, by bounding the probability that at least half of the replicas of an element are given to, and processed by, cheater nodes.
Number of Replicas

• The hypergeometric distribution describes the probability of k successes in n draws, without replacement, from a finite population of size N containing exactly K successes.
• Consider the hypergeometric distribution H(n, h, r), where n is the number of nodes, h is the number of cheaters, and r is the number of replicas drawn.
• Find the smallest r, i.e. the number of replicas for each input element (i.e., Repl(j)), for which the following holds:

    \sum_{i=\lceil r/2 \rceil}^{\min(r,h)} P(i) = \sum_{i=\lceil r/2 \rceil}^{\min(r,h)} \frac{\binom{h}{i}\,\binom{n-h}{r-i}}{\binom{n}{r}} < DetConfLocal    (6)
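The search for the smallest r in Eq. (6) can be sketched in a few lines of Java. This is an illustrative reimplementation, not the CheR prototype's code: it assumes the "at least half on cheaters" reading of the lower summation bound (i starts at ⌈r/2⌉) and caps the sum at min(r, h), where the remaining hypergeometric terms are zero anyway:

```java
// Sketch of the replica computation from Eq. (6): find the smallest r such
// that the probability that at least half of the r replicas of an element
// land on cheater nodes stays below the confidence threshold. Assumes the
// hypergeometric model H(n, h, r): n nodes, h cheaters, r draws.
public class Replicas {
    // Binomial coefficient as a double (n is small here, e.g. 100).
    static double choose(int n, int k) {
        if (k < 0 || k > n) return 0.0;
        double c = 1.0;
        for (int i = 1; i <= k; i++) c = c * (n - k + i) / i;
        return c;
    }

    // P(exactly i cheaters among r draws from n nodes containing h cheaters).
    static double hyper(int n, int h, int r, int i) {
        return choose(h, i) * choose(n - h, r - i) / choose(n, r);
    }

    // Smallest r with sum_{i=ceil(r/2)}^{min(r,h)} P(i) < detConf.
    static int minReplicas(int n, int h, double detConf) {
        for (int r = 1; r <= n; r++) {
            double p = 0.0;
            for (int i = (r + 1) / 2; i <= Math.min(r, h); i++)
                p += hyper(n, h, r, i);
            if (p < detConf) return r;
        }
        return -1; // no feasible replication factor
    }

    public static void main(String[] args) {
        // Testbed values from the slides: 100 nodes, 5 cheaters.
        System.out.println(Replicas.minReplicas(100, 5, 0.01));
    }
}
```

With n = 100, h = 5 and a threshold of 0.01, the first r that works is 3: for r = 3 the probability that at least 2 of the 3 replicas hit cheaters is about 0.0059.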

Modeling the Problem

• Finally, we declare the binary decision variables that determine which input elements are assigned to which nodes:

    M_{i,j} \in \{0, 1\} \quad \forall i, j    (7)

• Once this LP model is solved, if a solution exists, an optimal assignment of input elements to cloud nodes is returned that satisfies all the requirements imposed by the system administrator.

Modeling the Problem

The complete model can be written as:

    Minimize    \sum_{i=1}^{n} \sum_{j=1}^{m} c_i \cdot M_{i,j}
    subject to  \sum_{j=1}^{m} M_{i,j} \le Max(i),    1 \le i \le n
                \sum_{i=1}^{n} M_{i,j} \ge Repl(j),   1 \le j \le m
                M_{i,j} \in \{0, 1\},                 1 \le i \le n, 1 \le j \le m

Table: LP Model
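The model in the table can be emitted in a solver-readable form. The sketch below is illustrative, not the prototype's actual GLPK-binding code: it writes the model in the CPLEX LP text format, which GLPK's glpsol accepts via --lp, using hypothetical variable names x_i_j for M_{i,j}:

```java
// Illustrative generator for the LP model above, in CPLEX LP text format
// (usable as: glpsol --lp model.lp). Variable x_i_j stands for M_{i,j}.
public class LpModelWriter {
    static String build(double[] c, int[] max, int[] repl) {
        int n = c.length, m = repl.length;
        StringBuilder lp = new StringBuilder("Minimize\n obj:");
        for (int i = 0; i < n; i++)                 // objective: sum c_i * M_ij
            for (int j = 0; j < m; j++)
                lp.append(" + ").append(c[i]).append(" x_").append(i).append("_").append(j);
        lp.append("\nSubject To\n");
        for (int i = 0; i < n; i++) {               // capacity: sum_j M_ij <= Max(i)
            lp.append(" cap_").append(i).append(":");
            for (int j = 0; j < m; j++)
                lp.append(" + x_").append(i).append("_").append(j);
            lp.append(" <= ").append(max[i]).append("\n");
        }
        for (int j = 0; j < m; j++) {               // replication: sum_i M_ij >= Repl(j)
            lp.append(" rep_").append(j).append(":");
            for (int i = 0; i < n; i++)
                lp.append(" + x_").append(i).append("_").append(j);
            lp.append(" >= ").append(repl[j]).append("\n");
        }
        lp.append("Binary\n");                      // M_ij in {0, 1}
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                lp.append(" x_").append(i).append("_").append(j);
        lp.append("\nEnd\n");
        return lp.toString();
    }

    public static void main(String[] args) {
        // Toy instance: 2 nodes, 3 elements, every element replicated twice.
        System.out.println(build(new double[]{1.0, 2.0}, new int[]{3, 3}, new int[]{2, 2, 2}));
    }
}
```

The binary assignment variables technically make this an integer (0-1) program; GLPK handles both the LP relaxation and the integer model.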

CheR Implementation

• CheR receives the cloud scenario as input, creates a linear programming problem, and returns an optimal solution.
• Implemented in Java.
• GLPK is used as the LP solver.
• It also includes a simulator/validator.

CheR Implementation

The CheR meta-program takes the following parameters as input:
• the number of nodes n;
• the number of input elements m;
• the confidence level DetConf;
• the features of the available nodes:
  the time t_i required to process a single element,
  the cost per unit c_i;
• the time constraint Tmax on the termination of the reliable distributed computation:
  this means that each node n_i can process at most Max(i) elements.

CheR first computes the number of replicas and then outputs the LP model, which is then fed to the LP solver.
Part III

Results

Testbed

• To model an Amazon-like cloud, 5 node typologies were tested:
  ranging from Medium to XXXLarge, according to real (normalized) costs and performance.
• The number of nodes n is set to 100:
  20 nodes for each of the 5 typologies.
• There are 5 static cheaters, i.e. 5% of the total nodes.
• m = 1,000 input elements are considered for each round.

Tested Scenarios

• We devised four scenarios, where Tmax is set so that nodes can process a different (decreasing) number of elements.
• These four scenarios depict four different alternatives:
1. extreme cost savings;
2. moderate balance requirements;
3. tight time constraints and cost containment;
4. extremely tight time constraints.
Matrix Assignment 1

Figure: Bitmap representing an actual assignment of elements (x-axis) to nodes (y-axis) for scenario E1, targeted at cost savings.

Matrix Assignment 2

Figure: Bitmap representing an actual assignment of elements (x-axis) to nodes (y-axis) for scenario E2, targeted at moderate cost savings.

Matrix Assignment 3

Figure: Bitmap representing an actual assignment of elements (x-axis) to nodes (y-axis) for scenario E3, targeted at moderate time constraints.

Matrix Assignment 4

Figure: Bitmap representing an actual assignment of elements (x-axis) to nodes (y-axis) for scenario E4, targeted at extremely tight time constraints.
Performance

Both the execution time and the memory requirements are O(n × m).

Figure: Execution Time of the Prototype

Validation Tests

• We ran several simulation tests that exploit the assignment matrix of E4:
  to discover whether any wrong result would be erroneously accepted as correct by the collector.
• For each test, each experiment was repeated several times, randomly choosing the set of k cheaters.
• In the end, for each test, we counted:
1. the number of wrong results;
2. the number of tests containing at least one wrong result.

Validation Tests

• Select a cheating probability P for each static cheater:
  i.e. how often a cheater returns a fake result.
• Consequently, a cheater that returns wrong results more often is easier to detect.
• Conversely, a cheater that returns fewer wrong results is harder to detect.
• The central collector can discover a cheater by computing a divergence index for each node:
  keeping track of how many times the node has given responses that differ from the others' on the same input (the percentage of results in which the cheater was in the minority).
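The slides do not give the collector's exact formula, so the following is a hedged Java sketch of one plausible reading: for every input element, take the majority answer over its replicas, and define a node's divergence index as the fraction of its answers that ended up in the minority:

```java
// Sketch of the collector's divergence index (assumed majority-vote reading):
// a node's index is the fraction of its answers that disagreed with the
// per-element majority answer, i.e. where the node was in the minority.
import java.util.*;

public class DivergenceIndex {
    // answers.get(j) maps nodeId -> answer returned for input element j.
    static Map<Integer, Double> compute(List<Map<Integer, Long>> answers) {
        Map<Integer, Integer> total = new HashMap<>(), minority = new HashMap<>();
        for (Map<Integer, Long> perElement : answers) {
            // Majority answer for this element across its replicas.
            Map<Long, Integer> counts = new HashMap<>();
            for (long a : perElement.values()) counts.merge(a, 1, Integer::sum);
            long majority = Collections.max(counts.entrySet(),
                    Map.Entry.comparingByValue()).getKey();
            // Charge each node that answered; flag those in the minority.
            for (Map.Entry<Integer, Long> e : perElement.entrySet()) {
                total.merge(e.getKey(), 1, Integer::sum);
                if (e.getValue() != majority) minority.merge(e.getKey(), 1, Integer::sum);
            }
        }
        Map<Integer, Double> index = new HashMap<>();
        for (int node : total.keySet())
            index.put(node, minority.getOrDefault(node, 0) / (double) total.get(node));
        return index;
    }

    public static void main(String[] args) {
        List<Map<Integer, Long>> answers = new ArrayList<>();
        answers.add(Map.of(1, 42L, 2, 42L, 3, 7L)); // node 3 disagrees on element 0
        answers.add(Map.of(1, 10L, 3, 10L));        // all replicas agree on element 1
        System.out.println(compute(answers));
    }
}
```

In the toy run above, node 3 disagrees on one of its two assigned elements, so its index is 0.5, while the honest nodes score 0.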

Validation Tests

P      FNT      FNE (Avg)           FNTC       FNEC (Avg)
1      0.236%   0.006%    (6.1)     0.00375%   0.0025%    (2.445)
0.9    0.223%   0.0045%   (4.5)     0.022%     0.0012%    (1.17)
0.8    0.228%   0.0038%   (3.8)     0.0075%    3.393E-4%  (0.34)
0.75   0.226%   0.0036%   (3.45)    0.0047%    2.144E-4%  (0.21)
0.66   0.208%   0.0025%   (2.5)     0.0011%    5.14E-5%   (0.05)
0.6    0.2%     0.0021%   (2.1)     6.0E-4%    1.267E-5%  (0.0127)
0.5    0.19%    0.0015%   (1.5)     0%         0%         (0)
0.4    0.17%    9.436E-4% (0.94)    0%         0%         (0)
0.33   0.16%    6.573E-4% (0.66)    0%         0%         (0)
0.3    0.16%    5.358E-4% (0.54)    0%         0%         (0)
0.25   0.14%    3.77E-4%  (0.38)    0%         0%         (0)
0.2    0.12%    2.5E-4%   (0.25)    0%         0%         (0)
0.1    0.05%    6.14E-5%  (0.06)    0%         0%         (0)

Table: Results of the Validation Tests

P: cheating probability of static cheaters, i.e. how often a cheater cheats.
FNT: % of false negative tests, i.e. ratio of failed tests without centralized control (at least one wrong result in an experiment).
FNE: % of false negative elements, i.e. ratio of wrong results without centralized control (over all the experiments).
FNTC: % of false negative tests in the centralized scenario, i.e. ratio of failed tests with centralized control (at least one wrong result in an experiment).
FNEC: % of false negative elements in the centralized scenario, i.e. ratio of wrong results with centralized control (over all the experiments).

Discussion

• As the maximum execution time is compressed, the workload, including the replicated computations, seamlessly shifts toward the bottom rows, which represent the more costly but faster nodes.
• For large problem sizes this matchmaking is nontrivial, so an automated approach is needed.
• The assignment matrix strives to pair elements with less costly nodes whenever the time constraints are still satisfied.

Conclusion

• CheR is a model for the reliable execution of a workload over a large number of heterogeneous computing nodes:
  some nodes can cheat according to identified adversary models.
• CheR helps sysadmins who are assessing how to distribute a computation over a cloud to reason about the possible scenarios, the available approaches, and their convenience and feasibility.
• CheR provides probabilistic assurance that the result of the distributed computation is not affected by misbehaving nodes.
• Experimental evidence based on a real-world cloud provider such as Amazon shows the viability of CheR.

Future Work

• Develop resilience approaches against smart cheaters.
• Consider the trust of cheaters:
  past behaviors.
• Dynamically change the assignment matrix at each round.
• Further improvements: paper accepted at the 10th IEEE International Conference on Autonomic and Trusted Computing:
  “AntiCheetah: an Autonomic Multi-round Approach for Reliable Computing”.

Questions?
E-mail:
[email protected]
[email protected]
