
Optimization Approach For Data Placement in Cloud Computing: Preliminary Study

The document presents an optimization approach for data placement in cloud computing. It first surveys existing work on data placement strategies. It then outlines a system model in which content providers store data on cloud providers to serve content consumers. An optimal data placement algorithm (ODPA) is proposed to minimize cost and delay by formulating the data placement problem as a mathematical program solved using GAMS. Several experiments are conducted by varying data sizes, network bandwidths, and processing powers; the results demonstrate optimization of both cost and delay. Limitations are acknowledged, and future work is suggested to improve the system model and apply other optimization approaches.

Uploaded by Unique Chan
© Attribution Non-Commercial (BY-NC)

Optimization Approach for Data Placement in Cloud Computing: Preliminary Study

Presented by

Nay Myo Sandar (Chan Chan), Shinawatra University

Outline

1. Overview
2. Survey
3. Summary of Survey
4. System Model and Assumption
5. Problem Formulation
6. GAMS (Introduction)
7. Experimental Results
8. Conclusion

Overview

- Data placement = data locality management in cloud computing
- Cloud computing is an efficient solution for data placement

Overview (Continued)

Key factors in data placement:
- Size of data
- Monetary cost for data placement/computation
- Resource reliability
- Network bandwidth
- Geographical data movement

Overview (Continued)
In this study, we propose ODPA (i.e., Optimal Data Placement

Algorithm) to store data in cloud providers in order to minimize cost and delay
investigate a number of constraints (i.e., demand, processing

power, and storage)


approach mathematical programming model to formulate data

placement problem
use GAMS to solve data placement problem in terms to get

optimal solutions
perform numerical studies and experiments to evaluate our

proposed optimization models

Survey

- A Data Placement Strategy in Scientific Cloud Workflows
- Data Storage Placement in Sensor Networks
- Load Balancing and Data Placement for Multi-tiered Database Systems
- New Algorithms for Planning Bulk Transfer via Internet and Shipping Networks

A Data Placement Strategy in Scientific Cloud Workflows [1/4]

- Scientists need to analyse terabytes of data drawn from existing data resources or collected from physical devices
- To store these data effectively, scientists must intelligently select data centers
- Problems:
  - moving the data becomes a challenge
  - data movement also incurs costs

Continued [2/4]

- Apply a k-means clustering algorithm in two stages to minimize data movement:
  - build-time stage
  - run-time stage
- Generate test workflows to run on SwinDeW-C
- Two types of data (existing & generated):
  - existing data: exists before the workflow's execution
  - generated data: generated during the workflow's execution

Continued [3/4]

- The k-means clustering algorithm clusters the data sets onto the data centers:
  - build-time stage: cluster the existing data sets into k data centers as the initial partitions
  - run-time stage: cluster each generated data set to one of the k data centers based on its dependencies
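The two-stage clustering above can be sketched as follows (a minimal, illustrative implementation: the dependency vectors, k = 2, and all data are hypothetical, not taken from the paper):

```python
import random

def kmeans(points, k, iters=20):
    """Cluster points (tuples of floats) into k groups via plain k-means."""
    random.seed(0)
    centers = random.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest center by squared Euclidean distance
        for i, p in enumerate(points):
            assign[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
        # update step: recompute each center as the mean of its members
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign, centers

def place_generated(dataset, centers):
    """Run-time stage: route a newly generated data set to the nearest center."""
    return min(range(len(centers)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(dataset, centers[c])))

# build-time stage: partition existing data sets (as dependency vectors) into k = 2 centers
existing = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
assign, centers = kmeans(existing, k=2)
# run-time stage: a generated data set joins the data center its dependencies are closest to
print(place_generated((4.8, 5.0), centers))
```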

Drawbacks (Continued) [4/4]

- Does not consider the structure of data
- It is not practical to calculate the data sets' dependencies and assign them to a data center at the build-time stage
- It is very hard to predict when a certain dataset will be generated in a dynamic cloud environment
- It is impractical and inefficient to reserve storage for generated data at the build-time stage

Data Storage Placement in Sensor Networks [1/6]

- Data storage has become an important issue in sensor networks
- Applications such as monitoring children's learning behavior, senior care systems, and environment sensing need their data archived for future information retrieval
- Problem:
  - the storage node placement problem (i.e., how to store and search the collected data) so as to minimize cost

Sensor Network Problems (Continued) [2/6]

- Each sensor is equipped with only limited memory or storage space
- Since sensors are battery operated, stored data will be lost once a battery is depleted
- Searching for the data of interest in a widely scattered network is a hard problem

Continued [3/6]

- Collected data can be transmitted to the sink and stored there for future information retrieval
- Problems:
  - a large amount of data cannot be transmitted from the sensor network to the sink effectively
  - transmissions take long routes, consuming much energy and depleting sensor battery power quickly

Continued [4/6]

- Use two tree models:
  - fixed tree model: assumes the sensor network is organized into a tree rooted at the sink
  - dynamic tree model: the optimal communication tree is constructed after the storage nodes are deployed
- Each sensor selects a storage node in its proximity for its data storage, to minimize energy cost

Sink Drawbacks (Continued) [5/6]

- In sensor networks, querying is the most important application
- If data are stored at the sink, queries can be answered with no transmission cost, but accumulating data at the sink is very costly
- Query diffusion cost becomes large

Tree Model Drawbacks (Continued) [6/6]

- The communication tree may be broken due to link failure; when building the tree, only stable links should be chosen
- In reality, storage nodes may not be deployed precisely; stochastic analysis is used to evaluate the performance of randomly deployed storage nodes in both models

Load Balancing and Data Placement for Multi-tiered Database Systems [1/4]

- An MQT (Materialized Query Table) is an auxiliary table with precomputed data
- An MQTA (Materialized Query Table Advisor) is often used to recommend and create MQTs
- The MQTA is placed at the backend database server to recommend and create MQTs at the frontend database server, improving the response time of a query workload

Continued [2/4]

- Problem:
  - placing all or many MQTs at the frontend database server cannot improve the response time of the workload, so the MQTA alone cannot be used
- Extend the MQTA functionality with a DPA (Data Placement Advisor) and load-balancing strategies for automatic recommendation and placement of MQTs
- WebSphere II is used as the frontend database server:
  - statistics about remote data sources are collected and maintained in WebSphere II for later use by the query optimizer

Continued [3/4]

DPA takes input from user specified preference in order to cache the MQTs at the frontend database server considers information output from the MQTA which provides MQT dependency information

performs data placement analysis considering the MQT benefit,


MQT size, MQT dependency, processing and network latency simulates the placement of MQTs on the frontend database server one by one and observes the response time the workload can be distributed across multiple database servers

for better response time
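The one-by-one placement simulation can be approximated with a simple greedy sketch (the benefit/size figures and the benefit-per-size heuristic below are illustrative assumptions; the real DPA uses optimizer-estimated response times and dependency information):

```python
def greedy_placement(mqts, cache_capacity):
    """Place MQTs at the frontend, highest benefit-per-size first, until the cache is full."""
    placed, used = [], 0
    for name, benefit, size in sorted(mqts, key=lambda m: m[1] / m[2], reverse=True):
        if used + size <= cache_capacity:
            placed.append(name)
            used += size
    return placed

# hypothetical MQTs: (name, estimated benefit, size)
mqts = [("q1", 50.0, 10), ("q2", 30.0, 2), ("q3", 40.0, 20)]
print(greedy_placement(mqts, cache_capacity=15))  # → ['q2', 'q1']
```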

Drawbacks (Continued) [4/4]

- Ignores the construction cost of an MQT
- The cost of constructing an MQT is usually higher than its benefit

New Algorithms for Planning Bulk Transfer via Internet and Shipping Networks [1/4]

- Large datasets are located at geographically distributed sources
- These data need to be transferred to a single sink (e.g., AWS or Google data centers) for processing
- Problem:
  - planning group-based, deadline-constrained data transfers over the internet and by shipping storage devices via courier companies
  - goals: satisfy the latency deadline and reduce total dollar cost

Continued [2/4]

- Weigh the pros and cons of internet transfer versus shipping:
  - internet transfer is cheap and fast for small datasets, but slow and expensive for large datasets
  - shipping is cheap and fast for large datasets, but expensive for small datasets
- Take both costs and latencies into account to make the optimal choice between shipping and internet transfer
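The internet-versus-shipping trade-off can be illustrated with a toy cost/latency model (all bandwidths, fees, and transit times below are hypothetical; Pandora solves the real problem jointly as a MIP rather than per-link):

```python
def internet_transfer(size_gb, bandwidth_mbps, dollars_per_gb):
    """Latency grows linearly with size; cost is per-GB."""
    hours = size_gb * 8 * 1024 / bandwidth_mbps / 3600
    return hours, size_gb * dollars_per_gb

def shipping_transfer(size_gb, transit_hours=48.0, flat_fee=60.0):
    """Courier latency and fee are roughly flat regardless of volume."""
    return transit_hours, flat_fee

def cheapest_within_deadline(size_gb, deadline_h):
    """Pick the cheapest transfer mode that meets the latency deadline."""
    options = {
        "internet": internet_transfer(size_gb, bandwidth_mbps=100, dollars_per_gb=0.09),
        "shipping": shipping_transfer(size_gb),
    }
    feasible = {name: cost for name, (hours, cost) in options.items() if hours <= deadline_h}
    return min(feasible, key=feasible.get) if feasible else None

print(cheapest_within_deadline(10, deadline_h=24))      # → internet (small dataset)
print(cheapest_within_deadline(10_000, deadline_h=72))  # → shipping (large dataset)
```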

Continued [3/4]

- Build the Pandora (People and Networks Moving Data Around) planning system:
  - inputs: the dataset sizes at the source sites; the interconnectivity between sources and sink (bandwidth, cost, and latency for both internet and shipping links); and the latency deadline bounding the total transfer time
- Formulate the inputs as an integer program over the data transfer problem

Continued [4/4]

- Minimize the total dollar cost c(f) while satisfying a latency deadline at time T
- Use a Mixed Integer Program (MIP) solver
- Use real data from FedEx and PlanetLab
- Show that the transfer planning algorithms satisfy deadlines while simultaneously minimizing dollar costs

Summary of Survey

- Reviewed many papers related to data placement problems in a variety of distributed computing environments
- The surveyed works applied different assumptions, algorithms, methods, and models
- They did not consider the structure of data, assumed homogeneous data structures and no data bottlenecks, and ignored the construction cost of service model implementation

System Model and Assumption

[Figure: a content provider distributes data across multiple cloud providers, which serve content consumers]

Problem Formulation

- Minimize cost
- Minimize delay
- Subject to constraints on demand, processing power, and storage
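A generic formulation consistent with these objectives and constraints might look like the sketch below (all symbols are illustrative assumptions, not necessarily the authors' notation):

```latex
% Illustrative only: x_{ij} = 1 if content i is placed on cloud provider j.
\begin{align*}
\text{Model 1 (cost):}  \quad & \min \sum_{i}\sum_{j} c_{ij}\, x_{ij} \\
\text{Model 2 (delay):} \quad & \min \sum_{i}\sum_{j} d_{ij}\, x_{ij} \\
\text{s.t.} \quad
  & \sum_{j} x_{ij} \ge 1 \quad \forall i && \text{(demand is served)} \\
  & \sum_{i} p_{i}\, x_{ij} \le P_{j} \quad \forall j && \text{(processing power)} \\
  & \sum_{i} s_{i}\, x_{ij} \le S_{j} \quad \forall j && \text{(storage)} \\
  & x_{ij} \in \{0, 1\}.
\end{align*}
```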

GAMS (Introduction)

- GAMS stands for General Algebraic Modeling System
- This study uses GAMS version 23.7.3
- Its high-level algebraic syntax lets users state optimization models without implementing solution algorithms
- Models can be solved on different types of computers

Experimental Results

Experiment 1

- Results of brute-force search (cost optimization)
- Optimal solution = $62.416
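A brute-force search over placements, as used in Experiment 1, can be sketched as follows (all costs and capacities here are made-up illustrations; the $62.416 optimum on the slide comes from the authors' own data):

```python
from itertools import product

# cost[i][j]: hypothetical cost of placing content i on cloud provider j
cost = [
    [4.0, 2.5, 3.0],
    [1.5, 2.0, 5.0],
    [3.5, 1.0, 2.0],
]
capacity = [2, 2, 2]  # hypothetical: max contents per provider (a storage constraint)

def brute_force(cost, capacity):
    """Enumerate every placement of n contents onto m providers; keep the cheapest feasible one."""
    n, m = len(cost), len(cost[0])
    best_cost, best_place = float("inf"), None
    for place in product(range(m), repeat=n):  # m ** n candidate placements
        if any(place.count(j) > capacity[j] for j in range(m)):
            continue  # violates the storage constraint
        total = sum(cost[i][place[i]] for i in range(n))
        if total < best_cost:
            best_cost, best_place = total, place
    return best_cost, best_place

best, place = brute_force(cost, capacity)
print(best, place)  # → 5.0 (1, 0, 1)
```

Brute force is exact but exponential in the number of contents, which is why the study formulates the problem for a mathematical programming solver instead.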

Experiment 1 (Continued)

- Results of brute-force search (delay optimization)
- Optimal solution = 10.467 MB/sec

Experiment 2

Objective values for each of the 10 configurations:

Configuration    Cost ($)    Delay (MB/sec)
 1                94.3990      15.7768
 2                65.8896       8.8820
 3                44.9550       7.6150
 4               294.3855      43.2565
 5               101.1281      16.7685
 6               216.2001      35.1447
 7               116.1138      23.4208
 8               217.5314      49.6789
 9               114.4319      23.4466
10                86.7977      13.7720

Experiment 2 (Continued)

[Figures: trajectory of cost (objective value); trajectory of delay (objective value)]

Experiment 3

- Change the size of contents for the optimization models

Experiment 3 (Continued)

- Change the network bandwidth for the optimization models

Experiment 3 (Continued)

- Change the processing power of each content for the optimization models

Experiment 3 (Continued)

- Change the maximum processing power offered by each cloud provider for the optimization models

Conclusion

- Extensively surveyed related work
- Proposed two optimization models (cost and delay)
- Performed experiments
- Limitations and future work:
  - handle uncertainty
  - other optimization approaches: stochastic programming and Markov game theory
  - revise the system model and assumptions
