Application of Machine Learning Optimization in Cloud Computing Resource Scheduling and Management
In this framework, users' historical resource usage data is first collected and analyzed to understand patterns and trends in resource utilization across different time periods. This data is then used to train the model to learn how each user's resource needs change over time. Next, according to the learned patterns, a resource allocation strategy is developed to allocate resources sensibly in different time periods, reducing the pressure on the server and the risk of downtime. Finally, a new incentive evaluation mechanism is used to evaluate and optimize the resource allocation strategy to improve system performance and efficiency. The research of Mondal et al. provides an innovative solution to the resource allocation problem in long-running cloud computing tasks and makes an important contribution to improving the efficiency and stability of resource utilization in cloud computing environments.

[Figure 2. Deep reinforcement learning method considering time-varying features]

As illustrated in Figure 2, in comparison with Section 2.1, this study introduces a novel approach by first employing clustering techniques to forecast the time-varying behavior of users in their utilization of cloud resources. This proactive measure aims to prevent the peak usage periods of different users from coinciding, a scenario that would otherwise exert immense strain on the server infrastructure. Within the clustering methodology, a combination of autoregressive analysis, linear trend analysis, and dynamic time warping is predominantly employed; dynamic time warping additionally makes it possible to identify users with similar temporal utilization patterns through K-means clustering. This multifaceted approach not only enhances the prediction of resource utilization dynamics but also lays the groundwork for proactive resource allocation strategies that relieve server pressure and improve allocation efficiency.
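To make the clustering step concrete, the sketch below computes dynamic-time-warping distances between per-user usage series and then groups users with a K-means-style k-medoids pass over the distance matrix (plain K-means requires a vector space, so k-medoids stands in here); the synthetic usage data and cluster count are assumptions for illustration, not the paper's setup.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def k_medoids(dist, k, iters=50, seed=0):
    """K-means-style clustering over a precomputed distance matrix."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(dist.shape[0], size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size:
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return np.argmin(dist[:, medoids], axis=1)

# Synthetic hourly usage: five day-peaking users and five night-peaking users.
rng = np.random.default_rng(1)
t = np.arange(24)
day = np.clip(np.sin((t - 6) / 24 * 2 * np.pi), 0, None)
night = np.clip(np.sin((t - 18) / 24 * 2 * np.pi), 0, None)
users = np.vstack([day + 0.05 * rng.standard_normal(24) for _ in range(5)]
                  + [night + 0.05 * rng.standard_normal(24) for _ in range(5)])
dist = np.array([[dtw_distance(u, v) for v in users] for u in users])
print(k_medoids(dist, k=2))  # the two usage patterns separate into two clusters
```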
In the deep reinforcement learning part, the idea of this study is broadly similar to that of Study 1. Study 2 puts forward a number of reward definitions worth referencing, mainly comprising the following parts.

Competition
The competition term describes the use of server resources by different services: if the usage of the allocated VMs by two services peaks at the same time, the competition score, computed as the inner product of the resources used by the different services, is large:

R_c = -\sum_{d \in D} \sum_{i,j \in S,\ i > j} c_{i,j,d}  (4)

c_{i,j,d} = \sum_{t \in T} u_{i,d}(t)\, u_{j,d}(t)  (5)

where u_{i,d}(t) denotes the amount of resource d used by service i at time t, S is the set of services, and D is the set of resource types.

Machine utilization rate
Machine utilization is evaluated mainly as the proportion of machines currently in use. Adding this reward encourages the reinforcement learner to place workloads on machines that are already running rather than spreading them across additional machines:

R_m(t) = -\sum_{m \in M_u} \sum_{d \in D} U_m(t, d)  (6)

where U_m(t, d) represents the unused amount of resource d on an in-use machine m at time t, and M_u represents the collection of machines currently in use.

Overuse penalty
The overuse penalty is applied when resource usage exceeds machine capacity:

R_o(t) = -\sum_{m \in M_u} \sum_{d \in D} \mathbb{1}_{m,d}(t)  (7)

where \mathbb{1}_{m,d}(t) is an indicator of overuse of resource d on machine m.

Waiting-time penalty
The waiting-time penalty is calculated from the number of requests waiting in the queue:

R_q(t) = -w \cdot q_t  (8)

where q_t represents the number of requests waiting in the queue and w is a penalty weight.
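Since equations (4)–(8) above are reconstructed from the surrounding prose, the following sketch should be read as one plausible implementation of the reward terms rather than the paper's code; the array shapes, the weight w, and the toy state are all assumptions.

```python
import numpy as np

def competition_reward(usage):
    """Eqs. (4)-(5): usage[i, d, t] is the amount of resource d used by
    service i at time t. Pairs of services whose usage of the same
    resource peaks together incur a large inner product."""
    n_services, n_resources, _ = usage.shape
    score = 0.0
    for d in range(n_resources):
        for i in range(n_services):
            for j in range(i):
                score += float(usage[i, d] @ usage[j, d])
    return -score

def utilization_reward(unused, in_use):
    """Eq. (6): penalize unused capacity on machines that are switched
    on; unused[m, d] is idle capacity, in_use a boolean machine mask."""
    return -float(unused[in_use].sum())

def overuse_reward(used, capacity, in_use):
    """Eq. (7): count (machine, resource) pairs whose usage exceeds
    capacity, via an overuse indicator."""
    return -float((used[in_use] > capacity[in_use]).sum())

def waiting_reward(queue_len, w=0.1):
    """Eq. (8): linear penalty on the number of queued requests."""
    return -w * queue_len

# Toy state: 3 services, 2 resource types, 24 time steps, 4 machines.
rng = np.random.default_rng(0)
usage = rng.random((3, 2, 24))
unused = rng.random((4, 2))
used = rng.random((4, 2))
capacity = np.full((4, 2), 0.8)
in_use = np.array([True, True, False, True])

total = (competition_reward(usage) + utilization_reward(unused, in_use)
         + overuse_reward(used, capacity, in_use) + waiting_reward(5))
print(round(total, 3))
```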
C. Task scheduling/offloading with priority constraints

Optimization Problem
In cloud computing resource allocation and task scheduling systems, complex tasks are decomposed into multiple subtasks to form task flows, which are then allocated to processors for parallel processing in order to enhance computational efficiency. Because certain tasks depend on the results of previous tasks, priority constraints exist among tasks, and directed acyclic graphs (DAGs) can be used to abstract and model the workflow: nodes represent subtasks, and edges between nodes denote the priority constraints among them. Literature [4] provides a tree diagram of the DAG scheduling problem, considering factors such as communication time between tasks, processor resource limitations, and the interconnection between processors.

Mathematical Model and Related Algorithms
The priority constraints between tasks can be expressed as CT(i) ≤ EST(j) for all i → j, where EST(j) denotes the earliest time task j can begin processing, CT(i) represents the completion time of task i, and i → j indicates that task i must be completed before task j.
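As a worked example of the constraint CT(i) ≤ EST(j), the sketch below computes earliest start times over a DAG with a topological sweep; the example graph and task durations are invented for illustration.

```python
from collections import deque

def earliest_start_times(durations, edges):
    """Compute EST and CT for each task of a DAG by a topological sweep
    (Kahn's algorithm), enforcing CT(i) <= EST(j) for every edge i -> j."""
    succ = {t: [] for t in durations}
    indeg = {t: 0 for t in durations}
    for i, j in edges:
        succ[i].append(j)
        indeg[j] += 1
    est = {t: 0.0 for t in durations}
    ready = deque(t for t in durations if indeg[t] == 0)
    visited = 0
    while ready:
        i = ready.popleft()
        visited += 1
        ct_i = est[i] + durations[i]
        for j in succ[i]:
            est[j] = max(est[j], ct_i)  # j cannot start before i completes
            indeg[j] -= 1
            if indeg[j] == 0:
                ready.append(j)
    if visited != len(durations):
        raise ValueError("cycle detected: the task graph is not a DAG")
    ct = {t: est[t] + durations[t] for t in durations}
    return est, ct

durations = {"A": 2, "B": 3, "C": 1, "D": 4}
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
est, ct = earliest_start_times(durations, edges)
assert all(ct[i] <= est[j] for i, j in edges)  # the priority constraints hold
print(est, ct)  # D can start only after both B and C complete
```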
For the cloud offloading problem on mobile terminals, processors can handle multiple tasks simultaneously. The literature introduces a deterministic delay-constrained task segmentation algorithm based on dynamic programming and demonstrates its suboptimality; considering deadline constraints, the topology of the DAG is not restricted, under the assumption that each processor handles one task at a time. Literature [5] presents a mixed-integer programming model considering collaboration between edge computing nodes and remote cloud nodes. This is a general assignment problem with NP-hard characteristics, for which no polynomial-time optimal method is currently known. Literature [6] uses a relaxed integer programming model with 0-1 variables to convert the problem into a convex optimization problem, followed by the design of a heuristic approach. Addressing the multi-DAG mobile terminal offloading problem, literature [7] proposes a mixed-integer programming model to decide whether to upload tasks to the cloud and optimizes energy consumption under deadline constraints.

Heuristic Methods
Optimization methods for DAG scheduling mainly include heuristic methods, intelligent algorithms, and hybrid algorithms; heuristic methods are primarily categorized into list scheduling, clustering, and task replication. List scheduling arranges tasks by priority and assigns the highest-priority pending task to the most suitable processor. Clustering groups tasks until the number of groups equals the number of processors. Task replication duplicates tasks with significant data transfer onto multiple processors to reduce processing latency.
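To make the list-scheduling idea concrete, the sketch below ranks tasks by a longest-remaining-path priority and greedily assigns each to the processor yielding the earliest finish time; it is a generic HEFT-style illustration (communication costs ignored), not the algorithm of any specific cited work, and the example DAG is invented.

```python
def list_schedule(durations, edges, n_procs):
    """Greedy list scheduling: priority = longest path to a sink
    (upward rank); each task goes to the processor offering the
    earliest start after all of its predecessors have finished."""
    succ = {t: [] for t in durations}
    pred = {t: [] for t in durations}
    for i, j in edges:
        succ[i].append(j)
        pred[j].append(i)

    rank = {}
    def upward(t):
        if t not in rank:
            rank[t] = durations[t] + max((upward(s) for s in succ[t]), default=0)
        return rank[t]
    order = sorted(durations, key=upward, reverse=True)

    proc_free = [0.0] * n_procs
    start, finish, placed = {}, {}, {}
    for t in order:  # parents always precede children: rank(parent) > rank(child)
        ready = max((finish[p] for p in pred[t]), default=0.0)
        best = min(range(n_procs), key=lambda p: max(proc_free[p], ready))
        start[t] = max(proc_free[best], ready)
        finish[t] = start[t] + durations[t]
        proc_free[best] = finish[t]
        placed[t] = best
    return placed, start, finish

durations = {"A": 2, "B": 3, "C": 1, "D": 4}
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
placed, start, finish = list_schedule(durations, edges, n_procs=2)
print(placed, max(finish.values()))  # task placement and resulting makespan
```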
In the "cloud-edge-end" system, literature [6] employs forward Parameter
ranking as the criterion for list scheduling to optimize the Symbol Meaning Value
computation, communication costs, and latency of cloud and evolution Num Evolution generations 100
edge computing nodes. Considering both deadline and cost,
literature [7] employs methods such as lower bound estimation
to allocate deadlines and compute nodes to DAG subtasks. population Population size 10
Referring to the order of task deadlines, literature [8] allocates
virtual machines based on the earliest completion time of tasks m Number of ants 31
to optimize overall time performance.
Intelligent Algorithms
Pc Crossover probability 0.35
Distinct from heuristic rules, intelligent algorithms aim for
global optimization performance. Literature [8] utilizes Pm Maximum mutation probability 0.08
genetic algorithms to optimize task-edge node group
assignments. Probability is employed by literature [9] to
characterize the positional relationship between tasks. After A max Maximum pheromone factor 1.00
DAG pre-segmentation based on heuristic methods, literature
[9] utilizes bivariate correlation distribution estimation .max Maximum expected pheromone factor 2.00
algorithms to rank tasks and optimize overall application
completion time and edge node energy consumption. In
Y max Maximum pheromone evaporation 0.10
literature [10], Estimation of Distribution Algorithm (EDA) is
coefficient
employed to optimize total delay considering task deadline
information. For the task-node assignment problem, literature Q Maximum pheromone intensity 50.00
[10] applies particle swarm optimization algorithm to optimize
weighted objectives of cost and completion time.
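As an illustration of the particle-swarm approach to the task-node assignment problem, the sketch below searches a random-keys encoding for an assignment minimizing a weighted sum of makespan and cost; the workloads, node speeds, prices, weighting, and PSO coefficients are all assumptions chosen for brevity, not details from literature [10].

```python
import numpy as np

rng = np.random.default_rng(2)
n_tasks, n_nodes = 8, 3
work = rng.uniform(1, 5, n_tasks)        # task workloads
speed = np.array([1.0, 1.5, 2.0])        # node processing speeds
price = np.array([0.5, 1.0, 2.0])        # node cost per unit of work
alpha = 0.5                              # weight between time and cost

def objective(assign):
    time = max(work[assign == n].sum() / speed[n] for n in range(n_nodes))
    cost = (work * price[assign]).sum()
    return alpha * time + (1 - alpha) * cost

# Random-keys PSO: each particle holds a score per (task, node); the
# decoded assignment sends each task to its highest-scoring node.
n_particles, iters = 20, 100
pos = rng.random((n_particles, n_tasks, n_nodes))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = np.array([objective(p.argmax(axis=1)) for p in pos])
g = pbest_val.argmin()
gbest, gbest_val = pbest[g].copy(), pbest_val[g]

for _ in range(iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.4 * r1 * (pbest - pos) + 1.4 * r2 * (gbest - pos)
    pos += vel
    vals = np.array([objective(p.argmax(axis=1)) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    if pbest_val.min() < gbest_val:
        g = pbest_val.argmin()
        gbest, gbest_val = pbest[g].copy(), pbest_val[g]

print(gbest.argmax(axis=1), round(gbest_val, 3))  # best assignment and objective
```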
Overall, the above methodologies are logically aligned with the actual requirements of cloud computing resource allocation and task scheduling.

III. EXPERIMENT AND METHODOLOGY

The experimental design and result analysis in this paper focus on comparing the performance of the genetic ant colony algorithm (GAACO) across four key aspects: average time cost, average cost, algorithm service quality, and system resource load rate.

A. Experimental environment
The CloudSim 3.0.2 cloud simulation platform of the Grid Laboratory of the University of Melbourne was used to test the performance of the author's algorithm, with simulation comparisons and result analysis against the basic ant colony algorithm (ACO) [10] and the simulated annealing algorithm (SA). First, in the cloud simulation platform, the MyAllocationTest class is created to perform the initial configuration of the cloud environment: creating the data center, initializing the scale parameters of the cloud computing tasks, determining the task sizes and the sizes of the input and output data files, and creating the virtual machine resources. Each VM resource is characterized by its number of CPUs, memory size, bandwidth, and instruction processing speed. Subsequently, CloudSim objects are created to add cloud computing tasks, and the GAACO, ACO, and SA algorithms are implemented in the DatacenterBroker class. The relevant parameters of the genetic ant colony algorithm are presented in Table I.

TABLE I. GENETIC ANT COLONY ALGORITHM PARAMETERS

Symbol        | Meaning                                    | Value
evolutionNum  | Evolution generations                      | 100
population    | Population size                            | 10
m             | Number of ants                             | 31
Pc            | Crossover probability                      | 0.35
Pm            | Maximum mutation probability               | 0.08
α_max         | Maximum pheromone factor                   | 1.00
β_max         | Maximum expected (heuristic) factor        | 2.00
ρ_max         | Maximum pheromone evaporation coefficient  | 0.10
Q             | Maximum pheromone intensity                | 50.00
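For convenience, the Table I settings can be mirrored in a single configuration object; the sketch below is an illustrative Python rendering with our own field names, not code from the paper's CloudSim (Java) implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GAACOParams:
    evolution_num: int = 100   # evolution generations
    population: int = 10       # GA population size
    n_ants: int = 31           # number of ants (m)
    pc: float = 0.35           # crossover probability
    pm: float = 0.08           # maximum mutation probability
    alpha_max: float = 1.00    # maximum pheromone factor
    beta_max: float = 2.00     # maximum expected (heuristic) factor
    rho_max: float = 0.10      # maximum pheromone evaporation coefficient
    q: float = 50.00           # maximum pheromone intensity

print(GAACOParams())
```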
B. Experimental parameters
Initially, the number of tasks is set to 10, with cloud computing resources consisting of 10 VMs. Each VM has a storage size of 10 GB, a memory size of 256 MB, one CPU, and a bandwidth of 1,000 MB. The unit-time bandwidth cost and unit-time instruction cost are each 0.01 yuan/s. The number of tasks is then increased in increments of 10 across experiments.
The quality of service of each algorithm is represented by a multi-QoS index; the resource load rate is defined by equation (9).
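The body of equation (9) is not recoverable from the extracted text; however, the later observation that SA's perfectly even task spread yields a system load of 0 suggests a dispersion-style measure. The sketch below therefore implements one plausible reading, the standard deviation of per-VM utilization, purely as an assumption for illustration.

```python
import numpy as np

def resource_load_rate(vm_utilization):
    """One plausible reading of equation (9): dispersion of per-VM load.
    A perfectly even assignment gives 0, matching the paper's remark
    that SA's uniform task spread yields zero system load."""
    u = np.asarray(vm_utilization, dtype=float)
    return float(u.std())

print(resource_load_rate([0.5, 0.5, 0.5, 0.5]))  # 0.0 for a balanced system
print(resource_load_rate([0.9, 0.1, 0.7, 0.3]))  # > 0 when load is skewed
```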
The average time cost of each algorithm is calculated as the number of tasks increases in steps of 10, with the results depicted in Figure 3. The time cost of GAACO is superior to that of ACO, albeit longer than that of SA, and the gap widens as the number of tasks grows: GAACO reduces time by 50.9% compared with ACO and differs from SA by about 3%. Thus, the author's algorithm outperforms ACO in terms of time cost, with only a slight difference from SA.

The cost of each algorithm under different numbers of tasks is also shown in Figure 3, with the number of tasks increased in multiples of 10. The difference between the algorithms is small, and the average cost differs by only about 1%.

[Figure 3. Average time cost of each algorithm and cost of each algorithm]

The experimental results for the service quality of each algorithm are shown in Figure 4. The service quality of the author's algorithm and SA increases slowly with the number of tasks, while ACO exhibits a sharp, nearly linear rise. Service quality is a comprehensive index of cost, time, and reliability, and it can be seen that the comprehensive performance of GAACO is better than that of ACO and SA, with reductions of 14.4% and 76.8%, respectively.

[Figure 4. Service quality of each algorithm and system load of each algorithm]
The experimental results regarding the algorithm's system load
are presented in Figure 4. It's evident that the system load of
ACO has consistently remained high, whereas GAACO
exhibits a higher load compared to SA but is notably superior
to ACO. Specifically, GAACO achieves a 50.2% reduction in
average system load compared to ACO.
Furthermore, in conjunction with Figures 3 and 4, it's observed
that SA tends towards an evenly distributed task assignment to
virtual machines, resulting in a system load of 0 as per
equation (9). However, effective cloud computing task
scheduling necessitates a balance between cost, time, and
reliability, as reflected in QoS. Therefore, the author's
proposed algorithm better aligns with customer requirements
during the actual scheduling process.
C. Experimental conclusion
After thorough research on cloud computing task scheduling, the proposed algorithm has been rigorously compared with both the basic ant colony algorithm and the simulated annealing algorithm across four critical aspects. The comprehensive analysis of the results reveals that the proposed algorithm consistently outperforms the other two. Notably, it demonstrates superior capability in balancing factors including time cost, monetary cost, reliability, and system load; this balanced optimization ensures that the algorithm can effectively meet the multidimensional quality-of-service (QoS) requirements of users.
In essence, the experimental findings underscore the efficacy of the proposed algorithm in addressing the complex challenges inherent in cloud computing task scheduling. By surpassing traditional methods, it offers a more holistic and efficient approach to meeting the diverse needs of users while ensuring optimal resource utilization and performance. The proposed algorithm thus stands as a promising solution for enhancing cloud computing task scheduling in practical applications.

IV. REALIZING DYNAMIC ALLOCATION OF RESOURCES

Dynamic allocation of resources is a crucial aspect of resource management in cloud computing environments. To achieve it, several key components and processes need to be considered.

A. Resource Scheduling Algorithm
The resource scheduling algorithm serves as the foundation for dynamic resource allocation. It evaluates factors such as resource demand, type, and availability to select the optimal scheduling scheme. Common algorithms include load balancing, static allocation, and dynamic allocation; each addresses specific requirements and constraints to ensure efficient resource utilization.
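As a concrete instance of the load-balancing family named above, here is a minimal greedy least-loaded placement sketch; the VM capacities and request sizes are invented for illustration, and this is not the paper's GAACO scheduler.

```python
def least_loaded_placement(capacities, demands):
    """Greedy load balancing: send each request to the feasible VM with
    the lowest relative load; raises if no VM can fit a request."""
    loads = [0.0] * len(capacities)
    placement = []
    for d in demands:
        candidates = [i for i in range(len(capacities))
                      if loads[i] + d <= capacities[i]]
        if not candidates:
            raise RuntimeError(f"no VM can fit demand {d}")
        best = min(candidates, key=lambda i: loads[i] / capacities[i])
        loads[best] += d
        placement.append(best)
    return placement, loads

placement, loads = least_loaded_placement(
    capacities=[8.0, 8.0, 4.0], demands=[2.0, 3.0, 1.0, 2.5, 1.5])
print(placement, loads)  # requests spread across the least-loaded VMs
```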
B. Resource Monitoring
Resource monitoring is essential for detecting changes and anomalies in real time. Different types of resources must be continuously monitored to track usage patterns and identify potential issues. By monitoring resources, organizations can proactively address fluctuations in demand and adjust allocation strategies accordingly.

C. Resource Forecasting
Forecasting future resource demand and usage trends is critical for effective resource allocation. Historical data analysis and prediction techniques are used to anticipate future resource requirements. By accurately forecasting resource needs, organizations can allocate resources preemptively to prevent shortages and optimize utilization, minimizing waste and maximizing efficiency.
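A minimal forecasting sketch follows, assuming hourly utilization history and a seasonal-naive profile with a linear trend correction; the model choice and the synthetic data are our own, as the paper does not prescribe a specific predictor.

```python
import numpy as np

def forecast_next_day(history, period=24):
    """Seasonal-naive forecast with a linear trend correction: repeat
    the average daily profile and add the fitted hourly trend.
    `history` is a 1-D array of hourly utilization (multiple of period)."""
    history = np.asarray(history, dtype=float)
    days = history.reshape(-1, period)
    profile = days.mean(axis=0)                   # average daily shape
    t = np.arange(history.size)
    slope, intercept = np.polyfit(t, history, 1)  # long-run trend
    future_t = np.arange(history.size, history.size + period)
    return profile + slope * (future_t - t.mean())

# Synthetic three-day history: daily sine profile plus a mild upward trend.
t = np.arange(72)
history = 0.5 + 0.3 * np.sin(2 * np.pi * t / 24) + 0.002 * t
print(np.round(forecast_next_day(history), 3))
```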
D. Resource Management
Effective resource management is essential for dynamic resource allocation. Resources must be properly classified, labeled, and organized to enable efficient location and scheduling. Through robust resource management practices, organizations can streamline operations, improve resource utilization, and optimize performance.

By integrating these components into a cohesive framework, organizations can realize the full potential of dynamic resource allocation in cloud computing environments. This approach ensures that resources are allocated efficiently, adaptively, and in line with evolving demands and conditions, ultimately enhancing productivity and driving business success.

V. CONCLUSIONS

In conclusion, this study presents a comprehensive approach to the challenges of resource scheduling and management in cloud computing environments. By leveraging machine learning optimization techniques, particularly deep reinforcement learning, the proposed algorithm demonstrates significant improvements in system performance and efficiency. Extensive experimentation and analysis show that it outperforms traditional methods such as ant colony optimization and simulated annealing in terms of time cost, cost effectiveness, service quality, and system resource load, underscoring its potential to advance cloud computing resource management.

A. Synergy of Deep Learning and Cloud Computing Scheduling
The integration of deep learning techniques with cloud computing scheduling presents a promising avenue for more intelligent and adaptive resource allocation strategies. Deep reinforcement learning, as demonstrated in this study, enables dynamic allocation of resources based on the current system state, maximizing resource utilization efficiency and reducing user waiting times. By considering user needs and system constraints holistically, deep learning-based approaches can offer personalized and optimized solutions to complex scheduling problems in cloud environments.

B. Advantages of Deep Learning in Cloud Computing Scheduling
Deep learning brings several advantages to cloud computing scheduling. First, its ability to learn complex patterns and relationships from data enables it to adapt to diverse and evolving cloud environments. Second, deep learning models can handle high-dimensional and non-linear data, allowing for more accurate and nuanced decision-making. In addition, deep learning algorithms can continuously improve through experience, enhancing performance and scalability in cloud scheduling tasks. Overall, the integration of deep learning with cloud computing scheduling holds great promise for addressing the increasingly complex and dynamic nature of modern cloud environments.

C. Future Prospects of AI and Cloud Computing Integration
Looking ahead, the convergence of artificial intelligence (AI) and cloud computing is expected to drive significant advances across domains. AI-powered cloud services will offer more intelligent and autonomous capabilities, such as predictive resource provisioning, anomaly detection, and self-optimizing systems. Furthermore, AI-driven insights derived from vast amounts of cloud data will enable businesses to make better-informed decisions and gain competitive advantages. As AI technologies continue to evolve, their integration with cloud computing will usher in a new era of innovation and transformation across industries, paving the way for smarter, more efficient, and more resilient digital ecosystems.
REFERENCES

[1] X. Mo and J. Xu, "Energy-efficient federated edge learning with joint communication and computation design," Journal of Communications and Information Networks, vol. 6, no. 2, pp. 110-124, 2021.
[2] Q. Zeng, Y. Du, K. Huang, and K. K. Leung, "Energy-efficient resource management for federated edge learning with CPU-GPU heterogeneous computing," IEEE Transactions on Wireless Communications, vol. 20, no. 12, pp. 7947-7962, 2021.
[3] L. Yu, B. Liu, Q. Lin, X. Zhao, and C. Che, "Semantic similarity matching for patent documents using ensemble BERT-related model and novel text processing method," arXiv preprint arXiv:2401.06782, 2024.
[4] H. Hussain, S. U. R. Malik, A. Hameed, et al., "A survey on resource allocation in high performance distributed computing systems," Parallel Computing, vol. 39, no. 11, pp. 709-736, 2013.
[5] J. Bellendorf and Z. Á. Mann, "Classification of optimization problems in fog computing," Future Generation Computer Systems, vol. 107, pp. 158-176, 2020.
[6] B. Liu, X. Zhao, H. Hu, Q. Lin, and J. Huang, "Detection of esophageal cancer lesions based on CBAM Faster R-CNN," Journal of Theory and Practice of Engineering Science, vol. 3, no. 12, pp. 36-42, 2023. https://doi.org/10.53469/jtpes.2023.03(12).06
[7] A. Brogi, S. Forti, C. Guerrero, et al., "How to place your apps in the fog: state of the art and open challenges," Software: Practice and Experience, 2019.
[8] C. Wu, W. Li, L. Wang, et al., "Hybrid evolutionary scheduling for energy-efficient fog-enhanced Internet of Things," IEEE Transactions on Cloud Computing, 2018.
[9] L. Atzori, A. Iera, and G. Morabito, "The Internet of Things: a survey," Computer Networks, vol. 54, no. 15, pp. 2787-2805, 2010.
[10] F. Bonomi, R. Milito, P. Natarajan, et al., "Fog computing: a platform for Internet of Things and analytics," in N. Bessis and C. Dobre (eds.), Big Data and Internet of Things: A Roadmap for Smart Environments, Cham: Springer, 2014, vol. 546, pp. 169-186.
[11] B. Liu, et al., "Integration and performance analysis of artificial intelligence and computer vision based on deep learning algorithms," arXiv preprint arXiv:2312.12872, 2023.
[12] X. Zhang, J. Wang, L.-F. Lee, T. Yang, A. Kalra, G. Joshi, and C. Joe-Wong, "Machine learning on volatile instances: convergence, runtime, and cost trade-offs," IEEE/ACM Transactions on Networking, vol. 30, no. 1, pp. 215-228, 2022.
[13] Y. Ruan, X. Zhang, S.-C. Liang, and C. Joe-Wong, "Towards flexible device participation in federated learning for non-IID data," in International Conference on Artificial Intelligence and Statistics (AISTATS), 2021.
[14] Y. Ruan, X. Zhang, and C. Joe-Wong, "How valuable is your data? Optimizing device recruitment in federated learning," submitted to IEEE/ACM Transactions on Networking; preliminary results published in WiOpt 2021.
[15] B. Liu, et al., "Integration and performance analysis of artificial intelligence and computer vision based on deep learning algorithms," arXiv preprint arXiv:2312.12872, 2023.