1. Introduction
With the rapid growth of the Internet, demand for online services and data storage is driving the expansion of data centers, which has led to a massive increase in energy consumption and carbon emissions. Facing the dual pressures of energy and environmental concerns, China has introduced its dual-carbon target, and improving energy efficiency in the data center field is one of the keys to achieving it. Given that cooling systems are a major component of data centers and directly affect their energy consumption, the types of cooling systems, their optimization control strategies, and their evaluation metrics form the crucial links in the chain from design and operation to assessment. First, categorizing cooling systems provides a framework for understanding the design and functionality of each system. Next, exploring optimization control strategies helps to enhance energy efficiency. Finally, appropriate evaluation metrics allow system performance to be quantified and compared, providing data support for optimization and adjustment. A detailed assessment of these key aspects not only reveals trends in technological development but also offers a theoretical and practical basis for selecting or developing the most suitable cooling technologies. This comprehensive analysis aids energy efficiency management and promotes the fulfillment of environmental responsibilities, in line with the strategic goals of sustainable development.
Data center cooling is crucial for safeguarding equipment functionality and enhancing energy efficiency, primarily categorized into air and liquid cooling systems. Air cooling methods, including direct air cooling, indirect air cooling, and evaporative cooling [
1,
2], improve efficiency through optimized airflow management [
3,
4]. Liquid cooling methods, such as cold plate cooling, immersion cooling, and spray cooling [
5,
6], directly cool components with liquid, enhancing server efficiency, stability, and reducing energy consumption and noise [
7,
8,
9]. Although extensive research exists on cooling technologies [
10], there is a lack of focus on selecting the most suitable cooling system during the design phase and integrating these systems into specific environments [
11,
12]. This review compares the advantages, disadvantages, cost-effectiveness, and environmental adaptability of air and liquid cooling systems, emphasizing the key factors in system selection. It aims to assist decision-makers in choosing the most appropriate cooling system based on the external environment of their data centers. In addition, modern data centers are seeing an increase in rack density due to advancements in AI and other high-performance computing workloads. The average rack power density has grown from around 8.5 kW per rack in 2023 to an expected 12 kW per rack in 2024, highlighting the need for cooling systems capable of managing significantly higher heat loads [
13]. Cooling systems can account for approximately one-third of a data center’s total energy consumption, with liquid cooling technologies offering notable efficiency improvements. For example, direct-to-chip cooling systems have been shown to reduce facility power needs by up to 18%, contributing to an overall energy cost savings of around 10% [
14]. Furthermore, the Coefficient of Performance (COP) of cooling systems varies by technology and configuration, with modern liquid cooling systems achieving higher COPs compared to traditional air cooling. Immersion cooling systems, in particular, have been reported to reduce energy consumption by up to 94% in specific applications, reflecting their superior energy efficiency [
15].
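For reference, the two efficiency metrics cited above are simple ratios of measured quantities; a minimal sketch, with purely hypothetical facility figures that are not drawn from the cited studies:

```python
# PUE (Power Usage Effectiveness): total facility power / IT equipment power.
# COP (Coefficient of Performance): heat removed / electrical power used by cooling.
# All numbers below are hypothetical, for illustration only.

def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Lower is better; the ideal value is 1.0 (all power goes to IT)."""
    return total_facility_kw / it_load_kw

def cop(heat_removed_kw: float, cooling_power_kw: float) -> float:
    """Higher is better: kW of heat removed per kW of electricity consumed."""
    return heat_removed_kw / cooling_power_kw

facility = pue(total_facility_kw=1500.0, it_load_kw=1000.0)    # -> 1.5
chiller = cop(heat_removed_kw=1000.0, cooling_power_kw=250.0)  # -> 4.0
print(f"PUE = {facility:.2f}, COP = {chiller:.2f}")
```

These definitions make the comparison above concrete: a liquid-cooled facility that cuts cooling power while holding the IT load constant lowers its PUE and raises the COP of its cooling plant.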
The control strategies of cooling systems have a significant impact on the energy consumption and reliable operation of data centers. To optimize energy efficiency and ensure system stability, it is essential to classify and understand the different control strategies. Although proportional–integral–derivative (PID) control is simple and effective in practice, its limitations become apparent under the dynamic, strongly coupled thermal loads of data centers. Therefore, exploring and implementing more advanced optimization control strategies is crucial for enhancing system efficiency and adaptability. Model predictive control (MPC) predicts future states and adjusts controls accordingly to better adapt to changing conditions, while reinforcement learning (RL) optimizes control strategies through trial-and-error learning, improving system efficiency over time. In this context, Zhang et al. [
6] proposed optimization strategies based on MPC and RL. They evaluated the effectiveness of these methods in improving system efficiency and reliability, summarizing their applicable conditions. Du et al. [
16] analyzed the performance of PID, MPC, and RL in the dynamic thermal environment control of data centers. Recent technological advancements have led to the integration of multiple control strategies in the optimization process, rather than relying solely on a single method. Therefore, this paper aims to analyze and discuss the features of combined control strategies versus single control strategies. Additionally, it outlines the future development directions of each technology, providing better guidance for the future optimization control of data centers.
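To illustrate the baseline strategy discussed above, a discrete PID loop regulating supply air temperature might look as follows; the gains, setpoint, and sign convention are arbitrary placeholders for illustration, not tuned values from the cited work:

```python
# Minimal discrete PID controller for a cooling setpoint; purely illustrative.
class PID:
    def __init__(self, kp: float, ki: float, kd: float, setpoint: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement: float, dt: float) -> float:
        """Return a control signal (e.g., a fan or valve command) from one sample."""
        error = self.setpoint - measurement
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Hypothetical use: a 24 °C supply-air target; a negative signal here means
# the measured temperature is above the setpoint, i.e., more cooling is needed.
controller = PID(kp=2.0, ki=0.1, kd=0.5, setpoint=24.0)
signal = controller.update(measurement=26.5, dt=1.0)
```

The sketch also makes the paragraph's contrast tangible: this loop reacts only to the current error with fixed gains, whereas MPC anticipates future states and RL adapts its policy from experience, which is precisely what the combined strategies discussed above exploit.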
Given the high energy consumption of data centers, their energy efficiency assessment has attracted extensive attention from data center operators, policy makers, and researchers. Energy efficiency assessment is not only an effective tool for measuring the efficiency of energy use but also provides important guidance [
17,
18]. In the current literature, there are various ways to categorize energy efficiency indicators; for example, they can be categorized according to the granularity of the indicator [
19], or according to the type of indicator, such as energy efficiency, eco-design and safety [
20]. PUE has become the most mainstream energy efficiency assessment index in the industry, yet it has its limitations. The literature has seen numerous improvements to PUE, including the creation of refined energy efficiency metrics like ApPUE and AoPUE [
21], the development of new assessment tools such as PEMC [
22], the optimization of prediction models with artificial intelligence algorithms [
23], and the introduction of advanced sensing technologies for more accurate parameter measurement [
24]. However, there are few review-type papers that provide a comprehensive summary of the work on PUE improvement. Therefore, our work serves to illuminate potential directions for future research. Many studies [
9,
10,
11] have reviewed the classification, control strategy optimization, and evaluation metrics of cooling systems in data centers, but the utilization of bibliometrics in this field remains infrequent. Using bibliometrics quantitatively can provide a more comprehensive overview, which can guide future research by analyzing historical developments and current trends in the field, as has been proven in other fields [
25,
26,
27,
28]. Therefore, this review offers an overview of the following topics through bibliometric analysis using the CiteSpace-6.3.1 software. Combining this bibliometric software with a comprehensive review of the existing literature, we aim to address the following key issues related to the optimization control strategies and evaluation indicators of data center cooling systems:
- (1)
The specific classification of data center cooling systems, with a comprehensive description of liquid cooling and air cooling, including the specific categories included.
- (2)
Design optimization measures for data center cooling systems, elaborated in detail from the two aspects of air cooling and liquid cooling.
- (3)
Optimal control strategies for data center cooling systems, such as PID control, model predictive control, and reinforcement learning.
- (4)
The classification and current status of the energy consumption indicators of data center cooling systems, and a detailed description of the shortcomings of PUE and improvement measures.
These key issues first clarify the basis of the research, namely that the study is carried out with the support of bibliometric software, and are then addressed step by step. Starting from the classification of cooling systems gives readers a clear understanding of the basic composition of these systems and lays the foundation for the subsequent discussion of optimization measures and control strategies. Design optimization measures are then explored in depth, with detailed elaboration for the different cooling methods to keep the discussion targeted and actionable. Next, optimization control strategies are examined, and specific control methods are listed to provide technical support for the efficient operation of the system. Finally, attention turns to energy consumption indicators, in particular a detailed analysis of the shortcomings of PUE and measures for its improvement, with the aim of reducing energy consumption and improving system performance. Overall, this study provides new insights into the design, operation optimization, and performance evaluation of data center cooling systems, enabling researchers to quickly and thoroughly grasp the research content and significance of this field.
2. Data and Methods
2.1. Paper Search
To conduct a quantitative and visual analysis of the research content, it is essential to extract relevant papers from pertinent databases. The process of collecting papers is primarily divided into three phases: identifying the databases, performing keyword searches, and selecting the relevant papers. The databases selected for this study were Scopus, Web of Science, and IEEE. The selection was based on the comprehensive nature of these databases. Scopus offers an extensive collection of papers. Web of Science indexes various core journal articles, providing a platform for high-quality paper searches. IEEE focuses on electronic technology, making it crucial for research related to data centers and electronic technology. Consequently, it was necessary to include the IEEE database in the paper search process.
After determining the databases, we conducted literature retrieval and screening in Scopus, Web of Science, and IEEE. In alignment with our research focus, we searched for the topic “data center cooling systems”, using that phrase as the search keywords. The initially retrieved papers were then selected or excluded based on specific criteria.
- (1)
Papers published between 2004 and 2024.
- (2)
Research fields limited to engineering, civil engineering, and architecture. Papers within these fields were selected, while those outside these fields were excluded.
- (3)
Non-English papers were excluded.
- (4)
Published journal papers were selected, while non-journal papers were excluded.
The literature search was restricted to the period from 2004 to 2024, spanning nearly 20 years, for two reasons. First, much older research may have limited relevance to future scientific inquiries. Second, the past two decades represent a period of vigorous development in the global data center industry. During this period, data centers have evolved from computing centers, information centers, and cloud centers to power centers, gradually integrating new technologies and advancing towards greener and smarter directions. This period has been marked by continuous upgrading and development for data centers [
29].
After the initial screening, the selected papers were further screened and excluded based on the following criteria.
- (1)
To ensure the selected papers aligned more closely with our research topic, filters in the Scopus and IEEE databases were applied with the following keywords: “data center”, “cooling systems”, “energy-saving”, “energy consumption”, “optimization”, and “energy efficiency”. The same keywords were also used to filter results in Web of Science.
- (2)
Based on the titles and abstracts of the papers, we checked whether one or more of the above keywords were mentioned. Papers that mentioned these keywords were selected, while those that did not mention any of the keywords were excluded. Additionally, papers that included the keywords in the titles and abstracts but were unrelated in direction were also excluded.
- (3)
After re-screening the papers based on the above criteria, we performed a full-text review of the remaining papers. We selected the final papers based on the following criteria: whether the paper focused on data center cooling systems and whether it involved technologies, theories, or practical cases in this field. Papers meeting these criteria were retained, determining the final number of selected papers in each database. A summary of the paper search and screening process in a flowchart is shown in
Figure 1.
2.2. Database Analysis
Figure 2 illustrates the total number of studies selected from the Web of Science, Scopus, and IEEE databases. We observed that the total number of studies selected in Scopus was the highest, with 291 articles, accounting for 62%. The total number of studies selected in Web of Science was 121 articles, accounting for 26%. The total number of studies selected in IEEE was 55 articles, accounting for 12%.
IEEE excels in electrical, electronics, and computer engineering research and is highly regarded for its comprehensive collection of published papers. Data centers have gained significant attention, correlating strongly with research areas such as electronics and computer science. Utilizing the keyword “data center” in the IEEE database yielded 69,086 journal articles from the period of 2004 to 2024. This indicates that data centers have become a significant focus in IEEE journal publications. However, when searching for “data center cooling systems” in the IEEE database from 2004 to 2024, only 415 journal articles were found. This suggests that specialized research on data center cooling systems within IEEE is relatively scarce.
The cooling system is also a vital component of data centers, accounting for nearly 50% of total data center power consumption [
30].
For instance, regarding system control and energy efficiency evaluation, Bruno et al. [
30] analyzed data centers from the perspective of networked physical systems. They considered both network and physical factors, introducing a control model that couples a computer network representing network dynamics with a thermal network representing physical dynamics. This model was used to evaluate the energy efficiency and computational performance of different data center control strategies. Additionally, they introduced the cyber–physical index (CPI) to measure the distribution of network and physical effects in data centers. The study by Bruno et al. [
30] is thus notable for simultaneously considering physical effects alongside computational networks in data centers.
In the direction of optimal control, Cheong et al. [
31] proposed a comprehensive and multi-pronged approach to data center environmental control to prevent the formation of hotspots and cold spots around server cabinets. This approach aims to prevent overheating, premature aging, performance degradation, or condensation damage to server cabinets. The method includes Computational Fluid Dynamics (CFD) simulation-assisted predictive design and a complementary Internet of Things (IoT) responsive management system.
In achieving the goal of energy conservation, Berezovskaya et al. [
32] proposed a modeling toolkit that can construct data center models of any scale and configuration. The toolkit consists of a set of building blocks that can model various components of a data center. Using this toolkit, data centers can be modeled to estimate the energy consumption, temperature variation, and air temperature of all computing nodes and to evaluate the performance under different energy-saving strategies. The simulation data from the model, when compared with the actual data from corresponding real data centers, show similar trends.
Furthermore, in the direction of reducing energy consumption and saving energy, Ran et al. [
33] proposed a new event-driven control and optimization algorithm within the framework of deep reinforcement learning (DRL).
Ahmed et al. [
34] published a review paper in the IEEE database. They systematically reviewed and classified energy consumption models for the major load segments of data center components, such as IT, Internal Power Conditioning Systems (IPCS), and cooling loads, revealing the strengths and weaknesses of various models across different applications. The paper's most innovative contribution is that it provides the first systematic review of the current research status of reliability indicators, reliability models, and reliability methods. However, it does not focus on data center cooling systems themselves: it offers only a brief overview of existing energy consumption models for the cooling part and does not review the classification, performance optimization strategies, or evaluation indicators of the cooling systems. A review specifically targeting cooling systems could therefore significantly enrich the research content in the data center domain of the IEEE database.
IEEE Access is a multidisciplinary, open-access journal focused on electronic research. Its scope is broad, encompassing many areas that intersect with and relate to data centers, and it already publishes papers in the data center domain. Therefore, articles on data center cooling systems are both relevant and desirable for this journal.
2.3. Analysis of Annual Publication Trends
We observed the publication years of the selected papers and listed the number of papers selected from each database annually from 2004 to 2024, spanning nearly 20 years, as shown in
Figure 3. Papers in this field have appeared in the Scopus database consistently from 2004 to 2024, with rapid growth in publication numbers after 2015. The peak came in 2023, with 50 papers, accounting for 17% of the selected papers in Scopus. In the Web of Science database, relevant papers emerged around 2008; the peak was also in 2023, with 18 articles, accounting for 15% of the selected papers in Web of Science. The IEEE database has relatively low annual publication numbers, with relevant papers appearing only after 2011; its peak was in 2021, with 9 articles, accounting for 16% of the selected papers in IEEE.
The analysis of annual publication trends and the obtained results are indeed consistent with the development trajectory of data centers. In the early 1990s, particularly in the USA, the number of data centers was minimal, primarily used for government and research applications, with limited commercial use. During the period from 2001 to 2006, the rapid growth of the Internet resulted in a substantial rise in the number of websites. It was also during this time that data centers gained widespread recognition. From 2006 to 2012, data centers globally entered the stage of information centers, characterized by a slowdown in the growth rate of the data center market. This phase also witnessed the introduction of cloud computing technology into data centers. From 2012 to 2019, the data center industry transitioned into the era of cloud-centric data centers. During this period, countries worldwide imposed energy-saving requirements on data centers. For example, in 2016, the United States Office of Management and Budget (OMB) announced the “Data Center Optimization Initiative (DCOI)”, requiring U.S. government agencies to monitor and measure metrics such as data center energy consumption, PUE targets, virtualization, server utilization rates, and equipment utilization rates. Since 2019, due to the accelerated development of various new digital technologies, the growth rate of data centers has experienced a counter-cyclical increase. During this phase, data centers began to evolve towards green and intelligent directions [
29]. The global demand for energy efficiency in data centers has increased since 2012, and data centers have transitioned towards green and intelligent directions since 2019. The number of papers on data center cooling systems regarding classification, optimization strategies, and energy efficiency evaluation has been increasing in various paper databases.
From the publication years of the papers, it can be observed that research on data center cooling systems, as well as related areas such as natural cooling, optimization methods, and energy consumption metrics evaluation, has received increasing attention in recent years. Studying and summarizing these areas will play a crucial guiding role in the design and optimization of future data center cooling systems.
2.4. Keyword Analysis
“CiteSpace-6.3.1” is a free Java application used for visualizing and analyzing trends and patterns in the scientific literature [
35,
36]. CiteSpace-6.3.1 is a visual bibliometric software designed for creating scientific knowledge maps. It can generate knowledge maps of relevant fields, offering a comprehensive view of a specific knowledge domain. Through diversified and dynamic network analysis, CiteSpace-6.3.1 identifies critical literature, hot research topics, and emerging trends within a scientific field. It can perform co-occurrence, clustering, and burst analysis of keywords, with keyword co-occurrence and burst analysis being valuable indicators for assessing the future development trends of a research area. Additionally, it provides visual analyses of countries, institutions, and more. These features give CiteSpace-6.3.1 unique advantages compared to other visual bibliometric software. Apart from CiteSpace-6.3.1, VOSviewer is another commonly used visual bibliometric software. Although VOSviewer offers a simpler interface and charts, it lacks some of the unique features and more comprehensive, diversified visual analyses that CiteSpace-6.3.1 provides [
37]. Using CiteSpace-6.3.1 to conduct keyword co-occurrence analysis on the selected papers from the Scopus and Web of Science databases, we generated keyword co-occurrence graphs as shown in
Figure 4 and
Figure 5.
Scopus adopts the time interval from 2005 to 2024 for the keyword co-occurrence graph, while Web of Science adopts the time interval from 2008 to 2024 for the keyword co-occurrence graph. The high-frequency keywords extracted from the selected papers in Scopus and Web of Science using CiteSpace-6.3.1 are presented in
Table 1 and
Table 2, respectively.
Based on the high-frequency keywords and keyword co-occurrence graphs from both databases, “data center” and “cooling system” emerge as the most frequent keywords. This indicates that the focus of our research is indeed the cooling systems of data centers. In the keyword co-occurrence graphs, we can also observe numerous co-occurrence relationships between keywords related to “data center” and “cooling” with other keywords. This demonstrates that these two keywords are indeed core keywords. Additionally, “energy efficiency” has a high frequency of occurrence in both databases, indicating that the energy efficiency of data center cooling systems is also a key focus of research. In the Scopus database, “energy utilization” and “energy conservation” are also high-frequency keywords, indicating an increasing focus on the importance of optimization strategies for data center cooling systems. Optimizing data center cooling systems can significantly increase their energy-saving potential. The comparison between
Figure 4a–c shows that after 2011, there is a greater variety of keywords appearing in the keyword co-occurrence graph. This indicates that research on performance, energy efficiency, and optimization has become increasingly in-depth since 2011. In the Web of Science database, “performance” and “free cooling” are both high-frequency keywords. This indicates that natural cooling is widely utilized in data centers, and the performance of various cooling systems is a key focus in research on data center cooling. The thickness of the warm-colored rings around the high-frequency keywords in the co-occurrence graph is also significant, further confirming that these high-frequency keywords are indeed the current hotspots of attention in data center cooling systems.
Using CiteSpace-6.3.1, we generated a temporal clustering graph of keywords from the database with the most selected articles, Scopus, for the years 2020 to 2024, as shown in
Figure 6. Among them, “energy efficiency” and “indirect evaporative cooling” are two clusters within the keyword clustering. In the direction of energy efficiency, relevant keywords continue to appear from 2020 to 2024. Data center cooling systems urgently need energy-saving measures, and adopting optimization strategies can alleviate the energy consumption of the cooling systems. Indirect evaporative cooling is one of the technologies for cooling systems and falls under the broader classification of cooling systems. Additionally, the COP is also one of the clusters. However, there has been little attention paid to COP in recent years. COP is a performance indicator used to evaluate the refrigeration capacity of data center cooling systems. Discussing the advantages and disadvantages of various energy efficiency evaluation indicators and proposing improvements to these indicators is indeed a valuable research direction.
2.5. Publication Journal Analysis
Figure 7 illustrates the annual publication volumes of selected papers from Scopus, Web of Science, and IEEE, categorized by the journals with the highest publication counts.
Figure 7a is a bubble chart of the journal publication years from the Scopus database,
Figure 7b is a bubble chart of the journal publication years from the Web of Science database, and
Figure 7c is a bubble chart of the journal publication years from the IEEE database. In the bubble chart, the size of the bubble represents the volume of publications in that year. The larger the bubble, the greater the number of publications. Each bubble chart also includes a legend, where the specific number of publications represented by bubbles of different sizes can be seen.
In the Scopus database, the journals with the highest publication volumes are Applied Thermal Engineering, Energy and Buildings, Energy, and Applied Energy. Applied Thermal Engineering had the highest publication volume in 2023, with ten related papers. Energy and Buildings had the most publications in 2014, with six related papers. Energy and Applied Energy had the highest publication volumes in 2023 and 2022, with eight and seven related papers, respectively, indicating a significant number of publications in these journals in recent years. Overall, journals with high publication volumes have consistently published related papers since 2014, indicating that research in this area has been continuously active. This underscores the ongoing necessity to study the performance, optimization, energy-saving measures, and energy consumption evaluation of data center cooling systems.
In the Web of Science database, Energy and Buildings and Applied Thermal Engineering have significant publication volumes. Applied Thermal Engineering had the highest number of publications in this direction in 2023, with six papers, accounting for 23% of the selected papers from this journal. Energy and Buildings had the highest numbers of publications in 2014, 2020, and 2023, with six papers each.
Among the selected papers, IEEE has only three journals with higher publication volumes: IEEE Transactions on Components, Packaging and Manufacturing Technology, IEEE Access, and IEEE Transactions on Sustainable Computing. IEEE Transactions on Components, Packaging and Manufacturing Technology and IEEE Transactions on Sustainable Computing each published the most papers in 2017, with three papers apiece, while IEEE Access peaked in 2019 and 2021, also with three papers each. In recent years, the IEEE database has not had a high volume of publications in this direction; publishing relevant research in IEEE journals can therefore effectively fill this gap in the field.
2.6. Analysis of Publication Countries
Here we present the country-wise publications from the Scopus and Web of Science databases, along with the country co-occurrence maps generated using CiteSpace-6.3.1. These are represented in
Figure 8 and
Figure 9, respectively.
Figure 8a,b depict the scatter plots of high-publishing countries from the two databases, while
Figure 9a,b represent the country co-occurrence maps for each database. In both the Scopus and Web of Science databases, the two countries with the highest number of publications are China and the United States, with China’s publication count far exceeding that of the United States. According to the country co-occurrence map generated by CiteSpace-6.3.1, the outermost circle of China’s color ring represents earlier years, but the remaining warm-colored rings are also quite thick, indicating that this field remains a hot research topic even in recent years. The results indicate that China has been the most prolific in publishing in this field, consistently showing a sustained interest in scientific research in this area.
To explore whether there is a relationship between countries with high publication volumes and their top publishing institutions, a co-occurrence map of publishing institutions, as shown in
Figure 10, was created. In the Scopus co-occurrence map, the countries where the institutions are located are not clearly identifiable. However, in the Web of Science co-occurrence map, it is evident that institutions such as Hunan University, Northeastern University, and the Chinese Academy of Sciences are leading publishing institutions in this research field. This helps to explain why both the Scopus and Web of Science databases show that China has the highest publication volume.
2.7. Discussion on the Citation of Papers
Due to a large number of selected papers in the Scopus database, especially during the period from 2020 to 2024, a citation network map for the Scopus database from 2020 to 2024 was generated using CiteSpace-6.3.1, as shown in
Figure 11. According to the citation network map, the papers are clustered into eight main directions: (1) Immersion Cooling, (2) Thermal Performance, (3) Thermal Environment, (4) Deflector, (5) Indirect Evaporative Cooling, (6) Water-Cooled Heat Exchanger, (7) Optimization, (8) Simulation.
Among these eight main directions, immersion cooling and indirect evaporative cooling can be considered classifications of data center cooling systems. The features of different cooling systems vary, and their applicability differs as well. Therefore, it is necessary to summarize and discuss the positive aspects of each system. The optimization direction also continues to be researched. In response to the energy-saving trend, optimizing the various categories of cooling systems is undoubtedly of paramount importance. Simulation is likewise a method for developing optimization strategies for data center cooling systems. Relevant research should focus on improving the energy-saving capabilities of data center cooling systems through simulation and discussion.
2.8. Discussion of Research Directions in Papers
The research direction of our study is data center cooling systems. Each cooling approach has its own advantages, so it is important to understand the current classification of data center cooling systems and, during the design phase, to select the appropriate cooling system based on the specific requirements and conditions. In the aforementioned
Section 2.4 on keyword analysis, it can be observed that energy efficiency and optimization are high-frequency keywords commonly appearing in the literature. It can therefore be inferred that these keywords represent the issues of greatest concern to researchers.
Keyword burst analysis was then conducted for the Scopus and Web of Science databases using the “burstness” feature in CiteSpace-6.3.1. Since the Scopus database has the highest number of studies in recent years, the time interval from 2020 to 2024 was selected for its keyword burst analysis. The keyword burstness graph is shown in the following
Figure 12.
Figure 12a,b, respectively, show the keyword burstness graphs for Scopus from 2020 to 2024 and for Web of Science. The keyword burstness graphs display the top ten keywords with the highest burstness. The red color in the keyword burstness graph indicates the time periods during which the keyword had a high frequency of occurrence, while the blue color represents the periods of lower frequency. The lighter the blue, the lower the frequency of the keyword during that time period. In the keyword burstness results from the Scopus database, it is evident that “data center” remains a highly discussed topic. At the same time, “airflow management” experienced burstness during the period of 2020–2021. This term is related to optimization strategies for air-cooled cooling systems. Hence, it suggests that there has been an increasing focus on optimizing strategies for different types of cooling systems in data centers in recent years. The keyword “cooling perform” experienced significant burstness during 2020–2022, indicating a growing emphasis on improving the performance of cooling systems, including efficiency and energy-saving aspects. The keyword “optimization” experienced burstness during 2021–2022 in the Web of Science database, indicating a recent surge in research focusing on optimizing systems to improve efficiency and save energy.
Based on the analysis of keyword co-occurrence and burstness, and considering the complete process of data center cooling systems from design and operation to evaluation, we will delve into four research directions in the following sections: the selection of current data center cooling systems at the design stage, optimization strategies for different types of data centers, optimization control strategies that can achieve energy savings, and evaluation metrics for energy efficiency. A summary of these four major directions of research on data center cooling systems and their significance, which will be elaborated on in the subsequent sections, is provided in Table 3.
According to the keyword burstness graph, it is evident that waste heat recovery, thermal environment, and thermal management have emerged as keywords during the period from 2021 to 2024. This indicates a potential future research trend in data center cooling systems. In future research, there may be a focus on studying the thermal environment and the better utilization of waste heat, among other directions.
3. Classification of Data Center Cooling Systems
As a core component of IT infrastructure, data centers are rapidly expanding in both number and scale [
7]. Data centers are known for their high energy density and continuous operation, running up to 8760 h a year, which results in an exceptionally high energy consumption [
38]. Globally, data centers account for 1.3% of total electricity consumption [
39]. This underscores the urgent need for high-efficiency solutions driven by the rapid growth of the data center market.
To ensure the stable operation of IT equipment, data centers rely on air-conditioning systems to provide continuous cooling throughout the year, preventing room temperatures from exceeding the maximum allowable limits for the equipment.
As shown in
Figure 13, the energy consumption of air-conditioning systems is the largest, apart from the power consumption of the IT equipment itself, contributing about 40% of the total energy consumption [
40]. Therefore, optimizing the energy efficiency of air-conditioning systems has become a crucial measure.
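The roughly 40% cooling share translates directly into power usage effectiveness (PUE), the ratio of total facility power to IT power. A minimal sketch with illustrative numbers (not measured data from any cited facility):

```python
def pue(it_kw: float, cooling_kw: float, other_kw: float = 0.0) -> float:
    """Power usage effectiveness: total facility power / IT power."""
    return (it_kw + cooling_kw + other_kw) / it_kw

# Illustrative load split: 1000 kW of IT load with cooling drawing
# roughly 40% of the *total* facility power (800 / 2000 kW).
it = 1000.0
cooling = 800.0
other = 200.0
print(round(pue(it, cooling, other), 2))  # → 2.0
```

Lowering the cooling term is the main lever for moving PUE toward its ideal value of 1.0.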
Recently, numerous researchers have conducted in-depth studies on data center cooling technologies. Zhang et al. [
40] reviewed the development of natural cooling technologies in data centers from the aspects of configuration characteristics and performance, providing a detailed analysis of the performance characteristics of air-side natural cooling, water-side natural cooling, and thermosyphon cooling, and they also summarized the performance standards for evaluating the effectiveness of natural cooling in data centers.
Ebrahimi et al. [
39] focused on cooling technologies and their operating conditions in data centers, exploring the possibility of utilizing waste heat from data centers. Additionally, they assessed the feasibility and effectiveness of implementing low-grade waste heat recovery technologies in combination with energy-saving cooling technologies.
Nadjahi et al. [
7] provided an overview of potential cooling technologies for data centers from an energy-saving perspective in
Table 4.
Based on these studies, data center cooling technologies can be clearly classified into two main categories: liquid cooling and air cooling. Liquid cooling technologies use liquids to directly or indirectly absorb and dissipate heat, providing high-efficiency thermal management solutions, especially suitable for high-heat-density environments. In contrast, air cooling technologies achieve cooling through air circulation, suitable for scenarios with a lower heat density and requiring a lower initial investment. The advantages and applicable conditions of these two technologies are important factors to consider when designing.
3.1. Liquid Cooling Technology
Liquid cooling technology utilizes the high heat capacity and thermal conductivity of liquids to dissipate the heat, thereby maintaining the equipment within a safe operating temperature range [
1,
2], as shown in
Figure 14. Liquid cooling technology operates by circulating a coolant through a closed-loop system to efficiently manage the heat generated by data center equipment. The process begins with the cooling water system, where cooling towers dissipate heat to the external environment. The cooled water is then pumped into the cooling distribution unit (CDU), which acts as a central hub, distributing the coolant to various cooling systems connected directly to the equipment. Within the information equipment room, the coolant flows through specialized cooling systems attached to the server cabinets. These systems absorb the heat produced by the servers, maintaining optimal operating temperatures. The heated coolant is then returned to the CDU, where it is recirculated back to the cooling towers, completing the cooling cycle.
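The closed loop described above is governed by a simple sensible-heat balance, Q = ṁ·cp·ΔT. As a hedged illustration, the required coolant mass flow for a rack can be sized as follows (the rack load, coolant properties, and temperature rise are assumed values, not taken from any cited system):

```python
def coolant_flow_kg_s(heat_kw: float, cp_kj_kg_k: float, delta_t_k: float) -> float:
    """Mass flow needed to carry `heat_kw` with a coolant temperature rise
    of `delta_t_k`: Q = m_dot * cp * dT  =>  m_dot = Q / (cp * dT)."""
    return heat_kw / (cp_kj_kg_k * delta_t_k)

# Illustrative: a 30 kW rack, water coolant (cp ≈ 4.18 kJ/kg·K), 10 K rise.
m_dot = coolant_flow_kg_s(30.0, 4.18, 10.0)
print(round(m_dot, 3))  # → 0.718 kg/s
```

The same balance explains why liquids outperform air: with a cp roughly four times that of air and a density nearly a thousand times higher, the required volume flow is orders of magnitude smaller.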
Indirect Liquid Cooling: The heat source does not come into direct contact with the coolant. Instead, heat is transferred through cooling devices such as cold plates, with the coolant flowing through enclosed pipes to absorb the heat [
45,
46]. This method offers a high cooling efficiency and precise temperature control, making it suitable for high heat load scenarios. However, its drawbacks include high costs, system complexity, maintenance challenges, and the risk of leakage. Direct liquid cooling involves direct contact between the coolant and the heat-generating components and includes immersion and spray cooling systems. Immersion cooling submerges the equipment entirely in a non-conductive liquid and is further divided into single-phase and two-phase systems. Single-phase systems maintain the coolant in a liquid state, while two-phase systems utilize phase change from liquid to gas to release heat efficiently [
47]. Immersion cooling boasts a high cooling efficiency, near-silent operation, and space savings, but it comes with a high initial investment and maintenance costs, as well as complex hardware replacement procedures [
48].
Performance and Recommendations: Research by Hnayno et al. [
49] indicates that single-phase immersion cooling can reduce server energy consumption by at least 20% compared to air cooling systems, and by 7% compared to other liquid cooling systems. Greenberg et al. [
50] have highlighted that adopting liquid cooling systems can significantly reduce the total cooling energy demand. Implementing liquid cooling at the server or rack level, combined with ambient free-air cooling, can reduce or even eliminate the reliance on CRAC (computer room air-conditioning) units and chillers, resulting in substantial energy savings.
3.1.1. Cold Plate Liquid Cooling
Chip-level liquid cooling technology is an indirect method that achieves heat dissipation by installing cold plates on high-heat-output components of servers, such as CPUs and GPUs. A notable feature of direct-to-chip cooling is its ability to use warm water as the coolant, making it environmentally friendly [
51]. This technology can provide waste heat in the form of water at temperatures around 45 °C or even higher [
52].
Commercial liquid cooling products primarily offer waste heat recovery channels on the primary side. However, it is noteworthy that waste heat can be recovered from both the primary and secondary sides of the CDU [
53].
To address this research gap, Lu et al. [
54] conducted an innovative study using a liquid-cooled rack in an office building as a “data furnace” to supply heat to the return pipeline on the secondary side of the space heating network. This approach eliminates the need for heat pumps, thereby reducing district heating demand and investment costs [
54]. In conclusion, chip-level liquid cooling technology offers a promising solution for efficient heat dissipation and waste heat recovery in data centers. The ability to use warm water as a coolant and recover waste heat at useful temperatures presents significant environmental and economic benefits. Additionally, exploring innovative applications, such as using liquid-cooled racks as “data furnaces”, can further enhance the sustainability and cost-effectiveness of heating solutions in buildings.
3.1.2. Immersion Liquid Cooling
Immersion liquid cooling systems achieve effective heat exchange by directly immersing heat-generating electronic equipment in coolant, with the coolant circulating to remove heat. During use, the IT equipment is fully immersed in non-conductive secondary-side coolants, including mineral oil, silicone oil, or fluorinated liquids. Immersion liquid cooling technology is further divided into single-phase and two-phase immersion cooling based on whether a phase change occurs in the coolant during heat exchange.
- (1)
Single-Phase Immersion Cooling
In single-phase liquid cooling systems, the coolant undergoes a temperature change during heat transfer without a phase change. The cooling distribution unit (CDU) circulates low-temperature coolant through the IT equipment, absorbing heat, and then returns the heated coolant to the CDU. Inside the CDU, heat is transferred to the primary-side coolant and then released into the atmosphere to complete the cooling cycle [
55]. The performance of single-phase liquid cooling significantly depends on various design parameters of the cold plate, such as porous media, microchannel heat sinks, and the heat sink pressure [
47]. These parameters enhance the efficiency of heat transfer, making liquid cooling systems more effective than traditional forced air cooling solutions. For instance, Parida et al. [
56] conducted a comparative study using traditional forced air cooling and liquid cold plate solutions to explore the cooling capacity of the server racks. Additionally, Eiland et al. [
57] explored the heat transfer performance of servers immersed in mineral oil, a type of single-phase immersion cooling system. Their findings indicated that the immersion cooling system reduced the thermal resistance by 34.4% compared to traditional air-cooled solutions, achieving a power usage effectiveness (PUE) as low as 1.0.
These studies illustrate the superior efficiency of liquid cooling systems, emphasizing how various design parameters contribute to their enhanced performance compared to air-cooled systems. By optimizing factors like porous media, microchannel heat sinks, and the heat sink pressure, single-phase liquid cooling not only improves thermal management but also significantly reduces energy consumption. Despite the advantages of liquid cooling, several challenges remain. The high initial investment and maintenance costs, the complexity of system integration, and the potential risk of leaks are significant barriers to widespread adoption. Future research should focus on the development of more cost-effective materials and designs for cold plates and heat exchangers, as well as the optimization of coolant formulations to improve the thermal performance and reduce the environmental impact [
45,
46,
47,
48,
49,
55,
56,
57].
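The 34.4% reduction in thermal resistance reported by Eiland et al. can be made concrete with the usual definition R = ΔT/Q between the component and the coolant. The temperatures and power below are hypothetical, chosen only to illustrate the calculation:

```python
def thermal_resistance_k_per_w(t_component_c: float, t_coolant_c: float,
                               power_w: float) -> float:
    """Component-to-coolant thermal resistance: R = dT / Q."""
    return (t_component_c - t_coolant_c) / power_w

# Hypothetical air-cooled CPU: 80 °C case, 25 °C inlet air, 200 W load.
r_air = thermal_resistance_k_per_w(80.0, 25.0, 200.0)
# A 34.4% lower resistance, as reported for mineral-oil immersion.
r_oil = r_air * (1 - 0.344)
print(round(r_air, 3), round(r_oil, 3))  # → 0.275 0.18
```

A lower resistance means the same load can be carried at a smaller temperature difference, allowing warmer (cheaper) coolant.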
- (2)
Two-Phase Immersion Cooling
Two-phase cooling involves a phase change in the circulating coolant during heat transfer, with the liquid absorbing latent heat as it evaporates. This reduces the required coolant flow rate and produces a more uniform temperature distribution [
6]. The key to improving efficiency lies in designing efficient porous media and microchannel heat sinks. However, two-phase cooling methods still face the challenge of flow instability, which may cause surface overheating and potential damage [
58].
To compare the performance of single-phase immersion cooling (SPIC) systems and two-phase immersion liquid cooling systems, Kanbur et al. [
59] conducted experimental studies on data center servers and found that the Coefficient of Performance (COP) of two-phase immersion cooling systems was 72.2–79.3% higher than that of SPIC systems. Despite the superior thermal characteristics of two-phase immersion cooling systems, they are relatively inferior in terms of the investment cost, safety, and maintainability [
60]. Therefore, SPIC systems are more suitable for large-scale and commercial applications than two-phase immersion liquid cooling systems. In summary, both single-phase and two-phase liquid cooling systems offer significant advantages over traditional air cooling methods. Single-phase systems benefit from stable operation and a lower energy consumption, while two-phase systems provide a higher thermal efficiency but come with challenges in cost and stability. Future research should focus on overcoming the flow instability in two-phase systems and reducing the associated costs to make them more viable for large-scale applications. Additionally, advancements in cold plate design and immersion cooling technologies will further enhance the cooling efficiency and energy savings in data centers, contributing to more sustainable and high-performance computing environments.
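The COP comparison by Kanbur et al. follows from COP = Q_cooling / W_input. A sketch with purely illustrative power draws (only the 72.2% lower-bound relative gain comes from the cited study):

```python
def cop(cooling_kw: float, input_kw: float) -> float:
    """Coefficient of performance: heat removed per unit of work input."""
    return cooling_kw / input_kw

# Illustrative numbers only: a 100 kW heat load, hypothetical power draw
# for a single-phase immersion cooling (SPIC) system.
cop_spic = cop(100.0, 10.0)
# Lower bound of the reported 72.2–79.3% improvement for two-phase cooling.
cop_two_phase = cop_spic * 1.722
print(cop_spic, round(cop_two_phase, 1))  # → 10.0 17.2
```

A higher COP means the same heat load is removed with proportionally less pump and fan power, which is where the two-phase advantage appears on the energy bill.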
3.1.3. Spray Liquid Cooling
Unlike the previous two systems, spray liquid cooling technology directly sprays coolant onto electronic equipment through specially designed nozzles to achieve efficient heat exchange [
61]. During spraying, the coolant directly contacts the surface of electronic equipment or connected heat-conducting materials. The heated coolant is then recovered through the system’s return pipeline and sent back to the CDU for recooling [
62]. This system typically includes a cooling tower, CDU, liquid cooling pipeline, and spray liquid cooling cabinet, which integrates key components such as the pipeline system, liquid distribution system, spray module, and liquid return system, ensuring efficient and precise cooling throughout the process. Spray cooling has promising potential in various fields including aerospace, biomedicine, and battery safety. Continuous advancements are expected to enhance its efficiency and applicability, overcoming existing technical challenges and expanding its usage in more advanced and compact electronic devices.
3.2. Air Cooling Technology
Air cooling methods use fans to cool the refrigerant in the condenser and directly release the heat into the air, as shown in
Figure 15. Compared to water-cooled chiller systems, this method does not require the installation of cooling towers, cooling water pumps, and piping equipment, thus ensuring normal cooling operation in water-scarce environments. Air-cooled chiller systems are simple, reliable, and easy to maintain, making them widely used in medium and large data centers.
3.2.1. Direct Air Cooling
Direct air cooling is straightforward and cost-effective, particularly suited for environments where the ambient air quality and temperature are within acceptable limits for IT equipment operation [
38]. However, this method has inherent limitations. Its efficiency significantly depends on ambient air conditions, making it less effective in hot or polluted environments. Furthermore, the reliance on high-speed fans can introduce significant noise levels, which can be problematic in densely populated data centers. The system’s cooling capacity is also limited by the heat dissipation capabilities of air compared to liquid cooling solutions, making it less suitable for extremely high-density configurations or high-performance computing (HPC) applications where heat loads are substantial [
63,
64]. Future research should aim at developing hybrid cooling systems that combine the simplicity and cost-effectiveness of direct air cooling with other cooling technologies to enhance overall performance. For example, integrating direct air cooling with liquid cooling systems could provide a more robust solution for managing diverse heat loads across different data center environments.
3.2.2. Indirect Air Cooling
Indirect air cooling technology involves transferring heat from one medium to another through a heat exchanger, typically transferring heat from the hot equipment to water or coolant, which then dissipates heat through the air [
65]. Indirect air cooling systems are widely used in data centers, where heat exchangers transfer heat from servers to the liquid in cooling pipes. These systems can integrate with building HVAC systems to effectively manage the thermal environment of large-scale data centers [
5]. In industrial applications, indirect air cooling technology is often used in environments with strict cooling requirements, such as chemical plants or pharmaceutical facilities, maintaining stable operating temperatures to prevent overheating and associated safety risks. The complexity of heat exchange mechanisms and integration with HVAC systems can lead to high operational costs. Regular maintenance is required to prevent leaks and ensure efficient operation. Precise control of coolant temperatures is crucial to maintain efficiency and prevent overheating.
3.2.3. Evaporative Cooling
Evaporative cooling uses the principle of heat absorption during water evaporation to cool air. It achieves more efficient thermal management by humidifying and lowering the air temperature. In green buildings, evaporative cooling systems are designed to utilize natural evaporation and air cooling synergy, reducing traditional air-conditioning energy consumption and achieving environmentally friendly temperature control [
39]. By using this technology in hot environments, the overall system energy efficiency and output performance can be significantly enhanced. The efficiency of evaporative cooling is highly dependent on ambient air humidity levels; in very humid environments, its effectiveness diminishes significantly. Additionally, the system requires a continuous supply of water, which can be a limitation in areas with water scarcity. Regular maintenance is necessary to prevent mold and bacteria growth in the system, which can affect air quality and system efficiency. Managing the balance between humidity and temperature control can also be complex, requiring advanced control systems to optimize performance without compromising indoor air quality [
5,
39].
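The humidity dependence noted above is commonly expressed through wet-bulb effectiveness: a direct evaporative cooler can at best approach the ambient wet-bulb temperature. A minimal sketch (the 0.85 effectiveness is an assumed, typical value, not from the cited references):

```python
def supply_air_temp_c(t_dry_bulb_c: float, t_wet_bulb_c: float,
                      effectiveness: float) -> float:
    """Direct evaporative cooler outlet temperature:
    T_out = T_db - eff * (T_db - T_wb), with eff in (0, 1)."""
    return t_dry_bulb_c - effectiveness * (t_dry_bulb_c - t_wet_bulb_c)

# Dry climate: large wet-bulb depression, strong cooling.
print(round(supply_air_temp_c(35.0, 20.0, 0.85), 2))  # → 22.25
# Humid climate: small depression, little cooling — the limitation noted above.
print(round(supply_air_temp_c(30.0, 28.0, 0.85), 2))  # → 28.3
```

The two cases show why the same unit that delivers data-center-grade supply air in an arid region becomes nearly useless in a humid one.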
3.3. Key Factors in Data Center Cooling Systems
To maintain uniform airflow distribution and avoid the mixing of hot and cold air, researchers shape the airflow organization by adjusting the height of raised floors, the openness of perforated tiles [
66], and the deployment of obstacles in the plenum ventilation system [
67]. The optimal values for these adjustments are obtained through simulations and experiments [
68]. In such systems, energy consumption is reduced by changing structural parameters, particularly the porosity of perforated tiles [
69]. Natural cooling systems, as part of air cooling systems, also focus on climate conditions, cooling system structure design [
70], natural cooling switch points, and flow rates for thermal management and energy savings [
71]. Ham et al. [
72] compared the cooling performance of nine air-side heat exchangers in data centers. They found that using energy-efficient equipment could save 47.5% to 62% of the total cooling energy compared to traditional data center cooling systems. Among these nine cooling systems, the indirect air-side economizer with high-efficiency heat exchangers achieved 63.6% energy savings, while the indirect air-side economizer with low-efficiency heat exchangers had the lowest energy savings. This study shows that the choice of heat exchanger is crucial for energy savings. Previous studies have shown that plate heat exchangers are more suitable for indirect air cooling and have focused on using algorithms to optimize various factors to improve the structural performance, save energy, and achieve uniform heat dissipation. In terms of thermal management and energy savings for liquid cooling systems, factors such as the thermal load, structural parameters [
73,
74], coolant flow rate [
75,
76], and coolant type [
6,
43] are involved. Researchers have analyzed the relationship between these factors and cooling performance and energy consumption and have optimized the structure of heat exchange facilities. Through these measures, liquid cooling systems can achieve more effective thermal management and energy savings, providing reliable assurance for data center operations.
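The perforated-tile parameters discussed above can be related by a simple sensible-heat balance: the airflow needed to remove a rack's heat, divided by the tile's open area, gives the tile face velocity. All numbers below are illustrative assumptions, not values from the cited studies:

```python
RHO_AIR = 1.2    # kg/m^3, approximate density of air at room conditions
CP_AIR = 1.005   # kJ/(kg*K), specific heat of air

def tile_face_velocity_m_s(rack_kw: float, delta_t_k: float,
                           tile_area_m2: float, porosity: float) -> float:
    """Mean air velocity through a perforated tile's open area.
    Volume flow V = Q / (rho * cp * dT); velocity = V / (A * porosity)."""
    vol_flow = rack_kw / (RHO_AIR * CP_AIR * delta_t_k)   # m^3/s
    return vol_flow / (tile_area_m2 * porosity)

# Illustrative: 10 kW rack, 12 K air-side rise, 0.6 m x 0.6 m tile, 40% open.
print(round(tile_face_velocity_m_s(10.0, 12.0, 0.36, 0.40), 2))  # ≈ 4.8 m/s
```

The sketch makes the trade-off visible: halving tile porosity doubles the face velocity for the same heat load, which raises pressure drop and fan energy — the reason porosity appears as a key energy-saving parameter above.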
5. Optimization Control Strategies
In the operation of modern data centers, optimizing control strategies for cooling systems is particularly important. This requires not only the application of advanced technologies and methods to enhance efficiency and reduce energy consumption but also strengthening overall system performance and reliability. As the scale of data centers increases and operational complexity rises, traditional experience-based methods are no longer sufficient to meet the demands, making automated control strategies crucial [
6]. By implementing intelligent control systems and utilizing advanced monitoring and automatic control technologies, cooling equipment can be adjusted in real-time to match the actual thermal load. Researchers have widely adopted various control strategies, such as Proportional-Integral-Derivative (PID), model predictive control (MPC), and reinforcement learning (RL) [
102]. Although PID control is widely used due to its simplicity of operation and high stability, advancements in technology have driven the adoption of more advanced control methods, such as MPC and RL. These methods not only enhance the level of automation and intelligence of the system but also optimize the operational efficiency and cost-effectiveness. Each control method demonstrates unique advantages and challenges, and they can be used alone or in combination with other technologies to improve system efficiency and adaptability.
5.1. PID Control
In the field of optimization control for data center cooling systems, PID control is highly favored. This method is widely adopted and suitable for most conventional cooling systems, primarily used to maintain stable operation at set points. However, PID control exhibits limitations in handling complex and rapidly changing environments.
For instance, research by Durand-Estebe et al. [
102] has shown that the performance of PID control is constrained in nonlinear or rapidly changing system environments. This is because traditional PID control struggles to achieve ideal control effects when the parameters are poorly tuned or when the system dynamics are highly nonlinear or fast-changing. To address this issue, researchers have proposed enhancing the performance of PID control in dynamic data center environments by improving PID parameter adjustment methods. Durand-Estebe et al. [
102] optimized data centers using PID control in CFD simulations. They employed a computational fluid dynamics (CFD) simulation environment to monitor temperature and airflow in real time and adjust the PID parameters accordingly. Through iterative simulations and experiments, they precisely tuned the P, I, and D parameters, enabling the cooling system to quickly respond to thermal load changes, thus optimizing the system energy efficiency and stability. Zheng and Ping [
103] proposed an Active Disturbance Rejection Control (ADRC) method for temperature regulation in server storage systems. This method augmented traditional PID control with real-time disturbance compensation mechanisms. Specifically, it employed an Extended State Observer (ESO) to estimate unknown disturbances in the system and dynamically adjusted the PID parameters based on these estimations, thereby enhancing the system’s disturbance rejection capability and control accuracy. These approaches aim to enhance the adaptability and efficiency of PID control through more precise parameter adjustments, thereby better meeting the practical demands and effectively optimizing the thermal environment of data centers.
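The positional-form PID law that these studies build on can be sketched in a few lines. The plant model, gains, and setpoint below are hypothetical toys, not tuned values from the cited work; the loop is reverse-acting, so a hotter room calls for more cooling:

```python
class PID:
    """Minimal discrete PID controller (positional form)."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Hypothetical toy plant: return-air temperature drifts toward 35 °C under
# load and is pulled down in proportion to the cooling command u (in %).
pid = PID(kp=2.0, ki=0.5, kd=0.1, dt=1.0)
temp, setpoint = 30.0, 24.0
for _ in range(50):
    error = temp - setpoint          # reverse-acting: hot room => more cooling
    u = max(0.0, min(100.0, pid.update(error)))
    temp += 0.1 * (35.0 - temp) - 0.05 * u   # simple first-order response
print(round(temp, 1))  # settles close to the 24 °C setpoint
```

The integral term is what holds the room at the setpoint under a steady heat load; the tuning difficulty the text describes is precisely choosing kp, ki, and kd so that this loop responds quickly without oscillating.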
Using PID control alone is rare; it is usually combined with other control strategies to improve control precision. The combination of different control strategies has shown significant advantages in the optimization of data center cooling systems. For example, Demir et al. [
104] employed a combination of proportional control and fuzzy logic to independently control the temperature and humidity. This combination enhances the flexibility and responsiveness of the control system and allows for more precise control of the data center environmental conditions through intelligent methods, optimizing the energy efficiency and enhancing the overall system performance to better adapt to complex environmental changes.
PID technology was originally conceived over a century ago for single-input, single-output processes. Although its parameter-tuning methods can be cumbersome, they are manageable [
104]. To enhance PID performance, researchers have proposed adapting to dynamic changes through fine-tuning parameters. In the future, it will be necessary to explore the integration of more advanced algorithms and technologies to address the application limitations of PID control in complex and rapidly changing environments. This will promote the development of data center cooling system control strategies towards greater efficiency and intelligence. This requires not only technological innovation but also comprehensive consideration from system design to practical application, ensuring that control strategies are effectively implemented and achieve the expected optimization results.
5.2. Model Predictive Control
Model predictive control (MPC) has garnered widespread attention in optimizing data center cooling systems due to its exceptional predictive and optimization capabilities, making it a core technology for enhancing system efficiency and responsiveness.
For example, Wang et al. [
76] proposed a global optimization method for air-conditioning water systems in data centers. Global optimization was achieved by building an energy consumption model for each component and using the differential evolution (DE) algorithm. The results show that the chiller saves 10.2% and the pump saves 28.1%, while the energy consumption of the cooling tower increases by 29.7%, highlighting the importance of system-level optimization. The potential of MPC in balancing energy efficiency and system performance is underscored by the development of predictive optimization methods for chilled water systems. This work highlights the important application value of MPC in large-scale systems, while also noting the limitations of existing models in handling dynamic changes and complex systems. Zhao et al. [
105] demonstrated that in ice storage air-conditioning systems, combining MPC with a multi-objective optimization control strategy can reduce energy consumption by 25% and operating costs by 20.9% compared to storage priority control.
In addition, Zhu et al. [
106] studied an advanced control strategy combining refrigeration technology to optimize data center chillers. Advanced MPC strategies have also been extensively studied, achieving higher energy efficiency and system responsiveness through integrated refrigeration technology and standardized operating states. Zhu et al. [
107] found that an advanced MPC strategy for a hybrid cooling system reduced energy consumption by 12.19% in the natural cooling mode, 4.04% in the mixed cooling mode, and 22.15% in the mechanical cooling mode. They combined this advanced control strategy with mixed integer linear programming (MILP) to effectively improve energy efficiency and reduce refrigeration losses. Choi et al. [
108] introduced highly adaptive artificial neural network models and optimal control algorithms, greatly enhancing the responsiveness and adaptability of data center cyber–physical systems. Similarly, Fan and Zhou [
109] applied MPC to optimize chiller units combined with water-side economizers, further affirming MPC’s flexibility and efficiency.
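The receding-horizon idea behind these MPC applications can be illustrated with a toy enumeration-based controller: at each step, simulate every candidate control sequence over a short horizon against a plant model, apply only the first move of the cheapest sequence, and repeat. The plant model, candidate set, and cost weights below are all assumptions for illustration:

```python
import itertools

def predict(temp, u_seq):
    """Toy plant model: first-order response of room temperature to cooling."""
    traj = []
    for u in u_seq:
        temp = temp + 0.1 * (35.0 - temp) - 0.05 * u
        traj.append(temp)
    return traj

def mpc_step(temp, setpoint=24.0, horizon=3,
             candidates=(0.0, 25.0, 50.0, 75.0, 100.0)):
    """Enumerate control sequences over the horizon; return the first move of
    the cheapest one (tracking error plus a small energy penalty)."""
    best_u, best_cost = 0.0, float("inf")
    for seq in itertools.product(candidates, repeat=horizon):
        traj = predict(temp, seq)
        cost = sum((t - setpoint) ** 2 for t in traj) + 0.001 * sum(seq)
        if cost < best_cost:
            best_cost, best_u = cost, seq[0]
    return best_u

temp = 30.0
for _ in range(30):
    temp += 0.1 * (35.0 - temp) - 0.05 * mpc_step(temp)
print(round(temp, 1))  # hovers near the 24 °C setpoint
```

Real MPC replaces the brute-force enumeration with a solver (e.g. the MILP formulation mentioned above) and a calibrated model, but the receding-horizon structure — predict, optimize, apply the first move, re-measure — is exactly this loop.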
Despite MPC’s outstanding performance, Du et al. [
16] noted that implementing MPC in dynamic environments faces numerous challenges. Future research needs to enhance MPC’s adaptability and flexibility by integrating advanced predictive models and optimization algorithms, improving real-time capabilities, and reducing implementation complexity and costs. The integration of emerging machine-learning technologies will be crucial to automatically adjust and optimize MPC parameters, making MPC more suitable for widespread data center applications.
These studies collectively constitute the application framework of MPC technology, from basic optimization methods to complex system management and specific technology strategies. They demonstrate MPC’s expanding application in data center cooling systems, from the meticulous control of individual servers to the holistic optimization of complex systems. Future research will focus on improving MPC’s learning efficiency, reducing reliance on big data, and enhancing real-time performance and stability. Through these efforts, MPC is expected to achieve more widespread and effective applications in data center cooling systems, driving control strategies toward greater efficiency and intelligence.
5.3. Reinforcement Learning
Reinforcement learning (RL), as an advanced intelligent control strategy, has demonstrated significant potential in adaptive adjustment and performance optimization. RL techniques have been widely applied to enhance the energy efficiency and responsiveness of data center cooling systems, showing substantial promise in adapting to complex and dynamic environments.
In specific applications, Zhang et al. [
6] explored the use of RL in optimizing data center cooling systems. Their research showed that RL techniques effectively optimize energy efficiency, especially when dealing with complex system dynamics and continuous control variables. Although this method can improve energy efficiency, it requires high-quality training data and substantial computational resources, which limits its application in certain settings. He et al. [
110] combined deep reinforcement learning with predictive control technologies to enhance the energy efficiency of chiller units, providing a new optimization approach for complex systems. This study utilized deep reinforcement learning to handle more complex system dynamics and manage continuous control variables, underscoring RL’s capabilities in complex decision-making environments. Despite its benefits, this approach also demands large amounts of high-quality training data and substantial computational resources, restricting its use in certain contexts.
To expand RL applications to broader system-level management, Qin et al. [
111] employed a distributed reinforcement learning approach to optimize energy use across regional building clusters, effectively addressing energy distribution among multiple buildings. This distributed RL method not only optimized the energy efficiency of the entire region but also showed potential in coordinating multiple control points in large-scale systems. However, it faces technical challenges related to effective communication and synchronization between systems.
In the latest relevant studies, Lin et al. [
112] enhanced server energy efficiency by integrating multi-agent reinforcement learning methods with dynamic voltage frequency scaling (DVFS) and dynamic fan control technologies. This approach finely tuned hardware operations to minimize energy consumption, demonstrating the direct effects of RL at a specific hardware level. The study also highlighted the need to improve algorithm generalization across different server environments.
These cases illustrate that RL technology is expanding its application scope from the meticulous control of individual servers to the holistic optimization of complex systems and even to energy management across multiple buildings. Future research needs to further enhance the learning efficiency of RL through algorithm improvements and improve the real-time performance and stability of algorithms. With these efforts, RL is expected to achieve more widespread and effective applications in data center cooling systems, driving control strategies towards greater efficiency and intelligence.
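As a toy illustration of the control loop these studies build on (not drawn from any cited paper), the following sketch trains a tabular Q-learning agent to choose a cooling-power level for a simplified room model; the heat load, cooling gains, temperature band, and reward weights are all illustrative assumptions.

```python
import random

# Minimal sketch: tabular Q-learning picks a cooling level for a toy
# server-room thermal model. All constants below (heat load, cooling
# gain, safe band, reward weights) are illustrative assumptions.
random.seed(0)

TEMPS = list(range(18, 31))          # discretized room temperatures (deg C)
ACTIONS = [0, 1, 2]                  # cooling power level: off / low / high
TARGET_LOW, TARGET_HIGH = 22, 26     # assumed safe operating band

def step(temp, action):
    """Toy dynamics: the IT load warms the room, cooling removes heat."""
    heat_gain = 2                    # deg C added per step by servers (assumed)
    cooling = action * 2             # deg C removed per step per cooling level
    next_temp = max(TEMPS[0], min(TEMPS[-1], temp + heat_gain - cooling))
    # Reward staying in band; penalize energy use (0.1 weight is assumed).
    in_band = TARGET_LOW <= next_temp <= TARGET_HIGH
    reward = (1.0 if in_band else -1.0) - 0.1 * action
    return next_temp, reward

Q = {(t, a): 0.0 for t in TEMPS for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1

for episode in range(500):
    temp = random.choice(TEMPS)
    for _ in range(30):
        # Epsilon-greedy action selection.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(temp, a)])
        next_temp, reward = step(temp, action)
        best_next = max(Q[(next_temp, a)] for a in ACTIONS)
        # Standard Q-learning update.
        Q[(temp, action)] += alpha * (reward + gamma * best_next - Q[(temp, action)])
        temp = next_temp

# Greedy policy: cool hard when hot, back off when already cool.
policy = {t: max(ACTIONS, key=lambda a: Q[(t, a)]) for t in TEMPS}
print(policy[30], policy[18])
```

In the learned policy, hot states map to the highest cooling level and cool states to no cooling, mirroring the energy/temperature trade-off encoded in the reward; the deep RL methods surveyed above replace the Q-table with a neural network to handle continuous states and actions.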
5.4. Summary
In the optimization of cooling systems, the choice of control strategies is crucial for enhancing system energy efficiency and responsiveness. This chapter discusses three main control strategies: PID control, MPC, and RL, exploring their applications, advantages, and the challenges they face in data center cooling systems. PID control, as a fundamental control strategy, is widely adopted due to its simplicity and high robustness, making it suitable for most conventional cooling systems, and it is primarily used to maintain stable operation at set points. However, PID control shows limitations in dealing with complex and rapidly changing environments. Model predictive control (MPC), as a model-based control strategy, is valued in data center cooling systems for its excellent optimization and predictive capabilities. It can achieve global optimization in large-scale and complex systems, significantly enhancing the system energy efficiency and performance by adjusting operations in real time to adapt to environmental changes. The main challenges for MPC lie in its model dependency and the need for real-time operation. Reinforcement learning (RL), as a model-free optimal control strategy, demonstrates strong potential in complex system regulation through its adaptive learning mechanism, especially in multi-level system applications. The application of RL requires substantial training data and computational resources, along with effective strategies to ensure learning process stability and convergence speed.
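The set-point regulation role of PID described above can be sketched with a minimal discrete PID loop acting on a toy room model; the gains and plant constants below are illustrative assumptions, not tuned values from any surveyed system.

```python
# Minimal discrete PID sketch holding a toy server room at 24 deg C.
# Gains and plant constants are illustrative assumptions.

def make_pid(kp, ki, kd, setpoint):
    """Return a stateful PID controller emitting a cooling-power command."""
    state = {"integral": 0.0, "prev_error": 0.0}
    def control(measured, dt=1.0):
        error = measured - setpoint          # above set point -> cool harder
        state["integral"] += error * dt
        derivative = (error - state["prev_error"]) / dt
        state["prev_error"] = error
        u = kp * error + ki * state["integral"] + kd * derivative
        return max(0.0, u)                   # cooling power cannot be negative
    return control

# Toy plant: constant IT heat load, cooling proportional to the command.
temp = 30.0                                  # start warm (deg C)
heat_load = 1.5                              # deg C per step added by servers
pid = make_pid(kp=0.8, ki=0.2, kd=0.1, setpoint=24.0)

for _ in range(100):
    u = pid(temp)
    temp += heat_load - 1.0 * u              # cooling gain of 1 deg C per unit u

print(round(temp, 2))
```

The integral term removes the steady-state offset caused by the constant heat load, which is exactly the set-point-holding behavior that makes PID the default choice for conventional cooling loops.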
In Table 7, the advantages, disadvantages, application scenarios, and application stages of PID control, model predictive control, and reinforcement learning are summarized, allowing readers to quickly understand these three optimal control strategies.
Data centers may choose to employ a single control method or a combination of methods to adapt to complex and dynamic system environments. This diversified strategy provides more comprehensive and flexible solutions. Typically, a single control strategy is suitable for standard system optimization tasks, while a composite control strategy can more effectively handle complex and multi-objective optimization problems, thereby significantly enhancing overall optimization outcomes. Large data centers tend to prefer advanced control strategies, such as MPC or RL, or their combinations, to ensure system stability and maximize energy efficiency. However, the implementation and maintenance costs of these strategies are high, so when adopting these strategies, it is necessary to comprehensively consider the data center’s budget and business requirements.
Future trends will focus on the deeper integration of algorithms and technologies to enhance the performance and adaptability of these control strategies. Particularly, the integration of machine-learning technologies to automatically optimize control parameters will be a key step, not only improving the flexibility and accuracy of control strategies but also further reducing the energy consumption and enhancing the overall system performance. Through the integration and innovation of these technologies, data center cooling system control strategies are expected to become more efficient and intelligent, better meeting the growing performance demands and environmental adaptation challenges.
6. Data Center Cooling System Energy Consumption Indicators
Enhancing the energy efficiency to support energy conservation and emissions reduction is a primary focus of current research. To achieve this goal, researchers are working to develop and refine a series of energy efficiency assessment metrics. These metrics serve dual purposes: they measure the efficiency of energy use and guide energy management and conservation strategies in data centers [
17].
We delve into the critical role of energy consumption metrics, which are essential for monitoring and evaluating energy usage efficiency and are pivotal in optimizing energy consumption and enhancing the sustainability of these facilities.
Section 6.1 will introduce the classification and current status of these energy metrics, ranging from fundamental measures like power usage effectiveness (PUE) to more sophisticated and holistic efficiency metrics. Moving forward,
Section 6.2 will conduct an in-depth analysis of the challenges and limitations faced in the practical application of PUE, particularly highlighting its potential shortcomings in comprehensively reflecting a data center’s energy efficiency. Furthermore,
Section 6.3 will explore the ongoing improvements in the industry that build upon PUE. Researchers and engineers are striving to refine these metrics to more precisely assess and optimize the energy performance, thereby better aligning with global environmental sustainability objectives.
6.1. Classification and Status of Energy Consumption Indicators
In recent years, as data center energy efficiency has received increasing global attention, numerous researchers have systematically classified energy consumption metrics. Within this field, scholars have proposed a variety of metrics to quantify and manage energy consumption based on different assessment needs and methods. This section provides an overview of the existing energy consumption indicators and explores the current status of their development. By sorting through the relevant research results, this section reveals the strengths and weaknesses of the existing indicators and offers a theoretical basis and guidance for future research.
Long et al. [
19] classified the existing data center energy efficiency assessment indicators into three categories in terms of granularity: coarse-grained indicators, medium-grained indicators, and fine-grained indicators. Coarse-grained metrics assess only the total energy efficiency and lack detailed information on subcomponents, making it difficult to provide specific recommendations for energy savings. Medium-grained metrics cover information on specific equipment, infrastructure, and green energy sources to more accurately assess energy efficiency. Fine-grained metrics provide detailed performance information on specific components, assessing energy efficiency through changes in performance and energy consumption, and are important to operators even though they do not directly represent overall energy efficiency. Shao et al. [
20] collected energy efficiency assessment metrics proposed worldwide for data centers over the last 20 years, summarized them under the headings of energy saving, eco-design, and data center security, presented calculation formulas for each metric, and discussed their strengths and weaknesses as well as their relevance and applicability. Wang et al. [
118] discuss a taxonomy of green data center performance metrics, including basic performance metrics (e.g., greenhouse gas emissions, humidity, power, and heat metrics) as well as extended metrics. Reddy et al. [
119] classified data center metrics according to greening, performance, heat, security, storage, and financial impact.
All four papers provide a systematic framework to classify data center energy efficiency assessment metrics, but their respective focuses and approaches show significant differences. Long et al. [
19] construct a hierarchical framework by classifying metrics as coarse-, medium-, or fine-grained, which facilitates the systematic assessment of metrics according to their level of detail. This categorization highlights the applicability and limitations of indicators at each level of granularity. Shao et al. [
20] focus on energy efficiency assessment metrics from around the world over the last two decades. Their work provides a macroscopic perspective, not only summarizing a variety of metrics but also examining their calculation formulas, strengths, and weaknesses, and highlighting their relevance and applicability in practice. Wang et al. [
118] discuss a taxonomy of green data center performance metrics, covering basic as well as extended metrics. The focus of this work is a more granular perspective that helps to assess the overall performance of green data centers. Reddy et al. [
119] classify data center metrics according to energy efficiency, cooling, green technology application, performance, heat and air management, network, security, storage, and financial impact. By incorporating financial impacts, this classification helps managers develop more effective decision-making and management strategies to enhance the overall performance of data centers.
Despite these advances, as data centers face increasing environmental challenges such as air pollution and toxic by-products [
120], the industry is calling for future research to be based not only on practical needs, but also to explore the assessment methods best suited to address specific issues.
Future research could consider fusing these two approaches, using the granularity classification of Long et al. [19] to organize the broad set of indicators collected by Shao et al. [20]. This would not only maintain the comprehensiveness of the assessment but also increase its usefulness. In addition, to address the shortcomings of current classification methods, further research should explore how to optimize the use of indicators in specific application scenarios and how to remedy the gaps in existing methods through technological innovation.
6.2. Shortcomings of the PUE Indicator
A range of metrics is available for evaluating data centers, but PUE has become the de facto industry standard over time [
121]. The PUE metric has become increasingly popular in the data center industry as a measure of computing energy efficiency [22].
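For reference in the discussion that follows, PUE is defined as the ratio of total facility energy to the energy delivered to IT equipment; a minimal computation with illustrative figures:

```python
# PUE = total facility energy / IT equipment energy. An ideal data center
# would approach 1.0; the figures below are illustrative only.

def pue(total_facility_kwh, it_equipment_kwh):
    return total_facility_kwh / it_equipment_kwh

print(pue(1600, 1000))   # prints 1.6: 600 kWh of overhead per 1000 kWh of IT load
```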
Although PUE has become an important metric, several studies have shown that it still has significant shortcomings.
Firstly, although PUE measures overall energy efficiency, it provides little in the way of specific operational detail or practical guidance. Long et al. [
19] point out that PUE fails to adequately reflect the performance in terms of water usage, carbon emissions, and renewable energy usage. In addition, PUE also fails to effectively capture the problem of the over-provisioning of infrastructure in data centers under low load conditions, which may lead to inaccurate energy efficiency assessment.
Second, PUE has limitations in accounting for meteorological conditions. Shao et al. and Li et al. [20,122] point out that data centers with the same level of technology may exhibit different PUE values due to differences in geography and climate, which hinders a fair comparison of PUE values between data centers across regions.
In addition, PUE cannot assess the energy efficiency of the IT equipment itself, so it may fail to reflect a data center’s actual productivity. Zhou et al. [
21] highlighted that software energy efficiency optimization may significantly reduce the energy consumption of IT equipment, while the energy consumption of the infrastructure does not change much, which may lead to unexpectedly high PUE values, thus incorrectly indicating a decrease in energy efficiency.
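The effect Zhou et al. [21] describe can be illustrated numerically (all figures hypothetical): cutting IT energy through software optimization while infrastructure energy stays fixed lowers total consumption yet raises PUE.

```python
# Numeric illustration of the PUE paradox: software optimization reduces
# IT energy, infrastructure energy is unchanged, and PUE *rises* even
# though total consumption falls. All numbers are hypothetical.

def pue(it_kwh, infra_kwh):
    return (it_kwh + infra_kwh) / it_kwh

before = pue(it_kwh=1000, infra_kwh=600)   # 1.6
after = pue(it_kwh=800, infra_kwh=600)     # 1.75, despite 200 kWh saved
print(before, after)
```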
Finally, technical issues in PUE calculations also affect their accuracy and consistency. Brady et al. [
121] and Yuventi et al. [
122,
123] discuss inconsistencies in definitions, unclear calculation nodes, and the complexity of data collection and parameter monitoring in PUE calculations. These technical issues not only increase the difficulty of calculating PUE but may also lead to inaccuracies in engineering estimates. Additionally, the influence of environmental factors may prevent PUE values from being a true reflection of actual performance throughout the year. Brady et al. [
121] also mention the lack of research on the sensitivity of PUE to specific parameters and the shortcomings of attempts to use open-source information for PUE calculations.
The above analyses reveal a variety of limitations that PUE faces in practical application. These limitations not only affect the accuracy and reliability of PUE but also restrict its usefulness. In view of these issues, there is a clear need to further improve and optimize PUE. This can be achieved by developing new metrics, introducing more relevant variables and parameters, formulating policies and standards for energy efficiency assessment, and applying advanced data analytics and machine-learning techniques to enhance its effectiveness as an energy efficiency assessment tool.
6.3. Improvements in PUE
Recently, numerous scholars have proposed improvements to the PUE metric, leading to more comprehensive and accurate assessments.
To address the lack of specific operational details and practical guidance in the PUE metric, Shaikh et al. [22] tackled shortcomings of the PUE and DCiE metrics, namely that they measure only power efficiency and account for neither CO2 emissions nor the costs involved in the total power usage of the entire data center, and therefore proposed a new power efficiency and CO2 measurement calculator called PEMC.
Since the PUE metrics cannot assess the energy efficiency of IT equipment and applications, Zhou et al. [
21] analyzed the requirements of application-level metrics for data center power usage efficiency and proposed two novel energy efficiency metrics: ApPUE and AoPUE, which constitute the application-level PUE family. ApPUE reflects the energy efficiency of IT equipment and relates application characteristics to power consumption. AoPUE measures the energy efficiency of the total power of data center facilities with respect to their application performance.
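A sketch of one plausible reading of this application-level family follows; the exact formulas appear in Zhou et al. [21], and the specific ratios used here (IT power over application-attributed power for ApPUE, total facility power over the same application-level quantity for AoPUE) are our assumption for illustration only.

```python
# Hedged sketch of the application-level PUE family. The exact
# definitions are in Zhou et al. [21]; the ratio forms below are an
# assumed reading, and all numbers are hypothetical.

def classic_pue(total_kw, it_kw):
    return total_kw / it_kw

def ap_pue(it_kw, app_kw):
    # Assumed form: IT-equipment power per unit of power that actually
    # serves the application.
    return it_kw / app_kw

def ao_pue(total_kw, app_kw):
    # Assumed form: end-to-end ratio from facility power to the
    # application level (so ao_pue == classic_pue * ap_pue).
    return total_kw / app_kw

total, it, app = 1600.0, 1000.0, 500.0
print(classic_pue(total, it), ap_pue(it, app), ao_pue(total, app))
```

Under this reading, AoPUE composes the facility-level and application-level ratios, which matches the text's description of it as measuring total facility power against application performance.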
In order to address technical issues affecting the accuracy and consistency of energy efficiency in data centers, Yuventi et al. [
123] suggest renaming PUE (power usage effectiveness) to “energy usage effectiveness” (EUE), which is intended to eliminate confusion and to avoid inconsistencies in the reporting of energy ratings. Rose et al. [
124] adapted the calculation of PUE to exclude the effect of water storage systems.
Currently, there are some problems with the accuracy of PUE predictions. Lei et al. [
23] proposed a statistical framework for predicting and analyzing PUE in Hyperscale Data Centers (HDCs). Islam et al. [
125] describe a Belief Rule-Based Expert System (BRBES) model for predicting power usage effectiveness (PUE) in data centers. This model is unique in that it integrates a new learning mechanism that significantly improves the accuracy of PUE prediction through parameter and structure optimization. Avotins et al. [
24] improved the accuracy of PUE measurements and overall energy efficiency performance by deploying high-resolution sensors to analyze the thermodynamic behavior of the system in detail and using artificial intelligence algorithms to develop automated analysis tools to optimize cooling system configurations.
The above studies have effectively improved data center energy efficiency management through the introduction of advanced sensing technologies, the development of new assessment tools such as PEMC, the creation of refined energy efficiency metrics such as ApPUE and AoPUE, and the optimization of prediction models using AI algorithms. They have also adjusted existing assessment criteria, such as renaming PUE to EUE. These comprehensive improvement measures not only complement each other but also work together to enhance the science and practicality of data center energy efficiency management. However, despite the significant progress made in the field of data center energy efficiency assessment, it still faces some limitations. For example, although the proposed new technologies and methods are theoretically valid, they may encounter implementation challenges such as high cost, technical complexity, or compatibility with existing systems. High-resolution monitoring and artificial intelligence algorithms rely on a large amount of high-quality data, but obtaining such data in real-world environments can be challenging, and data incompleteness, errors, and biases may affect the accuracy of the results. Furthermore, despite the introduction of new metrics, harmonized standards for these metrics have not yet been established, which may affect the consistency and efficiency of future energy efficiency measurements.
7. Conclusions
Through bibliometric analysis, we conducted a comprehensive review of research on data center cooling systems from 2004 to 2024, with a particular focus on energy efficiency, optimization strategies, and energy management. The selected databases include Scopus, Web of Science, and IEEE. The study shows that as global attention has increasingly focused on data center energy efficiency, especially after 2012, research in this field has exhibited significant growth. Since 2019, research directions have gradually shifted towards greener and smarter data centers, as reflected by the increase in the number of publications in related fields. Among these databases, Scopus has the largest number of publications, and China has shown an outstanding research performance in this field.
By using CiteSpace-6.3.1 to visualize the literature, including keyword co-occurrence analysis, keyword clustering, and keyword burst analysis, we found that the classification of data center cooling systems, optimization strategies for different types of cooling systems, measures to improve data center energy efficiency, and the evaluation of cooling system energy efficiency are all important research directions in this field. Based on this, the study comprehensively reviews the research progress on optimization control strategies and evaluation indicators for data center cooling systems, focusing on improving system efficiency and sustainability. The key findings of the study are as follows:
- (1)
In the research domain of data center cooling system design, despite the extensive literature discussing various cooling technologies and technical details, studies often focus predominantly on the evaluation of technical performance. Particularly, when considering the influence of different climatic conditions on cooling system selection, the depth and breadth of the existing research remain insufficient. To address this, the present study proposes a systematic approach to assess and compare the adaptability, cost-effectiveness, and environmental impacts of air-cooled and liquid-cooled systems under diverse operational environments. By providing a comprehensive summary and categorization of key factors for both air-cooled and liquid-cooled systems, this paper enhances the understanding of their performance under varying conditions. These tools enable the selection of the most appropriate cooling solution based on specific thermal load requirements, available space, and climatic conditions. In the field of data center cooling, the future development of air-cooling and liquid-cooling systems will focus more on precise selection based on the specific heat load requirements, available space, and climatic conditions, while continuously improving the adaptability, cost-effectiveness, and environmental protection performance to meet growing performance demands.
- (2)
Although the evaluation indexes of data center air supply effectiveness have gradually been quantified, the evaluation indexes adopted by different research institutions are not the same, and there is a lack of a recognized evaluation index system to enable the comparison of air supply effectiveness across data centers in different regions and scales. Installing baffles in the static pressure chamber is a design optimization measure that can effectively improve the uniformity of airflow organization in the underfloor air supply system. By changing the shape and angle of the baffles, the air supply performance can be significantly enhanced. Overhead air supply shows a good performance in terms of cooling effect and robustness, and when exploring its optimized design, it is necessary to consider not only the channel sealing strategy but also the optimized design of the geometric parameters of the air supply components, such as the grille diameter and deflection angle.
- (3)
The PID control strategy is often used for temperature regulation. As data centers grow in scale and complexity, model predictive control has gradually become the mainstream approach. With the development of artificial intelligence, reinforcement learning methods are used to optimize the control strategy of the cooling system to cope with complex environments. Data centers can choose a single control method or a combination of multiple methods to adapt to complex and dynamic system environments. Diversified strategies provide more comprehensive and flexible solutions. Single control strategies are suitable for standard system optimization tasks, while composite control strategies can more effectively handle complex multi-objective optimization problems, thereby significantly improving overall optimization results.
- (4)
Future research into PUE is likely to see, on the one hand, a wider application of advanced AI and machine-learning techniques to automate the collection of data, analysis of energy efficiency, and execution of optimization strategies, thereby reducing human intervention and improving operational efficiency. On the other hand, more research is expected to focus on driving the uniform use of energy efficiency metrics to ensure consistency and standardization globally. Such efforts will help to improve the global synergy and effectiveness of data center energy efficiency management.
By systematically studying and optimizing the design and control strategies of data center cooling systems, improving the energy efficiency of cooling systems can not only significantly reduce the overall energy consumption and operating costs of data centers but also effectively extend the life of equipment and reduce carbon emissions, thereby making a positive contribution to addressing the global energy crisis and environmental issues.