[orcid=https://orcid.org/0000-0002-6980-5267] \fnmark[1] \cormark[1]

[orcid=https://orcid.org/0000-0003-1516-1993] \fnmark[1]

[orcid=https://orcid.org/0000-0002-4879-1206]

Corresponding author. Both authors contributed equally to this article. This work was developed within the Innovate UK/CELTIC-NEXT European collaborative project on AIMM (AI-enabled Massive MIMO).

Anomaly Detection in Offshore Open Radio Access Network Using Long Short-Term Memory Models on a Novel Artificial Intelligence-Driven Cloud-Native Data Platform

Abdelrahim Ahmad, Boldyn Networks; Peizheng Li, Department of Electrical and Electronic Engineering, University of Bristol, United Kingdom; Robert Piechocki; Rui Inacio
Abstract

The radio access network (RAN) is a critical component of modern telecom infrastructure, currently undergoing significant transformation towards disaggregated and open architectures. These advancements are pivotal for integrating intelligent, data-driven applications aimed at enhancing network reliability and operational autonomy through the introduction of cognition capabilities, exemplified by the set of enhancements proposed by the emerging open radio access network (O-RAN) standards. Despite its potential, the nascent nature of O-RAN technology presents challenges, primarily due to the absence of mature operational standards. This complicates the management of data and applications, particularly in integrating with traditional network management and operational support systems. Divergent vendor-specific design approaches further hinder migration and limit solution reusability. Addressing the skills gap in telecom business-oriented engineering is crucial for the effective deployment of O-RAN and the development of robust data-driven applications. To address these challenges, Boldyn Networks, a global Neutral Host provider, has implemented a novel cloud-native data analytics platform. This platform underwent rigorous testing in real-world scenarios using advanced artificial intelligence (AI) techniques, significantly improving operational efficiency and enhancing customer experience. Implementation involved adopting development operations (DevOps) practices, leveraging data lakehouse architectures tailored for AI applications, and employing sophisticated data engineering strategies. The platform successfully addresses connectivity challenges inherent in offshore windfarm deployments by using long short-term memory (LSTM) models for anomaly detection of the connectivity, and this paper provides detailed insights into the specialized architecture developed for this purpose.

keywords:
Open RAN, Telecom, AI, LSTM, Deep Learning, Big Data, DevOps, MLOps, CI/CD, Data Engineering, Anomaly Detection, Cloud-native private networks

1 Introduction

Telecommunication networks are essential to many aspects of our lives, driving digital transformation and revolutionizing communication. The benefits of these networks are numerous. Recently, their importance has surged due to the proliferation of various types of user equipment (UE), internet of things (IoT) devices, autonomous operations, and services that require faster, more reliable, resilient, secure, and private connectivity. This has led to a substantial rise in the demand for private networks, amplifying the challenges of managing many smaller, tailored mobile networks to deliver high-quality services.

To address these escalating demands, innovative enhancements in network design have emerged. A recent advance in this arena is open radio access network (O-RAN) technology. O-RAN aims to disaggregate the monolithic, single-vendor RAN, reducing infrastructure costs and paving the way for network programmability. Ultimately, this leads to autonomous network operations that leverage natively supported in-network artificial intelligence (AI) techniques to streamline the complexities of designing, delivering, managing, and operating private networks. O-RAN establishes a new framework of standards and principles for wireless networking, emphasizing open standards, interfaces, functions, and interoperability to foster greater market competition.

O-RAN’s primary objective is to reduce vendor lock-in, enhance flexibility, and develop network cognitive functions by leveraging data to optimize network performance and improve its resilience, to achieve cost efficiency, particularly in managing diverse heterogeneous networks. Its key advantages lie in software-defined network (SDN) technologies and virtualized network functions (VNF), which not only slash deployment costs but also enable network programmability for autonomous management. This, in turn, reduces operational complexity and optimizes performance [1].

On the other hand, O-RAN is still a relatively new approach. It comes with many realistic challenges in implementation, such as interoperability with legacy network management systems, data integration issues, immaturity in data processing platforms to produce data-driven applications, management, and other technical complexities. In addition to these challenges, there is a shortage of experienced engineers and an increased number of engineering roles with specific skill sets that are necessary to boost this transformation in RAN architecture.

There is a wide range of applications required in the O-RAN system to improve its functionality, such as predictive maintenance and anomaly detection, energy efficiency optimization, automated network configuration and healing, enhanced quality of service (QoS) and traffic management, enhanced user admission control, and dynamic RAN slicing [2, 3, 4, 5]. AI and machine learning (ML) approaches are usually considered the main tools to tackle these challenges [6]. The combination of programmability and AI in O-RAN, realized through xApps and rApps that use the interfaces offered by RAN intelligent controllers (RICs), automates network management tasks and enables real-time, data-driven decisions [7]. The architectural innovations of O-RAN allow network operators to integrate AI algorithms into the O-RAN network, enabling the use of AI for tasks such as network optimization, troubleshooting, and other complex business problems. By automating these tasks, AI can be used to improve network performance, enhance the customer experience, and reduce costs.

However, developing solutions in O-RAN involves numerous challenges and complexities. These include obtaining a large amount of reliable training data from the network, managing and monitoring AI models for execution and inference, and regulating and implementing update mechanisms for the AI models deployed in the operational support system (OSS) stack.

In theory, the data and model aspects of these challenges can be partially addressed through machine learning operations (MLOps) [8]. However, in the context of O-RAN, the challenges are more complex due to multi-purpose applications and multi-platform issues stemming from the multi-vendor nature of O-RAN. The data sources for AI models are vast and intricate, often originating from multiple systems as shown in Fig. 1. Consequently, relying on a traditional data solution provided by a single vendor is nearly impossible in the multi-vendor environment of the mobile network operator industry.

Figure 1: Multi-vendor O-RAN network setup with Vendor-A in light blue, Vendor-B in green, and Vendor-Z in orange, alongside other systems, equipment, and user devices that operate within the network and produce important data.

This underscores the need for a comprehensive platform and methodology to apply analytics and AI solutions within O-RAN. Such an approach is essential to effectively address these challenges and organize the efforts involved in developing data-driven solutions. These challenges present opportunities for network operators to design and implement a platform capable of tackling these issues whilst being able to proactively test and deploy AI models in O-RAN, to enhance the network’s adaptability and efficiency.

In this paper, we introduce a novel cloud-native, open data-driven platform for O-RAN to address these challenges and streamline the integration of AI applications into the O-RAN management stack. This platform leverages cutting-edge engineering technologies and concepts such as data lakehouse [9], DevOps [10], and MLOps. The design of this platform also considers the growing number of private network deployments, and the increased complexities of these networks and use cases, ensuring future scalability and the capacity to handle the vast amounts of data generated within the network. The contributions of this paper can be summarized as follows:

  1. To the best of the authors’ knowledge, this is the first holistic cloud-native platform for multi-vendor system integration and services management proposed for O-RAN.

  2. This cloud-native platform is tightly aligned with the O-RAN architecture for potential AI model implementation and integration.

  3. This platform fully automates the involved infrastructure, setup, data, and AI pipelines using DevOps and GitOps technologies, significantly reducing operational workload and streamlining the development cycle, which enables better collaboration between business owners and developers, leading to more efficient resource utilization.

  4. The paper presents an existing near real-time business problem related to connectivity on an offshore mobile network that uses O-RAN technology and provides a solution for it using anomaly detection with a long short-term memory (LSTM) model in the proposed data platform. This is the first AI-based solution targeting a use case of an offshore mobile network built using O-RAN technology.

  5. This paper provides a new cloud-native open data platform architecture that will support multi-vendor O-RAN designs. It also presents the working method used in the AI lifecycle and other data-driven applications to tackle the lack of specialized human resources and standardization when deploying AI in O-RAN.

The remaining sections of this paper are organized as follows: Sec. 2 presents the background of O-RAN and AI-enabled intelligent networks. Sec. 3 elaborates on the proposed cloud-native open data platform. The anomaly detection use case leveraging AI techniques is detailed in Sec. 4. Then, in Sec. 5, we discuss the potential problems of the proposed platform and the plan for its future development. Lastly, Sec. 6 concludes the paper.

2 Preliminaries

In this section, we present the background information regarding the O-RAN technique and its AI applications.

2.1 O-RAN

The RAN is a critical component of a typical mobile communication network, enabling UE to connect to the core network, which then delivers services to users. The evolution of wireless communication systems from 1G to 5G highlights increasing modularity and virtualization of network functionalities.

Key advancements in RAN architecture include distributed RAN (D-RAN), centralized (or Cloud) RAN (C-RAN), and virtual RAN (vRAN). The distinctions among these architectures are detailed in [11].

In the 3GPP 5G new radio (NR) specifications, the traditional base station (BS) is composed of three main components: the centralized unit (CU), distributed unit (DU), and radio unit (RU). The CU and DU together perform the functions of the baseband unit (BBU), while the RU is responsible for signal conversion and radio frequency (RF) transmission.

O-RAN aims to address vendor lock-in issues by promoting the decoupling of hardware and software. This approach advocates for open, standardized interfaces, virtualized network elements, and white-box hardware, driven by principles of intelligence and openness. By doing so, O-RAN seeks to transform the RAN industry, fostering a more flexible and interoperable ecosystem.

2.1.1 Openness of O-RAN

Openness in O-RAN involves adopting standardized interfaces to ensure interoperability, enabling seamless integration of hardware components from various vendors, and fostering a multi-vendor RAN ecosystem. The O-RAN Alliance has issued various specifications to support this initiative. Technically, O-RAN adheres to 3GPP 5G NR specifications, featuring the CU, DU, and RU. As illustrated in Fig. 2, the RU and DU are disaggregated based on the 7.2x split [12] and connected via the open fronthaul interface. Further segmentation of the CU results in two logical components: the CU control plane (CU-CP) and the CU user plane (CU-UP), enhancing deployment flexibility and reducing latency concerns. The DU and CU are interconnected through the open midhaul F1 interface, which is divided into F1-C for control plane communications and F1-U for user plane connectivity.

2.1.2 Intelligence of O-RAN

The intelligence of O-RAN is a pivotal aspect that enhances its functionality through the integration of AI and ML. These advanced technologies enable sophisticated network automation, allowing for dynamic resource allocation, efficient management, and proactive orchestration of network functions and resources. At the heart of this intelligence are the RICs [13], which are designed to host various applications that drive network optimization as well as network operation and maintenance processes. RICs are categorized into the non-real-time (non-RT) RIC and the near-real-time (near-RT) RIC, which support applications known as rApps and xApps, respectively. As shown in Fig. 2, the near-RT RIC connects to the O-CU/O-DU via the E2 interface for near-real-time control, while the non-RT RIC communicates with the near-RT RIC through the A1 interface for non-real-time control and AI/ML model updates. Additionally, the O1 interface links the non-RT RIC with other RAN components for overall service management and orchestration [14]. This layered approach ensures that O-RAN can adapt to varying network demands and conditions in real time, significantly improving performance, reducing operational costs, and enhancing user experience.

Figure 2: The detailed O-RAN architecture and components.

2.2 The motivation of deploying AI in O-RAN

AI is becoming increasingly important in O-RAN compared to traditional RAN due to its capability to address complex network demands and enhance overall performance. Several key aspects benefit from the integration of AI [4, 15]:

  • Reducing Complexity: O-RAN networks have a more complex, disaggregated architecture than traditional RAN, making manual management and optimization more challenging. Building traditional applications that process the data is also challenging. AI algorithms can automate and optimize these processes, and they also compensate for the shortage of skilled engineers able to manage such novel networks [15].

  • Real-Time Capability: The new O-RAN architecture supports real-time and near-real-time RICs, allowing AI algorithms to respond to network changes as they occur and enabling more efficient and effective management of the network, such as traffic steering [16].

  • Cross-Layer Optimization: The intelligence executed in O-RAN is expected to perform cross-layer optimization over the network, which outperforms classical optimization that focuses solely on communication blocks.

  • Future Possibilities: The programmability, openness, and disaggregation enable opportunities for innovation especially when utilizing AI. One of the most important ideas that will change the shape of networks is autonomous management which will provide advanced capabilities compared to the traditional methods. Some of these capabilities are as follows:

    • Improved Network Performance: AI-based algorithms can be used to optimize network performance by dynamically allocating network resources and adjusting network parameters based on real-time network conditions such as autonomous QoE and QoS resource optimization [4, 15].

    • Cost Savings: Fitting autonomous network management into O-RAN will reduce the need for human intervention in managing complex networks, which is essential for scaling up the number and size of O-RAN networks. It will also provide additional means to reduce cost through specific AI applications such as automatic energy saving and efficient resource utilization [17].

    • Energy Efficiency Improvement: The AI capability embedded in O-RAN will help reduce the overall operational energy consumption of the O-RAN system. Beyond software design that optimizes the network elements and control signal configuration, AI supports more flexible and fine-grained operation of network functions. For instance, the toggling off and on of carriers and cells in O-RAN can be conducted in the RIC in a non-real-time fashion [18].

In summary, the use of AI in O-RAN allows for more powerful automation, optimization, and insights compared to traditional RAN, making it a key enabler for the development and growth of the O-RAN market [19, 20, 21, 22, 23].

2.3 The challenges in enabling AI in O-RAN network

In this publication, in light of the authors’ engineering background, we focus on engineering challenges, notwithstanding the whole plethora of challenges related to legal, regulatory, business models, etc. As we progressed toward implementing AI in O-RAN, we encountered several hurdles and challenges:

  • Multi-vendor RAN Model: The multi-vendor model involves deploying and operating RAN equipment and software from different vendors within the same network. This raises significant challenges, especially in centralizing management and collecting data from these systems. Data integration becomes more difficult, delaying the development of AI applications. Additionally, having multiple O-RAN models can isolate each component, limiting the data available for building holistic applications.

  • Big Data Characteristics: O-RAN networks generate highly complex, diverse, and voluminous data, such as network performance data, configuration management data, fault management data, infrastructure data, and user equipment trace data. The big data characteristics of these data sources pose challenges in processing and analyzing information in real-time or in large volumes. Moreover, the varied structures and formats of data make integration into existing analytics platforms difficult. Consequently, many existing platforms may not meet the unique data requirements of O-RAN networks, slowing AI model development and complicating the integration of developed models with other O-RAN systems from different vendors.

  • Integration with Existing Systems: Building certain AI models with specific algorithms requires data from external sources not directly accessible to the O-RAN platform, such as UE and network functions of other domains such as transmission networks or core networks. This external data is essential for comprehensive analysis and characterization of performance across the overall network ecosystem but poses integration challenges.

  • Standardization: The lack of standardization in AI for O-RAN and RIC APIs presents significant challenges. There is no guarantee that developed xApps or rApps will be reusable across different RICs. Additionally, the industry is still in the early stages of establishing a centralized RIC system for the multi-vendor O-RAN model, further complicating standardization efforts.

  • Skilled Resources: Managing AI algorithms in O-RAN networks requires specialized skills and expertise, which may not be readily available in the market. A deep understanding of O-RAN is essential for those involved in AI development for O-RAN, making it difficult to find qualified personnel.

The primary challenge lies in the absence of suitable data analytics platforms capable of accommodating the distinct data needs of managing multiple O-RAN networks built using multiple O-RAN vendors. To overcome this obstacle, it is imperative to develop new data analytics platforms specifically designed to meet the unique data demands of O-RAN networks. These platforms must seamlessly integrate with existing O-RAN network systems and associated infrastructure components. They should be able to process and analyze vast quantities of complex and varied data to deliver both use cases that depend on real-time (or near-real-time) capabilities and use cases that are not real-time in nature, whilst facilitating the integration of data from diverse non-O-RAN sources.

Given that O-RAN is still an evolving technological concept, there is ample opportunity to contribute to the development of more robust systems. These challenges have driven us to create a unique cloud-native data analytics platform. Our goal is to directly address these difficulties and provide an environment that supports the development and execution of AI applications within the O-RAN ecosystem.

3 The proposed cloud-native open data platform

Before introducing the proposed cloud-native open data platform, we will present the problem statement around the management of multiple O-RAN network instances and platforms. Fig. 1 depicts a scenario where one communications service provider (CSP) implements multiple O-RAN network instances and platforms to support its ecosystem of mobile private networks (MPNs) and neutral host networks (NHNs). This scenario is based on real-world implementation of networks and services and it is not a theoretical exercise.

The software platform of O-RAN Vendor A, in light blue, has been selected to build a network system that meets the requirements of MPNs. Vendor B, in green, has been selected to build a system that delivers combined multi-operator RAN (MORAN) services in outdoor high-density demand (HDD) areas, implementing each mobile network operator (MNO) on its own dedicated virtualized network instance. Vendor C has been selected to build MORAN in-building coverage and capacity services and, similarly to the HDD use case, each MNO has been implemented on its own dedicated virtualized network instance. Other vendor platforms or software stacks might be chosen in the future to address the specific architectural and functional requirements dictated by new use cases.

There are other network components such as the IP network in dark blue, and IT & Security in yellow (which is shared by all vendors). Other network components are also used like routers, mobile apps, and IoT devices, each of which supports the operation of the network.

These networks are deployed using these different vendors’ systems and support multiple MNOs, in addition to our MPNs, and they are deployed in different geographical areas such as airports, offshore windfarms, stadiums, smart cities, and hospitals. Each use case deployment differs in terms of the vendor that is used, the internal design inside each vendor, the supported MNOs, and the supported network features. This creates a versatile design of networks that complies with the customer’s needs and use case requirements.

The authors of this paper have collaborated closely to tackle the unique challenges that arise in the development cycle of AI/ML models within these O-RAN ecosystems. So far, we have noticed a significant gap: there is a lack of a supportive platform for conducting data management and analytical processes across multiple-vendor-based O-RAN systems and for deploying AI models within them. This absence points to a crucial need for innovation and development in this area, highlighting an opportunity for us to contribute to bridging this gap and advancing the field. Accordingly, we designed and implemented a multi-vendor cloud native open data architecture, which is designed to address the workflow presented in Fig. 3.

The platform is used to centralize, normalize, and standardize the management of multi-vendor networks by integrating the data sources of each sub-ecosystem into a single overlaying data management system. This allows easy access to the integrated data sources via the inbuilt data pipelines that standardize the entire data analytics process, in contrast to adopting a case-by-case approach for each new network component or network system being deployed.

This novel approach permits splitting the data pipeline creation work to be performed by three different roles:

  • The RAN subject matter expert (SME), who understands the business problems to address and is able to describe them in terms of use cases. The SME details the purpose of the process under development and the topological nature and structure of the data, explains the rationale of the analytical process, and defines the criteria for validating the results and for what can be classified as a successful outcome from developing and implementing a data pipeline and analytical application.

  • The data engineer, who understands how to use the data management platform and its tools to create the data pipelines in collaboration with the SME and the data scientist.

  • The data scientist, who uses continuous integration and continuous deployment (CI/CD) and MLOps techniques to implement the AI/ML models that will be trained to deliver the desired outcomes as defined in collaboration with the SME.

In O-RAN or any operational RAN environment, acquiring, storing, and processing data for model training poses significant challenges. While standard interfaces like E2 defined in the O-RAN architecture provide access to components such as O-DU, O-CU, and others within the network ecosystem, the data retrieved is typically raw and lacks a standardized schema, rendering it unsuitable for direct consumption by AI algorithms.

To effectively leverage AI within O-RAN and its interfaces, a multi-stage process is necessary. Initially, raw data from various sources must be collected, validated, enriched, transformed, and consolidated into an integrated data pool. This prepares the data for processing by using data engineering techniques, such as applying business rules, calculating key performance indicators (KPIs), performing feature engineering, and linking data tables based on network topology mapping. These processes ultimately enable the application of algorithms tailored to specific use cases. Moreover, an O-RAN network is constructed on top of other system components, including IP networks and cloud server infrastructure. The operation and maintenance of these components are vital for overall network performance and should be seamlessly integrated into a holistic network management process that encompasses all system elements.
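The multi-stage process described above can be illustrated with a minimal sketch. It is not the platform's implementation: the record fields, the success-rate KPI, and the topology mapping are hypothetical and chosen only to show validation, enrichment, KPI calculation, and topology linking applied in sequence.

```python
# Hypothetical raw counter records as they might arrive from different sources.
RAW_RECORDS = [
    {"cell": "cell-1", "attempts": 200, "successes": 190},
    {"cell": "cell-2", "attempts": 0, "successes": 0},
    {"cell": "cell-3", "attempts": 50, "successes": 70},  # invalid: successes > attempts
    {"cell": "cell-4", "attempts": 100},                  # invalid: missing field
]

# Hypothetical network topology mapping (cell -> site).
TOPOLOGY = {"cell-1": "site-A", "cell-2": "site-A", "cell-4": "site-B"}


def validate(record):
    """Keep only records with all required fields and consistent counters."""
    required = ("cell", "attempts", "successes")
    if any(key not in record for key in required):
        return False
    return 0 <= record["successes"] <= record["attempts"]


def enrich(record, topology):
    """Attach the site from the topology mapping; unmapped cells become 'unknown'."""
    return {**record, "site": topology.get(record["cell"], "unknown")}


def success_rate_kpi(record):
    """Example business rule: success rate in percent, undefined when idle."""
    if record["attempts"] == 0:
        return None
    return 100.0 * record["successes"] / record["attempts"]


def run_pipeline(raw_records, topology):
    """Validate, enrich, and compute the KPI for each raw record."""
    prepared = []
    for record in raw_records:
        if not validate(record):
            continue  # rejected records would normally be logged or quarantined
        enriched = enrich(record, topology)
        enriched["success_rate_pct"] = success_rate_kpi(enriched)
        prepared.append(enriched)
    return prepared
```

Calling `run_pipeline(RAW_RECORDS, TOPOLOGY)` keeps only the two valid records, each carrying its site and KPI value, which is the kind of consolidated dataset a downstream model or report would consume.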

The primary goal of this platform is therefore to streamline and speed up the development and hosting of AI solutions within the O-RAN ecosystem. The distribution of effort across the stages of developing an AI-capable system to manage O-RAN networks, depicted in Fig. 3, shows that the data acquisition and mediation stage lies at the base of the pyramid and involves the biggest share of development effort. It is at this stage that collaboration between the three roles mentioned earlier is most intensive. If done correctly, the components developed at this stage underpin the work done at the other stages, reducing the effort spent at each of them. Ultimately, the AI model deployment stage will benefit from having all the required components readily available from the start.

This efficiency stems from the elimination of redundant work across multiple network deployments as data migrates to a centralized system that standardizes the data handling processes, enabling the adoption of uniform AI models across different systems provided by different vendors.

Figure 3: Cloud-native Data platform workflow with respect to the applied efforts.

Fig. 3 presents the key stages of AI model development, highlighting the effort involved in achieving the final product. The stages that follow data acquisition and mediation involve data storage and processing, which, while less effort-demanding, are vital for improving the performance of the data management process and for preparing datasets to meet the requirements defined by the SME for the analytical stages of the workflow. At the top of the pyramid sits the development of the AI models, built upon the foundations laid by previous stages. This structured approach ensures that developing AI models is more straightforward and effective, supported by the availability of high-quality, systematically organized data. This approach increases the usage of standardized components, facilitating the integration of new network instances and the adaptation of existing AI models.

Fig. 4 depicts the main layers and components of the data management system and its interfaces towards the network functions and management systems. The data mediation layer implements the tools to connect with the multiple data sources, collecting the data and performing initial processing to validate and enhance the quality of the datasets and to unify the data (files and/or streams) coming from multiple instances of the same data-source type into one coherent pipeline. The datasets are then stored in the data storage layer and/or streamed to upper layers such as the Data Virtualization and Processing Layer, the Application Layer, or the Data Visualization and Monitoring Layer. Once the data is cleansed, validated, and enriched, it is processed in the processing layer using big data execution engines or virtualization techniques, depending on the AI use case. The Policies, Control, and Management Layer contains information about the network topology, the data mapping, and the rules that are applied to the data in the processing layer to produce richer datasets, features, and KPIs for use in the AI Layer and/or the Visualization and Monitoring Layer, which acts as the single pane of glass. In the Visualization Layer, network engineers can access a set of reports and dashboards that combine multiple datasets from multiple data sources, providing information about network performance, configuration, and faults, and addressing analytical use cases such as service assurance and operational situational awareness.

Figure 4: Boldyn Networks cloud-native data analytics platform for O-RAN.

3.1 The platform architecture

Despite the layered depiction, the order or positioning of the layers does not necessarily indicate a hierarchy or sequence. Each layer can be accessed directly, independent of its position in the stack. The layers are broken down as follows:

3.1.1 Data collection agent (DCA)

A DCA is a self-built software application deployed across the network equipment. This software is developed to extract or generate data from a function or interface that is not readily available and is considered important for the implementation of one or a set of use cases.

3.1.2 Data acquisition and mediation layer

This is the layer where the heavy lifting is done to integrate all the deployed networks and supporting systems. The data is collected from all network functions and device data sources, including O-RAN, IoT devices, UEs, customer premise equipment (CPE), etc. This layer deals with data in various formats and structures and over various transports, such as CSV, JSON, XML, unstructured or semi-structured text, APIs, SFTP, streaming systems, SNMP, etc. It unifies the way data is provided to the next layers and creates a stream of coherent information from the disparate content provided by each data source. The data in this phase goes through initial processing to enhance its quality, such as parsing, enriching it with more information and a schema, transforming its format and content, and distributing it to the next layers. This layer is important for standardizing the data that is made accessible to other components of the data management platform.
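As an illustration of how disparate formats can be unified into one coherent record stream, the following sketch parses CSV, JSON, and XML payloads into a single common schema. The field names (`cell`, `metric`, `value`), the source labels, and the payload layouts are assumptions made for this example and do not reflect any vendor's actual export format.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def unify(cell, metric, value, source):
    """Build one record in the common (hypothetical) schema."""
    return {"cell": str(cell), "metric": str(metric),
            "value": float(value), "source": source}


def from_csv(text, source):
    reader = csv.DictReader(io.StringIO(text))
    return [unify(row["cell"], row["metric"], row["value"], source) for row in reader]


def from_json(text, source):
    return [unify(item["cell"], item["metric"], item["value"], source)
            for item in json.loads(text)]


def from_xml(text, source):
    root = ET.fromstring(text)
    return [unify(m.get("cell"), m.get("metric"), m.text, source)
            for m in root.iter("measurement")]


# One parser per input format; new formats are added by registering a parser.
PARSERS = {"csv": from_csv, "json": from_json, "xml": from_xml}


def mediate(payloads):
    """Turn (format, source, text) payloads into one list of unified records."""
    records = []
    for fmt, source, text in payloads:
        records.extend(PARSERS[fmt](text, source))
    return records


# Hypothetical payloads from three sources reporting the same metric.
SAMPLE_PAYLOADS = [
    ("csv", "vendor-a", "cell,metric,value\ncell-1,rsrp_dbm,-95.5\n"),
    ("json", "vendor-b", '[{"cell": "cell-2", "metric": "rsrp_dbm", "value": -101}]'),
    ("xml", "cpe",
     '<report><measurement cell="cell-3" metric="rsrp_dbm">-88.0</measurement></report>'),
]
```

After `mediate(SAMPLE_PAYLOADS)`, every record has the same keys and value types regardless of its origin, so later layers do not need format-specific logic.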

3.1.3 Data storage layer

The collected data is stored in this layer. A suitable storage system and table format are selected depending on the data volume and on how other components need to access the data. Data lakehouse technologies are employed for big datasets, and relational databases are used for smaller datasets and mapping information.

3.1.4 Data streaming layer

This layer delivers data to the application layer or processing layer in real time. This avoids the latency associated with storing data in the storage layer and subsequently retrieving it, which is important for online monitoring and decision-making use cases. An independent process stores this data in parallel, without impacting the performance of the streaming procedures. Many important applications that solve complex business problems rely on streamed data. In this work, streaming was implemented for an anomaly detection AI/ML model that requires a continuous, very low-latency data feed. This anomaly detection model is an example of an application where the data is processed in real time before even being stored in the data lake. Such use cases are important when we consider the near-RT RIC and RT-RIC, where the latency of the decision-making process must be kept in the order of milliseconds and data cannot be stored before being processed.

3.1.5 Data virtualization and processing layer

As data arrives at the storage layer, streaming layer, or both, it’s not immediately accessible for all types of processing. Here, two concepts come into play: the big data execution engine and data virtualization. Both share similarities in terms of data processing and access, but they differ in their applications. The execution engine, such as Spark, is used for complex computations on vast amounts of data to calculate network KPIs and perform feature engineering for AI. On the other hand, data virtualization simplifies data access, joining data from different sources like lakehouse objects and databases in a typical SQL-based manner. For instance, we use it to link the calculated KPIs with the network topology information and data from other sources to create views and reports that can be utilized by others in the upper layers of the data management platform.
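The SQL-based joining of calculated KPIs with topology information can be sketched as follows. This is only an illustration: an in-memory SQLite database stands in for the virtualization engine, and the table names, column names, and values are hypothetical.

```python
import sqlite3

# In-memory SQLite is used here purely as a stand-in for the virtualization layer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE kpi (cell_id TEXT, hour TEXT, avg_rsrp REAL);
CREATE TABLE topology (cell_id TEXT, site TEXT, sector INTEGER);
INSERT INTO kpi VALUES ('cell-1', '2024-01-01T10', -92.5), ('cell-2', '2024-01-01T10', -101.0);
INSERT INTO topology VALUES ('cell-1', 'site-A', 1), ('cell-2', 'site-A', 2);
""")

# Link each calculated KPI row to the site and sector it belongs to.
rows = conn.execute("""
SELECT t.site, t.sector, k.hour, k.avg_rsrp
FROM kpi AS k
JOIN topology AS t ON t.cell_id = k.cell_id
ORDER BY t.sector
""").fetchall()
```

In a real deployment the two tables would typically live in different systems (for example a lakehouse table and a relational database), which is precisely the situation data virtualization is meant to hide from the person writing the query.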

3.1.6 Policies, controls and management

This layer is used to implement business rules and relationships across the four data layers and the AI application layer. It manages O-RAN fault, configuration, accounting, performance, and security (FCAPS) [24] metadata, network topology, other data sources, performance alarm and complex event alarm definitions, AI rules and policies, network API access, and change activities on the network. In addition, it serves as a central mapping layer that standardizes data across different vendors into a common set of identifiers, promoting seamless data access and analysis. This improves data coherence and simplifies cross-network operations.
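The central mapping role can be pictured with a small sketch. The vendor names, vendor-specific field names, and common identifiers below are invented for illustration and do not reflect any actual vendor schema.

```python
# Hypothetical vendor-specific field names mapped to one common identifier set.
FIELD_MAP = {
    "vendor_a": {"cellId": "cell_id", "rsrpDbm": "rsrp_dbm"},
    "vendor_b": {"CELL": "cell_id", "RSRP": "rsrp_dbm"},
}

def standardize(record, vendor):
    """Rename known vendor fields to common identifiers and drop unmapped ones."""
    mapping = FIELD_MAP[vendor]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

a = standardize({"cellId": "c1", "rsrpDbm": -90}, "vendor_a")
b = standardize({"CELL": "c2", "RSRP": -97, "extra": 1}, "vendor_b")
```

After standardization, records from both vendors expose the same keys, so a single query or model feature definition can be applied to either.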

3.1.7 Application layer

This is the layer where the data-driven application lifecycle is completed. For AI applications, this is where SMEs and data scientists work together on training, testing, validating, and publishing an AI product.

3.1.8 Data visualisation layer

This layer is mostly dedicated to implementing business intelligence functions that offer SMEs a set of visualization artefacts, such as reports and dashboards, that combine data from multiple sources and at different stages of processing, organized as per the use case definition. It can be described as the visual interface for monitoring the overall system performance and providing situational awareness about network operational and maintenance priorities. This layer also implements an interface between the engineer and the AI models by reporting the actions taken by the AI models during their regular operations. More details about this layer can be found in [25].

3.2 Features of this platform

The platform is designed to offer flexibility in working with O-RAN data and includes the following features and benefits:

  • Unification: The platform can collect, store, and process all types of data from various sources, regardless of volume, velocity, and variety, thanks to its modern open data architecture. This feature contributes to:

    • The development of batch and real-time processing for network applications.

    • Enabling data analysts to process and access data for advanced analytics tasks.

    • Supporting the development and deployment of AI models, paving the way for advanced federated learning in multi-vendor networks.

    • Handling operational needs ranging from reporting of complex processed metrics, performance, or fault events to interactive visualization with these dashboards and data cubes.

  • Cost-effective: Built on commodity servers, the platform leverages Kubernetes (K8s) container orchestration and other tools to simplify management, reducing the resources required. In addition, using an open data architecture and data lakehouse technology reduces costs by consolidating data, optimizing query performance, reducing bandwidth usage, leveraging scalable cloud storage, utilizing cost-effective open-source tools, and improving collaboration and usability.

  • Automation and Standardization: Using DevOps and GitOps [26] technologies, all infrastructure, setup, data, and AI pipelines are fully automated. This reduces the operational workload for development and deployment, making the development cycle predictable. Business owners can collaborate more effectively with developers to create solutions, ultimately reducing the resources needed.

  • Scalability: As the number of deployed O-RAN networks increases, the platform’s scalability is crucial. It has been thoroughly evaluated to ensure it meets the increased demands for AI, data processing, and storage.

3.3 Platform core components

Figure 5: Boldyn Networks cloud-native data analytics core components. This figure shows the main components related to the infrastructure, operating system, K8s management, and data core components.

Fig. 5 depicts the architectural building blocks of the cloud-native data management platform. This platform is further detailed as follows.

The first tier, following a bottom-up order, contains the K8s infrastructure and its automation toolset. We use Terraform to define and provision our infrastructure as code, enabling efficient management and automation of the infrastructure resources that run the operating system, Talos OS (the second tier of the architecture). Talos OS is a modern, Linux-based operating system specifically designed for K8s. It provides a secure, minimal, and immutable platform, and its major feature is that it automates and simplifies the management and operation of K8s clusters. On top of this, K8s is installed to deploy the microservices and applications that implement the K8s cluster management functions and the data management application systems (the third tier). The following applications and services are implemented in the third tier:

3.3.1 Data management applicational systems

  • Apache NiFi is the core of the Acquisition and Mediation layer and is used to automate and manage the flow of data between systems where the data sources are located (across the multiple platforms and their components) and other applications and systems of the data management platform that consume these data. Therefore, it is the main application used to implement the collection and mediation layer of the platform.

  • Apache Kafka is the core streaming system and is used to transfer data in real-time from the source to the processing layer.

  • In the storage layer, we use the object storage MinIO, which provides an S3-compatible API. Depending on the data use case, we store the data in one of the open table formats, such as Hudi, Delta Lake, or Iceberg, or keep it in its raw form.

  • The data is then accessible to Apache Spark, which performs complex big data processing on data from the streaming layer, the data lakehouse, databases, or all of them. Trino, on the other hand, is a virtualization system used to run SQL queries on any dataset available anywhere in the platform.

  • In the application layer, we use Python, Jupyter Notebook, and MLflow to train, test, and validate the AI module before publishing its micro-service deployment in the processing layer. We also use DBT for version-controlled analytics workflows.

  • In the control and management layer, we use Apicurio as a schema registry, Hive Metastore to store data catalogues and metadata, and PostgreSQL to store the network topology, rules, and alarm triggers.

  • The last stage of the data path is Elasticsearch and Kibana, where we visualize data from different sources together with the applied AI actions and results in one dashboard that serves as the single pane of glass of the platform.

3.3.2 K8s cluster management tools

There are other systems mentioned in the management toolbar. These tools are used to support the data tools in the data layers.

  • Longhorn & Rancher: for K8s and cloud-native storage (data layer for applications) management.

  • Prometheus and Grafana: for platform monitoring and alerting.

  • NeuVector and Kube-bench: for security and vulnerability checks.

  • Velero and Fluentd: for backup and log management.

  • Argo CD and GitHub Actions: for CI/CD and GitOps.

3.3.3 CI/CD pipeline

The development of data pipelines on this architecture can be a complex, tedious, and error-prone process if done manually. A set of tools and systems are used to automate the development and deployment of these pipelines. GitOps methods, such as CI/CD, are used to simplify, automate and manage this process.

Fig. 6 shows an example of one CI/CD process implemented to develop and deploy the data pipeline of one use case addressed by the platform, providing details about every data engineering and AI model preparation task involved.

The CI/CD cycle of the pipeline starts by developing the source code and publishing it to the GitHub repository, which triggers a set of automated workflows that check the quality of the code and perform security scans. In the next step, the code is built into Docker images, a process known as containerization. All the Docker images then undergo security vulnerability scanning before being stored in the Docker registry. The CI process is completed when all the deployment manifests are updated with the new Docker image. The final stage of the cycle is the CD process, which uses Argo CD to automatically fetch the latest changes in the deployment manifests and deploy the new application.

Figure 6: Automation pipelines for the AI and related data processes in the platform.

4 The problem statement and the proposed solution

4.1 Problem statement

Fig. 7 illustrates the use case of deploying a commercial O-RAN-based MPN in an offshore location, serving the UEs located in vessels that navigate around the windfarm, attending to the wind turbines for infrastructure operation and maintenance activities. These vessels spend most of their time navigating close to the windfarm and carry tens of people who work at sea for periods of 15 days. These people therefore rely on this connectivity to do their work, to communicate with their colleagues onshore and with their families, and for entertainment. The connectivity provided by this network also supports business- and operation-critical processes of the organizations responsible for operating and maintaining the network, across an area of around 300 km². Many factors, such as weather conditions, sea conditions, distance to the site, and MPN operational faults, may affect the quality of the connectivity service as experienced by the end-users. This can impair the ability to run business- and operation-critical processes according to their requirements, or the ability of people to work productively in their floating offices. Despite these challenges, the MPN’s network operator is responsible for managing, operating, and maintaining the network’s performance as per the contracted service level agreement.

Figure 7: Illustration of the real-time offshore 5G network deployment using O-RAN and the challenges in managing coverage at sea.

The problem addressed by the solution described in this paper occurred in an MPN service deployed by Boldyn Networks in a windfarm located in the North Sea. This network is composed of 3 cellular sites with 3 sectors each and one cellular carrier per sector, totaling 9 macro cells that cover the whole extent of the windfarm. Each cell operates in LTE band 3 (B3) with a 20 MHz channel. The cellular sites are installed across the windfarm in three different turbines, where commercial-off-the-shelf (COTS) servers run the containerized O-DU and three O-RUs are installed to implement one sector each. The O-DUs deployed offshore connect, via dark fibre lit using long-haul SFPs, to an infrastructure of COTS servers that run the O-CU container. Each relevant vessel’s network, on the other hand, is composed of a Wi-Fi network that provides IP connectivity across the vessel. This Wi-Fi network connects to Boldyn’s customer premise equipment (CPE), which acts as an LTE broadband router back-hauling the vessel’s IP traffic to the internet and to the MPN customer’s enterprise network. The CPE is installed inside the vessel, near the bridge, and is connected to external 4x4 multiple-input-multiple-output (MIMO) antennas that increase the coverage and capacity of the network. Each CPE has two modems for connection resilience and traffic load balancing, with one MPN SIM inserted in each modem. The problem resides in the connection recovery procedure implemented by the CPE when one of the modems drops its link to the LTE network. This procedure takes 5 minutes to re-establish an LTE connection to the macro network, which is a long period of time and, in some conditions, increases the likelihood of both modems being disconnected at the same time. In studying the problem, the following was found:

  • These disconnections weren’t always correlated with the performance of the network, the radio link conditions, or the distance to the site.

  • A drift in radio link performance could be observed between the two modems of the CPE, verified by monitoring indicators such as latency and throughput, even though most of the time they were connected to the same cell and benefited from similar radio conditions.

  • Updating the configuration of the modem on the CPE via its API forced the modem connection to restart and establish a new radio link, improving the performance of the modem in all the indicators.

  • Restarting the modem is much quicker than re-connecting after a radio link drop, and has the additional advantage that it can be done pre-emptively whilst the other modem is showing good performance.

  • Automating this process of analysis and decision-making could significantly reduce the number of disconnections, thus having a great impact on the quality of experience (QoE) of the users served by the network.

The automation of this process enabled us to develop a self-healing mechanism based on an ML model that detects this type of performance anomaly in the modem. The model was trained on available historical performance data merged with the timestamps at which actions were taken to resolve the issue. This produced a labeled dataset that captures the relationship between the behavior patterns of the relevant performance indicators and the need for a decision-making action. The relevant indicators included reference signal received power (RSRP), reference signal received quality (RSRQ), IP packet data latency, timestamp, location, and the cell to which the modem is connected, along with others from different parts of the network.

The data is collected from the CPE device via its API every 5 seconds, then prepared, enriched, and streamed to the application layer, where it is consumed by the ML model for online decision-making and direct provisioning. This process benefited from all the features and capabilities offered by the data management platform.

4.2 The proposed solution

Network anomaly detection or prediction is a complicated task. Anomalies are patterns in data that do not conform to a well-defined notion of normal behavior. They can be classified into three types: (1) a point anomaly is a particular data instance that deviates from the normal pattern of the dataset; (2) a contextual anomaly is a data instance that behaves anomalously in a particular context; (3) a collective anomaly is a collection of similar data instances that behaves anomalously with respect to the entire dataset [27]. Existing anomaly detection methods include neural networks (NNs), support vector machines, rule-based classification, statistical signal processing methods, and clustering. In this case, the prediction task is based on the historical records of network states and automated actions, so it can be regarded as a point anomaly problem, and an NN is the preferred method. Moreover, to exploit the temporal correlations of the historical samples, LSTM is the selected learning model.

4.3 Data source

This case study has benefited from utilizing data from a real-life network based on O-RAN standards. The O-CU, O-DU, and O-RU provide FCAPS data, which describes network performance and operational behavior. This FCAPS data is a fundamental building block for the AI model’s development cycle, playing a crucial role in the training and validation process. Once deployed, the AI model uses this data for inference. Another important data source comes from the CPE equipment in the vessel, which provides measurements of the modem’s performance and its location as a time series, mapped with the FCAPS information collected from the O-RAN network.

The CPEs are at sea, and their location is monitored continuously. A couple of months of this data is shown in Fig. 8 (the latitude and longitude values are removed to protect the users’ privacy). This data shows records of vessels connected to the network even when they are outside the coverage area. It contains many outliers that need to be removed, specifically those indicating locations far from the nearest site.

Figure 8: Visualisation for the location of the vessel from the data used in the AI model layered on top of the predicted network coverage.
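The removal of location outliers far from the nearest site could be implemented along the following lines. This is a minimal sketch under stated assumptions: the site coordinates, the vessel records, and the 20 km cut-off are made up for illustration, and the great-circle (haversine) distance is one reasonable choice of distance measure rather than the method confirmed by this work.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_coverage(record, sites, max_km):
    """Keep a record only if its nearest site is at most max_km away."""
    nearest = min(
        haversine_km(record["lat"], record["lon"], s_lat, s_lon)
        for s_lat, s_lon in sites
    )
    return nearest <= max_km

# Made-up coordinates; the real site and vessel positions are not published.
sites = [(54.00, 2.00), (54.05, 2.10), (54.10, 2.05)]
records = [
    {"id": 1, "lat": 54.02, "lon": 2.03},   # a few kilometres from a site
    {"id": 2, "lat": 55.50, "lon": 3.50},   # far outside the assumed coverage
]

# Hypothetical threshold of 20 km to the nearest site.
kept = [r for r in records if within_coverage(r, sites, max_km=20.0)]
```

The threshold would in practice be chosen from the predicted coverage footprint rather than fixed in advance.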

4.4 LSTM model design

Table 1: Template for some of the exported network state data

| WAN ID | Carrier | LTE-RSRP | LTE-SINR | LTE-RSRQ | Latency | Time | Latitude | Longitude |

Figure 9: Data pre-processing and labelling scheme.

Figure 10: The architecture of the neural network model from input to output (Input → LSTM → FCN → FCN → Output). Each block represents a stage in the process, with arrows indicating the flow of data.

The LSTM model is developed offline and deployed online. The historical data stored by the platform is used for the model’s training. Once the trained model meets the training criteria, the neural model is deployed in a further step. The historical data includes two parts: the network state data and the corresponding actions. Actions here are operations recorded manually by the engineers when the QoS is lower than the threshold. Some of the network state data is exported using the template shown in Table 1, where WAN ID indicates the ID of the device that needs an action; Carrier is the carrier frequency; LTE-RSRP, LTE-SINR, LTE-RSRQ, Latency, Latitude, and Longitude are UE-related information; and Time refers to the timestamp of the record.

The next anomaly must be forecast from the hidden patterns in the historical network state records, so a reasonable solution is to formulate a sequence prediction task, in which multiple discrete records at sequential timestamps are concatenated into one training sequence. These training sequences are then labelled using the manual operations: a training sequence is labelled ‘0’ if an action is needed; otherwise, the label is ‘1’.

Of the features in Table 1, WAN ID, LTE-RSRP, LTE-SINR, LTE-RSRQ, and Latency are among the features taken for training, while Carrier, Latitude, and Longitude are removed because they are unlikely to be correlated with the occurrence of the anomaly. The recording interval of the raw data is 5 seconds. Two months of records were stored and then processed in the cloud-native data platform. The data pre-processing is illustrated in Fig. 9.

First, some records are missing from the raw dataset, so the process pads these missing values and removes the records that are obviously out of the normal range. Then, as a de-noising step, the data is downsampled to $1/N$ of its initial size, and every $L$ samples are assembled into one training sequence. Lastly, the training sequence is labeled. It is worth mentioning that the labeling criteria are the manual actions taken by the engineer, which may lag the first occurrence of the anomaly by 5 to 20 minutes. Therefore, 10 minutes is taken as the average delay applied to data labeling. The training sequence is a matrix of size $1\times 5\times L$, and the labels are binary values. In this use case, after validation, $N=2$ and $L=6$ are reasonable options.
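The pre-processing steps above can be sketched in plain Python as follows. Several details are assumptions made only for this illustration: missing values are padded by forward filling, the valid ranges in `bounds` are indicative, sequences are non-overlapping, and a sequence is labelled ‘0’ when the estimated anomaly onset (manual action time minus the 10-minute average delay) falls inside its time span. The actual padding rule, ranges, and windowing used in this work may differ.

```python
ACTION_LAG_S = 10 * 60   # average labelling delay taken from the text
N = 2                    # downsampling factor from the text
L = 6                    # samples per training sequence from the text

def forward_fill(rows):
    """Pad missing feature values (None) with the last seen value (assumed rule)."""
    filled, last = [], None
    for t, feats in rows:
        if last is not None:
            feats = [f if f is not None else p for f, p in zip(feats, last)]
        if any(f is None for f in feats):
            continue  # nothing to pad from yet, drop the record
        filled.append((t, feats))
        last = feats
    return filled

def in_range(feats, bounds):
    """True when every feature lies inside its (low, high) bound."""
    return all(lo <= f <= hi for f, (lo, hi) in zip(feats, bounds))

def build_sequences(rows, bounds, action_times):
    rows = [r for r in forward_fill(rows) if in_range(r[1], bounds)]
    rows = rows[::N]                                   # keep every N-th record
    onsets = [a - ACTION_LAG_S for a in action_times]  # estimated anomaly onsets
    sequences = []
    for i in range(0, len(rows) - L + 1, L):           # non-overlapping windows
        window = rows[i:i + L]
        start, end = window[0][0], window[-1][0]
        needs_action = any(start <= o <= end for o in onsets)
        label = 0 if needs_action else 1               # label convention from the text
        sequences.append(([f for _, f in window], label))
    return sequences

# Synthetic records every 5 s: [WAN ID, RSRP, SINR, RSRQ, latency].
rows = [(5 * i, [1.0, -95.0, 12.0, -10.0, 40.0]) for i in range(24)]
rows[4] = (20, [1.0, None, 12.0, -10.0, 40.0])         # missing RSRP, to be padded
bounds = [(0, 10), (-140, -44), (-20, 30), (-20, -3), (0, 1000)]  # indicative ranges
action_times = [700]   # manual action at t = 700 s, so estimated onset at t = 100 s
sequences = build_sequences(rows, bounds, action_times)
```

Here each sequence is stored time-major, as $L$ rows of 5 features; transposing it gives the $5\times L$ layout mentioned in the text.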

The architecture of the NN model is depicted in Fig. 10, wherein the LSTM module inherits from the LSTM class in PyTorch, the input size equals $1\times 5\times 6$, and the hidden size equals 2. Two linear layers follow the LSTM module, and the output indicates whether to take an action or not. The last layer adopts the sigmoid activation function.
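A minimal PyTorch sketch of such a network is given below (it requires the `torch` package). It is not the authors’ implementation: the text fixes only the 5 input features, the sequence length of 6, the hidden size of 2, the two linear layers, and the final sigmoid. The width of the intermediate linear layer (`fc_width`), the ReLU between the linear layers, the use of the last time step’s hidden output, and the class name are assumptions introduced for this example.

```python
import torch
from torch import nn

class LSTMAnomalyClassifier(nn.Module):
    """LSTM followed by two linear layers and a sigmoid output (illustrative)."""

    def __init__(self, n_features=5, hidden_size=2, fc_width=8):
        super().__init__()
        # batch_first=True means inputs are shaped (batch, sequence, features).
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size, batch_first=True)
        self.fc1 = nn.Linear(hidden_size, fc_width)   # fc_width is an assumed value
        self.fc2 = nn.Linear(fc_width, 1)

    def forward(self, x):
        out, _ = self.lstm(x)            # out: (batch, L, hidden_size)
        last = out[:, -1, :]             # hidden output at the final time step
        h = torch.relu(self.fc1(last))   # ReLU is an assumption, not from the text
        return torch.sigmoid(self.fc2(h)).squeeze(-1)

model = LSTMAnomalyClassifier()
batch = torch.randn(4, 6, 5)                   # 4 random sequences, L = 6, 5 features
probs = model(batch)                           # one value in (0, 1) per sequence
targets = torch.tensor([1.0, 1.0, 0.0, 1.0])   # '0' = action needed, '1' = normal
loss = nn.BCELoss()(probs, targets)            # binary cross entropy on the sigmoid output
```

With this layout, each $5\times 6$ training matrix is transposed to 6 time steps of 5 features before being passed to the model.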

For the model’s training, the dataset is divided into training and validation sets according to the 80%/20% rule. The loss function is the binary cross entropy (BCE) [28]. The training runs on two RTX 2080Ti GPUs, the batch size is 2048, and Adam is used as the optimizer. In the batch sampling process, a weighted batch sampler is used because the dataset is imbalanced: labels ‘1’ far outnumber labels ‘0’. The training set is used to train the LSTM model, and the test set is used to evaluate the trained model. In the training stage, the training data is shuffled before batch sampling. Fig. 11 shows the model’s accuracy on the training and test sets. Convergence is reached after 100 epochs.
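The idea behind weighted batch sampling can be illustrated in plain Python. The sketch assumes inverse class-frequency weights and sampling with replacement, which is a common way to balance batches; the exact weighting used in this work is not stated, and the 950/50 label split below is synthetic.

```python
import random
from collections import Counter

def inverse_frequency_weights(labels):
    """Give each sample a weight of 1 / (number of samples sharing its label)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]

def sample_batch(labels, batch_size, rng):
    """Draw sample indices with replacement, proportionally to their weights."""
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=batch_size)

# Synthetic imbalanced labels: '1' (normal) far outnumbers '0' (action needed).
labels = [1] * 950 + [0] * 50
rng = random.Random(0)
batch_idx = sample_batch(labels, batch_size=2048, rng=rng)
share_of_zeros = sum(labels[i] == 0 for i in batch_idx) / len(batch_idx)
```

Because each class receives the same total weight, roughly half of every batch is drawn from the minority class, even though it makes up only 5% of this synthetic dataset. In PyTorch, `torch.utils.data.WeightedRandomSampler` provides comparable behaviour when given per-sample weights.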

Figure 11: The model accuracy of the training and test set.

4.5 Model deployment and validation

The real-world deployment is relatively straightforward using the Boldyn data analytics platform. As illustrated in Fig. 4, the development happens in its application layer. For validation and deployment, the model is packaged as a Docker image, integrated into the DevOps pipeline, and hosted directly in the processing layer. This microservice continuously receives the real-time data and feeds it to the LSTM model. The generated actions trigger changes in the network. After deploying the model in production, complaints and network disconnections decreased significantly, to an acceptable level. Table 2 shows the improvement in service performance, which validates the model in production.
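The way such a microservice could feed a continuous stream to the model can be sketched with a sliding window. Everything below is illustrative: `SlidingWindowDetector`, `stub_predict`, the 0.5 decision threshold, and the two-feature records are hypothetical, and the stub stands in for the trained LSTM.

```python
from collections import deque

class SlidingWindowDetector:
    """Buffers the latest records and queries the model once the window is full."""

    def __init__(self, predict, window=6, threshold=0.5):
        self.predict = predict
        self.buffer = deque(maxlen=window)   # the oldest record is dropped automatically
        self.threshold = threshold

    def push(self, features):
        """Add one record; return True when the model output calls for an action."""
        self.buffer.append(features)
        if len(self.buffer) < self.buffer.maxlen:
            return False                     # not enough history yet
        score = self.predict(list(self.buffer))
        return score < self.threshold        # outputs near '0' mean action needed

def stub_predict(window):
    """Stand-in for the trained LSTM: low score when the mean latency is high."""
    mean_latency = sum(f[-1] for f in window) / len(window)
    return 0.1 if mean_latency > 200 else 0.9

detector = SlidingWindowDetector(stub_predict)
# Synthetic stream of [RSRP, latency] records: six healthy, then six degraded.
stream = [[-95.0, 40.0]] * 6 + [[-95.0, 900.0]] * 6
decisions = [detector.push(record) for record in stream]
```

In this toy stream the first positive decision appears at the eighth record, once enough degraded samples have entered the window to push the mean latency above the stub’s limit.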

Table 2: Network performance before and after model deployment (measured weekly)

| | One Modem Disconnected | Two Modems Disconnected | Network Disconnectivity | Manual Actions Taken | Automated Actions Taken | Response Time | Complaints per Month |
| Before LSTM | 170 | 25 | 25 | 25 | 0 | 30 min | 20 |
| After LSTM | 20 | 1 | 1 | 1 | 170 | 10 s | 2 |

It is evident that connectivity significantly improved due to proactive measures taken when the model identifies potential connection drops. This has effectively prevented simultaneous loss of connection for both modems. Occasionally, such drops occur concurrently due to weather conditions or specific spatial factors, albeit infrequently and under unique circumstances.
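The relative improvements implied by Table 2 can be worked out directly from its values. The short calculation below uses only the figures reported in the table; the dictionary keys are labels chosen for this example.

```python
# Values copied from Table 2 (before and after the LSTM deployment).
before = {
    "one_modem_disconnected": 170,
    "two_modems_disconnected": 25,
    "manual_actions": 25,
    "complaints_per_month": 20,
}
after = {
    "one_modem_disconnected": 20,
    "two_modems_disconnected": 1,
    "manual_actions": 1,
    "complaints_per_month": 2,
}

# Percentage reduction per indicator, rounded to one decimal place.
reduction_pct = {
    k: round(100 * (before[k] - after[k]) / before[k], 1) for k in before
}

# Response time went from 30 minutes to 10 seconds.
response_speedup = (30 * 60) / 10
```

This gives reductions of about 88% in single-modem disconnections, 96% in simultaneous disconnections and in manual actions, and 90% in monthly complaints, with the response time shortened by a factor of 180.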

Due to the innovative nature of this research and the emerging O-RAN technology, there is currently no publicly available dataset that closely matches the specific requirements of our study. Consequently, the data used in this research is proprietary and was collected from real-world deployments by Boldyn Networks. This limitation underscores the need for future efforts to develop and share standardized datasets to facilitate broader validation and comparison of AI models in similar contexts.

5 Discussions and future works

Figure 12: (a) The centralized AI approach, where the data from all the vendor systems and the network is centralized in the platform and the AI model is then trained, tested, and validated for each specific O-RAN vendor. (b) The potential federated learning approach, where the data is kept in each vendor system, the AI model is trained there, and only the global parameters are shared with other deployments. This approach is important when privacy is key.

The deployment of the AI-driven application in the proposed cloud-native data analytics platform for an offshore O-RAN network demonstrated significant improvements in connectivity and operational efficiency. The use of LSTM models for real-time anomaly detection effectively reduced network disconnections, enhancing the user experience. These results highlight the potential of AI in managing complex network environments, particularly in challenging offshore settings.

Deploying consistent AI models across complex, multi-vendor network environments remains a significant challenge, particularly when networks span diverse regions with varying system configurations and architectures. As shown in Fig. 12(a), centralized AI models, while effective within single-vendor environments, struggle to maintain adaptability and efficiency across different network setups. This underscores the need for more flexible approaches that can ensure re-usability and privacy compliance in such diverse contexts.

To address these challenges, we propose Federated Learning (FL) as a promising solution. FL enables the training of AI models using local data from each deployment location, thus preserving data privacy and enhancing the model’s applicability across different environments. This decentralized approach ensures that AI models can be deployed consistently and efficiently, even in multi-vendor, multi-region scenarios.
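One common way to combine locally trained models in FL is federated averaging, in which the server averages the clients’ parameters weighted by their local dataset sizes. The sketch below shows that aggregation step only; it is an assumption for illustration, since the aggregation rule to be used in the planned work is not specified, and the parameter vectors and sample counts are made up.

```python
def federated_average(client_weights, client_sizes):
    """Average parameter vectors, weighting each client by its number of samples."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Made-up parameters from two deployments; only these leave each site, not the data.
site_a = [0.2, -1.0]
site_b = [0.6, 1.0]
global_weights = federated_average([site_a, site_b], client_sizes=[300, 100])
```

With these numbers the aggregate is approximately `[0.3, -0.5]`, closer to `site_a` because it contributes three times as many samples. The raw measurements never leave the individual deployments, which is the property that makes the approach attractive when privacy is key.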

Our future work will focus on implementing the FL approach across various offshore networks managed by different O-RAN vendors (Fig. 12(b)). By doing so, we aim to validate the effectiveness of FL in ensuring consistent performance and privacy compliance across diverse network deployments. The future work will also explore potential optimizations to further enhance the scalability and efficiency of FL in real-world applications, particularly in environments with highly heterogeneous systems and vendor setups.

6 Conclusions

RAN plays a critical role in modern telecom infrastructure, evolving towards disaggregated and open architectures like O-RAN. These innovations enable the integration of intelligent, data-driven applications to enhance network reliability and operational autonomy. However, the operation of O-RAN networks poses challenges due to immature real-world practices and complexities in managing data and applications across diverse vendor systems.

Boldyn Networks has developed a novel AI-driven cloud-native data analytics platform to address these challenges. Tested with advanced LSTM models for real-time anomaly detection, the platform significantly improves operational efficiency and enhances customer experience. Leveraging DevOps practices and tailored data lakehouse architectures for AI applications, it exemplifies sophisticated data engineering strategies.

The deployment of this platform in an offshore O-RAN network demonstrated significant improvements in connectivity and operational efficiency, validating the model’s effectiveness. However, the reliance on proprietary data highlights the need for standardized datasets to facilitate broader validation and comparison of AI models. Future research should explore the scalability of such AI-driven solutions across diverse, multi-vendor network environments. Implementing FL could ensure consistent AI model performance while preserving data privacy across different regions and system configurations.

This platform demonstrates significant potential for advancing in-RAN AI development. We aim to contribute to the community’s understanding of the complex challenges in this domain and to the implementation of solutions to them, fostering innovations and improvements.

Acknowledgment

The authors would like to sincerely thank the following individuals from Boldyn Networks for their invaluable contributions to this paper: Sean Keating, Chief Technology Officer UK & Ireland, for his managerial support; Andrew Conway, Group Director Technology Strategy, Donal O’Sullivan, Head of Product Innovation, and David Kinsella, RAN Solutions Architect, for their technical review of the paper; and Menglin Yao, Data & Software Engineer, and Michael Waldron, DevOps Engineer, for their platform operation and technical support. Their review, constructive comments, and support were instrumental in the development and completion of this work.

Authors Contributions

  • Abdelrahim Ahmad: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing - original draft.

  • Peizheng Li: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing - original draft.

  • Robert Piechocki: Project administration, Supervision, Writing - review & editing.

  • Rui Inacio: Conceptualization, Methodology, Validation, Writing - review & editing.

References

  • [1] M. Yang et al., “OpenRAN: A Software-Defined RAN Architecture via Virtualization,” SIGCOMM Comput. Commun. Rev., vol. 43, pp. 549–550, Aug. 2013.
  • [2] P. Li et al., “A Digital Twin of the 5G Radio Access Network for Anomaly Detection Functionality,” in Proc. of IEEE ICNP, IEEE, 2023.
  • [3] M. Alavirad et al., “O-RAN architecture, interfaces, and standardization: Study and application to user intelligent admission control,” Front. Commun. Netw., vol. 4, 2023.
  • [4] M. Polese et al., “Understanding O-RAN: Architecture, Interfaces, Algorithms, Security, and Research Challenges,” IEEE Commun. Surv. Tutor., 2023.
  • [5] A. M. Nagib, H. Abou-Zeid, and H. S. Hassanein, “Safe and Accelerated Deep Reinforcement Learning-Based O-RAN Slicing: A Hybrid Transfer Learning Approach,” IEEE J. Sel. Areas Commun., vol. 42, no. 2, pp. 310–325, 2024.
  • [6] S. K. Singh, R. Singh, and B. Kumbhani, “The Evolution of Radio Access Network Towards Open-RAN: Challenges and Opportunities,” in Proc. of IEEE WCNCW, pp. 1–6, IEEE, 2020.
  • [7] A. Lacava et al., “Programmable and Customized Intelligence for Traffic Steering in 5G Networks Using Open RAN Architectures,” arXiv preprint arXiv:2209.14171, 2022.
  • [8] D. Kreuzberger, N. Kühl, and S. Hirschl, “Machine Learning Operations (MLOps): Overview, Definition, and Architecture,” IEEE Access, 2023.
  • [9] M. Armbrust et al., “Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics,” in Proc. of CIDR, vol. 8, p. 28, 2021.
  • [10] C. Ebert et al., “DevOps,” IEEE Software, vol. 33, no. 3, pp. 94–100, 2016.
  • [11] Faisal, “RAN Vs Cloud RAN Vs VRAN Vs O-RAN: A Simple Guide!,” Apr 2021.
  • [12] O-RAN Alliance, “O-RAN.WG4.CTI-TCP.0-R003-v04.00: Cooperative Transport Interface - Transport Control Plane Specification,” tech. rep., O-RAN Alliance, 2023. Accessed: 25 July 2024.
  • [13] O-RAN Alliance, “O-RAN.WG1.OAD-R003-v12.00: O-RAN Architecture Description,” tech. rep., O-RAN Alliance e.V., 2024. Accessed: 25 July 2024.
  • [14] A. Aijaz et al., “Open RAN for 5G Supply Chain Diversification: The BEACON-5G Approach and Key Achievements,” in Proc. of IEEE CSCN, pp. 1–7, 2023.
  • [15] M. Q. Hamdan et al., “Recent Advances in Machine Learning for Network Automation in the O-RAN,” Sensors, vol. 23, no. 21, 2023.
  • [16] H. Erdol et al., “Federated Meta-Learning for Traffic Steering in O-RAN,” in Proc. of IEEE VTC2022-Fall, pp. 1–7, IEEE, 2022.
  • [17] S.-P. Yeh et al., “Deep Learning for Intelligent and Automated Network Slicing in 5G Open RAN (ORAN) Deployment,” IEEE Open J. Commun. Soc., vol. 5, pp. 64–70, 2024.
  • [18] L. Kundu, X. Lin, and R. Gadiyar, “Towards Energy Efficient RAN: From Industry Standards to Trending Practice,” arXiv preprint arXiv:2402.11993, 2024.
  • [19] B. Balasubramanian et al., “RIC: A RAN Intelligent Controller Platform for AI-Enabled Cellular Networks,” IEEE Internet Computing, vol. 25, no. 2, pp. 7–17, 2021.
  • [20] L. Bonati, M. Polese, S. D’Oro, S. Basagni, and T. Melodia, “OpenRAN Gym: AI/ML development, data collection, and testing for O-RAN on PAWR platforms,” Comput. Netw., vol. 220, p. 109502, 2023.
  • [21] B.-S. P. LinI, “Toward an AI-enabled O-RAN-based and SDN/NFV-driven 5G& IoT network era,” Netw. Commun. Technol., vol. 6, no. 1, p. 6, 2021.
  • [22] S. Soltani et al., “Can Open and AI-Enabled 6G RAN Be Secured?,” IEEE Consum. Electron. Mag., vol. 11, no. 6, pp. 11–12, 2022.
  • [23] P. Li et al., “Transmit Power Control for Indoor Small Cells: A Method Based on Federated Reinforcement Learning,” in Proc. of IEEE VTC2022-Fall, pp. 1–7, IEEE, 2022.
  • [24] International Telecommunication Union, “ITU-T Recommendation M.3010: Principles for a Telecommunications Management Network,” tech. rep., International Telecommunication Union (ITU), 2024. Accessed: 25 July 2024.
  • [25] P. Li et al., “RLOps: Development Life-Cycle of Reinforcement Learning Aided Open RAN,” IEEE Access, vol. 10, pp. 113808–113826, 2022.
  • [26] GitLab, “GitOps: A Comprehensive Guide,” 2024. Accessed: 25 July 2024.
  • [27] M. Ahmed, A. Naser Mahmood, and J. Hu, “A Survey of Network Anomaly Detection Techniques,” Journal of Network and Computer Applications, vol. 60, pp. 19–31, 2016.
  • [28] PyTorch Contributors, BCELoss. PyTorch, 2023.