
2018 Software-Defined Radios - Architecture, State-Of-The-Art, and Challenges - Elsevier Enhanced Reader

The document discusses Software-Defined Radios (SDRs), highlighting their architecture, current advancements, and challenges in wireless communication protocols. SDRs enable flexible and programmable transceivers that can adapt to various standards without hardware changes, making them significant for both military and civilian applications. The paper surveys existing SDR platforms, compares their architectures, and outlines future research directions in the field.
Computer Communications 128 (2018) 106-125
Contents lists available at ScienceDirect
Computer Communications
journal homepage: www.elsevier.com/locate/comcom

Software-defined Radios: Architecture, state-of-the-art, and challenges
Rami Akeela, Behnam Dezfouli*
Internet of Things Research Lab, Department of Computer Engineering, Santa Clara University, USA

Abstract
Software-defined Radio (SDR) is a programmable transceiver with the capability of operating various wireless communication protocols without the need to change or update the hardware. Progress in the SDR field has led to the escalation of protocol development and a wide spectrum of applications, with a greater emphasis on programmability, flexibility, portability, and energy efficiency in cellular, WiFi, and M2M communication. Consequently, SDR has earned a lot of attention and is of great significance to both academia and industry. SDR designers intend to simplify the realization of communication protocols while enabling researchers to experiment with prototypes on deployed networks. This paper is a survey of the state-of-the-art SDR platforms in the context of wireless communication protocols. We offer an overview of SDR architecture and its basic components and then discuss the significant design trends and development tools. In addition, we highlight key contrasts between SDR architectures with regards to energy, computing power, and area, based on a set of metrics. We also review existing SDR platforms and present an analytical comparison as a guide to developers. Finally, we recognize a few of the related research topics and summarize potential solutions.

1. Introduction
Advances in wireless technologies have altered consumers' communication habits. Wireless technologies are an essential part of users' daily lives, and their impact will become even greater in the future. In a technical report, the World Wireless Research Forum (WWRF) has predicted that for 7 billion people, 7 trillion wireless devices will be deployed by 2020 [1].
When these devices are connected to the Internet to form an Internet of Things (IoT) network, the first challenge is to adjust the basic connectivity and networking layers to handle the large number of end points. An increasing number of wireless protocols, such as ZigBee, Bluetooth Low Energy (BLE), Long Term Evolution (LTE), and new WiFi protocols, have been developed to meet the demanding requirements of various domains such as 5G, IoT, and cyber-physical systems [2-4]. Wireless standards, in general, are adapting quickly in order to accommodate different user needs and hardware specifications [5,6]. To meet these specifications, a transceiver needs to be designed with the ability to handle several protocols, including the existing ones and those being developed. In order to accomplish this task, one needs to recognize the protocols' need for a flexible, reconfigurable, and programmable framework. Both consumer enterprise and military frameworks have a need for programmable platforms. Due to the rapid and consistent advancement of wireless protocols, programmability is of central significance to designers in the industry. Hardware needs to be able to keep up with both the evolution of technology and the changing user demands. For example, the authors in [7] proposed a platform called OpenRadio for programming both the Physical (PHY) and Medium Access Control (MAC) layers while offering a high level of abstraction. Rather than including yet another piece of equipment to deal with a new standard or frequency band, the equipment of a previously introduced platform is able to adjust to the features of another standard.

* Corresponding author.
E-mail addresses: rakeela@scu.edu (R. Akeela), bdezfouli@scu.edu (B. Dezfouli).
https://doi.org/10.1016/j.comcom.2018.07.012
Received 24 March 2016; Received in revised form 27 June 2018; Accepted July 2018
Available online 30 July 2018
0140-3664/© 2018 Elsevier B.V. All rights reserved.
In a military scenario, for example, the needs of these platforms can change in light of the highly unpredictable conditions that arise during a mission. Although these needs might not have been envisioned in the initial design, they led to the development and utilization of new protocols.

Software-defined Radio (SDR) is a technology for radio communication. This technology is based on software-defined wireless protocols, as opposed to hardware-based solutions. This translates to supporting various features and functionalities, such as updating and upgrading through reprogramming, without the need to replace the hardware on which they are implemented. This opens the door to the possibility of realizing multi-band and multi-functional wireless devices.

The driving factors for the high demand of SDR include network interoperability, readiness to adapt to future updates and new protocols, and most importantly, lower hardware and development costs. In a report [8], the SDR market is projected to be worth more than $29 billion by the year 2021. Global Industry Analysts, Inc. [9] highlights some of the market trends for SDR as follows: (i) increasing interest from the military sector in building communication systems and large-scale deployment in developing countries, (ii) growing demand for public safety and disaster preparedness applications, and (iii) building virtualized base stations (BSs). SDRs are also ideal for developing future space communications [10-12], Global Navigation Satellite System (GNSS) sensors [13], Vehicle-to-Vehicle (V2V) communication [14-16], and IoT applications [17,18], where relatively small and low-power SDRs can be utilized.

The SDR industry flourished due to the Joint Tactical Radio System (JTRS) program, which was responsible for producing SDRs for the military.
In turn, this led to the creation of an entire world of new technologies, the Software Communications Architecture (SCA), and Electronic Design Automation (EDA) tools that facilitate the development of SDRs [19]. The newly abundant resources made it relatively feasible to fuel the effort to develop more SDRs, not only for the military, but also for civil applications. The first commercial SDR, named Anywave [20], was a dual-mode base station that supported both Global System for Mobile communication (GSM) and Code Division Multiple Access (CDMA) concurrently and ran on GPPs. Another technological advancement with a huge impact on the SDR industry was the development and release of the Radio Frequency Integrated Circuit (RFIC), which supports most frequency bands in the MHz to GHz range.

Researchers have been studying SDRs for several years and are striving to find better means of implementing them in order to optimize their processing and energy efficiency. SDRs are implemented using various types of hardware platforms, such as General Purpose Processors (GPPs), Graphics Processing Units (GPUs), Digital Signal Processors (DSPs), and Field Programmable Gate Arrays (FPGAs). Each of these platforms is associated with its own set of challenges. Some of these challenges are: utilizing the computational power of the selected hardware platform, keeping the power consumption at a minimum, ease of the design process, and cost of tools and equipment. Both the research community and industry have developed SDRs that are based on the aforementioned hardware platforms. A few examples include USRP [21], Sora [22], Atomix [23], Airblue [24], and the Wireless Open Access Research Platform (WARP) [25]. Each SDR is unique with regards to the design methodology, development tools, performance, and end application.

In this paper, we first present an overview of the SDR architecture, as well as the analog and digital divides of the system and the interconnection of components.
Then, we introduce the criteria that define how the different hardware platforms are classified. We thoroughly examine the architecture and design approaches employed by these hardware platforms and present their strengths and weaknesses in the context of SDR implementation. Furthermore, we provide an analytical comparison of hardware platforms as a guide for design decision making. Moreover, we discuss the use of development tools and present a summary to give a streamlined explanation of their functionalities and the platforms they support. Afterwards, we review the SDR platforms developed by both industry and academia, analyze them, and compare them using the criteria discussed earlier. Finally, we identify the current challenges and open research topics that are related to future SDR development.

This paper is organized as follows: Section 2 provides a description of SDR architecture and the classification process that is used to summarize the various design approaches adopted. Section 3 provides a comprehensive study of all the hardware platforms and associated design methodologies that are used to build SDR platforms. Section 4 lists some of the corresponding development tools and platforms. Section 5 presents an analysis and comparison of the commercially and academically developed SDR platforms. Section 6 highlights research questions and future trends. Section 7 presents an analysis of the existing literature on SDR surveys. We conclude the paper in Section 8. A list of key abbreviations used in this paper can be found in Table 1.

Table 1
Key abbreviations.
ASIC    Application Specific Integrated Circuit
DAC     Digital-to-Analog Converter
DSP     Digital Signal Processor
FLOPS   Floating Point Operations Per Second
FPGA    Field Programmable Gate Array
GPP     General Purpose Processor
HLS     High Level Synthesis
NFV     Network Function Virtualization
RTL     Register Transfer Level
SDR     Software-defined Radio
SNR     Signal-to-Noise Ratio
SoC     System on Chip

2. Concepts and architecture
In this section, we examine the general architecture of SDRs, their main components, and their processing requirements. As explained in the previous section, SDRs play a vital role in wireless standard development due to their flexibility and ease of programmability. This is because most of the digital signal processing and the digital front end, which includes channel selection, modulation, and demodulation, take place in the digital domain. This processing is usually performed in software running on processors, such as GPPs and DSPs. However, it can also run on programmable hardware, i.e., FPGAs.

In general, from the transmitter's point of view, first a baseband waveform needs to be produced and then an Intermediate Frequency (IF) waveform. An RF waveform is then generated and sent through the antenna. From the receiver's point of view, this RF signal is sampled, demodulated, and then decoded. To provide more details on the process, we study the receiving end of the system as follows.

The RF signal from the antenna is amplified with a tuned RF stage, which amplifies a range of the frequency band. This amplified RF signal is then converted to an analog IF signal. The Analog-to-Digital Converter (ADC) digitizes this IF signal into digital samples. Then, the samples are fed into a mixer stage.
The mixer, which is an electrical circuit that takes in two signals and yields a new frequency, has another input coming from a local oscillator with a frequency that is set by the tuning control. The mixer then translates the input signal to baseband. The next stage is a Finite Impulse Response (FIR) filter that passes only the desired signal. The FIR filter is a combination of multiply-add units and shift registers. This filter limits the signal bandwidth and acts as a decimating low-pass filter. The digital down-converter includes a large number of multipliers, adders, and shift registers in hardware in order to accomplish the aforementioned tasks. Next, the signal processing stage performs tasks such as demodulation and decoding. This stage is typically handled by dedicated hardware like an Application Specific Integrated Circuit (ASIC) or other programmable alternatives like an FPGA or DSP [25].

As shown in Fig. 1(a) and (b), at a high level, a typical SDR transceiver consists of the following components: Signal Processing, Digital Front End, Analog RF Front End, and an antenna.

Fig. 1. SDR architecture. Sub-figure (a) shows SDR from a receiver's point of view, and sub-figure (b) shows SDR from a transmitter's point of view.

2.1. Antenna
SDR platforms usually employ several antennas to cover a wide range of frequency bands [27]. Antennas are often referred to as "intelligent" or "smart" due to their ability to select a frequency band and adapt with mobile tracking or interference cancellation [26,28]. In the case of SDRs, an antenna usually needs to meet a certain list of requirements such as self-adaptation (i.e., flexibility to tune to several bands), self-alignment (i.e., beamforming capability), and self-healing (i.e., interference rejection) [28].

2.2. RF Front End
This is the RF circuitry whose main function is to transmit and receive the signal at various operating frequencies. Its other function is to convert the signal to and from the Intermediate Frequency (IF).
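The receive-side digital processing described at the start of this section (a mixer driven by a local oscillator that translates the signal to baseband, followed by a decimating FIR low-pass filter) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes NumPy, a complex numerically-controlled oscillator (so the output is I/Q rather than a single real signal), and a Hamming-windowed sinc filter design; the function name `digital_down_convert` and its parameters are hypothetical.

```python
import numpy as np


def digital_down_convert(x, fs, f_lo, decim, num_taps=63):
    """Hypothetical DDC sketch: NCO mixing to baseband, FIR low-pass, decimation.

    x      -- real-valued IF samples from the ADC
    fs     -- input sample rate in Hz
    f_lo   -- local oscillator (tuning) frequency in Hz
    decim  -- integer decimation factor
    """
    n = np.arange(len(x))
    # Numerically-controlled oscillator: a complex exponential at -f_lo.
    # Multiplying by it shifts the spectral component at +f_lo down to 0 Hz.
    lo = np.exp(-2j * np.pi * f_lo / fs * n)
    mixed = x * lo

    # Windowed-sinc FIR low-pass; cutoff at the post-decimation Nyquist
    # frequency, expressed in cycles per input sample.
    cutoff = 0.5 / decim
    k = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * k) * np.hamming(num_taps)
    h /= h.sum()  # normalize for unity gain at DC

    # Filter, then keep every decim-th sample (a decimating low-pass filter).
    filtered = np.convolve(mixed, h, mode="same")
    return filtered[::decim]


if __name__ == "__main__":
    fs, f_lo, decim = 1e6, 200e3, 8
    n = np.arange(8192)
    # Assumed test input: an IF tone 5 kHz above the LO frequency.
    x = np.cos(2 * np.pi * (f_lo + 5e3) / fs * n)
    y = digital_down_convert(x, fs, f_lo, decim)
    freqs = np.fft.fftfreq(len(y), d=decim / fs)
    peak = freqs[np.argmax(np.abs(np.fft.fft(y)))]
    print(f"output rate {fs / decim:.0f} Hz, tone near {peak:.0f} Hz")
```

Because a real tone has components at both plus and minus its frequency, only half of its amplitude lands at baseband after mixing; the other component moves to about -405 kHz, where the low-pass filter suppresses it before decimation.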
The process of operation is divided into two paths, depending on the direction of the signal (i.e., Tx or Rx mode):
• In the transmission path, digital samples are converted into an analog signal by the Digital-to-Analog Converter (DAC), which in turn feeds the RF Front End. This analog signal is mixed with a preset RF frequency, modulated, and then transmitted.
• In the receiving path, the antenna captures the RF signal. The antenna input is connected to the RF Front End using matching circuitry to guarantee optimal signal power transfer. Then, the signal passes through a Low Noise Amplifier (LNA), which resides in close proximity to the antenna, in order to amplify weak signals and minimize the noise level. This amplified signal, in conjunction with the signal from the Local Oscillator (LO), is fed into the mixer in order to down-convert it to the IF [29].

2.3. Analog-to-Digital and Digital-to-Analog Conversion
The DAC, as mentioned in the previous section, is responsible for producing the analog signal that will be transmitted from the digital samples. The ADC resides on the receiver side and is an essential component in radio receivers. The ADC is responsible for converting continuous-time signals to discrete-time, binary-coded signals. ADC performance can be described by various parameters [30,31], including (i) Signal-to-Noise Ratio (SNR): the ratio of signal power to noise power in the output, (ii) resolution: the number of bits per sample, (iii) Spurious-Free Dynamic Range (SFDR): the strength ratio of the carrier signal to the next strongest noise component or spur, and (iv) power dissipation. Advances in SDR development have provided momentum for ADC performance improvements.
For example, since the ADC's power consumption affects the lifetime of battery-powered SDRs, more energy-efficient ADCs have been developed [32].

2.4. Digital Front End
The Digital Front End performs two functions [31]:
• Sample Rate Conversion (SRC), which converts the sampling from one rate to another. This is necessary since the two communicating parties must be synchronized.
• Channelization, which includes up-conversion on the transmitter side and down-conversion on the receiver side. It also includes channel filtering, where channels that are divided by frequency are extracted. Some examples include interpolation and low-pass filters.

In an SDR transceiver, the following tasks are executed in the digital front end:
• On the transmitting side (Fig. 1(a)), the Digital Up Converter (DUC) translates the baseband signal to IF. The DAC, which is connected to the DUC, then converts the digital IF samples into an analog IF signal. Afterwards, the RF up-converter converts the analog IF signal to RF frequencies.
• On the receiving side (Fig. 1(b)), the ADC converts the IF signal into digital samples. These samples are subsequently fed into the next block, which is the Digital Down Converter (DDC). The DDC includes a digital mixer and a numerically-controlled oscillator, and it extracts the baseband digital signal from the ADC output. After it is processed by the Digital Front End, this digital baseband signal is forwarded to a high-speed digital signal processing block [33].

A new alternative to the classical approach is a concept known as Direct RF Sampling (DRFS). In DRFS, an RF-sampling ADC replaces the analog processing blocks, such as the mixer, local oscillator, and filters, and moves this processing to the digital domain. This significantly improves the design of the receiver.
Here, the signal is converted by the ADC and handed over to the signal processing block in order to extract the data [34]. Because DRFS is based on sub-sampling (bandpass sampling) theory, which uses an alias of the signal in order to sample it, it requires a much lower sampling rate. A bandpass filter is placed in front of the ADC to avoid sensitivity loss. The advantages of DRFS are support for a very wide bandwidth and higher power efficiency [35].

2.5. Signal processing
Signal processing operations, such as encoding/decoding, interleaving/deinterleaving, modulation/demodulation, and scrambling/descrambling, are performed in this block. Channel encoding serves as an error-correcting code. Specifically, the encoded signal includes redundancy that is utilized by the receiver's decoder to reconstruct the original signal from the corrupted received signal. Examples of error-correcting codes include Convolutional Codes, Turbo Codes, and Low Density Parity Check (LDPC) codes [36]. The decoder constitutes the most computationally intensive part of the Signal Processing block due to data transfer and memory schemes [37]. The second part that is regarded as highly complex and expensive, in terms of area and power, is the Fast Fourier Transform (FFT) and Inverse FFT (IFFT), as part of the modulation phase [35].

The signal processing block is commonly referred to as the baseband processing block. When discussing SDRs, the baseband block is at the heart of the discussion since it makes up the bulk of the digital domain of the implementation. This implementation runs on top of hardware circuitry that is capable of processing signals efficiently. Some examples include ASICs, FPGAs, DSPs, GPPs, and GPUs. The second part of the implementation is the software, which provides the functionality and high-level abstractions needed to execute the signal processing operations.
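To make the encoder/decoder pair concrete, the following sketch implements a rate-1/2 convolutional encoder and a hard-decision Viterbi decoder in plain Python. The constraint length of 3 and the generator polynomials 7 and 5 (octal) are a common textbook choice assumed here, not parameters taken from this survey, and the function names are hypothetical. Note how the decoder evaluates every trellis state and both input hypotheses for every received pair of code bits; since the number of states is 2^(k-1), this per-step work grows exponentially with the constraint length, which illustrates why decoding is so computationally demanding.

```python
def _parity(x):
    """Return the XOR of all bits in x (0 or 1)."""
    return bin(x).count("1") & 1


def conv_encode(bits, gens=(0b111, 0b101), k=3):
    """Rate-1/len(gens) convolutional encoder, flushed back to the all-zero state."""
    state = 0  # the previous k-1 input bits
    out = []
    for b in list(bits) + [0] * (k - 1):  # k-1 tail bits terminate the trellis
        reg = (b << (k - 1)) | state     # newest bit in the most significant position
        out.extend(_parity(reg & g) for g in gens)
        state = reg >> 1
    return out


def viterbi_decode(received, gens=(0b111, 0b101), k=3):
    """Hard-decision Viterbi decoder using the Hamming distance as branch metric."""
    n_states = 1 << (k - 1)
    n_out = len(gens)
    inf = float("inf")
    metrics = [0.0] + [inf] * (n_states - 1)  # the encoder starts in state 0
    paths = [[] for _ in range(n_states)]
    for i in range(0, len(received), n_out):
        r = received[i:i + n_out]
        new_metrics = [inf] * n_states
        new_paths = [None] * n_states
        for s in range(n_states):
            if metrics[s] == inf:
                continue  # state not reachable yet
            for b in (0, 1):
                reg = (b << (k - 1)) | s
                expected = [_parity(reg & g) for g in gens]
                branch = sum(x != y for x, y in zip(r, expected))
                nxt = reg >> 1
                cand = metrics[s] + branch
                if cand < new_metrics[nxt]:  # keep only the survivor path
                    new_metrics[nxt] = cand
                    new_paths[nxt] = paths[s] + [b]
        metrics, paths = new_metrics, new_paths
    # The tail bits force the encoder back to state 0; drop them from the result.
    return paths[0][: len(paths[0]) - (k - 1)]


if __name__ == "__main__":
    msg = [1, 0, 1, 1, 0, 0, 1]
    coded = conv_encode(msg)
    coded[3] ^= 1  # simulate a single channel bit error
    print(viterbi_decode(coded) == msg)
```

This code has a free distance of 5, so the single flipped bit in the example is corrected. Soft-decision metrics and longer constraint lengths improve error correction but multiply the decoder's work.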
In the next section, we examine the aforementioned hardware platforms and analyze the various design approaches in detail.

3. Design approaches
In this section, we discuss the classification of the various SDR design methodologies of the baseband processing block, namely GPP-, GPU-, DSP-, FPGA-, and co-design-based methodologies. In this classification, we analyze and compare SDR platforms based on a set of performance metrics in the following criteria:
• Flexibility and reconfigurability. The capability of the modulation and air-interface algorithms and protocols to evolve by merely loading new software onto the platform [12].
• Adaptability. The SDR platform can adjust its capabilities based on network dynamics and user demands.
• Computational power. The processing rate of the SDR, namely Giga Operations per Second (GOPS).
• Energy efficiency. The total power consumption (typically within a few hundred milliwatts), especially for mobile and IoT deployments [39,40].
• Cost. The total cost of the SDR platform, including time to market, development, and hardware cost.

3.1. GPP-based
One of the first approaches to realizing SDR platforms is using a General Purpose Processor (GPP), i.e., the commonly known generic computer microprocessors such as x86/64 and ARM architectures. Some examples of SDR platforms that utilize GPPs are Sora [22], KUAR [41], and USRP [21].

3.1.1. Definition and uses
A GPP is a digital circuit that is clock-driven and register-based. It is capable of processing different functions and operates on data streams represented in the binary system [42]. GPPs can be used for many purposes, making them useful for a virtually unlimited number of applications.
This eliminates the need for building application-specific circuits, reducing the overall cost of running applications. GPPs are generally the preferred hardware platform of researchers in academia due to their flexibility, abundance, and ease of programmability, which is one of the main requirements in SDR platforms [43]. In addition, researchers prefer GPPs because they are more familiar with them and their software frameworks than with DSPs and FPGAs. From the performance point of view, GPPs are being enhanced rapidly. This advancement is credited not only to technological advances in Complementary Metal Oxide Semiconductor (CMOS) technology [44] but also to the increase in the average number of instructions processed per clock cycle. The latter is achieved through different means, in particular by utilizing parallelism within and between processors. This has led to the evolution of multi-core GPPs [45].

3.1.2. Adoption and GPUs
Architecturally, the instruction set of GPPs includes instructions for different kinds of operations, such as arithmetic and logic (ALU) operations, data transfer, and I/O. A GPP processes these instructions in sequential order. Because of this sequential processing, GPPs are not well suited to high-throughput computing with real-time requirements (i.e., high throughput and low latency) [46]. For example, using GNU Radio [47] to implement the IEEE 802.11 standard, which requires a 20 MHz sampling rate, would be challenging, since GNU Radio is restricted by the limited processing capabilities of GPPs. This causes the GPP cores of the attached PC to saturate, and frames become corrupted and are discarded. Moreover, wireless protocols require predictable performance in order to guarantee that they meet their timing constraints. However, conditional branch instructions in the GPP's instruction set lead to out-of-order execution, which makes predictability infeasible to achieve.
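The 802.11 example above can be quantified with back-of-the-envelope arithmetic. The sketch below treats the 20 MHz sampling rate as 20 million complex samples per second and assumes a hypothetical 3 GHz GPP core with 16-bit I and Q components (neither figure comes from this survey). Under these assumptions, a single core has only 150 clock cycles per sample for all filtering, synchronization, FFT, and decoding work, and each 80-sample 802.11a/g OFDM symbol must be processed within 4 µs.

```python
def cycles_per_sample(clock_hz, cores, sample_rate_hz):
    """Clock cycles available per complex sample (assumes ideal scaling across cores)."""
    return clock_hz * cores / sample_rate_hz


def iq_stream_bps(sample_rate_hz, bits_per_component=16):
    """Raw bit rate of an I/Q stream (two components per complex sample)."""
    return sample_rate_hz * 2 * bits_per_component


def symbol_deadline_s(samples_per_symbol, sample_rate_hz):
    """Time budget for processing one symbol in real time."""
    return samples_per_symbol / sample_rate_hz


if __name__ == "__main__":
    fs = 20e6     # 802.11 sampling rate cited in the text
    clock = 3e9   # assumed GPP clock frequency (hypothetical)
    print(f"{cycles_per_sample(clock, 1, fs):.0f} cycles per sample on one core")
    print(f"{iq_stream_bps(fs) / 1e6:.0f} Mbit/s of raw I/Q data")
    print(f"{symbol_deadline_s(80, fs) * 1e6:.1f} us per 80-sample OFDM symbol")
```

Adding cores enlarges this budget only to the extent that the processing chain parallelizes cleanly, which is why the `cores` factor should be read as an upper bound.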
To overcome the limitations of GPPs, researchers have proposed multiple solutions, one of which is the addition of co-processors, such as the Graphics Processing Unit (GPU) [48]. GPUs are processors specifically designed to handle graphics-related tasks, and they efficiently process large blocks of streaming data in parallel. SDR platforms that are composed of both GPPs and GPUs are flexible and have a higher level of processing power. However, this results in a lower level of power efficiency (e.g., a GPP's power efficiency is ~9 GFLOPS/W for single precision, compared to 20 GFLOPS/W for a GPU [49]). GPUs act as co-processors to GPPs because a GPP is required to act as the control unit and transfer data from external memory. After a transfer is completed, the GPU executes the signal processing algorithms. While GPUs are typically used for processing graphics, they are also used for signal processing algorithms. Over the past few years, the theoretical peak performance of GPUs and GPPs for single- and double-precision processing has been growing [50]. For example, when comparing Intel Haswell's 900 GFLOPS [51] with NVIDIA GTX TITAN's 14500 GFLOPS [52] for single precision, it is apparent that GPUs have a computational power that far exceeds that of their GPP counterparts [50]. Their multi-core architectures and parallel processors are their main attractive features, in addition to their relatively reasonable prices and small, credit-card-like size. These features make them good candidates for co-processors in GPP-based SDRs, where they can play a vital role in accelerating computing-intensive blocks [53]. Another advantage is their power efficiency, which keeps improving with every new model (e.g., it went from 0.5 to 20 GFLOPS/W for single precision) [49]. To take full advantage of GPUs, the algorithms must conform to their architecture.
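Conforming an algorithm to the GPU architecture usually means exposing many independent, identical operations at once. The sketch below uses NumPy on the CPU as a stand-in for a GPU FFT library such as cuFFT: it reshapes a sample stream into fixed-size blocks and transforms all of them in one batched call, instead of looping over the blocks one at a time. The block size of 256 and the function names are assumptions chosen for illustration.

```python
import numpy as np


def fft_blocks_loop(samples, n_fft):
    """Transform one block at a time (sequential, GPP-style)."""
    n_blocks = len(samples) // n_fft
    return np.array([np.fft.fft(samples[i * n_fft:(i + 1) * n_fft])
                     for i in range(n_blocks)])


def fft_blocks_batched(samples, n_fft):
    """Lay the blocks out as rows of a 2-D array and transform all rows in one
    call, the data-parallel layout that batched GPU FFT libraries target."""
    n_blocks = len(samples) // n_fft
    blocks = samples[:n_blocks * n_fft].reshape(n_blocks, n_fft)
    return np.fft.fft(blocks, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(4096) + 1j * rng.standard_normal(4096)
    print(np.allclose(fft_blocks_loop(x, 256), fft_blocks_batched(x, 256)))
```

Both functions compute identical results; the batched form simply presents the whole workload as one regular operation, which a GPU can spread across its cores in a single launch.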
From an architectural perspective, GPUs have a number of advantages that make them preferable solutions for applications like video processing. In particular, GPUs employ a concept called Single Program Multiple Data (SPMD), which allows multiple instruction streams to execute the same program. In addition, due to their multi-threading scheme, data load instructions are more efficient. GPUs also present a high computational density, where the cache-to-ALU ratio is low [54].

As shown in Table 2, the authors of [53] confirmed that the signal detection algorithm, which includes intensive FFT computations, runs faster (by orders of magnitude) with parallel processing on a GPU than on a GPP, while operating in real time. This is due to the availability of the CUDA Fast Fourier Transform (cuFFT) library, which was developed for NVIDIA GPUs for more efficient FFT processing [55]. With regards to the architectural advantage of GPUs, several hundred CUDA cores can perform a single operation at the same time, as opposed to a few cores in the case of multi-core GPPs.

An example of using GPUs alongside GPPs to build SDR platforms is found in [56], where the authors built a framework on a desktop PC and used a GPU to implement an FM receiver. Additionally, the authors in [53] studied real-time signal detection using an SDR platform composed of a laptop computer and an NVIDIA Quadro M4000 [52]. Examples of GPUs available on the market can be found in Table 3. In this table, we show two examples of high-performing GPUs (> 5500 GFLOPS), suitable for SDRs with strict timing and performance requirements. We also show two examples of less powerful and less expensive GPUs, suitable for prototyping SDRs in academic environments.

Table 2
Performance of signal detection algorithm on GPP and GPU [53] (GPP sequential, GPP parallel, and GPU parallel processing times for different FFT lengths).

Table 3
Comparison of GPUs (power consumption in W and cost in USD).

3.1.3. Shortcomings
State-of-the-art GPP- and GPU-based platforms, such as Sora [22] and USRP [21], utilize desktop computers to realize the systems. However, these platforms consume a significant amount of power to reach a given performance goal, and their form factor (i.e., shape and physical size) is large, making real-world deployment a challenging task. It is worth noting that GPPs and GPUs alike present scaling limitations in keeping up with Koomey's Law. This law states that the energy efficiency of computers doubles roughly every 18 months [58]. This limitation calls for alternatives that provide higher computing power while keeping the energy efficiency the same. One alternative is the hybrid or co-design approach, where software and hardware implementations are combined. This will be discussed in more detail in Section 3.

When both GPPs and GPUs are used for an SDR design, data transfer operations between the GPP and GPU can create bottlenecks and cause performance loss, especially when trying to meet real-time requirements [59]. However, there are continuous efforts to reduce or eliminate the time overhead of data transfers by introducing multi-stream scheduling for pipelining the memory copy tasks. This ensures that there are no stalls in the pipeline, which enhances processing parallelism [60,61]. Finally, although the processing power of microprocessors is constantly being improved, balancing sufficient computing power against specific goals for energy consumption and cost will remain a very difficult task, both now and in the future. This is especially true given the growing volume of data to be processed and the need for blocks that can process data in parallel.

3.2. DSP-based
The DSP-based solution can be considered a special case of GPP-based solutions, but due to its popularity and unique processing features, it deserves a separate discussion. An example of a DSP-based SDR is the Atomix platform [23], which utilizes the TI TMS320C6670 DSP [62].

3.2.1. Definition and uses
A DSP is a particular type of microprocessor that is optimized to process digital signals [63]. To understand how DSPs are distinguished from GPPs, we should first note that both are capable of implementing and processing complex arithmetic tasks [64]. Tasks like modulation/demodulation, filtering, and encoding/decoding are commonly and frequently used in applications that include speech recognition, image processing, and communication systems. DSPs, however, implement them more quickly and efficiently due to their architecture (e.g., RISC-like architecture, parallel processing), which is specifically optimized to handle arithmetic operations, especially multiplications. Since DSPs are capable of delivering high performance with lower power, they are better candidates for SDR deployment than GPPs [65]. Examples of DSPs that are specifically designed for SDR platforms are the TI TMS320C6657 and TMS320C6655. Both of these DSPs are equipped with hardware accelerators for complex functions like Viterbi and Turbo decoding [65].

3.2.2. Adoption
As discussed in the previous section, GPPs provide average performance for a wide range of applications. This performance level might be sufficient for research and academia, but if the system is to be deployed commercially, certain performance requirements must be met. To this end, compared to GPPs, DSPs are tailored for processing digital signals efficiently, utilizing features like combined multiply-accumulate operations (MAC units) and parallelism [67]. DSP manufacturers usually sell these products in two categories: optimized for performance and optimized for energy.
Therefore, when used in SDRs, high-performance and energy-efficient products can be employed in BSs and edge devices, respectively. In terms of the instruction set, DSPs can be categorized into two groups: (i) Single Instruction Multiple Data (SIMD) architecture, and (ii) Multiple Instruction Multiple Data (MIMD) architecture, as described by Michael J. Flynn in what is known as Flynn's Taxonomy [66,68]. This taxonomy is a method of classifying architectures by the number of concurrent instruction and data streams, as follows:
• A SIMD-based DSP can execute an instruction on multiple data streams at the same time. This architecture can be very efficient when there is high data parallelism within the algorithm [70], i.e., when similar operations can be performed on different datasets at the same time. Examples of SIMD-based DSPs include the Cell processor presented in [71], which supports 256 GFLOPS. More examples of DSPs that are optimized for low power are the NXP CoolFlux DSP [72] and Icera Livanto [73]. An SDR employing a SIMD DSP is the SODA architecture [74]. It has been common practice to add more cores in order to achieve a better trade-off between performance and power. With each extra core utilizing a Very Long Instruction Word (VLIW) architecture, a higher level of parallelism can be accomplished as well.
• On the other hand, MIMD-based DSPs can operate on multiple data streams while executing multiple instructions at any point in time. This is essentially an extension of the SIMD architecture, where different instructions or programs run on multiple cores concurrently. This is especially useful when parallelism is not uniform across different blocks, because the MIMD architecture still allows parallel execution, leading to speed improvements. Examples of MIMD-based DSPs include the Texas Instruments SMJ320C80 and SM320C80 DSPs with 100 MFLOPS [66].
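The contrast between performing one multiply-accumulate (MAC) at a time and applying the same operation to many data elements at once (the SIMD pattern described above) can be illustrated with an FIR filter. In this sketch, which assumes NumPy and uses hypothetical function names, the scalar version performs one MAC per tap per output sample, as a single MAC unit would, while the data-parallel version applies each tap to the whole block of output samples in one vector operation. Array arithmetic of this kind is what SIMD units accelerate, although NumPy itself makes no guarantee about which machine instructions it uses.

```python
import numpy as np


def fir_scalar_mac(x, h):
    """One multiply-accumulate per tap per output sample (a single MAC unit)."""
    m = len(h)
    y = np.zeros(len(x) - m + 1)
    for n in range(len(y)):
        acc = 0.0
        for k in range(m):
            acc += h[k] * x[n + m - 1 - k]  # multiply-accumulate
        y[n] = acc
    return y


def fir_data_parallel(x, h):
    """Same arithmetic, but each tap is applied to every output sample at once
    (single instruction, multiple data)."""
    m = len(h)
    n_out = len(x) - m + 1
    y = np.zeros(n_out)
    for k in range(m):
        start = m - 1 - k
        y += h[k] * x[start:start + n_out]  # one vector MAC per tap
    return y


if __name__ == "__main__":
    x = np.sin(np.linspace(0, 4 * np.pi, 64))
    h = np.array([0.25, 0.5, 0.25])
    print(np.allclose(fir_scalar_mac(x, h), fir_data_parallel(x, h)))
```

Both functions match the "valid" part of a standard convolution. The data-parallel version replaces len(x) x len(h) individual MACs with len(h) vector operations, which is the kind of restructuring that lets SIMD hardware exploit data parallelism.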
Since DSPs are customized to meet certain signal processing needs, it is crucial to clarify these customizations in order to understand how DSPs stand out, how they meet these requirements, and how they are becoming a vital player in the wireless communication field. These customizations, which are mostly architecture-related, are as follows. In [75], the authors discuss the energy efficiency of DSPs. In general, DSPs consume more power than ASICs; however, there are DSPs that are optimized for low-power wireless implementations, such as the TI C674x DSP [66]. One of the methods to lower power consumption is to use multiple data memory buses (e.g., one for writing and two for reading). This provides higher memory bandwidth and allows for multiple-operand instructions, resulting in fewer cycles. As discussed above, VLIW architectures, along with specialized instructions, can provide a higher level of efficiency, which lowers energy consumption. These improvements can be seen in DSPs such as the TI TMS320C6x [66] and ADI TigerSHARC [76]. These techniques, coupled with proven power-saving techniques such as clock gating and putting parts of or the entire system in sleep mode, further reduce power consumption. Examples of DSPs available in the market can be found in Table 4. In this table, we present three examples of DSPs that do not include a co-processor, and three DSP-based SoCs that, in addition to DSP cores, include extra processor cores acting as control processors.

Table 4. Comparison of DSPs and DSP-based SoCs.
3.2.3. Shortcomings

Despite the ubiquity of DSPs in SDR implementations over the past two decades [79], they present some shortcomings. First, as more applications call for increasing parallelism and reconfigurability in order to handle computationally intensive tasks, DSPs can be insufficient. Second, programming DSPs to achieve higher levels of parallelism and predictability can be challenging. This has opened the door for parallel architectures, such as FPGAs, multi-core GPPs, or even a hybrid of both, to be adopted for SDRs. Third, the power consumption of DSPs is generally higher than that of FPGAs because DSPs operate at high frequencies.

3.3. FPGA-based

Another approach towards realizing SDRs is to use programmable hardware such as FPGAs. Examples of FPGA-based SDR platforms are Airblue [24], a Xilinx Zynq-based implementation of IEEE 802.11ah [80], and the work in [81], which used the same FPGA board to implement a complete communication system with channel coding.

3.3.1. Definition and uses

An FPGA is an array of programmable logic blocks, such as general logic, memory, and multiplier blocks, that are surrounded by a routing fabric, which is also programmable [82].
This circuit is capable of implementing any design or function and can be easily updated. Although FPGAs consume more power and occupy more area than ASICs, their programmability is the reason behind their increasing adoption in a wide range of applications. Furthermore, when the reconfiguration delay is on the order of milliseconds, the SDR can switch between different modes and protocols seamlessly [83]. Another major difference is that ASIC fabrication is expensive (at least a few tens of thousands of dollars), and the process requires a few months. In contrast, FPGAs can be quickly reprogrammed, and their cost ranges from a few tens to a few thousands of dollars at most. The low cost and short product cycle, along with attractive hardware processing advantages such as high-speed performance, low power consumption, and portability compared with processors such as GPPs and DSPs, make FPGAs contenders that offer the best of both worlds [82].

In a study by the authors of [84], the performance of Xilinx FPGAs [85] was compared against 16-core GPPs. The peak performance of the GPPs was calculated by multiplying the number of floating-point function units on each core by the number of cores and by the clock frequency. For FPGAs, performance was calculated by picking a configuration, adding up the Lookup Tables (LUTs), flip-flops, and DSP slices it needs to determine how many operators fit on the device, and then multiplying by the appropriate clock frequency. The authors calculated the theoretical peaks for 64-bit floating-point arithmetic and showed that the Xilinx Virtex-7 FPGA is about 4.2 times faster than a 16-core GPP. This is shown in Fig. 2. Even with a one-to-one adder/multiplier configuration, the V7-2000T achieved approximately 245 GFLOPS, which is better than a 16-core GPP. From Intel [51], Stratix 10 FPGAs can achieve a peak floating-point performance of 10 TFLOPS [86].
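The two peak-performance formulas described for [84] can be written down directly. The sketch below is ours; the resource counts, per-operator costs, and clock rates are hypothetical placeholders rather than Virtex-7 or Intel datasheet values, and a real estimate would also account for routing limits and the clock frequency actually achievable. For the FPGA, we interpret "picking a configuration" as choosing one bundle of operators (here an adder plus a multiplier, i.e., 2 FLOPs per cycle) and counting how many copies the scarcest resource allows.

```python
from typing import Dict


def gpp_peak_flops(fp_units_per_core: int, cores: int, clock_hz: int) -> int:
    """GPP peak: floating-point units per core x number of cores x clock frequency."""
    return fp_units_per_core * cores * clock_hz


def fpga_peak_flops(available: Dict[str, int], config_cost: Dict[str, int],
                    flops_per_config: int, clock_hz: int) -> int:
    """FPGA peak for one operator configuration.

    available:   resources on the device, e.g. {"lut": ..., "ff": ..., "dsp": ...}
    config_cost: resources that one copy of the configuration consumes
    The scarcest resource decides how many copies fit on the fabric.
    """
    copies = min(available[res] // cost
                 for res, cost in config_cost.items() if cost > 0)
    return copies * flops_per_config * clock_hz


# Hypothetical figures for illustration only (not real device data).
device = {"lut": 600_000, "ff": 1_200_000, "dsp": 3_000}
adder_plus_multiplier = {"lut": 600, "ff": 800, "dsp": 10}

gpp = gpp_peak_flops(fp_units_per_core=4, cores=16, clock_hz=2_500_000_000)
fpga = fpga_peak_flops(device, adder_plus_multiplier, flops_per_config=2,
                       clock_hz=300_000_000)
print(f"GPP peak:  {gpp / 1e9:.0f} GFLOPS")
print(f"FPGA peak: {fpga / 1e9:.0f} GFLOPS")
```

With these placeholder numbers, the DSP slices rather than the LUTs limit the FPGA design, which illustrates why the choice of configuration matters as much as the raw device size.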
The FPGA advantage is due to the fixed architecture of the GPP, where not all functional units can be fully utilized, and to the inherent parallelism and dynamic architecture of FPGAs. In addition, despite having lower clock frequencies (up to 300 MHz), FPGAs can achieve better performance because their architecture allows for higher levels of parallelism through custom design [87]. Furthermore, the authors of [88] compared the performance and power efficiency of FPGAs with those of GPPs and GPUs using double-precision floating-point matrix-vector multiplication. The results show that FPGAs are capable of outperforming the other platforms while maintaining their flexibility. In addition, the authors of [54] thoroughly analyzed and compared FPGAs against GPUs via the implementations of various algorithms. They concluded that although both architectures support a high level of parallelism, which is crucial to signal processing applications, FPGAs offer a larger increase in parallelism, while GPUs have fixed parallelism due to their data path and memory system.

Fig. 2. Peak performance of GPPs versus FPGAs when performing 64-bit floating-point operations [84]. It can be observed that FPGAs increased their floating-point performance by an order of magnitude compared to GPPs.

3.3.2. Adoption

Over the past decade, FPGAs have significantly advanced and become computationally more powerful. They now exist in many different versions, such as the Xilinx Kintex UltraScale [85] and Intel Arria 10 [51] [89,90]. In addition, the availability of various toolsets has made FPGAs more accessible. In particular, compilers are available that can generate, from high-level programming languages, the Register Transfer Level (RTL) code needed to run on FPGAs, such as Verilog and Very High Speed Integrated Circuit Hardware Description Language (VHDL).
This process is typically referred to as High-Level Synthesis (HLS). Examples of such compilers include HDL Coder [91] for MATLAB code [92], and Xilinx HLS [93] or the Altera Nios II C2H compiler [94] for C, C++, and SystemC. We explain some of these tools in Section 4. HLS allows software engineers to design and implement applications, such as SDRs, on FPGAs using a familiar programming language, namely C, C++, SystemC, or MATLAB, without needing rich prior knowledge about the target hardware architecture (refer to Section 4.1). These compilers can also be used to accelerate parts of the software code running on a GPP or DSP that are causing slowdowns or setbacks to the overall performance. This will be further discussed in Section 3.4. Further, FPGAs can achieve high performance while still consuming less energy than the previously discussed processors [95] (e.g., an Intel Stratix 10 FPGA can achieve up to 100 GFLOPS/W [96], compared to 23 GFLOPS/W for the NVIDIA GeForce GTX 980 Ti [52]). In addition, power dissipation can be further lowered through several techniques at the system, device, and/or architecture level, such as clock gating and glitch reduction [83]. Table 5 presents a summary of the widely used FPGA platforms.

Table 5. Comparison of FPGAs and FPGA-based SoCs.

3.3.3. Shortcomings

One of the challenges of using FPGAs, however, is the prior knowledge about the target hardware architecture and resources that a developer needs to possess in order to design an application efficiently for FPGAs.
In the SDR domain, designing the platform has typically been the job of software engineers; thus, carrying this experience over into hardware design can be time-consuming and far from trivial. However, as will be discussed in Section 4.1, HLS tools can make the adoption of FPGA solutions more feasible.

3.4. Hybrid design (a.k.a. co-design)

The fourth approach towards realizing SDRs is the hybrid, or co-design, approach, where both hardware- and software-based techniques are combined into one platform. Examples of SDRs that adopted the co-design approach include WARP [25] and CODIPHY [99].

3.4.1. Definition

Hardware/software co-design as a concept has been around for over a decade, and it has evolved at a faster rate in the past few years due to an increasing interest in solving integrated circuit design problems with a new and different approach. Even with GPPs becoming more powerful than ever, and with multi-core designs, it is clear that in order to achieve higher performance and create applications that demand real-time processing, designers have had to shift their attention to new design schemes that utilize hardware solutions, namely, FPGAs and ASICs [100,101]. Co-design refers to the use of a hardware design methodology, represented by the FPGA fabric, together with a software methodology, represented by processors.

As more applications in the automotive, communication, and medical fields grow in complexity and size, it has become common practice to design systems that integrate both software (such as firmware and an operating system) and hardware [102]. This has been made feasible in recent years thanks to advances in high-level synthesis and in development tools that not only can produce efficient RTL from software code, but also define the interface between both sides.
The industry has identified the large market for co-design and provides various SoC boards that contain multiple processors in addition to the FPGA fabric. For example, the Xilinx Zynq board [85] includes an FPGA fabric as well as two ARM Cortex-A9 processors [103]. In addition to the aforementioned advantages, other reasons make co-design even more interesting, such as faster time to market, lower power consumption (when optimized for it), flexibility, and higher processing speeds, as the hardware in these systems is typically used to accelerate software bottlenecks [104].

Adopting the co-design methodology is, in essence, a matter of partitioning the system into synthesizable hardware blocks and executable software blocks. This process depends on strict criteria developed by the designer [105,106]. The authors of [107] and [108] discuss their partitioning methodologies and present the process of making the proper architectural decisions. Common methods typically provide useful information to the designer in order to make the best decision about what to implement in hardware and what to keep in software. This information can include possible speedups, communication overheads, data dependencies, and the locality and regularity of computations [108]. Examples of SoC boards available in the market can be found in Table 5.

3.4.2. Adoption

As mentioned in Section 2, SDRs can be considered inherently hybrid or heterogeneous systems, implying the need for both hardware and software blocks. This is because the control part is usually handled by a general-purpose processor. Other functions, such as signal processing, are handled by a specialized processor such as a DSP, and they are sometimes accelerated using dedicated hardware such as FPGAs [109]. This design approach fits SDRs well and can be fully utilized to meet requirements that pertain to their attractive features.
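The kind of information listed above, in particular possible speedups and communication overheads, can drive even a very simple partitioning rule. The sketch below is a greedy heuristic of our own, not the methodology of [107] or [108]: it moves the blocks with the best net time saving per unit of fabric area to hardware until a hypothetical area budget is exhausted, and it assumes the blocks run one after another. The block names and timings are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    name: str
    sw_us: int    # run time on the processor, in microseconds
    hw_us: int    # run time on the FPGA fabric
    comm_us: int  # extra processor<->fabric transfer time if moved to hardware
    area: int     # fabric resources the hardware version needs (e.g., LUTs)


def net_gain(b: Block) -> int:
    """Time saved by moving the block to hardware (negative = not worth it)."""
    return b.sw_us - (b.hw_us + b.comm_us)


def partition(blocks: List[Block], area_budget: int) -> Tuple[List[str], int]:
    """Greedy split: best gain per unit area first, while the budget allows."""
    in_hw, used = [], 0
    for b in sorted(blocks, key=lambda b: net_gain(b) / b.area, reverse=True):
        if net_gain(b) > 0 and used + b.area <= area_budget:
            in_hw.append(b.name)
            used += b.area
    total_us = sum(b.hw_us + b.comm_us if b.name in in_hw else b.sw_us
                   for b in blocks)
    return in_hw, total_us


blocks = [
    Block("fft", sw_us=400, hw_us=40, comm_us=30, area=3_000),
    Block("viterbi", sw_us=900, hw_us=60, comm_us=40, area=5_000),
    # medium access control logic: small speedup, large transfer overhead
    Block("mac_layer", sw_us=50, hw_us=45, comm_us=20, area=1_000),
]
print(partition(blocks, area_budget=6_000))
print(partition(blocks, area_budget=8_000))
```

In this toy setup, the hypothetical `mac_layer` block stays in software because its transfer overhead outweighs its hardware speedup, while the budget decides whether `fft` joins `viterbi` on the fabric.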
Accelerating portions of a block, or moving it entirely to the FPGA fabric, can push the processing time down far enough to achieve real-time performance for real-life deployment. In addition, through careful implementation of RTL optimization techniques, the development of power-efficient systems for mobile and IoT applications becomes possible. On the other hand, running most of the MAC layer operations on a processor, or multi-processor, can be advantageous for easy reconfiguration. Therefore, different partitioning schemes can be adopted to meet the requirements of the application at hand.

It is worth noting that FPGA vendors, namely Intel [51] and Xilinx [85], are widening their product base with more SoCs and Multi-Processor SoCs (MPSoCs) [110] due to the growing demand for such devices. An example of an SDR realized on an MPSoC is the work in [111]. In a white paper, National Instruments, the company that owns USRP [21], predicts that the future of SDRs is essentially a co-design implementation [112], especially due to the introduction of FPGAs equipped with a large number of DSP slices for handling intensive signal processing tasks, as depicted in Fig. 3. This can also be seen in the USRP E310 model, which incorporates a Xilinx Zynq-7020 [85].

Fig. 3. Number of DSP slices in Xilinx FPGAs [85]. The values on top of the bars refer to the CMOS technology used.

3.4.3. Shortcomings

A downside of adopting SoCs for co-design is that their prices are generally higher than those of the previously mentioned design approaches, because they have multiple components on the same board, i.e., a processor and FPGA fabric. Other contributing factors are extra memory and sophisticated interfaces. Another challenge of co-design is shared memory access (e.g., to external DDR memory) between the processor and the FPGA fabric.
The study in [113] shows that the number of memory read and write operations performed by a GPP is higher than that of FPGAs. This is because processors perform operations on registers, while FPGAs operate on buffers. Since memory accesses add to the overall latency, they can become a bottleneck for overall performance. In addition, the authors developed a methodology for predicting shared memory bandwidth using functionally equivalent software. This makes designers aware of any bottlenecks before implementing the entire co-design.

3.5. Comparison

Having covered different design methodologies and hardware platforms for a wide selection of SDRs, we intended to compare them analytically one-on-one using a cross-platform implementation of one of the wireless communication protocols, i.e., software that can be implemented on multiple hardware platforms. However, the literature offers only a series of abstract comparisons using benchmarks that target high-performance computing rather than SDR applications specifically. It is somewhat difficult to draw conclusions from these numbers alone, since a performance comparison in the SDR field requires real-world testing.

In Table 6, we provide a high-level comparison between three major design approaches as a guideline for designers towards choosing the method that best meets their application requirements. To compile the information in the table, we use prominent examples from the corresponding vendors. These include the Intel Core i7 [51] for GPPs, TI C66x [65] for DSPs, and Xilinx Virtex [85] for FPGAs. In this comparison, we use the criteria introduced at the beginning of Section 3. However, we do not make assumptions about what the best approach is, and believe it is the developer's responsibility to make the best judgment depending on the application area.
Please note that we did not include GPUs in this table, as they typically act as co-processors to GPPs, and their addition generally improves performance. We also did not include co-design, since it combines GPPs with FPGAs.

As Table 6 shows, while GPPs are easy to program and extremely flexible, they lack the power to meet specifications in real time and are very inefficient in terms of power. To increase their performance, multiple cores with similar instruction sets are included in the same GPP platform to exploit parallelism and perform more operations per clock cycle. However, hardware replication (i.e., adding more cores to GPPs) does not necessarily translate into higher performance. GPUs tackle this by sharing the same control logic across several functional units. The sequential portion of the code runs on the GPP, where it can be optimized on multi-core GPPs, while the computationally intensive portion runs on a several-hundred-core GPU, where the cores operate in parallel. Another example of a customized processor is the DSP. It performs significantly better than GPPs while maintaining the ease of use that GPPs possess, making DSPs very attractive options. They are also more power efficient and better suited to signal processing applications. On the other hand, they are more expensive, which is the main trade-off. Finally, FPGAs combine the flexibility of processors and the efficiency of hardware. FPGAs can achieve a high level of parallelism through dynamic reconfiguration, while yielding better power efficiency [49]. FPGAs are typically more suitable for fixed-point arithmetic, as in signal processing tasks, but in recent years their floating-point performance has increased significantly [88,114]. However, designers are expected to know much more about the hardware, which is sometimes a deterrent.
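Why adding cores does not necessarily translate into higher performance is commonly quantified with Amdahl's law, which we bring in here as a standard textbook model rather than a result from the surveyed works: if a fraction p of the run time can be parallelized across n cores, the speedup is 1 / ((1 - p) + p/n), which is bounded by 1/(1 - p) no matter how many cores are added. The parallel fraction of 0.8 used below is an arbitrary assumption.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Amdahl's law: speedup when only `parallel_fraction` of the run time
    scales with the number of cores and the rest stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)


if __name__ == "__main__":
    p = 0.8  # assumed parallelizable share of the workload
    for cores in (1, 2, 4, 8, 16, 64):
        print(f"{cores:3d} cores -> {amdahl_speedup(p, cores):.2f}x")
    print(f"upper bound: {1.0 / (1.0 - p):.1f}x")
```

With p = 0.8, going from 16 to 64 cores raises the speedup only from 4.0x to about 4.7x, which is the diminishing return from hardware replication described above.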
In a comparative analysis, the authors of [115] studied the performance and energy efficiency of GPUs and FPGAs using a number of benchmarks that differ in targeted applications, complexity, and data type. The authors concluded that GPUs perform better for streaming applications, while FPGAs are more suitable for applications that employ intensive FFT computations, due to their ability to handle non-sequential memory accesses in a faster and more energy-efficient manner. Similarly, the authors of [49] review and report the sustainable performance and energy efficiency for different applications. One of their findings related to SDRs is that FPGAs should be used for signal processing without floating point, which confirms the aforementioned results. In addition, the authors of [116] report that GPUs are ten times faster than FPGAs for FFT processing, while the authors of [88] demonstrate that the power efficiency of FPGAs is always better than that of GPUs for matrix operations. Finally, the authors of [117] compare GPPs, GPUs, and FPGAs through the implementation of LDPC decoders, and their results led to the conclusion that GPUs and FPGAs perform better than GPPs. It is clear from the above studies that trade-offs are to be expected when a particular design methodology is adopted; hence, careful analysis should be carried out beforehand. Other comparative studies include [118-120], with similar results and conclusions.

Table 6. Comparison of SDR design approaches.

Fig. 4. HLS design flow commonly adopted by Xilinx [93], Intel [134], and MATLAB [91]. (Panels: high-level language (C/C++/MATLAB/OpenCL); directives (pipelining, dataflow, unrolling); HLS tool; RTL (Verilog/VHDL); FPGA implementation.)

4. Development tools

As mentioned in Section 3.3, HLS is an abstract method of designing hardware using a high-level programming language. Developers of FPGA- and co-design-based SDRs can benefit from HLS since it requires no prior experience with hardware design. Unlike the rest of the development tools, HLS tools share a common theme and offer similar features. Thus, we first discuss HLS in this section. Next, we review the common development tools that are typically used in the process of SDR design and implementation for different design approaches.

4.1. High-Level Synthesis (HLS)

HLS has been a hot research topic for over a decade, with both academia and industry trying to make hardware design more accessible to every developer [121]. HLS is the process of converting an algorithmic specification of the design, described in a high-level programming language, into an RTL implementation. HLS provides a new level of design abstraction by exploring the micro-architecture and any hardware constraints. The resulting RTL is highly optimized in terms of power, throughput, and latency, and it is reasonably comparable to hand-tuned code. Fig. 4 depicts this process. The major difference between RTL and C is the absence of a timing description in the high-level model, which is merely a behavioral description of the system with no details about the underlying hardware. The second difference is the processing architecture: while the GPP architecture is fixed, the compiler builds the best possible processing architecture for the FPGA [122]. In addition, HLS can shorten the development cycle (time to market) from several months to several weeks [123]. This is because the task of producing optimized RTL is handled by the HLS tool, and the developer's effort is focused on describing the system's algorithm.
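The effect of the micro-architectural choices an HLS tool explores can be seen with a first-order cycle model of a single loop. The sketch below is a simplified, tool-agnostic model that we assume for illustration, not how Vivado HLS or any other tool computes its reports: a loop of N iterations whose body takes D cycles needs N*D cycles when iterations run back to back; if it is pipelined with initiation interval II, a new iteration starts every II cycles, giving (N - 1)*II + D; unrolling by a factor U divides the number of hardware iterations, assuming enough parallel resources. Real tools add loop entry/exit overhead and may not reach the requested II when memory ports or loop-carried dependences limit it.

```python
import math
from typing import Optional


def loop_latency(trip_count: int, depth: int,
                 ii: Optional[int] = None, unroll: int = 1) -> int:
    """First-order cycle count for one hardware loop.

    trip_count: iterations in the source code (N)
    depth:      cycles for one iteration from start to finish (D)
    ii:         initiation interval if pipelined, None if not pipelined
    unroll:     unroll factor U; U source iterations share one hardware iteration
                (assumes the extra operators fit and the depth does not grow)
    """
    iterations = math.ceil(trip_count / unroll)
    if iterations == 0:
        return 0
    if ii is None:
        return iterations * depth           # each iteration waits for the previous one
    return (iterations - 1) * ii + depth    # a new iteration starts every ii cycles


if __name__ == "__main__":
    N, D = 1024, 5   # e.g., a 1024-sample loop whose body takes 5 cycles (assumed)
    print("sequential:          ", loop_latency(N, D))
    print("pipelined, II=1:     ", loop_latency(N, D, ii=1))
    print("pipelined + unroll 4:", loop_latency(N, D, ii=1, unroll=4))
```

The same C loop thus maps to very different latencies depending only on the directives applied, which is the design-space exploration that HLS automates and that has no counterpart on a fixed GPP architecture.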
In [124], the authors presented LegUP, an open-source HLS tool. This tool is capable of profiling code to identify frequently executed sections of the code for hardware acceleration (i.e., moving them to the FPGA fabric). The authors of [125] survey HLS compilers and their capability to provide an accurate estimation of functional area and timing, and compare them with the results from hand-tuned hardware designs. To help developers pick the HLS tool that yields the best results for their application, the authors of [123] present a study comparing three industry tools, namely Vivado HLS [93], Intel FPGA SDK for OpenCL [126], and MaxCompiler [127], by developing LDPC decoders, which are often used as error-correcting blocks in SDRs. All three tools successfully synthesized LDPC decoders and implemented them on Intel [51] and Xilinx [85] FPGA boards. The difference, however, was in logic utilization and performance. Similarly, the authors of [128] compare the same compilers quantitatively and qualitatively using several financial engineering problems (e.g., Monte Carlo-based option pricing) and compare the performance of several FPGA boards. Their results show that both Intel FPGA SDK for OpenCL and MaxCompiler performed better than Vivado HLS due to their ability to extract parallelism more effectively. In [129], the authors comprehensively review recent HLS tools and provide a methodology based on C benchmarks to compare some of these tools and their optimization features. The various benchmarks implemented demonstrate that some tools are better suited for certain applications than others, with no specific tool dominating the HLS field. The authors also show that open-source HLS tools, such as LegUP [130], can be as effective as their commercial counterparts.
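The profiling step attributed to LegUP above, identifying frequently executed code as candidates for the FPGA fabric, can be illustrated in a few lines of Python. This is only a sketch of the idea: it counts function calls with a decorator, whereas LegUP profiles the compiled C program, and the receiver functions (`configure`, `fir_tap`, `decide`, `receive`) are hypothetical.

```python
import functools
from collections import Counter
from typing import Callable, List, Sequence

call_counts: Counter = Counter()


def profiled(fn: Callable) -> Callable:
    """Count every call to `fn`, a crude stand-in for an execution profiler."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        call_counts[fn.__name__] += 1
        return fn(*args, **kwargs)
    return wrapper


@profiled
def configure(channel: int) -> int:
    return channel * 5                    # one-time setup (hypothetical)


@profiled
def fir_tap(acc: float, coeff: float, sample: float) -> float:
    return acc + coeff * sample           # inner-loop MAC, runs per tap per sample


@profiled
def decide(value: float) -> int:
    return 1 if value >= 0 else 0         # hard decision, runs once per sample


def receive(samples: Sequence[float], taps: Sequence[float]) -> List[int]:
    configure(11)
    bits = []
    for n in range(len(samples)):
        acc = 0.0
        for k, coeff in enumerate(taps):
            if n - k >= 0:
                acc = fir_tap(acc, coeff, samples[n - k])
        bits.append(decide(acc))
    return bits


def hotspots(top: int = 1) -> List[str]:
    """Most frequently executed functions: candidates for hardware acceleration."""
    return [name for name, _ in call_counts.most_common(top)]


if __name__ == "__main__":
    receive([1.0, -1.0, 1.0, -1.0], [0.5, 0.25])
    print(dict(call_counts), hotspots())
```

The inner filter tap dominates the call counts, so it is the natural first candidate to move to hardware, while the one-off configuration code stays in software.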
Other surveys and analyses include [104,131,132], which focused on open-source tools, and [133], which studied some of the trade-offs of HLS-generated designs and their degree of reliability when errors are injected. All of the studies above demonstrate the feasibility and reliability of HLS tools for generating RTL code, despite their different development and optimization solutions. Examples of HLS tools include Xilinx Vivado HLS [93] and SDSoC [135]; Intel HLS Compiler [134] and FPGA SDK for OpenCL [126]; Cadence Stratus High-Level Synthesis, which combines Cadence C-to-Silicon and Forte Cynthesizer [136]; Synopsys Synphony C Compiler [137]; Maxeler MaxCompiler [127]; MATLAB HDL Coder [91]; and LegUP [130], which, unlike the rest of the tools, is vendor-independent (it works with all types of FPGA boards, such as Xilinx [85], Intel [51], Lattice [97], and Microsemi [98]).

Table 7 presents a summary of the commercial HLS tools. While some of them are vendor-specific, other tools work with a variety of FPGA boards. The examples mentioned in the table all provide a set of area and timing optimizations, such as resource sharing, scheduling, and pipelining. However, not all of them are capable of generating testbenches for the design.

Table 7. HLS tools.

4.2. Tools

In this section, we review the existing software tools for SDR development. For each design methodology, we discuss a compatible development tool and list its features. We also provide an overall comparison between them to highlight the differences. This review is particularly important for picking the most fitting tool for the intended application. Learning about the features offered by each tool helps developers fully utilize the available tools. Table 8 presents an overview of these tools.

Table 8. Development tools and platforms.

4.2.1. MATLAB and Simulink

Most designers start by modeling and simulating the system using MathWorks MATLAB [92] and Simulink [140]. With the availability of a wide range of built-in functions and toolboxes, especially for signal processing and communication, developing and testing applications in these tools has become very common and widely adopted. However, in order to use these models on different platforms, developers need to use MATLAB Coder [141] and Simulink Coder [142] to generate C/C++ code. The generated code can be used with Embedded Coder [143] to optimize it and generate software interfaces with AXI drivers for running on embedded processors and microprocessors, such as the dual ARM Cortex-A9 MPCore [103] on the ZedBoard [144]. Alternatively, developers can use HDL Coder [91] to generate synthesizable RTL (Verilog or VHDL) code to be implemented on FPGAs or ASICs. It also supports Xilinx [85] and Intel [51] SoC devices by providing information and optimizations that pertain to resource utilization and distributed pipelining. Fig. 5 shows the design flow for SoC platforms that the aforementioned tools offer and how they are connected.

Fig. 5. MathWorks SoC design flow [125].
Examples of using MATLAB and Simulink to develop an SDR are found in [145] and [146], where the authors used the RTL-SDR, a very low-cost SDR dongle [147] (≈ $20), with a desktop computer (GPP) to design an academic curriculum for teaching DSP and communications theory.

4.2.2. Vivado HLS and SDSoC

Xilinx Vivado HLS [93] is a design environment for high-level synthesis. This tool offers a variety of features to tweak and improve the RTL netlist output, which is compatible with and optimized for Xilinx FPGA boards. It accepts input specifications described in several languages (e.g., C, C++, SystemC, and OpenCL) and generates hardware modules in Verilog or VHDL. Developers are provided with several options to optimize the solution in terms of area and timing through the use of directives, which are guidelines for the optimization process, and pragmas for RTL optimization. These optimizations include loop unrolling, loop pipelining, and operation chaining. SDSoC [135] is another tool by Xilinx [85]. The major difference between the two tools is that the latter can provide solutions for SoCs. SDSoC is built on top of Vivado HLS and has the same C-to-RTL conversion capability. The main advantage of using SDSoC is that it automatically generates data movers, which are responsible for transferring data between the software on the processor and the hardware on the FPGA.

An open-source tool similar to SDSoC is LegUP [130]. It was developed at the University of Toronto as part of an academic research effort to design an HLS tool that takes C code as input and provides three possible outputs: synthesizable RTL code for an FPGA, a pure software executable, and a hardware/software co-design solution for a SoC.

4.2.3. GNU Radio

GNU Radio is an open-source software development toolkit that provides signal processing blocks to implement SDRs [47,148].
It runs on desktop or laptop computers and can build a basic SDR with the addition of simple hardware such as the USRP B200 [21]. It is often used by academia and the research community for simulation, as well as to quickly set up SDR platforms. Similar to the System Generator tool [149] and Simulink [140], it includes different kinds of blocks, such as decoders, demodulators, and filters. It is also capable of connecting these blocks and managing data transfer between them in a reliable fashion. In addition, it supports the popular USRP systems [21]. One of the attractive features of GNU Radio is the ability to define and add new blocks by programming in C++ or Python. An example of using GNU Radio is the work in [150], where the author uses it with a USRP to realize different types of transceivers, such as Time Division Multiple Access (TDMA) and Carrier Sense Multiple Access (CSMA). Similarly, the authors of [151] achieve real-time communication between two computers using USRP [21] and RTL-SDR [147].

4.2.4. LabVIEW

LabVIEW is a widely used tool from National Instruments [139] that offers a visual programming environment for test, automation, and control applications, used by both industry and academia. It is similar to GNU Radio and Simulink in that the design can be constructed schematically by connecting a chain of blocks, each of which performs a certain function. It also offers complete support for USRP [21] to enable rapid prototyping of communication systems. The different blocks of the system can be designed using high-level languages, such as C or MATLAB, or using a graphical dataflow.
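GNU Radio, Simulink, and LabVIEW share the same underlying model: a radio is a chain of blocks, and the framework moves chunks of samples from one block to the next. The plain-Python sketch below models that dataflow pattern only; it deliberately does not use the GNU Radio, Simulink, or LabVIEW APIs, and the block and function names (`multiply_const`, `moving_average`, `run_flowgraph`) are hypothetical. Each block is a function from an input chunk to an output chunk, and a block may keep internal state across chunks, as a filter does.

```python
from typing import Callable, Iterable, List, Sequence

Block = Callable[[List[float]], List[float]]


def multiply_const(k: float) -> Block:
    """Stateless block: scales every sample."""
    return lambda chunk: [k * s for s in chunk]


def moving_average(n: int) -> Block:
    """Stateful block: n-tap moving average whose history persists across chunks.
    Samples before the first one are treated as zero."""
    history: List[float] = []

    def work(chunk: List[float]) -> List[float]:
        out = []
        for s in chunk:
            history.append(s)
            if len(history) > n:
                history.pop(0)
            out.append(sum(history) / n)
        return out

    return work


def run_flowgraph(source: Iterable[List[float]], blocks: Sequence[Block]) -> List[float]:
    """Push each chunk from the source through the block chain into a sink list."""
    sink: List[float] = []
    for chunk in source:
        for block in blocks:
            chunk = block(chunk)
        sink.extend(chunk)
    return sink


if __name__ == "__main__":
    chunks = [[1.0, 1.0], [1.0, 1.0]]           # source delivering two chunks
    chain = [multiply_const(2.0), moving_average(2)]
    print(run_flowgraph(chunks, chain))
```

In GNU Radio itself, a block like `moving_average` would be written as a custom block in Python or C++, and the runtime scheduler, rather than a simple loop like `run_flowgraph`, decides how many samples each block processes per call.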
An example of SDR platform development using LabVIEW is the work in [152], where the author describes a wireless communication course design that incorporates USRP and LabVIEW, chosen for their ease of use, to help teach students basic concepts. Similarly, in [153] the authors designed an SDR platform, namely FRAMED-SOFT, that includes two types of USRPs and is intended for an academic environment.

4.2.5. CUDA

CUDA, developed by NVIDIA, is a computing platform and programming model for data-parallel computing on GPUs [55]. Developers typically use CUDA when GPUs are part of the processing architecture as co-processors, in order to take full advantage of their power to speed up applications. As discussed in Section 3.1.2, to identify which application components should run on a GPP and which should be accelerated by the GPU, one needs to look at the tasks at hand. Programming languages that can be used with CUDA include C, C++, Python, FORTRAN, and MATLAB [92]. In addition to a rich library of GPU acceleration functions, the toolkit includes a compiler, development tools, and a CUDA runtime library. It is used to develop applications and optimize them for systems that incorporate GPUs.

5. Platforms

In this section, we list the different types of SDRs from the architecture and design point of view. We analyze them, examine their strengths and shortcomings, and discuss their impact on SDR
