Initial Cell Search Paper
by Sanat Kamal Bahl
Thesis submitted to the Faculty of the Graduate School of the University of Maryland in partial fulfillment of the requirements for the degree of Master of Science, 2002
Title of Thesis: Comparison of Initial Cell Search Algorithms for W-CDMA Systems
Sanat Kamal Bahl, Master of Science, 2002
ABSTRACT
In this thesis, an Improved Cell Search Design (Improved CSD) using cyclic codes is compared with the 3GPP Cell Search Design using comma free codes (3GPP-comma free CSD) in terms of (1) hardware utilization on a field programmable gate array (FPGA) and (2) acquisition time for different probabilities of false alarm. Our results indicate that for a channel whose signal-to-noise ratio is degraded by additive white Gaussian noise (AWGN), the Improved CSD achieves faster synchronization with the base station and has lower hardware utilization than the 3GPP-comma free CSD under the same design constraints.
Table of Contents
1.0 Introduction
2.0 Background
3.0 Cell Search Algorithm
  3.1 Synchronization Channels in W-CDMA
  3.2 Cell Search Algorithm
    3.2.1 Stage 1: Slot Synchronization
    3.2.2 Stage 2: Frame Synchronization and Code Group Identification
    3.2.3 Stage 3: Scrambling Code Identification
4.0 Improved Cell Search Design
  4.1 Stage 1: Slot Synchronization
  4.2 Stage 2: Frame Synchronization and Code Group Identification
  4.3 Stage 3: Scrambling Code Identification
    4.3.1 Scrambling Code Generator
    4.3.2 Descrambler
5.0 3GPP-comma free Cell Search Design
  5.1 Stage 2 of 3GPP-comma free Cell Search Design
  5.2 Reduced Length FHT Design
6.0 Experimental Method and Results
  6.1 Experimental Method
    6.1.1 FPGA Design Process
  6.2 Experimental Results
7.0 Summary, Conclusions and Future Work
  7.1 Summary
  7.2 Conclusions
  7.3 Future Work
8.0 References
List of Abbreviations
AMPS: Advanced Mobile Phone Service
ASIC: Application Specific Integrated Circuit
A/D: Analog-to-Digital
AWGN: Additive White Gaussian Noise
BS: Base Station
Cp: Primary Synchronization Code
Cssc: Secondary Synchronization Code
Cs: Cyclic Hierarchical Sequence
CLB: Configurable Logic Block
CPICH: Common Pilot Channel
D/A: Digital-to-Analog
DFT: Discrete Fourier Transform
DSP: Digital Signal Processing
DS-CDMA: Direct Sequence-Code Division Multiple Access
FHT: Fast Hadamard Transformer
FPGA: Field Programmable Gate Array
GIC: Group Indicator Code
GPS: Global Positioning System
GSM: Global System for Mobile communication
LC: Logic Cell
LFSR: Linear Feedback Shift Register
LUT: Look-Up Table
MS: Mobile Station
PSC: Primary Synchronization Code
P-SCH: Primary Synchronization Channel
SSC: Secondary Synchronization Code
SNR: Signal-to-Noise Ratio
SCH: Synchronization Channel
S-SCH: Secondary Synchronization Channel
3G: Third Generation
3GPP: Third Generation Partnership Project
TIA: Telecommunications Industry Association
W-CDMA: Wideband-Code Division Multiple Access
List of Figures
Figure 1: DS-CDMA Transmitter-Receiver Block Level Diagram
Figure 2: Synchronization Channels in Cell Search
Figure 3: Hierarchical Matched Filter (64-chip and 4-symbol accumulation)
Figure 4: Hierarchical Matched Filter (16-chip and 16-symbol accumulation)
Figure 5: Slot Boundary Detection
Figure 6: Frame Synchronization and Code Group Identification
Figure 7: Scrambling Code Generator
Figure 8: Multiple Scrambling Code Generator
Figure 9: Scrambling Code Identification
Figure 10: Individual Stage of FHT
Figure 11: 16 chip FHT
Figure 12: Hadamard Code Metrics (Butterfly Operation)
Figure 13: 2-Slice Virtex-E CLB
Figure 14: Detailed View of Virtex-E Slice
Figure 15: Comparison of Improved CSD and 3GPP-comma free CSD, PFA = 10^-3
Figure 16: Comparison of Improved CSD and 3GPP-comma free CSD, PFA = 10^-4
List of Tables
Table 1: Hierarchical Matched Filter (16 and 64-chip Accumulation)
Table 2: Sequences X1,i and X2,i for Code Groups 1 to 32
Table 3: Masking Functions used in Stage 3: Scrambling Code Generator
Table 4: Allocations of SSCs for Secondary SCH
Table 5: Timing Diagram of Inputs to FHT
Table 6: Reduced Length Walsh Sequences (256 chip sequence to 16 chip sequence)
Table 7: Hardware Specifications of System: Quantization 4 Input Data Bits
Table 8: Hardware Specifications of FHT: 16 and 256 chip sequence
Chapter 1
Introduction
1.0 Introduction
First generation (1G) mobile communications systems were based on analog technology and were introduced in the early to mid 1980s. These 1G systems had several limitations, including (1) low quality voice service, (2) limited capacity and (3) no support for global roaming.
Digital second generation (2G) systems were then developed in Europe and the US. They included (1) the Global System for Mobile communication (GSM), which uses time division multiple access (TDMA), in which each user is assigned a particular time slot; (2) the TDMA/136 specification, defined in the US in 1988 by the Telecommunications Industry Association (TIA) with the aim of digitizing the analog Advanced Mobile Phone Service (AMPS); and (3) IS-95, proposed in the US to provide better voice quality and higher capacity, which is based on CDMA technology. However, the different 2G technologies were not interoperable and were not available across all geographic areas. In addition, the low bit rates of 2G systems could not meet subscriber demand for multimedia services. Third generation (3G) systems aim to solve these problems by promising global roaming across 3G standards, higher data rates, improved quality of service and support for multimedia applications. The most popular candidates for 3G cellular systems are CDMA2000 and Wideband-Code Division Multiple Access (W-CDMA) [1] [2]. Both schemes are based on Direct Sequence-Code Division Multiple Access (DS-CDMA) technology. In DS-CDMA, the data signals are directly modulated by a digital code signal.
In a spread spectrum CDMA system, the transmitted signal is spread over a frequency band much wider than the minimum bandwidth required to transmit the information. In a typical scenario with multiple users, or mobile stations (MSs), in a cell, each user has a unique scrambling code, chosen to have low cross-correlation with the codes of the other users. The signal that the MS receives from the transmitting base station (BS) is correlated with the user's scrambling code. This despreads only that user's signal, while the other spread spectrum signals remain spread. A block diagram of a DS-CDMA transmitter and receiver is shown in Figure 1. Spreading consists of multiplying the input data by a scrambling code sequence whose bit rate is much higher than the data bit rate. At the receiver, the signal is multiplied by the same scrambling code sequence, exactly synchronized to the received code sequence. The Encoding block in Figure 1 adds error correcting bits and performs interleaving to protect the information bits from channel noise and interference. The reverse operations are performed in the Decoding block at the receiver.
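The spreading and despreading steps can be illustrated with a minimal Python sketch. It spreads two users' bits with two length-8 codes, adds the chip streams as the channel would, and despreads each user by correlating with that user's code. It assumes noise-free, perfectly synchronized chips and, as a hypothetical simplification, uses two orthogonal Hadamard rows so that the other user's contribution cancels exactly; real scrambling codes only have low, not zero, cross-correlation. The function names are illustrative.

```python
import random

def spread(bits, code):
    """Map each bit (0 -> +1, 1 -> -1) to a symbol and multiply it by every chip of the code."""
    chips = []
    for b in bits:
        symbol = 1 - 2 * b
        chips.extend(symbol * c for c in code)
    return chips

def despread(chips, code):
    """Correlate each code-length block with the synchronized code and decide by sign."""
    n = len(code)
    bits = []
    for start in range(0, len(chips), n):
        corr = sum(x * c for x, c in zip(chips[start:start + n], code))
        bits.append(0 if corr > 0 else 1)
    return bits

# Hypothetical codes: two orthogonal rows of an 8x8 Hadamard matrix.
code_a = [1, 1, 1, 1, -1, -1, -1, -1]
code_b = [1, -1, 1, -1, 1, -1, 1, -1]

rng = random.Random(1)
bits_a = [rng.randint(0, 1) for _ in range(16)]
bits_b = [rng.randint(0, 1) for _ in range(16)]

# The channel delivers the chip-by-chip sum of both users' signals.
received = [a + b for a, b in zip(spread(bits_a, code_a), spread(bits_b, code_b))]

print(despread(received, code_a) == bits_a)  # True
print(despread(received, code_b) == bits_b)  # True
```

Correlating with code_a yields plus or minus 8 per bit for user A, while user B's chips sum to zero against code_a, which is the despreading effect described above.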
Figure 1: DS-CDMA Transmitter-Receiver Block Level Diagram (transmitter: Baseband Data, Encoding, D/A; receiver: A/D, Decoding, Baseband Data)
The main difference between W-CDMA and CDMA2000 is that W-CDMA supports asynchronous BSs, whereas CDMA2000 relies on synchronized BSs. Synchronous CDMA systems need an external time reference; for example, all BSs can use a Global Positioning System (GPS) clock to synchronize their operations. This allows the MS to use different phases of the same scrambling code to distinguish between adjacent BSs. In an asynchronous CDMA system, each BS has an independent time reference, and the MS has no prior knowledge of the relative timing of the various BSs. The advantage of asynchronous operation is that it eliminates the need to synchronize the BSs to an accurate external timing source. However, since there is no common time reference between adjacent BSs, different phases of the same code cannot be used to distinguish them; adjacent BSs can only be identified by distinct scrambling codes. Consequently, cell search, the process of achieving code, time and frequency synchronization of the MS with the BS, takes longer than in a synchronous CDMA system. Cell search is further complicated by the presence of signals intended for other mobiles within the cell as well as signals from other BSs. It is therefore important to develop cell search algorithms and hardware implementations for asynchronous CDMA systems that achieve low acquisition time with minimal hardware resources.
Cell search is performed according to the algorithm proposed by Wang et al. [3]. In this algorithm, code and time synchronization are achieved first, under the assumption of a large frequency error, and frequency synchronization is performed afterwards. This study considers the problem of achieving code and time synchronization, which in W-CDMA cell search is divided into three stages: (1) slot synchronization, (2) frame synchronization and code group identification, and (3) scrambling code identification. This thesis presents a 3G Partnership Project (3GPP) cell search design using cyclic codes (Improved CSD) that achieves faster synchronization at lower hardware complexity. The second part of this thesis compares two designs for performing initial cell search, the Improved CSD and the 3GPP cell search design using comma free codes (3GPP-comma free CSD), in terms of (1) acquisition time and (2) hardware specifications on a Xilinx Virtex-E XCV1000E field programmable gate array (FPGA). The thesis also proposes design improvements in stage 2 of the 3GPP-comma free CSD beyond those proposed by Li et al. [4]. The 3GPP-comma free CSD proposed in this thesis uses a Fast Hadamard Transformer (FHT) in stage 2 that achieves lower hardware complexity and faster decoding. Furthermore, masking functions are used in stage 3 of both the Improved CSD and the 3GPP-comma free CSD to reduce the number of scrambling code generators required, as described in previous work [4]. This reduces the ROM size required to store the initial phases of the scrambling code generators in stage 3. The Improved CSD proposed in this thesis aims to achieve faster synchronization between the MS and the BS and thus to improve system performance. Experiments using accumulation over multiple slots in stage 1 indicate that, for an additive white Gaussian noise (AWGN) channel at high signal-to-noise ratio, the Improved CSD achieves faster synchronization with the BS and has lower hardware utilization than the 3GPP-comma free CSD under the same design constraints.
The thesis is organized as follows. Work done by other research groups and suggestions by the 3GPP working groups are presented in Chapter 2. Chapter 3 describes the synchronization channels in W-CDMA cell search and introduces the three-stage cell search algorithm used in W-CDMA to synchronize the MS with the BS. Chapter 4 describes the Improved cell search design, which uses cyclic codes to achieve faster synchronization. Chapter 5 discusses the 3GPP cell search design using comma free codes. Chapter 6 presents the experimental method and the results of comparing the two cell search algorithms on a Xilinx Virtex-E XCV1000E FPGA. Chapter 7 gives a summary, conclusions and an overview of future directions of this research.
Chapter 2
Background
Cell search design is critical because it impacts system performance, and efficient receiver structures and algorithms are needed to reduce the cell search time. This Chapter summarizes efforts by research groups and the 3GPP working groups to design efficient schemes and algorithms for each of the three stages of the cell search algorithm.
2.0 Background
Wang et al. propose a pipelined process for the first three stages of the cell search algorithm [3]. The cell search scenarios considered in their study are (1) initial cell search, when a mobile is switched on, and (2) target cell search, during the idle and active modes of the MS. Instead of a serial cell search that sequentially searches through code, time and frequency, their method first acquires code and time synchronization assuming a large frequency error and then performs frequency synchronization [3] [5].
The synchronization code sequences used in stage 1 and stage 2 of the cell search algorithm are made up of bits called "chips", which can be either +1 or -1, and are 256 chips long. A traditional matched filter would need a large adder circuit (a 256-input adder) to sum the correlation results, which wastes hardware resources. Siemens and Texas Instruments have therefore suggested, in their working group draft, a hierarchical matched filter design that uses two matched filters to reduce the hardware complexity significantly [6]. The details of the hierarchical matched filter design are presented in Chapter 4.
The 3GPP specification uses comma free codes in stage 2 of the cell search algorithm [7] [8]. Nortel Networks, in their working group proposal, suggested the use of cyclic codes in the SCH [9]. The use of cyclic codes for generating the synchronization codes is explained in more detail in Chapter 4. These cyclic codes can reduce hardware utilization and acquisition time if the receiver is properly designed.
To reduce the complexity of searching through all 512 scrambling codes, the concept of code grouping and group indicator codes (GIC) was introduced [10]. The scrambling code is identified by first detecting the code group; once the code group is known, the scrambling code used by the cell can be identified easily because each code group contains only a limited number of codes. This reduces the cell search time significantly, and the idea was accepted in the 3GPP specifications. To further reduce cell search time, frame boundary synchronization is also achieved in stage 2 after identifying the code group and slot ID [11].
Ericsson, in their working group draft, proposed increasing the number of code groups in stage 2 of the cell search [12]. Increasing the number of code groups reduces the number of scrambling codes in each code group. Their proposed scheme uses either 256, 128 or 64 code groups in stage 2 of the cell search. They claim that the scheme using 256 code groups is preferable, as it requires only two scrambling code correlators in stage 3 of initial cell search and achieves reduced hardware complexity.
In stage 2 of the 3GPP-comma free CSD presented in this thesis, an FHT design is proposed in place of the Golay correlator presented by Li et al. [4]. An FHT provides an efficient technique for detecting the code group and slot ID in stage 2. Previous FHT designs [13] [14] use substantial hardware resources, so a fast and efficient Hadamard transformer is needed to reduce hardware utilization and to decode faster. A compact and efficient FHT design will also draw less power from the handset.
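Why an FHT is efficient can be seen from a generic radix-2 fast Walsh-Hadamard transform, sketched below in Python. This is a textbook algorithm, not the FHT design of Chapter 5: it computes the correlation of a 16-sample input with all 16 rows of a Sylvester Hadamard matrix using 16 x log2(16) = 64 additions and subtractions, instead of 16 x 15 = 240 for direct correlation. The function names and the test input are hypothetical.

```python
def fwht(values):
    """Radix-2 fast Walsh-Hadamard transform in natural (Sylvester) order.

    Output k equals the correlation of `values` with row k of the Sylvester
    Hadamard matrix, computed with n*log2(n) additions/subtractions.
    """
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y  # butterfly: one add, one subtract
        h *= 2
    return a

def hadamard_row(k, n):
    """Row k of the n x n Sylvester Hadamard matrix: entry j is (-1)^popcount(k & j)."""
    return [(-1) ** bin(k & j).count("1") for j in range(n)]

# Hypothetical received vector: Hadamard row 11 scaled by 3.
received = [3 * c for c in hadamard_row(11, 16)]
spectrum = fwht(received)
best = max(range(16), key=lambda k: spectrum[k])
print(best, spectrum[best])  # 11 48
```

Because the rows are orthogonal, every output except index 11 is zero, so the largest output directly identifies the transmitted row.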
Siemens, in their working group draft, suggested the use of masking functions in stage 3 to reduce the complexity of generating the scrambling codes in parallel [15]. Masking functions reduce the number of scrambling code generators required to generate the codes in parallel. The designer can select any masking functions, as long as they generate codes with minimum overlap. The use of masking functions reduces the hardware significantly compared to the previous design by Li et al. [4].
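Masking relies on the shift-and-add property of maximal-length (m-)sequences: XOR-ing a chosen subset (a mask) of an LFSR's state bits yields the same m-sequence delayed by a fixed number of chips, so one generator plus several masks produces several code phases in parallel. The toy Python sketch below checks this for a 4-stage LFSR with the primitive polynomial x^4 + x + 1 (period 15). It is an illustration under that assumption, not the 18-stage 3GPP scrambling code generator or the masks of Table 3, and all names are hypothetical.

```python
PERIOD = 15  # 2**4 - 1: period of an m-sequence from a 4-stage LFSR

def lfsr_states(seed, count):
    """Fibonacci LFSR for x^4 + x + 1: a[t+4] = a[t] XOR a[t+1].

    Returns `count` successive states (a[t], a[t+1], a[t+2], a[t+3]).
    """
    state = list(seed)
    states = []
    for _ in range(count):
        states.append(tuple(state))
        state = state[1:] + [state[0] ^ state[1]]
    return states

def masked_output(states, mask):
    """Hypothetical masking function: XOR of the state bits selected by `mask`."""
    return [sum(m & s for m, s in zip(mask, st)) % 2 for st in states]

def cyclic_shift_of(seq, ref):
    """Return k such that seq[t] == ref[(t + k) % len(ref)], or None."""
    for k in range(len(ref)):
        if seq == ref[k:] + ref[:k]:
            return k
    return None

states = lfsr_states((1, 0, 0, 0), PERIOD)
base = [st[0] for st in states]          # one period of the m-sequence a[t]

shifts = {}
for value in range(1, 2 ** 4):           # every non-zero 4-bit mask
    mask = tuple((value >> j) & 1 for j in range(4))
    shifts[mask] = cyclic_shift_of(masked_output(states, mask), base)

print(sorted(shifts.values()))           # each of the 15 shifts appears exactly once
```

Each non-zero mask selects a different code phase, which is why a bank of masks can replace several separately initialized generators.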
Li et al. designed an application specific integrated circuit (ASIC) for performing cell search in W-CDMA systems [4]. In stages 1 and 2 of their cell search design, the authors use a correlator structure, a Golay correlator [16], to detect the code group and slot ID. In stage 3 of the cell search algorithm, 16 scrambling code generators are used to generate the codes in parallel.
In summary, most of the literature in this area presents simulation results for the proposed algorithms and, except for the work of Li et al. [4], does not investigate the hardware complexity of the design schemes. The designs used by mobile manufacturers are proprietary, and very few documents describe them. It is critical to consider a practical hardware implementation of the cell search algorithm, especially because chip area and power consumption are the two most important factors in a mobile handset.
Chapter 3
Cell Search Algorithm
This Chapter describes the synchronization channels in W-CDMA cell search and introduces the cell search algorithm used to synchronize the MS with the BS in W-CDMA systems.
3.0 Cell Search Algorithm
In CDMA systems, spreading codes are used to differentiate physical channels from the same transmitter, and scrambling codes are used to differentiate transmitters. The MS needs to achieve code and time synchronization with the BS before any communication with the BS can start. The process of searching for a code and achieving synchronization with the BS is called cell search. Cell search is performed in two scenarios: when an MS is switched on (initial cell search) and during active or idle mode (target cell search). Target cell search is used to find handover candidates during a call. Cell search must be completed with minimum delay because it impacts system performance.
3.1 Synchronization Channels in W-CDMA
Each cell in a CDMA system is identified by its downlink scrambling code, which is 38,400 chips long. The 38,400 chips form a radio frame, which is divided into 15 slots.
Figure 2: Synchronization Channels in Cell Search (P-SCH, S-SCH and CPICH over slots 1 to 15 of one radio frame; each CPICH slot carries 10 CPICH symbols)
Figure 2 shows the slot and frame structure of the three channels used in cell search: the Primary Synchronization Channel (P-SCH), the Secondary Synchronization Channel (S-SCH) and the Common Pilot Channel (CPICH) [7] [17]. Together, the P-SCH and the S-SCH are called the Synchronization Channel (SCH). In the P-SCH, a 256-chip sequence is transmitted at the start of each slot. All BSs use the same P-SCH sequence, transmitted once every slot. Because every BS transmits the same sequence, a single matched filter is sufficient to detect the slot boundary. To reduce the complexity of the matched filter implementation, a hierarchical scheme is used, as explained in detail in Chapter 4. The S-SCH carries 15 sequences, one in each slot; the pattern of sequences depends on the code group and repeats every frame. These sequences are used to identify the code group. The CPICH carries the downlink common pilot symbols, scrambled by the scrambling code of the BS. Each slot of this channel is divided into 10 symbols, each 256 chips long.
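The frame, slot and symbol sizes above are consistent with one another, as the short Python check below shows. The only added assumption is the W-CDMA chip rate of 3.84 Mchip/s, taken from the 3GPP specifications rather than from this section; it reproduces the 0.67 ms slot duration used in Chapter 4.

```python
CHIP_RATE = 3.84e6            # W-CDMA chip rate in chips/s (from the 3GPP specifications)
CHIPS_PER_FRAME = 38_400
SLOTS_PER_FRAME = 15
CHIPS_PER_SYMBOL = 256        # CPICH symbol length, also the PSC/SSC length
CPICH_SYMBOLS_PER_SLOT = 10

chips_per_slot = CHIPS_PER_FRAME // SLOTS_PER_FRAME
assert chips_per_slot == CPICH_SYMBOLS_PER_SLOT * CHIPS_PER_SYMBOL  # 2,560 chips

slot_duration_us = chips_per_slot / CHIP_RATE * 1e6    # about 666.7 us, i.e. 0.67 ms
frame_duration_ms = CHIPS_PER_FRAME / CHIP_RATE * 1e3  # 10 ms
psc_fraction = CHIPS_PER_SYMBOL / chips_per_slot       # the PSC occupies 1/10 of each slot

print(chips_per_slot, round(slot_duration_us, 1), round(frame_duration_ms, 3), psc_fraction)
```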
To reduce the complexity of synchronizing to the BSs in W-CDMA, the concepts of code grouping and group indicator codes (GIC) were introduced [10]. The 512 scrambling codes used in W-CDMA are divided into code groups. After the code group has been identified, only the scrambling code used by the cell needs to be detected among the codes of that group. The number of candidate scrambling codes therefore depends on how many code groups are used in stage 2 of the design. For example, if 32 code groups are used in stage 2, there are 16 candidate scrambling codes in stage 3; if 64 code groups are used, there are 8. Although the number of scrambling codes is fixed at 512, the number of code groups can be increased from 32 to 256 [12]. The complexity is further reduced by combining frame synchronization and code group identification in stage 2 of the cell search algorithm [11].
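The trade-off between stage 2 and stage 3 is a simple division, sketched below in Python with a hypothetical helper that assumes the codes are split evenly across groups. With 256 groups only two candidates remain, which matches the two stage-3 correlators of Ericsson's scheme mentioned in Chapter 2 [12].

```python
TOTAL_SCRAMBLING_CODES = 512

def candidate_codes(num_groups, total=TOTAL_SCRAMBLING_CODES):
    """Scrambling codes left to test in stage 3 once stage 2 has found the group
    (assumes the codes are split evenly across the groups)."""
    if total % num_groups:
        raise ValueError("the number of groups must divide the number of codes")
    return total // num_groups

for groups in (32, 64, 128, 256):
    print(f"{groups:3d} code groups -> {candidate_codes(groups):2d} candidate codes in stage 3")
```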
3.2 Cell Search Algorithm
The process of achieving code and time synchronization in the cell search algorithm is divided into three stages: (1) slot synchronization, (2) frame synchronization and code group identification, and (3) scrambling code identification [3] [7] [8] [18].
3.2.1 Stage 1: Slot Synchronization
During stage 1 of the cell search procedure, the MS uses the Primary Synchronization Code (PSC) of the SCH to acquire slot synchronization to a cell. This is typically done with a single matched filter matched to the PSC, which is common to all cells. The slot timing of the cell is obtained by detecting peaks in the matched filter output. The starting position of the synchronization code may be determined from observations over a single slot. However, decisions based on a single slot may be unreliable when the signal-to-noise ratio (SNR) is low or fading is severe. Reliable slot synchronization is required to minimize cell search time, so observations are made over multiple slots and the results are combined, which increases the probability that the correct slot boundary is identified.
3.2.2 Stage 2: Frame Synchronization and Code Group Identification
During stage 2 of the cell search procedure, the MS uses the Secondary Synchronization Codes (SSCs) of the SCH to achieve frame synchronization and to identify the code group of the cell found in stage 1. This is done by correlating the received signal with all possible SSC sequences and identifying the maximum correlation value. Since the cyclic shifts of the sequences are unique, both the code group and the frame timing are determined.
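A minimal Python sketch of the stage 2 decision is shown below. It works on hard per-slot SSC decisions rather than on soft correlation values, and the code book is a hypothetical toy chosen so that every cyclic shift of every code word is distinct; it is neither the 3GPP comma free code nor the cyclic code of Table 2. Finding the best (group, shift) pair yields both the code group and the frame timing.

```python
NUM_SLOTS = 15

# Hypothetical toy code book (NOT the 3GPP comma free code words or Table 2):
# group g sends SSC index (m_g * t mod 15) + 1 in slot t of every frame. With
# multipliers coprime to 15, all cyclic shifts of all code words are distinct.
MULTIPLIERS = {0: 1, 1: 2, 2: 4}
CODEBOOK = {g: [(m * t) % NUM_SLOTS + 1 for t in range(NUM_SLOTS)]
            for g, m in MULTIPLIERS.items()}

def identify_group_and_frame(observed):
    """Return (group, k) whose code word, cyclically shifted by k slots, agrees
    with the per-slot SSC decisions in the most slots. k is the position within
    the frame of the first observed slot, which gives the frame boundary."""
    best_score, best = -1, None
    for g, word in CODEBOOK.items():
        for k in range(NUM_SLOTS):
            shifted = word[k:] + word[:k]
            score = sum(o == s for o, s in zip(observed, shifted))
            if score > best_score:
                best_score, best = score, (g, k)
    return best

# Observation starts at slot 5 of a frame from group 2; three slots are
# mis-detected as SSC 16, which no toy code word uses.
observed = CODEBOOK[2][5:] + CODEBOOK[2][:5]
for t in (0, 7, 12):
    observed[t] = 16
print(identify_group_and_frame(observed))  # (2, 5)
```

The correct hypothesis still matches 12 of 15 slots, while every other (group, shift) pair matches at most 3, so the decision tolerates a few slot errors.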
3.2.3 Stage 3: Scrambling Code Identification
During stage 3 of the cell search procedure, the MS determines the exact primary scrambling code used by the cell. The primary scrambling code is typically identified through symbol-by-symbol correlation over the CPICH with all codes within the code group identified in stage 2. A threshold is used to decide whether the code has been identified; it can be set in advance from a target probability of false alarm (PFA) [19].
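One common way to derive such a threshold is sketched below in Python. It assumes a single non-coherent observation |n|^2 of complex Gaussian noise with known variance σ² per I/Q component; under this assumption |n|^2 is exponentially distributed with mean 2σ², so P(|n|^2 > T) = exp(-T/(2σ²)) and T = -2σ² ln(PFA). This is an illustrative model, not necessarily the procedure of [19]; with non-coherent accumulation over M symbols the noise statistic becomes chi-square with 2M degrees of freedom and the threshold must be taken from that distribution instead. The function names are hypothetical.

```python
import math
import random

def threshold_for_pfa(pfa, noise_var):
    """T such that P(|n|^2 > T) = pfa when n = nI + j*nQ, with nI and nQ independent
    zero-mean Gaussians of variance noise_var (so |n|^2 ~ exponential, mean 2*noise_var)."""
    return -2.0 * noise_var * math.log(pfa)

def simulated_pfa(threshold, noise_var, trials, seed=0):
    """Monte Carlo estimate of the false alarm rate for noise-only inputs."""
    rng = random.Random(seed)
    sigma = math.sqrt(noise_var)
    hits = 0
    for _ in range(trials):
        n_i, n_q = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
        if n_i * n_i + n_q * n_q > threshold:
            hits += 1
    return hits / trials

pfa_target = 1e-2
T = threshold_for_pfa(pfa_target, noise_var=1.0)
est = simulated_pfa(T, 1.0, trials=200_000)
print(round(T, 2), est)
```

The simulated false alarm rate should land close to the 1% target, confirming the closed-form threshold for this noise model.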
This three-stage cell search algorithm simplifies the synchronization of the MS with the BS. Each stage and its hardware implementation are explained in the following Chapters.
Chapter 4
Improved Cell Search Design
This Chapter describes the Improved CSD, which uses a set of cyclic codes. The cyclic codes were proposed by Nortel Networks for use on the Secondary SCH [9]. These cyclic codes allow very efficient detection and improve the cell search in terms of acquisition time and hardware utilization. The three stages of the cell search design and their hardware implementation are explained in Sections 4.1, 4.2 and 4.3.
4.0 Improved Cell Search Design
4.1 Stage 1: Slot Synchronization
The MS first needs to acquire the PSC, which is common to all the BSs and is 256 chips long. The matched filter output is given by
$$Y = \sum_{j=0}^{255} R_j \, C_{p,j} \qquad (1)$$
where $R_j$ is the $j$th sample of the received complex signal and $C_{p,j}$ is the $j$th chip of the PSC. Hence, a traditional matched filter implementation would require 256 taps and a large adder circuit. This would increase the delay as well as the power consumption at the receiver, which is undesirable. A hierarchical structure is therefore proposed for the matched filter operations, which needs fewer taps, less circuitry and lower power [6]. The PSC consists of an unmodulated hierarchical sequence of length 256 chips, transmitted once every slot. The PSC is the same for every BS in the system and is transmitted time aligned with the slot boundary. The PSC is chosen to have good autocorrelation properties: correlating the PSC with itself gives a high peak at zero offset and much lower values at other offsets, so the slot boundary can be located unambiguously.
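The hierarchical sequence used for the PSC is constructed from two constituent sequences $X_1$ and $X_2$ of lengths $n_1$ and $n_2$, respectively, as

$$C_p(n) = \left[ X_1(n \bmod n_2) + X_2(n \operatorname{div} n_1) \right] \bmod 2, \qquad n = 0, 1, \ldots, n_1 n_2 - 1 \qquad (2)$$

where $n_1 = n_2 = 16$ and the constituent sequences are both defined as $X_1 = X_2 = (1,1,-1,-1,-1,-1,1,-1,1,1,-1,1,1,1,-1,1)$ [9]. When the chips are written in $\pm 1$ form, modulo-2 addition of binary chips corresponds to multiplication of the $\pm 1$ chips.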
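Equation (2) can be evaluated directly in ±1 form, where the modulo-2 sum becomes a product. The Python sketch below builds the 256-chip PSC from the constituent sequence given above and computes its zero-offset autocorrelation and the largest aperiodic sidelobe; the helper names are hypothetical, and the sidelobe value is printed rather than asserted here.

```python
X1 = [1, 1, -1, -1, -1, -1, 1, -1, 1, 1, -1, 1, 1, 1, -1, 1]  # constituent sequence from [9]
X2 = list(X1)                                                # X1 = X2 in equation (2)
N1 = N2 = 16

def hierarchical_sequence(x1, x2, n1, n2):
    """Equation (2) in +/-1 form: the modulo-2 sum of binary chips becomes a product."""
    return [x1[n % n2] * x2[n // n1] for n in range(n1 * n2)]

cp = hierarchical_sequence(X1, X2, N1, N2)

def aperiodic_autocorrelation(seq, shift):
    """Correlation of seq with itself delayed by `shift` chips (no wrap-around)."""
    return sum(seq[n] * seq[n + shift] for n in range(len(seq) - shift))

peak = aperiodic_autocorrelation(cp, 0)
max_sidelobe = max(abs(aperiodic_autocorrelation(cp, s)) for s in range(1, len(cp)))
print(len(cp), peak, max_sidelobe)
```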
The hierarchical matched filter can be designed in several ways, as shown in Table 1.
Figure 3: Hierarchical Matched Filter (64 chip and 4 symbol accumulation). Shift Register 1 and Adder Tree 1 correlate the input data with the PSCH code; Shift Register 2 and Adder Tree 2 combine the results to produce the Result.
The hierarchical matched filter consists of two concatenated matched filter blocks. The design using 64 taps is shown in Figure 3. This solution is not ideal for two reasons: first, the first matched filter requires 64 taps, and second, it needs a 64-input adder, as shown in Figure 3. A better solution is the design shown in Figure 4. Hence, stage 1 of both the Improved CSD and the 3GPP-comma free CSD uses the hierarchical matched filter with 16 chip and 16 symbol accumulation.
Figure 4: Hierarchical Matched Filter (16 chip and 16 symbol accumulation). Shift Register 1 and Adder Tree 1 correlate the input data with 16 chips of the PSCH code; Shift Register 2 (256 entries) and Adder Tree 2 combine every 16th first-stage result to produce the Result.
In this design, the first matched filter receives the input samples serially from the BS. Correlation over X1 (16 chip accumulation) is performed before correlation over X2 (16 symbol accumulation); the two matched filters could be interchanged, and the order is an implementation option. After 16 clock cycles, when shift register 1 is filled, the data stored in it are matched in parallel with the code applied to the taps of the matched filter (the tap coefficients). The tap coefficients are taken from the PSC, which is the same for all BSs, so the same matched filter structure can be used for all BSs. The adder circuit is implemented as a tree with the 16 inputs applied in parallel. If the data in shift register 1 match the tap coefficients, the output of the adder tree reaches its largest value (16 or greater). The second matched filter has a 256-entry shift register (shift register 2), and only 16 taps are needed, one on every sixteenth entry. The results from the first adder tree are stored in shift register 2. After 256 clock cycles, shift register 2 is filled with results from the first matched filter, and its contents are matched in parallel with the tap coefficients, which are again taken from the PSC construction. If the data match the code sequence, the output of the second adder tree is 256 or greater in magnitude, corresponding to the peak value. An advantage of this scheme is that no multiplier is needed: because the tap coefficients are ±1, the correlations can be performed with an adder/subtractor circuit.
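The two-stage structure gives exactly the same output as a direct 256-tap filter, because in ±1 form equation (2) gives C_p(16m + k) = X_1(k) X_2(m), so the 256-term sum of equation (1) factors into 16-chip partial sums over X_1 weighted by X_2. The Python sketch below checks this numerically on hypothetical 4-bit samples. It models the data flow of Figure 4 in software for one real-valued branch (I or Q), not the FPGA implementation, and the function names are illustrative.

```python
import random

X = [1, 1, -1, -1, -1, -1, 1, -1, 1, 1, -1, 1, 1, 1, -1, 1]  # X1 = X2 from equation (2)
CP = [X[n % 16] * X[n // 16] for n in range(256)]              # PSC in +/-1 form

def direct_mf(samples):
    """Single 256-tap matched filter: 256 additions/subtractions per output."""
    return [sum(samples[s + n] * CP[n] for n in range(256))
            for s in range(len(samples) - 255)]

def hierarchical_mf(samples):
    """Two 16-tap matched filters in cascade (16 + 16 operations per output)."""
    # First matched filter: 16-chip correlation with X1 at every input position
    # (these outputs play the role of the contents of shift register 2).
    stage1 = [sum(samples[t + k] * X[k] for k in range(16))
              for t in range(len(samples) - 15)]
    # Second matched filter: taps on every 16th first-stage output, weighted by X2.
    return [sum(X[m] * stage1[s + 16 * m] for m in range(16))
            for s in range(len(samples) - 255)]

# Hypothetical 4-bit input samples (-8..7), and a clean PSC embedded at offset 100.
rng = random.Random(7)
samples = [rng.randint(-8, 7) for _ in range(600)]
clean = [0] * 600
clean[100:356] = CP
print(hierarchical_mf(samples) == direct_mf(samples))  # True
out = hierarchical_mf(clean)
print(out.index(max(out)), max(out))                   # 100 256
```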
Each memory cell in shift register 1 is 4 bits wide, assuming that the signal at the input to the digital receiver is sampled with a 4-bit analog-to-digital (A/D) converter. Shift register 2 is 8 bits wide to store the result from the first adder tree block. For performing the correlation, it is not necessary to perform 16*16 operations per chip, but only 16+16 accumulation operations, which leads to a considerable reduction in hardware complexity. The hardware complexity of the hierarchical matched filter is calculated as follows. In one slot period (2,560 chips), the receiver has to perform at least 81,920 complex additions (2,560*(16+16)). A traditional matched filter without the hierarchical structure would require 256 complex additions per chip, i.e. 655,360 (2,560*256) per slot. Thus, the hierarchical matched filter saves a factor of 8 in complex additions. From Figure 2, each slot has a duration of 0.67 msec (670 µs). The complexity of stage 1 in terms of real additions per second is therefore 245 Madds/sec (81,920*2/670 µs). The factor of 2 accounts for the two branches of the complex signal: the incoming signal is divided into the cosine component, called the in-phase (I-phase) component, and the sine component, called the quadrature-phase (Q-phase) component. Thus, stage 1 of the initial search requires 81,920 complex additions per slot and a computing power of 245 Madds/sec.
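Collected in one place, the stage 1 operation counts quoted above work out as follows (taking the slot duration as 670 µs, as in the text):

```latex
\begin{aligned}
\text{hierarchical:}\quad & 2{,}560 \times (16+16) = 81{,}920 \ \text{complex additions per slot}\\
\text{direct 256-tap:}\quad & 2{,}560 \times 256 = 655{,}360 \ \text{complex additions per slot}\\
\text{saving:}\quad & 655{,}360 / 81{,}920 = 8\\
\text{real additions:}\quad & \frac{2 \times 81{,}920}{670\ \mu\text{s}} \approx 2.45 \times 10^{8}\ \text{adds/s} = 245\ \text{Madds/s}
\end{aligned}
```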
[Figure 5: Hierarchical matched filter for the Primary SCH: shift register 1 and adder tree 1 matched to the 16-chip PSCH code, shift register 2 with taps at every sixteenth position (1, 17, ..., 241), three levels of adders, (.)^2 non-coherent detection for the I and Q phases, and a comparator]
There are two such hierarchical matched filters, one for each of the I and Q channels of the received complex signal, as shown in Figure 5. The correlation results of the I and Q channels are combined non-coherently over one slot duration, and the result is stored in an accumulator implemented as a shift register. The output of the accumulator is given to a comparator block, which detects the peak value corresponding to the slot boundary of the closest BS, the BS with which the MS needs to synchronize. Because the code can be corrupted by AWGN and fading, accumulation over multiple slots is needed to identify the slot boundary correctly. Correct identification matters: if a wrong slot boundary is passed to stage 2, the acquisition time increases.
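A behavioural sketch of the non-coherent combining and multi-slot accumulation described above is given below in Python. It does not model the matched filters themselves: it assumes the per-slot I and Q correlator outputs are already available (here synthesised as a peak of amplitude 256 with a random carrier phase per slot plus Gaussian noise), and the function names, peak amplitude and noise level are hypothetical.

```python
import math
import random

SLOT_CHIPS = 2560  # chip offsets searched per slot


def noncoherent_accumulate(corr_i, corr_q):
    """Sum I^2 + Q^2 at every chip offset over all supplied slots.

    corr_i[s][t] and corr_q[s][t] are the I- and Q-branch matched-filter
    outputs for slot s at chip offset t.
    """
    acc = [0.0] * len(corr_i[0])
    for slot_i, slot_q in zip(corr_i, corr_q):
        for t, (ci, cq) in enumerate(zip(slot_i, slot_q)):
            acc[t] += ci * ci + cq * cq
    return acc


def slot_boundary(acc):
    """Comparator: chip offset with the largest accumulated energy."""
    return max(range(len(acc)), key=acc.__getitem__)


def synthetic_slots(true_offset, n_slots, peak=256.0, sigma=40.0, seed=0):
    """Hypothetical correlator outputs: one peak per slot, unknown phase."""
    rng = random.Random(seed)
    corr_i, corr_q = [], []
    for _ in range(n_slots):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        ci = [rng.gauss(0.0, sigma) for _ in range(SLOT_CHIPS)]
        cq = [rng.gauss(0.0, sigma) for _ in range(SLOT_CHIPS)]
        ci[true_offset] += peak * math.cos(phase)
        cq[true_offset] += peak * math.sin(phase)
        corr_i.append(ci)
        corr_q.append(cq)
    return corr_i, corr_q


if __name__ == "__main__":
    ci, cq = synthetic_slots(true_offset=1234, n_slots=15)
    print("estimated slot boundary:", slot_boundary(noncoherent_accumulate(ci, cq)))
```

Squaring before summing makes the accumulated peak insensitive to the unknown carrier phase, which is why the I and Q results can be combined across slots without phase tracking.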
The Secondary SCH consists of 15 sequences belonging to a family of cyclic codes (SSCs), each 256 chips long. These SSCs are transmitted repeatedly in parallel with the Primary SCH. The procedure for constructing the cyclic codes is similar to that of the hierarchical sequence (equation 2) for the Primary SCH, except that it uses specific length-16 sequences from Table 2 for each code group.
The procedure for constructing the cyclic hierarchical sequence $C_s^{i,1}$ for slot 1 is exactly the same as constructing the hierarchical sequence $C_p$ for the Primary SCH. The sequence $C_s^{i,1}$ for slot 1 will be referred to as the zero cyclic shift sequence, as no shift is applied to the constituent sequence $X_1^i$. For slots 2 to 15, the cyclic codes are constructed from the two constituent sequences $X_1^{i,k-1}$ and $X_2^{i,k-1}$ of lengths $n_1$ and $n_2$, respectively, using the following formula:

$$C_s^{i,k}(n) = \left[ X_2^{i,k-1}(n \bmod n_2) + X_1^{i,k-1}(n \operatorname{div} n_1) \right] \bmod 2, \qquad n = 0,1,\dots,n_1 n_2 - 1 \qquad (3)$$

where $i$ is the code group number, $k = 2,3,\dots,15$ is the slot number, $n$ is the chip number within the slot, $n_1 = n_2 = 16$, and the constituent sequences $X_1^{i,k-1}$ and $X_2^{i,k-1}$ in each code group $i$ are chosen from Table 2 [9].
The constituent sequence $X_2^{i,k-1}$ (inner sequence) is exactly equal to the base sequence $X_2^i$ in every slot, i.e. $X_2^{i,k-1} = X_2^i$ for all $k$. The constituent sequence $X_1^{i,k-1}$ (outer sequence) is formed from the base sequence $X_1^i$ by a cyclic right shift of $k-1$ positions (from 0 to 14) for slot number $k$ (from 1 to 15). The generation of the cyclic codes is illustrated by the following example for the first code group:

- $X_1^{1,0} = (1,1,1,-1,-1,-1,1,-1,-1,1,1,-1,1,-1,1,1)$: $k = 1$ (slot 1), no cyclic shift
- $X_1^{1,1} = (1,1,1,1,-1,-1,-1,1,-1,-1,1,1,-1,1,-1,1)$: $k = 2$ (slot 2), cyclic right shift by 1 position
- $X_1^{1,14} = (1,-1,-1,-1,1,-1,-1,1,1,-1,1,-1,1,1,1,1)$: $k = 15$ (slot 15), cyclic right shift by 14 positions
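The shift rule for the outer sequence can be sketched in a few lines of Python. The sketch uses the code group 1 base sequence quoted above and assumes the ±1 form of equation 3, in which the modulo 2 sum of binary chips becomes a product of ±1 chips; the inner sequence passed to the hypothetical helper `cyclic_hierarchical_code` is a placeholder, since Table 2 is not reproduced here.

```python
X1_GROUP1 = [1, 1, 1, -1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, 1, 1]


def cyclic_right_shift(seq, k):
    """Rotate right by k positions: the last k elements wrap to the front."""
    k %= len(seq)
    return seq[-k:] + seq[:-k] if k else list(seq)


def outer_sequence(base, slot):
    """Outer sequence X1^{i,k-1} for slot k = 1..15 (right shift of k - 1)."""
    if not 1 <= slot <= 15:
        raise ValueError("slot must be in 1..15")
    return cyclic_right_shift(base, slot - 1)


def cyclic_hierarchical_code(inner, outer_base, slot):
    """256-chip code of equation 3 in +/-1 form:
    c(n) = inner(n mod 16) * outer(n div 16)."""
    outer = outer_sequence(outer_base, slot)
    return [inner[n % 16] * outer[n // 16] for n in range(256)]


if __name__ == "__main__":
    for k in (1, 2, 15):
        print(k, outer_sequence(X1_GROUP1, k))
```

Printing slots 1, 2 and 15 reproduces the three sequences listed above.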
The same procedure is used to form the cyclic codes for the other code groups. Since each length-16 outer sequence has 16 cyclic shifts, of which the 15 slots of a frame use 15, the 32 code groups yield 512 (32X16) different cyclic codes, each 256 chips long; in other words, each of the 32 code groups has 16 cyclic codes. This set of 512 cyclic codes has good correlation properties, which makes the codes good candidates for the SSCs: many pairs are fully orthogonal (zero cross-correlation), and the remaining pairs have small cross-correlation. The cross-correlation of each cyclic hierarchical sequence $C_s^{i,k}$ with the $C_p$ code of the Primary SCH is also small. Because the 512 cyclic codes are unique for each code group/slot location pair, both the scrambling code group and the frame timing can be determined uniquely in the second stage of the initial cell search.
By identifying the code group/slot location pair that gives the maximum correlation value, both the code group and the frame synchronization are determined. The output of the matched filter is given to a non-coherent block, which computes the energy over the I and Q channels and passes the result to the comparator module, as shown in Figure 6. When the signal-to-noise ratio is high, one slot period (2,560 chips) is enough to uniquely identify the correct code group and frame timing in the second stage of acquisition. This is a major difference from the 3GPP-comma free CSD, where at least three slots are needed to uniquely identify the correct code group and frame timing. The Improved CSD also uses a smaller ROM (32X16) to store the cyclic codes than the 3GPP-comma free CSD, which uses a 32X60 ROM to store the comma free codes.
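The stage 2 decision rule (pick the code group/slot pair with the largest non-coherent correlation) can be sketched as an exhaustive search in Python. This is a behavioural model, not the two-clock hardware of Figure 6: Table 2 is not reproduced, so each code group here gets hypothetical random ±1 inner and outer base sequences, the received slot is modelled as (1+j) times the code with an arbitrary carrier phase, and `stage2_search` and the other helpers are illustrative names.

```python
import cmath
import random


def cyclic_code(inner, outer_base, shift):
    """+/-1 form of equation 3 with the outer sequence rotated right by `shift`."""
    k = shift % 16
    outer = outer_base[-k:] + outer_base[:-k] if k else list(outer_base)
    return [inner[n % 16] * outer[n // 16] for n in range(256)]


def stage2_search(received, table):
    """Return (code group, shift) maximising |correlation|^2 over all codes.

    table[g] = (inner, outer_base) for code group g; shift = slot number - 1.
    """
    best_energy, best = -1.0, None
    for g, (inner, outer_base) in enumerate(table):
        for shift in range(16):
            code = cyclic_code(inner, outer_base, shift)
            corr = sum(r * c for r, c in zip(received, code))
            energy = corr.real ** 2 + corr.imag ** 2  # I^2 + Q^2
            if energy > best_energy:
                best_energy, best = energy, (g, shift)
    return best


def random_table(n_groups=32, seed=7):
    """Hypothetical stand-in for Table 2: random inner/outer base sequences."""
    rng = random.Random(seed)

    def pm16():
        return [rng.choice((1, -1)) for _ in range(16)]

    return [(pm16(), pm16()) for _ in range(n_groups)]


def received_slot(table, group, shift, phase=0.7, sigma=0.0, seed=1):
    """(1+j) * code, rotated by an unknown carrier phase, plus optional noise."""
    rng = random.Random(seed)
    rot = (1 + 1j) * cmath.exp(1j * phase)
    return [rot * c + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
            for c in cyclic_code(*table[group], shift)]


if __name__ == "__main__":
    table = random_table()
    rx = received_slot(table, group=5, shift=3, sigma=1.0)
    print("code group, shift:", stage2_search(rx, table))
```

The returned shift is k - 1 for slot k, so a single slot of data yields both the code group and the slot position when the signal-to-noise ratio is high.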
[Figure 6: Stage 2 of the Improved CSD: sampling counter and 256-cell Secondary Buffer for the I and Q phases, hierarchical matched filters (shift registers 1 and 2, code registers, adder trees) clocked at 5X SysClock, (.)^2 energy blocks, and a comparator that outputs the Code Group, Slot ID and a Stage 2 Complete flag]
The input data samples for the Secondary SCH are stored in an input buffer of 256 complex memory cells, called the Secondary Buffer, as shown in Figure 6. These samples are produced by waveform matched filtering and sampling at the chip rate. The result of the hierarchical matched filter is then given to a non-coherent module, which calculates the energy over the I and Q channels and passes it to a comparator block.
The ROM-stored code sequences given in Table 2 are each tried in succession before the data from the next slot arrives, and the data in the shift register is latched until all these sequences have been correlated. Stage 2 of the Improved CSD achieves this with two clocks: a slow clock, called the system clock in the design, and a fast clock running at 5X the system clock. Sampling is performed at the slow (system) clock rate. Once the data is latched in the buffer, the fast clock (5X system clock) is used to perform the correlations.
The comparator block outputs the code group from Table 2 that correlates most strongly with the data sequence, together with the number of cyclic shifts applied to that code group's sequence. The number of shifts (k-1 for slot k) identifies the slot ID. From the slot ID, the frame boundary is easily identified, because the number of slots in a frame is fixed at 15.
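As a small illustration of this last step, the Python sketch below turns the detected shift into frame timing. It assumes, as in equation 3, that slot k carries a shift of k - 1 and that chip positions are counted on the receiver's own chip counter; the function names are hypothetical.

```python
SLOT_CHIPS = 2560
SLOTS_PER_FRAME = 15


def slot_number(shift):
    """Slot k (1..15) that carries a cyclic shift of k - 1."""
    if not 0 <= shift < SLOTS_PER_FRAME:
        raise ValueError("shift must be in 0..14")
    return shift + 1


def frame_start(slot_boundary_chip, shift):
    """Chip index at which the current frame began."""
    return slot_boundary_chip - shift * SLOT_CHIPS


def next_frame_start(slot_boundary_chip, shift):
    """Chip index of the next frame boundary."""
    return frame_start(slot_boundary_chip, shift) + SLOTS_PER_FRAME * SLOT_CHIPS
```

For example, a slot boundary at chip 50,000 carrying shift 3 (slot 4) means the frame began at chip 42,320 and the next frame starts 38,400 chips later.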
After code group and frame synchronization are achieved, the scrambling code is identified by correlating the symbols of the CPICH with all possible scrambling codes in the code group. The codes are produced by a scrambling code generator, and the descrambling operation is carried out by a descrambler. The scrambling code generator and the descrambler used in stage 3 of the cell search are described in Sections 4.3.1 and 4.3.2, respectively.
4.3.1 Scrambling Code Generator
The scrambling code sequences are constructed by combining two real sequences into a complex sequence [7]. Each of the two real sequences is constructed as the position-wise modulo 2 sum of 38,400-chip segments of two binary sequences generated by two generator polynomials of degree 18. Let x and y denote these two binary sequences. The resulting sequences constitute segments of a set of Gold sequences. The x sequence is constructed using the primitive polynomial $1+X^7+X^{18}$, and the y sequence using the polynomial $1+X^5+X^7+X^{10}+X^{18}$. The sequence that depends on the chosen scrambling code number $n$ is denoted $z_n$. Furthermore, let $x(i)$, $y(i)$ and $z_n(i)$ denote the $i$th symbol of the sequences $x$, $y$ and $z_n$, respectively. The sequences x and y are constructed as

$$x(i+18) = \left[x(i+7) + x(i)\right] \bmod 2, \qquad i = 0,1,\dots,2^{18}-20 \qquad (4)$$

$$y(i+18) = \left[y(i+10) + y(i+7) + y(i+5) + y(i)\right] \bmod 2, \qquad i = 0,1,\dots,2^{18}-20 \qquad (5)$$

The $n$th Gold code sequence $z_n$, $n = 0,1,\dots,2^{18}-2$, is then defined as

$$z_n(i) = \left[x\big((i+n) \bmod (2^{18}-1)\big) + y(i)\right] \bmod 2, \qquad i = 0,1,\dots,2^{18}-2 \qquad (6)$$

Finally, after the binary values of $z_n$ are mapped to real values (0 to +1 and 1 to -1), the $n$th complex scrambling code sequence $s_n$ is defined as

$$s_n(i) = z_n(i) + j\, z_n\big((i+131{,}072) \bmod (2^{18}-1)\big), \qquad i = 0,1,\dots,38{,}399 \qquad (7)$$
The pattern from phase 0 up to the phase of 38,399 is repeated for every radio frame.
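Equations 4 to 7 translate directly into a short Python sketch. Two details are not stated in this section and are taken from 3GPP TS 25.213 as assumptions: the initial register contents x(0)=1, x(1)=...=x(17)=0 and y(0)=...=y(17)=1, and the mapping of binary chips 0 and 1 to +1 and -1 before the real and imaginary parts are combined. The function names are illustrative; a hardware generator such as the one in Figure 7 would produce the same chips serially rather than storing whole sequences.

```python
N = 2 ** 18 - 1  # period of the x and y m-sequences: 262,143


def x_sequence():
    """Equation 4: x(i+18) = x(i+7) + x(i) mod 2, for i = 0..2^18-20."""
    x = [1] + [0] * 17
    for i in range(N - 18):
        x.append(x[i + 7] ^ x[i])
    return x


def y_sequence():
    """Equation 5: y(i+18) = y(i+10) + y(i+7) + y(i+5) + y(i) mod 2."""
    y = [1] * 18
    for i in range(N - 18):
        y.append(y[i + 10] ^ y[i + 7] ^ y[i + 5] ^ y[i])
    return y


def scrambling_code(n, x, y, length=38400):
    """Equations 6 and 7: complex scrambling code s_n with +/-1 components."""
    def z(i):  # equation 6: binary Gold sequence z_n(i)
        return x[(i + n) % N] ^ y[i]

    def pm(bit):  # binary 0 -> +1, 1 -> -1
        return 1 - 2 * bit

    return [complex(pm(z(i)), pm(z((i + 131072) % N))) for i in range(length)]


if __name__ == "__main__":
    x, y = x_sequence(), y_sequence()
    print(scrambling_code(16, x, y, length=8))  # primary scrambling code i = 1
```

Because the x polynomial is primitive, x is a maximal-length sequence: one period of 262,143 chips contains exactly 131,072 ones, which is a convenient sanity check on the recurrence.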
Figure 7: Scrambling Code Generator
The scrambling code generator used to generate the long codes is shown in Figure 7. A total of $2^{18}-1 = 262{,}143$ scrambling codes, numbered 0,1,..,262,142, can be generated with it. However, not all of these scrambling codes are used. The scrambling codes are divided into 512 sets, each consisting of one primary scrambling code and 15 secondary scrambling codes. The primary scrambling codes are the scrambling codes n=16*i, where i=0,1,..,511. The ith set of secondary scrambling codes consists of the scrambling codes 16*i+k, where k=1,2,..,15. There is a one-to-one mapping between each primary scrambling code and a set of 15 secondary scrambling codes, such that the ith primary scrambling code corresponds to the ith set of secondary scrambling codes. The set of primary scrambling codes is further divided into 32 scrambling code groups, each consisting of 16 primary scrambling codes. The jth scrambling code group consists of the primary scrambling codes 16*16*j+16*k, where j=0,1,..,31 and k=0,1,..,15.
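The numbering rules above are easy to check mechanically. The Python sketch below encodes them for the 32-group configuration used in this thesis (3GPP TS 25.213 itself uses 64 groups of 8 primary codes); the helper names are hypothetical.

```python
N_PRIMARY = 512
N_GROUPS = 32
CODES_PER_GROUP = N_PRIMARY // N_GROUPS  # 16


def primary_code(i):
    """Scrambling code number of the ith primary code, i = 0..511."""
    return 16 * i


def secondary_codes(i):
    """The 15 secondary codes paired with primary code i."""
    return [16 * i + k for k in range(1, 16)]


def code_group(j):
    """Primary codes in scrambling code group j = 0..31."""
    return [16 * CODES_PER_GROUP * j + 16 * k for k in range(CODES_PER_GROUP)]


def group_of(code_number):
    """Code group to which a primary scrambling code belongs."""
    return code_number // (16 * CODES_PER_GROUP)
```

The 32 groups of 16 together cover exactly the 512 primary codes, the last one being 16*511 = 8,176.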
In stage 3, 16 scrambling codes need to be generated in parallel. If the scrambling code generator shown in Figure 7 is used to generate the codes then 16 such code generators would be required. However, generating the codes in parallel using 16 code generators could be expensive as a huge ROM would be required to store the initial phases for all the 16 code generators.
Figure 8: Multiple Scrambling Code Generator
To reduce hardware utilization, stage 3 of both designs uses only one scrambling code generator to generate 16 codes in parallel when 32 code groups are used, as shown in Figure 8. Sixteen masking functions are used to generate the codes in parallel [15]. Masking functions generate codes with minimum overlap and reduce the hardware to a single scrambling code generator at the expense of a few logic gates. The masking functions used for generating the codes are given in Table 3. The masking functions for the I and Q channel codes in linear feedback shift register (LFSR) 2 were kept fixed at 000000000000000001 and 001111111101100000, respectively. Besides reducing the hardware from 16 code generators to one, the design also reduces the ROM size from 512X18 (needed if 16 code generators were used) to 32X18.
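Masking works because of the shift-and-add property of m-sequences: the modulo 2 sum of any nonzero subset of LFSR state bits is the same m-sequence delayed by some number of chips, so one register plus a few XOR gates per mask can stand in for a separately initialised generator. The Python sketch below demonstrates this on the x sequence of equation 4. It assumes the register state at time i is (x(i), ..., x(i+17)) with mask bit j selecting x(i+j); the mask value used is hypothetical and is not an entry of Table 3, and the helper names are illustrative.

```python
N = 2 ** 18 - 1


def x_sequence():
    """x(i+18) = x(i+7) + x(i) mod 2 with x(0)=1, x(1..17)=0 (equation 4)."""
    x = [1] + [0] * 17
    for i in range(N - 18):
        x.append(x[i + 7] ^ x[i])
    return x


def masked_output(x, mask, length):
    """Parity of (register state AND mask) for the first `length` chips."""
    taps = [j for j in range(18) if (mask >> j) & 1]
    out = []
    for i in range(length):
        bit = 0
        for j in taps:
            bit ^= x[(i + j) % N]
        out.append(bit)
    return out


def find_delay(x, chips):
    """Delay d such that chips[i] == x[(i + d) % N].

    Every nonzero 18-chip window occurs exactly once per period of an
    m-sequence, so a window of 18 or more chips pins down d uniquely.
    """
    doubled = bytes(x + x[:len(chips) - 1])
    return doubled.find(bytes(chips))


if __name__ == "__main__":
    x = x_sequence()
    mask = 0b000001000000001010  # hypothetical mask: bits 1, 3 and 12
    d = find_delay(x, masked_output(x, mask, 64))
    print("masked output equals x delayed by", d, "chips")
```

Each of the 16 masks therefore selects a different delay of the same underlying sequence, which is how a single generator can supply 16 codes in parallel.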
4.3.2 Descrambler
Descrambling is carried out using the data received over the CPICH and the codes produced by the scrambling code generator and masking functions. As shown in Figure 9, counters keep track of the votes obtained after the descrambling and comparison operations. After these operations are completed, the final step is to decide whether the cell search has been successful and a code has been found. For this purpose, a parameter called the probability of false alarm (PFA) is used to predefine the threshold value $V_{TH}$ [19]. The relation can be expressed by the following equation:

$$P_{FA} = e^{-V_{TH}/V} \qquad (8)$$

If the counter exceeds $V_{TH}$, the cell search operation is declared a success and the particular long code is identified.
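Equation 8 can be inverted to give $V_{TH} = -V \ln P_{FA}$. The constant V is not defined in this section; the Python sketch below uses V = 4 purely as an assumption, because rounding $-4 \ln P_{FA}$ up reproduces the thresholds of 28 and 37 quoted in Chapter 6 for $P_{FA}=10^{-3}$ and $10^{-4}$. The one-vote-per-observation rule in the hypothetical `cast_vote` helper is likewise an illustrative reading of Figure 9, not a description of the actual counters.

```python
import math


def threshold(pfa, v):
    """Invert equation 8, P_FA = exp(-V_TH / V), for V_TH."""
    return -v * math.log(pfa)


def cast_vote(energies, counters):
    """Give one vote to the candidate code with the largest descrambled energy."""
    best = max(range(len(energies)), key=energies.__getitem__)
    counters[best] += 1
    return best


def code_found(counters, pfa, v):
    """Index of a candidate whose vote count exceeds V_TH, or None."""
    vth = threshold(pfa, v)
    for idx, votes in enumerate(counters):
        if votes > vth:
            return idx
    return None


if __name__ == "__main__":
    for pfa in (1e-3, 1e-4):
        print(pfa, math.ceil(threshold(pfa, v=4.0)))
```

A lower false alarm probability raises the threshold, so more votes (and hence more time in stage 3) are needed before a code is declared found.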
[Figure 9: Descrambler: 16 descramblers, each multiplying the CPICH I and Q data by the I and Q channel codes from the masked scrambling code generator (ROM 32 X 18), followed by (.)^2 energy blocks, vote counters 1 to 16, and a threshold comparison that signals Code Found]
Chapter 5
This Chapter discusses stage 2 of the 3GPP cell search design using comma free codes. Stage 1 and stage 3 of the 3GPP-comma free CSD were kept the same as in the Improved CSD so that stage 2 of the two designs could be compared. A Fast Hadamard Transform (FHT) is proposed for stage 2 of the cell search algorithm. To reduce the hardware utilization of the FHT design, reduced length Walsh sequences are proposed, as explained in Section 5.2.
In CDMA systems, the BS identifies each user in a cell by a unique scrambling code. To minimize interference in a cell when two users transmit at the same time, orthogonal (Walsh) codes are used. The Walsh codes are generated using a Walsh-Hadamard function. When these Walsh codes are transmitted by the BS, they are affected by interference, fading and noise (for example, AWGN). At the receiver, decoding logic is required to determine which of the Walsh codes was most likely sent, and an FHT can be used to implement this decoding circuitry.
The table provided in the 3GPP specifications for the comma free codes covers 64 code groups. For comparison with the Improved CSD scheme, which uses 32 code groups, only 32 of the 64 possible code groups are used. The 32 Secondary SCH sequences are constructed such that their cyclic shifts are unique, i.e. a non-zero cyclic shift of less than 15 of any of the 32 sequences is not equivalent to any cyclic shift of any other of the 32 sequences. Also, a non-zero cyclic shift of less than 15 of any of the sequences is not equivalent to the same sequence under any other cyclic shift of less than 15. Table 4 lists the sequences of SSCs used to encode the 32 different scrambling code groups [7].
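The uniqueness condition on cyclic shifts (the comma free property) can be tested directly. The Python sketch below checks it for any table of 15-entry SSC sequences; the two-row table `EXAMPLE_TABLE` is invented for illustration and is not taken from Table 4, and the function names are hypothetical.

```python
def rotations(row):
    """All 15 cyclic shifts of a 15-slot SSC sequence."""
    return [tuple(row[s:] + row[:s]) for s in range(len(row))]


def is_comma_free(table):
    """True if every cyclic shift of every row is distinct from every other
    shift of the same row and from every shift of every other row."""
    seen = set()
    for row in table:
        for rot in rotations(list(row)):
            if rot in seen:
                return False
            seen.add(rot)
    return True


EXAMPLE_TABLE = [  # hypothetical rows, not from Table 4
    list(range(1, 16)),      # 1, 2, ..., 15
    list(range(16, 1, -1)),  # 16, 15, ..., 2
]
```

Checking the full table once, offline, is enough: if the property holds, any window of consecutive slots long enough to be unique determines both the code group and the frame position.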
The 16 SSCs, $C_{ssc,1},\dots,C_{ssc,16}$, are complex-valued with identical real and imaginary components, and are constructed from the position-wise multiplication of a Hadamard sequence and a sequence z, defined as z=(b,b,b,-b,b,b,-b,-b,b,-b,b,-b,-b,-b,-b,-b), where b=(1,1,1,1,1,1,-1,-1,-1,1,-1,1,-1,1,1,-1) [7]. The Hadamard sequence is obtained from one of the rows of a Hadamard matrix, whose entries are +1 and -1 and whose rows (and columns) are mutually orthogonal. Hadamard matrices are constructed recursively, for example:

$$H_1 = \begin{bmatrix} 1 \end{bmatrix}, \quad H_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \quad H_4 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}, \quad H_{2N} = \begin{bmatrix} H_N & H_N \\ H_N & -H_N \end{bmatrix} \qquad (9)$$

where $H_N$ is a matrix of size N X N. If a vector $X$ of length N is the input, the vector $Y$ obtained as the result of the Hadamard transform is

$$Y = H_N X \qquad (10)$$
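The recursive construction of the Hadamard matrix and the transform of equation 10 can be sketched in Python as follows. This is a reference model only (it performs N^2 operations), and the helper names are hypothetical. The rows of `hadamard(16)` are exactly the 16-chip sequences listed in Table 6.

```python
def hadamard(n):
    """Sylvester construction: H_1 = [1], H_2N = [[H_N, H_N], [H_N, -H_N]]."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def hadamard_transform(x):
    """Direct evaluation of Y = H_N X (equation 10)."""
    h = hadamard(len(x))
    return [sum(hv * xv for hv, xv in zip(row, x)) for row in h]


if __name__ == "__main__":
    for row in hadamard(4):
        print(row)
```

Because the rows are orthogonal, transforming a received row of $H_N$ gives N at that row's index and 0 elsewhere, which is what makes the transform usable as a Walsh decoder.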
The entries in Table 4 denote which SSC is used in each slot for each scrambling code group; e.g. the entry "5" means that SSC $C_{ssc,5}$ shall be used for the corresponding scrambling code group and slot. The kth SSC, $C_{ssc,k}$, $k = 1,2,\dots,16$, is calculated using the following expression:

$$C_{ssc,k} = (1+j)\,\big(h_m(0)z(0),\ h_m(1)z(1),\ h_m(2)z(2),\ \dots,\ h_m(255)z(255)\big), \qquad m = 16(k-1) \qquad (11)$$

where $h_m(i)$ is the $i$th element of row $m$ of the 256 X 256 Hadamard matrix $H_{256}$, with rows numbered from 0.
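Equation 11 can be written out as a short Python sketch. It takes b as in 3GPP TS 25.213 (the first eight elements of the PSC constituent sequence a, followed by the last eight negated) and obtains row m of $H_{256}$ from the Sylvester identity $h_m(i) = (-1)^{w}$, where w is the number of ones in the bitwise AND of m and i; this reproduces the recursive construction shown above. The names are illustrative. Because $z(i)^2 = 1$, any two distinct SSCs are orthogonal regardless of z.

```python
A = [1, 1, 1, 1, 1, 1, -1, -1, 1, -1, 1, -1, 1, -1, -1, 1]
B = A[:8] + [-v for v in A[8:]]
Z_SIGNS = [1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, -1, -1, -1, -1, -1]
Z = [s * b for s in Z_SIGNS for b in B]  # z = (b, b, b, -b, ...), 256 chips


def hadamard_row(m, n=256):
    """Row m (numbered from 0) of the Sylvester Hadamard matrix H_n."""
    return [-1 if bin(m & i).count("1") % 2 else 1 for i in range(n)]


def ssc(k):
    """Equation 11: C_ssc,k = (1 + j)(h_m(0)z(0), ..., h_m(255)z(255))."""
    if not 1 <= k <= 16:
        raise ValueError("k must be in 1..16")
    h = hadamard_row(16 * (k - 1))
    return [(1 + 1j) * (h[i] * Z[i]) for i in range(256)]


def inner(u, v):
    """Complex inner product sum(conj(u) * v)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))
```

The (1 + j) factor doubles the energy of each code, so each SSC has energy 512 and every pair of distinct SSCs has zero inner product.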
Since each element of the Hadamard matrix is either +1 or -1, the multiplications in equation 11 reduce to a series of addition/subtraction operations. In general, for an N-point input, the FHT algorithm needs to perform $N \log_2 N$ addition and subtraction operations.
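The $N \log_2 N$ figure comes from the butterfly structure: $\log_2 N$ stages, each with N/2 butterflies that map a pair (a, b) to (a+b, a-b). The in-place Python sketch below is a software reference model of that structure, not the serial shift-register hardware of Figure 11; it returns the coefficients in the same natural (Sylvester) row order as Table 6, together with the number of additions and subtractions performed. Function names are hypothetical.

```python
def fht(x):
    """Fast Hadamard transform: returns (H_N x, number of add/sub operations)."""
    y = list(x)
    n = len(y)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    ops = 0
    h = 1
    while h < n:  # log2(n) stages
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):  # n/2 butterflies per stage
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b
                ops += 2
        h *= 2
    return y, ops


def decode_walsh(chips):
    """Index (0-based row of H_N) of the most likely transmitted Walsh code."""
    coeffs, _ = fht(chips)
    return max(range(len(coeffs)), key=lambda r: coeffs[r])
```

For N = 16 the counter reports 64 operations (16 log2 16), compared with 256 for the direct product, and a single corrupted chip still leaves the correct row with the largest metric.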
[Figure 10: An individual stage of the FHT: adder and adder/subtractor with upper and lower inputs and an enable (En) signal]
Figure 10 shows an individual stage of the FHT. Each stage has an upper and a lower input terminal. The upper input terminal is configured to receive multiple input signals, which are either Walsh chips (for the first stage of the FHT) or intermediate correlation coefficients (for later stages). If an input of N Walsh chips is to be processed, the upper input terminal receives N/2 of the input values and the lower input terminal receives the other N/2.
[Figure 11: FHT design for a 16-chip sequence: sampling counter and 16-register buffer feeding phases 1 to 4 (adder and adder/subtractor stages with enables driven by the counter MSB C2), followed by detection phase 5, a comparator and a detector that uses the decoded SSCs of slots 1 to 15]
Figure 11 shows the design of an FHT structure used to decode a 16-chip sequence. The proposed design is compact and efficient compared with previous designs [13] [14]. The inputs to the FHT are applied according to the timing diagram shown in Table 5. Because the inputs are applied in a non-sequential order, a buffer is required to store the vectors before they are passed to the FHT structure; decoding a 16-chip sequence requires a buffer of 16 registers. The addition and subtraction operations in the FHT algorithm generate correlation coefficients for the received Walsh code, which express the likelihood that the received codeword is the corresponding Walsh code.
[Phase 1 of the 16-point FHT butterfly: inputs y1, ..., y16 are combined pairwise into y1+y2, y1-y2, y3+y4, y3-y4, ..., y15+y16, y15-y16]
The correlation coefficients are also called the Hadamard code metrics and are generated as shown in Figure 12 for a 16-point FHT. This operation is also called the butterfly operation; it is also used in other digital signal processing (DSP) applications, such as computing the discrete Fourier transform (DFT). The Walsh code with the largest metric is then selected as the code most likely to have been transmitted.
The detector's job is to find which code group and slot ID are in use, from the table provided in the 3GPP specifications [7], using the three decoded Hadamard rows (Walsh codes). Identifying the code group in the minimum amount of time requires a lot of hardware resources. Moreover, if an incorrect sequence of Hadamard rows is given to the detector, additional clock cycles are wasted while it searches the table in the 3GPP specifications for that sequence. The detection circuitry locates the sequence in the table and hence finds the code group and slot ID. In addition, the 3GPP-comma free CSD implementation does not need two clocks. Even if two clocks were used, a marginal gain would be achieved only in detection phase 5, shown in Figure 11, because detection of the code group and slot ID cannot start until at least three slots have been identified by phases 1 to 4.
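The detector's table lookup can be sketched as follows: given the SSC numbers decoded in three consecutive slots, it scans every code group row and every starting slot, wrapping around the end of the frame. This Python sketch is a behavioural model only; the table rows used in the example are invented placeholders rather than Table 4 entries, and `detect` is a hypothetical name.

```python
SLOTS_PER_FRAME = 15


def detect(table, ssc_seq):
    """Return (code group, first slot) whose row contains ssc_seq cyclically.

    table[g] lists the SSC number used in slots 1..15 of code group g.
    Code groups are 0-based, slots 1-based; returns None if there is no match.
    """
    n = len(ssc_seq)
    for g, row in enumerate(table):
        for s in range(SLOTS_PER_FRAME):
            if all(row[(s + t) % SLOTS_PER_FRAME] == ssc_seq[t] for t in range(n)):
                return g, s + 1
    return None


if __name__ == "__main__":
    table = [list(range(1, 16)), list(range(16, 1, -1))]  # hypothetical rows
    print(detect(table, [12, 11, 10]))
```

With 32 rows and 15 starting slots, a sequential scan like this can take up to 480 comparisons, which illustrates why fast detection costs hardware and why a wrong input sequence wastes clock cycles.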
The number of stages in the FHT design depends on the length of the Walsh sequence. Each subsequent stage receives its input from the previous stage in half the number of clock cycles required by the previous stage. This is achieved by halving the length of the shift register in each subsequent stage of the FHT.
A counter acts as a clock that determines the time interval at which each successive pair of input signals is received by the FHT. The upper shift registers in each stage are always enabled, whereas the lower shift registers are enabled by the bits of the counter. The length of the counter register depends on the number of stages in the FHT. Counter bit C0 is the LSB and C2 is the MSB. Counter bit C2 is low for four clock cycles (000 to 011) and then high for four clock cycles (100 to 111), while bit C0 alternates between high and low every clock cycle (000, 001, etc.). The number of bits in the counter depends on the number of stages, which in turn depends on the length of the Walsh-Hadamard sequence used. If there are N Walsh chips, the counter length must be $\log_2 N$ bits. The length of the shift registers in stage s of the design is given by $(N/4)/2^s$. For example, the length of the shift registers in the first stage of the FHT is $(16/4)/2^0 = 4$. The register lengths of the other stages can be calculated similarly.
In the first stage, the input signals corresponding to Walsh chips 0 to 7 arrive at the upper adder, whereas Walsh chips 8 to 15 are applied to the adder/subtractor circuit in the lower half of stage 1. During the first four clock cycles, the data from the adder unit is selected by multiplexer 1 in stage 1, and the lower shift register of stage 1 is enabled to store the outputs of the adder/subtractor unit. Thus, at the end of four clock cycles, the upper shift register stores the results of adding the first four pairs, whereas the lower shift register stores the results of subtracting them. In the fifth clock cycle, C2 goes high, which disables the lower shift register in stage 1. The output of the upper shift register in stage 1 and the adder output of stage 1, which is the sum of a new pair of inputs, are then passed to the adder and adder/subtractor units in stage 2. In this way, each stage receives its input from the previous stage, and the process is repeated for all the stages of the FHT. At the end of eight clock cycles, all 16 correlation coefficients have been generated, and the largest coefficient is selected as the most likely Walsh-Hadamard codeword to have been transmitted. The design is flexible and can easily be modified for any chip sequence whose length is a power of two.
Careful inspection of the 256X256 Hadamard matrix shows that the 256-chip sequences can be identified by the 16-chip sequences shown in Table 6.
Table 6: Reduced Length Walsh Sequences (256 chip sequence to 16 chip sequence)
Row    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
  1    1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
  2    1 -1  1 -1  1 -1  1 -1  1 -1  1 -1  1 -1  1 -1
  3    1  1 -1 -1  1  1 -1 -1  1  1 -1 -1  1  1 -1 -1
  4    1 -1 -1  1  1 -1 -1  1  1 -1 -1  1  1 -1 -1  1
  5    1  1  1  1 -1 -1 -1 -1  1  1  1  1 -1 -1 -1 -1
  6    1 -1  1 -1 -1  1 -1  1  1 -1  1 -1 -1  1 -1  1
  7    1  1 -1 -1 -1 -1  1  1  1  1 -1 -1 -1 -1  1  1
  8    1 -1 -1  1 -1  1  1 -1  1 -1 -1  1 -1  1  1 -1
  9    1  1  1  1  1  1  1  1 -1 -1 -1 -1 -1 -1 -1 -1
 10    1 -1  1 -1  1 -1  1 -1 -1  1 -1  1 -1  1 -1  1
 11    1  1 -1 -1  1  1 -1 -1 -1 -1  1  1 -1 -1  1  1
 12    1 -1 -1  1  1 -1 -1  1 -1  1  1 -1 -1  1  1 -1
 13    1  1  1  1 -1 -1 -1 -1 -1 -1 -1 -1  1  1  1  1
 14    1 -1  1 -1 -1  1 -1  1 -1  1 -1  1  1 -1  1 -1
 15    1  1 -1 -1 -1 -1  1  1 -1 -1  1  1  1  1 -1 -1
 16    1 -1 -1  1 -1  1  1 -1 -1  1  1 -1  1 -1 -1  1
Thus, in a CDMA receiver, only the first 16 chips of the entire Walsh sequence need to be used, and the buffer that stores the input values is reduced in length from 256 to 16 registers. These design ideas lead to considerable savings in hardware resources, and the reduced length Walsh sequence also allows faster decoding. The two designs were synthesized, and their hardware resource utilization was compared on a Xilinx Virtex-E XCV1000E FPGA.
Chapter 6
This Chapter explains the method used to measure the acquisition time of the two cell search designs, the Improved CSD and the 3GPP-comma free CSD. Section 6.1.1 provides details of the FPGA used for prototyping the algorithms and for comparing the hardware specifications of both designs. Section 6.2 presents the results of the acquisition time measurements and the hardware comparison, and also compares the hardware utilization of the FHT design using 256-chip and 16-chip sequences.
The acquisition time was measured by counting the number of clock cycles used in the RTL simulation; since the input chip rate is fixed by the 3GPP specifications, the cycle count translates directly into an acquisition time. To compare the hardware specifications and the maximum frequency of operation of both designs on the FPGA, the Xilinx Foundation ISE software was used to generate the bit map file for programming the FPGA. The details of the FPGA and the design process used for the hardware comparison are explained in Section 6.1.1.
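Converting a simulated cycle count into an acquisition time is a single division. The Python sketch below assumes one system-clock cycle per chip at the 3GPP FDD chip rate of 3.84 Mchip/s (consistent with stage 2 sampling at the system clock rate); the function names are hypothetical.

```python
CHIP_RATE_HZ = 3.84e6  # 3GPP FDD chip rate


def cycles_to_ms(cycles, clock_hz=CHIP_RATE_HZ):
    """Acquisition time in milliseconds for a simulated clock-cycle count."""
    return cycles * 1e3 / clock_hz


def ms_to_cycles(ms, clock_hz=CHIP_RATE_HZ):
    """Inverse conversion, rounded to the nearest whole cycle."""
    return round(ms * clock_hz / 1e3)
```

For example, one 10 ms frame corresponds to 38,400 cycles and one slot to 2,560 cycles under this assumption.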
The entire design was coded in Verilog at the Register Transfer Level (RTL). The RTL design was then synthesized using the Synopsys FPGA Express synthesis tool available with the Foundation ISE software. The bit map generated was then used to program the FPGA using the JTAG cable.
To compare the acquisition time of the Improved CSD and the 3GPP-comma free CSD, experiments were carried out using input vectors generated in Matlab. The threshold values determined for the two false alarm probabilities, $P_{FA} = 10^{-3}$ and $P_{FA} = 10^{-4}$, were 28 and 37, respectively. The number of clock cycles between the start of the system and the point at which the counter in stage 3 exceeds the computed threshold value was determined. The equivalent gate count and maximum frequency of operation were compared for both designs using a 256-chip sequence in stage 2 and the same design constraints in the FPGA Express synthesis tool, targeting a Xilinx Virtex-E XCV1000E FPGA.
The experiments showed that the Improved CSD uses fewer slots in stage 2 to achieve synchronization than the 3GPP-comma free CSD. When averaging is carried out over 15 slots in stage 1 of both designs ($P_{FA1} = 10^{-3}$ and $V_{TH1} = 28$), the Improved CSD has an acquisition time of 13.66 msec, compared with 14.53 msec for the 3GPP-comma free CSD, an improvement of 0.87 msec for an AWGN channel (Figure 15). A similar improvement of 0.87 msec was observed for $P_{FA2} = 10^{-4}$ and $V_{TH2} = 37$. Figures 15 and 16 show the acquisition time measures for 2, 4, 8 and 15 slots in stage 1 of the design. The number of slots in the other stages, as discussed in previous Chapters, was kept fixed: 1 slot in stage 2 of the Improved CSD, three slots in stage 2 of the 3GPP-comma free CSD, and 15 slots in stage 3 of both designs.
Figure 15: Comparison of Improved CSD and 3GPP-comma free CSD, $P_{FA} = 10^{-3}$ (acquisition time measures)
Figure 16: Comparison of Improved CSD and 3GPP-comma free CSD, $P_{FA} = 10^{-4}$ (acquisition time measures; quantization: 4 input data bits)
As seen from Table 7, the Improved CSD had a lower equivalent gate count (136,297) and a higher maximum frequency of operation (22.066 MHz) on a Xilinx Virtex-E XCV1000E FPGA as compared to the 3GPP-comma free CSD when the same constraints were used in the synthesis of both the designs.
In the FHT design, the input Walsh sequence length can be reduced from 256 chips to 16 chips to reduce hardware utilization. This leads to considerable savings in hardware resources: the buffer that stores the input values is reduced in length from 256 to 16 registers, and the reduced length Walsh sequence allows faster decoding. The FHT designs using 16-chip and 256-chip sequences were synthesized, and their hardware resource utilization was compared on a Xilinx Virtex-E XCV1000E FPGA; the results are given in Table 8.
The results indicate that the FHT design using the 16-chip sequence achieves a 90% reduction in hardware resources (equivalent gate count) compared with the design using the 256-chip sequence. In addition, the maximum frequency of operation of the 16-chip FHT (35.679 MHz) is more than double that of the 256-chip FHT (16.025 MHz).
Chapter 7
In this Chapter, the conclusions drawn from the experimental results are summarized and the scope for future work is outlined.
7.1 Summary
In Chapter 2, we discussed previous work by other research groups as well as the suggestions of the 3GPP working group. Chapter 3 introduced the cell search algorithm, which is divided into three stages to simplify the synchronization between the MS and the BS. Chapter 4 discussed the Improved CSD, the proposed scheme for performing initial cell search. The hierarchical matched filter design proposed by Siemens and Texas Instruments [6] was used in stage 1 of both cell search designs. In stage 2 of the initial cell search algorithm, two design schemes were compared: the Improved CSD, which uses cyclic codes, and the 3GPP-comma free CSD, which uses comma free codes. In stage 3 of both designs, masking functions are proposed to reduce the hardware utilization compared with the previous design described by Li et al. [4]. Chapter 5 described the 3GPP-comma free CSD, which uses an FHT design in stage 2 of the cell search algorithm, and suggested a further improvement to the FHT design that reduces the length of the input Walsh sequence from 256 chips to 16 chips. Chapter 6 discussed the experimental method and presented the acquisition time and hardware utilization results for both the Improved CSD and the 3GPP-comma free CSD, together with the hardware utilization of the FHT design using 256-chip and reduced length (16-chip) sequences.
7.2 Conclusions
For an AWGN channel model in a high signal-to-noise ratio environment, accumulation over one slot in the Improved CSD and over three slots in the 3GPP-comma free CSD in stage 2 of the cell search algorithm was found to give correct code group and slot boundary identification. Because it needs fewer slots, the Improved CSD uses fewer clock cycles in stage 2 than the 3GPP-comma free CSD to detect the code group and slot ID. This reduction in clock cycles leads to faster acquisition, which can in turn mean fewer dropped calls and lower power consumption during synchronization between the MS and the BS. The Improved CSD, which uses cyclic codes, also has lower hardware utilization and a higher maximum frequency of operation than the 3GPP-comma free CSD. In conclusion, the Improved CSD is a better cell search design than the 3GPP-comma free CSD, since it has a faster acquisition time and lower hardware utilization.
This thesis investigates code and time synchronization in the cell search algorithm. In addition to code and time synchronization, frequency synchronization between the MS and the BS needs to be achieved, so the receiver design presented in this thesis would need an additional module for frequency synchronization. Also, the cell search considered in this thesis is initial cell search. Another type, target cell search, needs to be performed during a call, when an MS moves from one cell to another. Efficient VLSI implementations of target cell search need to be investigated.
Kiessling et al. [21] suggest performance enhancements to W-CDMA initial cell search algorithm. The authors consider the advantages of oversampling and passing multiple candidates in the cell search stages instead of one candidate to reduce the cell search time. Passing multiple candidates in each of the stages will reduce the cell search time but increase the design complexity and hardware utilization. The improved cell search time at the expense of increased hardware utilization needs to be studied.
The results presented in this thesis show that for an AWGN channel model in a high signal-to-noise ratio environment, the Improved CSD achieves faster synchronization at lower hardware complexity in comparison to the 3GPP-comma free CSD. Future work needs to investigate how the Improved CSD compares with the 3GPP-comma free CSD under multipath channel conditions.
8.0 References
[1] T. Ojanperä and R. Prasad, "An overview of air interface multiple access for IMT-2000/UMTS," IEEE Commun. Mag., vol. 36, pp. 82-95, Sept. 1998.
[2] E. Dahlman, P. Beming, J. Knutsson, F. Ovesjö, M. Persson, and C. Roobol, "WCDMA: The radio interface for future mobile multimedia communications," IEEE Trans. Veh. Technol., vol. 47, pp. 1105-1118, Nov. 1998.
[3] Y.-P. E. Wang and T. Ottosson, "Cell Search in W-CDMA," IEEE J. Select. Areas Commun., vol. 18, no. 8, pp. 1470-1482, Aug. 2000.
[4] C.-F. Li, W.-H. Sheen, J. J.-S. Ho, and Y.-S. Chu, "ASIC design for cell search in 3GPP W-CDMA," in Proc. IEEE VTC 2001-Fall, vol. 3, pp. 1383-1387.
[5] R. L. Peterson, R. E. Ziemer, and D. E. Borth, Introduction to Spread Spectrum Communications, Englewood Cliffs, NJ: Prentice-Hall, 1995.
[6] Siemens and Texas Instruments, "Generalised Hierarchical Golay Sequence for PSC with low complexity correlation using pruned efficient Golay correlators," TSG-RAN Working Group 1 Meeting 5, TSGR1-554/99.
[7] 3GPP RAN TS 25.213 v4.0.0 (2001-03), Technical Specification Group Radio Access Network: Spreading and modulation (FDD), Release 4, www.3GPP.org.
[8] 3GPP RAN TS 25.214 v4.0.0 (2001-03), Technical Specification Group Radio Access Network: Physical layer procedures (FDD), Release 4, www.3GPP.org.
[9] Nortel Networks, "Synchronization Channel with cyclic hierarchical sequences," TSG-RAN Working Group 1 Meeting 2, TSGR1#2(99)090.
[10] K. Higuchi, M. Sawahashi, and F. Adachi, "Fast cell search algorithm in DS-CDMA mobile using long spreading codes," in Proc. IEEE 1997 Veh. Technol. Conf., Phoenix, AZ, May 1997, pp. 1430-1434.
[11] Nystrom, K. Jamal, Y.-P. E. Wang, and R. Esmailzadeh, "Comparison of cell search methods for asynchronous wideband CDMA cellular system," in Proc. IEEE Int. Conf. Universal Personal Commun., Florence, Italy, Oct. 1998.
[12] Ericsson, "New downlink scrambling code grouping scheme for UTRA/FDD," TSG-RAN Working Group 1 Meeting 6, TSGR1#6(99)884.
[13] A. Amira, A. Bouridane, P. Milligan, and M. Roula, "Novel FPGA implementations of Walsh-Hadamard transforms for signal processing," IEE Proc. Vision, Image and Signal Processing, vol. 148, no. 6, pp. 377-383, Dec. 2001.
[14] S. S. Nayak and P. K. Meher, "High throughput VLSI implementation of discrete orthogonal transforms using bit-level vector-matrix multiplier," IEEE Trans. Circuits Syst. II: Analog Digit. Signal Process., vol. 46, no. 5, pp. 655-658, May 1999.
[15] Siemens, "A modified generator for Multiple-Scrambling Codes," TSG-RAN Working Group 1 Meeting 7, TSGR1#7(99)B87.
[16] B. M. Popovic, "Efficient Golay correlator," Electronics Letters, vol. 35, no. 17, pp. 1427-1428, Aug. 1999.
[17] 3GPP RAN TS 25.211 v4.0.0 (2001-03), Technical Specification Group Radio Access Network: Physical channels and mapping of transport channels onto physical channels (FDD), Release 4, www.3GPP.org.
[18] H. Holma and A. Toskala, WCDMA for UMTS: Radio Access for Third Generation Mobile Communications, John Wiley, 2000.
[19] A. J. Viterbi, CDMA: Principles of Spread Spectrum Communication, Addison-Wesley, 1995.
[20] Xilinx, The Programmable Logic Data Book 2000.
[21] M. Kiessling and S. A. Mujtaba, "Performance Enhancements to the UMTS (W-CDMA) Initial Cell Search Algorithm," in Proc. IEEE Int. Conf. Commun. (ICC), vol. 1, May 2002, pp. 590-594.