
Initial Cell Serch Paper

The document compares two initial cell search algorithms for W-CDMA systems: the Improved Cell Search Design (Improved CSD) using cyclic codes and the 3GPP Cell Search Design using comma free codes (3GPP-comma free CSD). Through FPGA implementation and experiments with additive white gaussian noise, the Improved CSD achieved faster synchronization with the base station and had lower hardware utilization than the 3GPP-comma free CSD under the same design constraints. Key stages of the cell search algorithms included slot synchronization, frame synchronization, code group identification, and scrambling code identification.

Uploaded by

acidwarrior
Copyright
© Attribution Non-Commercial (BY-NC)

Comparison of Initial Cell Search Algorithms for W-CDMA Systems

by

Sanat Kamal Bahl

Thesis submitted to the Faculty of the Graduate School of the University of Maryland in partial fulfillment of the requirements for the degree of Master of Science 2002

Title of Thesis:

Comparison of Initial Cell Search Algorithms for W-CDMA Systems

Sanat Kamal Bahl, Master of Science, 2002

Thesis directed by:

James F. Plusquellic Assistant Professor


Dept. of Computer Science and Electrical Engineering

ABSTRACT

In this thesis, an Improved Cell Search Design (Improved CSD) using cyclic codes is compared with the 3GPP Cell Search Design using comma free codes (3GPP-comma free CSD) in terms of (1) hardware utilization on a field programmable gate array (FPGA) and (2) acquisition time for different probabilities of false alarm rates. Our results indicate that for a channel whose signal-to-noise ratio is degraded with additive white Gaussian noise (AWGN), the Improved CSD achieves faster synchronization with the base station and has lower hardware utilization when compared with the 3GPP-comma free CSD scheme under the same design constraints.

Table of Contents
1.0 Introduction
2.0 Background
3.0 Cell Search Algorithm
    3.1 Synchronization Channels in W-CDMA
    3.2 Cell Search Algorithm
        3.2.1 Stage 1: Slot Synchronization
        3.2.2 Stage 2: Frame Synchronization and Code Group Identification
        3.2.3 Stage 3: Scrambling Code Identification
4.0 Improved Cell Search Design
    4.1 Stage 1: Slot Synchronization
    4.2 Stage 2: Frame Synchronization and Code Group Identification
    4.3 Stage 3: Scrambling Code Identification
        4.3.1 Scrambling Code Generator
        4.3.2 Descrambler
5.0 3GPP-comma free Cell Search Design
    5.1 Stage 2 of 3GPP-comma free Cell Search Design
    5.2 Reduced Length FHT Design
6.0 Experimental Method and Results
    6.1 Experimental Method
        6.1.1 FPGA Design Process
    6.2 Experimental Results
7.0 Summary, Conclusions and Future Work
    7.1 Summary
    7.2 Conclusions
    7.3 Future Work
8.0 References

List of Abbreviations
AMPS      Advanced Mobile Phone Service
ASIC      Application Specific Integrated Circuit
A/D       Analog-to-Digital
AWGN      Additive White Gaussian Noise
BS        Base Station
Cp        Primary Synchronization Code
Cssc      Secondary Synchronization Code
Cs        Cyclic Hierarchical Sequence
CLB       Configurable Logic Block
CPICH     Common Pilot Channel
D/A       Digital-to-Analog
DFT       Discrete Fourier Transform
DSP       Digital Signal Processing
DS-CDMA   Direct Sequence-Code Division Multiple Access
FHT       Fast Hadamard Transformer
FPGA      Field Programmable Gate Array
GIC       Group Indicator Code
GPS       Global Positioning System
GSM       Global System for Mobile communication
LC        Logic Cell
LFSR      Linear Feedback Shift Register
LUT       Look-Up Table
MS        Mobile Station
PSC       Primary Synchronization Code
P-SCH     Primary Synchronization Channel
SSC       Secondary Synchronization Code
SNR       Signal-to-Noise Ratio
SCH       Synchronization Channel
S-SCH     Secondary Synchronization Channel
3G        Third Generation
3GPP      Third Generation Partnership Project
TIA       Telecommunications Industry Association
W-CDMA    Wideband-Code Division Multiple Access

List of Figures
Figure
1   DS-CDMA Transmitter-Receiver Block Level Diagram
2   Synchronization Channels in Cell Search
3   Hierarchical Matched Filter (64-chip and 4-symbol accumulation)
4   Hierarchical Matched Filter (16-chip and 16-symbol accumulation)
5   Slot Boundary Detection
6   Frame Synchronization and Code Group Identification
7   Scrambling Code Generator
8   Multiple Scrambling Code Generator
9   Scrambling Code Identification
10  Individual Stage of FHT
11  16 chip FHT
12  Hadamard Code Metrics (Butterfly Operation)
13  2-Slice Virtex-E CLB
14  Detailed View of Virtex-E Slice
15  Comparison of Improved CSD and 3GPP-comma free CSD, PFA = 10^-3
16  Comparison of Improved CSD and 3GPP-comma free CSD, PFA = 10^-4

List of Tables
Table
1   Hierarchical Matched Filter (16 and 64-chip Accumulation)
2   Sequences X1,i and X2,i for Code Groups 1 to 32
3   Masking Functions used in Stage 3: Scrambling Code Generator
4   Allocations of SSCs for Secondary SCH
5   Timing Diagram of Inputs to FHT
6   Reduced Length Walsh Sequences (256 chip sequence to 16 chip sequence)
7   Hardware Specifications of System: Quantization 4 Input Data Bits
8   Hardware Specifications of FHT: 16 and 256 chip sequence

Chapter 1

Introduction
1.0 Introduction

First generation (1G) mobile communications systems were based on analog technology and started in the early to mid 1980s. These 1G systems had a number of limitations which included (1) low quality voice service, (2) limited capacity and (3) inability to provide global roaming.

Digital second generation (2G) systems were then developed in Europe and the US. The various second generation systems included (1) the Global System for Mobile communication (GSM), which utilizes time division multiple access (TDMA), in which each user is assigned a particular time slot; (2) the TDMA/136 specification, defined in the US in 1988 by the Telecommunications Industry Association (TIA) with the aim of digitizing the analog Advanced Mobile Phone Service (AMPS); and (3) IS95, proposed in the US for 2G systems to provide better voice quality and higher capacity, and based on CDMA technology. However, different 2G technologies were not interoperable and not available across geographic areas. In addition, the low bit rate of 2G systems could not meet subscriber demands for multimedia services.

Third generation (3G) systems aim to solve these problems encountered with 2G systems by promising global roaming across 3G standards, higher data rates, improved quality of service and support for multimedia applications. The most popular candidates for 3G cellular systems are CDMA2000 and Wideband-Code Division Multiple Access (W-CDMA) [1] [2]. Both of these schemes are based on Direct Sequence-Code Division Multiple Access (DS-CDMA) technology. In DS-CDMA, the data signals are directly modulated by a digital code signal.

In a spread spectrum CDMA system, the transmitted signal is spread over a frequency band that is much wider than the minimum bandwidth required to transmit the information being sent. In a typical scenario where there are multiple users or mobile stations (MSs) in a cell, each user has a unique scrambling code. This scrambling code should have low cross correlation with the other user codes. The signal received by the MS from the transmitting base station (BS) is correlated with the user's scrambling code. This despreads only the signal of that particular user, whereas the other spread spectrum signals remain spread. A block diagram of a DS-CDMA transmitter and receiver is shown in Figure 1. Spreading consists of multiplying the input data by a scrambling code sequence whose bit rate is much higher than the data bit rate. At the receiving side the signal is multiplied with the same scrambling code sequence, which must be exactly synchronized to the received code sequence. The Encoding block shown in Figure 1 is used to add error correcting bits and to perform interleaving in order to protect information bits from channel noise and interference. The reverse operations are performed in the Decoding stage at the receiver.

[Figure 1 block diagram. Transmitter: Baseband Data, Encoding, Scrambling Code Generator, D/A. Receiver: A/D, Scrambling Code Generator with Scrambling Code Synchronization, Decoding, Baseband Data.]

Figure 1: DS-CDMA Transmitter-Receiver Block Level Diagram
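The spreading and despreading operations described above can be illustrated with a minimal sketch. It assumes real-valued +1/-1 chips, two users sharing a noiseless channel, and random 64-chip codes as hypothetical stand-ins for actual scrambling codes; none of these choices come from the thesis.

```python
import random


def spread(bits, code):
    """Multiply every data symbol (+1/-1) by each chip of the code."""
    return [b * c for b in bits for c in code]


def despread(chips, code):
    """Correlate each code-length block with the code and keep the sign."""
    n = len(code)
    out = []
    for i in range(0, len(chips), n):
        corr = sum(x * c for x, c in zip(chips[i:i + n], code))
        out.append(1 if corr >= 0 else -1)
    return out


rng = random.Random(0)
# Hypothetical random codes; real systems use codes designed for low cross correlation.
code_a = [rng.choice((1, -1)) for _ in range(64)]
code_b = [rng.choice((1, -1)) for _ in range(64)]
data_a = [1, -1, -1, 1]
data_b = [-1, -1, 1, 1]

# Both users transmit at once; the channel simply adds their chip streams.
rx = [a + b for a, b in zip(spread(data_a, code_a), spread(data_b, code_b))]
recovered_a = despread(rx, code_a)
recovered_b = despread(rx, code_b)
```

Correlating with `code_a` collapses user A's signal into a large sum per symbol, while user B's contribution stays spread and contributes only a small cross correlation term. The sketch also assumes the receiver's code is already aligned with the received chips, which is exactly what cell search has to establish.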

The main difference between W-CDMA and CDMA2000 is that W-CDMA supports asynchronous BSs whereas CDMA2000 relies on synchronized BSs. Synchronous CDMA systems need an external time reference. A Global Positioning System (GPS) clock can be used by all BSs to synchronize their operations. This allows the MS to use different phases of the same scrambling code to distinguish between adjacent BSs. In an asynchronous CDMA system, each BS has an independent time reference, and the MS does not have prior knowledge of the relative time difference between various BSs. The advantage of asynchronous operation is that it eliminates the need to synchronize the BSs to an accurate external timing source. However, since there is no external time synchronization between the adjacent BSs, different phases of the same code cannot be used to distinguish adjacent BSs. Thus, in an asynchronous CDMA system, adjacent BSs can only be identified by using distinct scrambling codes. Consequently, cell search, which involves the process of achieving code, time and frequency synchronization of the MS with the BS, takes longer in comparison to a synchronous CDMA system. Cell search is complicated by the presence of signals which are intended for other mobile stations within a cell as well as signals from other BSs. Thus, it is very important to develop algorithms and hardware implementations that perform cell search with lower acquisition time and minimum hardware resources for asynchronous CDMA systems.

Cell search is performed according to the algorithm proposed by Wang et al. [3]. In the proposed cell search algorithm, code and time synchronization is achieved first, assuming a large frequency error, and frequency synchronization is performed afterwards. In this study we consider the problem of achieving code and time synchronization. The process of achieving code and time synchronization in the cell search algorithm for W-CDMA systems is divided into three stages: (1) slot synchronization, (2) frame synchronization and code group identification, and (3) scrambling code identification.

This thesis presents a 3G Partnership Project (3GPP) cell search design using cyclic codes (Improved CSD) to achieve faster synchronization at lower hardware complexity. The second part of this thesis compares the two design algorithms for performing initial cell search, the Improved CSD and the 3GPP cell search design using comma free codes (3GPP-comma free CSD), in terms of (1) acquisition time measure and (2) hardware specifications on a Xilinx Virtex-E XCV1000E field programmable gate array (FPGA). The thesis also proposes design improvements in stage 2 of the 3GPP-comma free CSD beyond those proposed by Li et al. [4]. The 3GPP-comma free CSD proposed in this thesis uses a Fast Hadamard Transformer (FHT) in stage 2 that achieves lower hardware complexity and faster decoding. Furthermore, masking functions are used in stage 3 of both the Improved CSD and the 3GPP-comma free CSD to reduce the number of scrambling code generators required, as described in previous work [4]. This results in a reduction in the ROM size required to store the initial phases of the scrambling code generators in stage 3. The Improved CSD proposed in this thesis aims to achieve faster synchronization between the MS and the BS and thus improves system performance. The experiments carried out using accumulation over multiple slots in stage 1 indicate that for an additive white Gaussian noise (AWGN) channel at a high signal-to-noise ratio the Improved CSD achieves faster synchronization with the BS and has lower hardware utilization when compared with the 3GPP-comma free CSD scheme under the same design constraints.

The thesis is organized as follows. Work done by other research groups and suggestions by the 3GPP working group are presented in Chapter 2. Chapter 3 describes the synchronization channels in W-CDMA cell search and introduces the three step cell search algorithm used in W-CDMA for synchronization between the MS and the BS. Chapter 4 describes the Improved cell search design using cyclic codes proposed as a means of achieving faster synchronization. Chapter 5 discusses the 3GPP cell search design using comma free codes. Chapter 6 presents the experimental method and results of the comparison of the two cell search algorithms on a Xilinx Virtex-E XCV1000E FPGA. Chapter 7 is a summary, discussion, and an overview of future directions of this research.

Chapter 2

Background

Cell search design is critical as it impacts the system performance and there is a need to design efficient receiver structures and algorithms to reduce the cell search time. This Chapter summarizes efforts by research groups and the 3GPP working groups to design efficient schemes and algorithms for each of the three stages of the cell search algorithm.

2.0 Background

Wang et al. propose a pipelined process to be used in the first three stages of the cell search algorithm [3]. The cell search scenarios considered in their study are (1) initial cell search, when a mobile is switched on, and (2) target cell search, during idle and active modes of the MS. Instead of a serial cell search that sequentially searches through code, time and frequency, their method first acquires code and time synchronization assuming a larger frequency error and then performs frequency synchronization [3] [5].

The synchronization code sequences used in stage 1 and stage 2 of the cell search algorithm are made up of bits called "chips" which can be either +1 or -1. The synchronization code sequences are 256 chips in length. If a traditional matched filter is used then a very large adder circuit (a 256-input adder) is required to sum up the correlation results. This wastes hardware resources. Hence, Siemens and Texas Instruments in their working group draft have suggested a hierarchical matched filter design which uses two matched filters to reduce the hardware complexity significantly [6]. The details of the hierarchical matched filter design will be presented in Chapter 4.

The 3GPP specification uses comma free codes in stage 2 of the cell search algorithm [7] [8]. Nortel Networks in their working group proposal have suggested the use of cyclic codes in the SCHs [9]. The use of cyclic codes for generating the synchronization codes will be explained in more detail in Chapter 4. These cyclic codes can reduce hardware utilization and acquisition time if the receiver is properly designed.

To reduce the complexity of searching through all the 512 scrambling codes, the concept of code grouping and group indicator codes (GIC) was introduced [10]. The scrambling code is identified by first detecting the code group. Once the code group is detected, the scrambling code used by the cell can be easily identified as there are a limited number of codes in each code group. This reduces the cell search time significantly. This idea was accepted in the 3GPP specifications. To further reduce cell search time, frame boundary synchronization is also achieved in stage 2 after identifying the code group and slot ID [11].

Ericsson in their working group draft have proposed increasing the number of code groups in stage 2 of the cell search [12]. Increasing the number of code groups reduces the number of scrambling codes in a code group. Their proposed scheme uses either 256, 128 or 64 code groups in stage 2 of the cell search. They claim that the scheme using 256 code groups is the preferred scheme as it requires only two scrambling code correlators in stage 3 of initial cell search and achieves reduced hardware complexity.

In stage 2 of the 3GPP-comma free CSD presented in this thesis, an FHT design is proposed as a replacement for the Golay correlator presented by Li et al. [4]. An FHT provides an efficient technique to detect the code group and slot ID in stage 2. Previous FHT designs [13] [14] utilize a large amount of hardware resources; hence, a fast and efficient Hadamard transformer is needed to reduce the hardware utilization and to perform faster decoding. A compact and efficient FHT design will also draw less power from the handset.

Siemens in their working group draft have suggested the use of masking functions in stage 3 to reduce the design complexity for generating the scrambling codes in parallel [15]. The use of masking functions reduces the number of scrambling code generators required to generate the codes in parallel. Any masking functions can be selected by the designer as long as they generate codes with minimum overlap. The use of masking functions reduces the hardware significantly as compared to the previous design by Li et al. [4].

Li et al. have designed an application specific integrated circuit (ASIC) for performing cell search in W-CDMA systems [4]. In stage 1 and stage 2 of their cell search design the authors use a correlator structure to detect the code group and slot ID. The correlator structure used is a Golay correlator [16]. In stage 3 of the cell search algorithm, 16 scrambling code generators are used for generating the codes in parallel.

In summary, most of the literature in this area presents simulation results and does not investigate the hardware complexity of the design schemes, with the exception of the work presented by Li et al. [4]. The designs used by the mobile manufacturers are company proprietary and there are very few documents which describe their actual design schemes. It is critical to consider a practical hardware implementation of the cell search algorithm, especially because chip area and power utilization are the two most important factors in a mobile handset.

Chapter 3

Cell Search Algorithm


3.0 Cell Search Algorithm

This Chapter describes the synchronization channels in W-CDMA cell search and introduces the cell search algorithm used in the synchronization of the MS with the BS for W-CDMA systems.

3.1 Synchronization Channels in W-CDMA

In CDMA systems, spreading codes are used to differentiate physical channels from the same transmitter, and scrambling codes are used to differentiate transmitters. The MS needs to achieve code and time synchronization with the BS before any communication with the BS can start. The process of searching for a code and achieving synchronization with the BS is called cell search. Cell search is performed in two scenarios: when an MS is switched on (initial cell search) and during active or idle mode (target cell search). Target cell search is used to find handover candidates during a call. Cell search needs to be completed with minimum delay as it impacts the system performance.

Each cell in a CDMA system is identified by its downlink scrambling code which is of length 38,400 chips. The 38,400 chips form a radio frame which is divided into 15 slots. Each slot in the radio frame is of 2,560 chips [7].

[Figure 2 timing diagram: the P-SCH and S-SCH each carry a 256 chip (0.067 msec) sequence at the start of every slot (slots 1 to 15); each slot is 2,560 chips (0.67 msec) and carries 10 CPICH symbols; one frame is 38,400 chips = 15 slots (10 msec).]

Figure 2: Synchronization Channels in Cell Search

Figure 2 shows the slot and frame structure of the three synchronization channels used in cell search: the Primary-Synchronization Channel (P-SCH), Secondary-Synchronization Channel (S-SCH) and the Common Pilot Channel (CPICH) [7] [17]. Together, the P-SCH and the S-SCH are also called the Synchronization Channel (SCH). In the P-SCH, a 256 chip sequence is transmitted at the start of each slot. The same P-SCH sequence is used by all the BSs and is transmitted once every slot. As the same sequence is used by all the transmitting stations, a single matched filter is sufficient to detect the slot boundary. To reduce the complexity of the matched filter implementation, a hierarchical scheme is used as will be explained in detail in Chapter 4. The S-SCH is used for carrying 15 different sequences, one in each slot, for the different code groups and is repeated after every frame. These sequences are used in identifying the code group. The CPICH is used to carry the downlink common pilot symbols scrambled by the scrambling code of the BS. Each slot of this channel is divided into 10 symbols, each of 256 chips in length.
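The slot and frame figures quoted above are mutually consistent, which the following short sketch checks. The constant names are illustrative only; the numeric inputs are the frame length, frame duration, slot count and synchronization code length given in the text and in Figure 2.

```python
CHIPS_PER_FRAME = 38_400   # chips in one radio frame
FRAME_MS = 10.0            # frame duration in milliseconds
SLOTS_PER_FRAME = 15
SYNC_CODE_CHIPS = 256      # length of the P-SCH/S-SCH sequence and of one CPICH symbol

# Chips per microsecond is the same number as megachips per second.
chip_rate_mcps = CHIPS_PER_FRAME / (FRAME_MS * 1e3)
chips_per_slot = CHIPS_PER_FRAME // SLOTS_PER_FRAME
slot_ms = FRAME_MS / SLOTS_PER_FRAME
sync_code_ms = SYNC_CODE_CHIPS / (chip_rate_mcps * 1e3)
cpich_symbols_per_slot = chips_per_slot // SYNC_CODE_CHIPS

print(chip_rate_mcps, chips_per_slot, round(slot_ms, 2),
      round(sync_code_ms, 3), cpich_symbols_per_slot)
```

The derived chip rate of 3.84 Mcps follows directly from 38,400 chips per 10 msec frame.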

To reduce the complexity of synchronizing to the BSs in W-CDMA, the concept of code grouping and the use of code group indicator codes (GIC) were introduced [10]. The 512 scrambling codes used in W-CDMA are divided into code groups. After the code group is identified, only the scrambling code used by the cell needs to be detected. The number of possible scrambling codes from which one code needs to be identified depends on how many code groups are selected in stage 2 of the design. For example, if 32 code groups are used in stage 2 then the number of scrambling codes in stage 3 is 16. Similarly, if 64 code groups are used then there will be 8 possible scrambling codes. Although the number of scrambling codes is fixed at 512, the number of code groups can be increased from 32 to 256 [12]. The complexity is further reduced by combining frame synchronization and code group identification in stage 2 of the cell search algorithm [11].
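The trade-off between the number of code groups and the number of candidate scrambling codes left for stage 3 is simple division, sketched below. The function name is hypothetical, and the sketch assumes the 512 codes are split evenly across groups, as the examples in the text imply.

```python
TOTAL_SCRAMBLING_CODES = 512


def codes_per_group(num_groups):
    """Number of stage 3 candidates when 512 codes are split evenly into groups."""
    if TOTAL_SCRAMBLING_CODES % num_groups != 0:
        raise ValueError("number of groups must divide 512 evenly")
    return TOTAL_SCRAMBLING_CODES // num_groups


candidates = {groups: codes_per_group(groups) for groups in (32, 64, 128, 256)}
print(candidates)
```

With 256 code groups only two candidates remain, which matches the two scrambling code correlators mentioned for the 256-group scheme in Chapter 2.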

3.2 Cell Search Algorithm

The process of achieving code and time synchronization in the cell search algorithm is divided into three stages: (1) slot synchronization, (2) frame synchronization and code group identification, and (3) scrambling code identification [3] [7] [8] [18].

3.2.1 Stage 1: Slot Synchronization


During stage 1 of the cell search procedure the MS uses the SCH's Primary Synchronization Code (PSC) to acquire slot synchronization to a cell. This is typically done with a single matched filter matched to the PSC, which is common to all cells. The slot timing of the cell can be obtained by detecting peak values in the matched filter output. The starting position of the synchronization code may be determined from observations over one slot duration. However, decisions based on observations over a single slot may be unreliable when the signal-to-noise ratio (SNR) is low or if fading is severe. Reliable slot synchronization is required to minimize cell search time. In order to increase reliability, observations are made over multiple slots and the results are then combined. This increases the likelihood that the correct slot boundary is identified.
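The multi-slot accumulation described above can be sketched on a scaled-down toy model. Everything specific here is an assumption made for illustration: a 13-chip Barker sequence stands in for the 256-chip PSC, the slot is 40 chips instead of 2,560, the noise is bounded uniform noise rather than AWGN, and the per-slot results are combined by summing magnitudes.

```python
import random

# Barker-13 is a stand-in for the 256-chip PSC (assumption for illustration).
BARKER13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
SLOT_LEN = 40  # toy slot length; the real slot is 2,560 chips


def make_stream(boundary, num_slots, noise_amp, rng):
    """Noisy chip stream in which the code starts `boundary` chips into every slot."""
    length = (num_slots + 1) * SLOT_LEN  # one extra slot so every window fits
    stream = [rng.uniform(-noise_amp, noise_amp) for _ in range(length)]
    for k in range(num_slots + 1):
        for i, chip in enumerate(BARKER13):
            pos = boundary + k * SLOT_LEN + i
            if pos < length:
                stream[pos] += chip
    return stream


def detect_slot_boundary(stream, num_slots):
    """Accumulate matched filter magnitudes over num_slots slots; return the peak offset."""
    acc = [0.0] * SLOT_LEN
    for s in range(num_slots):
        for t in range(SLOT_LEN):
            start = s * SLOT_LEN + t
            window = stream[start:start + len(BARKER13)]
            acc[t] += abs(sum(x * c for x, c in zip(window, BARKER13)))
    return max(range(SLOT_LEN), key=acc.__getitem__)


rng = random.Random(1)
estimate = detect_slot_boundary(make_stream(17, 4, 0.4, rng), num_slots=4)
print(estimate)  # prints 17
```

With noise bounded at 0.4 per chip, the aligned correlation is at least 7.8 per slot while any misaligned correlation is at most 6.2, because the Barker sidelobes never exceed 1, so the accumulated peak always falls on the true offset in this toy setting. Under real AWGN no such guarantee exists, which is why accumulating over more slots matters at low SNR.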

3.2.2 Stage 2: Frame Synchronization and Code Group Identification

During stage 2 of the cell search procedure, the MS uses the SCH's Secondary Synchronization Code (SSC) to achieve frame synchronization and identify the code group of the cell found in stage 1. This is done by correlating the received signal with all possible SSC sequences and identifying the maximum correlation value. Since the cyclic shifts of the sequences are unique, the code group as well as the frame synchronization is determined.
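Why unique cyclic shifts yield both the code group and the frame timing can be seen in a small sketch. It is a simplification under stated assumptions: the codebook is a hypothetical two-group toy with 5 slots per frame instead of 15, and the soft correlation is replaced by counting matching hard-decision symbols.

```python
def identify_group_and_shift(observed, codebook):
    """Try every cyclic shift of every codeword; return (group, shift, score) of the best."""
    best_score, best_group, best_shift = -1, None, None
    for group, codeword in codebook.items():
        n = len(codeword)
        for shift in range(n):
            score = sum(1 for i in range(n) if observed[i] == codeword[(i + shift) % n])
            if score > best_score:
                best_score, best_group, best_shift = score, group, shift
    return best_group, best_shift, best_score


# Hypothetical toy codebook: each entry lists the SSC index sent in successive slots.
# No cyclic shift of one codeword equals any shift of itself or of the other codeword.
TOY_CODEBOOK = {
    0: [1, 1, 2, 3, 4],
    1: [1, 2, 1, 3, 5],
}

# The MS starts observing at the third slot (index 2) of a frame from group 1.
observed = [1, 3, 5, 1, 2]
group, shift, score = identify_group_and_shift(observed, TOY_CODEBOOK)
print(group, shift, score)  # prints 1 2 5
```

The returned shift is the slot index at which the observation began, so the next frame boundary lies `(n - shift) % n` slots after the first observed slot.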

3.2.3 Stage 3: Scrambling Code Identification

During stage 3 of the cell search procedure, the MS determines the exact primary scrambling code used by the cell. The primary scrambling code is typically identified through symbol-by-symbol correlation over the CPICH with all codes within the code group identified in stage 2. In this stage, a threshold value is used to decide whether the code has been identified. The threshold value can be predetermined using a parameter called probability of false alarm rate [19].
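The symbol-by-symbol correlation and threshold test can be sketched as follows. The sketch assumes real-valued +1/-1 chips and random candidate codes as hypothetical stand-ins for the actual scrambling codes, a noiseless received slot, and an arbitrary threshold of 0.5 rather than one derived from a probability of false alarm.

```python
import random

SYMBOL_LEN = 256  # chips per CPICH symbol


def cpich_metric(received, code):
    """Sum of per-symbol correlation magnitudes, normalized to 1.0 for a perfect match."""
    total = 0
    for start in range(0, len(code), SYMBOL_LEN):
        pairs = zip(received[start:start + SYMBOL_LEN], code[start:start + SYMBOL_LEN])
        total += abs(sum(r * c for r, c in pairs))
    return total / len(code)


def identify_scrambling_code(received, candidates, threshold):
    """Return the index of the best candidate, or None if it stays below the threshold."""
    metrics = [cpich_metric(received, code) for code in candidates]
    best = max(range(len(candidates)), key=metrics.__getitem__)
    return best if metrics[best] >= threshold else None


rng = random.Random(7)
SLOT_CHIPS = 2560  # one slot = 10 CPICH symbols
# 16 hypothetical candidates, as when 32 code groups are used in stage 2.
candidates = [[rng.choice((1, -1)) for _ in range(SLOT_CHIPS)] for _ in range(16)]

received = list(candidates[11])  # noiseless slot scrambled with candidate 11
found = identify_scrambling_code(received, candidates, threshold=0.5)
unknown = [rng.choice((1, -1)) for _ in range(SLOT_CHIPS)]
rejected = identify_scrambling_code(unknown, candidates, threshold=0.5)
```

The threshold is what lets the detector report that no candidate matched, instead of always returning the largest metric; lowering it raises the false alarm rate.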

This three stage cell search algorithm helps in simplifying the synchronization process of the MS with the BS. Each stage and its hardware implementation will be explained in the following Chapters.

Chapter 4

Improved Cell Search Design


4.0 Improved Cell Search Design

This Chapter describes the Improved CSD using a set of cyclic codes. The cyclic codes were proposed by Nortel Networks to be used on the Secondary SCH [9]. These cyclic codes allow very efficient detection and improve the cell search in terms of acquisition time and hardware utilization. The three stage cell search design and its hardware implementation are explained in Sections 4.1, 4.2 and 4.3.

4.1 Stage 1: Slot Synchronization

The MS first needs to acquire the PSC which is common to all the BSs. These codes are of length 256 chips. The matched filter output is given by

$$Y = \sum_{j=0}^{255} R_j \, C_{p_j} \tag{1}$$

where $R_j$ is the jth sample of the received complex signal, and $C_{p_j}$ is the jth bit of the PSC.

Hence, a traditional matched filter implementation would require 256 taps and a large adder circuit. This would increase the delay as well as power consumption at the receiver, which is not desirable. Thus, a hierarchical structure is proposed for performing the matched filter operations which needs fewer taps, reduced circuitry and lower power consumption [6]. The PSC consists of an unmodulated hierarchical sequence of length 256 chips, transmitted once every slot. The PSC is the same for every BS in the system and is transmitted time aligned with the slot boundary. The PSC is chosen to have good auto-correlation properties. This means that when the PSC sequence is correlated with itself, the interference from adjacent BSs is minimized and a high peak value is obtained.

The hierarchical sequences used for generating the PSC are constructed from two constituent sequences X1 and X2 of length n1 and n2, respectively, using the following equation

$$C_p(n) = X_1(n \bmod n_2) + X_2(n \operatorname{div} n_1) \pmod{2}, \qquad n = 0, 1, \ldots, (n_1 n_2) - 1 \tag{2}$$

where n1 = n2 = 16. The constituent sequences X1 and X2 are both defined as: X1 = X2 = (1, 1, -1, -1, -1, -1, 1, -1, 1, 1, -1, 1, 1, 1, -1, 1) [9].
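A minimal sketch of the construction in equation (2) follows. It rests on one labeled assumption: because the constituent sequences are given as +1/-1 chips, the modulo 2 addition of bits is carried out as a product of chips, which is the equivalent operation under the mapping bit 0 to +1 and bit 1 to -1. The function name is illustrative.

```python
# Constituent sequence as listed in the text (X1 = X2).
X = [1, 1, -1, -1, -1, -1, 1, -1, 1, 1, -1, 1, 1, 1, -1, 1]


def hierarchical_sequence(x1, x2):
    """Build the n1*n2 chip sequence: x1 repeats inside each block, x2 sets the block sign.

    For n1 = n2 = 16 the index n % n1 is the same as the n mod n2 of equation (2).
    Modulo 2 addition of bits is realized as multiplication of +1/-1 chips (assumption).
    """
    n1, n2 = len(x1), len(x2)
    return [x1[n % n1] * x2[n // n1] for n in range(n1 * n2)]


psc = hierarchical_sequence(X, X)
print(len(psc))  # prints 256
```

Each 16-chip block of the result is either X or its negation, depending on the corresponding chip of X2. This block structure is what allows the matched filter to be split into two short stages.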

There are different techniques in which the hierarchical matched filter can be designed as shown in Table 1.

Table 1: Hierarchical Matched Filter (16 and 64 chip Accumulation)

                 16 chip       16 symbol     64 chip       4 symbol
                 Accumulator   Accumulator   Accumulator   Accumulator
Register Taps    16            16            64            4
Adder Length     16            16            64            4

[Figure 3 block diagram: InData and the PSCH Code feed Shift Register 1 (taps 1 to 64) and Adder Tree 1; the output feeds Shift Register 2 (positions 1 to 256, four PSCH Code taps) and Adder Tree 2, which produces Result.]

Figure 3: Hierarchical Matched Filter (64 chip and 4 symbol accumulation)

The hierarchical matched filter consists of two concatenated matched filter blocks. The design using 64 taps is shown in Figure 3. This solution is not ideal for two reasons. First, the matched filter design requires 64 taps. Second, the design needs a 64-input adder as shown in Figure 3. A better solution is to use the design shown in Figure 4. Hence, in stage 1 of both the Improved CSD and the 3GPP-comma free CSD the hierarchical matched filter using 16 chip and 16 symbol accumulation is used.
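The tap counts in Table 1 can be tallied against a direct 256-tap filter with a few lines of arithmetic. The dictionary layout and names are illustrative only; the stage lengths are taken from Table 1 and from the 256 chip PSC length.

```python
# Stage lengths (taps per matched filter block) for each design option.
designs = {
    "direct": (256,),
    "64 chip + 4 symbol": (64, 4),
    "16 chip + 16 symbol": (16, 16),
}

total_taps = {name: sum(stages) for name, stages in designs.items()}
widest_adder = {name: max(stages) for name, stages in designs.items()}

# Every option must still cover the full 256-chip code.
covered = {}
for name, stages in designs.items():
    span = 1
    for length in stages:
        span *= length
    covered[name] = span

print(total_taps)
print(widest_adder)
```

The 16 chip and 16 symbol split needs 32 taps in total and no adder wider than 16 inputs, compared with 68 taps and a 64-input adder for the 64 chip and 4 symbol split.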

Figure 4: Hierarchical Matched Filter (16 chip and 16 symbol accumulation). [Block diagram: InData and the PSCH code feed Shift Register 1 (taps 1 to 16) and Adder Tree 1 (3 levels of adders); its output feeds Shift Register 2 (256 cells, tapped every 16th cell) and Adder Tree 2 (3 levels of adders), which produces Result.]

In this design, the first matched filter receives the input signals serially from the BS. Correlation over X1 (16 chip accumulation) is performed before correlation over X2 (16 symbol accumulation). However, the two matched filters can be interchanged, and the selection is an implementation option. After 16 clock cycles, when shift register 1 is filled, the data stored in shift register 1 is matched in parallel with the code applied to the taps of the matched filter (the tap coefficients). The tap coefficients are taken from the PSC sequence, which is the same for all the BSs. Hence, the same matched filter structure can be used for all the BSs. The adder circuit is implemented as a tree structure with the 16 inputs applied in parallel. If the data bits in shift register 1 match the tap coefficients, then the result of the adder tree will be the highest value possible (16 or greater). The second matched filter has a shift register 2 of size 256 registers. Only 16 taps are needed to match every sixteenth value of shift register 2. The result from the first adder tree is stored in shift register 2 of the second matched filter. After 256 clock cycles, shift register 2 will be filled with the results from the first matched filter. The data in shift register 2 is then matched in parallel with the tap coefficients. The tap coefficients are the same as the PSC sequence. If the data bits match the code sequence, then the result of the second adder tree will be 256 or greater in magnitude, corresponding to the peak value. An advantage of this scheme is that no multiplier circuit is needed, as the correlations can be performed using an adder/subtractor circuit.
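The two-step correlation described above can be sketched in software as follows. This is only an illustration under stated assumptions: chips are written as +1/-1, the received signal is a noise-free toy signal, and the function names are hypothetical. Unlike the hardware, which computes each 16 chip partial sum once and reuses it through shift register 2, the sketch recomputes the partial sums for every offset.

```python
X = [1, 1, -1, -1, -1, -1, 1, -1, 1, 1, -1, 1, 1, 1, -1, 1]
PSC = [X[n % 16] * X[n // 16] for n in range(256)]  # assumed +1/-1 mapping

def direct_correlate(r, t):
    """Conventional 256-tap correlation of r with the PSC at offset t."""
    return sum(r[t + n] * PSC[n] for n in range(256))

def hierarchical_correlate(r, t):
    """Same correlation in two steps: 16 chip sums, then a 16 symbol sum."""
    total = 0
    for b in range(16):
        # Matched filter 1: 16 chip accumulation over X1.
        partial = sum(r[t + 16 * b + a] * X[a] for a in range(16))
        # Matched filter 2: every sixteenth partial sum, weighted by X2.
        total += partial * X[b]
    return total

# Toy received signal: the PSC starts at chip 40, zeros elsewhere.
offset = 40
received = [0] * offset + PSC + [0] * 64
outputs = [hierarchical_correlate(received, t) for t in range(len(received) - 255)]
peak_at = max(range(len(outputs)), key=lambda t: outputs[t])
```

Both functions return the same value at every offset, and the peak of 256 appears at the offset where the PSC starts.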

Each memory cell in shift register 1 is 4 bits wide, assuming that the signal at the input to the digital receiver is sampled with a 4-bit analog-to-digital (A/D) converter. Shift register 2 is 8 bits wide to store the result from the first adder tree block. Performing the correlation does not require 16*16 operations but only 16+16 accumulation operations, which leads to a considerable reduction in hardware complexity. The hardware complexity of the hierarchical matched filter is calculated as follows. In one slot period (2,560 chips), the receiver has to perform at least 81,920 complex additions (2,560*(16+16)). The traditional matched filter implementation without the hierarchical structure would require 256 complex additions per correlation output instead of 32. Thus, the hierarchical matched filter achieves a saving of a factor of 8 in terms of complex additions. From Figure 2, each slot has a duration of 0.67 msec (670 µsec). The complexity of stage 1 in terms of real additions per second is about 245 Madds/sec (81,920*2/670 µsec). The incoming complex signal is divided into two components, the cosine part called the "in-phase" (I-phase) component and the sine part called the "quadrature-phase" (Q-phase) component. The factor of 2 accounts for the two branches I and Q of the complex signal. Thus, stage 1 of the initial cell search needs 81,920 complex additions in 1 slot and a computing power of about 245 Madds/sec.
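The complexity figures quoted above can be reproduced with a few lines of arithmetic. The constant names are illustrative, and the slot duration is the rounded 670 microsecond value used in the text.

```python
CHIPS_PER_SLOT = 2560
ADDS_HIERARCHICAL = 16 + 16    # additions per correlation output, hierarchical filter
ADDS_DIRECT = 256              # additions per correlation output, 256-tap filter
SLOT_DURATION_S = 670e-6       # rounded slot duration from the text
BRANCHES = 2                   # I and Q

complex_adds_per_slot = CHIPS_PER_SLOT * ADDS_HIERARCHICAL
saving_factor = ADDS_DIRECT / ADDS_HIERARCHICAL
real_adds_per_second = complex_adds_per_slot * BRANCHES / SLOT_DURATION_S
```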

Figure 5: Slot Boundary Detection. [Block diagram: the I-phase and Q-phase inputs each pass through a hierarchical matched filter (Shift Register 1, Adder Tree 1, Shift Register 2, Adder Tree 2, each tree with 3 levels of adders). The non-coherent detection block squares and sums the two outputs, and an accumulator followed by a comparator produces the slot boundary value and the Stage 1 Complete flag.]

There are two such hierarchical matched filters, one each for the I and Q channels of the received complex signal, as shown in Figure 5. The correlation results over the I and Q channels are combined non-coherently over 1 slot duration, and the result is stored in an accumulator which is implemented as a shift register. The output of the accumulator is given to a comparator block, which detects the peak value corresponding to the slot boundary of the closest BS; the MS needs to synchronize with this BS. As the code can be affected by AWGN and fading, accumulation over multiple slots is needed to correctly identify the slot boundary. It is important that the slot boundary is correctly identified, in order to avoid the cost of increased acquisition time in case the wrong slot boundary is given to stage 2.
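The non-coherent combining and multi-slot accumulation can be sketched as below. This is a toy model under stated assumptions: a shortened slot of 512 chips instead of 2,560, no noise, an arbitrary carrier phase per slot, chips written as +1/-1, and hypothetical function names.

```python
import cmath

X = [1, 1, -1, -1, -1, -1, 1, -1, 1, 1, -1, 1, 1, 1, -1, 1]
PSC = [X[n % 16] * X[n // 16] for n in range(256)]  # assumed +1/-1 mapping

def slot_energy_profile(samples, slot_len):
    """Return I^2 + Q^2 of the PSC correlation for every candidate offset."""
    profile = []
    for t in range(slot_len):
        corr = sum(samples[(t + n) % slot_len] * PSC[n] for n in range(256))
        # Non-coherent combining removes the unknown carrier phase.
        profile.append(corr.real ** 2 + corr.imag ** 2)
    return profile

def detect_slot_boundary(slots, slot_len):
    """Accumulate the energy profiles of several slots and pick the peak."""
    acc = [0.0] * slot_len
    for samples in slots:
        for t, energy in enumerate(slot_energy_profile(samples, slot_len)):
            acc[t] += energy
    return max(range(slot_len), key=lambda t: acc[t])

def toy_slot(boundary, phase, slot_len):
    """One noise-free slot with the PSC placed at `boundary`, rotated by `phase`."""
    slot = [0j] * slot_len
    rot = cmath.exp(1j * phase)
    for n in range(256):
        slot[(boundary + n) % slot_len] = PSC[n] * rot
    return slot

slots = [toy_slot(100, 0.7, 512), toy_slot(100, 2.1, 512)]
boundary = detect_slot_boundary(slots, 512)
```

Because the energies are summed rather than the complex correlations, the different phases of the two slots do not cancel each other.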


4.2 Stage 2: Frame Synchronization and Code Group Identification

The Secondary SCH consists of 15 sequences belonging to a family of cyclic codes (SSCs), each of length 256 chips. These SSCs are transmitted repeatedly in parallel with the Primary SCH. The procedure for constructing the cyclic codes is similar to that of the hierarchical sequence (equation 2) for the Primary SCH, except that it uses specific sequences of length 16 from Table 2 for each code group.

The procedure for constructing the cyclic hierarchical sequence Csi,1 for slot 1 is exactly the same as constructing the hierarchical sequence Cp for the Primary SCH. The sequence Csi,1 for slot 1 will be referred to as the zero cyclic shift sequence, as no shift is applied to the constituent sequence X1i. For slots 2 to 15, the cyclic codes are constructed from the two constituent sequences X1i,k-1 and X2i,k-1 of length n1 and n2, respectively, using the following formula

$$Cs_{i,k}(n) = \left[ X2_{i,k-1}(n \bmod n_2) + X1_{i,k-1}(n \;\mathrm{div}\; n_1) \right] \bmod 2, \quad n = 0, 1, \ldots, (n_1 n_2) - 1 \qquad (3)$$

where i is the code group number, k=2,3,..,15 is the slot number, n is the chip number in the slot, n1=n2=16, and the constituent sequences X1i,k-1 and X2i,k-1 in each code group i are chosen from the sequences in Table 2 [9].

Table 2: Sequences X1i and X2i for Code Groups 1 to 32

| Code Group | Sequence |
|---|---|
| 1 | 1 1 1 -1 -1 -1 1 -1 -1 1 1 -1 1 -1 1 1 |
| 2 | 1 -1 1 1 -1 1 1 1 -1 -1 1 1 1 1 1 -1 |
| 3 | 1 1 -1 1 -1 -1 -1 1 -1 1 -1 1 1 -1 -1 -1 |
| 4 | 1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 1 1 -1 1 |
| 5 | 1 1 1 -1 1 1 -1 1 -1 1 1 -1 -1 1 -1 -1 |
| 6 | 1 -1 1 1 1 -1 -1 -1 -1 -1 1 1 -1 -1 -1 1 |
| 7 | 1 1 -1 1 1 1 1 -1 -1 1 -1 1 -1 1 1 1 |
| 8 | 1 -1 -1 -1 1 -1 1 1 -1 -1 -1 -1 -1 -1 1 -1 |
| 9 | 1 1 -1 1 -1 -1 -1 1 1 -1 1 -1 -1 1 1 1 |
| 10 | 1 -1 -1 -1 -1 1 -1 -1 1 1 1 1 -1 -1 1 -1 |
| 11 | -1 1 -1 -1 -1 -1 -1 1 1 1 1 -1 1 -1 1 -1 |
| 12 | -1 -1 -1 1 -1 1 -1 -1 1 -1 1 1 1 1 1 1 |
| 13 | 1 -1 -1 -1 1 -1 -1 1 -1 -1 -1 1 1 1 1 -1 |
| 14 | 1 1 -1 1 1 1 -1 -1 -1 1 -1 -1 1 -1 1 1 |
| 15 | 1 -1 -1 -1 -1 1 1 -1 1 1 1 -1 1 1 1 -1 |
| 16 | 1 1 -1 1 -1 -1 1 1 1 -1 1 1 1 -1 1 1 |
| 17 | 1 -1 1 1 -1 1 -1 1 1 1 -1 1 1 1 -1 1 |
| 18 | 1 1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 |
| 19 | 1 -1 -1 -1 1 -1 -1 1 1 1 1 -1 -1 -1 -1 1 |
| 20 | 1 1 -1 1 1 1 -1 -1 1 -1 1 1 -1 1 -1 -1 |
| 21 | -1 -1 -1 1 1 -1 -1 1 1 -1 1 -1 1 1 -1 -1 |
| 22 | -1 1 -1 -1 1 1 -1 -1 1 1 1 1 1 -1 -1 1 |
| 23 | -1 -1 1 -1 1 -1 1 -1 -1 1 1 -1 -1 -1 -1 -1 |
| 24 | -1 1 1 1 1 1 1 1 -1 -1 1 1 -1 1 -1 1 |
| 25 | -1 1 1 1 -1 -1 1 1 1 -1 -1 -1 -1 -1 -1 1 |
| 26 | -1 -1 1 -1 -1 1 1 -1 1 1 -1 1 -1 1 -1 -1 |
| 27 | -1 1 1 1 1 1 -1 -1 1 -1 -1 -1 1 1 1 -1 |
| 28 | -1 -1 1 -1 1 -1 -1 1 1 1 -1 1 1 -1 1 1 |
| 29 | -1 1 -1 -1 1 1 1 1 1 -1 1 1 1 1 -1 1 |
| 30 | -1 -1 -1 1 1 -1 1 -1 1 1 1 -1 1 -1 -1 -1 |
| 31 | -1 1 1 1 -1 -1 1 1 -1 1 1 1 1 1 1 -1 |
| 32 | -1 -1 1 -1 -1 1 1 -1 -1 -1 1 -1 1 -1 1 1 |

The constituent sequence X2i,k-1 (inner sequence) is exactly equal to the base sequence X2i in every slot, i.e. X2i,k-1=X2i for all k. The constituent sequence X1i,k-1 (outer sequence) is formed from the base sequence X1i by a cyclic right shift of X1i by k-1 positions for each slot number k, from 1 to 15. The generation of the cyclic codes can be understood by considering the following example for the first code group:

X11,0=(1,1,1,-1,-1,-1,1,-1,-1,1,1,-1,1,-1,1,1), k=1 for slot 1, no cyclic shift
X11,1=(1,1,1,1,-1,-1,-1,1,-1,-1,1,1,-1,1,-1,1), k=2 for slot 2, cyclic right shift by 1 position
X11,14=(1,-1,-1,-1,1,-1,-1,1,1,-1,1,-1,1,1,1,1), k=15 for slot 15, cyclic right shift by 14 positions.
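The cyclic shift in the example above can be reproduced with a short sketch. The helper names `cyclic_right_shift` and `outer_sequence` are hypothetical; the base sequence is the first code group sequence quoted in the text.

```python
X1_GROUP1 = [1, 1, 1, -1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, 1, 1]

def cyclic_right_shift(seq, positions):
    """Rotate seq to the right by the given number of positions."""
    positions %= len(seq)
    if positions == 0:
        return list(seq)
    return seq[-positions:] + seq[:-positions]

def outer_sequence(base, slot):
    """Outer sequence for slot k = 1 .. 15: right shift by k - 1 positions."""
    return cyclic_right_shift(base, slot - 1)

slot2 = outer_sequence(X1_GROUP1, 2)
slot15 = outer_sequence(X1_GROUP1, 15)
```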


The same procedure for forming the cyclic codes is used for the other code groups. Thus, for the 32 code groups and 15 slots (in one frame), 512 different cyclic codes with a length of 256 chips each are constructed. In other words, each of the 32 code groups has 16 cyclic codes. This set of 512 (32X16) cyclic codes has good correlation properties, which make the codes good candidates for the SSCs. Many pairs of cyclic codes are fully orthogonal, as their cross correlation is zero; some pairs have a small cross correlation. The cross correlation of each cyclic hierarchical sequence Csi,k with the Cp code of the Primary SCH is small. These 512 cyclic codes are unique for each code group/slot location pair. Thus, it is possible to uniquely determine both the scrambling code group and the frame timing in the second stage of the initial cell search.

By identifying the code group/slot location pair that gives the maximum correlation value, the code group as well as the frame synchronization is determined. The output from the matched filter is given to a non-coherent block, which computes the energy over the I and Q channels and then gives the result to the comparator module, as shown in Figure 6. One slot search period (2,560 chips) is enough to uniquely identify the correct code group and the frame timing in the second stage of acquisition when the signal-to-noise ratio is high. This is one major difference from the 3GPP-comma free CSD, where at least three slots are necessary to uniquely identify the correct code group and frame timing. The Improved CSD also uses a smaller ROM of size 32X16 to store the cyclic codes, as compared to the 3GPP-comma free CSD, which uses a ROM of size 32X60 to store the comma free codes.

Figure 6: Frame synchronization and Code Group Identification. [Block diagram: once Stage 1 Complete is asserted, the I-phase and Q-phase samples are written into the 256-cell Secondary Buffer under control of a sampling counter and the slot boundary value. The buffer fills the data register of Matched Filter 1 (Shift Register 1, Code Register 1, Adder Tree 1), followed by Matched Filter 2 (Shift Register 2, Code Register 2, Adder Tree 2), both clocked at 5X SysClock and each with a 3-level adder tree. The codes come from the 32 X 16 cyclic codes ROM. The non-coherent detection block squares and sums the outputs, and the comparator delivers the code group, the slot ID and the Stage 2 Complete flag.]

The input data samples for the Secondary SCH are stored in an input buffer with 256 complex memory cells, called the Secondary Buffer, as shown in Figure 6. These input data samples are produced after waveform matched filtering and sampling at the chip rate. The result from the hierarchical matched filter design is then given to a non-coherent module, which calculates the energy over the I and Q channels and passes it to a comparator block.

The ROM-stored code sequences given in Table 2 are each tried in succession before the data from the next slot comes in. The data in the shift register is latched until all these sequences have been correlated. This is achieved in stage 2 of the Improved CSD scheme using two clocks: a slow clock, called the system clock in the design, and a fast clock, which runs at 5X the system clock. The sampling is performed at the slow clock rate (system clock). Once the data is latched in the buffer, the fast clock (5X system clock) is used to perform the correlations.

The comparator block gives the code group from Table 2 that has the highest correlation with the data sequence, and also the number of shifts which have been applied to the code group sequence. The number of shifts is the same as the slot ID. From the slot ID the frame boundary can easily be identified, because the number of slots in a frame is fixed at 15.

4.3 Stage 3: Scrambling Code Identification

After achieving code group and frame synchronization, the scrambling code is identified by correlating the symbols in the CPICH with all possible scrambling codes in the code group. The codes are generated using a scrambling code generator, and the descrambling operation is carried out using a descrambler. The details of the scrambling code generator and the descrambler used in stage 3 of the cell search are explained in Sections 4.3.1 and 4.3.2, respectively.

4.3.1 Scrambling Code Generator


Each cell is allocated one and only one primary scrambling code. The scrambling code sequences are constructed by combining two real sequences into a complex sequence [7]. Each of the two real sequences is constructed as the position-wise modulo 2 sum of 38,400 chip segments of two binary sequences generated by means of two generator polynomials of degree 18. Let x and y be the two sequences, respectively. The resulting sequences constitute segments of a set of Gold sequences. The x sequence is constructed using the primitive polynomial 1+X^7+X^18. The y sequence is constructed using the polynomial 1+X^5+X^7+X^10+X^18. The sequence depending on the chosen scrambling code number n is denoted as zn. Furthermore, let x(i), y(i) and zn(i) denote the ith symbol of the sequences x, y, and zn, respectively. The sequences x and y are constructed as

$$x(i+18) = \left[ x(i+7) + x(i) \right] \bmod 2, \quad i = 0, 1, \ldots, 2^{18}-20 \qquad (4)$$

$$y(i+18) = \left[ y(i+10) + y(i+7) + y(i+5) + y(i) \right] \bmod 2, \quad i = 0, 1, \ldots, 2^{18}-20 \qquad (5)$$

The nth Gold code sequence zn, n=0,1,..,2^18-2, is then defined as

$$z_n(i) = \left[ x\big((i+n) \bmod (2^{18}-1)\big) + y(i) \right] \bmod 2, \quad i = 0, 1, \ldots, 2^{18}-2 \qquad (6)$$

Finally, the nth complex scrambling code sequence sn is defined as

$$s_n(i) = z_n(i) + j\, z_n\big((i+131{,}072) \bmod (2^{18}-1)\big), \quad i = 0, 1, \ldots, 38{,}399 \qquad (7)$$

The pattern from phase 0 up to the phase of 38,399 is repeated for every radio frame.
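Equations 4 to 7 can be sketched in software as follows. Two things in the sketch are assumptions taken from 3GPP TS 25.213 rather than from the text above: the initial conditions x(0)=1, x(1)=...=x(17)=0 and y(0)=...=y(17)=1, and the mapping of binary 0/1 chips to +1/-1 before the complex sequence is formed. The function names are hypothetical.

```python
PERIOD = 2 ** 18 - 1

def m_sequences():
    """Generate one period of x and y from recurrences (4) and (5).

    Assumed initial conditions (3GPP TS 25.213, not stated in the text):
    x(0) = 1, x(1..17) = 0 and y(0..17) = 1.
    """
    x = [1] + [0] * 17
    y = [1] * 18
    for i in range(PERIOD - 18):          # i = 0 .. 2^18 - 20
        x.append((x[i + 7] + x[i]) % 2)
        y.append((y[i + 10] + y[i + 7] + y[i + 5] + y[i]) % 2)
    return x, y

def scrambling_code(n, x, y, length=38400):
    """Return the first `length` chips of s_n (equations 6 and 7)."""
    def z(i):
        bit = (x[(i + n) % PERIOD] + y[i]) % 2
        return 1 - 2 * bit                # assumed mapping: 0 -> +1, 1 -> -1
    return [complex(z(i), z((i + 131072) % PERIOD)) for i in range(length)]

x_seq, y_seq = m_sequences()
s0 = scrambling_code(0, x_seq, y_seq)
```

A hardware generator produces the same chips one per clock from two 18-bit shift registers; the list-based form here trades memory for simplicity.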


Figure 7: Scrambling Code Generator. [Block diagram: two 18-stage shift registers (stages 17 down to 0) with modulo 2 feedback adders; modulo 2 sums of selected stages form the I Channel Code and the Q Channel Code.]

The scrambling code generator used to generate the long codes is shown in Figure 7. A total of 2^18-1=262,143 scrambling codes, numbered 0,1,..,262,142, can be generated using the code generator. However, not all the scrambling codes are used. The scrambling codes are divided into 512 sets, each consisting of a primary scrambling code and 15 secondary scrambling codes. The primary scrambling codes consist of scrambling codes n=16*i, where i=0,1,..,511. The ith set of secondary scrambling codes consists of scrambling codes 16*i+k, where k=1,2,..,15. There is a one-to-one mapping between each primary scrambling code and the 15 secondary scrambling codes in a set, such that the ith primary scrambling code corresponds to the ith set of secondary scrambling codes. The set of primary scrambling codes is further divided into 32 scrambling code groups, each consisting of 16 primary scrambling codes. The jth scrambling code group consists of primary scrambling codes 16*16*j+16*k, where j=0,1,..,31 and k=0,1,..,15.
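The code numbering can be illustrated with a small sketch. It assumes that k runs over 16 values (0 to 15) within a group, so that 32 groups of 16 primary codes cover all 512 primary scrambling codes; the function names are hypothetical.

```python
def primary_code_number(i):
    """Scrambling code number of the ith primary code, i = 0 .. 511."""
    return 16 * i

def secondary_code_numbers(i):
    """The 15 secondary scrambling codes attached to the ith primary code."""
    return [16 * i + k for k in range(1, 16)]

def group_primary_codes(j, codes_per_group=16):
    """Primary scrambling codes of group j, j = 0 .. 31 (k = 0 .. 15 assumed)."""
    return [16 * codes_per_group * j + 16 * k for k in range(codes_per_group)]

all_primaries = sorted(n for j in range(32) for n in group_primary_codes(j))
```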


In stage 3, 16 scrambling codes need to be generated in parallel. If the scrambling code generator shown in Figure 7 is used to generate the codes then 16 such code generators would be required. However, generating the codes in parallel using 16 code generators could be expensive as a huge ROM would be required to store the initial phases for all the 16 code generators.

Figure 8: Multiple Scrambling Code Generator. [Block diagram: a 32 X 18 ROM supplies the initial phases for the code generator. LFSR 1 and LFSR 2 are 18-stage shift registers with modulo 2 feedback. Masking functions for the I channel and for the Q channel are applied to the register contents to form the I Channel Codes and Q Channel Codes in parallel.]

Table 3: Masking Functions used in Stage 3: Scrambling Code Generator

| Code | Masking Function for I Channel Code in LFSR 1 | Masking Function for Q Channel Code in LFSR 1 |
|---|---|---|
| Code1 | 000000000000000001 | 001000000001010000 |
| Code2 | 000000000000000010 | 010000000010100000 |
| Code3 | 000000000000000100 | 100000000101000000 |
| Code4 | 000000000000001000 | 000000001000000001 |
| Code5 | 000000000000010000 | 000000010000000010 |
| Code6 | 000000000000100000 | 000000100000000100 |
| Code7 | 000000000001000000 | 000001000000001000 |
| Code8 | 000000000010000000 | 000010000000010000 |
| Code9 | 000000000100000000 | 000100000000100000 |
| Code10 | 000000001000000000 | 001000000001000000 |
| Code11 | 000000010000000000 | 010000000010000000 |
| Code12 | 000000100000000000 | 100000000100000000 |
| Code13 | 000001000000000000 | 000000001010000001 |
| Code14 | 000010000000000000 | 000000010100000010 |
| Code15 | 000100000000000000 | 000000101000000100 |
| Code16 | 001000000000000000 | 000001010000001000 |

In order to reduce the hardware utilization, in stage 3 of both the designs only one scrambling code generator is used to generate 16 codes in parallel when 32 code groups are used, as shown in Figure 8. Sixteen masking functions are used to generate the codes in parallel [15]. Masking functions can generate codes which have minimum overlap, and they reduce the hardware circuitry to a single scrambling code generator at the expense of a few logic gates. The masking functions used for generating the codes are given in Table 3. The masking functions for the I and Q Channel Code in linear feedback shift register (LFSR) 2 were kept fixed as 000000000000000001 and 001111111101100000. Besides reducing the hardware from 16 code generators to one code generator, the design also reduces the ROM size to 32X18, from the size 512X18 that would be needed if 16 code generators were used.
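The effect of a masking function can be sketched as follows: the modulo 2 sum of the register stages selected by the mask is a delayed copy of the same sequence. The sketch assumes that LFSR 1 implements the x sequence with the 3GPP initial fill x(0)=1, x(1)=...=x(17)=0, that the register at time i holds x(i) to x(i+17), and that the rightmost mask character selects x(i); the function names are hypothetical.

```python
def x_sequence(length):
    """First `length` symbols of x, from x(i+18) = x(i+7) + x(i) mod 2."""
    x = [1] + [0] * 17            # assumed initial fill
    while len(x) < length:
        i = len(x) - 18
        x.append(x[i + 7] ^ x[i])
    return x

def masked_output(x, mask, i):
    """XOR of the register stages selected by `mask` at time i.

    Assumption: the rightmost mask character selects x(i), the leftmost x(i+17).
    """
    out = 0
    for k, bit in enumerate(reversed(mask)):
        if bit == "1":
            out ^= x[i + k]
    return out

I_MASK_CODE3 = "000000000000000100"   # Table 3, I channel, Code3
Q_MASK_CODE1 = "001000000001010000"   # Table 3, Q channel, Code1

x = x_sequence(131072 + 18 + 1000)
shift_i = all(masked_output(x, I_MASK_CODE3, i) == x[i + 2] for i in range(1000))
shift_q = all(masked_output(x, Q_MASK_CODE1, i) == x[i + 131072] for i in range(1000))
```

Under these assumptions the I mask of Code3 advances the sequence by 2 chips, and the Q mask of Code1 reproduces the 131,072 chip offset used for the imaginary part in equation 7.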

4.3.2 Descrambler
Descrambling is carried out using data over the CPICH and the codes generated by the scrambling code generator and masking functions. Counters are used, as shown in Figure 9, to keep track of the votes obtained after the descrambling and comparison operations. After these operations are completed, the final step is to decide whether the cell search has been successful and a code has been found. For this purpose, a parameter called the probability of false alarm (PFA) is used to predefine the threshold value (VTH) [19]. The relation can be expressed by the following equation

$$P_{FA} = e^{-V_{TH}/V} \qquad (8)$$

where V is twice the variance of the I and Q components.

If the counter exceeds VTH, then the cell search operation is declared a success and the particular long code is identified.
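Equation 8 can be inverted to obtain the threshold for a chosen false alarm probability, VTH = -V ln(PFA). A minimal sketch, with hypothetical function names and illustrative numbers:

```python
import math

def threshold_from_pfa(pfa, v):
    """Solve equation 8 for VTH, given PFA and V."""
    return -v * math.log(pfa)

def false_alarm_probability(vth, v):
    """Equation 8: PFA = exp(-VTH / V)."""
    return math.exp(-vth / v)

vth = threshold_from_pfa(1e-3, 2.0)   # example values only
```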

Figure 9: Scrambling Code Identification. [Block diagram: the multiple scrambling code generator (32 X 18 ROM with initial phases, two LFSRs, and masking functions for the I and Q channels) supplies I and Q channel codes to Descramblers 1 to 16. Each descrambler multiplies the I and Q channel data by the codes, combines and squares the results, and produces Descrambler Outputs 1 to 16. A first comparator block increments the vote counters, and a second comparator block compares the counter with the threshold value to signal Code Found and output the long code.]

Chapter 5

3GPP-comma free Cell Search Design


5.0 3GPP-comma free Cell Search Design

This Chapter discusses stage 2 of the 3GPP cell search design using comma free codes. Stage 1 and stage 3 for the 3GPP-comma free CSD design were kept the same as the Improved CSD to compare stage 2 of both the designs. A Fast Hadamard Transformer (FHT) is proposed to be used in stage 2 of the cell search algorithm. To reduce the hardware utilization of the FHT design, reduced length Walsh sequences are proposed as explained in Section 5.1.

5.1 Stage 2 of 3GPP-comma free Cell Search Design

In CDMA systems, the BS identifies each user in a cell by a unique scrambling code. In order to minimize the interference in a cell when two users transmit at the same time, orthogonal (Walsh) codes are used. The Walsh codes are generated using a Walsh-Hadamard function. When these Walsh codes are transmitted by the BS, they are affected by interference, fading and noise, which may be AWGN. At the receiver, decoding logic is required to correctly determine which of the Walsh codes was the most likely to have been sent. An FHT can be used to provide such decoding circuitry.

The table provided in the 3GPP Specifications for the comma free codes is for 64 code groups. For comparison with the Improved CSD scheme, which uses 32 code groups, only 32 of the possible 64 code groups are used. The 32 secondary SCH sequences are constructed such that their cyclic shifts are unique, i.e., a non-zero cyclic shift less than 15 of any of the 32 sequences is not equivalent to some cyclic shift of any other of the 32 sequences. Also, a non-zero cyclic shift less than 15 of any of the sequences is not equivalent to itself with any other cyclic shift less than 15. Table 4 lists the sequences of SSCs used to encode the 32 different scrambling code groups [7].

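Table 4: Allocation of SSCs for Secondary SCH

| Scrambling Code Group | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Group 0 | 1 | 1 | 2 | 8 | 9 | 10 | 15 | 8 | 10 | 16 | 2 | 7 | 15 | 7 | 16 |
| Group 1 | 1 | 1 | 5 | 16 | 7 | 3 | 14 | 16 | 3 | 10 | 5 | 12 | 14 | 12 | 10 |
| Group 2 | 1 | 2 | 1 | 15 | 5 | 5 | 12 | 16 | 6 | 11 | 2 | 16 | 11 | 15 | 12 |
| Group 3 | 1 | 2 | 3 | 1 | 8 | 6 | 5 | 2 | 5 | 8 | 4 | 4 | 6 | 3 | 7 |
| Group 4 | 1 | 2 | 16 | 6 | 6 | 11 | 15 | 5 | 12 | 1 | 15 | 12 | 16 | 11 | 2 |
| Group 5 | 1 | 3 | 4 | 7 | 4 | 1 | 5 | 5 | 3 | 6 | 2 | 8 | 7 | 6 | 8 |
| Group 6 | 1 | 4 | 11 | 3 | 4 | 10 | 9 | 2 | 11 | 2 | 10 | 12 | 12 | 9 | 3 |
| Group 7 | 1 | 5 | 6 | 6 | 14 | 9 | 10 | 2 | 13 | 9 | 2 | 5 | 14 | 1 | 13 |
| Group 8 | 1 | 6 | 10 | 10 | 4 | 11 | 7 | 13 | 16 | 11 | 13 | 6 | 4 | 1 | 16 |
| Group 9 | 1 | 6 | 13 | 2 | 14 | 2 | 6 | 5 | 5 | 13 | 10 | 9 | 1 | 14 | 10 |
| Group 10 | 1 | 7 | 8 | 5 | 7 | 2 | 4 | 3 | 8 | 3 | 2 | 6 | 6 | 4 | 5 |
| Group 11 | 1 | 7 | 10 | 9 | 16 | 7 | 9 | 15 | 1 | 8 | 16 | 8 | 15 | 2 | 2 |
| Group 12 | 1 | 8 | 12 | 9 | 9 | 4 | 13 | 16 | 5 | 1 | 13 | 5 | 12 | 4 | 8 |
| Group 13 | 1 | 8 | 14 | 10 | 14 | 1 | 15 | 15 | 8 | 5 | 11 | 4 | 10 | 5 | 4 |
| Group 14 | 1 | 9 | 2 | 15 | 15 | 16 | 10 | 7 | 8 | 1 | 10 | 8 | 2 | 16 | 9 |
| Group 15 | 1 | 9 | 15 | 6 | 16 | 2 | 13 | 14 | 10 | 11 | 7 | 4 | 5 | 12 | 3 |
| Group 16 | 1 | 10 | 9 | 11 | 15 | 7 | 6 | 4 | 16 | 5 | 2 | 12 | 13 | 3 | 14 |
| Group 17 | 1 | 11 | 14 | 4 | 13 | 2 | 9 | 10 | 12 | 16 | 8 | 5 | 3 | 15 | 6 |
| Group 18 | 1 | 12 | 12 | 13 | 14 | 7 | 2 | 8 | 14 | 2 | 1 | 13 | 11 | 8 | 11 |
| Group 19 | 1 | 12 | 15 | 5 | 4 | 14 | 3 | 16 | 7 | 8 | 6 | 2 | 10 | 11 | 13 |
| Group 20 | 1 | 15 | 4 | 3 | 7 | 6 | 10 | 13 | 12 | 5 | 14 | 16 | 8 | 2 | 11 |
| Group 21 | 1 | 16 | 3 | 12 | 11 | 9 | 13 | 5 | 8 | 2 | 14 | 7 | 4 | 10 | 15 |
| Group 22 | 2 | 2 | 5 | 10 | 16 | 11 | 3 | 10 | 11 | 8 | 5 | 13 | 3 | 13 | 8 |
| Group 23 | 2 | 2 | 12 | 3 | 15 | 5 | 8 | 3 | 5 | 14 | 12 | 9 | 8 | 9 | 14 |
| Group 24 | 2 | 3 | 6 | 16 | 12 | 16 | 3 | 13 | 13 | 6 | 7 | 9 | 2 | 12 | 7 |
| Group 25 | 2 | 3 | 8 | 2 | 9 | 15 | 14 | 3 | 14 | 9 | 5 | 5 | 15 | 8 | 12 |
| Group 26 | 2 | 4 | 7 | 9 | 5 | 4 | 9 | 11 | 2 | 14 | 5 | 14 | 11 | 16 | 16 |
| Group 27 | 2 | 4 | 13 | 12 | 12 | 7 | 15 | 10 | 5 | 2 | 15 | 5 | 13 | 7 | 4 |
| Group 28 | 2 | 5 | 9 | 9 | 3 | 12 | 8 | 14 | 15 | 12 | 14 | 5 | 3 | 2 | 15 |
| Group 29 | 2 | 5 | 11 | 7 | 2 | 11 | 9 | 4 | 16 | 7 | 16 | 9 | 14 | 14 | 4 |
| Group 30 | 2 | 6 | 2 | 13 | 3 | 3 | 12 | 9 | 7 | 16 | 6 | 9 | 16 | 13 | 12 |
| Group 31 | 2 | 6 | 9 | 7 | 7 | 16 | 13 | 3 | 12 | 2 | 13 | 12 | 9 | 16 | 6 |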

The 16 SSCs, (Cssc,1,..,Cssc,16), are complex-valued with identical real and imaginary components, and are constructed from position-wise multiplication of a Hadamard sequence and a sequence z, defined as z=(b,b,b,-b,b,b,-b,-b,b,-b,b,-b,-b,-b,-b,-b), where b=(1,1,1,1,1,1,-1,-1,1,-1,1,-1,1,-1,-1,1). The Hadamard sequence is obtained from one of the rows of a Hadamard matrix, which consists of +1 and -1 entries. The rows of the Hadamard matrix are mutually orthogonal, as are its columns. The following examples show how to construct a Hadamard matrix:

$$H_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad H_4 = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}$$

In general the Hadamard matrix can be defined recursively as

$$H_{2N} = \begin{pmatrix} H_N & H_N \\ H_N & -H_N \end{pmatrix} \qquad (9)$$

where HN is a matrix of size N X N. If a vector X of length N is the input, then the vector Y obtained as a result of the Hadamard transform is equal to

$$Y = H_N X \qquad (10)$$
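The recursive construction of equation 9 and the transform of equation 10 can be sketched as follows, using plain lists; the function names are hypothetical.

```python
def hadamard(n):
    """Build the n x n Hadamard matrix by the recursion of equation 9.

    n must be a power of two.
    """
    h = [[1]]
    while len(h) < n:
        top = [row + row for row in h]
        bottom = [row + [-v for v in row] for row in h]
        h = top + bottom
    return h

def hadamard_transform(h, x):
    """Equation 10: Y = H * X, computed row by row."""
    return [sum(a * b for a, b in zip(row, x)) for row in h]

h4 = hadamard(4)
y = hadamard_transform(h4, h4[2])   # transform of one of the rows
```

Because the rows are mutually orthogonal, the transform of a row is zero everywhere except at the index of that row.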


The entries in Table 4 denote which SSC to use in the different slots for the different scrambling code groups, e.g. the entry "5" means that SSC Cssc,5 shall be used for the corresponding scrambling code group and slot. The kth SSC, Cssc,k, k=1,2,..,16, can be calculated using the following expression:

$$C_{ssc,k} = (1+j)\,\big( H_m(0)\,z(0),\; H_m(1)\,z(1),\; H_m(2)\,z(2),\; \ldots,\; H_m(255)\,z(255) \big), \quad m = 16(k-1) \qquad (11)$$

As each element of the Hadamard matrix is either +1 or -1, the multiplication operation used in equation 11 can be reduced to a series of addition/subtraction operations. In general, for an N-point input sample, the FHT algorithm needs to perform N*log2(N) addition and subtraction operations.
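The operation count can be checked with a small software FHT. This is a generic butterfly implementation written for illustration, not the pipelined hardware structure of the design; the function name `fht` is hypothetical.

```python
def fht(x):
    """Fast Hadamard transform; len(x) must be a power of two.

    Returns the transformed vector and the number of additions and
    subtractions performed.
    """
    y = list(x)
    n = len(y)
    ops = 0
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b   # one butterfly
                ops += 2
        h *= 2
    return y, ops

# Input: row 5 (counted from 0) of the 16 x 16 Hadamard matrix.
row5 = [(-1) ** bin(5 & c).count("1") for c in range(16)]
metrics, ops = fht(row5)
```

For N=16 the transform uses 64 operations, compared with 16*15=240 additions for a direct matrix-vector product.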

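Figure 10: Individual Stage of FHT. [Block diagram: an upper input terminal and a lower input terminal feed an adder and an adder/subtractor; shift registers, one of them gated by an enable (En) signal, and multiplexers route the output to the next stage of the FHT.]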

Figure 10 shows an individual stage of the FHT. Each stage has an upper and a lower input terminal. The upper input terminal is configured to receive multiple input signals, which are either Walsh chips (if the stage is the first stage of the FHT) or intermediate correlation coefficients (if the stage is not the first stage of the FHT). If an input of N Walsh chips is to be processed, then the upper input terminal receives N/2 input signal bits and the lower input terminal receives the other N/2 input bits.

Figure 11: 16 chip FHT. [Block diagram: once Stage 1 Complete is asserted, a sampling counter and the slot boundary value from stage 1 load a buffer of 16 registers, which supplies the data to the FHT. Phases 1 to 4 each contain an adder, an adder/subtractor, shift registers and multiplexers, enabled by the bits C0 (LSB) to C2 (MSB) of a 3-bit counter, and produce the Hadamard code metrics. Phase 5 contains a comparator, a register that stores the Hadamard row IDs for Slot1 to Slot15, and a detector that reads the 32X60 comma free codes ROM (Table 4, 3GPP 25.213 v4.0) and outputs the code group and slot ID.]

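Table 5: Timing Diagram of Inputs to FHT

| Clock cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Phase 1 Upper Input | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| Phase 1 Lower Input | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| Phase 2 Upper Input | | | | | 0 | 1 | 2 | 3 |
| Phase 2 Lower Input | | | | | 4 | 5 | 6 | 7 |
| Phase 3 Upper Input | | | | | | | 0 | 1 |
| Phase 3 Lower Input | | | | | | | 2 | 3 |
| Phase 4 Upper Input | | | | | | | | 0 |
| Phase 4 Lower Input | | | | | | | | 1 |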

Figure 11 shows the design of an FHT structure which is used for decoding a 16 chip sequence. The proposed design is a very compact and efficient implementation as compared to previous designs [13] [14]. The inputs to the FHT are applied according to the timing diagram shown in Table 5. The inputs are applied in a non-sequential order, and hence a buffer is required to initially store the vectors before passing them to the FHT structure. If a 16 chip sequence needs to be decoded, then a buffer of length 16 registers is required to initially store the vectors. The addition and subtraction operations in the FHT algorithm are used to generate correlation coefficients for the received Walsh code. The correlation coefficients express the likelihood that a received codeword is the correct Walsh code.

Figure 12: Hadamard Code Metrics (Butterfly Operation). [Butterfly diagram: the inputs y1 to y16 are combined pairwise in phase 1 (y1+y2, y1-y2, ..., y15+y16, y15-y16), in groups of four in phase 2 ((y1+y2)+(y3+y4), ..., (y13-y14)-(y15-y16)), in groups of eight in phase 3, and in groups of sixteen in phase 4, which yields the 16 Hadamard code metrics.]

The correlation coefficients are also called the Hadamard code metrics and are generated as shown in Figure 12 for a 16-point FHT. This operation is also called the butterfly operation. The butterfly operation is also used in other digital signal processing (DSP) applications, such as calculating the discrete Fourier transform (DFT). The Walsh code having the largest metric is then selected as the code most likely to have been transmitted.

It is the job of the detector to find which code group and slot ID are being used, from the table provided in the 3GPP specifications [7], using the three Hadamard rows (Walsh codes). The detector needs to identify the code group in the minimum amount of time, which uses a lot of hardware resources. Also, if the correct sequence of Hadamard rows is not identified and given to the detector, additional clock cycles are wasted while it tries to find the sequence in the table provided in the 3GPP specifications. The detection circuitry is used to locate the sequence in the table and hence find the code group and slot ID. Also, in the 3GPP-comma free CSD implementation, two clocks are not needed. Even if two clocks are used, only a marginal gain will be achieved, and only in the detection phase 5 shown in Figure 11. This is due to the fact that detection of the code group and slot ID cannot start until at least three slots have been identified by phases 1 - 4.
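The table lookup performed by the detector can be sketched as follows. For brevity the sketch holds only the first three rows of Table 4, whereas a full detector would hold all 32 rows; the function name and the brute-force search are illustrative choices, not the detection circuitry of the design.

```python
# Rows of Table 4 for groups 0 to 2 (SSC index per slot 0 .. 14).
CODE_TABLE = {
    0: [1, 1, 2, 8, 9, 10, 15, 8, 10, 16, 2, 7, 15, 7, 16],
    1: [1, 1, 5, 16, 7, 3, 14, 16, 3, 10, 5, 12, 14, 12, 10],
    2: [1, 2, 1, 15, 5, 5, 12, 16, 6, 11, 2, 16, 11, 15, 12],
}

def find_group_and_slot(observed):
    """Return every (group, slot) pair whose sequence matches `observed`.

    `observed` lists the SSC indices decoded in consecutive slots; `slot`
    is the position in the frame of the first observation.
    """
    matches = []
    for group, seq in CODE_TABLE.items():
        for shift in range(15):
            if all(seq[(shift + i) % 15] == s for i, s in enumerate(observed)):
                matches.append((group, shift))
    return matches

three_slots = find_group_and_slot([7, 3, 14])
one_slot = find_group_and_slot([1])
```

With three consecutive observations the match in this reduced table is unique, while a single observation leaves several candidates.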

There are a number of stages in the FHT design depending on the length of the Walsh sequence. Each subsequent stage receives an input from the previous stage in half the number of clock cycles required for the previous stage. This is achieved by reducing the length of shift register by a factor of two for each subsequent stage of the FHT.


A counter is used as a clock to determine the time interval at which each successive pair of input signals is received by the FHT. The upper shift registers in each of the stages are always enabled, whereas the lower shift registers are enabled by the bits of the counter. The length of the counter register depends on how many stages there are in the FHT. The counter bit C0 is the LSB and C2 is the MSB. Counter bit C2 is alternately low for four clock cycles and then high for four clock cycles (000...011, 100...111). The bit C0 is alternately low and high for each clock cycle (000, 001, etc.). The number of bits in the counter depends on the number of stages, which in turn depends on the length of the Walsh-Hadamard sequence to be used. If there are N Walsh chips, then the counter length must be log2(N) bits. The length of the shift register in stage s of the design is given by the relation (N/4)/2^s. For example, the length of the shift registers used in the first stage of the FHT is (16/4)/2^0=4. The length of the registers used in the other stages can be calculated similarly.

In the first stage, the input signals corresponding to Walsh chips 0 to 7 arrive at the upper adder, whereas the Walsh chips from 8 to 15 are applied to the adder/subtractor circuit in the lower half of stage 1. During the first four clock cycles, the data bits from the adder unit are selected by multiplexer 1 in stage 1. The lower shift register of stage 1 is enabled to store the outputs from the adder/subtractor unit. Thus at the end of four clock cycles, the upper shift register stores the result of addition of the first four pairs whereas the lower shift register stores the result of subtraction. In the fifth clock cycle, C2 goes high, which disables the lower shift register in stage 1. The result of the upper shift register in stage 1 and the adder output from stage 1, which gives the addition of a new pair of inputs, are then passed on to the adder and adder/subtractor unit in stage 2. Thus, each subsequent stage receives its input from the previous stage. This process is then repeated for each of the other stages in the FHT. At the end of eight clock cycles, all of the 16 correlation coefficients are generated and the largest coefficient is selected as the most likely Walsh-Hadamard codeword to have been transmitted. The design is flexible and can be easily modified to incorporate any chip sequence whose length is a power of two.
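The add/subtract structure described above can be illustrated in software. The sketch below is an assumption-laden behavioural model, not the pipelined hardware: it processes all 16 chips at once rather than two per clock cycle, and the function names (`fht`, `hadamard_row`, `detect_row`) are hypothetical. As in the text, the first stage combines chip i with chip i + 8, and the largest output coefficient is taken as the detected codeword.

```python
def fht(samples):
    """Fast Hadamard transform of a list whose length is a power of two.

    Output index k holds the correlation of the input with row k (0-based)
    of the Sylvester-ordered Hadamard matrix.
    """
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of two")
    data = list(samples)
    half = n // 2  # the first stage pairs chip i with chip i + n/2
    while half >= 1:
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                upper, lower = data[i], data[i + half]
                data[i] = upper + lower          # adder branch
                data[i + half] = upper - lower   # adder/subtractor branch
        half //= 2
    return data


def hadamard_row(index, n):
    """Row `index` (0-based) of the n x n Sylvester Hadamard matrix (+1/-1)."""
    return [-1 if bin(index & col).count("1") % 2 else 1 for col in range(n)]


def detect_row(samples):
    """Return the 0-based row index with the largest correlation coefficient."""
    coeffs = fht(samples)
    return max(range(len(coeffs)), key=lambda k: coeffs[k])


if __name__ == "__main__":
    received = hadamard_row(5, 16)
    received[3] = -received[3]  # one corrupted chip
    print(detect_row(received))
```

Indices here are 0-based, so index 5 corresponds to the sixth row of a 16 x 16 Walsh-Hadamard matrix. Because every stage is only additions and subtractions, the transform needs N log2 N operations instead of the N^2 of a direct correlation against every row.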

5.2 Reduced Length FHT Design

If the $256 \times 256$ matrix is observed carefully, it is seen that the 256 chip sequences can be identified by the 16 chip sequences shown in Table 6.

Table 6: Reduced Length Walsh Sequences (256 chip sequence to 16 chip sequence)
Row    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
1      1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
2      1  -1   1  -1   1  -1   1  -1   1  -1   1  -1   1  -1   1  -1
3      1   1  -1  -1   1   1  -1  -1   1   1  -1  -1   1   1  -1  -1
4      1  -1  -1   1   1  -1  -1   1   1  -1  -1   1   1  -1  -1   1
5      1   1   1   1  -1  -1  -1  -1   1   1   1   1  -1  -1  -1  -1
6      1  -1   1  -1  -1   1  -1   1   1  -1   1  -1  -1   1  -1   1
7      1   1  -1  -1  -1  -1   1   1   1   1  -1  -1  -1  -1   1   1
8      1  -1  -1   1  -1   1   1  -1   1  -1  -1   1  -1   1   1  -1
9      1   1   1   1   1   1   1   1  -1  -1  -1  -1  -1  -1  -1  -1
10     1  -1   1  -1   1  -1   1  -1  -1   1  -1   1  -1   1  -1   1
11     1   1  -1  -1   1   1  -1  -1  -1  -1   1   1  -1  -1   1   1
12     1  -1  -1   1   1  -1  -1   1  -1   1   1  -1  -1   1   1  -1
13     1   1   1   1  -1  -1  -1  -1  -1  -1  -1  -1   1   1   1   1
14     1  -1   1  -1  -1   1  -1   1  -1   1  -1   1   1  -1   1  -1
15     1   1  -1  -1  -1  -1   1   1  -1  -1   1   1   1   1  -1  -1
16     1  -1  -1   1  -1   1   1  -1  -1   1   1  -1   1  -1  -1   1


Thus in a CDMA receiver, only the first 16 chips of the entire Walsh sequence can be used. The buffer, which is used to store the input value, will also be reduced in length from 256 to 16 registers. The proposed design ideas lead to considerable savings in hardware resources. The reduced length Walsh sequence helps in achieving faster decoding. The two designs were synthesized and the hardware resources utilized were compared on a Xilinx Virtex-E XCV1000E FPGA.
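One way to see why the first 16 chips suffice is sketched below. It rests on an assumption that is not stated explicitly in the text: that the 256 chip sequences of interest are the first 16 rows of a Sylvester-ordered 256 x 256 Walsh-Hadamard matrix. Under that assumption each such row is simply the corresponding 16 chip row repeated 16 times, so its first 16 chips already identify it. The function names are hypothetical.

```python
def hadamard_row(index, n):
    """Row `index` (0-based) of the n x n Sylvester Hadamard matrix (+1/-1)."""
    return [-1 if bin(index & col).count("1") % 2 else 1 for col in range(n)]


def reduced_rows(full_len=256, short_len=16):
    """First `short_len` chips of the first `short_len` rows of the full matrix.

    Assumption: the sequences to be distinguished are rows 0..short_len-1.
    """
    return [hadamard_row(r, full_len)[:short_len] for r in range(short_len)]


if __name__ == "__main__":
    short = reduced_rows()
    # The truncated rows coincide with the 16 x 16 Walsh-Hadamard matrix.
    print(short == [hadamard_row(r, 16) for r in range(16)])
    # Each 256 chip row is its 16 chip prefix repeated 16 times.
    print(all(hadamard_row(r, 256) == short[r] * 16 for r in range(16)))
```

If the sequences were drawn from other rows of the 256 x 256 matrix, a different reduction (for example decimation rather than truncation) would be needed, so this check applies only under the stated assumption.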

Chapter 6

Experimental Method and Results


6.0 Experimental Method and Results

This Chapter explains the method used to measure the acquisition time for both of the cell search designs, the Improved CSD and the 3GPP-comma free CSD. Section 6.1.1 provides details of the FPGA used for prototyping the algorithms and for comparing the hardware specifications of both designs. Section 6.2 presents the acquisition time measurements and the hardware comparison. Section 6.2 also compares the hardware utilization of the FHT design using 256 and 16 chip sequences.

6.1 Experimental Method

The acquisition time was measured by counting the number of clock cycles used by the RTL simulation. The input chip rate is given by the 3GPP specifications and this gives the acquisition time measure. For comparing the hardware specifications and the maximum frequency of operation of both designs on the FPGA, the Xilinx Foundation ISE software was used to generate the bit map file for programming the FPGA. The details of the FPGA and the design process used for the hardware comparison are explained in Section 6.1.1.
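The conversion from a cycle count to an acquisition time can be sketched as follows. The W-CDMA chip rate of 3.84 Mcps and the slot length of 2560 chips are taken from the 3GPP specifications; the assumption that the simulation consumes one chip per clock cycle, and the function names, are illustrative only.

```python
CHIP_RATE_HZ = 3_840_000   # W-CDMA chip rate: 3.84 Mcps
CHIPS_PER_SLOT = 2560      # 15 slots of 2560 chips form one 10 msec frame


def cycles_to_msec(clock_cycles, cycles_per_chip=1):
    """Convert a simulated clock-cycle count to milliseconds.

    Assumption: `cycles_per_chip` clock cycles are spent on each input chip.
    """
    return clock_cycles / (cycles_per_chip * CHIP_RATE_HZ) * 1e3


def slots_to_msec(slots):
    """Duration of a whole number of slots in milliseconds."""
    return cycles_to_msec(slots * CHIPS_PER_SLOT)


if __name__ == "__main__":
    print(slots_to_msec(1))    # duration of one slot
    print(slots_to_msec(15))   # duration of one frame
```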


6.1.1 FPGA Design Process


The FPGA used for prototyping the designs is a Xilinx Virtex-E XCV1000E BG560 with a speed grade of 6. As the name suggests, FPGAs are capable of being reconfigured to implement any desired digital circuit. This is made possible by having a large number of small configurable logic blocks (CLBs) and a connection mechanism between these blocks, which is used to interconnect the CLBs according to the design. The basic building block of the Virtex-E CLB is the logic cell (LC). Each Virtex-E CLB contains four LCs, organized in two similar slices, as shown in Figure 13 [20]. An LC includes a 4-input function generator, carry logic, and a storage element. Virtex-E function generators are implemented as 4-input look-up tables (LUTs). Along with the LUTs, the CLB also contains D flip-flops for storing data. The output from the function generator in each LC drives both the CLB output and the D input of the flip-flop. The block diagram of a 2-Slice Xilinx Virtex-E CLB is shown in Figure 13. The detailed view of a Virtex-E Slice is shown in Figure 14 [20].


Figure 13: 2-Slice Virtex-E CLB

Figure 14: Detailed View of Virtex-E Slice


The entire design was coded in Verilog at the Register Transfer Level (RTL). The RTL design was then synthesized using the Synopsys FPGA Express synthesis tool available with the Foundation ISE software. The bit map generated was then used to program the FPGA using the JTAG cable.

6.2 Experimental Results

To compare the acquisition time between the Improved CSD and the 3GPP-comma free CSD, experiments were carried out using input vectors generated in Matlab. Threshold values determined for the two false alarm probabilities ($P_{FA} = 10^{-3}$ and $P_{FA} = 10^{-4}$) were 28 and 37 respectively. The number of clock cycles between the start of the system and the point when the counter in stage 3 exceeds the computed threshold values was determined. The equivalent gate count and maximum frequency of operation were compared for both the designs using a 256 chip sequence in stage 2 and the same design constraints in the FPGA Express synthesis tool on a Xilinx Virtex-E XCV1000E FPGA.
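The measurement itself amounts to finding the first clock cycle at which the stage 3 counter exceeds the threshold. A minimal sketch is given below; the counter trace is made-up illustrative data (not simulation output from the thesis) and the function name is hypothetical.

```python
def cycles_until_threshold(counter_trace, threshold):
    """Return the 1-based clock cycle at which the counter first exceeds
    `threshold`, or None if it never does."""
    for cycle, value in enumerate(counter_trace, start=1):
        if value > threshold:
            return cycle
    return None


if __name__ == "__main__":
    # Hypothetical trace: the counter gains one count every 100 clock cycles.
    trace = [cycle // 100 for cycle in range(1, 5001)]
    for threshold in (28, 37):  # thresholds quoted for the two false alarm rates
        print(threshold, cycles_until_threshold(trace, threshold))
```

A higher threshold lowers the false alarm probability but, for the same counter trace, is crossed later.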

From the experiments conducted, it was observed that the Improved CSD uses fewer slots than the 3GPP-comma free CSD to achieve synchronization in stage 2. The results indicate that when averaging is carried out over 15 slots in stage 1 of both the designs ($P_{FA1} = 10^{-3}$ and $V_{TH1} = 28$), the Improved CSD has an acquisition time of 13.66 msec as compared to 14.53 msec for the 3GPP-comma free CSD. Thus, the Improved CSD achieves an improvement of 0.87 msec for an AWGN channel (Figure 15). Similarly, an improvement of 0.87 msec was observed when $P_{FA2} = 10^{-4}$ and $V_{TH2} = 37$. Figures 15 and 16 show the acquisition time measures for 2, 4, 8 and 15 slots in stage 1 of the design. The number of slots in the other stages, as discussed in previous Chapters, was kept fixed: one slot in stage 2 of the Improved CSD, three slots in stage 2 of the 3GPP-comma free CSD, and 15 slots in stage 3 of both designs.


[Plot: Acquisition Time Measures: Quantization 4 Input Data Bits. Acquisition Time (in msec) versus Number of Slots in Stage 1, for the Improved CSD and the 3GPP-comma free CSD.]

Figure 15: Comparison of Improved CSD and 3GPP-comma free CSD, $P_{FA} = 10^{-3}$

[Plot: Acquisition Time Measures: Quantization 4 Input Data Bits. Acquisition Time (in msec) versus Number of Slots in Stage 1, for the Improved CSD and the 3GPP-comma free CSD.]

Figure 16: Comparison of Improved CSD and 3GPP-comma free CSD, $P_{FA} = 10^{-4}$


Table 7: Hardware Specifications of System: Quantization 4 Input Data Bits

FPGA XCV1000E BG560, Speed Grade 6                 Improved CSD    3GPP-comma free CSD
Number of Slice Registers                          9086            10141
Number of 4 Input LUTs                             7354            7777
Equivalent Gate Count                              136297          144180
Max. Frequency of Operation (Post Route Timing)    22.066 MHz      12.887 MHz

As seen from Table 7, the Improved CSD had a lower equivalent gate count (136,297) and a higher maximum frequency of operation (22.066 MHz) on a Xilinx Virtex-E XCV1000E FPGA as compared to the 3GPP-comma free CSD when the same constraints were used in the synthesis of both the designs.
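The relative differences implied by Table 7 can be worked out directly. The snippet below is a small sketch using only the figures from Table 7; the dictionary keys and function name are hypothetical.

```python
TABLE_7 = {
    "Improved CSD": {
        "slice_registers": 9086, "luts": 7354,
        "gate_count": 136297, "fmax_mhz": 22.066,
    },
    "3GPP-comma free CSD": {
        "slice_registers": 10141, "luts": 7777,
        "gate_count": 144180, "fmax_mhz": 12.887,
    },
}


def relative_saving(improved, baseline):
    """Percentage by which `improved` is smaller than `baseline`."""
    return (baseline - improved) / baseline * 100.0


if __name__ == "__main__":
    new, old = TABLE_7["Improved CSD"], TABLE_7["3GPP-comma free CSD"]
    for key in ("slice_registers", "luts", "gate_count"):
        print(f"{key}: {relative_saving(new[key], old[key]):.2f}% fewer")
    print(f"fmax ratio: {new['fmax_mhz'] / old['fmax_mhz']:.2f}x")
```

On these figures the Improved CSD uses roughly 5.5% fewer equivalent gates and runs at about 1.7 times the maximum clock frequency of the 3GPP-comma free CSD.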

Table 8: Hardware Specifications of FHT: 16 and 256 chip sequence

FPGA XCV1000E BG560, Speed Grade 6                 FHT 16 chips    FHT 256 chips
Number of Slice Registers                          71              1070
Number of 4 Input LUTs                             173             1370
Equivalent Gate Count                              1591            17,191
Max. Frequency of Operation (Post Route Timing)    35.769 MHz      16.025 MHz

In the FHT design, the input Walsh sequence length can be reduced from 256 chips to 16 chips to reduce the hardware utilization. The proposed idea leads to considerable savings in hardware resources. The buffer, which is used to store the input value, is reduced in length from 256 to 16 registers. The reduced length Walsh sequence helps in achieving faster decoding. The FHT designs using 16 and 256 chip sequences were synthesized and the hardware resources utilized were compared using a Xilinx Virtex-E XCV1000E FPGA. The hardware utilization of the two FHT designs is compared in Table 8.

The results of the reduced length sequence indicate that the FHT design using the 16 chip sequence achieves a 90% reduction in hardware resources (equivalent gate count) as compared to the design which uses the 256 chip sequence. Also, the maximum frequency of operation of the 16 chip FHT (35.679 MHz) is more than double that of the 256 chip FHT (16.025 MHz).
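As a check on the quoted reduction, the equivalent gate counts from Table 8 give the following worked figure (a simple ratio, with no assumptions beyond the tabulated values):

```latex
\[
\text{reduction}
  = \frac{17191 - 1591}{17191}
  = \frac{15600}{17191}
  \approx 0.907 ,
\]
```

which is the approximately 90% saving in equivalent gate count stated above.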

Chapter 7

Summary, Conclusions and Future Work


7.0 Summary, Conclusions and Future Work

In this Chapter the conclusions drawn from the experimental results are summarized and the scope for future work is outlined.

7.1 Summary

In Chapter 2, we discussed some of the previous work done by other research groups and also the 3GPP working group suggestions. Chapter 3 introduced the cell search algorithm, which is divided into three stages to simplify the synchronization between the MS and the BS. Chapter 4 discussed the Improved CSD, which is the proposed design scheme to perform initial cell search. The hierarchical matched filter design proposed by Siemens and Texas Instruments was used in stage 1 of both the cell search designs [6]. In stage 2 of the initial cell search algorithm, two possible design schemes were compared: the Improved CSD, which uses cyclic codes, and the 3GPP-comma free CSD, which uses the comma free codes. The details of the Improved CSD are described in Chapter 4. In stage 3 of both the cell search designs, masking functions are proposed to reduce the hardware utilization as compared to the previous design described by Li et al. [4]. Chapter 5 described the 3GPP-comma free CSD using an FHT design in stage 2 of the cell search algorithm. Further design improvements are suggested in the FHT design by reducing the length of the input Walsh sequence from 256 chips to 16 chip sequences. Chapter 6 discussed the experimental method and presented the results in terms of acquisition time and hardware utilization for both the Improved CSD and the 3GPP-comma free CSD. The hardware utilization of the FHT design using 256 chip sequences and the reduced length (16 chip sequences) is also presented.

7.2 Conclusions

For an AWGN channel model in a high signal-to-noise ratio environment, it was found that accumulation over one slot in the Improved CSD scheme and accumulation over three slots in the 3GPP-comma free CSD scheme in stage 2 of the cell search algorithm give correct code group and slot boundary identification. Due to the reduction in the required number of slots, the Improved CSD uses fewer clock cycles in stage 2 than the 3GPP-comma free CSD to detect the code group and slot ID. This reduction in the number of clock cycles leads to faster acquisition, fewer dropped calls and lower power consumption during the synchronization between the MS and the BS. The use of cyclic codes in the Improved CSD results in lower hardware utilization and a higher maximum frequency of operation as compared to the 3GPP-comma free CSD. In conclusion, the Improved CSD is a better cell search design than the 3GPP-comma free CSD since it has a faster acquisition time and lower hardware utilization.


7.3 Future Work

This thesis investigates code and time synchronization of the cell search algorithm. In addition to code and time synchronization, frequency synchronization between the MS and the BS needs to be achieved. The receiver design presented in this thesis would need to include another module to achieve frequency synchronization. Also, the cell search considered in this thesis is initial cell search. There is another cell search, called target cell search, which needs to be performed during a call when an MS is in motion and moves from one cell to another. VLSI implementations to perform target cell search efficiently need to be investigated.

Kiessling et al. [21] suggest performance enhancements to the W-CDMA initial cell search algorithm. The authors consider the advantages of oversampling and of passing multiple candidates, instead of one candidate, between the cell search stages to reduce the cell search time. Passing multiple candidates in each of the stages will reduce the cell search time but increase the design complexity and hardware utilization. This trade-off between improved cell search time and increased hardware utilization needs to be studied.

The results presented in this thesis show that for an AWGN channel model in a high signal-to-noise ratio environment, the Improved CSD achieves faster synchronization at lower hardware complexity in comparison to the 3GPP-comma free CSD. Future work needs to investigate how the Improved CSD compares with the 3GPP-comma free CSD under multipath channel conditions.

References

8.0 References

[1] T. Ojanperä and R. Prasad, "An overview of air interface multiple access for IMT-2000/UMTS," IEEE Commun. Mag., vol. 36, pp. 82-95, Sept. 1998.
[2] E. Dahlman, P. Beming, J. Knutsson, F. Ovesjö, M. Persson, and C. Roobol, "WCDMA - The radio interface for future mobile multimedia communications," IEEE Trans. Veh. Technol., vol. 47, pp. 1105-1118, Nov. 1998.
[3] Yi-Pin Eric Wang and Tony Ottosson, "Cell Search in W-CDMA," IEEE J. Select. Areas in Commun., vol. 18, no. 8, pp. 1470-1482, August 2000.
[4] Chi-Fang Li, Wern-Ho Sheen, J. J.-S. Ho, and Yuan-Sun Chu, "ASIC design for cell search in 3GPP W-CDMA," IEEE VTC 2001 Fall, vol. 3, pp. 1383-1387.
[5] R. L. Peterson, R. E. Ziemer, and D. E. Borth, Introduction to Spread Spectrum Communication, Englewood Cliffs, NJ, Prentice-Hall, 1995.
[6] Siemens and Texas Instruments, "Generalised Hierarchical Golay Sequence for PSC with low complexity correlation using pruned efficient Golay correlators," TSG-RAN Working Group 1 Meeting 5, TSGR1-554/99.
[7] 3GPP RAN TS 25.213 v4.0.0 (2001-03), "Technical specification group radio access network: Spreading and modulation (FDD)," www.3GPP.org, Release 4.
[8] 3GPP RAN TS 25.214 v4.0.0 (2001-03), "Technical specification group radio access network: Physical layer procedures (FDD)," www.3GPP.org, Release 4.
[9] Nortel Networks, "Synchronization Channel with cyclic hierarchical sequences," TSG-RAN Working Group 1 Meeting 2, TSGR1#2(99)090.
[10] K. Higuchi, M. Sawahashi and F. Adachi, "Fast cell search algorithm in DS-CDMA mobile using long spreading codes," in Proc. IEEE 1997 Veh. Technol. Conf., Phoenix, AZ, May 1997, pp. 1430-1434.
[11] Nystrom, K. Jamal, Y.-P. E. Wang, and R. Esmailzadeh, "Comparison of cell search methods for asynchronous wideband CDMA cellular system," in Proc. IEEE Int. Conf. Universal Personal Commun., Florence, Italy, Oct. 1998.
[12] Ericsson, "New downlink scrambling code grouping scheme for UTRA/FDD," TSG-RAN Working Group 1 Meeting 6, TSGR1#6(99)884.
[13] A. Amira, A. Bouridane, P. Milligan and M. Roula, "Novel FPGA implementations of Walsh-Hadamard transforms for signal processing," in Proc. IEE Vision, Image and Signal Processing, Dec. 2001, vol. 148, no. 6, pp. 377-383.
[14] S.S. Nayak and P.K. Meher, "High throughput VLSI implementation of discrete orthogonal transforms using bit-level vector-matrix multiplier," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 46, no. 5, May 1999, pp. 655-658.
[15] Siemens, "A modified generator for Multiple-Scrambling Codes," TSG-RAN Working Group 1 Meeting 7, TSGR1#7(99)B87.
[16] B.M. Popovic, "Efficient Golay correlator," IEEE Electronics Letters, vol. 35, no. 17, Aug. 1999, pp. 1427-1428.
[17] 3GPP RAN TS 25.211 v4.0.0 (2001-03), "Technical specification group radio access network: Physical channels and mapping of transport channels onto physical channels (FDD)," www.3GPP.org, Release 4.
[18] Harri Holma and Antti Toskala, WCDMA for UMTS: Radio Access for Third Generation Mobile Communications, John Wiley, 2000.
[19] A. J. Viterbi, CDMA: Principles of Spread Spectrum Communication, Addison Wesley, 1995.
[20] Xilinx, The Programmable Logic Data Book, 2000.
[21] M. Kiessling and S.A. Mujtaba, "Performance Enhancements to the UMTS (W-CDMA) Initial Cell Search Algorithm," IEEE International Conference on Communications, vol. 1, May 2002, pp. 590-594.

Technical Publications On This Thesis


[1] Sanat Kamal Bahl, Jim Plusquellic and Joseph Thomas, "Comparison of Initial Cell Search Algorithms for 3GPP W-CDMA Systems Using Cyclic and Comma Free Codes," 45th IEEE Midwest Symposium on Circuits and Systems, Tulsa, Oklahoma, August 2002.
[2] Sanat Kamal Bahl, Jim Plusquellic and Joseph Thomas, "Improved Cell Search Design for W-CDMA," submitted to Journal of Circuits, Systems and Computers (awaiting acceptance).
