Doctoral Dissertation

Hardware Implementation of Signal Processing in Smart Antenna Systems for High Speed Wireless Communication
(高速移動通信を実現するスマートアンテナ技術の実用化における信号処理部の実装に関する研究)

Submitted on
December 15, 2004

Committee:
Prof. Hiroyuki Arai (Supervisor, Chair)
Prof. Yasuo Kokubun
Prof. Ryuji Kohno
Associate Prof. Toshihiko Baba
Associate Prof. Koichi Ichige

Minseok Kim

Department of Electrical and Computer Engineering,
Yokohama National University, Yokohama, Japan
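Doctoral Dissertation

A Study on the Implementation of the Signal Processing Unit for the Practical Realization of Smart Antenna Technology for High-Speed Mobile Communication

Submitted December 15, 2004 (Heisei 16)

Examination Committee
Prof. Hiroyuki Arai (Supervisor)
Prof. Yasuo Kokubun
Prof. Ryuji Kohno
Associate Prof. Toshihiko Baba
Associate Prof. Koichi Ichige

Minseok Kim

Graduate School of Engineering, Yokohama National University
Department of Physics, Electrical and Computer Engineering; Electrical, Electronics and Networks Course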
Abstract

Smart antennas enhance the desired signal power and suppress interferers by steering a beam toward the direction of arrival (DOA) of the desired signal and steering nulls toward the DOAs of the interferers in a line-of-sight (LOS) situation. This paper considers a smart antenna system for cellular base station applications. Because the base station antenna is usually deployed higher than the surrounding scatterers, the signals at the base station are received through scattering from the local scatterers around the mobile station. In this case, the angle spread (AS) is typically only a few degrees, and the fading across antenna elements spaced by λ/2 is highly correlated. Hence, an angle domain approach that exploits the DOA information can be useful. Moreover, in Frequency Division Duplex (FDD) systems, where each link uses a different frequency, the DOAs are the only channel information that is commonly useful in both the uplink and downlink channels, because the average angular power spectra of the two links are almost identical. Thus, in the uplink, the channel parameters can be estimated from the received data that have passed through the channel, and the estimated DOA information can be applied to downlink beamforming. In the downlink, smart antenna transmitters can focus the energy toward the desired user using the DOA information and thus reduce the interference to other users. This is the most significant feature of the angle domain approach with DOA estimation.

This paper mainly aims at practical and general applications of the signal processing implementation of a smart antenna system for wireless cellular base stations. While theoretical studies are useful for analysis and simulation, implementing them in real hardware and validating them with actual measurement data is a meaningful challenge. This paper studies general signal processing that is independent of specific communication systems and hardware platforms. It proposes a smart antenna system based on DOA estimation of the incident signals, which includes fast DOA estimation and beamforming processors. A prototype system integrating the signal processors on field programmable gate arrays (FPGAs) was manufactured, and an evaluation testbed was configured. The prototype employs an 8-element antenna array for the 5 GHz band and an RF transceiver for each branch. The functionality of the baseband smart antenna processor was verified through hardware simulations and laboratory experiments. The entire hardware system, including the RF circuitry, antenna arrays, and air interface, was also tested in both the transmitter and receiver configurations in a radio anechoic chamber.

Smart antenna systems, especially those based on DOA estimation, usually require high computational performance for array signal processing. Apart from the hardware complexity and impractically high cost of multiple transceivers, the limited performance of signal processing devices has been a major obstacle to their realization. With general von Neumann architecture processors, such as a general-purpose micro processing unit (MPU) or a digital signal processor (DSP) with one or a few multiplier-accumulators (MACs), it may be difficult to process the large amount of data from multiple antenna elements in real time. Such processors also consume considerable power because they are not optimized for the specific purpose. In this paper, FPGAs are used as the digital signal processors. Due to the recent progress of Very Large Scale Integration (VLSI) technology, FPGAs have evolved from flexible logic design platforms into signal processing engines. FPGAs are now expected to be essential components of software radio, which includes smart antenna technology, because of the high flexibility of their SRAM-based structure and their real-time processing capabilities. They are inherently suitable for high-speed parallel MAC functions. However, it is incorrect to assume that all DSP functions can always be implemented in FPGAs. Floating-point operations are difficult to implement in FPGAs because they consume a large amount of gate resources and the performance cannot be optimized. Further, processing that involves matrix inversion (or scalar division) is more suitable for a DSP or MPU platform. Therefore, FPGAs and DSPs have coexisted for a long time, and a flexible platform usually combines them.

This paper presents fixed-point processors for a fast DOA estimator and a beamformer implemented with FPGAs. The DOA estimator, termed the “Unitary MUSIC Processor (UMP),” incorporates the unitary MUSIC algorithm, which has a super-resolution capability and is suitable for integration on logic circuit devices such as FPGAs. Techniques suitable for FPGAs, such as eigenvalue decomposition by a cyclic Jacobi processor based on the COordinate Rotation DIgital Computer (CORDIC) and spectrum computation by the spatial discrete Fourier transform (DFT), can be exploited to optimize the fixed-point arithmetic. This paper also proposes a novel beamforming technique with extremely low complexity that uses the DOA information estimated by the UMP. This beamformer computes the optimum weight vector with a simple convolution of precomputed beam and notch patterns; thus, it provides fast beamforming without the matrix inversion required by conventional techniques.

The remainder of this dissertation is organized as follows. Chapter 2 introduces the developed hardware platforms of the smart antenna transceiver system and includes their implementation details. The hardware requirements of smart antenna technologies are also discussed, and the benefits of FPGAs as array signal processors are described in comparison with general-purpose DSP processors. The prototype transceivers were designed with consideration for reconfigurable IF processing based on the software-defined radio concept as well as adaptive control of the antenna beampattern. To implement this, the prototype transceivers adopt a low-IF superheterodyne architecture with undersampling at the receivers and a harmonic band generation scheme at the transmitters. In this chapter, the sampling jitter effect, which determines the maximum IF input frequency, is quantified based on measurements. The implementation of the smart antenna signal processing is described in Chapters 3 and 4. Chapter 3 presents the fast DOA estimation processor using the unitary MUSIC algorithm. First, the design of a fast eigenvalue decomposition processor based on a cyclic Jacobi method with the CORDIC technique is described. Second, the implementation of the fast DOA estimation processor, termed UMP, with FPGAs on the developed transceiver platform is described. Its performance is also quantified by fixed-point hardware-level simulation. Chapter 4 proposes a novel beamforming technique with extremely low complexity and presents its performance and hardware implementation. Chapter 5 describes the experimental evaluation of the integrated system performance. The effects of real hardware imperfections, such as mutual coupling among antenna branches, are discussed, and array calibration issues are dealt with in this chapter. Finally, Chapter 6 summarizes the dissertation, describing the contributions and remaining problems, and outlines future work.
Acknowledgments

I am deeply grateful to the many people who supported me with their helpful comments and enlightening suggestions.

- First of all, my supervisor, Prof. Hiroyuki Arai, always gave me the courage to carry out my research in Japan (abroad, for me) as well as technical and moral support. I wish to express my gratitude to the committee members, Prof. Yasuo Kokubun, Prof. Ryuji Kohno, Associate Prof. Toshihiko Baba, and Associate Prof. Koichi Ichige, for their helpful comments and careful reviews of this dissertation. I would like to express special appreciation to Associate Prof. Koichi Ichige for his plentiful advice and criticism regarding my research work. I also thank Associate Prof. Nobuhiko Kuga for his kind advice on my research work.

- I would like to express deep thanks to Dr. Masahiro Fukuta, Mr. Hirotake Horiuchi, and Dr. Li Chao Tan of Brains Corporation for their financial support. I am also deeply grateful to Prof. Kwang-cheol Ko of Hanyang University, Seoul, Korea; Prof. Ko and Dr. Fukuta gave me the opportunity to study in Japan and to participate in this research work. Dr. Fukuta also provided hardware support in designing and manufacturing the testbed system and guided me in hardware implementation skills.

- I would like to thank Dr. Masahiro Karikomi of Nihon Dengyo Kousaku, Dr. Yoshio Ebine and Dr. Keizo Cho of NTT DoCoMo, Inc., and Dr. Koichi Tsumekawa of NTT Corporation for their research cooperation and helpful comments.

- My appreciation goes to my colleagues Shintaro Muramatsu, Naoya Matsumoto, Tatsuo Fuji, Aiko Kiyono, and Atsushi Nakajima for their research cooperation. I would like to thank other colleagues in my research group, Yuki Inoue and Dr. Kohei Mori of Sony Corporation and Dr. Naobumi Michishita of National Defense Academy, for their helpful discussions. I would also like to extend my gratitude to everyone in the laboratories of Prof. Arai and Prof. Ichige and the newly established laboratory of Prof. Kuga. I would like to express my appreciation to Mr. Takehiro Miyamoto of Nihon Dengyo Kousaku for his helpful research cooperation.

- Finally, I owe special thanks to my wife, Shil, and my lovely son, Jaeweon, for their patience and understanding. I would also like to thank my family, including my parents in Korea, for their continuous encouragement.
# Contents

Abstract  

Acknowledgments  

**Chapter 1  Introduction**

1.1 Background  
1.2 Wireless Communication Systems  
1.3 Smart Antenna Technologies  
   1.3.1 Overview  
   1.3.2 Basic Configuration and Optimization Techniques  
   1.3.3 Benefits  
1.4 Smart Antenna System using Direction-of-Arrival Estimation  
1.5 Scope and Contributions  
1.6 Outline  

**Chapter 2  Smart Antenna Testbed System**  

2.1 Introduction  
2.2 Hardware Requirements  
   2.2.1 Digital Transceiver Architecture  
   2.2.2 Data Conversion Devices  
   2.2.3 Signal Processing Devices  
2.3 Developed Smart Antenna Testbed System Overview  
   2.3.1 Radio Frequency (RF) Components  
   2.3.2 Digital Signal Processing Unit  
   2.3.3 Digital Receiver Implementation  
2.4 Undersampling and ADC’s Input Frequency Limitation at IF Band  
   2.4.1 Sampling Jitter Effect in Undersampling Scheme  
   2.4.2 Evaluation of Jitter Effect  
2.5 Summary  

**Chapter 3  Fast DOA Estimation Processor**  

3.1 Introduction  
3.2 DOA Estimation using MUSIC algorithm  
   3.2.1 Data Model  
   3.2.2 MUSIC algorithm  
   3.2.3 Unitary MUSIC DOA Estimation  
3.3 Digital Signal Processor Design  
   3.3.1 Correlation Matrix with Unitary Transformation  
   3.3.2 MUSIC Spectrum Computation  
   3.3.3 Local Minima (LM) Detection  
3.4 Eigenvalue Decomposition Processor  
   3.4.1 Cyclic Jacobi Method  
   3.4.2 CORDIC Algorithm  
   3.4.3 Fixed-Point Implementation  
   3.4.4 Processor Structure  
   3.4.5 Computational Load and Expected Performance  
3.5 Hardware Implementation  
3.6 Performance Assessments  
   3.6.1 Noise Channel under Planar Wave Assumption  
   3.6.2 Multipath Channel with Small Angle Spread  
3.7 Real Data Examples  
3.8 Summary  

**Chapter 4  DOA-based Beamforming Processor**  

4.1 Introduction  
4.2 DOA-based Beamforming Techniques  
   4.2.1 Optimum Beamformers  
   4.2.2 Low-sidelobe Beamformers  
   4.2.3 Performance Comparison  
4.3 Null Steering Beamformer with Notch Beams  
   4.3.1 Basic Principle  
   4.3.2 Performance Improvement and Limitation  
4.4 Hardware Implementation  
4.5 Performance Evaluation by Computer Simulations  
4.6 Summary  

**Chapter 5  Integrated System Evaluation**  

5.1 Introduction  
5.2 System Configuration  
5.3 Data Model  
5.4 Integrated System Functions  
   5.4.1 Estimation of Number of Sources  
   5.4.2 DOA Estimation  
   5.4.3 User Identification  
   5.4.4 DOA Tracking  
Chapter 1

Introduction

1.1 Background

In recent years, wireless communication systems have progressed greatly and the market, especially for the cellular phone, has witnessed explosive growth. Moreover, as the demand for multimedia services increases, a wider bandwidth of information will be required for next-generation wireless systems. These systems will be allocated at a higher frequency band because a number of useful frequency bands have already been allotted to and are occupied by existing systems. In order to accommodate a larger number of subscribers and to provide better quality services, it is necessary to increase the channel capacity. Further, the technologies required for power saving and efficient frequency reusability will be necessary for various multimedia services.

Multipath fading, typically caused by reflections from physical structures, is unavoidable in wireless communication environments because signals usually propagate over multiple paths, as shown in Fig. 1.1 [3]. Signals arriving over reflected paths are delayed and out of phase relative to the signal on the direct path; their superposition can weaken the signal strength at the receiver and thus degrade the reception quality. Wider bandwidths and higher transmission rates will make this problem more critical for the quality of service in next-generation communication. Many solutions have been proposed to resolve it. In particular, it is important that the receiver and transmitter can be configured flexibly in response to the signal environment, recognizing spatial as well as temporal channel profiles. As an example, spatial diversity techniques, which combine the signals from spatially separated antennas to improve the average signal-to-noise ratio and thus minimize the undesirable fading effects of multipath propagation, have been studied in many applications.

There exists another problem that is caused by multipath propagation. Delay spread is the difference in propagation delays among the multiple paths. It leads to inter-symbol-interference (ISI) that limits the maximum data rates. Frequency-selective fading occurs if the delay spread is equal to or larger than the inverse of the signal bandwidth.

The third major limitation in wireless communication systems is co-channel interference (CCI). In time division multiple access (TDMA) systems, such as the Japanese personal digital cellular (PDC) system, IS-136, and the European Global System for Mobile communications (GSM), CCIs are primarily caused by frequency reuse; that is, the spectrum allocated to the system is reused multiple times in neighboring cells. The problem is that the signals received by a base station or a mobile handset usually contain a component that originated in more distant cells in addition to the desired channel data from the current cell. CCIs degrade the carrier-to-interference power ratio (CIR); thus, the system capacity is limited by more frequent handoffs and reduced connectivity.

In order to cope with these inherent problems in wireless communication systems, smart antennas, which employ multiple antennas, have been proposed as a very promising technology to mitigate multipath fading and reject CCIs by efficiently exploiting spatial domain signal processing. The spatial domain is the next frontier of resources for wireless systems, following the frequency, time, and code domains. Smart antennas can extend the capabilities of next-generation systems to provide customers with increased data transmission throughput for high-speed multimedia applications.

1.2 Wireless Communication Systems

Wireless communication technologies have progressed greatly since the 1980s, as shown in Fig. 1.2. This section presents a brief history of wireless communication systems and provides an outlook on future systems.

In the first generation (1G) systems, analog systems such as advanced mobile phone system (AMPS) and total access communications system (TACS) grew rapidly in the 1980s and are still available today. These systems use frequency division multiplexing to divide the bandwidth into specific frequencies that are assigned to individual calls.

The second generation (2G) systems, introduced in the mid-1990s, are still used today. They are digital and use either TDMA or code division multiple access (CDMA) methods. The Japanese PDC and European GSM are 2G digital systems with their own TDMA access methods. On the other hand, cellular systems in the U.S.A., Korea, and Japan use CDMA technology. The 2G digital services began appearing in the 1990s, providing expanded capacity and various unique services.

Fig. 1.2: Wireless cellular communication system roadmap.

“Third generation (3G)” has become an umbrella term for cellular data communications with a target data rate of 2 Mbit/s (in practice, 64 to 384 kbit/s). The International Telecommunication Union (ITU) originally attempted to define 3G in its IMT-2000 (International Mobile Telecommunications-2000) specifications, which specified global wireless frequency ranges, data rates, and so on. However, a global standard was difficult to implement due to differing frequency allocations around the world and conflicting inputs. Therefore, three operating modes were specified: WCDMA, CDMA2000, and Universal Wireless Communications-136 (UWC-136).

The fourth generation (4G) systems may become available even before 3G is fully developed because 3G is a confusing mix of standards. In 4G systems, the target data rate is expected to be up to 100 Mbit/s for pedestrian users and 1 Gbit/s, or possibly higher, in indoor areas. 4G will require a channel capacity more than 10 times that of 3G systems and must also fully support the Internet Protocol (IP) [26]. High data rates result from advances in signal processors, new modulation techniques, and smart antennas that can focus signals directly at the users. Orthogonal frequency division multiplexing (OFDM), multiple-input-multiple-output (MIMO), and combinations of these are promising schemes that can provide extremely high wireless data rates.

1.3 Smart Antenna Technologies

1.3.1 Overview

The efficient use of the frequency resources is necessary to achieve higher data transmission throughput. Smart antenna systems are capable of automatically changing the directionality of their radiation patterns in response to their signal environment. This can noticeably improve the performance characteristics, such as channel capacity and quality of a wireless system. Smart antenna systems, by using spatially separated antennas, referred to as antenna array, maximize the signal-to-interference-plus-noise ratio (SINR) of the received signals, and suppress interferences and noise power by digital signal processing after analog to digital conversion.

Conventional Antenna Systems

Conventional antenna systems, which employ a single antenna, radiate and receive information equally in all directions. This omni-directional radiation leads to the distribution of energy in all directions. This wasted power becomes a potential source of interference for other users or for other base stations in other cells. Interference and noise reduce the signal-to-noise ratio (SNR) used for detection and demodulation, resulting in poor signal quality. Today’s cellular systems usually introduce 120° sectorization of the coverage to enhance capacity and reduce CCIs [14]. Figure 1.3 illustrates that three directional antennas can cover each sector area of 120°.

Smart Antenna Systems

On the other hand, smart antennas at the transmitter are capable of steering the maximum of the radiation pattern toward the desired mobile; at the receiver, they can spatially separate the energy of the interference and mitigate multipath fading using a software algorithm. This ensures that an optimum quality of service is delivered to users, and it provides maximum coverage for a base station. Smart antennas perform spatial domain processing with multiple antennas, which gives them the intelligence to process the data at both the receiver and the transmitter. Smart antennas are often classified into switched-beam arrays and adaptive array antennas [5, 8]. The switched-beam arrays comprise beamforming networks and a beam selection processor, as shown in Fig. 1.4. The processor selects the beam with the maximum power response by switching among the beams. Adaptive array antennas, however, incorporate more intelligence than switched-beam arrays. They can estimate their environment in accordance with the propagation channel responses between the receiver and the transmitter. This information is then used to weight the data received at or transmitted from the antenna array so as to maximize the response for the desired user. Figure 1.5 illustrates the typical configuration of adaptive array antennas. The processor determines the optimum weight vector for a given environment; thus, adaptive antennas can combine the received signals so as to maximize the SINR and not merely the SNR. As shown in Figs. 1.4 and 1.5, smart antennas can provide space division multiple access (SDMA). In this case, the users may use the same frequency, time, or code allocations over the air interface and be separated only spatially. SDMA can be applied as a complementary scheme to Frequency Division Multiple Access (FDMA), TDMA, and CDMA, thus providing additional capacity enhancement.

Fig. 1.4: Switched beam antenna configuration.

Fig. 1.5: Smart antenna receiver configuration with digital beamforming network.

Fig. 1.6: Multiple-Input-Multiple-Output (MIMO) configuration.

**MIMO (Multiple-Input Multiple-Output) System**

In future wireless systems, both the base station and the mobile station will employ smart antenna techniques; such a configuration is referred to as a MIMO system, as shown in Fig. 1.6. Ironically, MIMO systems exploit multipath propagation, which traditionally has been an inherent problem of wireless transmission. Multiple spatial channels can be used to dramatically increase the data rate [27, 28]. Moreover, space-time block coding (STBC) techniques have been proposed to improve the channel capacity [29, 30].
Terminology

In this paper, the term “smart antennas” denotes complete antenna systems, including all components from the antenna to the baseband digital signal processing. Such a system is often termed an “adaptive (array) antenna” [4], “software antenna” [78, 79], or “digital beamforming antenna (DBF antenna)” [9, 10]. All of these terms imply an intelligent antenna system, as opposed to a conventional omni-directional or fixed-beam antenna system, which merely receives and transmits signals without adapting to changes in the environment [10, 11, 79]. In [5, 8], the term “smart antennas” is used broadly to denote multiple antenna systems, including switched multi-beam antennas and adaptive array antennas. This paper uses the broad term “smart antennas” with a meaning similar to that of “adaptive array antenna”. Figure 1.7 shows the relationships between these terms.

1.3.2 Basic Configuration and Optimization Techniques

In general, smart antenna systems consist of the antenna array, frequency converters, data converters, and smart antenna processor, as shown in Fig.1.5. The antenna array is a set of multiple antennas that are designed to receive or transmit the signals with an adaptive beampattern. Arrays have various physical arrangements such as linear, circular, rectangular, etc. The geometry of an array antenna is determined taking into consideration its applications.

The primary functions of the Radio Frequency (RF)/Intermediate Frequency (IF) circuitry are frequency conversion, filtering, and amplifying. The critical problem faced in the practical applications of smart antenna systems is the heavy hardware with multiple transceivers and high-performance digital signal processors. However, in order to improve frequency reusability, the smart antenna technology should be integrated in 4G systems.

In order to explain the principle of smart antenna signal processing, a simple discrete wavefront model with a narrowband signal is illustrated in Fig. 1.8. The array is assumed to be a uniform (equispaced) linear array of $M$ identical omni-directional elements.

---

¹ On the other hand, the term “software defined radio (SDR)” appears to be used in a broader sense. SDR means the set of all target processors required for radio communications, including antenna technology [80]. Hence, the “smart antenna” technique might be regarded as a part of SDR technology; however, the terms are often used interchangeably [78].
Fig. 1.8: Path difference in planar wave model (linear array of $M$-element).

The electromagnetic wave arriving at the antenna array is also assumed to be planar. Let the angle between array normal and incident wave be $\theta$; the far-field expression of the electrical signal at the $k$-th element at any time $t$ is given by

$$x_k(t) = s(t) \exp \left( -j \frac{2\pi}{\lambda} d(k-1) \sin \theta \right) + n_k(t), \quad k = 1, 2, \cdots, M \tag{1.1}$$

where $s(t)$, $\lambda$, and $\theta$ are the envelope, wavelength, and direction of arrival (DOA) of the incident wave, respectively, $n_k(t)$ is the additive white Gaussian noise (AWGN) at the $k$-th element, and $d$ is the spacing between adjacent antenna elements. In Eq. (1.1), if $s(t)$ is a narrowband signal, the time delay caused by the path difference between the elements corresponds to a phase difference. The output of the array antenna is the inner product (multiply-accumulate operation) of the input signals and the weight coefficients determined by an adaptive algorithm:

$$y(t) = \sum_{k=1}^{M} w_k^* x_k(t). \tag{1.2}$$

Equation (1.2) can also be written in vector form as

$$y(t) = \mathbf{w}^H \mathbf{x}(t), \tag{1.3}$$

where the superscript $^H$ denotes the Hermitian transpose. The data vector $\mathbf{x}(t)$ is written as

$$\mathbf{x}(t) = \mathbf{a}(\theta)s(t) + \mathbf{n}(t), \tag{1.4}$$

where $\mathbf{a}(\theta)$ denotes the array mode vector,

$$\mathbf{a}(\theta) = \left[ 1, \exp \left( -j \frac{2\pi}{\lambda} d \sin \theta \right), \cdots, \exp \left( -j \frac{2\pi}{\lambda} d(M-1) \sin \theta \right) \right]^T. \tag{1.5}$$

In the absence of interference, adaptive beamforming maximizes the output SNR. In the presence of interference, the SINR is maximized by reducing the interference power. If a single planar wave arrives through a single path as assumed above, Eqs. (1.3)-(1.5) show that the optimum weight vector becomes the mode vector of the incident wave.
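The single-wave case of Eqs. (1.1)-(1.5) can be checked numerically. The following is a minimal sketch, not the processor described in later chapters: it assumes half-wavelength spacing, a noise-free snapshot, and a hypothetical DOA of 20 degrees, and the function names are chosen here for illustration only.

```python
import cmath
import math


def steering_vector(theta_deg, num_elements, spacing_wavelengths=0.5):
    """Array mode vector a(theta) of Eq. (1.5) for a uniform linear array."""
    phase_step = -2.0 * math.pi * spacing_wavelengths * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase_step * k) for k in range(num_elements)]


def beamformer_output(weights, snapshot):
    """Array output y = w^H x of Eqs. (1.2)-(1.3)."""
    return sum(w.conjugate() * x for w, x in zip(weights, snapshot))


M = 8                 # number of elements
theta_signal = 20.0   # hypothetical DOA in degrees (assumption)
a = steering_vector(theta_signal, M)

# Noise-free snapshot x = a(theta) s with a unit envelope s = 1 (assumption).
x = [a_k * 1.0 for a_k in a]

# Weights matched to the mode vector, scaled to unit norm so that
# spatially white noise keeps the same power at the array output.
w = [a_k / math.sqrt(M) for a_k in a]

y = beamformer_output(w, x)
array_gain = abs(y) ** 2  # output signal power relative to one element
print(f"array gain = {array_gain:.3f} (M = {M})")
```

With unit-norm matched weights, the signal power at the output grows by a factor of $M$ while the white-noise power is unchanged, which is the $M$-fold beamforming gain referred to later in this chapter.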
However, in an actual multipath propagation environment, the signals arrive in a more complex manner experiencing different paths, for example, as

$$\mathbf{x}(t) = \sum_{i=0}^{N-1} \mathbf{v}_i(t, \tau) s_i(t) + \mathbf{n}(t), \tag{1.6}$$

where

$$\mathbf{v}_i(t, \tau) = \sum_{l=0}^{L(t)-1} A_{l,i}(t) e^{j \phi_{l,i}(t)} \mathbf{a}(\theta_{l,i}(t)) \delta(t - \tau_{l,i}(t)). \tag{1.7}$$

\(A_{l,i}, \phi_{l,i}, \tau_{l,i}, \text{and } \theta_{l,i}\) denote the amplitude, carrier phase shift, time delay, and DOA of the \(l\)-th signal component of the \(i\)-th mobile, respectively. In general, each of these signal parameters will be time-varying. \(L(t)\) is the number of multipath components. The amplitude \(A_{l,i}\) of the multipath components is usually modeled as a Rayleigh distributed random variable, while the phase shift \(\phi_{l,i}\) is uniformly distributed. The time varying components depend on the Doppler shift \(f_d\) [23].
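As an illustration of Eq. (1.7), the sketch below draws one snapshot of the channel vector for a single mobile. It is a simplification under stated assumptions: the path delays are ignored (flat fading), the DOAs are drawn uniformly within the angle spread around a nominal direction, and the total path power is normalized to one. The function names and parameter values are hypothetical.

```python
import cmath
import math
import random


def steering_vector(theta_deg, num_elements, spacing_wavelengths=0.5):
    """Array mode vector a(theta) of Eq. (1.5) for a uniform linear array."""
    phase_step = -2.0 * math.pi * spacing_wavelengths * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase_step * k) for k in range(num_elements)]


def channel_vector(nominal_doa_deg, angle_spread_deg, num_paths, num_elements, rng):
    """One flat-fading realization of v_i: sum of A * exp(j*phi) * a(theta) over paths."""
    v = [0j] * num_elements
    for _ in range(num_paths):
        # Rayleigh-distributed amplitude with mean-square value 1/num_paths.
        amplitude = math.sqrt(-math.log(1.0 - rng.random()) / num_paths)
        # Carrier phase shift, uniformly distributed over [0, 2*pi).
        phase = rng.uniform(0.0, 2.0 * math.pi)
        # DOA drawn uniformly within the angle spread (assumption of this sketch).
        doa = nominal_doa_deg + rng.uniform(-angle_spread_deg / 2.0, angle_spread_deg / 2.0)
        gain = amplitude * cmath.exp(1j * phase)
        a = steering_vector(doa, num_elements)
        v = [v_k + gain * a_k for v_k, a_k in zip(v, a)]
    return v


rng = random.Random(1)                                # fixed seed for repeatability
v_narrow = channel_vector(20.0, 0.0, 20, 8, rng)     # zero angle spread
v_wide = channel_vector(20.0, 60.0, 20, 8, rng)      # large angle spread
print([round(abs(v_k), 3) for v_k in v_narrow])
print([round(abs(v_k), 3) for v_k in v_wide])
```

With zero angle spread, every path shares the same mode vector, so all elements see the same fading magnitude; with a large spread, the magnitudes differ from element to element.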

A number of techniques have been proposed to determine the optimum weights: the minimum mean square error (MMSE) criterion, whose solution is obtained by solving the Wiener-Hopf equation; the maximum signal-to-noise ratio (MSN) criterion, which is based on a generalized eigenvalue problem; and the linearly constrained minimum variance (LCMV) filter, or directionally constrained minimization of power (DCMP), which requires the DOAs of the desired signals and interferers a priori. These techniques require some channel information, such as a training sequence in the case of MMSE and the directional information of the incident signals in the case of MSN and LCMV. There are also blind methods, such as the constant modulus algorithm (CMA), for which the directional information is not necessary. However, a spatial channel signature such as the DOAs of the incident signals is often required for efficient spatial domain processing. Various DOA estimation techniques have been studied as another field of adaptive antenna technologies [7,12]. Several excellent review papers report these techniques [5,7,13].
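For reference, the criteria named above have well-known closed-form solutions for the output convention $y(t) = \mathbf{w}^H \mathbf{x}(t)$ of Eq. (1.3). The summary below uses standard textbook notation rather than notation from this chapter: the reference signal $d(t)$, the correlation matrices $\mathbf{R}_{xx}$ and $\mathbf{R}_{nn}$, the cross-correlation vector $\mathbf{r}_{xd}$, and the desired direction $\theta_0$ are symbols introduced here for illustration, and the MSN and LCMV forms assume a single desired signal with a single unit-gain constraint.

```latex
\begin{align*}
\text{MMSE:} \quad & \mathbf{w} = \mathbf{R}_{xx}^{-1}\,\mathbf{r}_{xd},
  \qquad \mathbf{R}_{xx} = E\!\left[\mathbf{x}(t)\mathbf{x}^H(t)\right],
  \quad \mathbf{r}_{xd} = E\!\left[\mathbf{x}(t)\,d^*(t)\right] \\
\text{MSN:} \quad & \mathbf{w} \propto \mathbf{R}_{nn}^{-1}\,\mathbf{a}(\theta_0),
  \qquad \mathbf{R}_{nn} = \text{interference-plus-noise correlation matrix} \\
\text{LCMV/DCMP:} \quad & \mathbf{w} =
  \frac{\mathbf{R}_{xx}^{-1}\,\mathbf{a}(\theta_0)}
       {\mathbf{a}^H(\theta_0)\,\mathbf{R}_{xx}^{-1}\,\mathbf{a}(\theta_0)},
  \qquad \text{subject to } \mathbf{w}^H \mathbf{a}(\theta_0) = 1
\end{align*}
```

All three forms involve a matrix inverse, which is the operation the abstract identifies as poorly suited to fixed-point FPGA implementation.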

### 1.3.3 Benefits

Smart antennas can provide solutions for three major wireless problems (multipath fading, delay spread, and CCI), as described below.

#### Diversity

Smart antennas basically improve the SNR by combining the signals at each branch; additionally, they can provide diversity gain. Diversity techniques exploit the decorrelation of fading among branches. The basic methods of providing diversity gain are spatial, polarization, and angle diversity [3]. Among these, spatial diversity uses spatially separated antennas whose fading has a low correlation. Theoretically, an \(M\)-element smart antenna system has an \(M\)-fold diversity gain, given sufficiently low fading correlation among the antenna elements, as well as an $M$-fold antenna gain (beamforming gain). When it cancels $L$ interferers ($L < M$), an $(M - L)$-fold diversity gain can be achieved [8]. In an urban cellular base station, the signals are received through a scattering process in the vicinity of the mobile station because the antenna is usually deployed higher than the surrounding scatterers. Thus, if the mobile station is sufficiently far from the base station [23], the angle spread (AS) is typically only a few degrees. The measured AS in an urban cellular base station with antenna heights of 35 meters (rooftop of a building) and 80 meters (pylon) in the 2 GHz band was reported to be approximately $2 \sim 5$ degrees [25]. Figure 1.9 shows an example of the diversity and beamforming gains in an 8-element antenna array under a non-line-of-sight (NLOS) multipath Rayleigh fading channel with an AS of 10 degrees, where UBF, DCBF, and MRC denote the uniform beamformer, Dolph-Chebyshev beamformer, and maximum ratio combiner, respectively. The smaller the AS, the smaller the diversity gain achieved, as shown in Fig. 1.10, where there is no apparent diversity gain with an AS of 0 degrees.
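The link between a small AS and the loss of diversity gain can be made concrete by computing the fading correlation between two adjacent elements. The sketch below is a simplified model, not the channel model used for Figs. 1.9 and 1.10: it assumes equal-power paths spread uniformly over a total angular width `as_deg` around broadside, and the function name and parameters are hypothetical.

```python
import numpy as np

def spatial_correlation(as_deg, mean_deg=0.0, spacing=0.5, n_paths=2001):
    """Magnitude of the fading correlation between two adjacent array elements.

    Assumption (not from the text): equal-power paths uniformly distributed
    over a total angular width as_deg centered on mean_deg.
    spacing is the element spacing in wavelengths.
    """
    theta = np.deg2rad(mean_deg + np.linspace(-as_deg / 2, as_deg / 2, n_paths))
    phase = 2 * np.pi * spacing * np.sin(theta)   # inter-element phase per path
    return float(np.abs(np.mean(np.exp(1j * phase))))

for as_deg in (0, 5, 10, 60, 120):
    print(f"AS = {as_deg:3d} deg -> |rho| = {spatial_correlation(as_deg):.3f}")
```

Under this model, an AS of a few degrees leaves the elements almost fully correlated, so combining them yields beamforming gain but little diversity gain; the correlation drops only when the AS becomes large.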

#### Delay Spread Reduction

The spatial dimension can be utilized by using multiple antennas in receivers/transmitters. Array signal processing is capable of producing transmit/receive beams toward the desired mobile. It is simultaneously possible to place spatial nulls in the direction of undesired interferences; this is termed as null-steering. This feature can improve the performance of a mobile communication system. Smart antennas have a higher gain than a conventional omni-directional antenna. The higher gain can be used to either increase the effective coverage, or to increase the receiver sensitivity. Conversely, it can be used to reduce transmit power and electromagnetic radiation in the communication network. Multi-path propagation in mobile radio environments results in ISI. The adverse effects of multi-path and ISI can be reduced by using transmit and receive beams that are directed toward the desired mobile. Smart antenna transmitters emit less interference by only sending RF power that is concentrated in the desired directions.

#### Interference Suppression

Smart antennas can improve SINR (Signal to Interference and Noise Ratio) performance by adaptively controlling the directivity of the antenna radiation pattern. Interference suppression is enabled by exploiting the spatial dimension; this is not possible with processing in the time domain only. The desired signal and CCI arrive at the receiver with well separated spatial signatures thus allowing the smart antenna to extract the desired signal and suppress the CCIs completely. During transmission, the smart antenna can focus the energy to the desired user, thus reducing the interference for other users.
Fig. 1.9: Diversity gains with 8-element antenna array (multipath Rayleigh fading channel, angle spread = 10°).

Fig. 1.10: Diversity combining with 8-element antenna array (multipath Rayleigh fading channel with small angle spread).

## 1.4 Smart Antenna System using Direction-of-Arrival Estimation

Smart antennas basically attempt to enhance the desired signal power and suppress the interferers by beamforming toward the DOA of the desired signal and null-steering toward the DOAs of the interferers in a line-of-sight (LOS) situation. Theoretically, a maximum of \((M - 1)\) interferers can be canceled with an \(M\)-element array. If LOS exists between the transmitter and the receiver, most of the incident energy is concentrated in the look direction as a single plane wave. High-resolution techniques for estimating the DOAs of incident signals, including MUltiple SIgnal Classification (MUSIC) [31] and Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) [32], are useful for efficient beamforming. These types of beamforming techniques are based on the angle domain approach, which exploits the fading correlation among antenna elements. Hence, diversity gain cannot be achieved.
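For readers unfamiliar with MUSIC, the following floating-point sketch shows the standard (non-unitary) algorithm on a uniform linear array: eigendecomposition of the sample covariance, selection of the noise subspace, and a spectrum search over a DOA grid. It is only an illustration under assumed parameters (8 elements, two uncorrelated sources, 0.5-degree grid); it is not the fixed-point unitary MUSIC processor developed in this dissertation, and all names are hypothetical.

```python
import numpy as np

def steering(theta_deg, m, spacing=0.5):
    """Steering vector of an m-element uniform linear array."""
    k = np.arange(m)
    return np.exp(-2j * np.pi * spacing * k * np.sin(np.deg2rad(theta_deg)))

def music_spectrum(x, n_sources, grid_deg):
    """MUSIC pseudo-spectrum from snapshots x of shape (m, n)."""
    m, n = x.shape
    r = x @ x.conj().T / n                       # sample covariance matrix
    _, eigvec = np.linalg.eigh(r)                # eigenvalues in ascending order
    e_n = eigvec[:, : m - n_sources]             # noise-subspace eigenvectors
    a = np.stack([steering(t, m) for t in grid_deg], axis=1)
    denom = np.sum(np.abs(e_n.conj().T @ a) ** 2, axis=0)
    return 1.0 / denom

def pick_peaks(spectrum, grid_deg, n_sources):
    """Return the n_sources largest local maxima of the spectrum."""
    inner = spectrum[1:-1]
    is_peak = (inner > spectrum[:-2]) & (inner > spectrum[2:])
    idx = np.where(is_peak)[0] + 1
    best = idx[np.argsort(spectrum[idx])[-n_sources:]]
    return np.sort(grid_deg[best])

# Hypothetical test scenario: two uncorrelated sources at -20 and 25 degrees.
rng = np.random.default_rng(1)
m, n = 8, 500
true_doas = [-20.0, 25.0]
s = (rng.standard_normal((2, n)) + 1j * rng.standard_normal((2, n))) / np.sqrt(2)
a_true = np.stack([steering(t, m) for t in true_doas], axis=1)
noise = 0.1 * (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n)))
x = a_true @ s + noise

grid = np.arange(-90.0, 90.5, 0.5)
est = pick_peaks(music_spectrum(x, 2, grid), grid, 2)
print("estimated DOAs [deg]:", est)
```

The eigendecomposition and the grid search are the two costly steps, which is why DOA-based systems are described here as computationally demanding.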

However, in multipath propagation environments, NLOS situations may often arise. In such cases, there are usually a large number of multipath signals, more than the number of antennas. Beamforming in the angle domain with DOA information of the propagation channel is not valid here because the incident signal cannot be regarded as a single plane wavefront. Nevertheless, smart antenna systems can combine the multipath signals and cancel interferers, provided that the number of interferers is smaller than the number of antenna elements, independent of the number of paths. On the basis of the signal domain approach, additional diversity gain can be achieved by exploiting the fading decorrelation among antenna elements. When the delay spread is not sufficiently small, the correlation between the original signal and its delayed versions decreases; thus, the smart antenna system cancels the delayed versions as separate signals. In this case, the supposition that a maximum of \((M - 1)\) delayed signals can be canceled with an \(M\)-element array is also valid [8].

This paper considers a smart antenna system for cellular base station application. As mentioned above, the signals at the base station are received through the scattering process from the local scatterers around the mobile station because the antenna is usually deployed higher than the surrounding scatterers, as shown in Fig. 1.11. In this case, the AS is typically only a few degrees, so the fading across the antenna elements spaced by \(\lambda/2\) is highly correlated. Hence, the angle domain approach exploiting the DOA information can be useful. Moreover, the DOAs are the only channel information that is commonly useful in both the uplink and downlink channels of Frequency Division Duplex (FDD) systems, where each link uses a different frequency. This is because the average angular power spectra in both links are almost identical. Thus, in an uplink, we can estimate the channel parameters with the received data that have passed the channel and apply the estimated DOA information to downlink beamforming. In a downlink channel, smart antenna transmitters can focus the energy to the desired user using the DOA information, and thus reduce the interference for other users. This is the most significant feature of the angle domain approach with DOA estimation.

In fact, DOA estimation techniques basically assume that the incident waves are planar and incoherent; hence, they are often expressed as discrete wavefronts from point sources. In an urban cellular base station environment, a large number of coherent signals that have experienced multipath from far-field local scatterers arrive from spread DOAs. This reduces the fading correlation among the antenna elements. Hence, the SNR at every element becomes a random variable following some distribution, and the estimation performance in this case will be degraded [40]. However, various joint estimation techniques of DOAs and AS have been proposed [41–43]. Further, tracking techniques can provide a statistical average performance of DOA estimation in a fading environment with small AS [66]. The main features of the angle domain approach using DOA estimations are as follows:

- Application for urban cellular base station with small angle spread [8, 16].
- Estimated DOAs can be the only common useful information in FDD system [19–21, 66].
- No straight diversity gain; however, angular diversity can be available by combining multipaths [66].
- Large computational load is required for DOA estimation [16].

## 1.5 Scope and Contributions

This paper aims at the practical and general application of the signal processing implementation of a smart antenna system for wireless cellular base station systems. While theoretical studies are useful for analysis and simulation purposes, implementing them in real hardware and validating them with actual measurement data is a meaningful challenge. This paper proposes a smart antenna system based on the DOA estimation of incident signals, which includes fast DOA estimation and beamforming processors. A prototype system integrating signal processors on field programmable gate arrays (FPGAs) was manufactured and an evaluation testbed was configured. The prototype system employs an 8-element antenna array for the 5 GHz band and an RF transceiver for each branch. The functionality of the baseband smart antenna processor was verified through hardware simulations and laboratory experiments. The entire hardware system, including RF, antenna arrays, and air interface, in both the transmitter and receiver configurations was also tested in a radio anechoic chamber.

Smart antenna systems, especially those based on DOA estimation, usually require high computational performance for array signal processing. Apart from the hardware complexity of multiple transceivers and an impractically high cost, the limited performance of signal processing devices has been a major obstacle to their realization. With general von Neumann architecture processors, such as a general micro processing unit (MPU) or a digital signal processor (DSP) with a single or a few Multiplier-ACcumulators (MACs), it may be difficult to process a large amount of data from multiple antenna elements in real time. They also consume power inefficiently because they are not optimized for the specific purpose. In this paper, FPGAs are basically used as digital signal processors. Due to the recent progress of Very Large Scale Integration (VLSI) technology, FPGAs have evolved from flexible logic design platforms to signal processing engines. FPGAs are now expected to be essential components of a software radio, which includes smart antenna technology, due to the high degree of flexibility provided by their SRAM-based structure and their real-time processing capabilities. They are inherently suitable for high-speed parallel MAC functions. However, it is incorrect to assume that all DSP functions can always be implemented in FPGAs. Floating point operation is difficult to implement in FPGAs because it consumes a large amount of gate resources and the performance cannot be optimized. Further, processing involving matrix inversion (or scalar division) is also more suitable for a DSP or MPU platform. Therefore, FPGAs and DSPs are expected to coexist for a long time, and a flexible platform usually includes their combination [82].

This paper presents the fixed-point processors of a fast DOA estimator and beamformer implemented with FPGAs. The DOA estimator incorporates the unitary MUSIC algorithm and is termed the “Unitary MUSIC Processor (UMP).” The unitary MUSIC algorithm has a super resolution capability and is suitable for integration on logic circuit devices such as FPGAs. Techniques suitable for FPGAs, such as eigenvalue decomposition by a cyclic Jacobi processor based on the COordinate Rotation DIgital Computer (CORDIC) and spectrum computation by the spatial Discrete Fourier Transform (DFT), can be exploited to optimize the fixed-point arithmetic. This paper also proposes a novel beamforming technique with extremely low complexity that can use the DOA information estimated by the UMP. This beamformer computes the optimum weight vector with a simple convolution operation of precomputed beam and notch patterns; thus, it provides fast beamforming without the matrix inversion required in conventional techniques. The major original contributions of this dissertation can be listed at a glance as follows:

- Hardware implementation of a smart antenna transceiver testbed.
- Design of a fast eigenvalue decomposition processor based on the cyclic Jacobi method with the hardware-friendly CORDIC technique [101].
- Fixed-point processor design and simulation of the Unitary MUSIC (MUltiple SIgnal Classification) algorithm [74].
- Hardware implementation on FPGAs of a fast DOA estimation processor using the Unitary MUSIC algorithm [74].
- Proposal and evaluation of a novel beamforming technique, NDC-BF (Null steering Dolph-Chebyshev beamformer) [49].
- Evaluation of the integrated real-hardware system to verify the functionality.
- Examination of the sampling jitter effect on the digital downconversion receiver architecture [94].
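The reason CORDIC is called hardware-friendly is that a plane rotation, the basic step of a Jacobi eigenvalue solver, is reduced to shifts and additions. The following sketch is a generic rotation-mode CORDIC in integer arithmetic; the word length, the number of stages, and the names are assumptions made for illustration and do not describe the processor designed in Chapter 3.

```python
import math

FRAC_BITS = 16   # assumed fixed-point fraction width
N_ITER = 16      # assumed number of CORDIC stages

# Elementary rotation angles atan(2^-i) in fixed point, and the CORDIC gain.
ANGLES = [round(math.atan(2.0 ** -i) * (1 << FRAC_BITS)) for i in range(N_ITER)]
GAIN = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(N_ITER))

def cordic_rotate(x, y, angle):
    """Rotate the integer vector (x, y) by a fixed-point angle.

    angle is in radians scaled by 2**FRAC_BITS and must stay within roughly
    +/-1.74 rad. Only shifts and adds are used; the result is scaled by GAIN.
    """
    z = angle
    for i in range(N_ITER):
        if z >= 0:   # rotate counterclockwise by atan(2^-i)
            x, y, z = x - (y >> i), y + (x >> i), z - ANGLES[i]
        else:        # rotate clockwise by atan(2^-i)
            x, y, z = x + (y >> i), y - (x >> i), z + ANGLES[i]
    return x, y

one = 1 << FRAC_BITS
theta = math.pi / 6
xr, yr = cordic_rotate(one, 0, round(theta * one))
cos_est = xr / (GAIN * one)   # gain compensated in floating point for display
sin_est = yr / (GAIN * one)
print(f"cos ~ {cos_est:.4f}, sin ~ {sin_est:.4f}")
```

In hardware, the constant gain is typically compensated once by a fixed multiplier or absorbed into later scaling, so each stage costs only two shifters and three adders.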

## 1.6 Outline

The remainder of this dissertation is organized as shown in Fig. 1.12. Chapter 2 introduces the developed hardware platforms of the smart antenna transceiver system and includes its implementation details. The hardware requirements in smart antenna technologies are also discussed. The benefits of FPGAs as array signal processors are described in comparison with general-purpose DSP processors. The prototype transceivers were designed taking into consideration reconfigurable IF processing based on the software-defined radio concept as well as adaptive control of the antenna beampattern [78]. To implement this, the prototype transceivers adopt a low-IF super heterodyne architecture with undersampling at the receivers and a harmonic band generation scheme at the transmitters. In this chapter, the sampling jitter effect, which determines the maximum IF input frequency, is quantified based on measurements. The implementation of the smart antenna signal processing is described in Chapters 3 and 4. Chapter 3 presents the fast DOA estimation processor using the unitary MUSIC algorithm. First, the design of a fast eigenvalue decomposition processor based on a cyclic Jacobi method with the CORDIC technique is described. Second, the FPGA implementation of a fast DOA estimation processor, termed UMP, on the developed transceiver platform is described. The performance is also quantified by fixed-point hardware level simulation. Chapter 4 proposes a novel beamforming technique with extremely low complexity and presents its performance and hardware implementation. Chapter 5 describes the experimental evaluation of the integrated system performance. The effects caused by real hardware imperfections, such as mutual coupling among antenna branches, are discussed, and array calibration issues are dealt with in this chapter. Finally, Chapter 6 summarizes the dissertation, describing the contributions and remaining problems, and outlines future work.
Fig. 1.12: Structure of dissertation.

# Chapter 2 Smart Antenna Testbed System

## 2.1 Introduction

This chapter introduces the developed hardware platforms of the smart antenna transceiver system and its implementation details. First, the hardware requirements to realize smart antenna technologies are discussed from the viewpoint of the software defined radio (SDR) concept [80]. The benefits of FPGAs as array signal processors are described in comparison with general-purpose DSP processors. This system was designed taking into consideration reconfigurable IF processing in accordance with the SDR concept as well as adaptive control of the antenna beampattern [78]. To achieve this, a low-IF super heterodyne architecture with undersampling at the receivers [94] and a harmonic band generation scheme at the transmitters [95] is a promising scheme that is currently available. In this chapter, the sampling jitter effect in receivers, which usually determines the maximum IF input frequency of ADCs, is discussed and quantified by measurement. This chapter also introduces the testbed systems, which comprise RF components and baseband signal processors.

## 2.2 Hardware Requirements

### 2.2.1 Digital Transceiver Architecture

Traditional radio transceiver systems perform frequency up/downconversion, mixing, bandpass or lowpass filtering, gain control, and quadrature modulation/demodulation with analog devices, after the DACs or before the ADCs. This section describes the smart antenna architecture focusing on the receiver; the transmitter is usually completely reciprocal to the receiver. Figure 2.1 shows the traditional super-heterodyne receiver architecture. There are several other receiver architectures as well. On the basis of the number of downconversion stages, these architectures are generally classified into direct conversion, with only a single downconversion stage from RF into baseband, and super-heterodyne, with more than one downconversion stage from RF into baseband via IF.

However, in a digital radio system, the placement of the ADCs is an extremely critical factor in the entire system architecture. In order to make the systems commercially available and practically realizable in newer technologies, a low-IF oversampling architecture is an extremely attractive option. An example of a low-IF oversampling architecture is illustrated in Fig. 2.2. Such a system directly digitizes the downconverted IF signals and subsequently generates complex baseband signals by the digital downconversion (DDC) function. The DDC consists of a Numerically Controlled Oscillator (NCO), a pair of multipliers (mixers in the analog sense), lowpass filters, and decimators for the I and Q components. In this architecture, the analog mixers and lowpass filters are replaced by digital signal processing. The linearity of the digital mixers and filters ensures perfect orthogonality of the I/Q signals. In general, when a signal is sampled by an ADC at a higher frequency band, fewer analog components are needed for downconversion and filtering, and the system flexibility increases [73].
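The DDC chain described above (NCO, I/Q multipliers, lowpass filter, decimator) can be sketched in a few lines. The example below is a floating-point illustration with assumed parameters: a 40 MHz sampling rate, a hypothetical 10 MHz digital IF, a decimation factor of 4, and a 33-tap windowed-sinc lowpass filter. None of these values, nor the function name, describe the actual testbed configuration.

```python
import numpy as np

def ddc(samples, f_if, f_s, decim, n_taps=33):
    """Digital downconversion of real IF samples to complex baseband.

    NCO + complex mixer -> windowed-sinc lowpass FIR -> decimation by decim.
    """
    n = np.arange(len(samples))
    nco = np.exp(-2j * np.pi * f_if / f_s * n)    # cos for I, -sin for Q
    mixed = samples * nco                         # I + jQ before filtering
    k = np.arange(n_taps) - (n_taps - 1) / 2
    h = np.sinc(k / decim) * np.hamming(n_taps)   # cutoff near f_s / (2 * decim)
    h /= h.sum()                                  # unity gain at DC
    filtered = np.convolve(mixed, h, mode="same")
    return filtered[::decim]

# Hypothetical input: a unit-amplitude tone 100 kHz above the digital IF.
f_s, f_if, decim = 40e6, 10e6, 4
offset = 100e3
t = np.arange(4000)
x_if = np.cos(2 * np.pi * (f_if + offset) / f_s * t + 0.3)

bb = ddc(x_if, f_if, f_s, decim)
core = bb[10:-10]                                 # drop filter edge transients
amp = float(np.abs(core).mean())                  # expect about 0.5
dphi = np.angle(np.mean(core[1:] * np.conj(core[:-1])))
est_offset = dphi * (f_s / decim) / (2 * np.pi)   # expect about 100 kHz
print(f"baseband amplitude = {amp:.3f}, frequency offset = {est_offset / 1e3:.1f} kHz")
```

Because I and Q come from the same numeric oscillator and identical filters, there is no gain or phase mismatch between the two branches, which is the orthogonality advantage noted in the text.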

However, current technology offers no easy solution for oversampling signals directly at the RF or IF band, ranging from tens of MHz to several GHz. For example, the sample-and-hold performance of ADCs typically degrades as the input frequency increases, thus limiting the input frequency bandwidth of ADCs [89]. In addition, more stringent requirements have to be met by extremely high-Q analog bandpass filters in order to prevent distortion of the desired signal by strong adjacent channel signals. Currently, there are very few devices capable of directly sampling the RF signal; therefore, from the practical and commercial points of view, undersampling is an alternative technique [85].

### 2.2.2 Data Conversion Devices

The performance of an ADC can be characterized by several parameters: resolution, conversion speed, linearity, dynamic range, and power consumption. Usually, the requirements of an ADC depend on its specific application. In SDR technologies including smart antennas, most analog components will be replaced by digital signal processing in order to maximize system flexibility [78, 80]. The digital signal processor can perform various operations including downconversion, demodulation, and filtering. The analog-to-digital converters (ADCs) in such systems will be located as close to the antennas as possible in order to achieve almost complete digital processing. In order to realize this, ADCs capable of digitizing a high-frequency wideband signal at very high sampling rates will be required along with wideband or multi-band antennas and RF analog devices. However, direct analog-to-digital conversion at oversampling rates of very high RF or IF signals, typically ranging from hundreds of MHz to several GHz, may not yet be practical because reasonably priced ADCs and sufficiently high-speed digital devices, such as signal processors and buffer memories, are not available. The undersampling technique is useful in this situation because it performs frequency downconversion and quantization at the same time.
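The way undersampling performs downconversion can be illustrated with a small calculation of where a band-limited IF tone appears after sampling. The helper names and the example frequencies below are hypothetical and chosen only for illustration; they are not the sampling plan of the testbed.

```python
def aliased_frequency(f_in, f_s):
    """Apparent frequency (0 .. f_s/2) of a tone at f_in after sampling at f_s."""
    f = f_in % f_s
    return f_s - f if f > f_s / 2 else f

def nyquist_zone(f_in, f_s):
    """Nyquist zone of f_in (1 = baseband). Even zones are spectrally inverted."""
    return int(f_in // (f_s / 2)) + 1

# Assumed example: a 40 MHz IF sampled at 32 MHz lands at 8 MHz (zone 3).
print(aliased_frequency(40e6, 32e6) / 1e6, "MHz, zone", nyquist_zone(40e6, 32e6))
# Assumed example: a 70 MHz IF sampled at 40 MHz lands at 10 MHz (zone 4, inverted).
print(aliased_frequency(70e6, 40e6) / 1e6, "MHz, zone", nyquist_zone(70e6, 40e6))
```

The sampling rate only has to exceed twice the signal bandwidth, not twice the IF, but the analog bandpass filter in front of the ADC must then confine the signal to a single Nyquist zone.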

### 2.2.3 Signal Processing Devices

Implementing software radios with smart antenna systems leads to a considerable increase in processing requirements. Previously, a single stream of data emanated from a single antenna; now, $K$ data streams from the $K$ antenna elements have to be processed. The fundamental operation carried out in a smart antenna system is passing the data stream from each antenna through an adaptive finite impulse response (FIR) filter. It must be noted that in narrowband applications, each adaptive FIR filter reduces to a single complex weight, so that the set of filters simplifies to a single weight vector.
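In the narrowband case, the per-sample operation is therefore just the inner product $y = \mathbf{w}^H \mathbf{x}$ over the $K$ streams. The sketch below shows this with simple delay-and-sum weights; the array geometry, the look direction, and the function names are assumptions for illustration, not the beamformer proposed in Chapter 4.

```python
import numpy as np

def steering_vector(theta_deg, k, spacing=0.5):
    """Response of a k-element uniform linear array (spacing in wavelengths)."""
    n = np.arange(k)
    return np.exp(-2j * np.pi * spacing * n * np.sin(np.deg2rad(theta_deg)))

def beamform(x, w):
    """Narrowband beamformer output y[t] = w^H x[t] for snapshots x of shape (K, T)."""
    return w.conj() @ x

K = 8
w = steering_vector(20.0, K) / K          # assumed look direction of 20 degrees
t = np.arange(200)
s = np.exp(2j * np.pi * 0.01 * t)         # unit-amplitude test signal

x_desired = np.outer(steering_vector(20.0, K), s)   # arrives from the look direction
x_other = np.outer(steering_vector(-40.0, K), s)    # arrives from -40 degrees

gain_desired = float(np.abs(beamform(x_desired, w)).mean())
gain_other = float(np.abs(beamform(x_other, w)).mean())
print(f"gain at 20 deg = {gain_desired:.3f}, gain at -40 deg = {gain_other:.3f}")
```

Each output sample costs $K$ complex multiplications, i.e., $4K$ real multiplications, which is the per-symbol cost of beamforming for a $K$-element array.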

If we consider a simple example where we have $K$ ($= 8$ herein) antenna elements and a narrowband system, the computational requirements are as follows:

- Weight computation: In the case of Minimum Mean Square Error (MMSE) smart antenna processing by the recursive least squares (RLS) method, $4K^2 + 4K + 2$ complex multiplications, or $16K^2 + 16K + 8$ real multiplications, are required per update [9, 50]. If 16 iterations are required for the weight convergence per update and the processor serves 8 users every 0.5 ms (2,000 Hz), the total number of real multiply operations required in processing RLS can be obtained by

$$N_{RLS} = r_u (16K^2 + 16K + 8) \quad [\text{ops/sec}]$$
$$= (8 \times 2000 \times 16) \times (16 \times 8^2 + 16 \times 8 + 8)$$
$$= 297 \text{ MOPS}$$

where $r_u$ denotes the update rate (per second) and MOPS stands for mega operations per second. If a real multiplication is approximately equivalent to a floating point operation in general-purpose processors, 297 MFLOPS (Mega FLoating point OPerations per Second) would be required.

- Beamforming: Beamforming is performed by the multiplication of the data vectors (multiple data symbols) and the weight vector. For the above condition, the required number of real multiplications is given by

\[ N_{\text{BF}} = 4 \times r_b \times L_{\text{frame}} \times K \quad \text{[ops/sec]} \]
\[ = 4 \times (8 \times 2000) \times 136 \times 8 \]
\[ = 70 \text{ MOPS} \]

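where \( r_b \) denotes the beamforming rate, which was assumed to be equal to the weight update rate of once per 0.5 ms for each of 8 users, and \( L_{\text{frame}} \) denotes the frame length, taken as 136 symbols.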

- FIR filtering: If the receiver employs IF processing such as digital downconversion, which mainly involves FIR filtering, the number of real multiplications required for I and Q components is given by

\[ N_{\text{FIR}} = r_s \times n_{\text{tap}} \times (2 \times K) \quad \text{[ops/sec]} \]
\[ = 40 \times 10^6 \times 8 \times (2 \times 8) \]
\[ = 5,120 \text{ MOPS} \]

where \( r_s \) denotes the sampling rate, assumed to be 40 MHz, and \( n_{\text{tap}} \) denotes the number of FIR filter taps, assumed to be 8.
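The three estimates above can be reproduced with a short script, which also makes it easy to see how the load scales with the number of antennas or users. The function and parameter names are hypothetical; the formulas and numerical values are those given in the list.

```python
def rls_real_mults_per_sec(k, users, updates_per_sec, iterations):
    """N_RLS = r_u * (16K^2 + 16K + 8), with r_u = users * updates * iterations."""
    r_u = users * updates_per_sec * iterations
    return r_u * (16 * k**2 + 16 * k + 8)

def beamforming_real_mults_per_sec(k, users, updates_per_sec, frame_len):
    """N_BF = 4 * r_b * L_frame * K, with r_b = users * updates."""
    r_b = users * updates_per_sec
    return 4 * r_b * frame_len * k

def fir_real_mults_per_sec(k, sample_rate, n_taps):
    """N_FIR = r_s * n_tap * (2K): I and Q filters on each of K branches."""
    return sample_rate * n_taps * 2 * k

n_rls = rls_real_mults_per_sec(k=8, users=8, updates_per_sec=2000, iterations=16)
n_bf = beamforming_real_mults_per_sec(k=8, users=8, updates_per_sec=2000, frame_len=136)
n_fir = fir_real_mults_per_sec(k=8, sample_rate=40_000_000, n_taps=8)

for name, ops in (("RLS", n_rls), ("beamforming", n_bf), ("FIR", n_fir)):
    print(f"{name:12s}: {ops / 1e6:8.1f} MOPS")
```

Doubling the array size to 16 elements roughly quadruples the RLS term because of its $K^2$ dependence, while the beamforming and FIR terms only double.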

The weight computation and beamforming loads given above appear manageable for a general-purpose DSP processor, which today offers approximately 1 GFLOPS, although the FIR filtering load alone already exceeds this figure. Moreover, if the system needs to support more users and to employ more antennas within a shorter computation time, the processing capability of a general-purpose processor will reach its limit. Such high-performance requirements can be handled by application-specific processors, such as Application Specific Integrated Circuits (ASICs) and FPGAs. Furthermore, we can take advantage of the FPGA flexibility for directly handling data acquisition control and other DSP functions such as digital downconversion, demodulation, and smart antenna processing.

**FPGAs (Field Programmable Gate Arrays)**

Since VLSI technologies have made significant progress in recent times, processing speed is increasing and the scale of integration is becoming larger. SRAM-based FPGA technology has led to an alternative solution for digital signal processing. As mentioned above, we can easily obtain an extremely high-performance DSP processor with multiple MACs using FPGAs. For example, a performance of 1 GOPS (Giga Operations Per Second) can be achieved when 10 MACs operating at 100 MHz are integrated\(^1\). Power consumption is a significant factor when considering applications for a mobile terminal, and FPGAs also allow power consumption to be optimized.

---

\(^1\)In a general-purpose DSP, 1 GOPS is equivalent to 1 GFLOPS.

A programmable logic device (PLD) is loosely defined as a device with configurable logic and flip-flops (FFs) linked together with a programmable interconnect. An FPGA is a type of programmable logic device that combines an array of gates having SRAM-based programmable interconnections with logic blocks and memory blocks, as shown in Fig. 2.3. In addition, recent FPGAs often provide dedicated MAC blocks operating at frequencies greater than 200 MHz to meet increasing DSP demands [103]. FPGAs can be reconfigured an unlimited number of times after manufacture, and they are generally distinguished from other PLDs by their higher logic capacity. FPGAs consist of logic blocks and an interconnection that connects them. The logic block usually contains lookup tables (LUTs) and FFs that store data, as shown in Fig. 2.4. Input ports are connected to LUT input ports or FF input ports, and outputs from LUTs are connected either to output ports of the logic block or to FF input ports. Various combinations of inputs can be chosen by multiplexing, and both sequential logic, with memory elements consisting of FFs, and combinational logic, with LUTs, are available [81].

**Digital Signal Processing on FPGAs**

There are several methods of implementing digital signal processing. One is to use a general-purpose processor (DSP or MPU), and another is to use an application-specific device, such as an ASIC. General-purpose processor solutions are very flexible, but because their architectures are optimized to process a fixed set of instructions, they may not be ideally suited to a specific application; ASIC solutions enable the design of a custom architecture that is optimized for a particular application. For example, a general-purpose processor has one or a few MAC stages, so the computations are executed sequentially, i.e., serially. In contrast, an ASIC implementation can have multiple parallel MAC stages. Comparing the two for a specific application, the general-purpose processor offers relatively low speed but maximum flexibility (programmability), while the ASIC provides high speed with minimal flexibility.

On the other hand, an FPGA combines the versatility of a programmable solution with the performance of a dedicated processor, as shown in Fig. 2.5. An FPGA can achieve true parallel processing, executing algorithms with inherent parallelism through a distributed arithmetic structure while avoiding the instruction fetch and data load/store bottlenecks of traditional von Neumann architectures [62, 63, 81]. Figure 2.6 compares the MAC operations for an FIR filter in a general-purpose processor with a single MAC stage and in an FPGA with multiple MAC stages. For example, a 256-tap FIR filter requires 256 iterations of MAC operations in a general-purpose processor, while in an FPGA, 256 MACs can operate simultaneously in a single clock cycle. The advantages of implementing signal processing functions in FPGA devices are listed in Tab. 2.1 [81]. FPGAs are expected to be a key device in the implementation of software defined radios including smart antennas due to their high performance, flexibility, and reconfigurability.
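The 256-tap comparison can be expressed as a simple cycle-count model. This is a behavioral sketch only: it counts one clock cycle per MAC iteration in the serial case and one cycle per output in the fully parallel case, ignores pipeline latency of the adder tree, and uses an assumed 100 MHz clock; the function names are hypothetical.

```python
def fir_serial_mac(taps, window):
    """One output sample computed with a single MAC: one cycle per tap."""
    acc, cycles = 0, 0
    for h, x in zip(taps, window):
        acc += h * x          # one multiply-accumulate per clock cycle
        cycles += 1
    return acc, cycles

def fir_parallel_mac(taps, window):
    """One output sample computed with one multiplier per tap.

    All products are formed in the same clock cycle; the adder tree is assumed
    to be pipelined, so the throughput is one output per cycle.
    """
    products = [h * x for h, x in zip(taps, window)]
    return sum(products), 1

taps = list(range(1, 257))    # hypothetical 256-tap coefficient set
window = [1] * 256            # most recent 256 input samples

y_serial, cycles_serial = fir_serial_mac(taps, window)
y_parallel, cycles_parallel = fir_parallel_mac(taps, window)

clock_hz = 100e6              # assumed clock rate
print(f"serial  : {cycles_serial} cycles/output, {clock_hz / cycles_serial:.0f} outputs/s")
print(f"parallel: {cycles_parallel} cycle/output,  {clock_hz / cycles_parallel:.0f} outputs/s")
```

Both structures compute the same result; the parallel one trades 256 times the multiplier resources for 256 times the throughput at the same clock rate.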

FPGAs have evolved from being flexible logic design platforms to signal processing engines. Due to their flexibility and real-time processing capabilities, they are now an essential component of software defined radios, including smart antenna technology. System designers are increasingly porting signal processing functionality to FPGAs. The ability to integrate logic design with signal processing is pushing designers to replace traditional DSPs with FPGAs. FPGAs are inherently suited for high-speed parallel multiply and accumulate functions. For example, current generation FPGAs can perform $36 \times 36$ MAC operations at speeds greater than 200 MHz. This makes FPGAs an ideal platform for operations such as FFT, FIR filters, digital downconverters (DDC), digital upconverters (DUC), correlators, and pulse compression (for radar processing). However, this does not imply that all digital signal processing functionalities can be implemented in FPGAs. It is difficult to implement floating point operations in FPGAs due to the large amount of area needed in the device. Further, processing involving matrix inversion (or division) is also more suited to a DSP/MPU platform. FPGAs and DSPs can thus be expected to coexist for a long time, and a flexible platform will include their combination [82].

Fig. 2.4: Simple diagram of FPGA logic element (LE) (by Altera Corp.).

Fig. 2.6: Comparison of MAC operations for FIR filter: (a) single MAC in general DSP processor (256 iterations are required for a 256-tap FIR filter); (b) multiple MACs in FPGA.

<table>
<thead>
<tr>
<th></th>
<th>FPGA</th>
<th>DSP</th>
</tr>
</thead>
<tbody>
<tr>
<td>Programmable Languages</td>
<td>VHDL and Verilog</td>
<td>C language and Assembly</td>
</tr>
<tr>
<td>Ease of S/W programming</td>
<td>Fairly easy but needs understanding the hardware architecture</td>
<td>Easy</td>
</tr>
<tr>
<td>Performance</td>
<td>Very fast if optimized</td>
<td>Speed depends on operating clock speed</td>
</tr>
<tr>
<td>Reconfigurability/Programmability</td>
<td>SRAM-type FPGAs can be reconfigured infinite times</td>
<td>Re-programmable by changing program</td>
</tr>
<tr>
<td>Outperforming Area</td>
<td>Digital Filters, FFT, etc</td>
<td>Sequential processing</td>
</tr>
<tr>
<td>Power Consumption</td>
<td>Can be minimized if circuit is optimized</td>
<td>Cannot optimize</td>
</tr>
<tr>
<td>Implementation</td>
<td>Parallel and distributive arithmetic</td>
<td>Repeat operation of one or a few MACs</td>
</tr>
<tr>
<td>Speed of MAC</td>
<td>Can be fast if a parallel algorithm</td>
<td>Depends on operation clock speed</td>
</tr>
<tr>
<td>Parallelism</td>
<td>Can be parallelized for high performance</td>
<td>Usually sequential and cannot be parallelized</td>
</tr>
</tbody>
</table>

Table 2.1: Comparison of FPGA and DSP processor (simplified from [81]).

Table 2.2: Specifications of RF parts.

<table>
<thead>
<tr>
<th>Item</th>
<th>Specification</th>
</tr>
</thead>
<tbody>
<tr>
<td>Antennas</td>
<td>8-element sleeve antennas</td>
</tr>
<tr>
<td>Array geometry</td>
<td>Linear array equi-spaced by λ/2</td>
</tr>
<tr>
<td>RF frequency</td>
<td>2 ~ 5 GHz tunable</td>
</tr>
<tr>
<td>Gain</td>
<td>30 ~ 35 dB</td>
</tr>
<tr>
<td>Bandwidth</td>
<td>20 MHz</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Unit</th>
<th>Item</th>
<th>Specification</th>
</tr>
</thead>
<tbody>
<tr>
<td>Transmitter (Tx)</td>
<td>Channels</td>
<td>4</td>
</tr>
<tr>
<td></td>
<td>IF freq.</td>
<td>1st : 10.7 MHz, 2nd : 160 MHz</td>
</tr>
<tr>
<td>Receiver (Rx)</td>
<td>Channels</td>
<td>up to 16</td>
</tr>
<tr>
<td></td>
<td>IF freq.</td>
<td>1st : 160 MHz, 2nd : 40 MHz</td>
</tr>
<tr>
<td></td>
<td>Output volt.</td>
<td>4 Vpp</td>
</tr>
<tr>
<td>Control</td>
<td>Gain</td>
<td>0 ~ −20 dB</td>
</tr>
<tr>
<td></td>
<td>Phase</td>
<td>±90°</td>
</tr>
</tbody>
</table>

## 2.3 Developed Smart Antenna Testbed System Overview

While theoretical studies are useful for analysis and simulation purposes, implementing them on real hardware and validating them with actual measurement data is a meaningful challenge. Certain architectures available for smart antennas were discussed in the previous section from the viewpoint of SDR. We developed a smart antenna testbed system based on the IF sampling super heterodyne scheme. In this section, the details of the smart antenna testbed system are described.

### 2.3.1 Radio Frequency (RF) Components

Eight sleeve antennas, linearly spaced by a half wavelength, were used. The RF transceiver configuration is shown in Fig. 2.7. It employs Yttrium-Iron-Garnet (YIG) bandpass filters; hence, the RF frequency is tunable from 2 to 5 GHz. In order to achieve an easy phase adjustment at the IF stage, bandpass filters are replaced with a combination of lowpass and highpass filters. The phase of the IF output signal can be adjusted by ±90° and its amplitude in the range 0 ~ −20 dB. The IF output frequency at the receiver is 40 MHz and the IF input frequency at the transmitter is 10.7 MHz. The detailed specifications are listed in Tab. 2.2.

(a) Receiver Diagram (only 4 CHs depicted.)

(b) Transmitter Diagram (only 4 CHs depicted.)

(c) Photograph

Fig. 2.7: Radio Frequency (RF) transceiver.
2.3.2 Digital Signal Processing Unit

The digital signal processing unit was developed to integrate and evaluate smart antenna signal processors. It consists of a signal processing board and a controller board. The signal processing board incorporates 16 channels of ADCs and DACs, and high-density FPGAs (approximately 1 mega equivalent gates) for signal processing\(^2\). A general-purpose CPU (SH4, 200 MHz, HITACHI) is used as a system control unit and as a numerical computation coprocessor with 32-bit floating-point arithmetic. The data bus between the CPU and the FPGA is 32 bits wide. The NetBSD operating system (OS) is embedded in this system [67], which also offers a data communication interface via Ethernet. The block diagram of the digital signal processing unit is shown in Fig. 2.8 and photographs are shown in Fig. 2.9. The data are captured and stored in embedded memory blocks in the FPGA, where approximately 4 kilowords can be collected for every complex baseband channel with I and Q. An external memory subboard with an SDRAM DIMM module is also available for a large number of continuous samples, up to approximately 16 megawords per complex baseband channel of I and Q when a 512 MB SDRAM module is used. The ADCs and DACs have 14-bit resolution and conversion rates of up to 80 MHz and 165 MHz, respectively.

Our research group has developed several prototype processing units, including the one described above, for a smart antenna testbed system (see also Appendix C):

- 2-channel receiver processor [68].
- 16-channel receiver and transmitter processors with distributed processing architecture of 4 (data acquisition and IF processing) + 1 (adaptive array processing) FPGAs [69-72].
- 4-channel receiver processor with high performance FPGA (Altera STRATIX) featured by embedded DSP blocks [74].
- 16-channel transceiver processor on single board (current system) [75].

The received RF signals at each antenna element are downconverted into IF. This system performs frequency downconversion again with quasi-coherent detection by digital signal processing. Digital downconversion is a distinguishing feature of the SDR receiver that places ADCs as close to the antenna as possible, digitizes broad band signals and subsequently, performs all functions completely with the software. By replacing the analog IF downconversion stages with digital signal processing, downsizing of system scale, reduction of power consumption, etc. can be achieved. FPGAs are able to perform smart antenna processing with downconverted complex baseband I/Q signals after digital downconversion with quasi-coherent detection.

\(^2\)In this work, only eight ADC channels were used, with the IF sampling scheme. Alternatively, 16 ADC channels can cover an 8-element array with the baseband sampling scheme, using one channel each for I and Q.
Fig. 2.8: Block diagram of digital processing unit.
(a) Signal processing board including ADCs, DACs and FPGAs.

(b) Controller board (CPU, LAN).

Fig. 2.9: Photographs of digital signal processing parts.
### Table 2.3: Specifications of signal processing board.

<table>
<thead>
<tr>
<th>ADC (Rx)</th>
<th>Part</th>
<th>AD9245 (Analog Devices)</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>Resolution</td>
<td>14 bits</td>
</tr>
<tr>
<td></td>
<td>Max. sampling rates</td>
<td>80 MHz</td>
</tr>
<tr>
<td></td>
<td>Analog Input BW</td>
<td>500 MHz</td>
</tr>
<tr>
<td></td>
<td>Aperture jitter</td>
<td>0.3 ps (RMS)</td>
</tr>
<tr>
<td></td>
<td>Channels</td>
<td>up to 16</td>
</tr>
<tr>
<td>DAC (Tx)</td>
<td>Part</td>
<td>DAC904 (Burr-Brown)</td>
</tr>
<tr>
<td></td>
<td>Resolution</td>
<td>14 bits</td>
</tr>
<tr>
<td></td>
<td>Max. data rates</td>
<td>up to 165 MHz</td>
</tr>
<tr>
<td></td>
<td>Channels</td>
<td>up to 16</td>
</tr>
<tr>
<td>FPGA</td>
<td>Part</td>
<td>Altera STRATIX EP1S40 × 2</td>
</tr>
<tr>
<td></td>
<td>LEs</td>
<td>41,250 (1 Mega Gates)</td>
</tr>
<tr>
<td></td>
<td>Embedded Memory</td>
<td>3,423,744 bits</td>
</tr>
<tr>
<td></td>
<td>Embedded DSP multipliers</td>
<td>112 (9 × 9)</td>
</tr>
<tr>
<td></td>
<td>PLLs</td>
<td>12</td>
</tr>
</tbody>
</table>

### Table 2.4: Specifications of CPU board.

<table>
<thead>
<tr>
<th>CPU</th>
<th>Part</th>
<th>SH4 (HITACHI)</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>Operation freq</td>
<td>200 MHz</td>
</tr>
<tr>
<td></td>
<td>Performance</td>
<td>360 MIPS</td>
</tr>
<tr>
<td>OS</td>
<td>NetBSD 1.5</td>
<td></td>
</tr>
<tr>
<td>User I/F</td>
<td>Ethernet 100base-T</td>
<td></td>
</tr>
</tbody>
</table>
2.3.3 Digital Receiver Implementation

Digital downconversion with quasi-coherent detection can be easily implemented using an NCO, a mixer, and lowpass filters, as shown in Fig. 2.10. It consists of an NCO acting as a digital local oscillator and a pair of Finite Impulse Response (FIR) lowpass filters. When the received analog signals are digitized at the IF band, only one ADC is required for each antenna element, which halves the system scale compared with baseband sampling systems. Bandpass signals can be expressed as a sum of two quadrature components that are $\pi/2$ out of phase. In general, a bandpass signal is represented by

$$x(n) = x_I(n)\cos\omega_c n + x_Q(n)\sin\omega_c n,$$

(2.4)

where $x_I(n)$ is the in phase component, $x_Q(n)$ is the quadrature component of the signal $x(n)$, and $\omega_c$ is the center frequency of the bandpass signal (carrier frequency). The downconversion process shifts the carrier frequency $\omega_c$ to the baseband. It performs a multiplication of the incoming bandpass signal $x(n)$ with the complex phasor $e^{-j\omega_c n} = [\cos\omega_c n - j\sin\omega_c n]$. Subsequently, it lowpass filters the signal of

$$\hat{x}(n) = x(n)\cos\omega_c n - jx(n)\sin\omega_c n$$

(2.5)

The desired frequency shift is accomplished with this operation. After lowpass filtering, the second harmonic components are filtered out and the result remains as the desired complex baseband signal representation of $x(n)$ as

$$\text{LPF} (\hat{x}(n)) = \frac{1}{2} [x_I(n) - jx_Q(n)]$$

(2.6)

[1]. If the sampling frequency is four times the IF center frequency, a combination of the NCO and mixer can be easily implemented. In this system, the combination of NCO and mixer was implemented with a switching circuit in sequential manner of \( \{0, x(t), 0, -x(t)\} \). The FIR lowpass filters of 8 taps were used for simplicity.
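The multiplication-free mixing described above can be illustrated with a minimal sketch. It assumes a sampling rate of exactly four times the carrier, so that the cosine and sine local signals reduce to the sequences {1, 0, −1, 0} and {0, 1, 0, −1}. The function names and the equal-weight 8-tap lowpass filter are hypothetical choices for illustration and are not the coefficients used in the testbed.

```python
import math

COS_SEQ = (1, 0, -1, 0)   # cos(pi*n/2) for n = 0, 1, 2, 3
SIN_SEQ = (0, 1, 0, -1)   # sin(pi*n/2) for n = 0, 1, 2, 3

def fs4_mix(x):
    """Multiply real IF samples by e^{-j*pi*n/2} using only 0, +x and -x."""
    i_mix = [COS_SEQ[n % 4] * v for n, v in enumerate(x)]
    q_mix = [-SIN_SEQ[n % 4] * v for n, v in enumerate(x)]
    return i_mix, q_mix

def moving_average(x, taps=8):
    """Equal-weight FIR lowpass (hypothetical coefficients), zero initial state."""
    return [sum(x[n - k] for k in range(taps) if n - k >= 0) / taps
            for n in range(len(x))]

def downconvert(x, taps=8):
    """Mix to baseband and lowpass filter both branches."""
    i_mix, q_mix = fs4_mix(x)
    return moving_average(i_mix, taps), moving_average(q_mix, taps)

# Example: constant baseband components x_I = 0.6, x_Q = -0.3 on a carrier at fs/4.
x_i, x_q = 0.6, -0.3
x = [x_i * math.cos(math.pi * n / 2) + x_q * math.sin(math.pi * n / 2)
     for n in range(32)]
i_bb, q_bb = downconvert(x)
print(i_bb[-1], q_bb[-1])   # settles near x_I/2 and -x_Q/2, as in Eq. (2.6)
```

Once the filter has filled, the two outputs approach x_I/2 and −x_Q/2, which is the form of Eq. (2.6).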

**NCO and Mixer**

The downconversion process requires a numerically controlled oscillator (NCO) that generates digital local sine/cosine signals and a mixer that multiplies the bandpass signal by them, as shown in Fig. 2.10. There exist various methods by which these can be implemented. Sinusoidal signals are generally generated with a look-up table (LUT); however, this system performs the downconversion at a rate of 4 times the carrier frequency of the bandpass signal. Thus, a simple multiplication-free implementation is available.

**Lowpass Filter**

Digital filter implementation on FPGAs has many advantages over general-purpose DSPs. It is said that an 8-tap FIR filter runs at 5 Msps on a DSP or microprocessor and at 30 Msps on an off-the-shelf FIR filter device; however, an FPGA can implement an identical filter at over 100 Msps because, as described in the preceding sections, an FPGA is suitable for parallel processing and distributed arithmetic [64]. Digital downconversion requires two lowpass filters per channel. Their inputs are the products of the NCO output and the bandpass signal, and their outputs are the in-phase and quadrature components of the complex baseband signal. The lowpass filters suppress the second harmonic component and pass only the signal shifted to baseband. In this system, the FIR filter has eight taps of 12-bit registers that are arranged in a shift register configuration. The output of each register, also called a tap, is represented by \( x(n) \), where \( n \) is the tap number. Each tap is multiplied by filter coefficients \( h(n) \), and subsequently, all the products are summed. The equation for this filter is

$$y(n) = \sum_{k=1}^{L} h(n - k) \cdot x(k),$$

(2.7)
where $L$ is the filter length. In the case of a linear phase response FIR filter, the coefficients are symmetric around the center values. By taking advantage of this symmetry, the FIR filter can be restructured as shown in Fig.2.11(a), which reduces the required circuit resources. In addition, the structure in Fig.2.11(b) can be optimized on an FPGA by using LUTs in a memory block, so that the multiplications and additions are performed in parallel [64]. As mentioned in the previous section, recent FPGA technologies can provide high-performance FIR filters using dedicated MAC blocks operating at greater than 200 MHz [103].
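The resource saving from coefficient symmetry can be sketched as follows. The example assumes an even filter length and a standard causal convolution with zero initial state; the integer coefficients are hypothetical and chosen only so that the two structures can be compared exactly. The folded form adds each pair of mirrored taps before multiplying, so it needs L/2 multiplications per output instead of L.

```python
def fir_direct(x, h):
    """Direct-form FIR: one multiplication per tap."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def fir_folded(x, h):
    """Symmetric (linear-phase) FIR of even length: pre-add mirrored taps."""
    length = len(h)
    if length % 2 != 0 or any(h[k] != h[length - 1 - k] for k in range(length // 2)):
        raise ValueError("coefficients must be symmetric with even length")

    def tap(n, k):
        # Delay-line content x(n - k), zero before the first sample.
        return x[n - k] if n - k >= 0 else 0

    out = []
    for n in range(len(x)):
        acc = 0
        for k in range(length // 2):
            # One multiplier serves two taps that share the same coefficient.
            acc += h[k] * (tap(n, k) + tap(n, length - 1 - k))
        out.append(acc)
    return out

h = [1, 3, 7, 12, 12, 7, 3, 1]            # hypothetical symmetric 8-tap coefficients
x = [5, -2, 9, 0, 4, -7, 3, 8, -1, 6, 2, -4]
print(fir_folded(x, h) == fir_direct(x, h))
```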

Relation of Undersampling and Digital Downconversion

This system was designed with both sampling schemes in mind: oversampling of a relatively low-frequency signal and undersampling of a high-frequency signal. Figure 2.12 displays examples of both. An IF bandpass signal centered at 10 MHz can be oversampled by an ADC at 40 Msps, maintaining the 4 times oversampling rule. In the case of undersampling, the sampling rate of the ADC must be chosen appropriately; in Fig.2.12, 32 Msps is suitable for the IF frequency of 40 MHz. Specifically, the sampling rate $f_s$ must satisfy the condition

$$\frac{2f_u}{k} \leq f_s \leq \frac{2f_l}{(k-1)} \quad (2.8)$$

where $f_u$ and $f_l$ are the upper and lower frequency bounds, respectively, of an IF signal, and $k(\geq 2)$ is an integer [83]. Figure 2.13(a) presents the relationship between the undersampling range and $k$ for bandwidths of 2, 3, and 4 MHz. Further, the 4 times oversampling condition is given by

$$f_s = 4f_b \quad (2.9)$$

where $f_b$ is the in-band signal frequency (the input IF signal frequency after downconversion by undersampling). The above two conditions are plotted in Fig.2.13(b); at the intersections, both conditions are satisfied. Tab.2.5 lists the values at the intersections shown in Fig.2.13(b). Hence, in this system 32 Msps is typically chosen for undersampling the IF signal at the center frequency of 40 MHz.
Fig. 2.13: Determination of sampling frequency (IF center freq. = 40 MHz).

Table 2.5: Candidate sampling frequencies for undersampling of IF signal at 40 MHz.

<table>
<thead>
<tr>
<th>k</th>
<th>$f_s$ (MHz)</th>
</tr>
</thead>
<tbody>
<tr>
<td>2</td>
<td>53.333</td>
</tr>
<tr>
<td>3</td>
<td>32.000</td>
</tr>
<tr>
<td>4</td>
<td>22.857</td>
</tr>
<tr>
<td>5</td>
<td>17.778</td>
</tr>
<tr>
<td>6</td>
<td>14.546</td>
</tr>
</tbody>
</table>
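The candidate rates in Table 2.5 can be cross-checked numerically. The sketch below assumes that combining conditions (2.8) and (2.9) amounts to placing the alias of the IF center frequency at exactly $f_s/4$, which gives the closed form $f_s = 4 f_{IF}/(2k-1)$; this closed form is a derivation made here for illustration and is not stated in the text. The 2 MHz bandwidth is one of the values plotted in Fig.2.13(a), and the function names are hypothetical.

```python
def candidate_fs(f_if, k):
    """Sampling rate whose alias of f_if lands at fs/4 (assumed closed form)."""
    return 4.0 * f_if / (2 * k - 1)

def alias_frequency(f, fs):
    """Fold a frequency into the first Nyquist zone [0, fs/2]."""
    r = f % fs
    return fs - r if r > fs / 2 else r

def satisfies_bandpass_condition(fs, f_if, bandwidth, k):
    """Check Eq. (2.8): 2*f_u/k <= fs <= 2*f_l/(k - 1)."""
    f_l = f_if - bandwidth / 2
    f_u = f_if + bandwidth / 2
    return 2 * f_u / k <= fs <= 2 * f_l / (k - 1)

F_IF = 40.0        # MHz, IF center frequency
BANDWIDTH = 2.0    # MHz, assumed signal bandwidth

for k in range(2, 7):
    fs = candidate_fs(F_IF, k)
    print(k, round(fs, 3), round(alias_frequency(F_IF, fs), 3),
          satisfies_bandpass_condition(fs, F_IF, BANDWIDTH, k))
```

For each $k$ the script prints the candidate rate, the alias frequency of the 40 MHz carrier, and whether Eq. (2.8) holds for the assumed bandwidth.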
2.4 Undersampling and ADC’s Input Frequency Limitation at IF Band

The undersampling (bandpass sampling or subsampling) technique is a useful alternative for converting high-frequency bandpass signals into lower-frequency digital signals by exploiting intentional aliasing, which helps when hardware such as data conversion devices and RF analog devices is limited. Traditionally, the sampling rate in analog-to-digital conversion is determined by the Nyquist sampling criterion, which specifies the minimum sampling rate required for signal reconstruction. For bandpass signals, however, undersampling at rates of more than twice the signal bandwidth can also provide perfect reconstruction of the original information using in-band aliasing [83,84].

However, the problem is that the dynamic range of ADCs is limited by Signal to Noise Ratio (SNR) degradation that is caused by the sampling jitter effect, which results in amplitude errors. This, eventually, leads to phase errors in the constellation pattern of the phase modulated signals such as Phase Shift Keying (PSK).

Several papers provide theoretical reviews and mathematical analyses of the jitter effect in analog-to-digital conversion [83–85], and there are certain reports on measurement [86–88]. However, for communication systems incorporating practical digital downconversion receivers with the undersampling scheme, jitter effects such as Bit Error Rate (BER) and Error Vector Magnitude (EVM) degradation have rarely been reported. This section examines the relationship between the input signal frequency of the ADC and the sampling jitter effect on phase-modulated signals in a digital IF downconversion receiver with the undersampling scheme. In order to evaluate the jitter effect quantitatively, we measured the EVM under various clock jitter configurations in the custom testbed system, which uses commonly available and reasonably priced ADCs with a Root Mean Square (RMS) aperture jitter of 0.25 ps and a sampling rate of 40 MHz. The ADC's aperture jitter is sufficiently small to be ignored; the sampling clock instability, however, is the critical factor that dominates the total jitter of the system.

2.4.1 Sampling Jitter Effect in Undersampling Scheme

As mentioned above, undersampling is an extremely useful technique, but the sampling jitter effect caused by the sampling clock instability must be considered carefully. This subsection presents the preliminaries of the sampling jitter effect. Further, the problems in a digital downconversion receiver with the undersampling scheme are formulated theoretically, and some computer simulation results with PSK and Quadrature Amplitude Modulation (QAM) signals are presented.

Sampling jitter is associated with the ADC’s aperture jitter due to the uncertainty of the aperture time in the sample-hold amplifiers and the sampling clock instability of the system. It results in additional noise power. In reality, it is the sampling clock instability that dominates the sampling jitter condition and not the aperture jitter of ADC itself. As the frequency of the input signal increases, the jitter effect becomes more prominent. Theoretically, the input frequency of an ADC is limited by the following relation [90] as

\[
\frac{dx(t)}{dt} = 2\pi f_{in} A \cos(2\pi f_{in} t) < \frac{\text{LSB}/2}{\sigma_j},
\]

(2.10)

where the input signal \(x(t)\) is assumed to be full-scale sinusoidal, given by \(A \sin(2\pi f_{in} t)\). \(f_{in}\) and \(\sigma_j\) are the input IF frequency and RMS aperture jitter, respectively. Since the Least Significant Bit (LSB) \(\triangleq 2A/2^M\), the input frequency is limited as

\[
f_{in} < \frac{1}{2\pi \cdot 2^M \cdot \sigma_j},
\]

(2.11)

where \(A\) and \(M\) are half of the input range and the resolution of the ADC, respectively. For example, if only the ADC’s aperture jitter is taken into consideration, the jitter effect can be completely neglected below approximately 155 MHz, where the aperture jitter and resolution of the ADC are 0.25 ps and 12 bits, respectively. However, the usable frequency is limited much further by the sampling clock instability. Assuming that the sampling clock jitter is 100 ps, the input frequency is limited to below approximately 388 kHz. In other words, input signals above this frequency limit are accompanied by additional jitter noise. The degradation in SNR due to the sampling jitter is typically given by

\[
\mathrm{SNR}_j \ [\text{dB}] = 10 \log_{10} \left( \frac{1}{(2\pi f_{in} \sigma_j)^2} \right)
\]

(2.12)

[87]. Eq. (2.12) also implies that the maximum input frequency is limited by the SNR requirement. In other words, the maximum possible SNR is degraded as the input frequency increases. M. Nagahara et al. have examined the degradation of the Effective Number Of Bits (ENOB), which is proportional to SNR, caused by the sampling clock jitter [88]. They have reported that in order to guarantee an ENOB over 8 bits, the input frequency of the given ADC (AD6640, Analog Devices) was limited to below 50 MHz with a sampling clock jitter of 10 ps.
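The numerical examples above follow directly from Eqs. (2.11) and (2.12), as the short sketch below shows. The function names are hypothetical. The ENOB conversion uses the common rule of thumb ENOB = (SNR − 1.76)/6.02, which is an assumption added here and is not taken from the text.

```python
import math

def max_input_frequency(bits, sigma_j):
    """Eq. (2.11): frequency limit in Hz for an M-bit ADC with RMS jitter sigma_j (s)."""
    return 1.0 / (2.0 * math.pi * (2 ** bits) * sigma_j)

def jitter_snr_db(f_in, sigma_j):
    """Eq. (2.12): SNR bound in dB set by sampling jitter alone."""
    return 10.0 * math.log10(1.0 / (2.0 * math.pi * f_in * sigma_j) ** 2)

def enob_from_snr(snr_db):
    """Rule-of-thumb conversion (assumption, not from the text)."""
    return (snr_db - 1.76) / 6.02

# 12-bit ADC, aperture jitter only (0.25 ps): limit of roughly 155 MHz.
print(max_input_frequency(12, 0.25e-12) / 1e6, "MHz")
# Same ADC with 100 ps sampling clock jitter: limit of roughly 388 kHz.
print(max_input_frequency(12, 100e-12) / 1e3, "kHz")
# 50 MHz input with 10 ps clock jitter, the case reported in [88].
snr = jitter_snr_db(50e6, 10e-12)
print(snr, "dB ->", enob_from_snr(snr), "bits")
```

Under the rule-of-thumb conversion, the last case comes out close to 8 bits, in line with the figure reported in [88].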

In the digital receiver system, as shown in Fig. 2.2, the downconverted bandpass signal at the IF band is given by

\[
r(t) = \text{Re} \left\{ \left[ \hat{s}_{bb}(t)e^{j\psi} + z_1(t) \right] e^{j[2\pi f_{IF} t + \chi]} \right\}.
\]

(2.13)

In Eq. (2.13), \(\hat{s}_{bb}(t)\) is the transmitted signal, given by \(\sum_k s_{bb}(k) \cdot g(t - kT_0)\), where \(s_{bb}(k)\) is the sequence of the data symbols, \(g(t)\) is the modulation waveform shaped by filters, and \(T_0\) is the symbol duration [1]. Further, \(z_1(t)\) is the complex white Gaussian noise. \(f_{IF}\) is the IF center frequency, and \(\psi\) and \(\chi\) are the phase offsets due to the propagation delay and the initial phase of the local oscillators, respectively. After sampling by the ADC, the IF signal \(\tilde{r}(n)\) can be represented by rewriting Eq. (2.13) as
\[
\begin{aligned}
\tilde{r}(n) &= r(nT_s + \epsilon(n)) \\
&= \text{Re}\left\{ \left[ \hat{s}_{\text{bb}}(nT_s + \epsilon(n)) e^{j\psi} + z_1(nT_s + \epsilon(n)) \right] e^{j(2\pi f_{\text{IF}}(nT_s + \epsilon(n)) + \chi)} \right\} \\
&= \text{Re}\left\{ \left[ \hat{s}_{\text{bb}}(nT_s + \epsilon(n)) e^{j\psi} + z_1(nT_s + \epsilon(n)) \right] e^{j\left( \frac{2\pi n}{N_s} + J(n) + \chi \right)} \right\},
\end{aligned}
\]

(2.14)

where \(T_s\) is the sample period, \(\epsilon(n)\) is the sampling jitter, and the sampling rate is defined as
\[
f_s \triangleq N_s \cdot f_{\text{IF}},
\]

(2.15)

where \(N_s\), an integer, is the oversampling rate of the IF center frequency, and \(J(n)\) is the phase variation caused by sampling jitter, given by \(2\pi f_{\text{IF}}\epsilon(n)\). The sampled bandpass signal is then multiplied by the NCO output as
\[
\tilde{r}_{\text{bb}}(n) = \tilde{r}(n) \cdot e^{j\left( -\frac{2\pi n}{N_s} \right)}.
\]

(2.16)

After the lowpass filtering of Eq.(2.16), the complex baseband signals are obtained as
\[
r_{\text{bb}}(n) = \hat{s}_{\text{bb}}(nT_s + \epsilon(n)) e^{j(J(n) + \Phi)} + z_2(n),
\]

(2.17)

where \(\Phi\) and \(z_2(n)\) are the total phase offset and resulting noise term, respectively. If the symbol period \(T_0\) is considerably larger than the sampling jitter \(\epsilon(n)\), the sampling jitter effect on the signal waveform is negligible; thus, Eq.(2.17) is approximated by
\[
r_{\text{bb}}(n) \approx \hat{s}_{\text{bb}}(n) e^{j(J(n) + \Phi)} + z_2(n).
\]

(2.18)

From Eq.(2.18), it can be observed that the sampling jitter and the phase offset of the local oscillators both contribute to the phase error in the constellation pattern of digitally downconverted signals. The phase error \(J(n)\) caused by the sampling jitter becomes larger as the frequency of the input signal increases. Hereafter, this paper assumes that the sampling jitter \(\epsilon(n)\) follows the Gaussian distribution \(N(0, \sigma_j^2)\), where the standard deviation \(\sigma_j\) is the RMS value of the sampling jitter. Computer simulations confirm the sampling jitter effect on the constellation patterns of Quaternary Phase Shift Keying (QPSK) and 16QAM modulated signals. The oversampling rate \(N_s\) of Eq.(2.15) was assumed to be four; this makes the multiplication of the Numerical Controlled Oscillator (NCO) output and the input signal easy to implement, since the input signal is simply multiplied in sequence by \(\{1, -j, -1, j\}\), the values taken by \(e^{-j2\pi n/N_s}\). The sampling rate and resolution of the ADC were assumed to be 40 MHz and 12 bits, respectively. Such an ADC is available at a reasonable price in today’s market. The typical IF center frequency is then 10 MHz for the 4 times oversampling scheme. In the undersampling scheme, the digital downconversion can be applied in an identical manner using the in-band alias at 10 MHz. Herein, the input IF frequencies of the ADC were selected as
\[
f_{\text{IF}} = 40n + 10 \ \text{[MHz]}, \quad
\begin{cases}
  n = 0 & \text{(Oversampling)} \\
  n = 1, \ldots, 8 & \text{(Undersampling)}
\end{cases}
\]

(2.19)
(a) Typical constellation pattern
$(\sigma_j = 25 \text{ ps}, f_c = 170 \text{ MHz}, \text{SNR} = 7 \text{ dB})$.

(b) QPSK signal
$(\sigma_j = 150 \text{ ps}, f_c = 170 \text{ MHz}, \text{SNR} = 30 \text{ dB})$.

(c) 16QAM signal
$(\sigma_j = 150 \text{ ps}, f_c = 170 \text{ MHz}, \text{SNR} = 20 \text{ dB})$.

Fig. 2.14: Constellation patterns spread by clock jitter.
Table 2.6: Specifications of jitter measurement system.

| Item               | Specification             |
|--------------------|---------------------------|
| IF input frequency | 10 MHz : Nyquist sampling |
| ADC                | AD9430 (Analog Devices)   |
| Resolution         | 12 bits                   |
| Sampling rates     | 40 MHz                    |
| Analog Input BW    | 700 MHz                   |
| Aperture jitter    | 0.25 ps (RMS)             |
| FPGA               | Altera STRATIX EP1S25     |
| CPU                | HITACHI SH4               |
| OS                 | NetBSD                    |

where $f_{c,IF}$ ranged from 40 to 330 MHz.

Figure 2.14(a) shows the constellation pattern of a QPSK modulated signal in a practical SNR environment (SNR = 7 dB) in the AWGN (Additive White Gaussian Noise) channel, where the input signal frequency and the sampling jitter were 170 MHz and 25 ps, respectively. The sampling jitter is so small that its effect is insignificant, because the noise contribution of the AWGN is dominant. Figures 2.14(b) and (c), however, clearly show the typical constellation patterns of the QPSK and 16QAM modulated signals spread by the sampling jitter, where the input signal frequency was identical to that in the above example and the sampling jitter was 150 ps. An SNR of 30 dB is sufficiently high to ignore the noise contribution, so the jitter effect appears more clearly. These phase errors in the constellation patterns degrade the BER performance; hence, the sampling jitter effect should be taken into careful consideration in designing a digital IF downconversion receiver with an undersampling scheme.
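The phase spreading seen in these constellations can be reproduced qualitatively with a small Monte Carlo sketch of the approximation in Eq. (2.18). The assumptions are stated plainly: unit-power QPSK symbols, one sample per symbol, Gaussian jitter with RMS value σ_j, zero fixed phase offset, and complex AWGN set by the SNR. The function names and the seed are hypothetical, and this is not the simulation code used for Fig. 2.14.

```python
import cmath
import math
import random

def jittered_qpsk(n_symbols, f_if_hz, sigma_j_s, snr_db, seed=0):
    """Generate ideal and received QPSK points following the form of Eq. (2.18)."""
    rng = random.Random(seed)
    # Noise standard deviation per I/Q component for unit symbol power.
    noise_std = math.sqrt(10.0 ** (-snr_db / 10.0) / 2.0)
    tx, rx = [], []
    for _ in range(n_symbols):
        s = cmath.exp(1j * (math.pi / 4 + (math.pi / 2) * rng.randrange(4)))
        # J(n) = 2*pi*f_IF*eps(n), with eps(n) drawn from N(0, sigma_j^2).
        phase = 2.0 * math.pi * f_if_hz * rng.gauss(0.0, sigma_j_s)
        z = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        tx.append(s)
        rx.append(s * cmath.exp(1j * phase) + z)
    return tx, rx

def rms_phase_error(tx, rx):
    """RMS angle (rad) between received and ideal constellation points."""
    errs = [cmath.phase(r / s) for s, r in zip(tx, rx)]
    return math.sqrt(sum(e * e for e in errs) / len(errs))

for sigma_j in (25e-12, 150e-12):
    tx, rx = jittered_qpsk(20000, 170e6, sigma_j, snr_db=30.0)
    print(sigma_j, math.degrees(rms_phase_error(tx, rx)), "deg RMS")
```

With the noise made negligible, the RMS phase error approaches $2\pi f_{IF}\sigma_j$, which is why the spread grows with both the input frequency and the jitter.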

2.4.2 Evaluation of Jitter Effect

This subsection presents the jitter effects measured under several jitter conditions configured in our testbed system. Table 2.6 lists the specifications of the testbed system. This system employs a wide-band ADC whose analog input bandwidth is 700 MHz for undersampling applications [91]. The sampling clock management of the ADCs is performed by the FPGA’s on-chip internal Phase-Locked Loop (PLL). The FPGA’s internal PLL is indispensable in designing logic circuits, providing advanced functions of phase shift and frequency manipulation as well as clock lock and boost. However, it has been reported that the sampling clock jitter tends to be increased by the PLL [92, 93]. In the analog PLLs of Altera FPGAs, nondeterministic noise causes the internal voltage controlled oscillator (VCO) to fluctuate in frequency. The internal control circuitry adjusts the VCO back to the specified frequency; however, this change appears as a clock jitter. Other frequency fluctuations are caused by variations in supply voltage,

**Table 2.7:** Three different jitter configurations.

<table>
<thead>
<tr>
<th>Configurations</th>
<th>Estimated RMS Jitters (ps)</th>
</tr>
</thead>
<tbody>
<tr>
<td>C1 Function Generator (FG) + FPGA Internal PLL</td>
<td>150</td>
</tr>
<tr>
<td>C2 Clock Synthesizer (CS) + FPGA internal PLL</td>
<td>70</td>
</tr>
<tr>
<td>C3 Clock Synthesizer (CS) (Direct clock input)</td>
<td>25</td>
</tr>
</tbody>
</table>

Fig. 2.15: Measurement configuration (clock synchronization network).

In this testbed system, three jitter configurations were available with different types of sampling clock management; these configurations are listed in Table 2.7. C1 and C2 used the function generator (FG) and the clock synthesizer (CS), respectively, as the 10 MHz sampling clock source, and the sampling clock was buffered and its frequency multiplied by four in the FPGA’s internal PLL. C3 also used the clock synthesizer; however, the 40 MHz clock signal was connected directly, without passing through the PLL. Fig.2.15 depicts the configurations of the measurement system in detail, including the synchronization network between the equipment. Since C1 used the general-purpose output of the function generator as its clock source, it was considered to be the worst jitter condition. The CS used in C2 and C3 generates a relatively pure clock signal; thus, their jitter conditions were better than that of C1. Moreover, C3 has the best jitter condition because its clock is connected directly to the ADC’s sampling clock pin and does not pass through the FPGA’s internal PLL. The jitter values in each configuration were estimated by matching the simulation and measurement results of the phase variation with a sinusoidal signal; the estimated RMS jitters are shown in Tab.2.7. The input IF signal was a CW (Continuous Wave) signal generated by the signal generator. The phase variation in the constellation pattern could be easily measured by applying offline digital downconversion to the CW signal\(^3\).

Many communication system standards specify EVM requirements. In order to examine the sampling jitter effect on the digital IF downconversion receiver with an undersampling scheme for practical communication systems, the EVM characteristics in each jitter configuration were quantified through measurements and computer simulations. In addition, the maximum input frequency in such an IF undersampling receiver is discussed. The EVM is obtained by averaging the error vector magnitude as

$$EVM \ [\%] = \frac{1}{K} \sum_{k=1}^{K} \left( \frac{\sum_{l=1}^{L} |\hat{x}_{p,l}^{k} - \hat{x}_{o,l}^{k}|^2}{\sum_{l=1}^{L} |\hat{x}_{o,l}^{k}|^2} \right) \times 100$$

(2.20)

where \(K\) and \(L\) are the number of symbols for the applied modulation scheme and the number of measured points, respectively, and \(\hat{x}_{p,l}^{k}\) and \(\hat{x}_{o,l}^{k}\) are the vectors for the practical (measured) point and the ideal point, respectively, of the \(l\)-th measured point of the \(k\)-th symbol. The lowpass filters in the digital downconverter were root-Nyquist filters with a roll-off ratio of 0.5. Coherent detection was achieved through the clock synchronization network shown in Fig.2.15.
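Eq. (2.20) can be evaluated with a few lines of code. The sketch below follows the equation as printed, that is, it averages the ratio of error power to ideal power over the $K$ symbols and scales by 100, and it assumes that the subscript $o$ denotes the ideal point. Many standards define EVM with a square root of this ratio, so the numbers from this sketch are not directly comparable with such definitions. The function name and the sample data are hypothetical.

```python
def evm_percent(measured, ideal):
    """Evaluate Eq. (2.20) as printed.

    measured[k][l] and ideal[k][l] are complex points: the l-th measured
    point and the matching ideal point of the k-th symbol.
    """
    if len(measured) != len(ideal) or not measured:
        raise ValueError("need the same non-zero number of symbols in both inputs")
    total = 0.0
    for meas_k, ideal_k in zip(measured, ideal):
        err_power = sum(abs(p - o) ** 2 for p, o in zip(meas_k, ideal_k))
        ref_power = sum(abs(o) ** 2 for o in ideal_k)
        total += err_power / ref_power
    return 100.0 * total / len(measured)

# Hypothetical data: one symbol at 1+0j observed twice with small errors.
ideal = [[1 + 0j, 1 + 0j]]
measured = [[1.1 + 0j, 0.9 + 0j]]
print(evm_percent(measured, ideal))
```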

In Fig.2.16(a) and (b), the simulation and measurement results are in relatively good agreement; a larger number of measurements would be expected to bring the statistics closer to the simulation results. The maximum input frequencies for the wireless LAN requirement (EVM below 5%) and the W-CDMA requirement (EVM below 10%) are presented in Tab. 2.8. The results show that, in order to keep the EVM below 5%, the IF input frequencies \(f_{IF}\) for the jitter conditions \(\sigma_j = 25, 70,\) and 150 ps were limited to below 290 (210), 130 (50) and 50 (10) MHz, respectively, for QPSK (16QAM) signals. Further, in order to keep the EVM below 10%, the IF input frequencies \(f_{IF}\) were limited to below 730 (450), 250 (170) and 90 (90) MHz for QPSK (16QAM) signals. It can be clearly observed that the EVM performance degrades dramatically as the sampling jitter increases.

Additionally, Figs.2.16(c) and (d) show the statistical results of the BER performance of QPSK and 16QAM signals with identical sampling jitter conditions at a given SNR. We find that the BER performance in the AWGN channel is further degraded by the additional phase noise caused by the sampling jitter. We can also confirm quantitatively that the BER performance degrades as the input frequency increases.

\(^3\)In fact, the clock jitter can be directly measured by oscilloscopes and estimated by analyzing the measured data.
Table 2.8: IF input frequency limitation.

<table>
<thead>
<tr>
<th rowspan="2">Unit (MHz)</th>
<th colspan="2">Wireless LAN (EVM &lt; 5%)</th>
<th colspan="2">W-CDMA (EVM &lt; 10%)</th>
</tr>
<tr>
<th>QPSK</th>
<th>16QAM</th>
<th>QPSK</th>
<th>16QAM</th>
</tr>
</thead>
<tbody>
<tr>
<td>C3 (σ_j = 25 ps)</td>
<td>290</td>
<td>210</td>
<td>730</td>
<td>450</td>
</tr>
<tr>
<td>C2 (σ_j = 70 ps)</td>
<td>130</td>
<td>50</td>
<td>250</td>
<td>170</td>
</tr>
<tr>
<td>C1 (σ_j = 150 ps)</td>
<td>50</td>
<td>10</td>
<td>90</td>
<td>90</td>
</tr>
</tbody>
</table>

(a) EVM of QPSK signal.  
(b) EVM of 16QAM signal.  
(c) BER Characteristics of QPSK signal.  
(d) BER Characteristics of 16QAM signal.  

Fig. 2.16: Jitter effect on EVM and BER.
2.5 Summary

In this chapter, the hardware platforms developed for the smart antenna transceiver system were introduced together with their implementation details. The hardware requirements for realizing smart antenna technologies were discussed from the viewpoint of the SDR concept. The benefits of FPGAs as array signal processors were described by comparing them with general-purpose DSP processors. The testbed systems, comprising RF components and baseband digital signal processors, were presented, and the implementation of IF digital processing functions, such as the NCO, mixer, and FIR filters, was briefly described. Further, the sampling jitter effect, which usually limits the maximum IF input frequency, was quantified by theoretical analysis and measurement.
Chapter 3

Fast DOA Estimation Processor

3.1 Introduction

DOA information of incoming signals is frequently used in various positioning applications such as emergency systems, radio surveillance systems, radar systems, and military systems. Well-known DOA estimation techniques, including MUSIC [31] and ESPRIT [32], basically assume that the incident waves are planar and incoherent; hence, they can be expressed by discrete wavefronts from point sources.

From a theoretical point of view, in smart antenna applications for cellular systems, many excellent smart antenna optimization techniques require the DOAs of the desired signal and interferers in advance [5, 7, 13]. As mentioned in Chapter 1, high-resolution estimation techniques for DOAs of incident signals are useful for efficient beamforming in near line-of-sight (LOS) environments. However, a majority of situations are generally NLOS in multipath propagation environments. In this case, beamforming in the angle domain with DOA information of the propagation channel is not valid because the DOA of the signal cannot be considered as the single wavefront of a plane wave. In general, there exist conventional signal domain approaches based on Minimum Mean Square Error (MMSE) combining, such as Least Mean Square (LMS) and Recursive Least Square (RLS). These techniques employ a temporal reference signal instead of the explicit channel information of the DOAs. However, in the urban cellular base station environment, the signals at the base station are received through the scattering process in the local scatterers around the mobile station because the antenna is usually deployed higher than the surrounding scatterers. In this case, the fading across the antenna elements spaced by λ/2 is highly correlated, and the angular spread (AS) is typically only a few degrees. Thus, the angle domain approach exploiting the DOA information can be useful.

In the case of uplink, we can estimate the channel parameters with the received data that have passed through the channel; however, this is not valid in the case of downlink. In systems such as Japanese PDC and European GSM, the frequency bands for uplink and downlink are not identical; hence, the small scale fading between uplink and downlink is uncorrelated. However, the DOAs and AS are statistically reciprocal in the uplink and downlink channels of an FDD system, which uses a different frequency for each link. This is the most significant advantage of DOA estimation. Using the knowledge of explicit directional information, the base stations can steer maximum transmitting power toward the desired user direction. However, DOA estimation is usually a time-consuming task. With general von Neumann architecture processors, it is difficult to simultaneously meet the requirements of real-time computation and compact architecture for low power consumption in future communication systems [81].

Although the fast DOA estimator is indispensable for an ideal beamforming with explicit directional information, we have faced several difficulties with regard to its practical high-speed digital signal processor implementation. Some practical studies have been carried out with a dedicated circuit of FPGAs [55]. However, the quantitative evaluation of the fixed-point operation of FPGAs, including the hardware implementation details, has rarely been reported. A recent study implemented a DOA-based smart antenna for European GSM systems using the unitary ESPRIT and Minimum Variance Method (MVM) as DOA estimators [66]. However, it was implemented with a general-purpose processor (DEC Alpha 500 MHz) that may not be optimized for the dedicated tasks and would consume a considerable amount of electrical power.

In this chapter, the digital signal processor of a fast DOA estimator is designed with FPGAs and its hardware implementation is presented. The inherent parallelism, reconfigurability, and optimizability of FPGAs provide more benefits than general-purpose processors. This system will be useful for high-speed DOA-based beamforming smart antennas in wireless cellular systems. It incorporates the unitary MUSIC algorithm, which is a high-resolution DOA estimation technique. Compared with other well-known subspace-based high-resolution techniques such as ESPRIT, MUSIC-like algorithms have many advantages in real hardware implementation owing to their simplicity. However, there still remains the computational complexity of arithmetic based on complex numbers, which does not allow fast computation with low hardware complexity. Using a unitary transformation, the eigenvalue decomposition (EVD) of the correlation matrix can be solved by computations based only on real numbers [34, 35].

The unitary MUSIC processor (hereafter UMP) performs all digital signal processing procedures only using the fixed-point operation with a finite word-length. The EVD and MUSIC angular spectra computation are solved by a Cyclic Jacobi processor based on fixed-point CORDIC [101] and the spatial DFT, respectively.

### 3.2 DOA Estimation using MUSIC algorithm

#### 3.2.1 Data Model

Let us assume the basic model of the narrowband signal $s_i(t)$ for the $i$-th source, where $i = 1, 2, \cdots, L$. The signals received at the $K$-element antenna array spaced by a half wavelength can be modeled by

$$ x(t) = V s(t) + n(t), $$

where the array output $x(t)$ is a snapshot vector, and $s(t)$ and $n(t)$ are the signal and complex AWGN vectors at time $t$, respectively. The columns of the channel matrix $V = [v_1, v_2, \cdots, v_L]$ consist of the spatial channel vectors for the $L$ sources. The spatial channel vector \( \mathbf{v}_i \) for the \( i \)-th source can be provided by the array response vector \( \mathbf{a}(\theta_i) \) under the assumption that plane waves arrive at an ideal omni-directional antenna array from point sources as
\[
\mathbf{v}_i = \mathbf{a}(\theta_i) = \begin{bmatrix} 1, e^{-j\pi \sin \theta_i}, \cdots, e^{-j\pi (K-1) \sin \theta_i} \end{bmatrix}^T,
\]
where \( \theta_i \) is the DOA for the \( i \)-th source and the superscript \( T \) denotes the transpose operator.

#### 3.2.2 MUSIC algorithm

The MUSIC algorithm is a DOA estimation technique based on eigenvalue decomposition (EVD) and is classified as a subspace-based method [31]. Compared with other subspace-based methods, it offers advantages such as implementation simplicity and high resolution. The correlation matrix of \( \mathbf{x}(t) \) is given by
\[
\mathbf{R}_{xx} = E[\mathbf{x}(t)\mathbf{x}^H(t)] = \mathbf{V} \mathbf{R}_s \mathbf{V}^H + \sigma^2 \mathbf{I},
\]
where \( E[\cdot] \) and the superscript \( ^H \) denote the expectation and Hermitian conjugate operators, respectively. Further, \( \mathbf{R}_s = E[\mathbf{s}(t)\mathbf{s}^H(t)] \) is the signal covariance matrix and \( \sigma^2 \) represents the noise variance. Since the correlation matrix \( \mathbf{R}_{xx} \) is a positive definite Hermitian matrix, it can be decomposed into signal and noise subspaces by the complex-valued EVD as
\[
\mathbf{R}_{xx} = \mathbf{U} \Lambda \mathbf{U}^H,
\]
where \( \mathbf{U} \) is a unitary matrix composed of eigenvectors and \( \Lambda = \text{diag}\{\lambda_1, \lambda_2, \cdots, \lambda_K\} \) contains the real eigenvalues ordered as \( \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_K > 0 \). If an eigenvector \( \mathbf{e}_i \) lies in the nullspace of \( \mathbf{V}^H \), where \( \mathbf{V} \) has rank \( L \), i.e., \( \mathbf{e}_i \) is orthogonal to the range of \( \mathbf{V} \) according to the relation
\[
\mathcal{N}[\mathbf{V}^H] = \mathcal{R}[\mathbf{V}]^{\perp},
\]
then \( \mathbf{e}_i \) is an eigenvector of \( \mathbf{R}_{xx} \) with the eigenvalue \( \sigma^2 \) because \( \mathbf{V}^H \mathbf{e}_i = \mathbf{0} \):
\[
\mathbf{R}_{xx} \mathbf{e}_i = (\mathbf{V} \mathbf{R}_s \mathbf{V}^H + \sigma^2 \mathbf{I}) \mathbf{e}_i = \sigma^2 \mathbf{e}_i.
\]
The eigenvectors of \( \mathbf{R}_{xx} \) with the eigenvalue \( \sigma^2 \) thus lie in the nullspace of \( \mathbf{V}^H \). On the other hand, the remaining eigenvectors lie in the range of \( \mathbf{V} \), and the eigenvalues are ordered as
\[
\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_L \geq \lambda_{L+1} = \cdots = \lambda_K = \sigma^2.
\]
They can be partitioned into signal and noise components. In a similar manner, the correlation matrix can be also partitioned as
\[
\mathbf{R}_{xx} = \mathbf{U}_s \Lambda_s \mathbf{U}_s^H + \mathbf{U}_n \Lambda_n \mathbf{U}_n^H,
\]
where \( \mathbf{U}_s \) and \( \mathbf{U}_n \) are the unitary matrices of signal subspace and noise subspace, respectively, and \( \Lambda_s \) and \( \Lambda_n \) are the diagonal matrices of the eigenvalues.
Fig. 3.1: Typical MUSIC spectrum (8-element array, two waves from 0° and 30° with SNR=20dB).

The noise subspace eigenvectors of corresponding eigenvalues of \( \sigma^2 \) are orthogonal to the signal subspace and eventually, orthogonal to the array response vectors. Based on this, the MUSIC spectrum is typically expressed by

\[
P_{MU}(\theta) = \frac{a^H(\theta)a(\theta)}{a^H(\theta)E_n E_n^H a(\theta)}, \tag{3.9}
\]

where \( E_n \) is the matrix whose columns consist of the noise subspace eigenvectors (the matrix \( \mathbf{U}_n \) of the partition above). In the resulting spectrum of Eq.(3.9), peaks appear at the DOAs of the incident signals. For example, if two incoherent waves impinge on an 8-element antenna array from 0° and 30° with SNR = 20 dB, the typical MUSIC spectrum is calculated as shown in Fig. 3.1.
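The spectrum of Eq.(3.9) can be sketched numerically as follows. This is a minimal floating-point NumPy illustration and not the fixed-point processor described in this chapter; the function names, the snapshot count, the random seed, and the 0.25° scan grid are assumptions introduced only for the example.

```python
import numpy as np

def steering_matrix(theta_deg, K):
    """Array response vectors of Eq.(3.2) for a K-element half-wavelength array."""
    theta = np.deg2rad(np.atleast_1d(theta_deg))
    m = np.arange(K)[:, None]                         # element index as a column
    return np.exp(-1j * np.pi * m * np.sin(theta)[None, :])   # shape K x len(theta)

def music_spectrum(X, L, grid_deg):
    """MUSIC spectrum of Eq.(3.9) from a K x N snapshot matrix X with L sources."""
    K, N = X.shape
    R = X @ X.conj().T / N                            # sample correlation matrix
    _, U = np.linalg.eigh(R)                          # eigenvalues in ascending order
    En = U[:, :K - L]                                 # noise subspace eigenvectors
    A = steering_matrix(grid_deg, K)
    denom = np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)
    return K / denom                                  # a^H a = K for this array

def simulate_snapshots(doas_deg, K, N, snr_db, rng):
    """Incoherent unit-power sources plus complex white noise (assumed model)."""
    L = len(doas_deg)
    A = steering_matrix(doas_deg, K)
    S = (rng.standard_normal((L, N)) + 1j * rng.standard_normal((L, N))) / np.sqrt(2)
    sigma = 10 ** (-snr_db / 20)
    W = sigma * (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)
    return A @ S + W

def strongest_peaks(P, grid_deg, L):
    """Angles of the L largest local maxima of the spectrum."""
    idx = [i for i in range(1, len(P) - 1) if P[i - 1] < P[i] >= P[i + 1]]
    idx = sorted(idx, key=lambda i: P[i], reverse=True)[:L]
    return sorted(float(grid_deg[i]) for i in idx)

rng = np.random.default_rng(0)                        # hypothetical seed
grid = np.arange(-90.0, 90.25, 0.25)                  # hypothetical scan grid
X = simulate_snapshots([0.0, 30.0], K=8, N=200, snr_db=20, rng=rng)
P = music_spectrum(X, L=2, grid_deg=grid)
print(strongest_peaks(P, grid, L=2))
```

The two strongest peaks are expected close to the simulated DOAs; how close depends on the number of snapshots and on the grid spacing.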

#### 3.2.3 Unitary MUSIC DOA Estimation

In reality, the correlation matrix in Eq.(3.3) is complex-valued, and the EVD of a complex-valued correlation matrix is a high computational burden (in general, a complex number multiplication requires four real number multiplications and two real number additions). A unitary transformation reduces the computational complexity by enabling real-valued eigenvalue decomposition of the transformed real-valued correlation matrix [33–36]. Since the EVD procedure bears a large portion of the entire computational load of MUSIC-like subspace-based algorithms, real-valued eigenvalue decomposition is a very attractive option to reduce the computational complexity. If the steering vector of Eq.(3.2) is rearranged in the conjugate centro-symmetric manner, the correlation matrix $\mathbf{R}_{xx}$ becomes centro-Hermitian. The real-valued correlation matrix $\hat{\mathbf{R}}_{xx}$ can be obtained by an appropriate unitary transformation $\mathbf{Q}$ as

$$
\hat{\mathbf{R}}_{xx} = \text{Re} \left\{ \mathbf{Q}^H \mathbf{R}_{xx} \mathbf{Q} \right\}, \tag{3.10}
$$

where the unitary transformation matrix $\mathbf{Q}$ for the $M$-by-$M$ matrix can be chosen as

$$
\mathbf{Q} = \frac{1}{\sqrt{2}} \begin{pmatrix}
    \mathbf{I} & j\mathbf{I} \\
    \mathbf{\Pi} & -j\mathbf{\Pi}
\end{pmatrix} \tag{3.11}
$$

or

$$
\mathbf{Q} = \frac{1}{\sqrt{2}} \begin{pmatrix}
    \mathbf{I} & \mathbf{0} & j\mathbf{I} \\
    \mathbf{0}^T & \sqrt{2} & -j\mathbf{0}^T \\
    \mathbf{\Pi} & \mathbf{0} & -j\mathbf{\Pi}
\end{pmatrix}, \tag{3.12}
$$

depending on whether the number of antenna elements is even or odd, respectively, where $\mathbf{0} = [0, 0, \cdots, 0]^T$ is the zero vector, and $\mathbf{I}$ and $\mathbf{\Pi}$ are the identity matrix and the exchange matrix (the identity matrix with its columns flipped in the left-right direction), respectively [33,35,36].
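The effect of taking the real part of the transformed matrix can be checked numerically. The sketch below builds $\mathbf{Q}$ for an even or odd number of elements and verifies, for a random Hermitian sample correlation matrix, that the real part of $\mathbf{Q}^H \mathbf{R}_{xx} \mathbf{Q}$ coincides with the transform of the forward-backward averaged matrix. The helper name `unitary_q`, the random test data, and the seed are assumptions made only for this illustration.

```python
import numpy as np

def unitary_q(K):
    """Unitary matrix Q for an even or odd number of antenna elements K."""
    h = K // 2
    I = np.eye(h)
    Pi = np.fliplr(I)                       # exchange (column-flipped identity) matrix
    if K % 2 == 0:
        Q = np.block([[I, 1j * I],
                      [Pi, -1j * Pi]])
    else:
        z = np.zeros((h, 1))
        Q = np.block([[I, z, 1j * I],
                      [z.T, np.sqrt(2) * np.ones((1, 1)), z.T],
                      [Pi, z, -1j * Pi]])
    return Q / np.sqrt(2)

rng = np.random.default_rng(1)              # hypothetical seed
K = 4
X = rng.standard_normal((K, 50)) + 1j * rng.standard_normal((K, 50))
R = X @ X.conj().T / 50                     # Hermitian sample correlation matrix
Q = unitary_q(K)

Pi_K = np.fliplr(np.eye(K))
R_fb = 0.5 * (R + Pi_K @ R.conj() @ Pi_K)   # forward-backward averaged matrix
R_real = np.real(Q.conj().T @ R @ Q)        # real part of the transformed matrix
T_fb = Q.conj().T @ R_fb @ Q                # transform of the averaged matrix

print(np.max(np.abs(T_fb.imag)))            # numerically zero
print(np.max(np.abs(T_fb.real - R_real)))   # numerically zero
```

Because the two results agree, selecting the real part costs no extra arithmetic for the backward averaging, which is the property exploited by the processor.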
### 3.3 Digital Signal Processor Design

The computation flow of the unitary MUSIC DOA estimation involves 4 main steps [34,101]: (1) estimation of the correlation matrix including unitary transformation and spatial smoothing, if necessary, (2) EVD of the correlation matrix, (3) computation of MUSIC spectrum, and (4) 1-dimensional peak search. This section describes the digital signal processor design for the dominant procedures in a unitary MUSIC DOA estimation algorithm, where we assume that the antenna array has a 4-element linear geometry for simplicity. It is possible to extend the processor using more antenna elements in a different geometry.

The computation flow of DOA estimation by the spectral MUSIC algorithm is illustrated in Fig. 3.2. First, the correlation matrix $R_{xx}(t)$ is computed by $E[x(t) \cdot x^H(t)]$, where $x(t)$ is the data vector received at the array antenna, $E[\cdot]$ is the expectation operator, and the superscript $H$ denotes the Hermitian conjugate operator. In reality, the expectation is approximated by averaging the correlation matrix over a finite number of snapshots. In addition, the spatial smoothing process decreases the correlation between the incident signals so that they can be separated even when the signals are fully correlated (coherent) with one another. The correlation matrix is decomposed into signal and noise subspace eigenvectors by EVD, and the DOAs can be determined by computing the angular spectrum with an inner product of the noise subspace and the array mode vectors [31].

It appears to be relatively simple to implement the correlation matrix computation, the spatial smoothing filter, and the spectrum synthesis with a dedicated circuit of FPGAs by applying any fast algorithm to them. However, EVD computation, in particular, is rather complex and not suitable for FPGA implementation. In general, the EVD process is believed to have $30 \sim 50\%$ of the entire computational load of the DOA estimation. In fact, there are many algorithms for EVD problems; however, these are merely numerical solutions for serial processing on general-purpose computers with floating-point arithmetic. Therefore, it is necessary to study a hardware-friendly algorithm that is suitable for the parallelism of dedicated circuit, such as FPGAs.

#### 3.3.1 Correlation Matrix with Unitary Transformation

UMP can extract a signal from the correlated mixed signals using the spatial smoothing technique [44]. With the unitary transformation, EVD, which has the largest computational cost among all procedures, can be solved using only real number arithmetic. In addition, the backward averaging effect is obtained by selecting only the real part in Eq. (3.10). Thus, Forward-and-Backward (FB) spatial smoothing can be simultaneously achieved (see Appendix A) [34]. The first step in the unitary MUSIC algorithm is to transform the input data vector $x$ into $y$ with a unitary matrix $Q$ as

$$y_i = Q^H x_i,$$  \hspace{1cm} (3.13)

where $\mathbf{x}_i$ and $\mathbf{y}_i$ ($i = 1, \cdots, M$) are the $M$ sub-vectors into which the snapshot vector is divided and the corresponding transformed sub-vectors for spatial smoothing, respectively, as

$$\mathbf{x}_i = [x_i, \cdots, x_{i+K-M}]^T,$$  \hspace{1cm} (3.14)

$$\mathbf{y}_i = [y_i, \cdots, y_{i+K-M}]^T.$$  \hspace{1cm} (3.15)

Fig. 3.2: Computational Flow of MUSIC Method (obtain the correlation matrix $\mathbf{R}_{xx}$; spatial smoothing; obtain the eigenvectors corresponding to the noise eigenvalues; generate the MUSIC spectrum with the eigenvectors).

By unitary transformation, the real-valued correlation matrix $\hat{\mathbf{R}}_{yy}$ is computed by Eqs.(3.3) and (3.10). In reality, however, it is approximated by uniform averaging with certain numbers of snapshots sampled at time $nT$, where $T$ is a sampling period, as

$$\hat{\mathbf{R}}_{yy}(n) = \frac{1}{N} \sum_{m=n-N+1}^{n} \sum_{i=1}^{M} \text{Re}\{\hat{\mathbf{R}}_{yy,i}(m)\},$$  \hspace{1cm} (3.16)

where

$$\hat{\mathbf{R}}_{yy,i}(m) = y_i(m)y_i^H(m),$$  \hspace{1cm} (3.17)

and $N$ is the number of snapshots. This type of averaging can be implemented by a sliding boxcar FIR filter with unit gain. In the case of hardware implementation, this filter requires such a long shift register that the memory resources are exhausted. Therefore, the first order IIR (Infinite Impulse Response) exponential averaging filter, as shown in Fig. 3.3, is a reasonable choice. It requires only a single register. In this figure, the correlation matrix can be written by

$$\hat{\mathbf{R}}_{yy}(n) = \beta\hat{\mathbf{R}}_{yy}(n-1) + (1 - \beta) \sum_{i=1}^{M} \text{Re}\{y_i(n)y_i^H(n)\},$$  \hspace{1cm} (3.18)

where $\beta$ ($0 < \beta < 1$) is a real positive forgetting factor. This type of filter can quickly respond to a non-stationary channel if the forgetting factor $\beta$ is determined appropriately\footnote{In this study, $\beta$ was typically selected as $0.75\,(= 1 - \frac{1}{4})$ or $0.875\,(= 1 - \frac{1}{8})$ for easy fixed-point arithmetic with shift and add operations.}.
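When the forgetting factor has the form $\beta = 1 - 2^{-k}$, the update of Eq.(3.18) reduces to shifts and additions. The following sketch illustrates this with integer arrays; the function name, the constant test input, and the truncating right shift are assumptions chosen for the illustration and do not describe the actual circuit.

```python
import numpy as np

def exp_average_shift_add(R_prev, X, k=3):
    """One update of Eq.(3.18) with beta = 1 - 2**-k using only shifts and adds.

    R_prev and X are integer (fixed-point) arrays; X stands for the
    instantaneous term sum_i Re{y_i y_i^H}. Right shifts truncate, as a
    simple hardware shifter would.
    """
    return R_prev - (R_prev >> k) + (X >> k)

X = np.full((4, 4), 1024, dtype=np.int64)    # constant input for illustration
R = np.zeros((4, 4), dtype=np.int64)         # the single register of the filter
history = []
for _ in range(200):
    R = exp_average_shift_add(R, X, k=3)     # beta = 0.875
    history.append(int(R[0, 0]))

print(history[:5], history[-1])
```

For a constant input the register settles at the input value, which corresponds to the unit gain of the averaging filter.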
#### 3.3.2 MUSIC Spectrum Computation

After the EVD procedure, the MUSIC spectrum of Eq. (3.9) is computed. In order to reduce the system complexity in the fixed-point system, only the denominator in Eq. (3.9) is considered. This reciprocal spectrum can be generated from the sum of spatial DFT spectra of \((K - L)\) noise eigenvectors, which have been transformed to complex values by the inverse unitary transformation, as

\[
P_{MU,\text{reciprocal}} = \sum_{i=L+1}^{K} |\text{DFT}\{Q \cdot e_i\}|^2, \tag{3.19}\]

where \(e_i\) is the \(i\)-th eigenvector belonging to the noise subspace.

The angular spectrum in the spectral MUSIC algorithm must be computed in order to determine the DOAs of the incoming signals. An alternative exists among the MUSIC algorithms: root-MUSIC determines the DOAs by solving for the roots of the MUSIC polynomial [34, 35]. However, finding the roots of a polynomial with complex coefficients is less suitable for dedicated circuits with fixed-point arithmetic, such as FPGAs. A simple iterative algorithm is better suited to a fast digital signal processor implementation on FPGAs. The spatial DFT technique can be applied to compute the angular MUSIC spectrum; it offers well-understood performance as well as the simplicity of an FFT algorithm. The DFT can be applied to the computation of the MUSIC angular spectrum as follows.

The simple continuous spatial signal model for a 1-dimensional distance is typically given by

\[
x_d = u(t) \cdot e^{-j2\pi f_{\text{spc}} d}, \tag{3.20}\]

where \(u(t)\) includes all the complex-valued time-varying components, and \(d\) and \(f_{\text{spc}}\) are the distance from the first reference antenna element and the spatial frequency, respectively. In the case of array antenna signal processing, we can consider the elements of a snapshot vector as the sampled data at a distance of \(m \cdot D_{\text{spacing}}\) in each element of antenna sensors. The distance between each antenna is a discrete value as

\[
d \rightarrow m \cdot D_{\text{spacing}}, \tag{3.21}\]

where \(m\) and \(D_{\text{spacing}}\) are the index of the discrete distance and the antenna spacing (or spatial period), respectively. However, the spatial frequency depending on the incident DOA still remains continuous. Under the plane wave assumption, the continuous spatial frequency can be replaced by \(P\) discrete spatial frequencies as

\[
2\pi \cdot f_{\text{spc}} \rightarrow \frac{2\pi}{P \cdot D_{\text{spacing}}} k, \tag{3.22}
\]

where \(k\) is the index of the discrete spatial frequency. Based on this, the discrete spatial frequency domain components can be obtained by applying the spatial \(P\)-point DFT as

\[
X_d[k] = \frac{1}{P} \sum_{m=0}^{P-1} x_d[m] \cdot e^{-j \frac{2\pi}{P} m k}, \tag{3.23}
\]

when \(D_{\text{spacing}} = \lambda/2\). Eventually, the discrete wavefront \(\theta\) can be computed from the following relationships of Eqs.(3.24)-(3.25).

\[
\theta = \sin^{-1} \left( \frac{k}{P/2} \right)
\]

Fig. 3.4: Relationship between DFT indices (\(k\) and \(l\)) and non-uniform discrete wavefront (\(\theta\)) in reciprocal MUSIC spectrum (\(P = 256\)).
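As a sketch of how this relationship follows, assume the phase progression $e^{-j\pi m \sin\theta}$ of the array response vector for half-wavelength spacing and match it to the spatial signal model and to the DFT kernel. Treating the indices $k \geq P/2$ as negative spatial frequencies $k - P$ is an additional assumption, consistent with the index rearrangement used for Fig. 3.4.

```latex
\begin{align*}
e^{-j 2\pi f_{\text{spc}}\, m D_{\text{spacing}}} = e^{-j\pi m \sin\theta}
  \quad &\Rightarrow \quad 2\pi f_{\text{spc}} D_{\text{spacing}} = \pi \sin\theta, \\
2\pi f_{\text{spc}} D_{\text{spacing}} \rightarrow \frac{2\pi}{P} k
  \quad &\Rightarrow \quad \sin\theta = \frac{2k}{P} = \frac{k}{P/2}.
\end{align*}
```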

From the above facts, it can be observed that in the MUSIC algorithm, the spatial DFTs of the noise subspace eigenvectors in Eq.(3.19) provide the distribution of the spatial frequency. However, the spatial spectrum obtained by a DFT with only as many spatial samples as there are antenna elements has coarse resolution, because the antenna array length is usually limited for practical reasons; an accurate estimation is not possible in this case. Therefore, the interpolation of the spectrum should be considered. According to digital signal processing theory, the DFT spectrum can be generated finely and smoothly by appending zeros to the spatial samples of the noise eigenvector elements (zero padding). The spectrum generated by the spatial DFT is completely equivalent to the denominator of Eq. (3.9).

Fig. 3.5: Non-uniform discrete wavefront (DOA) corresponding to DFT index (\(P = 256\)).

Figure 3.4 shows an example of the relationship among the DFT index \(k\), rearranged index \(l\), and corresponding discrete wavefront \(\theta\) in the reciprocal MUSIC spectrum (denominator of Eq. (3.9)) generated by spatial DFT (DFT bin \(P = 256\)). In other words, Fig. 3.4(b) is the rearranged version of Fig. 3.4(a) (actual order stored in memories) by the relation

\[
l = \begin{cases} 
  k + P/2 & (k < P/2) \\
  k - P/2 & (k \geq P/2) 
\end{cases}
\]  

(3.26)

From Eqs. (3.25) and (3.26), the concrete discrete wavefronts (DOAs) in Fig. 3.4(c) are obtained as

\[
\theta = \sin^{-1}\left(\frac{l - P/2}{P/2}\right). 
\]

(3.27)
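The zero-padded spatial DFT and the index rearrangement of Eqs.(3.26) and (3.27) can be sketched as follows. The example uses an exact correlation matrix for two waves from 0° and 30° so that the result is deterministic. The function name is hypothetical, and the eigenvectors are conjugated before the FFT so that each bin equals the inner product with the array response vector under the sign convention of Eq.(3.2); this detail is an assumption of the sketch.

```python
import numpy as np

def reciprocal_music_fft(En, P=256):
    """Reciprocal MUSIC spectrum on the P discrete wavefronts of Eq.(3.27).

    En holds the complex noise eigenvectors as columns. Each conjugated
    eigenvector is zero-padded to P points and transformed by an FFT, and the
    squared magnitudes are summed over the eigenvectors.
    """
    spec_k = np.sum(np.abs(np.fft.fft(En.conj(), n=P, axis=0)) ** 2, axis=1)
    spec_l = np.fft.fftshift(spec_k)                  # rearrangement of Eq.(3.26)
    l = np.arange(P)
    theta_deg = np.rad2deg(np.arcsin((l - P / 2) / (P / 2)))   # Eq.(3.27)
    return spec_l, theta_deg

K, L, P = 8, 2, 256
m = np.arange(K)[:, None]
A = np.exp(-1j * np.pi * m * np.sin(np.deg2rad([0.0, 30.0]))[None, :])
R = A @ A.conj().T + 0.01 * np.eye(K)                 # exact correlation matrix, SNR = 20 dB
_, U = np.linalg.eigh(R)                              # eigenvalues in ascending order
En = U[:, :K - L]                                     # noise subspace eigenvectors

spec, theta = reciprocal_music_fft(En, P)
doa_est = np.sort(theta[np.argsort(spec)[:L]])        # the L deepest grid points
print(doa_est)
```

Both simulated DOAs fall exactly on the discrete wavefronts for $P = 256$ ($l = 128$ and $l = 192$), so the deepest points of the reciprocal spectrum coincide with them; for DOAs between grid points the estimate is limited by the grid spacing.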

As shown in Fig. 3.4(c), the discrete wavefronts generated by the spatial DFT spectrum are not uniformly spaced. Thus, the estimation resolution in the angular region close to the endfire direction becomes lower as the angular spacing gets wider. However, this can possibly be neglected in the practical sectorized base station configuration. According to Eq. (3.27), the DOA \(\theta\) is an inverse sinusoidal function of \(l\). Figure 3.5 shows the effect of non-uniform discrete wavefronts. In the region between approximately $-30$ and $30$ degrees, the estimation resolution can be approximated by a linear function, the gradient of which is given by the derivative of Eq.(3.27) at \( l = P/2 \) (broadside) as

\[
\frac{d\theta}{dl}\bigg|_{l=P/2} = \frac{1}{P/2} \ \text{[rad]}. \tag{3.28}
\]

In this linear region, the estimation resolution (angular spacing) can be regarded as almost uniform. Furthermore, the estimation resolution becomes finer with a larger DFT length \( P \) because the angular spacing is inversely proportional to \( P \). When \( P \) is 256, the estimation resolution is approximately $0.4476^\circ$ in the linear region from Eq.(3.28).

\(^3\)This implies that the angular power spectrum is obtained by scanning the main beam toward all directions (\(P\) discrete wavefronts).

#### 3.3.3 Local Minima (LM) Detection

Instead of determining peaks in the MUSIC spectrum of Eq.(3.9), the local minima (LM) of the reciprocal MUSIC spectrum generated by the DFT in Eq.(3.19) can be detected; this enables a simple division-free implementation. The LM detection can be implemented by a memory scanning circuit comprising a two-word shift register, a comparator, and XOR and AND logic gates, as shown in the LM detection section of the entire DSP block diagram in Fig.3.6. The comparator compares the two words \( s(n-1) \) and \( s(n) \) loaded in the shift register and outputs a one-bit decision result \( c(n) \) that indicates the sign of the derivative between the indices \((n-1)\) and \( n \). The XOR (\(\oplus\)) output \( d(n) \) marks any change of \( c(n) \), and the AND (\(\wedge\)) output \( e(n) \) of \( c(n) \) and \( d(n) \) notifies the memory writing controller of the transition of \( c(n) \) \((0 \to 1)\), i.e., of a local minimum, as

\[
\begin{aligned}
    c(n) &= \begin{cases} 
    0 & \text{if } s(n-1) > s(n) \\
    1 & \text{otherwise}
    \end{cases} \\
    d(n) &= c(n-1) \oplus c(n) \\
    e(n) &= c(n) \wedge d(n).
\end{aligned}
\tag{3.29}
\]

This procedure can be implemented by a simple logic circuit with high-speed operation.
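The behavior of this logic can be illustrated in software. The following sketch scans a spectrum stored in a list and applies the comparator, XOR, and AND operations of Eq.(3.29) sample by sample. The function name and the initial comparator state are assumptions, and the reported index is $n-1$ because the transition is observed one sample after the minimum.

```python
def detect_local_minima(s):
    """Return the indices of the local minima found by the comparator/XOR/AND logic."""
    minima = []
    c_prev = 1                                # assumed initial state: no falling slope seen yet
    for n in range(1, len(s)):
        c = 0 if s[n - 1] > s[n] else 1       # comparator output c(n)
        d = c_prev ^ c                        # XOR: c(n) differs from c(n-1)
        e = c & d                             # AND: the change was a 0 -> 1 transition
        if e:
            minima.append(n - 1)              # the minimum sits at the previous index
        c_prev = c
    return minima

print(detect_local_minima([5, 3, 1, 2, 4, 3, 2, 6]))
```

Only one comparison and two one-bit gate operations are needed per sample, which is why the scan can run at the memory read rate.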
### 3.4 Eigenvalue Decomposition Processor

In the next step, the correlation matrix is eigen-decomposed by the EVD processor. Generally, the subspace-based methods, such as the MUSIC algorithm, are based on the EVD of the covariance (or correlation) matrix. In an EVD-based system, the complex logic and heavy computational load are usually disadvantages for real-time processing. This paper proposes a circuit design of a fast EVD processor that is suitable for real-time processing and can be practically used for adaptive antenna processing. It uses the Cyclic Jacobi method, which is well known for its simple algorithm and easy implementation, although its convergence is slower than that of other factorization algorithms such as the QR method [97]. Nevertheless, when fast parallel computation of the EVD with a dedicated circuit such as an FPGA or ASIC is considered, the Cyclic Jacobi method is a suitable choice since it offers a very high degree of parallelism and easier implementation than the QR method [98]. This study uses the hardware-friendly CORDIC algorithm for the fixed-point vector rotators and arctangent computers that are the basic processors of this design.

#### 3.4.1 Cyclic Jacobi Method

This section describes the basic principle of the Cyclic Jacobi method, an EVD computation technique. It can be implemented with a simple iterative process of plane rotations. This method solves symmetric eigenvalue problems by applying a sequence of orthonormal rotations to the left and right sides of the target matrix $R \in \mathbb{R}^{N \times N}$ as

$$
E^T \cdot R \cdot E = D, \qquad
\begin{cases}
E = J_1 \cdot J_2 \cdot J_3 \cdots \\
J = W_{12} \cdot W_{13} \cdots W_{N-1,N}
\end{cases}
\tag{3.30}
$$

where $W_{pq}$ is an orthonormal plane rotation over an angle $\theta$ in the $(p,q)$ plane, whose elements are $w_{pp} = \cos \theta$, $w_{pq} = \sin \theta$, $w_{qp} = -\sin \theta$, and $w_{qq} = \cos \theta$ ($p < q$), and is defined as Eq.(3.31). $J$ is the product of the rotations $W_{pq}$ taken in the cyclic-by-row order of $(p,q)$, which is termed a Jacobi sweep; the superscript $T$ and $N$ denote the transposition operator and the array length, respectively.

$$
W_{pq} =
\begin{pmatrix}
1 & & & & & & \\
 & \ddots & & & & & \\
 & & \cos \theta & \cdots & \sin \theta & & \\
 & & \vdots & \ddots & \vdots & & \\
 & & -\sin \theta & \cdots & \cos \theta & & \\
 & & & & & \ddots & \\
 & & & & & & 1
\end{pmatrix}
\tag{3.31}
$$

Theoretically, as the number of transformations of the matrix $R$ tends to infinity, $E$ converges to the matrix whose column vectors are the eigenvectors, and $D$ converges to the matrix whose diagonal elements are the eigenvalues. A symmetric matrix \( R \) is transformed to \( R' \) by plane rotation as

\[
R' = W_{pq}^T \cdot R \cdot W_{pq}
= \begin{pmatrix}
\cdots & r_{1p}' & \cdots & r_{1q}' & \cdots \\
r_{p1}' & r_{pp}' & \cdots & r_{pq}' & r_{pN}' \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
r_{q1}' & r_{qp}' & \cdots & r_{qq}' & r_{qN}' \\
\cdots & r_{Np}' & \cdots & r_{Nq}' & \cdots
\end{pmatrix}.
\tag{3.32}
\]

By the above transformation, only the \( p \)-th and \( q \)-th rows and columns of \( R' \) are changed as Eq.(3.32). The optimal rotation angle in a \((p,q)\) plane is determined by Eq.(3.33). The basic strategy of the Cyclic Jacobi method is that the iterative process of the plane rotation converges the \((p,q)\) and \((q,p)\) elements of the target matrix into zero.

\[
r_{pq}' = r_{qp}' = 0
\]

(3.33)

In a similar manner as described above, matrix \( R \) converges into a diagonal eigenvalue matrix. In the Cyclic Jacobi method, the off-diagonal quantity \( S^{(h)} \) is defined by

\[
S^{(h)} = \sqrt{\frac{1}{2} \left[ \| R^{(h)} \|_F^2 - \sum_{i=1}^{N} \{ r_{ii}^{(h)} \}^2 \right]},
\]

(3.34)

where \( \| \cdot \|_F \) denotes the Frobenius norm. Therefore, if \( S^{(h)} \) converges into zero, the target matrix \( R \) becomes the eigenvalue diagonal matrix as

\[
\lim_{h \to \infty} S^{(h)} = 0 \iff \lim_{h \to \infty} R^{(h)} = \text{diag}[\lambda_1, \cdots, \lambda_N]. \tag{3.35}
\]

On the other hand, the execution of a similarity transformation yields

\[
\left[S^{(h+1)}\right]^2 = \left[S^{(h)}\right]^2 - \left[\{ r_{pq}^{(h)} \}^2 - \{ r_{pq}^{(h+1)} \}^2 \right].
\]

(3.36)

From Eq.(3.36), the maximal reduction of \( S^{(h)} \) is obviously obtained if \( r_{pq}^{(h+1)} = 0 \). The condition for an optimal angle and maximal reduction of \( S^{(h)} \), is achieved as

\[
\theta_{opt} = \frac{1}{2} \tan^{-1} \left[ \frac{2r_{pq}}{r_{pp} - r_{qq}} \right] = \frac{1}{2} \tan^{-1} \tau,
\]

(3.37)

where \( \tau = \frac{2r_{pq}}{r_{pp} - r_{qq}} \) [97, 98].
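The procedure can be illustrated with a minimal floating-point sketch of the Cyclic Jacobi iteration. It records the off-diagonal quantity of Eq.(3.34) after every sweep. The function names are hypothetical, and the sign of the rotation is chosen here so that the angle of Eq.(3.37) annihilates $r_{pq}$; this sign convention and the handling of $r_{pp} = r_{qq}$ are assumptions of the sketch, not a description of the processor.

```python
import numpy as np

def off_diagonal_quantity(R):
    """S of Eq.(3.34), evaluated directly from the off-diagonal elements."""
    off = R - np.diag(np.diag(R))
    return np.sqrt(0.5 * np.sum(off ** 2))

def cyclic_jacobi(R, sweeps=4):
    """Cyclic-by-row Jacobi EVD of a real symmetric matrix (floating point)."""
    R = np.array(R, dtype=float)
    N = R.shape[0]
    E = np.eye(N)                                   # accumulates the eigenvectors
    S_history = [off_diagonal_quantity(R)]
    for _ in range(sweeps):
        for p in range(N - 1):
            for q in range(p + 1, N):
                diff = R[p, p] - R[q, q]
                if diff == 0.0:
                    theta = np.copysign(np.pi / 4, R[p, q])
                else:
                    theta = 0.5 * np.arctan(2.0 * R[p, q] / diff)   # Eq.(3.37)
                c, s = np.cos(theta), np.sin(theta)
                W = np.eye(N)
                W[p, p], W[p, q] = c, -s            # assumed sign convention
                W[q, p], W[q, q] = s, c
                R = W.T @ R @ W                     # similarity transformation
                E = E @ W
        S_history.append(off_diagonal_quantity(R))
    return np.diag(R).copy(), E, S_history

rng = np.random.default_rng(2)                      # hypothetical seed
M = rng.uniform(-1.0, 1.0, (6, 6))
M = (M + M.T) / 2.0                                 # random real symmetric matrix
eigenvalues, E, S_history = cyclic_jacobi(M, sweeps=4)
print(S_history)
```

Each rotation removes the square of the annihilated element from the squared off-diagonal quantity, so the recorded values do not increase from sweep to sweep.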

#### 3.4.2 CORDIC Algorithm

The CORDIC algorithm is operated in one of two modes: rotation mode and vectoring mode [99]. In this study, the rotation mode is used for fixed-point vector rotations and the vectoring mode is used for computing the optimal rotation angle by the arctangent of $\tau$. In the case of the rotation mode, the CORDIC equations are

$$\begin{align*}
    x_{i+1} &= x_i - y_i \cdot d_i \cdot 2^{-i} \\
    y_{i+1} &= y_i + x_i \cdot d_i \cdot 2^{-i} \\
    z_{i+1} &= z_i - d_i \cdot \tan^{-1} (2^{-i}) \\
    d_i &= -1 \text{ if } z_i < 0, \quad +1 \text{ otherwise}
\end{align*}$$

(3.38)

where $d_i = \pm 1$ (direction of rotation) and $z$ is the angle accumulator. The CORDIC algorithm provides the following results after a finite number of iterations that is as many as the word length.

$$\begin{align*}
    x_n &= A_n (x_0 \cos z_0 - y_0 \sin z_0) \\
    y_n &= A_n (y_0 \cos z_0 + x_0 \sin z_0) \\
    z_n &= 0 \\
    A_n &= \prod_{i=0}^{n} \sqrt{1 + 2^{-2i}}
\end{align*}$$

(3.39)

where $A_n$ is a computational gain.

On the other hand, the CORDIC equations for the vectoring mode are

$$\begin{align*}
    x_{i+1} &= x_i - y_i \cdot d_i \cdot 2^{-i} \\
    y_{i+1} &= y_i + x_i \cdot d_i \cdot 2^{-i} \\
    z_{i+1} &= z_i - d_i \cdot \tan^{-1} (2^{-i}) \\
    d_i &= +1 \text{ if } y_i < 0, \quad -1 \text{ otherwise}
\end{align*}$$

(3.40)

which also provides the following result after a finite number of iterations, as

$$\begin{align*}
    x_n &= A_n \sqrt{x_0^2 + y_0^2} \\
    y_n &= 0 \\
    z_n &= z_0 + \tan^{-1}(y_0/x_0) \\
    A_n &= \prod_{i=0}^{n} \sqrt{1 + 2^{-2i}}
\end{align*}$$

(3.41)

After the final step, scaling operation must be performed elsewhere in the system. From Eqs.(3.39) and (3.41), the scaling factor $K_n$ is obtained by

$$K_n = \frac{1}{A_n} = \prod_{i=0}^{n} \frac{1}{\sqrt{1 + 2^{-2i}}}. \tag{3.42}$$

Due to the use of $2^0$ for the tangent in the first iteration, the rotation and vectoring modes of CORDIC algorithm are usually restricted to rotation angles between $-\pi/2$ and $\pi/2$.
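Both modes can be illustrated with a short floating-point sketch of Eqs.(3.38) and (3.40). Multiplications by $2^{-i}$ stand in for the right shifts of the hardware, and `math.atan` stands in for the arctangent table. The function names and the choice of 16 iterations (indices $i = 0, \ldots, 15$) are assumptions made for the illustration.

```python
import math

def cordic(x, y, z, n=16, mode="rotation"):
    """Run n CORDIC iterations of Eq.(3.38) (rotation) or Eq.(3.40) (vectoring)."""
    for i in range(n):
        if mode == "rotation":
            d = -1 if z < 0 else 1          # drive the angle accumulator to zero
        else:
            d = 1 if y < 0 else -1          # drive y to zero
        t = 2.0 ** -i                       # a right shift by i bits in hardware
        x, y = x - y * d * t, y + x * d * t
        z = z - d * math.atan(t)            # arctangent table entry
    return x, y, z

def cordic_gain(n=16):
    """Computational gain accumulated over n iterations (i = 0 .. n-1)."""
    return math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(n))

A = cordic_gain(16)

# Rotation mode: rotate (1, 0) by 30 degrees, then remove the gain.
xr, yr, _ = cordic(1.0, 0.0, math.pi / 6, 16, "rotation")
print(xr / A, yr / A)

# Vectoring mode: magnitude and arctangent of (3, 4), starting from z0 = 0.
xv, _, zv = cordic(3.0, 4.0, 0.0, 16, "vectoring")
print(xv / A, zv)
```

After 16 iterations the residual angle is bounded by $\tan^{-1}(2^{-15})$, which is why the number of iterations is usually matched to the word length.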

**Circuit Architecture**

The CORDIC algorithm performs only shift and add operations; hence, it is easy to implement and suitable for a dedicated circuit in a fixed-point system. In the implementation of the CORDIC algorithm, we can choose various designs depending on which is given greater importance: circuit resources or performance. The CORDIC algorithm can be implemented by bit-serial or bit-parallel, and unrolled or rolled (iterative) architectures [61]. This study uses the bit-parallel unrolled CORDIC architecture for high performance. It is a cascade structure of consecutive CORDIC stages, as shown in Fig.3.7, where the arctangent values are precomputed and stored in a memory block.

**Arctangent Computer**

If the angle accumulator \( z \) is initialized with zero \((z_0 = 0)\), the arctangent \( \theta = \tan^{-1}(y_0/x_0) \) is directly computed using the vectoring mode [61]. The result is taken from the angle accumulator as

\[
z_n = z_0 + \tan^{-1}(y_0/x_0). \tag{3.43}
\]

**Scaling in Double Rotation Architecture**

In CORDIC processing, scaling must be performed to avoid overflow caused by the finite word-length of the fixed-point processor. However, as is seen from Eq.(3.42), the scaling and normalizing operations using pre-computed scaling values cannot be performed only with shift and add operations. It may not be easy to implement the division and square root operation in a fixed-point system.

Double rotation by \( z_0/2 \) can solve this problem [98]. Let an elementary rotation by the angle \( z_0 \) be composed of two successive rotations by \( z_0/2 \). Hence, Eq.(3.38) is rewritten as

\[
\begin{aligned}
x_{i+1} &= (1 - 2^{-2i})x_i - y_i \cdot d_i \cdot 2^{-i+1} \\
y_{i+1} &= (1 - 2^{-2i})y_i + x_i \cdot d_i \cdot 2^{-i+1} \\
z_{i+1} &= z_i - d_i \cdot \tan^{-1}(2^{-i}) \\
d_i &= -1 \ \text{if} \ z_i < 0, \quad +1 \ \text{otherwise}
\end{aligned}
\tag{3.44}
\]

It requires 4-shift and 5-add operations per iteration stage, as shown in Fig.3.8. The results after a finite number of iterations are

\[
\begin{aligned}
x_n &= A_n^{(ii)} (x_0 \cos z_0 - y_0 \sin z_0) \\
y_n &= A_n^{(ii)} (y_0 \cos z_0 + x_0 \sin z_0) \\
z_n &= 0 \\
A_n^{(ii)} &= \{A_n\}^2 = \prod_{i=0}^{n}(1 + 2^{-2i})
\end{aligned}
\tag{3.45}
\]

Fig. 3.7: Unrolled CORDIC Architectures (Rotation Mode)

Fig. 3.8: $k$-th stage of double rotation (Rotation Mode)

Fig. 3.9: Reduction of off-diagonal norm with various word-length (6 × 6 random real symmetric matrix).

In Eq.(3.45), the computational gain \( A_n^{(ii)} \) is free of the square root operation. In addition, double rotation makes the scaling factor free of the division operation, as in Eq.(3.46). This is because, for a given precision \( b \) of the shift and add operations, all factors \( (1 - 2^{-n}) \) with \( n > b \) do not contribute to \( K_n^{(ii)} \). Thus, the scaling factor can be approximated by a simplified form as

\[
K_n^{(ii)} = \{K_n\}^2 = \prod_{i=0}^{n} \frac{1}{1 + 2^{-2i}} = \frac{1}{2} \prod_{i=1}^{n} \frac{1 - 2^{-2i}}{1 - 2^{-4i}}
\approx \frac{1}{2} \prod_{i=1}^{n/4} \left(1 - 2^{-(4i-2)}\right), \tag{3.46}
\]

which can also be computed only with shift and add operations [98]. Fig. 3.8 shows the \(k\)-th stage of the double rotation. In this work, the double rotation unrolled architecture is used for efficient fixed-point implementation.
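The double rotation stages and the shift-and-add scaling can be checked with the following floating-point sketch. It assumes that the angle accumulator is loaded with $z_0/2$, so that the two rotations per stage add up to $z_0$, and it uses four factors of the approximated scaling product for 16 stages. The function names and these parameter choices are assumptions made for the illustration.

```python
import math

def double_rotation(x, y, z0, n=16):
    """Rotate (x, y) by z0 using n double-rotation stages (unscaled result)."""
    z = z0 / 2.0                               # assumed: accumulator tracks the half angle
    for i in range(n):
        d = -1 if z < 0 else 1
        t = 2.0 ** -i                          # a right shift by i bits in hardware
        x, y = (1 - t * t) * x - y * d * 2 * t, (1 - t * t) * y + x * d * 2 * t
        z = z - d * math.atan(t)
    return x, y

def scale_shift_add(v, terms=4):
    """Approximate scaling: v/2 multiplied by the factors (1 - 2**-(4i-2))."""
    v = v / 2.0                                # one right shift
    for i in range(1, terms + 1):
        v = v - v * 2.0 ** -(4 * i - 2)        # one shift and one subtraction
    return v

def exact_scale(n=16):
    """Exact scaling factor for n stages (i = 0 .. n-1)."""
    return math.prod(1.0 / (1.0 + 2.0 ** (-2 * i)) for i in range(n))

print(scale_shift_add(1.0), exact_scale(16))

x, y = double_rotation(1.0, 0.0, math.pi / 3)  # rotate (1, 0) by 60 degrees
print(scale_shift_add(x), scale_shift_add(y))
```

The approximated factor agrees with the exact product to within the resolution of a 16-bit word, so the scaling needs neither a divider nor a square root unit.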

#### 3.4.3 Fixed-Point Implementation

This section describes the fixed-point implementation of Jacobi EVD processor based on CORDIC. The required number of Jacobi sweeps, the appropriate precision (word-length) for the desired accuracy, and the applicability to the MUSIC DOA estimator are examined. First, it is necessary to determine the number of Jacobi sweeps required to provide the desired convergence level.

Figure 3.9 illustrates the reduction of the off-diagonal norm versus the number of Jacobi sweeps for a 6 × 6 random real symmetric matrix of, for example, 12-bit precision. The off-diagonal quantity is defined in Eq.(3.34), and a lower value implies that the matrix is closer to a diagonal matrix. On general-purpose computers, the 32-bit floating-point operation in the C language converges to machine zero within the given precision after several Jacobi sweeps; however, Fig.3.9 shows that fixed-point arithmetic with word-lengths from 12 to 36 bits does not converge but keeps fluctuating after approximately four Jacobi sweeps regardless of the precision. This is caused by the limited precision of the fixed-point operation. At the cost of computation accuracy, the fixed-point operation achieves simpler circuit implementation, high performance, and low power consumption. Without using an additional convergence decision circuit, fixing the number of sweeps at four may be an appropriate choice, since greater accuracy cannot be achieved even with the execution of more Jacobi sweeps. Since the number of operations is fixed in advance, the EVD is always computed in the same computation time.

Next, the precision of fixed-point arithmetic should be examined. Accuracy within an allowable error range should be guaranteed in order to verify a fixed-point system operation. Eq.(3.47) yields the error ratio where $v$'s are the vectors whose elements consist of eigenvalues computed in the subscripted ways. Further, $\| \cdot \|$ denotes the vector norm, and the error between vectors is defined by

$$
\text{Error} = \frac{\|v_{\text{float}} - v_{\text{fixed}}\|}{\|v_{\text{float}}\|}
$$

as shown in Fig.3.10.

Fig.3.11 shows the error ratio of the fixed-point arithmetic with various word-lengths on the CORDIC-Jacobi EVD processor with respect to the 32-bit floating-point operation on general-purpose computers. The input is a $6 \times 6$ random real symmetric matrix of 12-bit precision. The longer the word-length in the fixed-point arithmetic, the greater the accuracy that can be achieved. With more than 16-bit precision, the error relative to the floating-point operation was less than 0.5%. Approximately 16-bit precision is therefore desirable for practical use. In this work, the processor executes four Jacobi sweeps with a 16-bit word-length.
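The error ratio of Eq.(3.47) can be illustrated with a simplified software model. The sketch below is not the CORDIC-Jacobi processor: as a crude stand-in for fixed-point hardware it runs a floating-point Cyclic Jacobi iteration and rounds the matrix to a grid of `frac_bits` fractional bits after every rotation. The function names, the numbers of fractional bits, the seed, and the double-precision reference are all assumptions of this illustration, so the resulting values are not comparable with Fig.3.11.

```python
import numpy as np

def quantize(a, frac_bits):
    """Round to a fixed-point grid with frac_bits fractional bits."""
    scale = 2.0 ** frac_bits
    return np.round(a * scale) / scale

def jacobi_eigenvalues(R, sweeps=4, frac_bits=None):
    """Cyclic Jacobi eigenvalues; R is re-quantized after every rotation
    when frac_bits is given (simplified model of limited precision)."""
    R = np.array(R, dtype=float)
    N = R.shape[0]
    for _ in range(sweeps):
        for p in range(N - 1):
            for q in range(p + 1, N):
                diff = R[p, p] - R[q, q]
                if diff == 0.0:
                    theta = np.copysign(np.pi / 4, R[p, q])
                else:
                    theta = 0.5 * np.arctan(2.0 * R[p, q] / diff)
                c, s = np.cos(theta), np.sin(theta)
                W = np.eye(N)
                W[p, p], W[p, q], W[q, p], W[q, q] = c, -s, s, c
                R = W.T @ R @ W
                if frac_bits is not None:
                    R = quantize((R + R.T) / 2.0, frac_bits)   # keep R symmetric
    return np.sort(np.diag(R))

def error_ratio(v_float, v_fixed):
    """Relative error between two eigenvalue vectors, as in Eq.(3.47)."""
    return np.linalg.norm(v_float - v_fixed) / np.linalg.norm(v_float)

rng = np.random.default_rng(3)                     # hypothetical seed
M = rng.uniform(-1.0, 1.0, (6, 6))
M = quantize((M + M.T) / 2.0, 11)                  # input on a 12-bit grid (sign + 11 bits)
v_ref = np.linalg.eigvalsh(M)                      # double-precision reference
errors = {b: error_ratio(v_ref, jacobi_eigenvalues(M, sweeps=4, frac_bits=b))
          for b in (8, 10, 12, 14)}
print(errors)
```

In this model the error is dominated by the rounding accumulated over the rotations, which mirrors the observation that additional sweeps do not improve the accuracy once the precision limit has been reached.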

Regarding the application of the fixed-point arithmetic, the computation accuracy in a practical application should be carefully examined. The effect of the fixed-point operation on the computation accuracy was evaluated quantitatively. For example, Fig. 3.12 shows the result of DOA estimation performed by the spectral MUSIC method assuming that two waves arrive at a 4-element linear array antenna, where the EVD was computed by 32-bit floating-point arithmetic on a general-purpose computer and by 16-bit fixed-point arithmetic on the proposed CORDIC-based Jacobi EVD processor. The simulation results are sufficiently good: the peaks of the fixed-point spectrum appear at precisely the same angles as those of the floating-point spectrum. In this simulation, all the processes of the MUSIC method other than the EVD, such as the computation of the correlation matrix, the spatial smoothing, and the MUSIC spectrum, were performed on the general-purpose computer with floating-point operation.

---

4 This refers to the 32-bit floating-point operation with the float type (single precision) in a C compiler on an Intel Pentium machine.

Fig. 3.11: Average error ratio of fixed-point operations to 32-bit floating-point operation (6 x 6 random symmetric matrix).

Fig. 3.12: MUSIC spectrum in the case of a 4-element array antenna and two incident waves (DOAs are -5° and 20° with SNR = 20 dB); (a) 32-bit floating-point operation with a general computer and (b) 16-bit fixed-point operation by the CORDIC-Jacobi EVD processor.

3.4.4 Processor Structure

As mentioned above, approximately four Jacobi sweeps are sufficient for convergence, and the fixed-point operation with a 16-bit word-length gives a good spectral estimation result. Based on these facts, this section proposes FPGA implementation of the EVD processor.

Fig. 3.13 shows the computational flow of the Cyclic Jacobi EVD processor. As described earlier, the Jacobi-type EVD is a very simple algorithm: it merely iterates vector rotations until the desired convergence level is achieved. The optimal rotation angle is determined from Eq. (3.37); subsequently, the processor applies the similarity transformation to the correlation matrix $R$ and updates the unitary matrix $E$ of eigenvectors (whose initial value is the identity matrix $I$). After four Jacobi sweeps, the computation is complete: the resulting matrix $R$ has converged to the diagonal matrix of eigenvalues, and $E$ has become the unitary matrix of the eigenvectors, within the given precision.

The EVD processor consists of CORDIC vector rotators and CORDIC arctangent computers. With an optimum angle obtained by Eq. (3.37), the rotation $W_{pq}$ is determined as Eq. (3.31). The transformation $W_{pq}$ in Eq. (3.32) changes only the $p$-th and $q$-th rows and columns of the matrix $R$. Therefore, the transformation can be simplified as

$$
\begin{pmatrix}
  r'_p \\
  r'_q
\end{pmatrix} = \mathbf{w}^T_{pq} \cdot 
\begin{pmatrix}
  r_p \\
  r_q
\end{pmatrix} = 
\begin{pmatrix}
  \cos \theta & -\sin \theta \\
  \sin \theta & \cos \theta
\end{pmatrix} 
\begin{pmatrix}
  r_{p1} & \cdots & r_{pq} & \cdots & r_{pN} \\
  r_{q1} & \cdots & r_{qp} & \cdots & r_{qN}
\end{pmatrix},
$$

where $r_k$ and $w_{pq}$ denote the $k$-th row vector of the matrix $R$ and the $(p, q)$ plane rotation that is a sub-matrix of $W_{pq}$ in Eq. (3.31), respectively. Owing to the symmetry of the matrix $R$, the right-side transformation in Eq. (3.32) yields the same result; hence, it does not need to be computed separately, provided that a second rotation is applied only to the two vectors $[r_{pp}\ r_{qp}]^T$ and $[r_{pq}\ r_{qq}]^T$.
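The row transformation above can be sketched in floating-point Python as follows. This is only a minimal illustration under stated assumptions: the rotation angle is taken from the standard Jacobi condition that zeroes $r_{pq}$ (Eq. (3.37) is not reproduced here and may use a different but equivalent form), the column update is carried out explicitly instead of exploiting the symmetry of $R$, and ordinary `math.cos`/`math.sin` calls stand in for the CORDIC rotators. All function names are hypothetical.

```python
import math

def rotate_rows(M, p, q, c, s):
    """Apply the 2 x N plane rotation to rows p and q of M (in place)."""
    for k in range(len(M[p])):
        mp, mq = M[p][k], M[q][k]
        M[p][k] = c * mp - s * mq
        M[q][k] = s * mp + c * mq

def rotate_cols(M, p, q, c, s):
    """Apply the matching rotation to columns p and q of M (in place)."""
    for row in M:
        mp, mq = row[p], row[q]
        row[p] = c * mp - s * mq
        row[q] = s * mp + c * mq

def off_norm(M):
    """Frobenius norm of the off-diagonal part of M."""
    n = len(M)
    return math.sqrt(sum(M[i][j] ** 2 for i in range(n) for j in range(n) if i != j))

def cyclic_jacobi(R, sweeps=4):
    """Return (nearly diagonal copy of R, eigenvector matrix E) after a fixed number of sweeps."""
    n = len(R)
    R = [row[:] for row in R]
    E = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                b, diff = R[p][q], R[q][q] - R[p][p]
                if diff < 0:                      # keep |theta| <= pi/4
                    b, diff = -b, -diff
                theta = 0.5 * math.atan2(2.0 * b, diff)
                c, s = math.cos(theta), math.sin(theta)
                rotate_rows(R, p, q, c, s)        # left multiplication by w^T
                rotate_cols(R, p, q, c, s)        # right multiplication by w
                rotate_cols(E, p, q, c, s)        # accumulate the eigenvectors
    return R, E

R0 = [[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 1.0]]
D, E = cyclic_jacobi(R0)
print([D[i][i] for i in range(3)], off_norm(D))
```

Because the number of sweeps is fixed, the loop structure and hence the operation count are independent of the input data, which is the property the hardware design relies on.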

The most notable difference between the QR and Jacobi methods for the symmetric eigenvalue problem is the inherent parallelism of the latter. By grouping non-conflicting rotations, the rotations can be performed in parallel [96]. That is, the $N(N-1)/2$ rotations of a Jacobi sweep are grouped into $(N-1)$ sets, each consisting of $N/2$ non-conflicting rotations; hence, the complexity can be reduced from $O(N^3)$ to $O(N^2)$. In the CORDIC matrix rotator, the CORDIC vector rotators are executed simultaneously on multiple processors, exploiting the inherent parallelism of a dedicated circuit. It takes $N(N-1)/2$ matrix rotations per sweep to compute $R$ and $E$. Fig. 3.14 illustrates the architecture of the EVD processor core, where the ESB (Embedded System Block) is a memory block of the FPGA that stores the correlation matrix and the eigenvector matrix.
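The grouping into non-conflicting rotation sets can be illustrated with a short Python sketch. It uses the round-robin (circle) ordering, which is one common way to build such sets and is an assumption here; the ordering adopted in [96] may differ. The function name `parallel_rotation_sets` is hypothetical.

```python
def parallel_rotation_sets(n):
    """Group the n(n-1)/2 index pairs (p, q) into n-1 sets of n/2 disjoint pairs.

    Uses the round-robin (circle) method; n must be even.
    """
    if n % 2:
        raise ValueError("n must be even")
    order = list(range(n))
    sets = []
    for _ in range(n - 1):
        pairs = [tuple(sorted((order[i], order[n - 1 - i]))) for i in range(n // 2)]
        sets.append(pairs)
        # Keep the first index fixed and rotate the remaining ones by one place.
        order = [order[0], order[-1]] + order[1:-1]
    return sets

for step, pairs in enumerate(parallel_rotation_sets(6), start=1):
    print(step, pairs)
```

Within each set no row or column index appears twice, so the corresponding plane rotations touch disjoint rows and can be issued to separate rotators in the same time step.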

3.4.5 Computational Load and Expected Performance

The basic arithmetic operation of the EVD processor consists of only shifts and adds. In the double rotation CORDIC stages, the computational load yields

$$\left(4B + \frac{1}{4}B\right) \text{ Shifts and } \left(5B + \frac{1}{4}B\right) \text{ Adds,} \tag{3.49}$$

where $B$ is the word-length and the term $\frac{1}{4}B$ approximately accounts for the scaling operation, from Fig. 3.8 and Eq. (3.46). The number of vector rotations required for computing both the eigenvalues and the eigenvectors is $4N(N - 1)(N + 2)$. This is because $E$ performs 4 $J$'s, $J$ consists of $N(N - 1)/2$ $W$'s, and $W$ requires $(N + 2)$ vector rotations from Eq. (3.30), where $N$ is the matrix dimension\textsuperscript{5}. Therefore, by addition of the arctangent, the total computational load of the CORDIC-Jacobi EVD processor is

$$[4N(N - 1)(N + 2) + 1] \times \left(\frac{17}{4} B \text{ Shifts} + \frac{21}{4} B \text{ Adds}\right).$$

(3.50)

On the other hand, a rough estimate of the circuit scale obtained by configuring this computational load with the parallel architecture is approximately $(N + 1) \times 15K$ equivalent gates (single 16 bit-CORDIC processor could be synthesized using approximately 15$K$ equivalent gates). For example, if $N = 8$ (4-element array antenna with real matrix extension and no spatial smoothing) the total circuit scale is approximately 75$K$ equivalent gates. The estimated result is obtained by synthesizing the circuit described by VHDL (Very high speed integrated circuit Hardware Description Language) with the Leonardo Spectrum, Exemplar Logic Inc., where the target device was Altera’s FPGA, APEX20KC [102].

If this computation load is configured by the parallel circuit architecture and one clock cycle is required for the shift or add operation (worst case scenario), the first EVD result appears after

$$[4 \times N(N - 1) \times 2 + 1] \times (B + 1)$$

(3.51)

clock cycles. For example, if $N = 6$ and $B = 16$ (in the case of a 4-element array antenna and spatial smoothing with 2 sub-arrays of 3 sub-elements), and the operation is performed at a clock speed of 100 MHz, this system can perform approximately 13,200 EVDs per second (75.8 $\mu$s per EVD). It is noteworthy that pipeline processing, achieved by appropriately placing registers, can provide higher performance than that shown in the above example.

Considering the high-speed mobility at the higher carrier frequencies of next-generation communication, it is very difficult for a general-purpose processor to deliver the performance required for a system that is robust to fast fading. The proposed EVD processor is basically composed of combinatorial logic circuits. Advances in circuit technology are expected to enable high-speed operation of general combinatorial logic circuits. Such a fast EVD processor can therefore be of practical use in various applications.

\textsuperscript{5}In array antenna processing where the array length is $N$, $N$-by-$N$ correlation matrix with complex numbers can be computed by conversion into an extended form of $2N$-by-$2N$ matrix of only real numbers [97]. It is possible to apply the unitary transformation for the real number correlation matrix without extension of the matrix dimension.
Fig. 3.15: Block diagram of UMP digital signal processor.
Table 3.1: Core performances of dominant procedures (4-element with SS).

<table>
<thead>
<tr>
<th></th>
<th>Required Clocks</th>
<th>LEs (Logic Elements)</th>
<th>$f_{\text{max}}$ (MHz)</th>
<th>$t_{\text{min}}$ (μs)</th>
</tr>
</thead>
<tbody>
<tr>
<td>$R_{\text{yu}}$</td>
<td>$N$</td>
<td>8,301</td>
<td>27.4</td>
<td>0.04 $\times N$</td>
</tr>
<tr>
<td>EVD</td>
<td>1,836</td>
<td>4,045</td>
<td>110</td>
<td>16.69</td>
</tr>
<tr>
<td>FFT &amp; LM Detection</td>
<td>1,356</td>
<td>2,303</td>
<td>114</td>
<td>11.89</td>
</tr>
</tbody>
</table>

3.5 Hardware Implementation

The actual implementation was also attempted on a single FPGA. The entire block diagram of the digital signal processing procedures is shown in Fig. 3.15. It consists of four major sections: the Correlation Matrix Section, the EVD Section, the FFT Section, and the LM Detection Section. The word-length of every section is shown in the figure. It is assumed that the exact number of waves has been determined in advance by some other pre-processing method. The procedures were described in VHDL (Very high speed integrated circuit Hardware Description Language).

Table 3.1 lists the roughly estimated performance of the dominant processing cores, where LEs (Logic Elements, see Fig. 2.4) represents the number of occupied logic blocks in the FPGA and $f_{\text{max}}$ is the maximum clock frequency at which normal operation can be guaranteed. The minimum computation time $t_{\text{min}}$ is calculated as $M_{\text{clk}}/f_{\text{max}}$, where $M_{\text{clk}}$ is the number of required clocks, and $N$ is the number of snapshots. In this chapter, in order to reduce system complexity, we assumed a 4-element uniform linear array (ULA) antenna and two coherent or incoherent waves arriving at the antenna. A 256-point radix-4 complex FFT was employed to generate the spatial frequency spectrum [104]. The 256 spatial samples fed to the FFT consist of the 3 elements of the noise eigenvector (1 dimension of array freedom is used for spatial smoothing) followed by $(256 - 3)$ padded zeroes, which ensure that the spectrum is fine and smooth. All the computations were performed by fixed-point arithmetic with 12-bit input data from the ADCs. As shown in Table 3.1, if the FFT is performed only once in a 4-element UMP with spatial smoothing (in the presence of two incident waves), the EVD has the largest computational load. Pipeline scheduling enables us to divide the entire processing into sub-blocks. That is, if the three sub-blocks listed in Table 3.1 are pipelined, this system can perform a single realization of DOA estimation within approximately 17 μs. The inherent parallelism and optimizability of FPGAs resulted in this fast computation performance. In addition, recent FPGA manufacturing technology allowed for low electric power consumption of less than approximately 2 watts.
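The relation between the required clocks and $t_{\text{min}}$ in Table 3.1 can be checked with a few lines of Python. The clock counts and $f_{\text{max}}$ values are those of the table; the number of snapshots (136, one burst of the simulations in Section 3.6), the stage label for the first table row, and the helper name `t_min_us` are assumptions made for illustration.

```python
def t_min_us(required_clocks, f_max_mhz):
    """Minimum computation time in microseconds: required clocks / f_max."""
    return required_clocks / f_max_mhz

stages = {
    "EVD": (1836, 110.0),
    "FFT & LM detection": (1356, 114.0),
}
snapshots = 136  # assumed number of snapshots N for the correlation matrix stage

times = {name: t_min_us(clk, f) for name, (clk, f) in stages.items()}
times["Correlation matrix"] = t_min_us(snapshots, 27.4)

# In a pipelined design the slowest stage sets the period between DOA estimates.
bottleneck = max(times, key=times.get)
for name, t in times.items():
    print(f"{name}: {t:.2f} us")
print("pipeline period set by:", bottleneck)
```

With these figures the EVD stage is the slowest, which is consistent with the stated period of approximately 17 μs per DOA estimate.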

---

6 The implementation was attempted using three different FPGAs by Altera: APEX20KC, STRATIX EP1S25, and STRATIX EP1S40. For example, the STRATIX EP1S25 has approximately 0.6 million equivalent gates, 200 Kb internal memory blocks, and optimized digital signal processing (DSP) blocks [103].

7 The VHDL sources were synthesized and placed-and-routed by Leonardo Spectrum (Exemplar Logic Inc.) and Quartus II (Altera).

Figures 3.16 and 3.17 show the entire evaluation system configuration and its appearance, respectively. The system architecture is a super-heterodyne IF sampling receiver with quasi-coherent detection, as mentioned in Chapter 2. RF signals received at the antenna array are downconverted to the IF band in the analog downconversion receiver, where the RF and IF frequencies are 5 GHz and 40 MHz, respectively. The IF signals are then digitized by ADCs at a rate of 32 MSPS. The undersampled IF signals are digitally downconverted to complex baseband and subsequently downsampled by a factor of \( L \), where \( L \) is an appropriate integer. As shown in Fig. 3.16, a single FPGA on the AD board performs the digital signal processing of the UMP. The user terminal PC communicates with the SH4 CPU via Ethernet, and the CPU controls the UMP via a direct 32-bit data bus connection.
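The effect of undersampling the 40 MHz IF at 32 MSPS can be sketched as follows. The alias computation follows directly from the two stated frequencies; the quarter-rate mixer is an assumed, commonly used way to realize the digital downconversion and is not confirmed as the structure used in the UMP. The function names are hypothetical.

```python
def alias_frequency(f_signal, f_sample):
    """Apparent frequency of a real tone at f_signal after sampling at f_sample."""
    f = f_signal % f_sample
    return min(f, f_sample - f)

# Multiplier-free quarter-rate mixer: exp(-j*pi*n/2) takes only the values 1, -j, -1, j.
QUARTER_RATE_MIXER = (1, -1j, -1, 1j)

def downconvert_quarter_rate(samples):
    """Shift a real sequence centered at f_s/4 down to complex baseband."""
    return [x * QUARTER_RATE_MIXER[n % 4] for n, x in enumerate(samples)]

f_if, f_s = 40e6, 32e6
print(alias_frequency(f_if, f_s) / 1e6, "MHz")   # the 40 MHz IF appears at f_s / 4

tone = [1, 0, -1, 0] * 4                 # cos(2*pi*(f_s/4)*n/f_s) sampled at f_s
baseband = downconvert_quarter_rate(tone)
print(sum(baseband) / len(baseband))     # DC term of the downconverted tone
```

Since the aliased IF falls at one quarter of the sampling rate, the mixer needs no multipliers; the image left at half the sampling rate would be removed by the low-pass filter that precedes the downsampling.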
Fig. 3.17: Appearance of UMP digital processing unit integrated on 4-channel prototype board.
3.6 Performance Assessments

The performance of the MUSIC DOA estimation technique has been studied theoretically [34,37–39]. In this section, however, the performance of the designed fixed-point system is examined. We discuss the estimation performance of the UMP through hardware simulations with MATLAB (MathWorks) on an offline PC. Here, hardware simulation means that the fixed-point behavior of the UMP described in VHDL was reproduced exactly in a MATLAB m-file. It was assumed that the antennas and analog components preceding the UMP had ideal characteristics or were well calibrated. In other words, these simulations considered the digital signal processing part only and neglected all system-specific effects of the analog parts. This makes it more efficient to assess the system for various cases without operating all the system components. An input level adjustment circuit, such as AGC (Automatic Gain Control), was assumed in order to utilize the full-scale range of the ADCs (typically 12 bits); the performance of the UMP would otherwise be degraded because of the low dynamic range of the fixed-point operation with bit truncation. The equivalent baseband simulation model is illustrated in Fig. 3.18. In this section, all the simulations used 200 burst frames, where a single burst consisted of 136 snapshots (symbols). The source waves were $\pi/4$-shift QPSK modulated signals.

3.6.1 Noise Channel under Planar Wave Assumption

In the AWGN channel, the standard deviation of the estimated DOA as a function of the input SNR is a good overall measure of the estimation variation when a single wave impinges at a given angle from broadside. A similar approach has been used in [66]. Figure 3.19 shows the simulation results computed by the 4-element UMP using 16-bit fixed-point operation with spatial smoothing (SS) and by the offline PC with 64-bit floating-point (double precision) operation, where the lines marked with diamonds, right-pointing triangles, and left-pointing triangles denote the incident DOAs of 0, 30, and 50 degrees, respectively. In addition, the results of the 4- and 8-element UMP without SS at 0 degrees are shown by the lines marked with squares and circles, respectively.

---

Footnote: In MATLAB, all numbers are stored internally using the long format (64-bit double precision) specified by the IEEE floating-point standard.

Fig. 3.19: Standard deviation of estimated DOA when a single wave impinges at 0, 30, and 50 degrees from broadside.

Based on these results, it is clear that the estimation variation is below 2 degrees if the input SNR is greater than 5 dB within the linear region between -30 and +30 degrees. Further, it can be quantitatively confirmed that the estimation variation is improved by the use of more antennas. The UMP performs well in comparison with the offline PC, although its estimation variation is larger by some degrees because of its finite word-length (16-bit) fixed-point operation. In fact, it can be observed that the floating-point operation has an SNR gain of approximately 10 dB over the fixed-point operation; this, however, is an unavoidable trade-off between the two. In Fig. 3.19, the estimation variation increases as the DOA moves away from broadside. If an ideal omni-directional antenna pattern is considered, this is caused only by the non-uniform discrete wavefronts generated by the spatial DFT. The non-uniform discrete wavefront effect is illustrated in Fig. 3.20, which shows the DOA dependency of the mean DOA deviation \( \vartheta \), where

\[
\vartheta = E \{|\theta_{\text{estimated}} - \theta_{\text{true}}|\}. \tag{3.52}
\]
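The two statistics used in this section, the standard deviation of the estimated DOA and the mean DOA deviation of Eq. (3.52), can be computed as in the following sketch. The function names and the sample estimates are hypothetical, and the population form of the standard deviation (division by the number of trials) is an assumption.

```python
import math

def doa_std(estimates):
    """Standard deviation of the DOA estimates (population form)."""
    mean = sum(estimates) / len(estimates)
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / len(estimates))

def mean_doa_deviation(estimates, true_doa):
    """Mean absolute deviation E{|theta_estimated - theta_true|} as in Eq. (3.52)."""
    return sum(abs(e - true_doa) for e in estimates) / len(estimates)

# Illustrative estimates in degrees for a wave arriving from 30 degrees.
estimates = [29.6, 30.3, 30.1, 29.8, 30.4]
print(doa_std(estimates), mean_doa_deviation(estimates, 30.0))

# A biased estimator can have a small spread but a large mean deviation.
biased = [35.0, 35.1, 34.9]
print(doa_std(biased), mean_doa_deviation(biased, 30.0))
```

The second example shows why the two measures are complementary: the standard deviation alone does not reveal estimates that are consistently far from the true DOA.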

When multiple incident waves exist, the separability performance, which represents how well two spatially close waves can be distinguished from each other, can be used as a measure. The problem is simplified by considering only two incident waves. The criterion for successful estimation was

\[
r^\varphi = \arg \max_{\theta} \{ R^{\varphi}_{\text{std} < \rho} \cap R^{\varphi}_{\text{mean} < \rho} \}, \tag{3.53}
\]

where \( \varphi, \theta, \) and \( \rho \) are the given SNR, the incident DOA, and the condition value, respectively. \( r^\varphi \) and \( R^{\varphi} \) denote the separable angle and the range of \( \theta \) satisfying the subscripted condition for each estimation at the given SNR \( \varphi \), respectively. In this case, the condition (mean < \( \rho \)) eliminates far-off estimates that have small variations. Figure 3.21 shows the separability performance of the UMP, where the solid and dotted lines denote incoherent and coherent waves, respectively. The two waves had identical powers. Spatial smoothing (SS) is able to reduce the correlation of two coherent waves to some extent. In Fig. 3.21, the separability performance of the UMP improves as the EVD word-length gets longer; however, it is degraded if the two waves are highly correlated (coherent). This occurs because SS with two sub-matrices in fixed-point arithmetic cannot sufficiently reduce the correlation between the two waves. However, this will be improved by a larger number of sub-matrices in a longer array antenna and by longer word-length computation. The 16-bit fixed-point 4-element UMP without SS can separate two close incoherent waves whose DOA difference is approximately 10 degrees when the SNR is greater than 10 dB under the condition of $\rho = 3^\circ$, as shown in Fig. 3.21 (a).

Fig. 3.20: DOA dependency of estimation performance.

Fig. 3.21: Separability performance of two incident waves; (a) $\rho=3$ deg. and (b) $\rho=1$ deg.
3.6.2 Multipath Channel with Small Angle Spread

The above analysis is important for evaluating the behavior of the fixed-point UMP at a given SNR. However, the practical channel was not appropriately modeled in the above simulations. In wireless cellular communication, fading occurs due to multipath propagation. In such a fading environment, the SNR becomes a random variable following some distribution. In addition, a large number of coherent signals caused by far-field local scatterers arrive from spread DOAs, which reduces the fading correlation between antenna elements. Consequently, the plane wave model for point sources may not be valid, and the estimation performance is usually degraded in a multipath fading environment [40]. It is therefore extremely important to investigate the behavior of the UMP in a multipath fading environment. Cellular base station applications were considered in this study. The signals at the base station are received through the scattering process in the local scatterers around the mobile station because the antenna is usually deployed higher than the surrounding scatterers, as shown in Fig. 3.22. In this case, the fading across the antenna elements spaced by $\lambda/2$ is still highly correlated and the angle spread (AS) is typically only a few degrees. In such an AS model, the time delays among the sub-paths can be neglected. Hence, the channel vector for the $i$-th source in Eq. (3.2) can be rewritten approximately as

$$\mathbf{v}_i = \sum_{j=1}^{N_i} \beta_{ij} a(\theta_i + \hat{\theta}_{ij}),$$

where

- $\beta_{ij}$: Complex amplitude with Rayleigh distributed magnitude of $j$-th scattered sub-path
- $\hat{\theta}_{ij}$: DOA deviation of Gaussian random variable $\sim \mathcal{N}(0, \sigma_{\theta_i}^2)$
- $N_i$: Number of sub-paths for $i$-th source

and $2\sigma_{\theta_i}$ is defined as the AS. The relationship between the AS and the estimated mean DOA deviation given by Eq. (3.52) is shown in Fig. 3.23, where $N_1 = 30$ sub-paths are uniformly distributed about the center of a single wave source located at $\theta_i = 30^\circ$ and the average SNR is 10 dB. In this figure, the diamond line is the theoretically approximated result [40]. The UMP result is in good agreement with the theoretical approximation. Hence, the functionality of the UMP is observed to remain valid in a realistic multipath channel model.

Fig. 3.23: Mean DOA deviation to angle spread in UMP (average SNR=10dB).
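The channel vector with a small angle spread can be generated as in the following sketch, which sums sub-path steering vectors with complex Gaussian amplitudes (Rayleigh-distributed magnitudes) and Gaussian DOA deviations. The steering-vector phase convention, the power normalization of the amplitudes, and the function names are assumptions made for illustration.

```python
import cmath
import math
import random

def steering_vector(theta_deg, n_elements=4):
    """ULA steering vector for half-wavelength spacing (assumed phase convention)."""
    psi = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * m * psi) for m in range(n_elements)]

def spread_channel_vector(theta_deg, angle_spread_deg, n_paths=30, n_elements=4, rng=random):
    """Sum of n_paths sub-paths scattered around theta_deg; AS is defined as 2 * sigma."""
    sigma = angle_spread_deg / 2.0
    v = [0j] * n_elements
    for _ in range(n_paths):
        # Complex Gaussian amplitude, scaled so that the total average power is one.
        beta = complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2 * n_paths)
        a = steering_vector(theta_deg + rng.gauss(0, sigma), n_elements)
        v = [vi + beta * ai for vi, ai in zip(v, a)]
    return v

rng = random.Random(0)
for spread in (0.0, 2.0, 10.0):
    v = spread_channel_vector(30.0, spread, rng=rng)
    print(spread, [round(abs(x), 3) for x in v])
```

With zero angle spread the result is a scaled copy of the plane-wave steering vector, so all element magnitudes are equal; as the spread grows, the magnitudes across the elements start to differ, which is the departure from the point-source model discussed above.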
3.7 Real Data Examples

In this section, the entire system operation of the UMP is experimentally demonstrated in a radio anechoic chamber. A 4-element omni-directional sleeve antenna array with half-wavelength equal spacing in a linear geometry was used. The fine calibration procedure was not performed in this experiment; however, a coarse adjustment of amplitude and phase was conducted manually before the measurement. This example assumed two incident waves, and the data snapshot vector of these waves was generated as a linear combination of the data snapshot vectors of each single incident wave. The experimental parameters are listed in Table 3.2. The RF carrier frequency was 5 GHz. The transmitted waves were CW at different frequencies (the difference was approximately 1.3 MHz). Figure 3.24 shows the experimental result. The DOAs of the two waves were set to 0 and 40 degrees. The two waves were transmitted at the same power, and the average SNR at each antenna element was approximately 19 dB. In Fig. 3.24, the indices of the DFT spectrum corresponding to the LM points below an appropriate threshold level (a relative fixed-point level of 5,000 herein) were 127 and 208. Using the relationship given by Eq. (3.27), these indices of the discrete wavefronts were converted to \(-0.4476\) and 38.6822 degrees, respectively. The additional estimation error was caused by the lack of fine calibration for the antennas and analog components; however, the system operation could be sufficiently confirmed. In practice, the calibration of the entire system must be carried out before practical use.
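The conversion from DFT index to DOA can be sketched as below. Eq. (3.27) is not reproduced in this section, so the mapping used here, a 256-point spectrum centered at index 128 with half-wavelength element spacing, is an assumption; it does, however, reproduce the two angles reported above. The function name is hypothetical.

```python
import math

def dft_index_to_doa(index, n_fft=256):
    """Map a centered spatial-DFT bin to a DOA in degrees (half-wavelength ULA assumed)."""
    half = n_fft // 2
    return math.degrees(math.asin((index - half) / half))

for k in (127, 208):
    print(k, round(dft_index_to_doa(k), 4))

# The angular spacing between adjacent bins grows away from broadside.
print(dft_index_to_doa(129) - dft_index_to_doa(128))
print(dft_index_to_doa(227) - dft_index_to_doa(226))
```

Because the bins are uniform in $\sin\theta$ rather than in $\theta$, the angular grid becomes coarser away from broadside, which corresponds to the non-uniform discrete wavefront effect noted in Section 3.6.1.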

Table 3.2: Experimental parameters.

<table>
<thead>
<tr>
<th>Antennas</th>
<th>4-element sleeve antenna in ULA</th>
</tr>
</thead>
<tbody>
<tr>
<td>Antenna spacing</td>
<td>(\lambda/2)</td>
</tr>
<tr>
<td>RF frequency</td>
<td>5 GHz</td>
</tr>
<tr>
<td>IF frequency</td>
<td>40 MHz</td>
</tr>
<tr>
<td>Modulation</td>
<td>CW</td>
</tr>
<tr>
<td>Sampling frequency</td>
<td>32 MHz</td>
</tr>
<tr>
<td>ADC resolution</td>
<td>12 bits</td>
</tr>
</tbody>
</table>

\(^9\)Fine calibration implies the compensation of the mutual coupling effect among branches of antennas and receivers. The array calibration will be discussed in Chapter 5.
Fig. 3.24: Experimental result of MUSIC spectrum and its reciprocal (4 antennas, two waves impinging at 0 and 40 degrees, same powers, average SNR=19 dB).
3.8 Summary

In this chapter, an FPGA design of a fast DOA estimator using the unitary MUSIC algorithm was proposed, and its hardware implementation was also introduced. The unique features of this system are the fast computation and compact architecture of the EVD and of the MUSIC angular spectrum generation, realized with the CORDIC-based Cyclic Jacobi processor and the spatial DFT, respectively. All the digital signal processing procedures were computed by fixed-point operation with a finite word-length for high speed and low power consumption. Although the degree of optimization depends on the design technique, the hardware-friendly parallel algorithm and processing concept will remain valid and will outperform serial-architecture computers such as general-purpose CPUs.
Chapter 4

DOA-based Beamforming Processor

4.1 Introduction

A smart antenna beamforming system that uses the DOAs of the incident signals as the spatial channel information has certain significant advantages over temporal-reference-based combining techniques in specific environments. The first advantage is that DOA-based beamforming has superior SINR performance compared to other combining techniques for small AS [16]. This is reasonable in macro-cellular suburban environments with local scatterers around the mobile terminal. In addition, DOA-based techniques are directly applicable to downlink (forward link) beamforming because explicit directional information is known; hence, the base stations can steer the maximum transmitting power toward the desired user direction. However, the drawback is that DOA estimation is a time-consuming task. Furthermore, almost all DOA-based beamforming techniques require complex and large computational loads, such as the inversion of the complex-valued correlation matrix for the SIR or SINR optimization problems. In reality, the performance depends on the DOA estimation accuracy and the channel condition, such as the AS. On the other hand, a low-sidelobe beamformer, such as the Dolph-Chebyshev beamformer (DC-BF), is an alternative approach owing to its low complexity. The DC-BF requires only the DOA of the user signal in order to steer a quiescent beam in its direction, while the interferers are suppressed by the equi-ripple low sidelobes.

This paper introduces a simple and safe beamforming technique with a low-sidelobe beampattern using the Dolph-Chebyshev (DC) tapering window. It has extremely low complexity, requiring only a simple amplitude weighting operation, and is therefore suitable for implementation on high-speed logic devices such as FPGAs. The high-speed weight adaptation capability resulting from this low complexity may be extremely useful in various wireless communication systems for fast fading environments. In addition, the equi-ripple low sidelobe leads to reasonable interference suppression performance in an AS environment. However, the broadened beamwidth caused by the low-sidelobe beampattern affects the angular resolution; hence, the SINR performance will be degraded if the source separation between the signal of interest and the adjacent interferer is small relative to the beamwidth. The relationship between low sidelobe and beamwidth is an inherent trade-off. In this study, a novel approach with a cascade beamforming architecture is proposed to resolve this drawback of beamforming with a low-sidelobe beampattern.
Fig. 4.1: Adaptive beamforming operation in cellular basestation (uplink).

This chapter first surveys DOA-based beamforming systems that use the estimated DOAs of the incident signals and then proposes a novel beamforming technique, as mentioned above. A hardware implementation using an FPGA is also introduced.

4.2 DOA-based Beamforming Techniques

Digital beamforming can improve the SINR performance by adaptively controlling the directivity of the antenna radiation pattern, as shown in Fig. 4.1. Several excellent optimum beamforming algorithms, such as the Minimum Mean Square Error (MMSE), Directional Constrained Minimizing Power (DCMP), and Maximum SNR (MSN) techniques [4,5,7,12], have already been studied. In addition, the zero-forcing (ZF) technique based on channel response estimation can be considered as another choice; it forces zeros in the interferers’ directions on the antenna pattern [1,53].

This paper assumes that the DOAs of the incident signals are estimated and that the user signal is identified in advance by the DOA estimation process. Beamforming is then performed based on the estimated DOA information. The candidates are the optimum beamformers with a null-steering function, such as DCMP and ZF, and the low-sidelobe DC-BF, which forms a beam toward the desired signal, as illustrated in Fig. 4.2.

4.2.1 Optimum Beamformers

DOA-based optimum beamforming techniques include DCMP and ZF. However, these techniques require computing the (pseudo) inverse of the correlation matrix arising from the generalized eigenvalue problem, which has high complexity. In the ZF technique, the sharp nulls decrease the interferer rejection performance in AS environments, in addition to the well-known noise level enhancement effect [53]. Moreover, DCMP is extremely sensitive to the DOA estimation accuracy, and a large number of snapshots is required to estimate the sample correlation matrix with high quality. In fact, DCMP tends to miscapture the signal of interest even with small DOA estimation errors; hence, an artificial noise injection technique has been studied to avoid this unstable operation [54].

The sample data vector received by an antenna array can be modeled by the linear combination of the desired signal (subscripted by ‘0’) and the $L$ interferers as

$$
\mathbf{x}(t) = \mathbf{v}_0 s_0(t) + \sum_{l=1}^{L} \mathbf{v}_l s_l(t) + \mathbf{n}(t),
$$

(4.1)

where $v_i$ denotes the spatial signature of the signal $s_i(t)$, and $n(t)$ is an AWGN. The array output $y(t)$ is obtained as

$$
y(t) = y_d(t) + y_u(t) + y_n(t).
$$

(4.2)

The output signal components contributed by the desired signal, interferers, and noise are written as

$$
y_d(t) = \mathbf{w}^H \mathbf{v}_0 s_0(t),
$$

(4.3)

$$
y_u(t) = \mathbf{w}^H \sum_{l=1}^{L} \mathbf{v}_l s_l(t),
$$

(4.4)

$$
y_n(t) = \mathbf{w}^H \mathbf{n}(t),
$$

(4.5)

respectively, where $\mathbf{w}$ is the beamforming weight vector. On the basis of the above relations, the output SINR can be obtained by

$$
\text{SINR}_{\text{out}} = \frac{P_d}{P_t + P_n} = \frac{E[|y_d(t)|^2]}{E[|y_u(t)|^2] + E[|y_n(t)|^2]}
$$

$$
= \frac{P_0 \mathbf{w}^H \mathbf{v}_0 \mathbf{v}_0^H \mathbf{w}}{\sum_{l=1}^{L} P_l \mathbf{w}^H \mathbf{v}_l \mathbf{v}_l^H \mathbf{w} + \sigma_n^2 \mathbf{w}^H \mathbf{w}}
$$

(4.6)
where $P_d$, $P_t$, and $P_n$ denote the output powers of the desired signal, the interferers, and the noise, respectively, and the desired signal power $P_0$ and the interferers' powers $P_l$ are $E[|s_0(t)|^2]$ and $E[|s_l(t)|^2]$, respectively. Further, the additive noise is assumed to be spatially white, i.e., $E[\mathbf{n}(t)\mathbf{n}^H(t)] = \sigma_n^2 I$. Adaptive beamforming in smart antenna systems usually improves the SINR performance by selecting the optimum weight vector based on the following criteria.
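As a minimal sketch, the output SINR of Eq. (4.6) can be evaluated for a given weight vector as follows. The steering-vector convention, the example geometry, and the function names are assumptions made for illustration.

```python
import cmath
import math

def steering_vector(theta_deg, n_elements=4):
    """ULA steering vector for half-wavelength spacing (assumed phase convention)."""
    psi = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * m * psi) for m in range(n_elements)]

def inner(w, v):
    """Hermitian inner product w^H v."""
    return sum(wi.conjugate() * vi for wi, vi in zip(w, v))

def output_sinr(w, v0, p0, interferers, noise_var):
    """Output SINR of Eq. (4.6); interferers is a list of (v_l, P_l) pairs."""
    signal = p0 * abs(inner(w, v0)) ** 2
    interference = sum(p * abs(inner(w, v)) ** 2 for v, p in interferers)
    noise = noise_var * inner(w, w).real
    return signal / (interference + noise)

v0 = steering_vector(0.0)
w = [x / len(v0) for x in v0]                  # conventional beam toward the desired DOA
interferers = [(steering_vector(40.0), 1.0)]   # one equal-power interferer (illustrative)
sinr = output_sinr(w, v0, 1.0, interferers, 0.1)
print(10 * math.log10(sinr), "dB")
```

Without interferers the conventional beam yields $M P_0 / \sigma_n^2$ for an $M$-element array, which provides a convenient check of the implementation.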

**ZF (Zero-forcing)**

The optimum weight can be determined by the deterministic maximum likelihood waveform estimator [1, 53], which is referred to as ZF in this paper. The ZF technique can combine the output signal retaining the desired signal output by unit gain and suppressing all interference components completely by forcing zeros as

$$
\begin{aligned}
w^H v_0 &= v_0^H w = 1 \\
w^H v_1 &= v_1^H w = 0 \\
& \quad \vdots \\
w^H v_L &= v_L^H w = 0
\end{aligned} \quad (4.7)
$$

and the matrix notation is given by

$$V^H w = g, \quad (4.8)$$

where $V = [v_0, v_1, \cdots, v_L]$ and $g$ is a gain vector typically given by $[1, 0, \cdots, 0]^T$. Using the Moore-Penrose pseudoinverse [2], the optimum weight vector with minimum norm can be determined as

$$w_{ZF} = V^{(+)} g, \quad (4.9)$$

where

$$V^{(+)} = V(V^H V)^{-1}. \quad (4.10)$$
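The ZF weight computation can be sketched in plain Python as follows. A small Gauss-Jordan solver stands in for the matrix inversion; the steering-vector convention, the example DOAs, and all function names are assumptions made for illustration.

```python
import cmath
import math

def steering_vector(theta_deg, n_elements=4):
    """ULA steering vector for half-wavelength spacing (assumed phase convention)."""
    psi = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * m * psi) for m in range(n_elements)]

def inner(w, v):
    """Hermitian inner product w^H v."""
    return sum(wi.conjugate() * vi for wi, vi in zip(w, v))

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def zf_weights(signatures, g):
    """w = V (V^H V)^{-1} g, where signatures holds the columns of V."""
    gram = [[inner(vi, vj) for vj in signatures] for vi in signatures]   # V^H V
    z = solve(gram, g)                                                   # (V^H V)^{-1} g
    m = len(signatures[0])
    return [sum(signatures[k][i] * z[k] for k in range(len(signatures))) for i in range(m)]

v0, v1 = steering_vector(0.0), steering_vector(40.0)
w = zf_weights([v0, v1], [1, 0])
print(abs(inner(w, v0)), abs(inner(w, v1)))   # unit gain toward v0, null toward v1
```

The size of the linear system equals the number of sources, which is why the computational load of ZF grows with the number of incident waves.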

**DCMP (Directional Constrained Minimizing Power)**

The DCMP beamformer minimizes the output power under the constraint condition as

$$\begin{align*}
\arg \min_w & \left( P_{out} = w^H R_{xx} w \right) \\
\text{subject to} & \quad C^H w = g
\end{align*} \quad (4.11)$$

where $C \in \mathbb{C}^{M \times L}$ is a directional constraint matrix and $g \in \mathbb{C}^{L \times 1}$ is a gain vector; if a single constraint for the desired signal is applied, they reduce to the desired signal vector $v_0 \in \mathbb{C}^{M \times 1}$ and a scalar value (typically chosen as unit gain), respectively. The optimum weight vector for the DCMP criterion can be determined by the Lagrange multiplier method as [50]

$$w_{DCMP} = \left( \frac{1}{v_0^H R_{xx}^{-1} v_0} \right) R_{xx}^{-1} v_0. \quad (4.12)$$
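A minimal sketch of the single-constraint DCMP weight is given below, using the standard closed form $w = R_{xx}^{-1} v_0 / (v_0^H R_{xx}^{-1} v_0)$. The correlation matrix is built from an assumed ideal model (one desired wave, one interferer, and white noise) rather than from snapshots, and the steering-vector convention and function names are hypothetical.

```python
import cmath
import math

def steering_vector(theta_deg, n_elements=4):
    """ULA steering vector for half-wavelength spacing (assumed phase convention)."""
    psi = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * m * psi) for m in range(n_elements)]

def inner(w, v):
    """Hermitian inner product w^H v."""
    return sum(wi.conjugate() * vi for wi, vi in zip(w, v))

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def correlation_matrix(sources, noise_var, m):
    """Ideal R_xx = sum_l P_l v_l v_l^H + noise_var * I; sources holds (v_l, P_l) pairs."""
    R = [[noise_var if i == j else 0j for j in range(m)] for i in range(m)]
    for v, p in sources:
        for i in range(m):
            for j in range(m):
                R[i][j] += p * v[i] * v[j].conjugate()
    return R

def dcmp_weights(R, v0):
    """Single-constraint DCMP weight: R^{-1} v0 / (v0^H R^{-1} v0)."""
    z = solve(R, v0)
    gamma = inner(v0, z)
    return [zi / gamma for zi in z]

def output_power(w, R):
    """Output power w^H R w."""
    return sum(w[i].conjugate() * R[i][j] * w[j] for i in range(len(w)) for j in range(len(w))).real

v0, v1 = steering_vector(0.0), steering_vector(40.0)
R = correlation_matrix([(v0, 1.0), (v1, 1.0)], 0.1, 4)
w = dcmp_weights(R, v0)
print(abs(inner(w, v0)), abs(inner(w, v1)), output_power(w, R))
```

In practice $R_{xx}$ is estimated from snapshots and the DOA in $v_0$ is itself an estimate, which is where the sensitivity discussed in Section 4.2.1 enters.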
4.2.2 Low-sidelobe Beamformers

When only white noise is present, an adaptive beamformer in a smart antenna system operates with the quiescent beampattern for the desired signal. In such a situation, it is not necessary for the beamformer to be adaptive with null-steering capability. However, if interferers exist, the undesired power is usually received through the sidelobes of the beampattern. In order to avoid this, it is necessary to select a beampattern whose sidelobes are low enough to suppress the interference power. Several well-known deterministic designs, such as the Dolph-Chebyshev array, are referred to in this study. The Dolph-Chebyshev array design is a simple method for obtaining beampatterns with a prescribed equi-ripple low sidelobe level [51, 52]. Figure 4.3 shows various quiescent beamformers, including the Dolph-Chebyshev, Taylor, and binomial windows.

In certain environments, low-sidelobe beamformers may be considered simple and safe beamformers with sub-optimum SNR performance. They have extremely low complexity, requiring only a scalar amplitude weighting of the steering vector, and are therefore suitable for implementation in high-speed logic devices, such as FPGAs. Owing to this low complexity, a high-speed weight adaptation capability can be realized, which may be very useful in various wireless communication systems for fast fading environments. In addition, an equi-ripple low sidelobe leads to good interference suppression performance in the presence of the interferences’ AS. However, the broadened beamwidth caused by a low-sidelobe beampattern reduces the angular resolution; hence, the SINR performance will be degraded if the source separation between the desired signal and adjacent interferers is too small for the beamwidth. Hereafter, this paper refers to interferers existing within the beamwidth as “in-beam” interferers. The relationship between low sidelobe and beamwidth is an inherent trade-off.
4.2.3 Performance Comparison

Optimum beamformers, such as ZF and DCMP, have an optimum performance combining the desired signal and suppressing the interferers; however, one of the major drawbacks is the computational complexity of the matrix inversion. On the other hand, low-sidelobe beamformers have extremely low complexities. Figure 4.4 shows the normalized number of operations required to compute the weight vector in each beamformer, where the weight vectors are computed respectively by

\[ w_{DC} = C_{DC} \odot v_0, \quad (4.13) \]
\[ w_{DCMP} = R_{xx}^{-1} v_0, \quad (4.14) \]
\[ w_{ZF} = V (V^H V)^{-1} g, \quad (4.15) \]

where \( C_{DC} \) and \( \odot \) denote the Dolph-Chebyshev tapering window coefficients and the element-wise multiplication operator, respectively. The computation load of ZF depends on the number of sources, while those of DC-BF and DCMP are constant. In this comparison, only the operations for the weight computation are counted, and it is assumed that the data, such as the correlation matrix and the spatial signatures of the sources \( v \), are already known. It can be seen that the Dolph-Chebyshev beamformer has a great advantage in computation load compared with the optimum beamformers.
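The element-wise weighting of Eq. (4.13) can be illustrated with a short sketch. The taper below is obtained by sampling the Chebyshev array factor $T_{M-1}(x_0 \cos(\psi/2))$ and applying an inverse DFT, which is one common way to compute Dolph-Chebyshev coefficients; the text does not state how the coefficients were actually generated, so this procedure, the 40 dB sidelobe ratio default, and all function names are assumptions.

```python
import cmath
import math


def cheb_poly(order, x):
    """Chebyshev polynomial T_order(x) for any real x."""
    if abs(x) <= 1.0:
        return math.cos(order * math.acos(x))
    sign = 1.0 if x > 0 or order % 2 == 0 else -1.0
    return sign * math.cosh(order * math.acosh(abs(x)))


def dolph_chebyshev_taper(m=8, slr_db=40.0):
    """Real taper for m elements, scaled so that the mainlobe peak equals 1."""
    r = 10.0 ** (slr_db / 20.0)                 # sidelobe ratio (linear)
    x0 = math.cosh(math.acosh(r) / (m - 1))
    taper = []
    for n in range(m):
        acc = 0j
        for k in range(m):
            sample = cheb_poly(m - 1, x0 * math.cos(math.pi * k / m))
            acc += sample * cmath.exp(1j * math.pi * k * (m - 1 - 2 * n) / m)
        taper.append(acc.real / (m * r))
    return taper


def dc_weight(theta0_deg, m=8, slr_db=40.0):
    """w_DC = C_DC (element-wise) v0 for a half-wavelength ULA."""
    s = math.sin(math.radians(theta0_deg))
    v0 = [cmath.exp(-1j * math.pi * k * s) for k in range(m)]
    return [c * v for c, v in zip(dolph_chebyshev_taper(m, slr_db), v0)]


def response(w, theta_deg):
    """Magnitude of the beam response |w^H a(theta)|."""
    s = math.sin(math.radians(theta_deg))
    return abs(sum(x.conjugate() * cmath.exp(-1j * math.pi * k * s)
                   for k, x in enumerate(w)))


w = dc_weight(0.0)
print(response(w, 0.0), max(response(w, t) for t in range(30, 91)))
```

Once the taper is stored, steering to a new user DOA costs only one multiplication per element, which is the source of the low complexity noted above.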

In general, the optimum beamformers and low-sidelobe beamformers have different performances depending on the separation angles between the desired signal and the interferer. Eq.(4.7) can be rewritten using a spatial correlation coefficient between the spatial signature of signals and the weight vector by

\[ \text{SINR}_{\text{out}} = \frac{P_0 |v_0|^2 |\beta_0|^2}{\sum_{l=1}^L P_l |v_l|^2 |\beta_l|^2 + \sigma_n^2}, \quad (4.16) \]

where the spatial correlation coefficient is

\[ \beta = \frac{w^H v}{\sqrt{w^H w} \cdot \sqrt{v^H v}} \quad (4.17) \]
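A small numerical sketch of Eqs. (4.16) and (4.17) is given below. It assumes that the squared magnitude of \( \beta \) weights both the desired and the interference terms, uses an 8-element half-wavelength ULA with a uniform weight, and places a single interferer on the first null of the uniform beampattern; the function names and the scenario are illustrative assumptions.

```python
import cmath
import math


def steering(theta_deg, m=8):
    """Steering vector of a half-wavelength ULA (assumed geometry)."""
    s = math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * math.pi * k * s) for k in range(m)]


def norm_sq(v):
    """Squared Euclidean norm of a complex vector."""
    return sum(abs(x) ** 2 for x in v)


def beta(w, v):
    """Spatial correlation coefficient w^H v / (|w| |v|)."""
    num = sum(a.conjugate() * b for a, b in zip(w, v))
    return num / (math.sqrt(norm_sq(w)) * math.sqrt(norm_sq(v)))


def sinr_out_db(w, v0, p0, interferers, noise_power):
    """Output SINR in dB; `interferers` is a list of (power, signature) pairs."""
    sig = p0 * norm_sq(v0) * abs(beta(w, v0)) ** 2
    intf = sum(p * norm_sq(v) * abs(beta(w, v)) ** 2 for p, v in interferers)
    return 10.0 * math.log10(sig / (intf + noise_power))


v0 = steering(0.0)
theta_null = math.degrees(math.asin(2.0 / 8.0))   # first null of the uniform beam
sinr = sinr_out_db(v0, v0, 10.0, [(10.0, steering(theta_null))], 1.0)
print(round(sinr, 2))
```

With the interferer on a pattern null, the result approaches the interference-free bound of $10 + 10\log_{10} 8$ dB for an element SNR of 10 dB.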

Figure 4.5 displays the characteristics of the correlation coefficient \( \beta \), which indicates the contribution to the output signal (‘1’ implies full contribution and ‘0’ implies no contribution). The subscripted \( \beta_{\text{des}} \) and \( \beta_{\text{int}} \) are used for the desired signal and the interferer, respectively. This simulation assumed that planar waves, comprising a single desired signal (located at 0\(^\circ\), broadside) and a single interferer (located at 0\(^\circ\) \( \sim \) 90\(^\circ\)), arrived at an 8-element ULA (uniform linear array) antenna; the weight vector was determined from their DOAs. The optimum beamformers, including ZF and DCMP, perform beamforming and null steering simultaneously, whereas the uniform beamformer (U-BF) and the low-sidelobe DC-BF only perform beamforming for the desired signal. The beamforming performance of the optimum beamformers is significantly degraded for small source separations within the beamwidth\(^1\), because an excessive interference rejection constraint leads to an unintentional shift of the mainlobe. In contrast, in the U-BF and DC-BF the interference power appears in the output signal and degrades the SINR for small source separations within the beamwidth, because these beamformers have no interference rejection capability.

Fig. 4.4: Computation load comparison; normalized number of operations.

The performance of the adaptive beamformers is usually assessed by the SINR characteristics. Figure 4.6 shows the SINR characteristics when the SNR at every element is 10 dB. Theoretically, an optimum SINR of 19 dB can be achieved by beamforming with an 8-element antenna \((10 + 10 \log_{10} 8 = 19 \text{ dB})\) if the interference power is perfectly eliminated. The U-BF achieves the optimum SINR only when the interferer is located at a null of the beampattern; otherwise, the SINR is degraded by the interference received through the sidelobes. The DC-BF has a sub-optimum SINR performance that is slightly lower (approximately 1 dB) than the optimum performance when the source separation is sufficient. However, as mentioned above, the broadened beamwidth caused by the low-sidelobe beampattern decreases the angular resolution\(^2\). In other words, if the source separation between the desired signal and an adjacent interferer is small relative to the beamwidth, the SINR performance will be significantly degraded to a level below that of the optimum beamformers.

\(^1\)It means the beamwidth between the first null places in the uniform beampattern (around \(2 \times 13^\circ\))

\(^2\)The DC-BF has sub-optimum performance due to the gap of the correlation coefficient \(\beta_{\text{des}}\), as shown in Fig.4.5.
Fig. 4.5: Correlation coefficient $\beta$ depending on source separation, where the desired signal arrives at $0^\circ$, broadside, and the interferer is located at $0^\circ \sim 90^\circ$ (8-element ULA).

Fig. 4.6: SINR characteristics depending on source separation (8-element ULA, input SNR @ single element = 10 dB).
4.3 Null Steering Beamformer with Notch Beams

This paper proposes a novel beamforming technique with extremely low complexity. This is based on the beamforming technique with the low-sidelobe beampattern using Dolph-Chebyshev tapering window. The optional null-steering processing provides in-beam interference rejection capability with extremely low computation complexity, thus the SINR degradation can be decreased.

4.3.1 Basic Principle

It basically steers the mainlobe toward the user direction and completely suppresses the interferers by low sidelobe with the Dolph-Chebyshev beampattern. The low-sidelobe beamformer has the significant advantage of low complexity without huge computation load in optimum techniques, such as ZF and DCMP for explicitly nullifying interferences. However, if certain interferers exist within the broad mainlobe area (in-beam), the performance will be considerably degraded because of unavoidable interference reception. In this system, the interferers within the mainlobe can be canceled out by optional null-steering processing with a notch beampattern. The final beamforming weight is computed by simple convolution operations of pre-computed Dolph-Chebyshev beam and optional canceling beams of notch beams; hence, the computation load for weight generation is very small.

The Dolph-Chebyshev beam is usually used for beamforming toward the user DOA. The interferers are totally suppressed by a low sidelobe. The low-sidelobe beamformer, such as the DC-BF, does not require time-consuming tasks such as the inversion of the complex valued correlation matrix. However, the reception of the interferers’ power in a low sidelobe is not zero; hence, the SINR performance is not optimum, as mentioned above. Further, if an interferer exists within the broadened mainlobe area, the low-sidelobe beamformer cannot suppress the interferers any more. This paper proposes the cascade beamformer of a low-sidelobe DC beam and optional canceling beams in order to overcome this; this is termed Null-steering Dolph-Chebyshev Beamformer (NDC-BF). In NDC-BF, the beamformer adaptively performs beamforming for user signal and nullsteering only for the in-beam interferers with independent nulls as

\[ \mathbf{w}_{\text{null}}^H \hat{\mathbf{v}}_l = 0, \quad (4.18) \]

and \( \mathbf{w}_{\text{beam}} \) is a low-sidelobe beam determined by Dolph-Chebyshev tapering window \( \mathbf{C}_{DC} \), denoted by

\[ \mathbf{w}_{\text{beam}} = \mathbf{C}_{DC} \odot \hat{\mathbf{v}}_0, \quad (4.19) \]

where \( \hat{\mathbf{v}}_0 \) and \( \hat{\mathbf{v}}_l \) are the spatial signature vectors of the user and the interferers, respectively, estimated by DOA estimation pre-processing. Figure 4.7 shows the cascade beamforming structure consisting of the beamforming stage and an optional post null-steering stage for a single null.

Fig. 4.7: Cascade beamforming structure (single null).

Fig. 4.8: Relationship between desired null place and required phase shift.

Fig. 4.9: Generalized cascade beamforming structure ($l$ nulls).

In digital signal processing, the total beamforming weight can be computed immediately by element-space convolution instead of a physically cascaded structure [55] as

$$w_{NDC\text{-}BF}^T = \tilde{w}_{beam}^T * w_{null}^T, \quad (4.20)$$

where $*$ denotes the convolution operation, $\tilde{w}_{beam} \in \mathbb{C}^{(M-l)\times 1}$, and $w_{null} \in \mathbb{C}^{2\times 1}$. The notch beam that nulls the power in the direction $\theta_{null}$ is computed as

$$w_{null} = \left[1, e^{-j\pi \sin \phi} \right]^T, \quad (4.21)$$

where

$$\phi = \begin{cases}
\sin^{-1} \left[1 - \sin \theta_{null} \right] & 0 \leq \theta_{null} < \pi/2 \\
-\sin^{-1} \left[1 + \sin \theta_{null} \right] & -\pi/2 \leq \theta_{null} < 0
\end{cases}, \quad (4.22)$$

and the relationship between the phase shift $\phi$ and null-steering direction $\theta_{null}$ is illustrated in Fig. 4.8.
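The null placement of Eqs. (4.21) and (4.22) can be checked with a few lines of code. The sketch assumes a two-element half-wavelength array and evaluates the response without conjugating the weight, i.e. as $w_{null}^T a(\theta)$, which is the convention under which the phase shift of Eq. (4.22) places the null exactly at $\theta_{null}$; the function names are hypothetical.

```python
import cmath
import math


def notch_beam(theta_null_deg):
    """Two-element notch beam [1, exp(-j*pi*sin(phi))] with phi from Eq. (4.22)."""
    s = math.sin(math.radians(theta_null_deg))
    if theta_null_deg >= 0:
        phi = math.asin(1.0 - s)
    else:
        phi = -math.asin(1.0 + s)
    return [1.0 + 0j, cmath.exp(-1j * math.pi * math.sin(phi))]


def notch_response(w_null, theta_deg):
    """|w^T a(theta)| for a two-element half-wavelength array (no conjugation)."""
    s = math.sin(math.radians(theta_deg))
    return abs(w_null[0] + w_null[1] * cmath.exp(-1j * math.pi * s))


for theta in (10.0, -20.0, 20.0):
    w_null = notch_beam(theta)
    print(theta, notch_response(w_null, theta), notch_response(w_null, 0.0))
```

Because the notch beam depends only on $\theta_{null}$, it can be tabulated over a grid of directions in advance rather than computed at run time.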

Figure 4.9 shows a generalized version of the cascade structure for the $M$-element array comprising the beamforming stage and $l(< M - 1)$ optional post null-steering stages with notch beams. The weight vector can be extended by

$$w_{NDC\text{-}BF}^T = \tilde{w}_{beam}^T * w_{null,1}^T * \cdots * w_{null,l}^T. \quad (4.23)$$
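The cascade of Eq. (4.23) amounts to repeated polynomial multiplication of the weight sequences, as the following sketch illustrates. For brevity, a uniform 5-element beam stands in for the stored $(M-l)$-element Dolph-Chebyshev beam, the user is at broadside, and the response is evaluated as $w^T a(\theta)$ without conjugation; these choices and all function names are assumptions for illustration.

```python
import cmath
import math


def convolve(a, b):
    """Linear convolution of two complex sequences."""
    out = [0j] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def notch_beam(theta_null_deg):
    """Two-tap notch beam; sin(phi) follows Eq. (4.22)."""
    s = math.sin(math.radians(theta_null_deg))
    sin_phi = 1.0 - s if theta_null_deg >= 0 else -(1.0 + s)
    return [1.0 + 0j, cmath.exp(-1j * math.pi * sin_phi)]


def ndc_weight(beam, null_dirs_deg):
    """Cascade the beam with one notch beam per in-beam interferer."""
    w = list(beam)
    for theta in null_dirs_deg:
        w = convolve(w, notch_beam(theta))
    return w


def response(w, theta_deg):
    """|w^T a(theta)| for a half-wavelength ULA (no conjugation)."""
    s = math.sin(math.radians(theta_deg))
    return abs(sum(x * cmath.exp(-1j * math.pi * k * s) for k, x in enumerate(w)))


nulls = [10.0, -20.0, 20.0]          # in-beam interferers as in Fig. 4.10(c)
beam = [1.0 / 5] * 5                 # placeholder for a stored 5-element DC beam
w = ndc_weight(beam, nulls)
print(len(w), response(w, 0.0), [response(w, t) for t in nulls])
```

Each additional null lengthens the weight by one element, which is why an $(M-l)$-element beam is combined with $l$ notch beams to fill an $M$-element array.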

Figure 4.10 shows examples of beamforming scenarios with multiple null-steering stages that reject the in-beam interferers within the mainlobe when seven interferers exist; one (Fig.4.10(a)), two (Fig.4.10(b)), and three (Fig.4.10(c)) of them are in-beam interferers within the mainlobe area. The user DOA is set to 0° and the in-beam interferers are located at 10°, −20° and 20°. The in-beam interferences are observed to be effectively eliminated by the notch beams; however, the mainlobe is slightly shifted off the user direction. In addition, null-steering processing leads to an increase in the sidelobe level. Therefore, the number of independent nulls for the in-beam interferers may be limited to three or fewer in an 8-element array configuration. The performance improvement and limitation of NDC-BF will be discussed in the next subsection.

Fig. 4.10: Beamforming scenarios in null-steering Dolph-Chebyshev beamformer (NDC-BF) with multiple nulls.

Fig. 4.11: SINR characteristics depending on source separation (8-element ULA, input SNR @ single element = 10 dB).

4.3.2 Performance Improvement and Limitation

When there are no in-beam interferers and the optional null-steering stage is therefore not applied, the proposed NDC-BF has the same performance as the DC-BF. In contrast, the SINR performance for small source separations is dramatically improved, as shown in Fig.4.11. The optional null-steering stage was applied only when the source separation was below 25° (in-beam area) because, outside this area, interference suppression by the low sidelobe yields slightly better SINR characteristics than the optional null-steering process.

The computational complexity of NDC-BF is much lower than that of the optimum beamformers. Table 4.1 presents the computational complexity for the K-element ULA antenna in the presence of L sources. In NDC-BF, the multiplications and additions required for Eq.(4.20) were counted in each case, assuming that the DC beams and notch beams were precomputed and stored in the ROM. The complexity of the DCMP in Eq.(4.14) can be computed based on the fact that the linear algebraic solution, including LU decomposition ($K^3/3$), forward substitution ($K^2/2$), and backward substitution ($K^2/2$), takes a total of ($K^3/3+K^2$) executions of complex multiplications and additions [97]. Similarly, the complexity of ZF in Eq.(4.15) can be computed based on the fact that the matrix inversion, including LU decomposition ($K^3/3$), forward substitution ($K^3/3$), and backward substitution ($K^3/3$), takes a total of ($K^3$) executions of complex multiplications and additions [97]. As shown in Tab.4.1, the numbers of real operations are obtained as four times those of complex operations. Figure 4.12 shows the comparison between the computational complexity in the cases of 4 and 8 antenna elements, where no operations are required for NDC-BF in the presence of the desired signal only because it refers to the precomputed beam pattern for the desired signal in the ROM. These results show that the proposed NDC-BF has extremely low complexity.

Table 4.1: Computational load for weight computation in real operations (K elements, L sources).

<table>
<thead>
<tr>
<th>Method</th>
<th>Multiplications</th>
<th>Additions</th>
<th>Total operations</th>
</tr>
</thead>
<tbody>
<tr>
<td>NDC-BF (single null)</td>
<td>8(K - 1)</td>
<td>2K</td>
<td>10K - 8</td>
</tr>
<tr>
<td>NDC-BF (two nulls)</td>
<td>12(K - 2) + 4</td>
<td>4K + 2</td>
<td>16K - 18</td>
</tr>
<tr>
<td>NDC-BF (three nulls)</td>
<td>16(K - 3) + 12</td>
<td>6K + 6</td>
<td>22K - 30</td>
</tr>
<tr>
<td>DCMP</td>
<td>\(\frac{4}{3}K^3 + 4K^2\)</td>
<td>\(\frac{4}{3}K^3 + 4K^2\)</td>
<td>\(\frac{8}{3}K^3 + 8K^2\)</td>
</tr>
<tr>
<td>ZF</td>
<td>8L^2K + 4L^3</td>
<td>8L^2K + 4L^3</td>
<td>16L^2K + 8L^3</td>
</tr>
</tbody>
</table>

Fig. 4.12: Computational load comparison. (a) K = 4 elements. (b) K = 8 elements.
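The operation counts can be evaluated directly for a given array size. The sketch below encodes the NDC-BF and ZF totals of Tab. 4.1 and derives the DCMP total from the complex count $K^3/3 + K^2$ stated in the text, multiplied by four for both multiplications and additions; the function names are hypothetical and the zero-null entry reflects the statement that no operations are needed when only the desired signal is present.

```python
def ndc_bf_ops(k, nulls):
    """Total real operations for NDC-BF with 0 to 3 nulls (Tab. 4.1)."""
    totals = {0: 0, 1: 10 * k - 8, 2: 16 * k - 18, 3: 22 * k - 30}
    return totals[nulls]


def dcmp_ops(k):
    """Total real operations for DCMP: 4x the complex count, for mult. and add."""
    complex_ops = k ** 3 / 3 + k ** 2
    return 2 * 4 * complex_ops


def zf_ops(k, l):
    """Total real operations for ZF with k elements and l sources (Tab. 4.1)."""
    return 16 * l ** 2 * k + 8 * l ** 3


for k in (4, 8):
    print(k, ndc_bf_ops(k, 3), round(dcmp_ops(k)), zf_ops(k, 4))
```

For K = 8, the three-null NDC-BF total is more than an order of magnitude below the DCMP and four-source ZF totals, in line with Fig. 4.12.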

On the other hand, the main lobe is slightly shifted off the user direction when the in-beam interferences are eliminated with notch beams. Thus, a drop in the receive level of the user signal is usually unavoidable. Fig.4.13 shows typical beam patterns of NDC-BF and ZF at different angles where the source separation is 10°. The level drop caused by the main lobe shift also appears in the ZF beamformer; however, it is smaller than that of NDC-BF. This is because ZF optimizes the SINR by simultaneously applying constraints for both the main lobe and the nulls, while NDC-BF places nulls independently of the main lobe. Further, the number of nulls is typically limited to three or fewer for an 8-element antenna array.
Fig. 4.13: Angle dependency of mainlobe in NDC-BF, where the source separation is $10^\circ$. 
Table 4.2: FPGA resources.

<table>
<thead>
<tr>
<th>Resource</th>
<th>Item</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>LEs</td>
<td></td>
<td>1,269 (about 30,000 equiv. gates)</td>
</tr>
<tr>
<td>ROM</td>
<td>DC-beam</td>
<td>2,064 words (129 * 8 * 2)</td>
</tr>
<tr>
<td></td>
<td></td>
<td>1,792 words (129 * 7 * 2)</td>
</tr>
<tr>
<td></td>
<td></td>
<td>1,536 words (129 * 6 * 2)</td>
</tr>
<tr>
<td></td>
<td></td>
<td>1,280 words (129 * 5 * 2)</td>
</tr>
<tr>
<td></td>
<td>Notch-beam</td>
<td>258 words (129 * 1 * 2)</td>
</tr>
<tr>
<td></td>
<td>Total bits</td>
<td>83,160 bits (6,930 words * 12 bits)</td>
</tr>
<tr>
<td>MACs</td>
<td></td>
<td>18 (16 × 12 MACs)</td>
</tr>
<tr>
<td>Clock Freq.</td>
<td></td>
<td>119 MHz</td>
</tr>
<tr>
<td>Required clocks</td>
<td></td>
<td>&lt; 30 + N_{frame}</td>
</tr>
</tbody>
</table>

4.4 Hardware Implementation

The significant feature of NDC-BF is its low complexity: it avoids the huge computation load that optimum techniques, such as ZF and DCMP, require for explicitly nulling interferences, although its SINR performance is not optimum. Figure 4.14 shows the computation flow of this system, where nulls can be placed at desired locations by simple operations; thus, the interferers within the mainlobe can be canceled by optional null-steering processing with notch beams, which are pre-computed and stored in the ROM in the FPGA’s embedded memory block. The final beamforming weight is computed by the simple convolution operation of the pre-computed DC beam and optional canceling beams; hence, the computation load for weight generation is very small. In the case of a single null, this operation requires only two scalar multiplications and one vector addition. Therefore, fast weight generation can be realized without additional complexity. The block diagram for NDC-BF on FPGA is shown in Fig.4.15. The DC beams and notch beams are pre-computed and stored in the ROM. According to the DOA estimation results in pre-processing, the simple convolution operation of the pre-computed weights can be performed easily. NDC-BF consists of ROMs for the pre-computed beams, a two-tap complex FIR filter for the convolution operation, and complex multipliers for beamforming. The precisions of the DC beams and notch beams are chosen as 12 and 8 bits, respectively.

A minimum precision of 8 bits for DC beams and notch beams is required for a reasonable performance in fixed-point beamforming, as shown in Fig.4.16, where (a) and (b) show the fixed-point effect on beamwidth and the sidelobe level of DC beam, respectively and (c) and (d) show the fixed-point effect on null-width and null-depth of notch beam, respectively.
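The effect of a finite word length on a stored weight can be sketched as below. The number format is an assumption: the text gives only the word lengths, so the sketch uses a signed two's-complement fraction in the range [-1, 1) with saturation, and the function name `quantize` is hypothetical.

```python
def quantize(x, bits):
    """Round a complex value to an assumed signed fixed-point format.

    Each of the real and imaginary parts uses `bits` bits, covering [-1, 1).
    """
    scale = 2 ** (bits - 1)

    def q(v):
        n = round(v * scale)
        n = max(-scale, min(scale - 1, n))   # saturate to the representable range
        return n / scale

    return complex(q(x.real), q(x.imag))


w = complex(0.3, -0.7)                       # arbitrary example weight
print(quantize(w, 12), quantize(w, 8))
```

Under this assumed format the rounding error per component is at most half a least significant bit, i.e. $2^{-8}$ for 8 bits and $2^{-12}$ for 12 bits.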

The FPGA implementation results are shown in Tab.4.2. According to the results, approximately 30,000 logic gates were used and 83,160 memory bits were required. Beamforming is computed extremely fast in parallel using 16 MACs. The associated time for weight generation and beamforming is, for example, approximately 1.4 µs, assuming that the frame length N_{frame} is 136 in the three-null case.
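The quoted processing time follows from the clock count and clock frequency in Tab.4.2. As a worked check, taking the upper bound of $30 + N_{frame}$ clocks with $N_{frame} = 136$ at 119 MHz gives

$$t < \frac{30 + N_{frame}}{f_{clk}} = \frac{30 + 136}{119 \times 10^{6}\ \text{Hz}} \approx 1.39\ \mu\text{s}.$$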

Fig. 4.14: Computation flow of NDC-BF.

Fig. 4.15: DSP Block Diagram for NDC-BF

Fig. 4.16: Characteristics of main beam and notch beam with fixed wordlength.
4.5 Performance Evaluation by Computer Simulations

This section evaluates the performance of the proposed NDC-BF through computer simulations in certain cases.

In the case of an urban cellular uplink channel, the BER (bit error rate) performance of the proposed NDC-BF is first discussed for both the LOS and multipath models. It was assumed that four waves, comprising a single user and three interferers (two of them being in-beam interferers), arrived at the half-wavelength equi-spaced 8-element antenna array and that their DOAs were perfectly estimated. Figure 4.17 shows the schematic simulation configuration. The detailed parameters are listed in Tab.4.3. The performances of the conventional uniform beamformer (U-BF), low-sidelobe DC-BF, and ZF are also simulated for comparison with the proposed system. While the planar wave model without fading was assumed in the LOS environment, the Rayleigh fading model with a small AS was considered in the multipath environment. In an urban cellular environment, the time delays among the sub-paths can be neglected. Thus, the spatial signature for the i-th source is approximately given by

\[ v_i = \sum_{j=1}^{N_i} \beta_{ij} a(\theta_i + \tilde{\theta}_{ij}), \quad (4.24) \]

where

\( \beta_{ij} \): Complex amplitude with Rayleigh distributed magnitude of \( j \)-th scattered sub-path

\( \theta_i \): Nominal DOA for \( i \)-th source

\( \tilde{\theta}_{ij} \): DOA deviation of \( j \)-th sub-path, a Gaussian random variable \( \sim \mathcal{N}(0, \sigma_{\theta_i}^2) \)

\( N_i \): Number of sub-paths for \( i \)-th source

This model follows [40], and the AS $2\sigma_{\theta_i}$ for each source was set to $3^\circ$.
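A spatial signature according to Eq. (4.24) can be drawn as in the following sketch. It assumes a half-wavelength ULA, complex Gaussian sub-path amplitudes (which give Rayleigh distributed magnitudes), and a $1/N_i$ power scaling that is not specified in the text; the function names and the use of Python's `random` module are illustrative choices.

```python
import cmath
import math
import random


def steering(theta_deg, m=8):
    """Steering vector of a half-wavelength ULA (assumed geometry)."""
    s = math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * math.pi * k * s) for k in range(m)]


def spatial_signature(theta_deg, as_deg=3.0, n_paths=30, m=8, rng=random):
    """Sum of n_paths scattered sub-paths around the nominal DOA theta_deg."""
    sigma = as_deg / 2.0                      # AS is defined as 2 * sigma
    v = [0j] * m
    for _ in range(n_paths):
        # Complex Gaussian amplitude; the 1/n_paths power scaling is an assumption
        amp = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        amp /= math.sqrt(2.0 * n_paths)
        a = steering(theta_deg + rng.gauss(0.0, sigma), m)
        v = [x + amp * y for x, y in zip(v, a)]
    return v


rng = random.Random(0)                        # fixed seed for repeatability
v_user = spatial_signature(0.0, rng=rng)
print([round(abs(x), 3) for x in v_user])
```

With a zero angular spread every sub-path shares the nominal DOA, so the signature collapses to a scaled steering vector, which is a convenient sanity check of the model.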

Figures 4.18 and 4.19 illustrate the BER performances in LOS and multipath environments, respectively. The average INR (Interference-to-Noise Ratio) of each interferer was set to 9 dB. In order to evaluate only the beamformer, excluding the DOA estimator, the source locations were assumed to be perfectly known, as mentioned above. As shown in the previous section, the SINR characteristics of the NDC-BF are not optimum and are inferior to those of optimum beamformers such as ZF. However, these results show that the BER performance of the proposed system is comparable to that of the ZF technique, while the computation load is extremely low. For example, an 8-element NDC-BF with two nulls requires only three scalar×vector multiplications and two vector summations. In terms of operation counts, the computation load is several tens of times smaller than that of ZF, as mentioned in the previous section.

In order to determine the appropriate word-length for the DC beam and notch beam, the fixed-point effect was examined, as shown in Fig.4.20, where (a) and (b) show the fixed-point effect on BER performance and SINR, respectively. Simulation conditions similar to those in Tab.4.3 were applied. The results show that 12- and 8-bit precisions for the DC beam and notch beam, respectively, are sufficient to avoid performance degradation. Hence, the system was implemented with these precisions.
Fig. 4.17: Simulation configuration; AS=3°.

Table 4.3: Simulation parameters.

<table>
<thead>
<tr>
<th>Antennas</th>
<th>8-Element ULA (λ/2)</th>
</tr>
</thead>
<tbody>
<tr>
<td>DOA Estimator</td>
<td>Perfect Estimation</td>
</tr>
<tr>
<td>Beamformers</td>
<td>U-BF,</td>
</tr>
<tr>
<td></td>
<td>DC-BF with SLR=40dB,</td>
</tr>
<tr>
<td></td>
<td>NDC-BF with SLR=40dB,</td>
</tr>
<tr>
<td></td>
<td>ZF</td>
</tr>
<tr>
<td>Transmitting Signal</td>
<td>π/4-DQPSK</td>
</tr>
<tr>
<td>Burst Frame Length</td>
<td>136 Symbols (272 bits)</td>
</tr>
<tr>
<td>Signal Sources</td>
<td>User 0°</td>
</tr>
<tr>
<td></td>
<td>Interferers −20°, 15° (In-Beam), −40° (Out-of-Beam)</td>
</tr>
<tr>
<td></td>
<td>Average INR @ single antenna of each signal is 9 dB</td>
</tr>
<tr>
<td>Channels</td>
<td>LOS (AWGN) and</td>
</tr>
<tr>
<td></td>
<td>Multipath (Rayleigh Fading with AS = 3°, N_t = 30)</td>
</tr>
<tr>
<td>Doppler Freq.</td>
<td>0 (Burst Stationary)</td>
</tr>
<tr>
<td>Trials</td>
<td>2000</td>
</tr>
</tbody>
</table>
Fig. 4.18: BER performance in noise channel under plane wave assumption (LOS model).

Fig. 4.19: BER performance in multipath channel under Rayleigh fading assumption (multipath model).
Fig. 4.20: Performance in NDC-BF with fixed-point arithmetic.
Fig. 4.21: Output SINR versus number of interferers in Rayleigh fading channel ($\text{AS} = 5^\circ$), where the DOA of the desired signal and the nearest interferer are set by 0 and 17° and the other interferers are located at $-70, -60, -50, -40, -30, 30, 40, 50, 60$ and $70^\circ$, respectively.

Next, the output SINR in the presence of many interferers is discussed. Figure 4.21 shows the average output SINR performance for a number of interferers (up to 10) in a Rayleigh fading channel where the AS was $5^\circ$. The DOAs of the desired signal and the nearest interferer are set to 0 and 17°, respectively; the other interferers were located at $-70, -60, -50, -40, -30, 30, 40, 50, 60$, and $70^\circ$. In Fig.4.21, the output SINR was plotted while adding interferers in the above sequence. All the incident signals had identical power and the other conditions were similar to those in the previous case. The result shows that the conventional U-BF, which has neither adaptive null steering nor sidelobe suppression for the interferers, is significantly degraded as the number of interferers increases; however, ZF and NDC-BF have an approximately flat SINR. On the basis of these results, it can also be seen that, given its low complexity, the NDC-BF can be a better choice than high-complexity optimum beamformers, such as ZF.
4.6 Summary

This paper proposes a novel beamforming technique for DOA-based smart antenna systems. This system basically steers the mainlobe toward the user with the Dolph-Chebyshev beam. Further, in order to avoid undesirable interference reception by the broad mainlobe, an optional null-steering processing cancels the interferers within the mainlobe coverage with notch beams, which are pre-computed and stored in the ROM. The technique is based on a cascade structure of a Dolph-Chebyshev low-sidelobe beampattern and simple notch beampatterns that cancel the in-beam interferers; however, the weight vector can be obtained by simple convolution operations of the pre-computed DC beam and optional canceling notch beams. Hence, the complexity is extremely low with such a simple weighting operation; this is suitable for implementation in high-speed logic devices, such as FPGAs. The performance improvement and limitation of the proposed beamformer were also discussed. Further, BER performances in several scenarios were assessed using computer simulations.
Chapter 5

Integrated System Evaluation

5.1 Introduction

This chapter describes an integrated system evaluation with developed hardware assuming an uplink smart antenna system that is useful for macro-cellular base stations. This system incorporates a fast DOA estimator termed UMP using the dedicated circuit device of FPGA, as presented in Chapter 3. In the case of uplink beamforming, this system basically steers the mainlobe for the user location with the beam. Further, in order to avoid undesirable interference reception by broad mainlobe, an optional null-steering processing cancels the interferers within the mainlobe coverage with notch beams. The beamforming weight is computed by a simple convolution operation of the pre-computed Dolph-Chebyshev beam and an optional canceling beam; hence, the computation load for weight generation is extremely small, as mentioned in Chapter 4.

In this chapter, the DOA estimator and beamformer are integrated on the developed testbed system that was presented in Chapter 2. Firstly, the evaluation system configuration co-designed using FPGAs and a CPU will be introduced. The total integrated system will be evaluated by verifying its normal operation and confirming its performance. Secondly, each processing function is discussed according to the processing flow and the computer simulation results will be described. Some experimental results of the total performance evaluation in a radio anechoic chamber assuming a LOS AWGN channel will be presented as well. Further, the problems with the calibration of the antenna array and analog receivers will be discussed.

5.2 System Configuration

The integrated system consists of a DOA estimator and a beamformer, as shown in Fig.5.1. The received signals at the antenna array are downconverted into complex baseband signals and fed into the UMP DOA estimator where the DOAs of the incident signals are estimated. The UMP employs the unitary MUSIC algorithm. The NDC-BF is applied for beamforming. This basically uses the equi-ripple Dolph-Chebyshev beam with a sidelobe ratio (SLR) of 40 dB, steers the mainlobe toward the user direction, and suppresses the interferers by the low sidelobe. Following the DOA estimation and identification, the interferers are classified into in-beam and out-of-beam interferers in order to eliminate the in-beam interferers. NDC-BF provides optional null-steering processing by cascade beamforming with notch beams. The canceling beams are computed by the convolution of the pre-computed notch beams stored in the ROM. Finally, the uplink beamforming weight is obtained by the convolution of the pre-computed DC beam and the canceling beam. The burst frame data is updated after beamforming with the final weight vector.
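The classification step can be sketched as a simple threshold on the angular separation from the user. The 25° threshold is taken from the in-beam area quoted in Section 4.3.2; comparing separations in degrees (rather than in $\sin\theta$) and the function name are assumptions of this sketch.

```python
def classify_interferers(user_doa_deg, interferer_doas_deg, in_beam_deg=25.0):
    """Split estimated interferer DOAs into in-beam and out-of-beam lists."""
    in_beam, out_of_beam = [], []
    for doa in interferer_doas_deg:
        if abs(doa - user_doa_deg) < in_beam_deg:
            in_beam.append(doa)       # handled by an optional notch beam
        else:
            out_of_beam.append(doa)   # left to the low sidelobe
    return in_beam, out_of_beam


# Source layout of Tab.4.3: user at 0 deg, interferers at -20, 15 and -40 deg
in_beam, out_of_beam = classify_interferers(0.0, [-20.0, 15.0, -40.0])
print(in_beam, out_of_beam)
```

Only the in-beam list determines how many notch beams are convolved with the DC beam, so the out-of-beam interferers add no computation.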

Figure 5.2 illustrates the evaluation configuration with the testbed system. It performs the DOA estimation and uplink beamforming with a single FPGA (STRATIX EP1S40, Altera) and a CPU (SH4, Hitachi). The FPGA has approximately 1 million equivalent gates, approximately 417 Kbytes of internal memory blocks, and optimized DSP blocks [103]. The CPU performs receiver calibration and synchronization tasks in software. The burst frame format is defined as shown in Fig. 5.3, where a single burst consists of 136 symbols, including 16 training symbols. The π/4-shift QPSK modulation scheme was used.

**Fig. 5.2: Evaluation configuration with testbed system.**

**Fig. 5.3: Burst frame format.**
5.3 Data Model

The basic narrowband signal model was assumed in a macro cellular Rayleigh fading environment with a small AS, as shown in Fig. 5.4. The transmitted signals $s_i(t)$ from the $i$-th source, where $i = 1, 2, \cdots, L$, arrive at an $M$-element antenna array spaced by half-wavelength as

$$x(t) = Vs(t) + n(t), \quad (5.1)$$

where the array output $x(t)$ is the received data vector, and $s(t)$ and $n(t)$ are the signal and complex AWGN vectors at time $t$, respectively. The columns of the channel matrix $V = [v_1, v_2, \cdots, v_L]$ consist of the spatial channel vectors for $L$ sources. In the case of the $i$-th source cluster, the spatial channel vector $v_i$ is modeled by

$$v_i = \sum_{j=1}^{N_i} \beta_{ij} a(\theta_i + \delta_{ij}), \quad (5.2)$$

where

$$a(\theta) = [1, e^{-j\pi \sin \theta}, \cdots, e^{-j\pi(K-1)\sin \theta}]^T, \quad (5.3)$$

$\beta_{ij}$: Complex amplitude with Rayleigh distributed magnitude of $j$-th scattered sub-path

$\theta_i$: Nominal DOA for $i$-th source

$\delta_{ij}$: DOA deviation of $j$-th sub-path of Gaussian random variable $\sim \mathcal{N}(0, \sigma_{\delta_{ij}}^2)$

$N_i$: Number of sub-paths for $i$-th source

The superscript $^T$ denotes the transpose operator. $\Delta \theta_i$ is the AS of $2\sigma_i$, where $\sigma_i$ is the standard deviation of $\delta_{ij}$. This model neglects the time delays between the sub-paths, and $\delta_{ij}$ and $\beta_{ij}$ are independent and identically distributed (i.i.d.) over the indices $i$ and $j$. The observation period is assumed to be sufficiently short relative to the coherence time; hence, the channel is quasi-stationary in a burst frame duration [40].

5.4 Integrated System Functions

5.4.1 Estimation of Number of Sources

Based on Eigenvalue Decomposition

Since \( L \) is typically unknown, the MUSIC algorithm usually requires an a priori estimate of the number of sources \( L \). Some well-known criteria determine the rank of the data vector \( \mathbf{x} \) from the estimated eigenvalues of the data matrix. These criteria, termed AIC (Akaike Information Criterion) [46] and MDL (Minimum Description Length) [45], are given by

\[
AIC(L) = -2N(K - L) \cdot \ln \delta(L) + L(2K - L) \cdot 2 \quad (5.4)
\]

\[
MDL(L) = -2N(K - L) \cdot \ln \delta(L) + L(2K - L) \cdot \ln N \quad (5.5)
\]

respectively, where \( L, K, N \) and \( \lambda \) denote the rank of the signal subspace (number of signals), number of elements, number of snapshots, and eigenvalues, respectively. The statistic quantity is given by

\[
\ln \delta(L) = \frac{1}{K - L} \ln \left( \prod_{i=L+1}^{K} \lambda_i \right) - \ln \left( \frac{1}{K - L} \sum_{i=L+1}^{K} \lambda_i \right). \quad (5.6)
\]

Based on the above criteria, the number of signals is estimated as the value of \( L \) that minimizes \( AIC(L) \) or \( MDL(L) \).
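The criteria can be evaluated from a list of eigenvalues as in the following sketch. It assumes the penalty term \( L(2K - L) \) and the geometric-to-arithmetic mean ratio of the \( K - L \) smallest eigenvalues for \( \delta(L) \), which is the commonly used form of these criteria; the eigenvalues in the usage lines are made-up illustrative values and the function names are hypothetical.

```python
import math


def log_delta(eigs, l):
    """ln(geometric mean / arithmetic mean) of the K - l smallest eigenvalues."""
    noise = sorted(eigs, reverse=True)[l:]
    n = len(noise)
    return sum(math.log(x) for x in noise) / n - math.log(sum(noise) / n)


def estimate_num_sources(eigs, snapshots, criterion="MDL"):
    """Return the L in 0..K-1 that minimizes AIC(L) or MDL(L)."""
    k = len(eigs)
    weight = 2.0 if criterion == "AIC" else math.log(snapshots)

    def score(l):
        fit = -2.0 * snapshots * (k - l) * log_delta(eigs, l)
        return fit + l * (2 * k - l) * weight

    return min(range(k), key=score)


eigs = [20.0, 10.0, 1.0, 1.0]       # two dominant eigenvalues over a flat noise floor
print(estimate_num_sources(eigs, 100, "AIC"), estimate_num_sources(eigs, 100, "MDL"))
```

The fit term vanishes when the remaining eigenvalues are equal, so the penalty term alone decides between candidate values of \( L \) at or above the true number of sources.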

There exists another simple criterion based on a heuristic approach that introduces the threshold levels to separate the signal and noise eigenvalues. The two threshold levels, \( \lambda_i \) from the maximum value and \( \lambda_b \) from the minimum value, are given as

\[
\Omega_i = \frac{\lambda_{\text{max}}}{\lambda_i}, \quad \Omega_b = \frac{\lambda_{\text{min}}}{\lambda_b},
\]

respectively. The threshold levels should be chosen according to the given situation.

Based on Power Spectrum

The number of signals can also be estimated from the angular power spectrum, which is computed by a conventional beamformer [47]. In this approach, a threshold level in the power spectrum is also necessary. This approach is frequently used to effectively determine the number of clusters in multipath environments [48].

5.4.2 DOA Estimation

The fast DOA estimator termed UMP, implemented on an FPGA \([49]\), employs the spectral unitary MUSIC algorithm and performs all computation processes with fixed-point arithmetic. This arithmetic typically uses a finite word length of 16 bits, which gives lower complexity and lower power consumption than general-purpose processors. The resulting gate resources used in the target FPGA and the computation time required to process a single burst frame defined
Table 5.1: Gate resources and computation time.

<table>
<thead>
<tr>
<th></th>
<th>Used LEs (Equiv. Gates)</th>
<th>Computation Time (μs)</th>
</tr>
</thead>
<tbody>
<tr>
<td>4-Element with SS</td>
<td>12,007 (280,760)</td>
<td>30.59</td>
</tr>
<tr>
<td>4-Element</td>
<td>12,995 (303,860)</td>
<td>57.11</td>
</tr>
<tr>
<td>8-Element</td>
<td>29,472 (689,130)</td>
<td>373.99</td>
</tr>
</tbody>
</table>

Fig. 5.5: DOA estimation in noise channel with UMP where two sources impinge on a 4-element antenna array from $30^\circ$ and $-10^\circ$ (planar wave assumption).

in Fig. 5.3 are provided in Tab. 5.1. Fig. 5.5 shows DOA estimation examples for two incident signals of identical power with the 4-element UMP, where the sources are located at $30^\circ$ and $-10^\circ$. For each SNR, 2,000 burst frames were used. The detailed performance of the UMP was discussed in Chapter 3. In a fixed-point system, arithmetic with a fixed word length usually leads to a low dynamic range, and this also limits the estimation performance of the UMP: the dynamic range of the estimation performance depends on the word length of the arithmetic processor. In addition, the MUSIC DOA estimation performance with a finite number of samples is limited by the SNR. The dynamic range was assessed by the success rate in the presence of multiple waves with different powers, defined as the rate at which the estimates satisfy the condition

$$\theta = |\theta_{\text{estimated}} - \theta_{\text{true}}| < \sqrt{T},$$

(5.8)

\(^1\)The implementation results may not be optimum; however, it is important to note that better computation performance can be realized by enhancing parallelism and using a more efficient pipeline scheme.
where \( L \) is the number of waves. Figure 5.6 shows the success rates versus the power ratio between two waves that arrive at an 8-element antenna array from 0° and 35° and are sampled by a 12-bit ADC\(^2\). When the SNR is 10 dB, as shown in Fig.5.6(a), the success rates are limited by the SNR. Fig.5.6(b) shows the success rates at an SNR of 100 dB, where the noise contribution does not appear in the input signal sampled by the 12-bit ADC. These results clarify the performance degradation in the fixed-point system; hence, the word-length requirement should be examined for each application.

5.4.3 User Identification

The simplest identification method regards the strongest of the received signals as the desired signal and the others as interferers. This method is effective only in relatively simple propagation environments. In the presence of a large number of multipaths comprising both desired signal and interference components, the desired signal may not always be the strongest one; the strongest signal may even be one that should be suppressed. Reference [66] proposed an identification technique for all detected signals using pre-beamforming in practical environments. However, pre-beamforming involves a large computational load. On the other hand, the desired signal can be determined by a higher layer of the system control through periodic transmission of a relatively long unique word sequence and its correlation detection [55]. User identification is not included in the scope of this paper; we have assumed that the user signal is already known.

5.4.4 DOA Tracking

In a typical multipath propagation channel, the signal level becomes a random variable and the DOA is spread by local scatterers in the vicinity of the mobile terminals. Therefore, the beamforming with an instantaneous DOA estimation result based on the planar wave assumption leads to performance degradation. In small AS environments, such as an urban cellular base station, the statistical estimation by a tracker can be usefully introduced as

\[
\hat{\theta}_i(n) = \beta \hat{\theta}_i(n-1) + (1 - \beta) \theta_i(n),
\]

(5.9)

where \( \theta_i \) and \( \hat{\theta}_i \) are the instantaneous and tracked DOA, respectively, and \( \beta \) is a forgetting factor (0.825 herein). Eq.(5.9) reduces the variation of the estimates, thus enabling the nominal DOAs to be tracked. Figures 5.7 and 5.8 show the standard deviation of the DOA estimates with the 4-element UMP in a fading channel (AS is 3 degrees). In this simulation, 2,000 burst frames of the format shown in Fig. 5.3 were used, and the results show that an averaging effect is obtained by using the tracker. This averaging effect is clearly illustrated in Fig.5.9.
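The recursion of Eq.(5.9) is a first-order exponential average and can be sketched in a few lines of Python. The function name `track_doa`, the choice of initializing the tracker with the first instantaneous estimate, and the Gaussian test data are assumptions made for this illustration only. For statistically independent instantaneous estimates, the steady-state standard deviation is reduced by a factor of \( \sqrt{(1-\beta)/(1+\beta)} \), which is about 0.31 for \( \beta = 0.825 \).

```python
import numpy as np

def track_doa(theta_inst, beta=0.825, theta_init=None):
    """Apply theta_hat(n) = beta * theta_hat(n-1) + (1 - beta) * theta(n)."""
    theta_inst = np.asarray(theta_inst, dtype=float)
    tracked = np.empty_like(theta_inst)
    # Assumption: start from the first instantaneous estimate unless told otherwise.
    prev = theta_inst[0] if theta_init is None else float(theta_init)
    for n, theta in enumerate(theta_inst):
        prev = beta * prev + (1.0 - beta) * theta
        tracked[n] = prev
    return tracked

# Hypothetical data: a source at 30 degrees with 3 degrees of estimation jitter.
rng = np.random.default_rng(0)
theta_inst = 30.0 + 3.0 * rng.standard_normal(2000)
theta_hat = track_doa(theta_inst)
print(np.std(theta_inst), np.std(theta_hat[100:]))
```

A larger \( \beta \) averages over more frames and lowers the variance further, at the cost of a slower response when the nominal DOA actually moves.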

5.4.5 Beamforming

In this study, NDC-BF is applied, as discussed in Chapter 4. The in-beam interferers among the estimated and classified signals are rejected by an optional null-steering operation.

\(^2\)The dynamic range of ADC is about 72 dB.
Fig. 5.6: Dynamic range of UMP (two waves arriving at an 8-element antenna array from $s_1 = 0^\circ$ and $s_2 = 35^\circ$).
Fig. 5.7: Estimation standard deviations of instantaneous and tracked DOAs in Rayleigh fading channel with small angle spread (AS=3°, source locations are 30° and -10°).

Fig. 5.8: Estimation standard deviations of instantaneous and tracked DOAs in Rayleigh fading channel with small angle spread (SNR=13dB, source locations are 30° and -10°).
Fig. 5.9: Instantaneous DOA estimation results and tracked DOAs in Rayleigh fading channel with small angle spread (Source locations are 30° and −10°).
5.5 Experimental Evaluations

This section evaluates the performance of the integrated system through experiments for certain cases.

5.5.1 Evaluation Model

The BER performance for an urban cellular uplink channel was discussed in Chapter 4. In this section, the BER performance is analyzed experimentally. It is assumed that four waves, a single user and three interferers (two of them being in-beam interferers), arrive at the half-wavelength equi-spaced 8-element antenna array. The DOA estimation is carried out with a fixed-point UMP. Figure 5.10 is a schematic of the evaluation model in a LOS environment under the planar wave assumption without fading. The detailed parameters are given in Tab.5.2. The performance of the conventional beamformer (U-BF), the low-sidelobe DC-BF, and ZF was computed for comparison with the proposed system. The typical beampatterns of the various beamformers in this evaluation model are illustrated in Fig.5.11.

5.5.2 Measurement Setup

The measurement setup in a radio anechoic chamber is shown in Fig.5.12. The prototype processor in the developed testbed system includes the UMP DOA estimator and NDC beamformer processors. Further, it controls all measurement instruments for the transceiver. The transmitting signals at an IF band of 10 MHz are generated by digital-to-analog conversion and up-converted to the RF band of 5 GHz. At the receiver, an 8-element ULA of sleeve antennas equi-spaced by λ/2 was employed. The received signals are downconverted to an IF band of 40 MHz and digitized by ADCs at 32 MHz using the undersampling technique. The
Table 5.2: Evaluation parameters.

<table>
<thead>
<tr>
<th>Antennas</th>
<th>8-Element Sleeve ULA (λ/2)</th>
</tr>
</thead>
<tbody>
<tr>
<td>DOA Estimator</td>
<td>16-bit Fixed Point-UMP w/o Spatial Smoothing</td>
</tr>
<tr>
<td>Beamformers</td>
<td>U-BF,</td>
</tr>
<tr>
<td></td>
<td>DC-BF with SLR=40dB,</td>
</tr>
<tr>
<td></td>
<td>NDC-BF with SLR=40dB,</td>
</tr>
<tr>
<td></td>
<td>ZF</td>
</tr>
<tr>
<td>Transmitting Signal</td>
<td>π/4-DQPSK</td>
</tr>
<tr>
<td>Burst Frame Length</td>
<td>136 Symbols (272 bits)</td>
</tr>
<tr>
<td>Signal Sources</td>
<td>User 0°</td>
</tr>
<tr>
<td>Interferers</td>
<td>-20, 15° (In-Beam), -40° (Out-of-Beam)</td>
</tr>
<tr>
<td></td>
<td>Average INR @ single antenna of each signal is 9 dB (fixed)</td>
</tr>
<tr>
<td>Channels</td>
<td>LOS (AWGN)</td>
</tr>
<tr>
<td>Trials</td>
<td>100</td>
</tr>
</tbody>
</table>

Fig. 5.11: Typical Beampatterns
symbol rate and modulation are 2 Msps (mega-symbols per second) and π/4-QPSK, respectively. Undersampling folds the received signals from the 40 MHz IF down to 8 MHz; hence, the ADC oversampling ratios relative to the symbol rate and the carrier frequency are 16 (32/2) and 4 (32/8), respectively. The received signals are then downconverted again into complex baseband signals by digital downconversion.
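The frequency plan above can be checked with a short calculation. The Python sketch below folds the analog IF into the first Nyquist zone and derives the two oversampling ratios; the helper name `alias_frequency` and the constant names are hypothetical.

```python
def alias_frequency(f_in_hz, f_s_hz):
    """Frequency in the first Nyquist zone [0, f_s/2] to which f_in folds."""
    f = f_in_hz % f_s_hz
    return f if f <= f_s_hz / 2 else f_s_hz - f

F_S = 32e6          # ADC sampling rate
F_IF = 40e6         # analog IF at the ADC input
SYMBOL_RATE = 2e6   # symbols per second

f_digital_if = alias_frequency(F_IF, F_S)   # digital IF after undersampling
samples_per_symbol = F_S / SYMBOL_RATE      # oversampling w.r.t. symbol rate
samples_per_cycle = F_S / f_digital_if      # oversampling w.r.t. digital carrier

print(f_digital_if, samples_per_symbol, samples_per_cycle)
```

With four samples per carrier cycle, the complex mixing sequence \( e^{-j\pi n/2} \) takes only the values 1, \(-j\), \(-1\), and \( j \), which is one reason a digital IF at a quarter of the sampling rate is often convenient; whether the prototype exploits this property is not stated here.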

In this measurement setup, the carrier frequency was perfectly synchronized by direct connection between the local oscillators of the receiver and the transmitter via cable. Further, the operating clock of the prototype processor was also synchronized by a clock synthesizer, as shown in Fig. 5.12. Thus, coherent detection can be performed.
Fig. 5.13: Array antenna receiver in the presence of mutual coupling and initial receiver characteristics.

5.5.3 Antenna Array Calibration

In this study, each antenna in the antenna array system has so far been assumed to have an ideal omni-directional property. In general, however, the actual response of the antenna array deviates significantly from the assumed ideal model because of electromagnetic coupling among the antenna elements, which are typically spaced closely by λ/2. In particular, in a spatial reference based system\(^3\) such as the one in this study, the practical antenna response should be taken into careful consideration. Subspace based DOA estimators, such as MUSIC, require precise knowledge of the antenna array response to a standard source located in any direction. The collection of antenna array responses for all possible directions is termed the array manifold. The performance of a subspace based system strongly depends on the accuracy of this array manifold \([56]\). However, the measurement of an array manifold is generally a time-consuming task. There exist many studies of the mutual coupling effect in antenna arrays, and a number of array calibration methods have been proposed \([56–60]\). The issue of array calibration is out of the scope of this work; hence, the mutual coupling effect in this system is only briefly discussed. In practice, the array calibration is performed using the compensation technique proposed in \([59]\).

**Formulation**

Assuming that a signal \(s_i(t)\) located at \(\theta_i\) arrives at the \(K\)-element antenna array associated with mutual coupling \(C\), initial receiver characteristics \(\Gamma\), and element pattern \(\Lambda\), as shown in Fig. 5.13, then the response of the array can be described by

\[
\mathbf{x}_{\theta_i} = \mathbf{C} \mathbf{\Gamma} \mathbf{\Lambda}_{\theta_i} \mathbf{v}(\theta_i) s_i(t) + \mathbf{n}(t),
\]  

(5.10)

\(^3\)It is also termed as angle domain system or DOA-based system.
where \( n(t) \) is AWGN and \( C, \Gamma, \) and \( \Lambda \) are
\[
C = \begin{pmatrix}
  c_{11} & c_{12} & \cdots & c_{1K} \\
  c_{21} & c_{22} & \cdots & c_{2K} \\
  \vdots & \vdots & \ddots & \vdots \\
  c_{K1} & c_{K2} & \cdots & c_{KK}
\end{pmatrix},
\]
\[
\Gamma = \begin{pmatrix}
  \Gamma_1 & 0 & \cdots & 0 \\
  0 & \Gamma_2 & \cdots & 0 \\
  \vdots & \vdots & \ddots & \vdots \\
  0 & 0 & \cdots & \Gamma_K
\end{pmatrix},
\]
\[
\Lambda = \begin{pmatrix}
  \Lambda_1(\theta_i) & 0 & \cdots & 0 \\
  0 & \Lambda_2(\theta_i) & \cdots & 0 \\
  \vdots & \vdots & \ddots & \vdots \\
  0 & 0 & \cdots & \Lambda_K(\theta_i)
\end{pmatrix},
\]
respectively. For simplicity, the element pattern of each antenna is assumed to be omni-directional, and the product \( C\Gamma \) is replaced by the total error matrix \( G \), which gives
\[
x_{\theta_i} = Gv(\theta_i)s_i(t) + n(t)
\]
(5.14)
The calibrated data vector can then be obtained by multiplying the received data vector by the inverse of the total error matrix as
\[
\begin{aligned}
\bar{x}_{\theta_i} &= G^{-1}x_{\theta_i} \\
&= v(\theta_i)s_i(t) + G^{-1}n(t),
\end{aligned}
\]
(5.15)
if the total error matrix can be estimated.
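The compensation of Eq.(5.15) amounts to solving one linear system per snapshot. The following NumPy sketch illustrates it in the noise-free case with a randomly generated, purely hypothetical error matrix; the names `steering_vector` and `calibrate` and the sign convention of the steering-vector phase are assumptions made for this example.

```python
import numpy as np

def steering_vector(theta_deg, num_elements):
    """Ideal response of a half-wavelength ULA (phase sign is an assumption)."""
    k = np.arange(num_elements)
    return np.exp(-1j * np.pi * k * np.sin(np.deg2rad(theta_deg)))

def calibrate(x, G):
    """Return G^{-1} x without forming the inverse explicitly."""
    return np.linalg.solve(G, x)

# Hypothetical total error matrix: identity plus a small complex perturbation.
rng = np.random.default_rng(0)
K = 8
G = np.eye(K) + 0.05 * (rng.standard_normal((K, K))
                        + 1j * rng.standard_normal((K, K)))

v = steering_vector(20.0, K)
s = 1.0 + 0.0j
x = G @ v * s                 # distorted snapshot, noise omitted
x_cal = calibrate(x, G)       # recovers v * s when G is known exactly

print(np.max(np.abs(x_cal - v * s)))
```

Note that with noise present the same operation also transforms the noise into \( G^{-1}n(t) \), as the second line of Eq.(5.15) shows, so the noise is in general no longer spatially white after calibration.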

**Total Error Matrix Estimation**

The total error matrix is estimated by using the technique proposed in [59]. This technique exploits the orthogonality of the signal and noise subspaces. The error matrix can be estimated from a smaller amount of measured data than the array manifold technique requires. The received data from the \( i \)-th known source located at \( \theta_i \) can be written as Eq.(5.14). The correlation matrix of this \( K \)-element data vector is
\[
R_{\theta_i} = E[x_{\theta_i}x_{\theta_i}^H] = \sum_{j=1}^{K} \lambda_j^{(i)} e_j^{(i)} e_j^{(i)H},
\]
(5.16)
where \( \lambda_j^{(i)} \) and \( e_j^{(i)} \) are the \( j \)-th eigenvalue and eigenvector, respectively, with the eigenvalues sorted in descending order. The orthogonality of the signal and noise subspaces gives the relation
\[
e_j^{(i)H} \cdot Gv(\theta_i) = 0, \quad j = 2, \cdots, K
\]
(5.17)
If the noise eigenvectors obtained from the received data of a single known source are \( \mathbf{e}_j^{(i)} \) \((j = 2, \ldots, K)\), \(K - 1\) equations are obtained as

\[
\mathbf{e}_j^{(i)H} \cdot \mathbf{G}\mathbf{v}(\theta_i) = \left[ e_{j,1}^{*}, e_{j,2}^{*}, \cdots, e_{j,K}^{*} \right] \times \begin{bmatrix}
g_{11}v_{i,1} + g_{12}v_{i,2} + \cdots + g_{1K}v_{i,K} \\
g_{21}v_{i,1} + g_{22}v_{i,2} + \cdots + g_{2K}v_{i,K} \\
\vdots \\
g_{K1}v_{i,1} + g_{K2}v_{i,2} + \cdots + g_{KK}v_{i,K}
\end{bmatrix} = 0, \quad (5.18)
\]

where \( e_{j,k} \) and \( v_{i,k} \) are the \( k \)-th elements of \( \mathbf{e}_j^{(i)} \) and \( \mathbf{v}(\theta_i) \), respectively, and \( g_{mn} \) is the \( (m, n) \) element of \( \mathbf{G} \), so that the number of unknown parameters is \(K^2\). For example, \(K = 8\) gives \(K^2 = 64\) unknown parameters. Because each known source provides only \(K - 1 = 7\) equations, a minimum of 10 independent data sets from known sources is required. Fig.5.14 shows the mutual coupling effect on the element patterns, where (a) and (c) represent an ideal case without mutual coupling while (b) and (d) represent a practical case with an estimated error matrix including mutual coupling and receiver characteristics.

To increase the estimation accuracy, a sufficient number of independent data sets were collected with known sources at specific locations (71 sets measured every 2° between \(-70^\circ\) and \(70^\circ\)).
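One way to turn the orthogonality relations of Eq.(5.17) into an estimate of \( G \) is to stack them into a homogeneous linear system in the \( K^2 \) elements of \( G \) and take the right singular vector belonging to the smallest singular value. The NumPy sketch below illustrates this idea on synthetic, noise-free data. It is a sketch under stated assumptions and not necessarily the exact procedure of [59]: the function names, the steering-vector sign convention, the source angles, and the randomly generated error matrix are all hypothetical. Because the equations are homogeneous, \( G \) is recovered only up to a complex scale factor, which is fixed here by matching one element.

```python
import numpy as np

def steering_vector(theta_deg, num_elements):
    """Ideal response of a half-wavelength ULA (phase sign is an assumption)."""
    k = np.arange(num_elements)
    return np.exp(-1j * np.pi * k * np.sin(np.deg2rad(theta_deg)))

def noise_eigenvectors(R):
    """eigh sorts eigenvalues in ascending order, so for a single source
    the first K-1 eigenvectors span the noise subspace."""
    _, vecs = np.linalg.eigh(R)
    return vecs[:, :-1]

def estimate_total_error_matrix(noise_vecs_per_source, thetas_deg):
    """Solve e_j^H G v(theta_i) = 0 for all i, j in the least-squares sense."""
    rows = []
    for E_n, theta in zip(noise_vecs_per_source, thetas_deg):
        K = E_n.shape[0]
        v = steering_vector(theta, K)
        for j in range(E_n.shape[1]):
            # e^H G v = sum_{m,n} conj(e_m) g_mn v_n = kron(conj(e), v) . vec(G)
            rows.append(np.kron(E_n[:, j].conj(), v))
    A = np.vstack(rows)
    _, _, Vh = np.linalg.svd(A)
    g = Vh[-1].conj()          # null-space direction of A (row-major vec of G)
    return g.reshape(K, K)

# Synthetic check with a hypothetical 4-element array and six known sources.
rng = np.random.default_rng(1)
K = 4
G_true = np.eye(K) + 0.05 * (rng.standard_normal((K, K))
                             + 1j * rng.standard_normal((K, K)))
thetas = [-40.0, -20.0, 0.0, 15.0, 30.0, 50.0]
noise_vecs = []
for theta in thetas:
    a = G_true @ steering_vector(theta, K)
    R = np.outer(a, a.conj())              # noise-free correlation matrix
    noise_vecs.append(noise_eigenvectors(R))

G_est = estimate_total_error_matrix(noise_vecs, thetas)
G_est = G_est * (G_true[0, 0] / G_est[0, 0])   # remove the scale ambiguity
print(np.max(np.abs(G_est - G_true)))
```

With measured data the stacked system is only approximately singular, so collecting more data sets than the minimum, as done here with 71 sets, averages out the estimation error.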

Fig.5.15 shows the improvement of DOA estimation accuracy with array calibration. In this measurement, the SNR was sufficiently high (above 30 dB) and the number of snapshots was 1,000. The results show that array calibration with an estimated total error matrix effectively eliminates the error caused by mutual coupling and receiver characteristics. The harmful effect of mutual coupling appears with greater severity in the null-steering of adaptive beamformers.
Fig. 5.14: Mutual coupling effect on element pattern of each antenna (8-element uniform linear array).
Fig. 5.15: Improvement of DOA estimation accuracy in 8-element UMP with array calibration.
5.5.4 Measurement in Anechoic Chamber

The BER (bit-error-rate) is evaluated experimentally in order to verify the integrated system performance. The evaluation model is described in Tab. 5.2. In this study, the BER performance was measured for two cases. In the first case, the actual DOAs are already known. This is not a practical case; however, it allows the performance of the beamformer to be evaluated independently of the DOA estimation errors. In the second case, the DOAs are estimated by the developed UMP. It was assumed that the number of sources is already known to be 4: one is the desired signal and the others are interferers with an INR (Interferer to Noise Ratio) of 9 dB. It is known that the DOA estimation performance is significantly degraded when the number of sources is over- or under-estimated. The data vectors were obtained by combining the data vectors measured with each single source, and the SNR was converted for the combined data vector (see Appendix B).

The DOA estimation results by UMP, depending on the power of the desired signal, are shown in Fig.5.17. The UMP performs the DOA estimation with fixed-point arithmetic, as described in Chapter 3; therefore, the dynamic range is usually low. In this study, the allowable power difference for a normal estimation is approximately 15 dB without noise. Hence, the DOA estimation performance was degraded at low SNR, as shown in Fig.5.17.

The mutual coupling and receiver characteristics usually have harmful effects on the DOA estimation and beamforming performance. The effects on the ZF and NDC beamformers, in the presence of the mutual coupling and receiver characteristics of a practical system, are illustrated in Fig.5.18. The pattern nulls are not sufficiently deep and deviate from the desired locations. The shallow nulls result in the reception of unintended interference power; hence, the quality of the desired signal is typically degraded. Figure 5.16 provides an example of beamforming processing, where the SNR and INR were 6 dB and 9 dB, respectively. Further, Fig.5.19 shows the BER performance in various cases: those where the true DOAs were previously known, as in (a), (c), and (e), and those where the DOAs were estimated by UMP, as in (b), (d), and (f). It can be seen that the mutual coupling compensation provides improved performance close to that of the ideal antenna system, as shown in Fig.5.19(e) and (f). It can also be seen that the DOA estimation errors lead to an increase in BER at low SNR, as shown in Fig.5.19(b), (d), and (f).

In Tabs.5.3 and 5.4, the results of the true DOA and UMP cases are summarized separately. The gains $G_1$ and $G_2$ denote the differences in the required $E_b/N_0$ to achieve the given BER (herein, BER = 0.01). The proposed NDC-BF has performance comparable to that of the ZF beamformer and is superior to the uniform beamformer by 4.0 dB in the practical case of Tab.5.4.

\(^4\)Generally, a low dynamic range is the most critical problem in fixed-point systems.
Fig. 5.16: DOA estimation and beamforming example (SNR=6dB, INR=9dB).
Fig. 5.17: DOA estimation results by UMP (true DOAs are $-40^\circ$, $-20^\circ$, $0^\circ$, and $15^\circ$; 100 trials for each power (SNR) of the desired signal; the number of sources was set to 4).

Fig. 5.18: Mutual coupling effect on beamforming.
Fig. 5.19: BER Performances.
Table 5.3: $E_b/N_0$ Gains at BER=0.01 (True DOA)

<table>
<thead>
<tr>
<th>unit [dB]</th>
<th>Required $E_b/N_0$</th>
<th>Gains</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>Single Elem. w/o Intfs (1)</td>
<td>U-BF (2)</td>
</tr>
<tr>
<td>Simulation</td>
<td>6.8</td>
<td>4.7</td>
</tr>
<tr>
<td>Measurement</td>
<td>7.5</td>
<td>4.2</td>
</tr>
<tr>
<td>Measurement (with Calibration)</td>
<td>7.5</td>
<td>4.9</td>
</tr>
</tbody>
</table>

Table 5.4: $E_b/N_0$ Gains at BER=0.01 (by UMP)

<table>
<thead>
<tr>
<th>unit [dB]</th>
<th>Required $E_b/N_0$</th>
<th>Gains</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>Single Elem. w/o Intfs (1)</td>
<td>U-BF (2)</td>
</tr>
<tr>
<td>Simulation</td>
<td>6.8</td>
<td>4.2</td>
</tr>
<tr>
<td>Measurement</td>
<td>7.5</td>
<td>4.4</td>
</tr>
<tr>
<td>Measurement (with Calibration)</td>
<td>7.5</td>
<td>5.0</td>
</tr>
</tbody>
</table>
5.6 Summary

This chapter presented the experimental evaluation results with an integrated system assuming an uplink smart antenna system that is useful for macro-cellular base stations. The DOA estimator and beamformer were integrated on the developed testbed system. The evaluation system configuration co-designed by using FPGAs and CPU was introduced. Further, each processing function was discussed according to the processing flow and the computer simulation results were presented. This chapter also presented the experimental results of the total performance evaluation in a radio anechoic chamber assuming a LOS AWGN channel. Further, the problems of the calibration of antenna array and analog receivers were discussed.
Chapter 6

Conclusion

This chapter summarizes and concludes this paper.

The primary focus of this paper has been on the FPGA implementation of a smart antenna system including a DOA estimator and beamformer. This paper aims at the implementation of general signal processing independent of specific communication systems and hardware platforms. The prototype system integrating signal processors on FPGAs was manufactured and the evaluation testbed was configured. The prototype system employs an 8-element antenna array for the 5 GHz band and an RF transceiver for each branch. This paper verified the functionality of the baseband smart antenna processor via hardware level simulations and laboratory experiments. Further, this paper tested the entire hardware system including RF, antenna arrays, and air interface at both transmitter and receiver configurations in a radio anechoic chamber.

Chapter 2 gave a detailed description of the developed hardware platforms of the smart antenna transceiver system, including their implementation. The hardware requirements of smart antenna technologies were also discussed. The benefits of FPGAs as array signal processors were described in comparison with general-purpose DSP processors. The prototype transceivers take into consideration the reconfiguration of the IF circuit as well as the adaptive control of the antenna beampattern within an SDR concept. To achieve this, the prototype transceivers use a low-IF superheterodyne architecture with undersampling at the receivers and a harmonic band generation scheme at the transmitters. In this chapter, the sampling jitter effect, which determines the maximum IF input frequency, was quantified based on measurements.

In this study, we succeeded in implementing a fixed-point processor for fast DOA estimation on FPGAs, as described in Chapter 3. The processor, termed UMP (Unitary MUSIC Processor), incorporates the unitary MUSIC algorithm. This algorithm has a super resolution capability and is suitable for integration on logic circuit devices such as FPGAs. Techniques suitable for FPGAs, such as eigenvalue decomposition by a Cyclic Jacobi processor based on CORDIC and spectrum computation by spatial DFT (Discrete Fourier Transform), can be utilized to optimize it for fixed-point arithmetic.

Chapter 4 proposed a novel beamforming technique for a DOA-based smart antenna system. This system basically steers the mainlobe toward the user with the Dolph-Chebyshev beam. Moreover, in order to avoid undesirable interference reception through the broad mainlobe, an optional null-steering process cancels the interferers within the mainlobe coverage with notch beams, which are pre-computed and stored in the ROM. The beamformer is based on a cascade structure of Dolph-Chebyshev low-sidelobe and simple notch beampatterns to cancel the in-beam interferer; nevertheless, the weight vector can be obtained by simple convolution operations of the pre-computed Dolph-Chebyshev beam and the optional canceling beam. Hence, it has an extremely low complexity with a simple weighting operation that is suitable for implementation on high-speed logic devices, such as FPGAs. This paper has discussed the performance improvement and limitation of the proposed beamformer. Further, the BER performance in some cases was assessed through computer simulations.

Chapter 5 described the evaluation of the integrated system performance through a practical experiment. The effects caused by real hardware imperfections were discussed, and array calibration issues were dealt with in this chapter. This chapter presented the experimental evaluation results for an integrated system assuming an uplink smart antenna system useful for macro-cellular base stations. The DOA estimator and beamformer were integrated on the developed testbed system. The evaluation system configuration, co-designed with FPGAs and a CPU, was introduced. Each processing function was discussed according to the processing flow, and the computer simulation results were presented. The experimental results of the total performance evaluation in a radio anechoic chamber assuming a LOS AWGN channel were also presented. Further, the problems regarding the calibration of the antenna array and analog receivers were discussed.
Appendix A

FB Spatial Smoothing with Unitary Transformation

The real-valued correlation matrix is obtained from the unitary-transformed forward-only estimate \( R_f \) as

\[
\hat{R} = \text{Re} \left\{ Q^H R_f Q \right\} = \frac{1}{2} (Q^H R_f Q + (Q^*)^H R_f^* Q^* ) = \frac{1}{2} (Q^H R_f Q + Q^H J R_f^* J Q) \tag{A.1}
\]

where the last equality holds when the unitary transformation matrix \( Q \) is column conjugate symmetric, i.e.,

\[
J Q^* = Q, \tag{A.2}
\]

where \( Q \) can be chosen as

\[
Q = \frac{1}{\sqrt{2}} \begin{pmatrix} I & jI \\ \Pi & -j\Pi \end{pmatrix} \tag{A.3}
\]

or

\[
Q = \frac{1}{\sqrt{2}} \begin{pmatrix} I & 0 & jI \\ 0^T & \sqrt{2} & -j0^T \\ \Pi & 0 & -j\Pi \end{pmatrix}, \tag{A.4}
\]

for an even and odd number of array elements, respectively, where the vector \( 0 = [0, 0, \cdots, 0]^T \), and \( I \) and \( \Pi \) are the identity matrix and the identity matrix with its columns flipped in the left-right direction, respectively [33, 35, 36].

Moreover, Eq. (A.1) can be rewritten as

\[
\hat{R} = \frac{1}{2} Q^H \left( R_f + J R_f^* J \right) Q = \frac{1}{2} Q^H R_{fb} Q, \tag{A.5}
\]

therefore, forward-and-backward (FB) spatial smoothing can be achieved by the unitary transformation of the forward-only correlation matrix.
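The identity between Eqs. (A.1) and (A.5) can be verified numerically. The NumPy sketch below builds \( Q \) for an even or odd number of elements, with the left-right flipped identity matrix in the lower blocks, draws a random forward-only sample correlation matrix, and checks both the column conjugate symmetry of Eq. (A.2) and the equality of the two expressions. The function names and the random test data are hypothetical; \( J \) is taken to be the flipped identity matrix of the full array dimension.

```python
import numpy as np

def exchange_matrix(n):
    """Identity matrix with its columns flipped in the left-right direction."""
    return np.fliplr(np.eye(n))

def unitary_matrix(num_elements):
    """Column conjugate symmetric unitary matrix for even or odd sizes."""
    m = num_elements // 2
    I, P = np.eye(m), exchange_matrix(m)
    if num_elements % 2 == 0:
        Q = np.block([[I, 1j * I],
                      [P, -1j * P]])
    else:
        z = np.zeros((m, 1))
        Q = np.block([[I, z, 1j * I],
                      [z.T, np.sqrt(2) * np.ones((1, 1)), z.T],
                      [P, z, -1j * P]])
    return Q / np.sqrt(2)

def check_fb_identity(num_elements, num_snapshots=50, seed=0):
    """Return (Q is unitary, J Q* == Q, Re{Q^H R_f Q} == Q^H (R_f + J R_f* J) Q / 2)."""
    rng = np.random.default_rng(seed)
    shape = (num_elements, num_snapshots)
    X = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    R_f = X @ X.conj().T / num_snapshots       # forward-only sample estimate
    Q = unitary_matrix(num_elements)
    J = exchange_matrix(num_elements)
    lhs = np.real(Q.conj().T @ R_f @ Q)
    rhs = 0.5 * Q.conj().T @ (R_f + J @ R_f.conj() @ J) @ Q
    return (np.allclose(Q.conj().T @ Q, np.eye(num_elements)),
            np.allclose(J @ Q.conj(), Q),
            np.allclose(lhs, rhs))

print(check_fb_identity(4), check_fb_identity(5))
```

Because the right-hand side is compared as a complex matrix, the check also confirms that its imaginary part vanishes, which is what allows the subsequent eigenvalue decomposition to run in real arithmetic.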
Appendix B

SNR Conversion Relation

The received data vector for the i-th source alone is

\[ \bar{x}_i = v_i s_i(t) + n(t), \quad (B.1) \]

where the additive noise \( n(t) \) is assumed to be spatially white, i.e., \( E[n(t)n^H(t)] = \sigma^2 I \). The SNR (signal-to-noise ratio) of the i-th source is obtained by

\[ \mathrm{SNR}_i = \frac{P_i}{\sigma^2}. \quad (B.2) \]

Using the linear combination of the data collected with each single source, the combined data vector for \( L \) sources is given by

\[ \bar{x} = \sum_{i=0}^{L-1} \bar{x}_i = v_0 s_0(t) + \sum_{i=1}^{L-1} v_i s_i(t) + \sqrt{L} \cdot n(t), \quad (B.3) \]

where the noise amplitude is enhanced by a factor of \( \sqrt{L} \), since the noise terms of the \( L \) separately collected data sets are assumed to be mutually independent. Therefore, the SNR of the i-th source in the combined data is obtained by

\[ \mathrm{SNR}_i^{c} = \frac{P_i}{L \cdot \sigma^2}, \quad (B.4) \]

where

\[ P_0 = E[s_0^*(t) \cdot s_0(t)] = E[|s_0(t)|^2] \]
\[ P_1 = E[s_1^*(t) \cdot s_1(t)] = E[|s_1(t)|^2] \]
\[ \vdots \]
\[ P_{L-1} = E[s_{L-1}^*(t) \cdot s_{L-1}(t)] = E[|s_{L-1}(t)|^2] \quad (B.5) \]
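In decibels, Eq. (B.4) is a fixed offset of \( 10 \log_{10} L \) between the single-source and combined SNR. The small Python sketch below applies the conversion in both directions; the function names are hypothetical. For the four sources used in Section 5.5.4, the offset is about 6 dB.

```python
import math

def combined_snr_db(single_source_snr_db, num_sources):
    """SNR of one source after combining num_sources single-source data sets."""
    return single_source_snr_db - 10.0 * math.log10(num_sources)

def required_single_source_snr_db(target_combined_snr_db, num_sources):
    """Single-source SNR needed to reach a target SNR in the combined data."""
    return target_combined_snr_db + 10.0 * math.log10(num_sources)

offset_db = 10.0 * math.log10(4)   # penalty for L = 4 sources
print(offset_db, combined_snr_db(10.0, 4), required_single_source_snr_db(6.0, 4))
```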
Appendix C

Design Examples of Digital Signal Processing Unit

C.1 2-channel AD board

Fig. C.1: 2CH AD board.

<table>
<thead>
<tr>
<th>Part</th>
<th>Resolution</th>
<th>Sampling Rates</th>
<th>Power Consumption</th>
<th>Input BW</th>
</tr>
</thead>
<tbody>
<tr>
<td>SPT 7938 (SPT)</td>
<td>12 bits</td>
<td>40 MSPS</td>
<td>170 mW</td>
<td>250 MHz</td>
</tr>
</tbody>
</table>

<table>
<tbody>
<tr>
<td>Buffer Memory</td>
<td>256 K Words / channel (FIFO)</td>
</tr>
<tr>
<td>FPGAs</td>
<td>ALTERA FLEX6K, ACEX1K100, FLEX10K200</td>
</tr>
</tbody>
</table>

Table C.1: Specifications of 2CH AD board.
C.2 16-channel AD board

Fig. C.2: 16CH distributed architecture board.

<table>
<thead>
<tr>
<th>Part</th>
<th>Resolution</th>
<th>Sampling Rates</th>
<th>Power Consumption</th>
<th>Input BW</th>
</tr>
</thead>
<tbody>
<tr>
<td>ADC SPT 7938 (SPT)</td>
<td>12 bits</td>
<td>40 MSPS</td>
<td>170 mW</td>
<td>250 MHz</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Part</th>
<th>Resolution</th>
<th>Sampling Rates</th>
<th>Power Consumption</th>
</tr>
</thead>
<tbody>
<tr>
<td>DAC 904 (Burr-Brown)</td>
<td>14 bits</td>
<td>165 MSPS</td>
<td>170 mW</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>FPGA</th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>ALTERA APEX20KC</td>
<td>(672pin, 0.6 Mega Gates, 311,296 memory bits)</td>
</tr>
</tbody>
</table>

Table C.2: Specifications of 16CH distributed architecture board.
C.3 4-channel AD board

Fig. C.3: 4CH AD board.

<table>
<thead>
<tr>
<th>ADC</th>
<th>Resolution</th>
<th>Sampling Rates</th>
<th>Power Consumption</th>
<th>Input BW</th>
</tr>
</thead>
<tbody>
<tr>
<td>AD 9430 (Analog Devices)</td>
<td>12 bits</td>
<td>200 MSPS</td>
<td>1.1 W</td>
<td>700 MHz</td>
</tr>
<tr>
<td>ADS5422 (Burr-Brown)</td>
<td>14 bits</td>
<td>62 MSPS</td>
<td>1.2 W</td>
<td>300 MHz</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>DAC</th>
<th>Resolution</th>
<th>Sampling Rates</th>
<th>Power Consumption</th>
</tr>
</thead>
<tbody>
<tr>
<td>MAX 5192 (Maxim)</td>
<td>14 bits</td>
<td>260 MSPS</td>
<td>1.2 W</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>FPGAs</th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>ALTERA STRATIX EP1S25</td>
<td>(780pin, 0.6 Mega Gates, 1,944,576 memory bits)</td>
</tr>
</tbody>
</table>

Table C.3: Specifications of 4CH AD board.
C.4 16-channel AD/DA board

Fig. C.4: 16CH AD/DA board.

<table>
<thead>
<tr>
<th>ADC</th>
<th>Part</th>
<th>Resolution</th>
<th>Sampling Rates</th>
<th>Power Consumption</th>
<th>Input BW</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>AD 9245 (Analog Devices)</td>
<td>14 bits</td>
<td>80 MSPS</td>
<td>366 mW</td>
<td>500 MHz</td>
</tr>
</table>

<table>
<thead>
<tr>
<th>DAC</th>
<th>Part</th>
<th>Resolution</th>
<th>Sampling Rates</th>
<th>Power Consumption</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>DAC 904</td>
<td>14 bits</td>
<td>165 MSPS</td>
<td>170 mW</td>
</tr>
<tr>
<td>(Burrrbrown)</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

<table>
<tbody>
<tr>
<td>Buffer Memory</td>
<td>16 M Words / I,Q / channel (SDRAM DIMM module)</td>
</tr>
<tr>
<td>FPGAs</td>
<td>ALTERA STRATIX EP1S40 (1020pin, 1 Mega Gates, 3,423,744 memory bits)</td>
</tr>
</tbody>
</table>

Table C.4: Specifications of 16CH AD/DA board.
Bibliography


[64] Implementing FIR filters in FLEX Devices, ALTERA Applications Note 73, Feb. 1998


[76] Digital IF Subsampling Using the HI5702, HSP45116 and HSP43220, Intersil Application Note, AN9309.2.


Publication list

Related Works

Journal Papers


Proceedings of International Conferences


IEICE Technical Reports (Domestic)


Proceedings of IEICE General / Society Conferences (Domestic)


Other Conferences


Joint Works

Journal Papers


Proceedings of International Conferences


IEICE Technical Reports (Domestic)


Proceedings of IEICE General / Society Conferences (Domestic)


