Channel Estimation Considerate Precoder Design For Multi-User Massive MIMO-OFDM Systems: The Concept and Fast Algorithms
arXiv:2403.06072v1 [cs.IT] 10 Mar 2024

Junkai Liu and Yi Jiang are with the Key Laboratory for Information Science of Electromagnetic Waves (MoE), School of Information Science and Technology, Fudan University, Shanghai, China. (Corresponding author: Yi Jiang. Email: [email protected])

Abstract: The sixth-generation (6G) communication networks target peak data rates exceeding 1 Tbps, necessitating base stations (BS) to support up to 100 simultaneous data streams. However, sparse pilot allocation to accommodate such streams poses challenges for users' channel estimation. This paper presents Channel Estimation Considerate Precoding (CECP), where BS precoders prioritize facilitating channel estimation alongside maximizing transmission rate. To address the computational complexity of 6G large-scale multi-input multi-output (MIMO) systems, we propose a computationally-efficient space-time block diagonal channel shortening (ST-BDCS) precoding scheme. By leveraging the sparse Toeplitz property of orthogonal frequency division multiplexing (OFDM) channels, this time-domain precoding design effectively mitigates multi-user interference in the downlink and shortens the effective channel's temporal length. Consequently, users can estimate the channels using sparse pilots. To enable fast implementation, we develop a generalized complex-valued Toeplitz matrix QR decomposition algorithm applicable to various space-time signal processing problems. Simulation results demonstrate that the ST-BDCS precoding method approximates the rate performance of conventional subcarrier-by-subcarrier precoding schemes. However, it offers the advantages of easier channel estimation for users and significantly reduced computational complexity for the BS.

Index Terms: Massive MIMO-OFDM, channel estimation considerate precoding, space-time precoding, block diagonalization, channel shortening, Toeplitz QR decomposition

I. INTRODUCTION

It is anticipated that the sixth generation (6G) communication networks can support peak data rates up to 1 Terabit per second (Tbps) [1]. To attain such remarkable data rates, (extremely) large scale multiple input multiple output (MIMO) will be utilized to support dozens of or even up to one hundred simultaneously transmitted data streams, coupled with orthogonal frequency division multiplexing (OFDM) [2], [3]. Several studies have explored efficient precoding techniques for (massive) MIMO-OFDM systems [4]–[9]. However, these expeditious precoding schemes often overlook a significant consequence: they exacerbate the frequency selectivity of the effective channel, i.e., the concatenation of the precoders and the air-interface channel, and hence complicate the channel estimation for the receiver [10].

In the current 5G new radio (NR) protocol [11], the pilots, such as Type 2 demodulation reference signals (DMRS), can support at most 24 orthogonal data streams. To accommodate more user terminals (UTs), the communication system can opt for adding more pilot symbols or resorting to non-orthogonal pilots. However, increasing pilot usage diminishes system throughput, and using non-orthogonal pilots causes pilot contamination, also leading to performance degradation [12]. To circumvent this dilemma, we introduce in this paper the concept of Channel Estimation Considerate Precoding (CECP), emphasizing that base station (BS) precoders should not only maximize transmission rates but also facilitate users' channel estimation with sparser pilots.

The idea of CECP, albeit not so-termed before, can be traced back to the smoothed singular value decomposition (SVD) precoding introduced in [13], where Shen et al. derived SVD precoders that smooth the effective channels across the subcarriers. The precoders equivalently reduce the delay spread of the effective channel, thereby contributing to the enhanced estimation of the effective channel at the receiver. In our previous work, we proposed a phase-rotated SVD precoding scheme, exploiting the phase non-uniqueness of SVD to enhance the smoothness of the effective channel in the frequency domain, which led to improved BER performance [10]. In [14], the channel smoothing method was investigated and tailored for adjacent subcarriers in WiFi systems. Furthermore, Jeon et al. [15] discussed the joint transceiver design by taking into account the smoothed beam-steering matrices. Although precoder smoothing helps in mitigating frequency selectivity, it does not offer direct and flexible control over the effective channel length. To this end, we propose in this paper the CECP design intended to shorten the effective channel length directly in the time domain. Moreover, the CECP design of this paper also addresses the multi-user interference, which is ubiquitous in a wireless network, whereas the aforementioned works [10], [13]–[15] focused on the single-user scenario.

Therefore, this paper aims to develop CECP schemes for multi-user massive MIMO-OFDM systems. The CECP design integrates two features: block diagonalization and channel shortening. Block diagonalization is applied to mitigate the inter-user interference (IUI) for UTs with multiple receiving antennas. The channel shortening constraint specifies the number of effective channel taps in the time domain, enhancing channel estimation and reducing design complexity at the receivers. To reduce the computational complexity entailed by the high dimensionality of massive MIMO systems, our precoding design draws inspiration from the sparse Toeplitz
property of OFDM channels. The multiplication of the channel matrix by frequency-domain precoders corresponds to the time-domain convolution. This allows for the computation of a finite impulse response (FIR) precoding filter by solving a high-dimensional block Toeplitz linear equation in the time domain. Since the channel shortening constraint is inherently addressed in the time domain, both constraints can be naturally handled while solving Toeplitz linear systems. Simulation results substantiate the superior performance of our methods.

While there indeed exist methods known as channel shortening, it is crucial to note that conventional channel shortening processing primarily targets mitigating the computational complexity of the Viterbi decoder or addressing insufficient cyclic prefix (CP) effects, rather than enhancing channel estimation. In [16], Rusek et al. derived the achievable information rate for a channel shortening detector cascaded with nonlinear decoders. This technique effectively shortens the effective channel length and reduces the computational complexity of Viterbi decoders. Optimal designs for channel shortening receivers and transmitters were respectively proposed in [17] and [18]. Introducing the channel shortening detector, Hu et al. explored linear precoders using block diagonalization methods [19]. In addition to channel shortening processing at receivers, a time-frequency channel shortening precoding was proposed for massive MIMO-OFDM systems with insufficient CP [20]. However, it is noteworthy that [17]–[20] have not considered computationally-efficient linear precoding designs to specifically enhance channel estimation for UTs.

The contributions of this paper are outlined as follows.

• We propose an innovative precoding scheme termed space-time block diagonal channel shortening (ST-BDCS) design. This approach mitigates inter-user interference and adheres to the channel shortening constraint in the time domain. Exploiting the sparse Toeplitz characteristic inherent in OFDM channels, the ST-BDCS scheme can be efficiently computed by solving a high-dimensional Toeplitz linear equation.

• Given the single-tap constraint for the effective channel, the solution to the ST-BDCS turns out to be closed-form. Incorporating the multi-tap constraint renders the problem of maximizing sum rate non-convex. To tackle this challenge, we propose a manifold optimization that enables an iterative search for solutions conforming to the constraints within a low-dimensional space.

• To solve the high-dimensional Toeplitz linear equation, we introduce a generalized complex-valued (block) Toeplitz QR decomposition algorithm. Our proposed numerical algorithms not only exhibit lower computational complexity compared with the method in [21] but also demonstrate versatility in addressing complex-valued signal processing problems in the time domain.

This paper is structured as follows: Section II provides an introduction to the fundamental signal model for downlink multi-user massive MIMO-OFDM systems. In Section III, we present the ST-BDCS design, encompassing various channel shortening constraints. Section IV showcases the simulation results based on the 5G NR protocols.

Notations: The transpose and Hermitian transpose of matrix $A$ are $A^T$ and $A^H$, respectively. $A^*$ denotes the conjugate of $A$. $I_K$ denotes the $K \times K$ identity matrix. $A(m, 1:N)$ follows MATLAB notation, representing the $m$-th row and the first $N$ columns of $A$. $\mathrm{tr}[\cdot]$ is the trace operator. $\otimes$ is the Kronecker product. $\mathrm{blkdiag}(A_1, A_2, \ldots, A_N)$ stands for the block diagonal matrix with $A_1, A_2, \ldots, A_N$ on its diagonal. $\lceil \cdot \rceil$ represents the rounding up operation.

II. SYSTEM MODEL AND PROBLEM FORMULATION

We consider a downlink massive MIMO-OFDM system with $K$ subcarriers, where an $N_t$-antenna BS serves $P$ UTs simultaneously. Each UT is equipped with $N_r$ antennas and receives $N_s$ data streams. Given that all UTs share the same time-frequency resources in the same OFDM symbol, the received signal of the $p$-th UT in the $k$-th subcarrier is

$$y_p[k] = H_p[k] G_p[k] x_p[k] + \sum_{j \neq p}^{P} H_p[k] G_j[k] x_j[k] + n[k], \tag{1}$$

where $H_p[k] \in \mathbb{C}^{N_r \times N_t}$ is the downlink channel matrix of the $p$-th UT in the frequency domain, $G_p[k] \in \mathbb{C}^{N_t \times N_s}$ is the precoding matrix, $x_p[k] \in \mathbb{C}^{N_s \times 1}$ denotes the transmit signal vector, and $n[k] \sim \mathcal{CN}(0, \sigma^2 I_{N_r})$ is the complex additive white Gaussian noise (AWGN).

Note that the discrete Fourier transform (DFT) for the $p$-th UT channel $H_p[k]$ in the $k$-th subcarrier is expressed as

$$H_p[k] = \sum_{l=0}^{L-1} \hat{H}_p[l] e^{-j\frac{2\pi l k}{K}}, \quad k = 0, 1, \ldots, K-1, \tag{2}$$

where $\hat{H}_p[l] \in \mathbb{C}^{N_r \times N_t}$ represents the $l$-th channel impulse response in the time domain, and $L$ is the channel length, which is far smaller than $K$. $\hat{H}_p \in \mathbb{C}^{K N_r \times K N_t}$ is the block-column circulant channel matrix in the time domain after removing the cyclic prefix (CP), which is shown at the top of the next page. According to [22], (2) can be transformed into matrix form as

$$(F_1 \otimes I_{N_r}) \hat{H}_p (F_1 \otimes I_{N_t})^H = \mathrm{blkdiag}\{H_p[0], \cdots, H_p[K-1]\}, \tag{3}$$

where $F_1 \in \mathbb{C}^{K \times K}$ is the $K$-point DFT matrix. Therefore, the received signal of the $p$-th UT in the time domain can be equivalently denoted as

$$\hat{y}_p = \hat{H}_p \hat{G}_p \hat{x}_p + \sum_{j \neq p}^{P} \hat{H}_p \hat{G}_j \hat{x}_j + \hat{n}, \tag{5}$$

where $\hat{y}_p \in \mathbb{C}^{K N_r \times 1}$ and $\hat{x}_p \in \mathbb{C}^{K N_s \times 1}$ are the received signal vector and the transmit signal vector in the time domain, respectively, $\hat{G}_p \in \mathbb{C}^{K N_t \times K N_s}$ is the time-domain circulant FIR precoder for the $p$-th UT, and $\hat{n}$ is the time-domain AWGN.

Since both $\hat{H}_p$ and $\hat{G}_p$ are circulant matrices, the effective channel matrix $\hat{H}_p \hat{G}_p$ is circulant as well. Therefore, we only need to focus on the design of the first block column $W_p$ of $\hat{G}_p$. $W_p \in \mathbb{C}^{K N_t \times N_s}$ is defined as

$$W_p = \begin{bmatrix} \hat{G}_p^H[0] & \cdots & \hat{G}_p^H[Q_t - 1] & 0 & \cdots & 0 \end{bmatrix}^H \tag{6}$$
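As a numerical sanity check of the diagonalization in (3), the following minimal sketch builds a small block circulant channel matrix from random taps and compares both sides of (3). It assumes that $F_1$ is the unitary ($1/\sqrt{K}$-normalized) DFT matrix, which the text does not state explicitly; the helper name `block_circulant` and the toy dimensions are illustrative choices, not taken from the paper.

```python
import numpy as np

def block_circulant(taps, K):
    """Stack L time-domain taps (each Nr x Nt) into a (K*Nr) x (K*Nt) block circulant matrix."""
    L, Nr, Nt = taps.shape
    H = np.zeros((K * Nr, K * Nt), dtype=complex)
    for i in range(K):            # block row (output sample)
        for j in range(K):        # block column (input sample)
            lag = (i - j) % K     # circular delay between input j and output i
            if lag < L:
                H[i * Nr:(i + 1) * Nr, j * Nt:(j + 1) * Nt] = taps[lag]
    return H

rng = np.random.default_rng(0)
K, L, Nr, Nt = 8, 3, 2, 4         # toy sizes, far smaller than in the paper
taps = rng.standard_normal((L, Nr, Nt)) + 1j * rng.standard_normal((L, Nr, Nt))

# Assumption: F1 is the unitary K-point DFT matrix.
F1 = np.fft.fft(np.eye(K)) / np.sqrt(K)

# Left-hand side of (3).
H_time = block_circulant(taps, K)
lhs = np.kron(F1, np.eye(Nr)) @ H_time @ np.kron(F1, np.eye(Nt)).conj().T

# Right-hand side of (3): per-subcarrier channels from (2), computed with a
# zero-padded FFT along the tap index.
H_freq = np.fft.fft(taps, n=K, axis=0)
rhs = np.zeros_like(lhs)
for k in range(K):
    rhs[k * Nr:(k + 1) * Nr, k * Nt:(k + 1) * Nt] = H_freq[k]

max_err = np.max(np.abs(lhs - rhs))
print(f"max |lhs - rhs| = {max_err:.2e}")
```

The same check works for any $L \le K$; only $L$ of the $K$ block diagonals of the circulant matrix are non-zero, which is the sparsity the time-domain design relies on.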
In (6), $\hat{G}_p[q] \in \mathbb{C}^{N_t \times N_s}$ is the $q$-th tap of the FIR filter coefficients, and $Q_t$ is the length of the FIR filter. The effective time-domain channel can be expressed as

$$\hat{H}_p W_p = \Delta_p, \tag{7}$$

where $\Delta_p = \begin{bmatrix} \Delta_p^H[0] & \cdots & \Delta_p^H[Q_t + L - 2] & 0 & \cdots & 0 \end{bmatrix}^H$ is the effective time-domain channel and $\Delta_p[s] \in \mathbb{C}^{N_r \times N_s}$ is the $s$-th response in $\Delta_p$. Because of the trailing zero blocks in (6), only the first $Q_t$ block columns of $\hat{H}_p$ contribute to (7). These columns form a block Toeplitz matrix, of which only the first $Q_t + L - 1$ block rows contain non-zero elements. We denote these $Q_t + L - 1$ block rows as the block Toeplitz matrix $T_p \in \mathbb{C}^{(Q_t + L - 1) N_r \times Q_t N_t}$, which can be written as

$$T_p = \begin{bmatrix}
\hat{H}_p[0] & 0 & \cdots & 0 \\
\vdots & \hat{H}_p[0] & \ddots & \vdots \\
\hat{H}_p[L-1] & \vdots & \ddots & 0 \\
0 & \hat{H}_p[L-1] & \ddots & \hat{H}_p[0] \\
\vdots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & \hat{H}_p[L-1]
\end{bmatrix}. \tag{8}$$

To this end, (7) is further transformed into

$$T_p \bar{W}_p = \bar{\Delta}_p, \tag{9}$$

where $\bar{W}_p = \begin{bmatrix} \hat{G}_p^H[0] & \cdots & \hat{G}_p^H[Q_t - 1] \end{bmatrix}^H$ and $\bar{\Delta}_p = \begin{bmatrix} \Delta_p^H[0] & \cdots & \Delta_p^H[Q_t + L - 2] \end{bmatrix}^H$. Based on (9), two insights are drawn. Firstly, the precoder design dimension is reduced from $K$ (subcarriers) in the frequency domain to $Q_t$ (taps) in the time domain, which greatly reduces the computational complexity if $Q_t \ll K$. Secondly, the effective channel $\bar{\Delta}_p$ can be prescribed flexibly. If $\bar{\Delta}_p$ possesses a single non-zero tap, the inter-symbol interference (ISI) is fully removed, resulting in a completely flat frequency response. Conversely, multiple non-zero taps in $\bar{\Delta}_p$ signify the presence of residual ISI, potentially leading to improved throughput owing to extra degrees of freedom. Therefore, the space-time precoder design aims to trade off ISI mitigation and sum rate by adjusting the non-zero elements in $\bar{\Delta}_p$ while reducing the computational complexity. Defining the response tap reservation set as $S_p$, we denote $S_p^C = \{0, \cdots, Q_t + L - 2\} \setminus S_p$ as the complementary set of $S_p$, which represents the ISI mitigation set. The maximum sum rate can be obtained by optimizing $\bar{W}_p$, $p = 1, \cdots, P$, in the following problem:

$$\begin{aligned}
\max_{\bar{W}} \quad & \sum_{k=1}^{K} \sum_{p=1}^{P} \log_2 \left| I_{N_r} + \frac{1}{\sigma^2} F_2(k) T_p \bar{W}_p \bar{W}_p^H T_p^H F_2^H(k) \right| \\
\text{subject to} \quad & \mathrm{Tr}(\bar{W}^H \bar{W}) \le 1, \\
& T_j \bar{W}_p = 0 \ \text{for all } j \neq p, \\
& \bar{\Delta}_p(s_p) = 0 \ \text{if } s_p \in S_p^C, \text{ and for all } p,
\end{aligned} \tag{10}$$

where $\bar{W} = \begin{bmatrix} \bar{W}_1 & \cdots & \bar{W}_P \end{bmatrix}$, $F_2(k) = F_1(k, 1:Q_t + L - 1) \otimes I_{N_r}$, and $\bar{\Delta}_p(s)$ is the $s$-th block row in $\bar{\Delta}_p$, i.e., $\bar{\Delta}_p(s) = \Delta_p[s]$. $\bar{W}$ is required to satisfy the unit power constraint and the IUI and ISI mitigation constraints while maximizing (10). Motivated by the conventional block diagonalization scheme, we can divide $\bar{W}$ into two components, i.e., $\bar{W} = UV$, where $U = \begin{bmatrix} U_1 & \cdots & U_P \end{bmatrix}$ is for IUI and ISI cancellation, and $V = \mathrm{blkdiag}(V_1, \cdots, V_P)$ is to maximize (10).

III. FAST SPACE-TIME BDCS PROCESSING

In this section, we present a rapid implementation of the ST-BDCS design for massive MIMO-OFDM systems by exploiting the sparse Toeplitz property of OFDM channels. Our approach ensures the fulfillment of both IUI and channel shortening constraints.

A. Effective Channel of Single Tap

The adoption of a completely flat effective channel significantly simplifies the process of channel estimation and receiver design for individual users, leading us to consider a scenario where $|S_p| = 1$, $p = 1, \cdots, P$. Without loss of generality, we can assume all UTs share the same $S_p$. Given $Q_t$, we concentrate on the optimal single tap response design for the time-domain effective channel in this part.

The ISI and IUI canceller $U_p$ should lie in the null space of $\begin{bmatrix} T_1^H & \cdots & T_{p-1}^H & T_p^H(s_p) & T_{p+1}^H & \cdots & T_P^H \end{bmatrix}^H$ for all $s_p \in S_p^C$, where $T_p(s_p)$ represents the $s_p$-th block row in $T_p$. However, employing singular value decomposition (SVD) for direct null space computation entails significant computational complexity. To compute $U$ and $V$ quickly, we instead use the four steps (S1) to (S4).
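The truncated block Toeplitz model in (8) and (9) can be checked numerically. The sketch below is illustrative only: the helper names `block_toeplitz` and `crandn` and the toy dimensions are assumptions made for this example. It builds $T_p$ from random channel taps, confirms that $T_p \bar{W}_p$ equals the linear convolution of the channel taps with the precoder taps, and confirms that the resulting $Q_t + L - 1$ effective taps reproduce the per-subcarrier product $H_p[k] G_p[k]$ when $Q_t + L - 1 \le K$.

```python
import numpy as np

def block_toeplitz(taps, Qt):
    """Build T_p of (8): (Qt+L-1) block rows, Qt block columns, taps[l] on the l-th block subdiagonal."""
    L, Nr, Nt = taps.shape
    T = np.zeros(((Qt + L - 1) * Nr, Qt * Nt), dtype=complex)
    for q in range(Qt):
        for l in range(L):
            T[(q + l) * Nr:(q + l + 1) * Nr, q * Nt:(q + 1) * Nt] = taps[l]
    return T

def crandn(rng, *shape):
    """Complex Gaussian test data."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

rng = np.random.default_rng(1)
K, L, Qt, Nr, Nt, Ns = 16, 3, 4, 2, 4, 2   # toy sizes with Qt + L - 1 <= K
h = crandn(rng, L, Nr, Nt)                 # channel taps
g = crandn(rng, Qt, Nt, Ns)                # FIR precoder taps

T = block_toeplitz(h, Qt)
W_bar = g.reshape(Qt * Nt, Ns)             # precoder taps stacked vertically
delta = (T @ W_bar).reshape(Qt + L - 1, Nr, Ns)   # effective taps, as in (9)

# Reference: direct matrix-valued linear convolution of h and g.
conv = np.zeros_like(delta)
for s in range(Qt + L - 1):
    for l in range(L):
        if 0 <= s - l < Qt:
            conv[s] += h[l] @ g[s - l]
conv_err = np.max(np.abs(delta - conv))

# Frequency-domain view: DFT of the effective taps equals H[k] G[k] on every subcarrier.
H_k = np.fft.fft(h, n=K, axis=0)
G_k = np.fft.fft(g, n=K, axis=0)
D_k = np.fft.fft(delta, n=K, axis=0)
freq_err = np.max(np.abs(D_k - H_k @ G_k))

print(f"convolution mismatch: {conv_err:.2e}, frequency mismatch: {freq_err:.2e}")
```

In this toy setting the time-domain design has $Q_t N_t N_s = 32$ unknowns per UT, against $K N_t N_s = 128$ for a subcarrier-by-subcarrier design, which illustrates the first insight drawn from (9).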
(S1) We aggregate all truncated block Toeplitz matrices $T_p$, $p = 1, \cdots, P$, into

$$T = \begin{bmatrix}
T[0] & 0 & \cdots & 0 \\
\vdots & T[0] & \ddots & \vdots \\
T[L-1] & \vdots & \ddots & 0 \\
0 & T[L-1] & \ddots & T[0] \\
\vdots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & T[L-1]
\end{bmatrix}, \tag{11}$$

where $T \in \mathbb{C}^{(Q_t + L - 1) P N_r \times Q_t N_t}$, and the $l$-th tap is $T[l] = \begin{bmatrix} \hat{H}_1^H[l] & \cdots & \hat{H}_P^H[l] \end{bmatrix}^H$. Then, we perform the block Toeplitz QR decomposition on $T^H$, which is decomposed into

$$T^H = QR, \tag{12}$$

where $Q \in \mathbb{C}^{Q_t N_t \times (Q_t + L - 1) P N_r}$ is unitary, and $R \in \mathbb{C}^{(Q_t + L - 1) P N_r \times (Q_t + L - 1) P N_r}$ is a sparse upper triangular matrix with at most $L$ block matrices in each row or each column. However, the existing Toeplitz QR decomposition algorithms are tailored for real-valued scalar Toeplitz matrices and cannot be directly utilized here [23]–[25]. To tackle this challenge, we previously proposed a complex-valued Toeplitz QR decomposition algorithm for Toeplitz matrices with special structures [4]. In this paper, we put forward generalized (block) Toeplitz QR decomposition algorithms in Appendix A and Appendix B. Therefore, (12) can be achieved by Algorithm 4 in Appendix B.

(S2) In conventional MIMO systems, the first block diagonalization design was proposed in [26] by utilizing singular value decomposition (SVD) to mitigate the IUI. Indeed, the SVD operation can be replaced by QR decomposition with lower computational complexity, since the right singular matrix of the SVD and the Q matrix of the QR decomposition share the same column space. Hence, similar IUI mitigation can be achieved more efficiently using the QR decomposition rather than the SVD [27], [28]. Li et al. have proved that the optimal block diagonalization precoder is strictly equivalent to the generalized zero-forcing (ZF) precoding cascaded with a block diagonal matrix [29]. Therefore, $U$ can be obtained by

$$U = T^H (T T^H)^{-1} \bar{\Delta} = T^H R^{-1} (R^{-1})^H \bar{\Delta}, \tag{13}$$

where $\bar{\Delta} = \begin{bmatrix} 0 & \cdots & I_{P N_r} & \cdots & 0 \end{bmatrix}^T$ is the unified impulse response. The position of the single tap response $I_{P N_r}$ in $\bar{\Delta}$ depends on $S_p$.

(S3) With (13), the ISI and IUI constraints in (10) are satisfied, and the effective channel has a totally flat response in the frequency domain. Then, the sum rate maximization can be rewritten as

$$\begin{aligned}
\max_{\bar{V}} \quad & K \sum_{p=1}^{P} \log_2 \left| I_{N_s} + \frac{1}{\sigma^2} \bar{V}_p^H (U_p^H U_p)^{-1} \bar{V}_p \right| \\
\text{subject to} \quad & \mathrm{Tr}(\bar{V}^H \bar{V}) \le 1,
\end{aligned} \tag{14}$$

where $V_p = (U_p^H U_p)^{-\frac{1}{2}} \bar{V}_p$ and $\bar{V} = \begin{bmatrix} \bar{V}_1^H & \cdots & \bar{V}_P^H \end{bmatrix}^H$. Problem (14) can be addressed by the standard technique of SVD and water-filling.

(S4) After optimizing $V$, $\bar{W}$ is calculated by

$$\bar{W} = T^H R^{-1} (R^{-1})^H \bar{\Delta} V. \tag{15}$$

Due to the block Toeplitz structure of $T^H$, (15) can be computed by the fast Fourier transform (FFT). In summary, the optimal single tap response design is presented in Algorithm 1.

Algorithm 1 Space-Time Optimal Single-Tap Response Design
1: Input: The aggregated Toeplitz matrix $T$ in (11), and response tap reservation set $S_p$, $p = 1, \cdots, P$.
2: Call Algorithm 4 to compute (12);
3: Acquire $R^{-1}$;
4: Maximize (14) by utilizing SVD and water-filling techniques;
5: Calculate (15).
6: Output: $\bar{W}$.

B. Effective Channel of Multiple Taps

In the preceding subsection, we derived the optimal space-time transmitter design for the single tap response scenario, which makes the effective channels seen by the UTs completely frequency flat and substantially simplifies channel estimation and receiver processing for the users. However, the single-tap constraint is typically rate-lossy, and incorporating multiple tap responses could potentially enhance the sum rate, despite the introduction of some ISI.

Given the response tap reservation set $S_p$, the ISI and IUI canceller $U_p$ for the $p$-th UT can be expressed as

$$U_p = T^H R^{-1} (R^{-1})^H \bar{\Delta}_p \in \mathbb{C}^{Q_t N_t \times N_r |S_p|}, \tag{16}$$

where we assume the reserved taps in $S_p$ are consecutive and $\bar{\Delta}_p = \begin{bmatrix} 0 & \cdots & A_p & \cdots & 0 \end{bmatrix}^T$. Here, $A_p$ denotes the position of the reserved taps, which can be expressed as

$$A_p = \begin{bmatrix} B_p & & \\ & \ddots & \\ & & B_p \end{bmatrix} \in \mathbb{R}^{P N_r |S_p| \times N_r |S_p|}.$$

In $B_p \in \mathbb{R}^{P N_r \times N_r}$, the $p$-th block row is $I_{N_r}$ while the other block elements are zeros. With (16) and imitating (14), we can also replace $V_p$ by $V_p = (U_p^H U_p)^{-\frac{1}{2}} \bar{V}_p$, where the sum rate in (10) can be transformed into

$$\begin{aligned}
\max_{\bar{V}} \quad & f(\bar{V}), \\
\text{subject to} \quad & \mathrm{Tr}(\bar{V}^H \bar{V}) \le 1,
\end{aligned} \tag{17}$$

where

$$f(\bar{V}) = \sum_{k=1}^{K} \sum_{p=1}^{P} \log_2 \left| I_{N_s} + \frac{1}{\sigma^2} \bar{V}_p^H \bar{H}_p^H[k] \bar{H}_p[k] \bar{V}_p \right|,$$

and $\bar{H}_p[k] = F_1(k, S_p)(U_p^H U_p)^{-\frac{1}{2}}$ is the effective channel matrix in the $k$-th subcarrier. Therefore, (17) is a multi-user sum rate maximization problem over frequency selective channels. As an optimal solution to (17), $\bar{V}$ should satisfy the power constraint with equality. The problem (17) can
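$$\beta^{(k+1)} = \frac{\mathrm{Tr}\left[ \mathrm{grad} f(\bar{V}^{(k+1)})^H \left( \mathrm{grad} f(\bar{V}^{(k+1)}) - \mathrm{grad} f(\bar{V}^{(k)})^{TP} \right) \right]}{\mathrm{Tr}\left[ (D^{(k)TP})^H \left( \mathrm{grad} f(\bar{V}^{(k+1)}) - \mathrm{grad} f(\bar{V}^{(k)})^{TP} \right) \right]}. \tag{21}$$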
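Returning to the single-tap design of Section III-A, the zero-forcing step (13) can be illustrated with a small numerical sketch. It is not the paper's fast algorithm: a dense `numpy.linalg.qr` call stands in for the block Toeplitz QR decomposition of Appendix B, and the helper names (`aggregated_toeplitz`, `crandn`), the toy dimensions, and the reserved tap index `s0` are assumptions made for this example. The sketch checks that $R^H R = T T^H$, that $T U = \bar{\Delta}$, and that each UT sees a single identity tap with no leakage from other UTs.

```python
import numpy as np

def crandn(rng, *shape):
    """Complex Gaussian test data."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def aggregated_toeplitz(taps, Qt):
    """Build T of (11); taps[l] is T[l], the l-th channel taps of all UTs stacked (P*Nr x Nt)."""
    L, M, Nt = taps.shape
    T = np.zeros(((Qt + L - 1) * M, Qt * Nt), dtype=complex)
    for q in range(Qt):
        for l in range(L):
            T[(q + l) * M:(q + l + 1) * M, q * Nt:(q + 1) * Nt] = taps[l]
    return T

rng = np.random.default_rng(2)
P, Nr, Nt, L, Qt = 2, 2, 16, 2, 3   # toy sizes with Qt*Nt >= (Qt+L-1)*P*Nr
s0 = 1                              # hypothetical reserved tap index (S_p = {s0} for all UTs)
M = P * Nr

T = aggregated_toeplitz(crandn(rng, L, M, Nt), Qt)

# (12): T^H = QR. Dense QR is used here only as a stand-in for Algorithm 4.
Q, R = np.linalg.qr(T.conj().T)
gram_err = np.max(np.abs(R.conj().T @ R - T @ T.conj().T))

# Unified impulse response: identity block at tap s0, zeros elsewhere.
Delta = np.zeros(((Qt + L - 1) * M, M), dtype=complex)
Delta[s0 * M:(s0 + 1) * M, :] = np.eye(M)

# (13): U = T^H R^{-1} (R^{-1})^H Delta, evaluated with two triangular-system solves.
Y = np.linalg.solve(R.conj().T, Delta)
U = T.conj().T @ np.linalg.solve(R, Y)
zf_err = np.max(np.abs(T @ U - Delta))

# E[s, j, :, p, :] is tap s of the channel from UT p's precoder columns to UT j's antennas.
E = (T @ U).reshape(Qt + L - 1, P, Nr, P, Nr)
iui_leak = max(np.abs(E[:, j, :, p, :]).max()
               for j in range(P) for p in range(P) if j != p)

print(f"gram: {gram_err:.2e}, zero-forcing: {zf_err:.2e}, IUI leakage: {iui_leak:.2e}")
```

A production implementation would exploit the sparsity of $R$ and the Toeplitz structure of $T^H$ instead of forming dense factors, which is where the complexity savings reported in the paper come from.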
Fig. 1. Sum rate comparison (sum rate in bps/Hz/stream versus FIR filter length). $N_t = 256$, $N_r = 4$, $P = 20$, $N_s = 2$. The SNR is 20 dB. The position of impulse taps is in the center of effective channel.

Fig. 2. Sum rate comparison (sum rate in bps/Hz/stream versus FIR filter length). $N_t = 256$, $N_r = 4$, $P = 30$, $N_s = 2$. The SNR is 20 dB. The position of impulse taps is in the center of effective channel.

Fig. 3. Manifold optimization iteration comparison for multiple taps response design (channel capacity in bps/Hz/stream versus iteration number). (a) $P = 20$, $Q_t = 30$, with multi-tap = 4 and multi-tap = 8 response ST-BDCS; (b) $P = 30$, $Q_t = 44$, with multi-tap = 8 and multi-tap = 32 response ST-BDCS. $N_t = 256$, $N_r = 4$, $N_s = 2$.

[Figure: complex multiplications for single tap response ST-BDCS, single tap response SF-BDCS, EZF precoding, multi-tap = 8 SF-BDCS, and multi-tap = 8 ST-BDCS.]

Fig. 5. Computational comparison for ST-BDCS, SF-BDCS and EZF precoding (complex multiplications for single tap ST, single tap SF, EZF, multi-tap SF, and multi-tap ST, with multi-tap = 32). $N_t = 256$, $N_r = 4$, $P = 30$, $N_s = 2$.

plexity reduction is 43.78% and 88.40% compared with SF-BDCS and EZF precoding, respectively. However, increasing the value of $Q_t$ becomes necessary as the ratio $N_t / (P N_s)$ decreases, resulting in an increase in computational complexity for the ST-BDCS design. In Figure 5, the reduced complex multiplications are 26.54% and 24.69% compared with SF-BDCS.

From Fig. 1 to Fig. 5, we can conclude the advantages of the space-time domain precoding design. Firstly, space-time domain precoding can approach frequency domain subcarrier-by-subcarrier precoding methods using a short-tapped FIR filter in the time domain. When $L \ll K$, fewer FIR filter taps can be used compared with frequency domain precoding
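schemes, while providing easier channel estimation for users and significantly reduced computational complexity for the BS.

APPENDIX A: GENERALIZED COMPLEX-VALUED SCALAR TOEPLITZ QR DECOMPOSITION

Let us start by considering a general full-column-rank ($N \le M$) scalar Toeplitz matrix, which can be written as

$$T^H = \begin{bmatrix}
t_0 & t_{-1} & \cdots & t_{-(N-1)} \\
t_1 & t_0 & \cdots & t_{-(N-2)} \\
t_2 & t_1 & \cdots & t_{-(N-3)} \\
\vdots & \vdots & \ddots & \vdots \\
t_{M-1} & t_{M-2} & \cdots & t_{M-N}
\end{bmatrix} \in \mathbb{C}^{M \times N}. \tag{22}$$

We can partition the complex-valued scalar Toeplitz matrix $T^H$ as

$$T^H = \begin{bmatrix} t_0 & u_1^H \\ v_1 & \tilde{T} \end{bmatrix} = \begin{bmatrix} \tilde{T} & u_2 \\ v_2^H & t_{M-N} \end{bmatrix}. \tag{23}$$

Let $T^H = QR$ be the QR decomposition. Similarly, we can also partition the upper triangular matrix $R$ as

$$R = \begin{bmatrix} r_{11} & z^H \\ 0 & R_1 \end{bmatrix} = \begin{bmatrix} R_2 & \tilde{z} \\ 0 & r_{nn} \end{bmatrix}. \tag{24}$$

Since $R^H R = T T^H$, we obtain two equations, (25) and (26), one for each partition. The first is

$$\begin{bmatrix} |r_{11}|^2 & r_{11}^* z^H \\ r_{11} z & z z^H + R_1^H R_1 \end{bmatrix} = \begin{bmatrix} |t_0|^2 + v_1^H v_1 & t_0^* u_1^H + v_1^H \tilde{T} \\ t_0 u_1 + \tilde{T}^H v_1 & u_1 u_1^H + \tilde{T}^H \tilde{T} \end{bmatrix}, \tag{25}$$

whose first column gives

$$\begin{bmatrix} r_{11}^* \\ z \end{bmatrix} r_{11} = T \begin{bmatrix} t_0 \\ \vdots \\ t_{M-1} \end{bmatrix}. \tag{27}$$

The right-hand side of (27) is a convolution and thus can be efficiently computed using FFTs. Therefore, $r_{11}$ and $z$ can be obtained. Next, we need to figure out the remaining components of $R$. According to (25) and (26), we have

$$z z^H + R_1^H R_1 = u_1 u_1^H + \tilde{T}^H \tilde{T}, \tag{28}$$

and

$$R_2^H R_2 = \tilde{T}^H \tilde{T} + v_2 v_2^H. \tag{29}$$

Combining (28) and (29), it can be concluded that

$$R_1^H R_1 + z z^H + v_2 v_2^H = R_2^H R_2 + u_1 u_1^H. \tag{30}$$

Therefore, (30) can be further expressed as

$$\begin{bmatrix} R_1^H & z & v_2 \end{bmatrix} \begin{bmatrix} R_1 \\ z^H \\ v_2^H \end{bmatrix} = \begin{bmatrix} R_2^H & u_1 \end{bmatrix} \begin{bmatrix} R_2 \\ u_1^H \end{bmatrix}, \tag{31}$$

to which we can introduce the unitary matrix $Q$ as

$$Q \begin{bmatrix} R_1 \\ z^H \\ v_2^H \end{bmatrix} = \begin{bmatrix} R_2 \\ u_1^H \end{bmatrix}. \tag{32}$$

Note that the $(i+1)$-th row of $R_2$ corresponds to the $i$-th row of $R_1$ in (32). To this end, we present an illustrative example to develop the fast recursive algorithm for the generalized complex-valued Toeplitz QR decomposition, where $N = 4$. In this case, (32) appears to be

$$Q \begin{bmatrix}
x_1 & x_2 & x_3 \\
 & x_4 & x_5 \\
 & & x_6 \\
z_1 & z_2 & z_3 \\
t_{M-1} & t_{M-2} & t_{M-3}
\end{bmatrix} = \begin{bmatrix}
r_{11} & z_1 & z_2 \\
 & x_1 & x_2 \\
 & & x_4 \\
t_{-1} & t_{-2} & t_{-3}
\end{bmatrix}, \tag{33}$$

where $z^H = \begin{bmatrix} z_1 & z_2 & z_3 \end{bmatrix}$. In the first step, we apply a 2-by-2 complex-valued Givens rotation matrix $G_1$ to both sides of (33), which transforms (33) into

$$Q \begin{bmatrix}
x_1 & x_2 & x_3 \\
 & x_4 & x_5 \\
 & & x_6 \\
z_1 & z_2 & z_3 \\
t_{M-1} & t_{M-2} & t_{M-3}
\end{bmatrix} = \begin{bmatrix}
\bar{r}_{11} & \bar{z}_1 & \bar{z}_2 \\
 & x_1 & x_2 \\
 & & x_4 \\
0 & \bar{t}_{-2} & \bar{t}_{-3}
\end{bmatrix}. \tag{34}$$

Here, $G_1$ is absorbed into $Q$. Then, we control $Q$ to achieve

$$Q \begin{bmatrix}
x_1 & x_2 & x_3 \\
 & x_4 & x_5 \\
 & & x_6 \\
z_1 & z_2 & z_3 \\
t_{M-1} & t_{M-2} & t_{M-3}
\end{bmatrix} = Q_1 \begin{bmatrix}
x_1 & x_2 & x_3 \\
 & x_4 & x_5 \\
 & & x_6 \\
\bar{z}_1 & \bar{z}_2 & \bar{z}_3 \\
0 & \bar{t}_{M-2} & \bar{t}_{M-3}
\end{bmatrix}. \tag{35}$$

Since $r_{11}$ and $z^H$ are known, we can continue utilizing a 2-by-2 Givens rotation matrix as

$$\begin{bmatrix} \cos(\theta) & \sin(\theta) e^{j\varphi} \\ -\sin(\theta) e^{-j\varphi} & \cos(\theta) \end{bmatrix} \begin{bmatrix} x_1 \\ \bar{z}_1 \end{bmatrix} = \begin{bmatrix} \bar{r}_{11} \\ 0 \end{bmatrix}, \tag{36}$$

from which $\theta$, $\varphi$, and $x_1$ can be figured out accordingly. After determining the Givens rotation matrix, we further acquire

$$\begin{bmatrix} \cos(\theta) & \sin(\theta) e^{j\varphi} \\ -\sin(\theta) e^{-j\varphi} & \cos(\theta) \end{bmatrix} \begin{bmatrix} x_2 \\ \bar{z}_2 \end{bmatrix} = \begin{bmatrix} \bar{z}_1 \\ \hat{z}_2 \end{bmatrix}, \tag{37}$$

from which we can compute $x_2$ and $\hat{z}_2$. Using the same routine, we can obtain $x_3$ and $\hat{z}_3$. Hence, we have

$$Q_2 \begin{bmatrix}
\times & \times & \times \\
 & x_4 & x_5 \\
 & & x_6 \\
0 & \hat{z}_2 & \hat{z}_3 \\
0 & \bar{t}_{M-2} & \bar{t}_{M-3}
\end{bmatrix} = \begin{bmatrix}
\times & \times & \times \\
 & a_1 & a_2 \\
 & & x_4 \\
0 & \bar{t}_{-2} & \bar{t}_{-3}
\end{bmatrix}, \tag{38}$$

where $\times$ denotes an element we no longer care about. In the remaining steps, we can employ the same techniques to ascertain all values of $x_4$, $x_5$, and $x_6$. The complex-valued scalar (CVS) Toeplitz QR decomposition for (22) is recapitulated in Algorithm 3.

Algorithm 3 Generalized CVS Toeplitz QR Decomposition
1: Input: $r_{11}$, $z^H$, $v_2^H$, and $u_1^H$;
2: $R(1, 1:N) = \begin{bmatrix} r_{11} & z^H \end{bmatrix}$;
3: for $k = 1 : N - 1$ do
4:   Rotate and Update: $u_1^H$, $R_2(k, k:\mathrm{end})$, and $\begin{bmatrix} z & v_2 \end{bmatrix}^H$ by computing the Givens rotation matrix;
5:   Solve: $R_1(k, k)$, $[\theta, \varphi] = \mathrm{Givens}(R_1(k, k), z^H)$, and $R_1(k, k+1:\mathrm{end})$;
6:   Update: $z^H$;
7:   if $k < N - 1$ then
8:     $R_2(k+1, k+1:\mathrm{end}) = R_1(k, k:\mathrm{end}-1)$;
9:   end if
10: end for
11: Output: $R$.

APPENDIX B: GENERALIZED COMPLEX-VALUED BLOCK TOEPLITZ QR DECOMPOSITION

The above generalized CVS Toeplitz QR decomposition can be extended to the block scenario as follows. We consider a block full-column-rank Toeplitz matrix

$$T^H = \begin{bmatrix}
T_0 & T_{-1} & \cdots & T_{-(N-1)} \\
T_1 & T_0 & \cdots & T_{-(N-2)} \\
T_2 & T_1 & \cdots & T_{-(N-3)} \\
\vdots & \vdots & \ddots & \vdots \\
T_{M-1} & T_{M-2} & \cdots & T_{M-N}
\end{bmatrix} \in \mathbb{C}^{MQ \times NP}, \tag{39}$$

where each block is a complex-valued $Q \times P$ matrix and $P \le Q$. Let $T^H = QR$ be the QR decomposition. We can partition $T^H$ and $R$ as

$$T^H = \begin{bmatrix} T_0 & u_1^H \\ v_1 & \tilde{T} \end{bmatrix} = \begin{bmatrix} \tilde{T} & u_2 \\ v_2^H & T_{M-N} \end{bmatrix}, \tag{40}$$

and

$$R = \begin{bmatrix} R_{11} & z^H \\ 0 & R_1 \end{bmatrix} = \begin{bmatrix} R_2 & \tilde{z} \\ 0 & R_{nn} \end{bmatrix}, \tag{41}$$

where $R_{11}$ and $z^H$ can be calculated by

$$\begin{bmatrix} R_{11}^H \\ z \end{bmatrix} R_{11} = T \begin{bmatrix} T_0 \\ \vdots \\ T_{M-1} \end{bmatrix}. \tag{42}$$

Applying the same mathematical manipulation as in the scalar case (25) and (26), we have

$$\begin{bmatrix} R_1^H & z & v_2 \end{bmatrix} \begin{bmatrix} R_1 \\ z^H \\ v_2^H \end{bmatrix} = \begin{bmatrix} R_2^H & u_1 \end{bmatrix} \begin{bmatrix} R_2 \\ u_1^H \end{bmatrix}, \tag{43}$$

to which we can also introduce the unitary matrix $Q$ as

$$Q \begin{bmatrix} R_1 \\ z^H \\ v_2^H \end{bmatrix} = \begin{bmatrix} R_2 \\ u_1^H \end{bmatrix}. \tag{44}$$

To this end, we demonstrate the generalized complex-valued block (CVB) Toeplitz QR decomposition with an example, where $N = 4$. In this case, (44) appears to be

$$Q \begin{bmatrix}
X_1 & X_2 & X_3 \\
 & X_4 & X_5 \\
 & & X_6 \\
Z_1 & Z_2 & Z_3 \\
T_{M-1} & T_{M-2} & T_{M-3}
\end{bmatrix} = \begin{bmatrix}
R_{11} & Z_1 & Z_2 \\
 & X_1 & X_2 \\
 & & X_4 \\
T_{-1} & T_{-2} & T_{-3}
\end{bmatrix}, \tag{45}$$

where $z^H = \begin{bmatrix} Z_1 & Z_2 & Z_3 \end{bmatrix}$. In the first step, we apply a Householder matrix $J_1$ to both sides of (45), where $J_1$ is determined by the first column of $\begin{bmatrix} R_{11}^H & T_{-1}^H \end{bmatrix}^H$ and can also be absorbed into $Q$. (45) is then transformed into

$$Q \begin{bmatrix}
X_1 & X_2 & X_3 \\
 & X_4 & X_5 \\
 & & X_6 \\
Z_1 & Z_2 & Z_3 \\
T_{M-1} & T_{M-2} & T_{M-3}
\end{bmatrix} = \begin{bmatrix}
\bar{R}_{11} & \bar{Z}_1 & \bar{Z}_2 \\
 & X_1 & X_2 \\
 & & X_4 \\
\bar{T}_{-1} & \bar{T}_{-2} & \bar{T}_{-3}
\end{bmatrix}. \tag{46}$$

Next, we can generate another Householder matrix $J_2$ to rotate the first column of $\begin{bmatrix} Z_1^H & T_{M-1}^H \end{bmatrix}^H$, which can be expressed as

$$Q_1 \begin{bmatrix}
X_1 & X_2 & X_3 \\
 & X_4 & X_5 \\
 & & X_6 \\
\bar{Z}_1 & \bar{Z}_2 & \bar{Z}_3 \\
\bar{T}_{M-1} & \bar{T}_{M-2} & \bar{T}_{M-3}
\end{bmatrix} = \begin{bmatrix}
\bar{R}_{11} & \bar{Z}_1 & \bar{Z}_2 \\
 & X_1 & X_2 \\
 & & X_4 \\
\bar{T}_{-1} & \bar{T}_{-2} & \bar{T}_{-3}
\end{bmatrix}. \tag{47}$$

In the second step, the first row of $\begin{bmatrix} X_1 & X_2 & X_3 \end{bmatrix}$ needs to be determined. Since the first column of $\begin{bmatrix} Z_1^H & T_{M-1}^H \end{bmatrix}^H$ is rotated by $J_2$, we can resolve the first element of the first row in $X_1$ by utilizing a 2-by-2 Givens rotation matrix. Similar to (37), the remaining elements in the first row of $\begin{bmatrix} X_1 & X_2 & X_3 \end{bmatrix}$ are figured out once the coefficients of the Givens rotation matrix are known, and (47) can be updated as

$$Q_2 \begin{bmatrix}
\bar{X}_1 & \bar{X}_2 & \bar{X}_3 \\
 & X_4 & X_5 \\
 & & X_6 \\
\hat{Z}_1 & \hat{Z}_2 & \hat{Z}_3 \\
\bar{T}_{M-1} & \bar{T}_{M-2} & \bar{T}_{M-3}
\end{bmatrix} = \begin{bmatrix}
\bar{R}_{11} & \bar{Z}_1 & \bar{Z}_2 \\
 & X_1 & X_2 \\
 & & X_4 \\
\bar{T}_{-1} & \bar{T}_{-2} & \bar{T}_{-3}
\end{bmatrix}. \tag{48}$$

Therefore, we can follow the same routine to determine the remaining unsolved elements in $\begin{bmatrix} \bar{X}_1 & \bar{X}_2 & \bar{X}_3 \end{bmatrix}$. As a result, $X_4$, $X_5$, and $X_6$ are calculated successively. The generalized complex-valued block Toeplitz QR decomposition for (39) is recapitulated in Algorithm 4.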
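Both appendices rely on the 2-by-2 complex-valued Givens rotation parameterized by $\theta$ and $\varphi$ as in (36). The sketch below shows only the forward use of that parameterization: given two known complex numbers $a$ and $b$, it picks $\theta$ and $\varphi$ so that the rotation zeroes the second entry. In (36) the rotation is instead used to recover the unknown $x_1$ from the known $\bar{r}_{11}$ and $\bar{z}_1$, so this is an illustration of the rotation itself, not of the full recursion. The function name `complex_givens` and the test values are assumptions made for the example.

```python
import numpy as np

def complex_givens(a, b):
    """Return (G, theta, phi) with G = [[c, s e^{j phi}], [-s e^{-j phi}, c]]
    such that G @ [a, b] = [r, 0], where |r| = sqrt(|a|^2 + |b|^2)."""
    theta = np.arctan2(abs(b), abs(a))   # tan(theta) = |b| / |a|
    phi = np.angle(a) - np.angle(b)      # aligns the phase of b with that of a
    c, s = np.cos(theta), np.sin(theta)
    G = np.array([[c, s * np.exp(1j * phi)],
                  [-s * np.exp(-1j * phi), c]])
    return G, theta, phi

a, b = 1.0 + 2.0j, -0.5 + 1.5j           # arbitrary test values
G, theta, phi = complex_givens(a, b)
out = G @ np.array([a, b])

print("rotated vector:", out)
print("unitary check:", np.allclose(G @ G.conj().T, np.eye(2)))
```

Because the rotation is unitary, it preserves the norm of the rotated pair, which is why the same $\theta$ and $\varphi$ can be reused to propagate the remaining entries of the two rows, as in (37).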
Algorithm 4 Generalized CVB Toeplitz QR Decomposition
1: Input: $R_{11}$, $z^H$, $v_2^H$, and $u_1^H$;
2: $R(1:P, 1:PN) = \begin{bmatrix} R_{11} & z^H \end{bmatrix}$;
3: for $n = 1 : N - 1$ do
4:   for $p = 1 : P$ do
5:     $k = (n - 1)P + p$;
6:     Rotate and Update: $u_1^H$, $R_2(k, k:\mathrm{end})$, and $\begin{bmatrix} z & v_2 \end{bmatrix}^H$ by computing the Householder matrix;
7:     Solve: $R_1(n, n)$, $[\theta, \varphi] = \mathrm{Givens}(R_1(n, n), z^H)$, and $R_1(n, n:\mathrm{end})$;
8:     Update: $z^H$;
9:     if $k < N - 1$ then
10:       $R_2(n + P, n + P:\mathrm{end}) = R_1(n, n:\mathrm{end} - P)$;
11:     end if
12:   end for
13: end for
14: Output: $R$.

REFERENCES

[1] N. Rajatheva, I. Atzeni, E. Bjornson, A. Bourdoux, S. Buzzi, J.-B. Dore, S. Erkucuk, M. Fuentes, K. Guan, Y. Hu et al., "White paper on broadband connectivity in 6G," arXiv preprint arXiv:2004.14247, 2020.
[2] Z. Zhang, Y. Xiao, Z. Ma, M. Xiao, Z. Ding, X. Lei, G. K. Karagiannidis, and P. Fan, "6G wireless networks: Vision, requirements, architecture, and key technologies," IEEE Vehicular Technology Magazine, vol. 14, no. 3, pp. 28–41, 2019.
[3] Z. Wang, J. Zhang, H. Du, E. Wei, B. Ai, D. Niyato, and M. Debbah, "Extremely large-scale MIMO: Fundamentals, challenges, solutions, and future directions," IEEE Wireless Communications, 2023.
[4] J. Liu, W. Zhang, and Y. Jiang, "Fast computation of zero-forcing precoding for massive MIMO-OFDM systems," IEEE Transactions on Signal Processing, 2024.
[5] Y.-W. Liang, R. Schober, and W. Gerstacker, "Time-domain transmit beamforming for MIMO-OFDM systems with finite rate feedback," IEEE Transactions on Communications, vol. 57, no. 9, pp. 2828–2838, 2009.
[6] D. Cescato and H. Bölcskei, "Algorithms for interpolation-based QR decomposition in MIMO-OFDM systems," IEEE Transactions on Signal Processing, vol. 59, no. 4, pp. 1719–1733, 2011.
[7] Y. Liu, G. Y. Li, W. Han, and Z. Zhong, "Low-complexity recursive convolutional precoding for OFDM-based large-scale antenna systems," IEEE Transactions on Wireless Communications, vol. 15, no. 7, pp. 4902–4913, 2016.
[8] S. Kashyap, C. Mollén, E. Björnson, and E. G. Larsson, "Frequency-domain interpolation of the zero-forcing matrix in massive MIMO-OFDM," in 2016 IEEE 17th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC). IEEE, 2016, pp. 1–5.
[9] C. Jeon, Z. Li, and C. Studer, "Approximate Gram-matrix interpolation for wideband massive MU-MIMO systems," IEEE Transactions on Vehicular Technology, vol. 69, no. 5, pp. 4677–4688, 2020.
[10] W. Hu, F. Li, and Y. Jiang, "Phase rotations of SVD-based precoders in MIMO-OFDM for improved channel estimation," IEEE Wireless Communications Letters, vol. 10, no. 8, pp. 1805–1809, 2021.
[11] 3GPP, "TS 38.211 V16.10.0: NR; Physical channels and modulation," www.3GPP.org, 2022.
[12] 3GPP, "TS 38.214 V16.10.0: NR; Physical layer procedures for data," www.3GPP.org, 2022.
[13] C. Shen and M. P. Fitz, "MIMO-OFDM beamforming for improved channel estimation," IEEE Journal on Selected Areas in Communications, vol. 26, no. 6, pp. 948–959, 2008.
[14] F. Jiang, Q. Li, and X. Chen, "Channel smoothing for 802.11ax beamformed MIMO-OFDM," IEEE Communications Letters, vol. 25, no. 10, pp. 3413–3417, 2021.
[15] E. Jeon, M. Ahn, S. Kim, W. B. Lee, and J. Kim, "Joint beamformer and beamformee design for channel smoothing in WLAN systems," in 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall). IEEE, 2020, pp. 1–6.
[16] F. Rusek and D. Fertonani, "Bounds on the information rate of intersymbol interference channels based on mismatched receivers," IEEE Transactions on Information Theory, vol. 58, no. 3, pp. 1470–1482, 2012.
[17] F. Rusek and A. Prlja, "Optimal channel shortening for MIMO and ISI channels," IEEE Transactions on Wireless Communications, vol. 11, no. 2, pp. 810–818, 2011.
[18] A. Modenini, F. Rusek, and G. Colavolpe, "Optimal transmit filters for ISI channels under channel shortening detection," IEEE Transactions on Communications, vol. 61, no. 12, pp. 4997–5005, 2013.
[19] S. Hu, X. Gao, and F. Rusek, "Linear precoder design for MIMO-ISI broadcasting channels under channel shortening detection," IEEE Signal Processing Letters, vol. 23, no. 9, pp. 1207–1211, 2016.
[20] R.-A. Pitaval, "Channel shortening by large multiantenna precoding in OFDM," IEEE Transactions on Communications, vol. 69, no. 5, pp. 2878–2893, 2021.
[21] J. Wang, Y. Jiang, and G. E. Sobelman, "Iterative computation of FIR MIMO MMSE-DFE with flexible complexity-performance tradeoff," IEEE Transactions on Signal Processing, vol. 61, no. 9, pp. 2394–2404, 2013.
[22] R. M. Gray et al., "Toeplitz and circulant matrices: A review," Foundations and Trends® in Communications and Information Theory, vol. 2, no. 3, pp. 155–239, 2006.
[23] D. Sweet, "Fast Toeplitz orthogonalization," Numerische Mathematik, vol. 43, no. 1, pp. 1–21, 1984.
[24] J. Chun, T. Kailath, and H. Lev-Ari, "Fast parallel algorithms for QR and triangular factorization," SIAM Journal on Scientific and Statistical Computing, vol. 8, no. 6, pp. 899–913, 1987.
[25] A. Bojanczyk, R. Brent, and F. De Hoog, "QR factorization of Toeplitz matrices," Numerische Mathematik, vol. 49, no. 1, pp. 81–94, 1986.
[26] Q. H. Spencer, A. L. Swindlehurst, and M. Haardt, "Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO channels," IEEE Transactions on Signal Processing, vol. 52, no. 2, pp. 461–471, 2004.
[27] H. Sung, S.-R. Lee, and I. Lee, "Generalized channel inversion methods for multiuser MIMO systems," IEEE Transactions on Communications, vol. 57, no. 11, pp. 3489–3499, 2009.
[28] L.-N. Tran, M. Juntti, and E.-K. Hong, "On the precoder design for block diagonalized MIMO broadcast channels," IEEE Communications Letters, vol. 16, no. 8, pp. 1165–1168, 2012.
[29] W. Li and M. Latva-aho, "An efficient channel block diagonalization method for generalized zero forcing assisted MIMO broadcasting systems," IEEE Transactions on Wireless Communications, vol. 10, no. 3, pp. 739–744, 2010.
[30] N. Boumal, An introduction to optimization on smooth manifolds. Cambridge University Press, 2023.
[31] L. Sun and M. R. McKay, "Eigen-based transceivers for the MIMO broadcast channel with semi-orthogonal user selection," IEEE Transactions on Signal Processing, vol. 58, no. 10, pp. 5246–5261, 2010.
[32] 3GPP, "TS 38.901 V16.1.0: Study on channel model for frequencies from 0.5 to 100 GHz," www.3GPP.org, 2020.
[33] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization algorithms on matrix manifolds. Princeton University Press, 2008.