Channel Estimation Considerate Precoder Design For Multi-User Massive MIMO-OFDM Systems: The Concept and Fast Algorithms

Junkai Liu, Yi Jiang, Member, IEEE

Junkai Liu and Yi Jiang are with the Key Laboratory for Information Science of Electromagnetic Waves (MoE), School of Information Science and Technology, Fudan University, Shanghai, China. (Corresponding author: Yi Jiang. Email: [email protected])

arXiv:2403.06072v1 [cs.IT] 10 Mar 2024

Abstract—The sixth-generation (6G) communication networks target peak data rates exceeding 1 Tbps, necessitating base stations (BS) to support up to 100 simultaneous data streams. However, sparse pilot allocation to accommodate such streams poses challenges for users' channel estimation. This paper presents Channel Estimation Considerate Precoding (CECP), where BS precoders prioritize facilitating channel estimation alongside maximizing transmission rate. To address the computational complexity of 6G large-scale multi-input multi-output (MIMO) systems, we propose a computationally-efficient space-time block diagonal channel shortening (ST-BDCS) precoding scheme. By leveraging the sparse Toeplitz property of orthogonal frequency division multiplexing (OFDM) channels, this time-domain precoding design effectively mitigates multi-user interference in the downlink and shortens the effective channel's temporal length. Consequently, users can estimate the channels using sparse pilots. To enable fast implementation, we develop a generalized complex-valued Toeplitz matrix QR decomposition algorithm applicable to various space-time signal processing problems. Simulation results demonstrate that the ST-BDCS precoding method approximates the rate performance of conventional subcarrier-by-subcarrier precoding schemes, while offering easier channel estimation for users and significantly reduced computational complexity for the BS.

Index Terms—Massive MIMO-OFDM, channel estimation considerate precoding, space-time precoding, block diagonalization, channel shortening, Toeplitz QR decomposition

I. INTRODUCTION

It is anticipated that the sixth generation (6G) communication networks can support peak data rates up to 1 Terabit per second (Tbps) [1]. To attain such remarkable data rates, (extremely) large scale multiple input multiple output (MIMO) will be utilized to support dozens of or even up to one hundred simultaneously transmitted data streams, coupled with orthogonal frequency division multiplexing (OFDM) [2], [3]. Several studies have explored efficient precoding techniques for (massive) MIMO-OFDM systems [4]–[9]. However, these expeditious precoding schemes often overlook a significant consequence: they exacerbate the frequency selectivity of the effective channel, i.e., the concatenation of the precoders and the air-interface channel, and hence complicate channel estimation at the receiver [10].

In the current 5G new radio (NR) protocol [11], the pilots can only support at most 24 orthogonal data streams, such as Type 2 demodulation reference signals (DMRS). To accommodate more user terminals (UTs), the communication system must either add more pilot symbols or resort to non-orthogonal pilots. However, increasing pilot usage diminishes system throughput, and using non-orthogonal pilots causes pilot contamination, also leading to performance degradation [12]. To circumvent this dilemma, we introduce in this paper the concept of Channel Estimation Considerate Precoding (CECP), emphasizing that base station (BS) precoders should not only maximize transmission rates but also facilitate users' channel estimation with sparser pilots.

The idea of CECP, albeit not so-termed before, can be traced back to the smoothed singular value decomposition (SVD) precoding introduced in [13], where Shen et al. derived SVD precoders that smooth the effective channels across the subcarriers. The precoders equivalently reduce the delay spread of the effective channel, thereby contributing to the enhanced estimation of the effective channel at the receiver. In our previous work, we proposed a phase-rotated SVD precoding scheme, exploiting the phase non-uniqueness of the SVD to enhance the smoothness of the effective channel in the frequency domain, which led to improved BER performance [10]. In [14], the channel smoothing method was investigated and tailored for adjacent subcarriers in WiFi systems. Furthermore, Jeon et al. [15] discussed the joint transceiver design by taking into account the smoothed beam-steering matrices. Although precoder smoothing helps mitigate frequency selectivity, it does not offer direct and flexible control over the effective channel length. To this end, we propose in this paper the CECP design intended to shorten the effective channel length directly in the time domain. Moreover, the CECP design of this paper also addresses the multi-user interference, which is ubiquitous in a wireless network, whereas the aforementioned work [10], [13]–[15] focused on the single-user scenario.

Therefore, this paper aims to develop CECP schemes for multi-user massive MIMO-OFDM systems. The CECP design integrates two features: block diagonalization and channel shortening. Block diagonalization is applied to mitigate the inter-user interference (IUI) for UTs with multiple receiving antennas. The channel shortening constraint specifies the number of effective channel taps in the time domain, enhancing channel estimation and reducing receiver design complexity. To reduce the computational complexity entailed by the high dimensionality of massive MIMO systems, our precoding design draws inspiration from the sparse Toeplitz
property of OFDM channels. The multiplication of the channel matrix by frequency-domain precoders corresponds to a time-domain convolution. This allows for the computation of a finite impulse response (FIR) precoding filter by solving a high-dimensional block Toeplitz linear equation in the time domain. Since the channel shortening constraint is inherently addressed in the time domain, both constraints can be naturally handled while solving Toeplitz linear systems. Simulation results substantiate the superior performance of our methods.

While there indeed exist methods known as channel shortening, it is crucial to note that conventional channel shortening processing primarily targets reducing the computational complexity of the Viterbi decoder or addressing insufficient cyclic prefix (CP) effects, rather than enhancing channel estimation. In [16], Rusek et al. derived the achievable information rate for a channel shortening detector cascaded with nonlinear decoders. This technique effectively shortens the effective channel length and reduces the computational complexity of Viterbi decoders. Optimal designs for channel shortening receivers and transmitters were respectively proposed in [17] and [18]. Introducing the channel shortening detector, Hu et al. explored linear precoders using block diagonalization methods [19]. In addition to channel shortening processing at receivers, a time-frequency channel shortening precoding was proposed for massive MIMO-OFDM systems with insufficient CP [20]. However, it is noteworthy that [17]–[20] have not considered computationally-efficient linear precoding designs to specifically enhance channel estimation for UTs.

The contributions of this paper are outlined as follows.

• We propose an innovative precoding scheme termed space-time block diagonal channel shortening (ST-BDCS) design. This approach mitigates inter-user interference and adheres to the channel shortening constraint in the time domain. Exploiting the sparse Toeplitz characteristic inherent in OFDM channels, the ST-BDCS scheme can be efficiently computed by solving a high-dimensional Toeplitz linear equation.

• Given the single-tap constraint for the effective channel, the solution to the ST-BDCS turns out to be closed-form. Incorporating the multi-tap constraint renders the problem of maximizing sum rate non-convex. To tackle this challenge, we propose a manifold optimization method that enables an iterative search for solutions conforming to the constraints within a low-dimensional space.

• To solve the high-dimensional Toeplitz linear equation, we introduce a generalized complex-valued (block) Toeplitz QR decomposition algorithm. Our proposed numerical algorithms not only exhibit lower computational complexity compared with the method in [21] but also demonstrate versatility in addressing complex-valued signal processing problems in the time domain.

This paper is structured as follows: Section II provides an introduction to the fundamental signal model for downlink multi-user massive MIMO-OFDM systems. In Section III, we present the ST-BDCS design, encompassing various channel shortening constraints. Section IV showcases the simulation results based on the 5G NR protocols.

Notations: The transpose and Hermitian transpose of matrix $\mathbf{A}$ are $\mathbf{A}^T$ and $\mathbf{A}^H$, respectively. $\mathbf{A}^*$ denotes the conjugate of $\mathbf{A}$. $\mathbf{I}_K$ denotes the identity matrix of dimension $K$. $\mathbf{A}(m, 1:N)$ follows MATLAB notation, representing the $m$-th row and the first $N$ columns of $\mathbf{A}$. $\mathrm{tr}[\cdot]$ is the trace operator. $\otimes$ is the Kronecker product. $\mathrm{blkdiag}(\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_N)$ stands for the block diagonal matrix with $\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_N$ on its diagonal. $\lceil\cdot\rceil$ represents the rounding up operation.

II. SYSTEM MODEL AND PROBLEM FORMULATION

We consider a downlink massive MIMO-OFDM system with $K$ subcarriers, where an $N_t$-antenna BS serves $P$ UTs simultaneously. Each UT is equipped with $N_r$ antennas and receives $N_s$ data streams. Given that all UTs share the same time-frequency resources in the same OFDM symbol, the received signal of the $p$-th UT in the $k$-th subcarrier is

$$\mathbf{y}_p[k] = \mathbf{H}_p[k]\mathbf{G}_p[k]\mathbf{x}_p[k] + \sum_{j \neq p}^{P} \mathbf{H}_p[k]\mathbf{G}_j[k]\mathbf{x}_j[k] + \mathbf{n}[k], \tag{1}$$

where $\mathbf{H}_p[k] \in \mathbb{C}^{N_r \times N_t}$ is the downlink channel matrix of the $p$-th UT in the frequency domain, $\mathbf{G}_p[k] \in \mathbb{C}^{N_t \times N_s}$ is the precoding matrix, $\mathbf{x}_p[k] \in \mathbb{C}^{N_s \times 1}$ denotes the transmit signal vector, and $\mathbf{n}[k] \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_{N_r})$ is the complex additive white Gaussian noise (AWGN).

Note that the discrete Fourier transform (DFT) for the $p$-th UT channel $\mathbf{H}_p[k]$ in the $k$-th subcarrier is expressed as

$$\mathbf{H}_p[k] = \sum_{l=0}^{L-1} \hat{\mathbf{H}}_p[l]\, e^{-j\frac{2\pi l k}{K}}, \quad k = 0, 1, \ldots, K-1, \tag{2}$$

where $\hat{\mathbf{H}}_p[l] \in \mathbb{C}^{N_r \times N_t}$ represents the $l$-th channel impulse response in the time domain, and $L$ is the channel length, which is far smaller than $K$. $\hat{\mathbf{H}}_p \in \mathbb{C}^{K N_r \times K N_t}$ is the block-column circulant channel matrix in the time domain after removing the cyclic prefix (CP), which is shown in (4) at the top of the next page. According to [22], (2) can be transformed into matrix form as

$$(\mathbf{F}_1 \otimes \mathbf{I}_{N_r})\, \hat{\mathbf{H}}_p\, (\mathbf{F}_1 \otimes \mathbf{I}_{N_t})^H = \mathrm{blkdiag}\{\mathbf{H}_p[0], \cdots, \mathbf{H}_p[K-1]\}, \tag{3}$$

where $\mathbf{F}_1 \in \mathbb{C}^{K \times K}$ is the $K$-point DFT matrix. Therefore, the received signal of the $p$-th UT in the time domain can be equivalently denoted as

$$\hat{\mathbf{y}}_p = \hat{\mathbf{H}}_p \hat{\mathbf{G}}_p \hat{\mathbf{x}}_p + \sum_{j \neq p}^{P} \hat{\mathbf{H}}_p \hat{\mathbf{G}}_j \hat{\mathbf{x}}_j + \hat{\mathbf{n}}, \tag{5}$$

where $\hat{\mathbf{y}}_p \in \mathbb{C}^{K N_r \times 1}$ and $\hat{\mathbf{x}}_p \in \mathbb{C}^{K N_s \times 1}$ are the received signal vector and the transmit signal vector in the time domain, respectively, $\hat{\mathbf{G}}_p \in \mathbb{C}^{K N_t \times K N_s}$ is the time-domain circulant FIR precoder for the $p$-th UT, and $\hat{\mathbf{n}}$ is the time-domain AWGN.

Since both $\hat{\mathbf{H}}_p$ and $\hat{\mathbf{G}}_p$ are circulant matrices, the effective channel matrix $\hat{\mathbf{H}}_p \hat{\mathbf{G}}_p$ is circulant as well. Therefore, we only need to focus on the design of the first block column $\mathbf{W}_p$ of $\hat{\mathbf{G}}_p$. $\mathbf{W}_p \in \mathbb{C}^{K N_t \times N_s}$ is defined as

$$\mathbf{W}_p = \left[ \hat{\mathbf{G}}_p^H[0] \ \cdots \ \hat{\mathbf{G}}_p^H[Q_t - 1] \ \ \mathbf{0} \ \cdots \ \mathbf{0} \right]^H, \tag{6}$$
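$$
\hat{\mathbf{H}}_p =
\begin{bmatrix}
\hat{\mathbf{H}}_p[0] & \mathbf{0} & \cdots & \mathbf{0} & \hat{\mathbf{H}}_p[L-1] & \cdots \\
\vdots & \hat{\mathbf{H}}_p[0] & \ddots & & \ddots & \hat{\mathbf{H}}_p[L-1] \\
\hat{\mathbf{H}}_p[L-1] & \vdots & \ddots & \ddots & & \mathbf{0} \\
\mathbf{0} & \hat{\mathbf{H}}_p[L-1] & & \hat{\mathbf{H}}_p[0] & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & \ddots & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \cdots & \hat{\mathbf{H}}_p[L-1] & \cdots & \hat{\mathbf{H}}_p[0]
\end{bmatrix}
\tag{4}
$$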
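The relation between the time-domain taps in (2), the block-circulant matrix in (4), and the block diagonalization in (3) can be checked numerically. The following NumPy sketch is only an illustration under assumed toy dimensions (`K = 8`, `L = 3`, `Nr = 2`, `Nt = 4`) and random taps; the helper names `block_circulant` and `freq_response` are hypothetical and not taken from the paper.

```python
import numpy as np

def block_circulant(taps, K):
    """Build the (K*Nr) x (K*Nt) block-circulant matrix whose first block
    column is [taps[0]; ...; taps[L-1]; 0; ...; 0], as in (4)."""
    L, Nr, Nt = taps.shape
    H = np.zeros((K * Nr, K * Nt), dtype=complex)
    for col in range(K):
        for l in range(L):
            row = (col + l) % K  # cyclic wrap-around after CP removal
            H[row * Nr:(row + 1) * Nr, col * Nt:(col + 1) * Nt] = taps[l]
    return H

def freq_response(taps, K):
    """H[k] = sum_l taps[l] * exp(-j*2*pi*l*k/K), as in (2)."""
    L = taps.shape[0]
    k = np.arange(K)[:, None]
    l = np.arange(L)[None, :]
    E = np.exp(-2j * np.pi * k * l / K)          # K x L DFT kernel
    return np.einsum('kl,lrt->krt', E, taps)     # K x Nr x Nt

# Toy sizes and random taps (assumptions for the demo only).
rng = np.random.default_rng(0)
K, L, Nr, Nt = 8, 3, 2, 4
taps = rng.standard_normal((L, Nr, Nt)) + 1j * rng.standard_normal((L, Nr, Nt))

H_hat = block_circulant(taps, K)
F1 = np.fft.fft(np.eye(K)) / np.sqrt(K)          # unitary K-point DFT matrix

# Left-hand side of (3).
D = np.kron(F1, np.eye(Nr)) @ H_hat @ np.kron(F1, np.eye(Nt)).conj().T

# Right-hand side of (3): blkdiag{H[0], ..., H[K-1]}.
Hk = freq_response(taps, K)
blk = np.zeros_like(D)
for k in range(K):
    blk[k * Nr:(k + 1) * Nr, k * Nt:(k + 1) * Nt] = Hk[k]

max_err = np.max(np.abs(D - blk))
print("max deviation between both sides of (3):", max_err)
```

The unitary scaling of `F1` is a choice made here so that the diagonal blocks equal the unnormalized sum in (2); with a different DFT normalization the blocks would differ by a constant factor.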

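where $\hat{\mathbf{G}}_p[q] \in \mathbb{C}^{N_t \times N_s}$ is the $q$-th tap of the FIR filter coefficients, and the length of the FIR filter is $Q_t$. The effective time-domain channel can be expressed as

$$\hat{\mathbf{H}}_p \mathbf{W}_p = \boldsymbol{\Delta}_p, \tag{7}$$

where $\boldsymbol{\Delta}_p = \left[ \boldsymbol{\Delta}_p^H[0] \ \cdots \ \boldsymbol{\Delta}_p^H[Q_t + L - 2] \ \ \mathbf{0} \ \cdots \ \mathbf{0} \right]^H$ is the effective time-domain channel and $\boldsymbol{\Delta}_p[s] \in \mathbb{C}^{N_r \times N_s}$ is the $s$-th response in $\boldsymbol{\Delta}_p$. The trailing zero blocks in (6) truncate $\hat{\mathbf{H}}_p$ to its first $Q_t$ block columns, which form a block Toeplitz matrix in which only the first $Q_t + L - 1$ block rows contain non-zero elements. We denote these $Q_t + L - 1$ block rows as the block Toeplitz matrix $\mathbf{T}_p \in \mathbb{C}^{(Q_t + L - 1) N_r \times Q_t N_t}$, which can be written as

$$
\mathbf{T}_p =
\begin{bmatrix}
\hat{\mathbf{H}}_p[0] & \mathbf{0} & \cdots & \mathbf{0} \\
\vdots & \hat{\mathbf{H}}_p[0] & \ddots & \vdots \\
\hat{\mathbf{H}}_p[L-1] & \vdots & \ddots & \mathbf{0} \\
\mathbf{0} & \hat{\mathbf{H}}_p[L-1] & \ddots & \hat{\mathbf{H}}_p[0] \\
\vdots & \ddots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \hat{\mathbf{H}}_p[L-1]
\end{bmatrix}.
\tag{8}
$$

To this end, (7) is further transformed into

$$\mathbf{T}_p \bar{\mathbf{W}}_p = \bar{\boldsymbol{\Delta}}_p, \tag{9}$$

where $\bar{\mathbf{W}}_p = \left[ \hat{\mathbf{G}}_p^H[0] \ \cdots \ \hat{\mathbf{G}}_p^H[Q_t - 1] \right]^H$ and $\bar{\boldsymbol{\Delta}}_p = \left[ \boldsymbol{\Delta}_p^H[0] \ \cdots \ \boldsymbol{\Delta}_p^H[Q_t + L - 2] \right]^H$. Based on (9), two insights are drawn. Firstly, the precoder design dimension is reduced from $K$ (subcarriers) in the frequency domain to $Q_t$ (taps) in the time domain, reducing the computational complexity greatly if $Q_t \ll K$. Secondly, the effective channel $\bar{\boldsymbol{\Delta}}_p$ can be prescribed flexibly. If $\bar{\boldsymbol{\Delta}}_p$ possesses a single non-zero tap, the inter-symbol interference (ISI) will be fully removed, resulting in a completely flat frequency response. Conversely, multiple non-zero taps in $\bar{\boldsymbol{\Delta}}_p$ signify the presence of residual ISI, potentially leading to improved throughput owing to extra degrees of freedom. Therefore, the space-time precoder design aims to trade off ISI mitigation and sum rate by adjusting the non-zero elements in $\bar{\boldsymbol{\Delta}}_p$ while reducing the computational complexity. Defining the response tap reservation set as $\mathcal{S}_p$, we denote $\mathcal{S}_p^C = \{0, \cdots, Q_t + L - 2\} \setminus \mathcal{S}_p$ as the complementary set of $\mathcal{S}_p$, which represents the ISI mitigation set. The maximum sum rate can be obtained by optimizing $\bar{\mathbf{W}}_p$, $p = 1, \cdots, P$, in the following problem:

$$
\begin{aligned}
\max_{\bar{\mathbf{W}}} \quad & \sum_{k=1}^{K} \sum_{p=1}^{P} \log_2 \left| \mathbf{I}_{N_r} + \frac{1}{\sigma^2} \mathbf{F}_2(k) \mathbf{T}_p \bar{\mathbf{W}}_p \bar{\mathbf{W}}_p^H \mathbf{T}_p^H \mathbf{F}_2^H(k) \right| \\
\text{subject to} \quad & \mathrm{Tr}(\bar{\mathbf{W}}^H \bar{\mathbf{W}}) \le 1, \\
& \mathbf{T}_j \bar{\mathbf{W}}_p = \mathbf{0} \ \text{ for all } j \neq p, \\
& \bar{\boldsymbol{\Delta}}_p(s_p) = \mathbf{0} \ \text{ if } s_p \in \mathcal{S}_p^C, \text{ and for all } p,
\end{aligned}
\tag{10}
$$

where $\bar{\mathbf{W}} = \left[ \bar{\mathbf{W}}_1 \ \cdots \ \bar{\mathbf{W}}_P \right]$, $\mathbf{F}_2(k) = \mathbf{F}_1(k, 1 : Q_t + L - 1) \otimes \mathbf{I}_{N_r}$, and $\bar{\boldsymbol{\Delta}}_p(s)$ is the $s$-th block row in $\bar{\boldsymbol{\Delta}}_p$, i.e., $\bar{\boldsymbol{\Delta}}_p(s) = \boldsymbol{\Delta}_p[s]$. $\bar{\mathbf{W}}$ is required to satisfy the unit power constraint and the IUI and ISI mitigation constraints while maximizing (10). Motivated by the conventional block diagonalization scheme, we can divide $\bar{\mathbf{W}}$ into two components, i.e., $\bar{\mathbf{W}} = \mathbf{U}\mathbf{V}$, where $\mathbf{U} = \left[ \mathbf{U}_1 \ \cdots \ \mathbf{U}_P \right]$ is for IUI and ISI cancellation, and $\mathbf{V} = \mathrm{blkdiag}\left( \mathbf{V}_1, \cdots, \mathbf{V}_P \right)$ is to maximize (10).

III. FAST SPACE-TIME BDCS PROCESSING

In this section, we present a rapid implementation of the ST-BDCS design for massive MIMO-OFDM systems by exploiting the sparse Toeplitz property of OFDM channels. Our approach ensures the fulfillment of both IUI and channel shortening constraints.

A. Effective Channel of Single Tap

The adoption of a completely flat effective channel significantly simplifies the process of channel estimation and receiver design for individual users, leading us to consider a scenario where $|\mathcal{S}_p| = 1$, $p = 1, \cdots, P$. Without loss of generality, we can assume all UTs share the same $\mathcal{S}_p$. Given $Q_t$, we concentrate on the optimal single tap response design for the time-domain effective channel in this part.

The ISI and IUI canceller $\mathbf{U}_p$ should lie in the null space of $\left[ \mathbf{T}_1^H \ \cdots \ \mathbf{T}_{p-1}^H \ \ \mathbf{T}_p^H(s_p) \ \ \mathbf{T}_{p+1}^H \ \cdots \ \mathbf{T}_P^H \right]^H$ for all $s_p \in \mathcal{S}_p^C$, where $\mathbf{T}_p(s_p)$ represents the $s_p$-th block row in $\mathbf{T}_p$. However, employing singular value decomposition (SVD) for direct null space computation entails significant computational complexity. To compute $\mathbf{U}$ and $\mathbf{V}$ quickly, we propose the following steps: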
(S1) We aggregate all truncated block Toeplitz matrices $\mathbf{T}_p$, $p = 1, \cdots, P$, into

$$
\mathbf{T} =
\begin{bmatrix}
\mathbf{T}[0] & \mathbf{0} & \cdots & \mathbf{0} \\
\vdots & \mathbf{T}[0] & \ddots & \vdots \\
\mathbf{T}[L-1] & \vdots & \ddots & \mathbf{0} \\
\mathbf{0} & \mathbf{T}[L-1] & \ddots & \mathbf{T}[0] \\
\vdots & \ddots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{T}[L-1]
\end{bmatrix},
\tag{11}
$$

where $\mathbf{T} \in \mathbb{C}^{(Q_t + L - 1) P N_r \times Q_t N_t}$, and the $l$-th tap is $\mathbf{T}[l] = \left[ \hat{\mathbf{H}}_1^H[l] \ \cdots \ \hat{\mathbf{H}}_P^H[l] \right]^H$. Then, we perform the block Toeplitz QR decomposition on $\mathbf{T}^H$, which is decomposed into

$$\mathbf{T}^H = \mathbf{Q}\mathbf{R}, \tag{12}$$

where $\mathbf{Q} \in \mathbb{C}^{Q_t N_t \times (Q_t + L - 1) P N_r}$ is unitary, and $\mathbf{R} \in \mathbb{C}^{(Q_t + L - 1) P N_r \times (Q_t + L - 1) P N_r}$ is a sparse upper triangular matrix with at most $L$ block matrices in each row or each column. However, existing Toeplitz QR decomposition algorithms are tailored for real-valued scalar Toeplitz matrices and cannot be directly utilized here [23]–[25]. To tackle this challenge, we previously proposed a complex-valued Toeplitz QR decomposition algorithm for Toeplitz matrices with special structures [4]. In this paper, we put forward generalized (block) Toeplitz QR decomposition algorithms in Appendix A and Appendix B. Therefore, (12) can be achieved by Algorithm 4 in Appendix B.

(S2) In conventional MIMO systems, the first block diagonalization design was proposed in [26] by utilizing the SVD to mitigate the IUI. Indeed, the SVD operation can be replaced by QR decomposition with lower computational complexity, since the right singular matrix of the SVD and the $\mathbf{Q}$ matrix of the QR decomposition share the same column space. Hence, similar IUI mitigation can be achieved more efficiently using the QR decomposition rather than the SVD [27], [28]. Li et al. have proved that the optimal block diagonalization precoder is strictly equivalent to the generalized zero-forcing (ZF) precoding cascaded with a block diagonal matrix [29]. Therefore, $\mathbf{U}$ can be obtained by

$$\mathbf{U} = \mathbf{T}^H (\mathbf{T}\mathbf{T}^H)^{-1} \bar{\boldsymbol{\Delta}} = \mathbf{T}^H \mathbf{R}^{-1} (\mathbf{R}^{-1})^H \bar{\boldsymbol{\Delta}}, \tag{13}$$

where $\bar{\boldsymbol{\Delta}} = \left[ \mathbf{0} \ \cdots \ \mathbf{I}_{P N_r} \ \cdots \ \mathbf{0} \right]^T$ is the unified impulse response. The position of the single tap response $\mathbf{I}_{P N_r}$ in $\bar{\boldsymbol{\Delta}}$ depends on $\mathcal{S}_p$.

(S3) With (13), the ISI and IUI constraints in (10) are satisfied, and the effective channel has a totally flat response in the frequency domain. Then, the sum rate maximization can be rewritten as

$$
\begin{aligned}
\max_{\bar{\mathbf{V}}} \quad & K \sum_{p=1}^{P} \log_2 \left| \mathbf{I}_{N_s} + \frac{1}{\sigma^2} \bar{\mathbf{V}}_p^H (\mathbf{U}_p^H \mathbf{U}_p)^{-1} \bar{\mathbf{V}}_p \right| \\
\text{subject to} \quad & \mathrm{Tr}(\bar{\mathbf{V}}^H \bar{\mathbf{V}}) \le 1,
\end{aligned}
\tag{14}
$$

where $\mathbf{V}_p = (\mathbf{U}_p^H \mathbf{U}_p)^{-\frac{1}{2}} \bar{\mathbf{V}}_p$ and $\bar{\mathbf{V}} = \left[ \bar{\mathbf{V}}_1^H \ \cdots \ \bar{\mathbf{V}}_P^H \right]^H$. (14) can be addressed by the standard technique of SVD and water-filling.

(S4) After optimizing $\mathbf{V}$, $\bar{\mathbf{W}}$ is calculated by

$$\bar{\mathbf{W}} = \mathbf{T}^H \mathbf{R}^{-1} (\mathbf{R}^{-1})^H \bar{\boldsymbol{\Delta}} \mathbf{V}. \tag{15}$$

Due to the block Toeplitz structure of $\mathbf{T}^H$, (15) can be computed by the fast Fourier transform (FFT). In summary, the optimal single tap response design is presented in Algorithm 1.

Algorithm 1 Space-Time Optimal Single-Tap Response Design
1: Input: The aggregated Toeplitz matrix $\mathbf{T}$ in (11), and response tap reservation set $\mathcal{S}_p$, $p = 1, \cdots, P$.
2: Call Algorithm 4 to compute (12);
3: Acquire $\mathbf{R}^{-1}$;
4: Maximize (14) by utilizing SVD and water-filling techniques;
5: Calculate (15).
6: Output: $\bar{\mathbf{W}}$.

B. Effective Channel of Multiple Taps

In the preceding subsection, we derived the optimal space-time transmitter design for the single tap response scenario, which makes the effective channels seen by the UTs completely frequency flat and thus substantially simplifies channel estimation and receiver processing for the users. However, the single-tap constraint is typically rate-lossy, and incorporating multiple tap responses could potentially enhance the sum rate, despite the introduction of some ISI.

Given the response tap reservation set $\mathcal{S}_p$, the ISI and IUI canceller $\mathbf{U}_p$ for the $p$-th UT can be expressed as

$$\mathbf{U}_p = \mathbf{T}^H \mathbf{R}^{-1} (\mathbf{R}^{-1})^H \bar{\boldsymbol{\Delta}}_p \in \mathbb{C}^{Q_t N_t \times N_r |\mathcal{S}_p|}, \tag{16}$$

where we assume the reserved taps in $\mathcal{S}_p$ are consecutive and $\bar{\boldsymbol{\Delta}}_p = \left[ \mathbf{0} \ \cdots \ \mathbf{A}_p \ \cdots \ \mathbf{0} \right]^T$. Here, $\mathbf{A}_p$ denotes the position of the reserved taps, which can be expressed as

$$
\mathbf{A}_p =
\begin{bmatrix}
\mathbf{B}_p & & \\
& \ddots & \\
& & \mathbf{B}_p
\end{bmatrix}
\in \mathbb{R}^{P N_r |\mathcal{S}_p| \times N_r |\mathcal{S}_p|}.
$$

In $\mathbf{B}_p \in \mathbb{R}^{P N_r \times N_r}$, the $p$-th block row is $\mathbf{I}_{N_r}$ while the other block elements are zeros. With (16) and imitating (14), we can also replace $\mathbf{V}_p$ by $\mathbf{V}_p = (\mathbf{U}_p^H \mathbf{U}_p)^{-\frac{1}{2}} \bar{\mathbf{V}}_p$, so that the sum rate in (10) can be transformed into

$$
\begin{aligned}
\max_{\bar{\mathbf{V}}} \quad & f(\bar{\mathbf{V}}), \\
\text{subject to} \quad & \mathrm{Tr}(\bar{\mathbf{V}}^H \bar{\mathbf{V}}) \le 1,
\end{aligned}
\tag{17}
$$

where

$$f(\bar{\mathbf{V}}) = \sum_{k=1}^{K} \sum_{p=1}^{P} \log_2 \left| \mathbf{I}_{N_s} + \frac{1}{\sigma^2} \bar{\mathbf{V}}_p^H \bar{\mathbf{H}}_p^H[k] \bar{\mathbf{H}}_p[k] \bar{\mathbf{V}}_p \right|,$$

and $\bar{\mathbf{H}}_p[k] = \mathbf{F}_1(k, \mathcal{S}_p) (\mathbf{U}_p^H \mathbf{U}_p)^{-\frac{1}{2}}$ is the effective channel matrix in the $k$-th subcarrier. Therefore, (17) is a multi-user sum rate maximization problem over frequency selective channels. As an optimal solution to (17), $\bar{\mathbf{V}}$ should satisfy the power constraint with equality. The problem (17) can
be viewed as a non-convex optimization problem on the complex unit sphere (CUS) manifold $\mathcal{M}_{cus}$, i.e., $\bar{\mathbf{V}} \in \mathcal{M}_{cus}$, which can be solved by a first-order manifold optimization algorithm. The Euclidean gradient of $f(\bar{\mathbf{V}})$ is $\nabla f(\bar{\mathbf{V}}) = \left[ \nabla_{\bar{\mathbf{V}}_1} f(\bar{\mathbf{V}})^H \ \cdots \ \nabla_{\bar{\mathbf{V}}_P} f(\bar{\mathbf{V}})^H \right]^H$. The $p$-th term $\nabla_{\bar{\mathbf{V}}_p} f(\bar{\mathbf{V}})$ can be written as

$$\nabla_{\bar{\mathbf{V}}_p} f(\bar{\mathbf{V}}) = \sum_{k=1}^{K} \bar{\mathbf{H}}_p^H[k] \bar{\mathbf{H}}_p[k] \bar{\mathbf{V}}_p \mathbf{M}_p^{-1}[k], \tag{18}$$

where $\mathbf{M}_p[k] = \sigma^2 \mathbf{I}_{N_s} + \bar{\mathbf{V}}_p^H \bar{\mathbf{H}}_p^H[k] \bar{\mathbf{H}}_p[k] \bar{\mathbf{V}}_p$. Then, the orthogonal projection from the Euclidean gradient to the Riemannian gradient at $\bar{\mathbf{V}}$ is expressed as

$$\mathrm{grad}\, f(\bar{\mathbf{V}}) = \nabla f(\bar{\mathbf{V}}) - \Re\left\{ \mathrm{tr}\left( \bar{\mathbf{V}}^H \nabla f(\bar{\mathbf{V}}) \right) \right\} \bar{\mathbf{V}}. \tag{19}$$

With (18) and (19), we can develop a line-search based conjugate gradient algorithm following [30] to compute $\bar{\mathbf{V}}$, which is summarized in Algorithm 2.

Algorithm 2 Conjugate Gradient Algorithm for Sum Rate Maximization based on Manifold Optimization
1: Input: $\bar{\mathbf{V}}^{(1)} \in \mathcal{M}_{cus}$ and effective channel matrices $\bar{\mathbf{H}}_p[k]$, $p = 1, \cdots, P$, $k = 1, \cdots, K$.
2: Initialize: $k = 1$, and search direction $\mathbf{D}^{(1)} = -\mathrm{grad}\, f(\bar{\mathbf{V}}^{(1)})$.
3: while stopping criterion not satisfied do
4: Choose step factor $\alpha^{(k)}$ using Armijo backtracking line search;
5: Update next feasible point by retraction operation $\bar{\mathbf{V}}^{(k+1)} = \mathrm{Retr}_{\bar{\mathbf{V}}^{(k)}}(\alpha^{(k)} \mathbf{D}^{(k)})$;
6: Compute Euclidean gradient $\nabla f(\bar{\mathbf{V}}^{(k+1)})$ based on (18) and update Riemannian gradient $\mathrm{grad}\, f(\bar{\mathbf{V}}^{(k+1)})$ based on (19);
7: Compute tangent space transport $\mathrm{grad}\, f(\bar{\mathbf{V}}^{(k)})^{TP}$ and $\mathbf{D}^{(k)TP}$ from the tangent space at $\bar{\mathbf{V}}^{(k)}$ to the tangent space at $\bar{\mathbf{V}}^{(k+1)}$;
8: Compute Hestenes and Stiefel parameter $\beta^{(k+1)}$;
9: Update next search direction $\mathbf{D}^{(k+1)} = -\mathrm{grad}\, f(\bar{\mathbf{V}}^{(k+1)}) + \beta^{(k+1)} \mathbf{D}^{(k)TP}$;
10: $k = k + 1$;
11: end while
12: Output: $\bar{\mathbf{V}}^{(k)}$.

In line 5, $\mathrm{Retr}_{\bar{\mathbf{V}}^{(k)}}(\cdot)$ is the retraction operation that retracts $\bar{\mathbf{V}}^{(k+1)}$ onto the sphere manifold, which is defined as

$$\mathrm{Retr}_{\bar{\mathbf{V}}^{(k)}}(\alpha^{(k)} \mathbf{D}^{(k)}) = \frac{\bar{\mathbf{V}}^{(k)} + \alpha^{(k)} \mathbf{D}^{(k)}}{\left\| \bar{\mathbf{V}}^{(k)} + \alpha^{(k)} \mathbf{D}^{(k)} \right\|_F}. \tag{20}$$

In line 7, $\mathrm{grad}\, f(\bar{\mathbf{V}}^{(k)})$ and $\mathbf{D}^{(k)}$ belong to the tangent space at $\bar{\mathbf{V}}^{(k)}$ and cannot be directly combined with components in the tangent space at $\bar{\mathbf{V}}^{(k+1)}$. To map between the two tangent spaces, the orthogonal projection follows the same rule as in (19). In line 8, the Hestenes and Stiefel (H-S) parameter $\beta^{(k+1)}$ is expressed as

$$\beta^{(k+1)} = \frac{\mathrm{Tr}\left[ \mathrm{grad}\, f(\bar{\mathbf{V}}^{(k+1)})^H \left( \mathrm{grad}\, f(\bar{\mathbf{V}}^{(k+1)}) - \mathrm{grad}\, f(\bar{\mathbf{V}}^{(k)})^{TP} \right) \right]}{\mathrm{Tr}\left[ (\mathbf{D}^{(k)TP})^H \left( \mathrm{grad}\, f(\bar{\mathbf{V}}^{(k+1)}) - \mathrm{grad}\, f(\bar{\mathbf{V}}^{(k)})^{TP} \right) \right]}. \tag{21}$$

C. Computational Complexity Analysis

Single Tap Response Design: In Algorithm 1, the calculation of (12) utilizes Algorithm 4, which has lower computational complexity than the CVB Toeplitz QR decomposition algorithm proposed by Wang et al. [21]. Algorithm 4 requires about $\mathcal{O}(N_t (P N_r)^2 (Q_t L + L^2) + (P N_r)^3 (Q_t L + L^2/2))$ complex multiplications, while the algorithm in [21] requires $\mathcal{O}(2 Q_t L N_t^2 P N_r + Q_t^2 N_t (P N_r)^2)$. Since $\mathbf{R}$ is a sparse upper triangular matrix with at most $L$ block matrices in each row and each column, the inversion of $\mathbf{R}$ requires $\mathcal{O}((P N_r)^3 (Q_t L + L^2/2 - (Q_t + L - 1)^2/4))$. We denote $\bar{L} = Q_t + L - s_p$, where we have assumed $s_p$ is the same for all UTs. The SVD and water-filling require $\mathcal{O}(\tfrac{1}{2} \bar{L} P N_r^3 + P N_r^3 + \tfrac{1}{2} P^2 N_s^2)$. The FFT operation for $\mathbf{T}^H$ requires $\mathcal{O}(\tfrac{1}{2} N_t P N_r 2^{\lceil \log_2 (Q_t + L - 1) \rceil} \lceil \log_2 (Q_t + L - 1) \rceil)$. The calculation of (15) requires $\mathcal{O}(N_t P N_r P N_s 2^{\lceil \log_2 (Q_t + L - 1) \rceil} + (P N_r)^2 P N_s \bar{L} (Q_t + L - 1) + \bar{L} (P N_r)^2 N_s)$ in the frequency domain. Then, an IFFT operation is required to calculate the time-domain precoding coefficients with $\mathcal{O}(\tfrac{1}{2} N_t P N_s 2^{\lceil \log_2 (Q_t + L - 1) \rceil} \lceil \log_2 (Q_t + L - 1) \rceil)$ complex multiplications.

Multiple Taps Response Design: The calculation procedure for (12) and $\mathbf{R}^{-1}$ is the same as in the single tap response design. The computational complexity of Algorithm 2 is mainly determined by the iteration loops. The Armijo backtracking line search has a computational complexity of $\mathcal{O}(K P N_s^3 I_{Armijo})$, where $I_{Armijo}$ is the iteration number of Armijo backtracking. The Euclidean gradient computation requires $\mathcal{O}\big(\sum_{p=1}^{P} ((|\mathcal{S}_p| N_r)^2 N_s + N_s^3 + \tfrac{3}{2} N_s^2 |\mathcal{S}_p| N_r)\big)$. The Riemannian gradient update, tangent space transport operation, and H-S parameter computation require $\mathcal{O}\big(\sum_{p=1}^{P} (|\mathcal{S}_p| N_r) 5 N_s\big)$. Therefore, assuming $I$ iterations of the while loop, the total number of complex multiplications is approximately bounded by $\mathcal{O}\big(I \{ K P N_s^3 I_{Armijo} + \sum_{p=1}^{P} ((|\mathcal{S}_p| N_r)^2 N_s + N_s^3 + \tfrac{3}{2} N_s^2 |\mathcal{S}_p| N_r + (|\mathcal{S}_p| N_r) 5 N_s) \}\big)$.

Benchmark I: The first benchmark is the eigen zero-forcing (EZF) precoding technique [31], which is widely utilized in current commercial multi-user MIMO-OFDM systems. The EZF precoding coefficients are obtained by performing ZF precoding based on the SVD. The EZF precoding requires approximately $\mathcal{O}((\tfrac{1}{2} N_t^2 P N_r + 3 P N_t^2 N_s + \tfrac{3}{2} N_t (P N_s)^2 + (P N_s)^3) K)$ complex multiplications [29], [31].

Benchmark II: The second benchmark scheme is the space-frequency BDCS (SF-BDCS) scheme. The SF-BDCS scheme can be obtained through a two-step process by combining techniques from [29] and [19]. In the first step, the IUI and ISI can be completely mitigated by calculating the generalized ZF precoder for each subcarrier. In the second step, a time-domain block diagonal effective channel is introduced to maximize the sum rate. The computational complexity of the frequency-domain BDCS method is approximately bounded by $\mathcal{O}(\tfrac{1}{2} K N_t^2 (P N_r)^2 + K (P N_r)^3 + K (P N_r)^2 P N_s + K N_t P N_r P N_s) + \mathcal{O}\big(I \{ K P N_s^3 I_{Armijo} + \sum_{p=1}^{P} ((|\mathcal{S}_p| N_r)^2 N_s + N_s^3 + \tfrac{3}{2} N_s^2 |\mathcal{S}_p| N_r + (|\mathcal{S}_p| N_r) 5 N_s) \}\big)$.

[Figure 1: sum rate (bps/Hz/stream) versus FIR filter length. Curves: EZF Precoding; Single Tap Response SF-BDCS; Single Tap Response ST-BDCS; Multi-Tap = 4 Response ST-BDCS; Multi-Tap = 8 Response ST-BDCS.]
Fig. 1. Sum rate comparison. $N_t = 256$, $N_r = 4$, $P = 20$, $N_s = 2$. The SNR is 20 dB. The position of impulse taps is in the center of effective channel.

[Figure 2: sum rate (bps/Hz/stream) versus FIR filter length. Curves: EZF Precoding; Single Tap Response SF-BDCS; Single Tap Response ST-BDCS; Multi-Tap = 8, 16, 24, and 32 Response ST-BDCS.]
Fig. 2. Sum rate comparison. $N_t = 256$, $N_r = 4$, $P = 30$, $N_s = 2$. The SNR is 20 dB. The position of impulse taps is in the center of effective channel.

IV. SIMULATION RESULTS

In this section, we begin by assessing the sum rate performance of the proposed space-time BDCS methods in relation to EZF and space-frequency BDCS schemes. We also provide an illustration of the iteration process in Algorithm 2. Finally, we present a demonstration of the computational complexity comparison.

A. Sum Rate Evaluation

In this section, we investigate a massive MIMO-OFDM communication system operating under the 5G NR protocol. The system employs the 5G NR Cluster Delay Line B (CDL-B) channel model characterized by a maximum delay spread of 100 ns [32]. The carrier frequency is 6.7 GHz. With a subcarrier interval of 15 kHz and $K = 2048$ subcarriers, we choose $L = 20$ to sufficiently capture the dominant power of the CDL-B channel. Our simulations employ a Monte Carlo sample size of 100. Figure 1 illustrates a scenario where a BS equipped with 256 transmitting antennas serves UTs equipped with 4 receiving antennas, with each UT receiving 2 data streams. We observe that EZF precoding outperforms both single tap and multiple taps response designs in terms of sum rate. This superiority arises from the fact that the EZF method does not constrain the number of taps in the time-domain effective channel. Although the resulting effective channel is not flat in the frequency domain, this characteristic leads to higher degrees of freedom for optimization, ultimately enhancing system performance. Furthermore, the space-time BDCS design exhibits a performance loss of approximately 3.3% compared to the space-frequency BDCS method and 7.6% compared to EZF precoding when $Q_t = 30$. The sum rate improvement is not prominent under the multiple taps response design when $N_t$ is relatively large. However, reducing the number of transmitting antennas necessitates increasing $Q_t$ and the length of the effective channel to avoid severe performance degradation. In Fig. 2, we explore a scenario where $P$ is increased to 30 while keeping the other parameters constant. The single tap response design shows an approximate 11% performance gap compared to EZF precoding when $Q_t = 44$. However, increasing the length of the effective channel helps mitigate this performance loss. By setting the effective channel length to 32, the performance gap is reduced to 9%. Therefore, increasing the spatial degree of freedom results in a decrease in the temporal degree of freedom. Consequently, the performance loss of space-time BDCS can be reduced while significantly decreasing computational complexity.

In addition to the sum rate comparison, we also depict the convergence speed of Algorithm 2 in Fig. 3. As stated in [33], the algorithm using the manifold optimization method is guaranteed to converge to a point where the gradient of the objective function is zero. Hence, the sum rate optimization problem for the multiple taps response design can converge rapidly within a finite number of steps. As indicated by Fig. 3 (a) and (b), we choose an iteration number of 20 as the stopping criterion.
[Figure 3: channel capacity (bps/Hz/stream) versus iteration number. Panel (a) $P = 20$, $Q_t = 30$: Multi-tap = 4 and Multi-tap = 8 Response ST-BDCS. Panel (b) $P = 30$, $Q_t = 44$: Multi-tap = 8 and Multi-tap = 32 Response ST-BDCS.]
Fig. 3. Manifold optimization iteration comparison for multiple taps response design. $N_t = 256$, $N_r = 4$, $N_s = 2$.

[Figure 4: complex multiplications ($\times 10^{10}$) for Single Tap ST, Single Tap SF, EZF, Multi-Tap SF, and Multi-Tap ST. Legend: Single Tap Response ST-BDCS; Single Tap Response SF-BDCS; EZF Precoding; Multi-tap = 8 SF-BDCS; Multi-tap = 8 ST-BDCS.]
Fig. 4. Computational comparison for ST-BDCS, SF-BDCS and EZF Precoding. $N_t = 256$, $N_r = 4$, $P = 20$, $N_s = 2$.

B. Computational Complexity Comparison

In this section, we compare the computational complexity of the proposed space-time domain methods with the benchmarks. Without loss of generality, $I_{Armijo}$ is set to 20 in each iteration for the ST-BDCS multiple taps response design. The position of impulse taps is in the center of the effective channel $\bar{\boldsymbol{\Delta}}$. In Figures 4 and 5, we compare the computational complexity of ST-BDCS, SF-BDCS, and EZF precoding for different values of $Q_t$ (30 and 44, respectively). In Figure 4, the single tap response design in the space-time domain has approximately a 45% reduction in complex multiplications compared with the single tap SF-BDCS processing, and a decrease of 88.97% in computational complexity compared with EZF precoding. In the multiple response design scenario, the computational complexity reduction is 43.78% and 88.40% compared with SF-BDCS and EZF precoding, respectively. However, increasing the value of $Q_t$ becomes necessary as the ratio $N_t / (P N_s)$ decreases, resulting in an increase in computational complexity for the ST-BDCS design. In Figure 5, the reduced complex multiplications are 26.54% and 24.69% compared with SF-BDCS.

[Figure 5: complex multiplications ($\times 10^{10}$) for Single Tap ST, Single Tap SF, EZF, Multi-Tap SF, and Multi-Tap ST. Legend: Single Tap Response ST-BDCS; Single Tap Response SF-BDCS; EZF Precoding; Multi-tap = 32 SF-BDCS; Multi-tap = 32 ST-BDCS.]
Fig. 5. Computational comparison for ST-BDCS, SF-BDCS and EZF Precoding. $N_t = 256$, $N_r = 4$, $P = 30$, $N_s = 2$.

From Fig. 1 to Fig. 5, we can identify two advantages of the space-time domain precoding design. Firstly, space-time domain precoding can approach frequency domain subcarrier-by-subcarrier precoding methods using a short-tapped FIR filter in the time domain. When $L \ll K$, fewer FIR filter taps can be used compared with frequency domain precoding coefficients, thereby reducing the burden on communication capacity between the baseband unit and the active antenna unit. This enables the utilization of data with higher quantization accuracy for characterizing the precoding coefficients. Secondly, space-time precoding leverages channel sparsity and Toeplitz characteristics. The computational complexity of the algorithm can be significantly reduced by solving high-dimensional Toeplitz linear equations or utilizing the FFT to compute Toeplitz matrix-vector multiplication.

V. CONCLUSIONS

This paper introduces the concept of Channel Estimation Considerate Precoding, where the precoders employed by the BS aim to facilitate users' channel estimation while maximizing transmission rate. To address the computational complexity associated with massive MIMO-OFDM systems, we propose a computationally-efficient ST-BDCS precoding scheme. This scheme leverages the sparse Toeplitz property of OFDM channels to effectively mitigate multi-user interference in the downlink. Additionally, it shortens the temporal length of the effective channels, enabling users to estimate the channels with sparse pilots. For fast implementation of the precoding schemes, we develop a generalized complex-valued Toeplitz matrix QR decomposition algorithm that can be applied to various space-time signal processing problems. Simulation results demonstrate that the ST-BDCS precoding method achieves comparable rate performance to conventional subcarrier-by-subcarrier precoding
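schemes, while providing easier channel estimation for users and significantly reduced computational complexity for the BS.

APPENDIX A: GENERALIZED COMPLEX-VALUED SCALAR TOEPLITZ QR DECOMPOSITION

Let us start by considering a general column full-rank ($N \le M$) scalar Toeplitz matrix, which can be written as

$$\mathbf{T}^H = \begin{bmatrix} t_0 & t_{-1} & \cdots & t_{-(N-1)} \\ t_1 & t_0 & \cdots & t_{-(N-2)} \\ t_2 & t_1 & \cdots & t_{-(N-3)} \\ \vdots & \vdots & \ddots & \vdots \\ t_{M-1} & t_{M-2} & \cdots & t_{M-N} \end{bmatrix} \in \mathbb{C}^{M \times N}. \tag{22}$$

We can partition the complex-valued scalar Toeplitz matrix $\mathbf{T}^H$ as

$$\mathbf{T}^H = \begin{bmatrix} t_0 & \mathbf{u}_1^H \\ \mathbf{v}_1 & \tilde{\mathbf{T}} \end{bmatrix} = \begin{bmatrix} \tilde{\mathbf{T}} & \mathbf{u}_2 \\ \mathbf{v}_2^H & t_{M-N} \end{bmatrix}. \tag{23}$$

Let $\mathbf{T}^H = \mathbf{Q}\mathbf{R}$ be the QR decomposition. Similarly, we can also partition the upper triangular matrix $\mathbf{R}$ as

$$\mathbf{R} = \begin{bmatrix} r_{11} & \mathbf{z}^H \\ \mathbf{0} & \mathbf{R}_1 \end{bmatrix} = \begin{bmatrix} \mathbf{R}_2 & \tilde{\mathbf{z}} \\ \mathbf{0} & r_{nn} \end{bmatrix}. \tag{24}$$

Since $\mathbf{R}^H \mathbf{R} = \mathbf{T}\mathbf{T}^H$, we obtain the two equations (25) and (26), whose first column reads

$$\begin{bmatrix} r_{11}^{*} \\ \mathbf{z} \end{bmatrix} r_{11} = \mathbf{T} \begin{bmatrix} t_0 \\ \vdots \\ t_{M-1} \end{bmatrix}. \tag{27}$$

The right-hand side of (27) is a convolution and thus can be efficiently computed using FFTs. Therefore, $r_{11}$ and $\mathbf{z}$ can be obtained. Next, we need to determine the remaining components of $\mathbf{R}$. According to (25) and (26), we have

$$\mathbf{z}\mathbf{z}^H + \mathbf{R}_1^H \mathbf{R}_1 = \mathbf{u}_1 \mathbf{u}_1^H + \tilde{\mathbf{T}}^H \tilde{\mathbf{T}}, \tag{28}$$

and

$$\mathbf{R}_2^H \mathbf{R}_2 = \tilde{\mathbf{T}}^H \tilde{\mathbf{T}} + \mathbf{v}_2 \mathbf{v}_2^H. \tag{29}$$

Combining (28) and (29), it can be concluded that

$$\mathbf{R}_1^H \mathbf{R}_1 + \mathbf{z}\mathbf{z}^H + \mathbf{v}_2 \mathbf{v}_2^H = \mathbf{R}_2^H \mathbf{R}_2 + \mathbf{u}_1 \mathbf{u}_1^H. \tag{30}$$

Therefore, (30) can be further expressed as

$$\begin{bmatrix} \mathbf{R}_1^H & \mathbf{z} & \mathbf{v}_2 \end{bmatrix} \begin{bmatrix} \mathbf{R}_1 \\ \mathbf{z}^H \\ \mathbf{v}_2^H \end{bmatrix} = \begin{bmatrix} \mathbf{R}_2^H & \mathbf{u}_1 \end{bmatrix} \begin{bmatrix} \mathbf{R}_2 \\ \mathbf{u}_1^H \end{bmatrix}, \tag{31}$$

to which we can introduce the unitary matrix $\mathbf{Q}$ as

$$\mathbf{Q} \begin{bmatrix} \mathbf{R}_1 \\ \mathbf{z}^H \\ \mathbf{v}_2^H \end{bmatrix} = \begin{bmatrix} \mathbf{R}_2 \\ \mathbf{u}_1^H \end{bmatrix}. \tag{32}$$

Note that the $(i+1)$-th row of $\mathbf{R}_2$ corresponds to the $i$-th row of $\mathbf{R}_1$ in (32). To this end, we present an illustrative example to develop the fast recursive algorithm for the generalized complex-valued Toeplitz QR decomposition, where $N = 4$. In this case, (32) becomes

$$\mathbf{Q} \begin{bmatrix} x_1 & x_2 & x_3 \\ & x_4 & x_5 \\ & & x_6 \\ z_1 & z_2 & z_3 \\ t_{M-1} & t_{M-2} & t_{M-3} \end{bmatrix} = \begin{bmatrix} r_{11} & z_1 & z_2 \\ & x_1 & x_2 \\ & & x_4 \\ t_{-1} & t_{-2} & t_{-3} \end{bmatrix}, \tag{33}$$

where $\mathbf{z}^H = \begin{bmatrix} z_1 & z_2 & z_3 \end{bmatrix}$. In the first step, we apply a 2-by-2 complex-valued Givens rotation matrix $\mathbf{G}_1$ to both sides of (33), which transforms (33) into

$$\mathbf{Q} \begin{bmatrix} x_1 & x_2 & x_3 \\ & x_4 & x_5 \\ & & x_6 \\ z_1 & z_2 & z_3 \\ t_{M-1} & t_{M-2} & t_{M-3} \end{bmatrix} = \begin{bmatrix} \bar{r}_{11} & \bar{z}_1 & \bar{z}_2 \\ & x_1 & x_2 \\ & & x_4 \\ 0 & \bar{t}_{-2} & \bar{t}_{-3} \end{bmatrix}. \tag{34}$$

Here, $\mathbf{G}_1$ is absorbed into $\mathbf{Q}$. Then, we control $\mathbf{Q}$ to achieve

$$\mathbf{Q} \begin{bmatrix} x_1 & x_2 & x_3 \\ & x_4 & x_5 \\ & & x_6 \\ z_1 & z_2 & z_3 \\ t_{M-1} & t_{M-2} & t_{M-3} \end{bmatrix} = \mathbf{Q}_1 \begin{bmatrix} x_1 & x_2 & x_3 \\ & x_4 & x_5 \\ & & x_6 \\ \bar{z}_1 & \bar{z}_2 & \bar{z}_3 \\ 0 & \bar{t}_{M-2} & \bar{t}_{M-3} \end{bmatrix}. \tag{35}$$

Since $r_{11}$ and $\mathbf{z}^H$ are known, we can continue with a 2-by-2 Givens rotation matrix as

$$\begin{bmatrix} \cos(\theta) & \sin(\theta)\, e^{j\phi} \\ -\sin(\theta)\, e^{-j\phi} & \cos(\theta) \end{bmatrix} \begin{bmatrix} x_1 \\ \bar{z}_1 \end{bmatrix} = \begin{bmatrix} \bar{r}_{11} \\ 0 \end{bmatrix}, \tag{36}$$

from which $\theta$, $\phi$, and $x_1$ can be determined. After determining the Givens rotation matrix, we further have

$$\begin{bmatrix} \cos(\theta) & \sin(\theta)\, e^{j\phi} \\ -\sin(\theta)\, e^{-j\phi} & \cos(\theta) \end{bmatrix} \begin{bmatrix} x_2 \\ \bar{z}_2 \end{bmatrix} = \begin{bmatrix} \bar{z}_1 \\ \hat{z}_2 \end{bmatrix}, \tag{37}$$

from which we can compute $x_2$ and $\hat{z}_2$. Using the same routine, we can obtain $x_3$ and $\hat{z}_3$. Hence, we have

$$\mathbf{Q}_2 \begin{bmatrix} \times & \times & \times \\ & x_4 & x_5 \\ & & x_6 \\ 0 & \hat{z}_2 & \hat{z}_3 \\ 0 & \bar{t}_{M-2} & \bar{t}_{M-3} \end{bmatrix} = \begin{bmatrix} \times & \times & \times \\ & a_1 & a_2 \\ & & x_4 \\ 0 & \bar{t}_{-2} & \bar{t}_{-3} \end{bmatrix}, \tag{38}$$

where $\times$ denotes the elements we no longer care about. In the remaining steps, we can employ the same techniques to determine $x_4$, $x_5$, and $x_6$. The complex-valued scalar (CVS) Toeplitz QR decomposition for (22) is summarized in Algorithm 3.

APPENDIX B: GENERALIZED COMPLEX-VALUED BLOCK TOEPLITZ QR DECOMPOSITION

The above generalized CVS Toeplitz QR decomposition can be extended to the block scenario, as explained in the following.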
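Before turning to the block case, identity (30), on which the scalar recursion of Appendix A rests, can be checked numerically. The following is a minimal sketch, assuming NumPy's dense `np.linalg.qr` as a reference factorization (it is not the fast algorithm of this paper); the function name `check_identity_30` and the random test matrix are illustrative assumptions.

```python
import numpy as np

def check_identity_30(M, N, seed=0):
    """Return the largest entry-wise error of identity (30) on a random Toeplitz matrix."""
    rng = np.random.default_rng(seed)
    # Random coefficients t_k for k = -(N-1), ..., M-1 (illustrative data only).
    t = {k: rng.standard_normal() + 1j * rng.standard_normal()
         for k in range(-(N - 1), M)}
    # Dense M-by-N Toeplitz matrix T^H of (22): entry (i, j) is t_{i-j}.
    TH = np.array([[t[i - j] for j in range(N)] for i in range(M)])
    _, R = np.linalg.qr(TH)          # reference QR, R is N-by-N upper triangular

    # Partitions of (23) and (24).
    R1, R2 = R[1:, 1:], R[:-1, :-1]
    z = R[0, 1:].conj()              # z^H is the first row of R without r11
    u1 = TH[0, 1:].conj()            # u1^H is the first row of T^H without t0
    v2 = TH[-1, :-1].conj()          # v2^H is the last row of T^H without t_{M-N}

    lhs = R1.conj().T @ R1 + np.outer(z, z.conj()) + np.outer(v2, v2.conj())
    rhs = R2.conj().T @ R2 + np.outer(u1, u1.conj())
    return np.max(np.abs(lhs - rhs))
```

The check does not depend on the phase convention of the diagonal of $\mathbf{R}$, since every term in (30) is invariant to unit-modulus scaling of the rows of $\mathbf{R}$.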
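$$\begin{bmatrix} r_{11}^{2} & r_{11}^{*}\mathbf{z}^H \\ r_{11}\mathbf{z} & \mathbf{z}\mathbf{z}^H + \mathbf{R}_1^H \mathbf{R}_1 \end{bmatrix} = \begin{bmatrix} |t_0|^{2} + \mathbf{v}_1^H \mathbf{v}_1 & t_0^{*}\mathbf{u}_1^H + \mathbf{v}_1^H \tilde{\mathbf{T}} \\ t_0 \mathbf{u}_1 + \tilde{\mathbf{T}}^H \mathbf{v}_1 & \mathbf{u}_1 \mathbf{u}_1^H + \tilde{\mathbf{T}}^H \tilde{\mathbf{T}} \end{bmatrix}, \tag{25}$$

$$\begin{bmatrix} \mathbf{R}_2^H \mathbf{R}_2 & \mathbf{R}_2^H \tilde{\mathbf{z}} \\ \tilde{\mathbf{z}}^H \mathbf{R}_2 & \tilde{\mathbf{z}}^H \tilde{\mathbf{z}} + r_{nn}^{2} \end{bmatrix} = \begin{bmatrix} \tilde{\mathbf{T}}^H \tilde{\mathbf{T}} + \mathbf{v}_2 \mathbf{v}_2^H & \tilde{\mathbf{T}}^H \mathbf{u}_2 + \mathbf{v}_2 t_{M-N} \\ \mathbf{u}_2^H \tilde{\mathbf{T}} + t_{M-N}^{*}\mathbf{v}_2^H & \mathbf{u}_2^H \mathbf{u}_2 + |t_{M-N}|^{2} \end{bmatrix}. \tag{26}$$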
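The first column of (25) determines the first row of $\mathbf{R}$ from a single Toeplitz matrix-vector product, which can be evaluated with FFTs through a circulant embedding. The sketch below illustrates this under stated assumptions: the function names `toeplitz_matvec_fft` and `first_row_of_R` are hypothetical, the diagonal of $\mathbf{R}$ is taken to be real and positive, and the circulant embedding is the standard textbook construction rather than an implementation taken from this paper.

```python
import numpy as np

def toeplitz_matvec_fft(col, row, x):
    """Compute B @ x for the Toeplitz matrix B with first column `col` and first row `row`.

    B is embedded in a circulant matrix of size len(col) + len(row) - 1,
    whose action is a circular convolution and is therefore diagonalized by the FFT.
    """
    m, n = len(col), len(row)
    size = m + n - 1
    # First column of the circulant: col followed by row[n-1], ..., row[1].
    v = np.concatenate([col, row[:0:-1]])
    y = np.fft.ifft(np.fft.fft(v) * np.fft.fft(x, size))
    return y[:m]

def first_row_of_R(col, row):
    """First row [r11, z^H] of R for the M-by-N Toeplitz matrix T^H of (22).

    col = [t_0, ..., t_{M-1}] and row = [t_0, t_{-1}, ..., t_{-(N-1)}].
    """
    # T = (T^H)^H is N-by-M Toeplitz with first column conj(row) and first row conj(col).
    # w = T [t_0, ..., t_{M-1}]^T is the first column of T T^H.
    w = toeplitz_matvec_fft(np.conj(row), np.conj(col), col)
    r11 = np.sqrt(w[0].real)                 # r11^2 = |t_0|^2 + v1^H v1
    z_H = np.conj(w[1:]) / r11               # r11 * z = w[1:], hence z^H = conj(w[1:]) / r11
    return np.concatenate([[r11], z_H])
```

For an $M \times N$ matrix this costs a few FFTs of length $M + N - 1$ instead of a dense product, which is the kind of saving the Toeplitz structure makes available.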

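Algorithm 3 Generalized CVS Toeplitz QR Decomposition
1: Input: $r_{11}$, $\mathbf{z}^H$, $\mathbf{v}_2^H$, and $\mathbf{u}_1^H$;
2: $\mathbf{R}(1, 1:N) = \begin{bmatrix} r_{11} & \mathbf{z}^H \end{bmatrix}$;
3: for $k = 1 : N-1$ do
4:     Rotate and Update: $\mathbf{u}_1^H$, $\mathbf{R}_2(k, k:\mathrm{end})$, and $\begin{bmatrix} \mathbf{z} & \mathbf{v}_2 \end{bmatrix}^H$ by computing the Givens rotation matrix;
5:     Solve: $\mathbf{R}_1(k, k)$, $[\theta, \phi] = \mathrm{Givens}\left(\mathbf{R}_1(k, k), \mathbf{z}^H\right)$, and $\mathbf{R}_1(k, k+1:\mathrm{end})$;
6:     Update: $\mathbf{z}^H$;
7:     if $k < N-1$ then
8:         $\mathbf{R}_2(k+1, k+1:\mathrm{end}) = \mathbf{R}_1(k, k:\mathrm{end}-1)$;
9:     end if
10: end for
11: Output: $\mathbf{R}$.

We consider a block column full-rank Toeplitz matrix

$$\mathbf{T}^H = \begin{bmatrix} \mathbf{T}_0 & \mathbf{T}_{-1} & \cdots & \mathbf{T}_{-(N-1)} \\ \mathbf{T}_1 & \mathbf{T}_0 & \cdots & \mathbf{T}_{-(N-2)} \\ \mathbf{T}_2 & \mathbf{T}_1 & \cdots & \mathbf{T}_{-(N-3)} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{T}_{M-1} & \mathbf{T}_{M-2} & \cdots & \mathbf{T}_{M-N} \end{bmatrix} \in \mathbb{C}^{MQ \times NP}, \tag{39}$$

where each block is a complex-valued $Q \times P$ matrix and $P \le Q$. Let $\mathbf{T}^H = \mathbf{Q}\mathbf{R}$ be the QR decomposition. We can partition $\mathbf{T}^H$ and $\mathbf{R}$ as

$$\mathbf{T}^H = \begin{bmatrix} \mathbf{T}_0 & \mathbf{u}_1^H \\ \mathbf{v}_1 & \tilde{\mathbf{T}} \end{bmatrix} = \begin{bmatrix} \tilde{\mathbf{T}} & \mathbf{u}_2 \\ \mathbf{v}_2^H & \mathbf{T}_{M-N} \end{bmatrix}, \tag{40}$$

and

$$\mathbf{R} = \begin{bmatrix} \mathbf{R}_{11} & \mathbf{z}^H \\ \mathbf{0} & \mathbf{R}_1 \end{bmatrix} = \begin{bmatrix} \mathbf{R}_2 & \tilde{\mathbf{z}} \\ \mathbf{0} & \mathbf{R}_{nn} \end{bmatrix}, \tag{41}$$

where $\mathbf{R}_{11}$ and $\mathbf{z}^H$ can be calculated by

$$\begin{bmatrix} \mathbf{R}_{11}^H \\ \mathbf{z} \end{bmatrix} \mathbf{R}_{11} = \mathbf{T} \begin{bmatrix} \mathbf{T}_0 \\ \vdots \\ \mathbf{T}_{M-1} \end{bmatrix}. \tag{42}$$

Applying the same mathematical manipulation as in the scalar case (25) and (26), we have

$$\begin{bmatrix} \mathbf{R}_1^H & \mathbf{z} & \mathbf{v}_2 \end{bmatrix} \begin{bmatrix} \mathbf{R}_1 \\ \mathbf{z}^H \\ \mathbf{v}_2^H \end{bmatrix} = \begin{bmatrix} \mathbf{R}_2^H & \mathbf{u}_1 \end{bmatrix} \begin{bmatrix} \mathbf{R}_2 \\ \mathbf{u}_1^H \end{bmatrix}, \tag{43}$$

to which we can also introduce the unitary matrix $\mathbf{Q}$ as

$$\mathbf{Q} \begin{bmatrix} \mathbf{R}_1 \\ \mathbf{z}^H \\ \mathbf{v}_2^H \end{bmatrix} = \begin{bmatrix} \mathbf{R}_2 \\ \mathbf{u}_1^H \end{bmatrix}. \tag{44}$$

To this end, we demonstrate the generalized complex-valued block (CVB) Toeplitz QR decomposition with an example, where $N = 4$. In this case, (44) becomes

$$\mathbf{Q} \begin{bmatrix} \mathbf{X}_1 & \mathbf{X}_2 & \mathbf{X}_3 \\ & \mathbf{X}_4 & \mathbf{X}_5 \\ & & \mathbf{X}_6 \\ \mathbf{Z}_1 & \mathbf{Z}_2 & \mathbf{Z}_3 \\ \mathbf{T}_{M-1} & \mathbf{T}_{M-2} & \mathbf{T}_{M-3} \end{bmatrix} = \begin{bmatrix} \mathbf{R}_{11} & \mathbf{Z}_1 & \mathbf{Z}_2 \\ & \mathbf{X}_1 & \mathbf{X}_2 \\ & & \mathbf{X}_4 \\ \mathbf{T}_{-1} & \mathbf{T}_{-2} & \mathbf{T}_{-3} \end{bmatrix}, \tag{45}$$

where $\mathbf{z}^H = \begin{bmatrix} \mathbf{Z}_1 & \mathbf{Z}_2 & \mathbf{Z}_3 \end{bmatrix}$. In the first step, we apply a Householder matrix $\mathbf{J}_1$ to both sides of (45), where $\mathbf{J}_1$ is determined by the first column of $\begin{bmatrix} \mathbf{R}_{11}^H & \mathbf{T}_{-1}^H \end{bmatrix}^H$ and can also be absorbed into $\mathbf{Q}$. (45) is then transformed into

$$\mathbf{Q} \begin{bmatrix} \mathbf{X}_1 & \mathbf{X}_2 & \mathbf{X}_3 \\ & \mathbf{X}_4 & \mathbf{X}_5 \\ & & \mathbf{X}_6 \\ \mathbf{Z}_1 & \mathbf{Z}_2 & \mathbf{Z}_3 \\ \mathbf{T}_{M-1} & \mathbf{T}_{M-2} & \mathbf{T}_{M-3} \end{bmatrix} = \begin{bmatrix} \bar{\mathbf{R}}_{11} & \bar{\mathbf{Z}}_1 & \bar{\mathbf{Z}}_2 \\ & \mathbf{X}_1 & \mathbf{X}_2 \\ & & \mathbf{X}_4 \\ \bar{\mathbf{T}}_{-1} & \bar{\mathbf{T}}_{-2} & \bar{\mathbf{T}}_{-3} \end{bmatrix}. \tag{46}$$

Next, we can generate another Householder matrix $\mathbf{J}_2$ to rotate the first column of $\begin{bmatrix} \mathbf{Z}_1^H & \mathbf{T}_{M-1}^H \end{bmatrix}^H$, which can be expressed as

$$\mathbf{Q}_1 \begin{bmatrix} \mathbf{X}_1 & \mathbf{X}_2 & \mathbf{X}_3 \\ & \mathbf{X}_4 & \mathbf{X}_5 \\ & & \mathbf{X}_6 \\ \bar{\mathbf{Z}}_1 & \bar{\mathbf{Z}}_2 & \bar{\mathbf{Z}}_3 \\ \bar{\mathbf{T}}_{M-1} & \bar{\mathbf{T}}_{M-2} & \bar{\mathbf{T}}_{M-3} \end{bmatrix} = \begin{bmatrix} \bar{\mathbf{R}}_{11} & \bar{\mathbf{Z}}_1 & \bar{\mathbf{Z}}_2 \\ & \mathbf{X}_1 & \mathbf{X}_2 \\ & & \mathbf{X}_4 \\ \bar{\mathbf{T}}_{-1} & \bar{\mathbf{T}}_{-2} & \bar{\mathbf{T}}_{-3} \end{bmatrix}. \tag{47}$$

In the second step, the first row of $\begin{bmatrix} \mathbf{X}_1 & \mathbf{X}_2 & \mathbf{X}_3 \end{bmatrix}$ needs to be determined. Since the first column of $\begin{bmatrix} \mathbf{Z}_1^H & \mathbf{T}_{M-1}^H \end{bmatrix}^H$ is rotated by $\mathbf{J}_2$, we can resolve the first element of the first row of $\mathbf{X}_1$ by using a 2-by-2 Givens rotation matrix. Similar to (37), the remaining elements in the first row of $\begin{bmatrix} \mathbf{X}_1 & \mathbf{X}_2 & \mathbf{X}_3 \end{bmatrix}$ are obtained once the coefficients of the Givens rotation matrix are known, and (47) can be updated as

$$\mathbf{Q}_2 \begin{bmatrix} \bar{\mathbf{X}}_1 & \bar{\mathbf{X}}_2 & \bar{\mathbf{X}}_3 \\ & \mathbf{X}_4 & \mathbf{X}_5 \\ & & \mathbf{X}_6 \\ \hat{\mathbf{Z}}_1 & \hat{\mathbf{Z}}_2 & \hat{\mathbf{Z}}_3 \\ \bar{\mathbf{T}}_{M-1} & \bar{\mathbf{T}}_{M-2} & \bar{\mathbf{T}}_{M-3} \end{bmatrix} = \begin{bmatrix} \bar{\mathbf{R}}_{11} & \bar{\mathbf{Z}}_1 & \bar{\mathbf{Z}}_2 \\ & \mathbf{X}_1 & \mathbf{X}_2 \\ & & \mathbf{X}_4 \\ \bar{\mathbf{T}}_{-1} & \bar{\mathbf{T}}_{-2} & \bar{\mathbf{T}}_{-3} \end{bmatrix}. \tag{48}$$

Therefore, we can follow the same routine to determine the remaining unsolved elements in $\begin{bmatrix} \bar{\mathbf{X}}_1 & \bar{\mathbf{X}}_2 & \bar{\mathbf{X}}_3 \end{bmatrix}$. As a result, $\mathbf{X}_4$, $\mathbf{X}_5$, and $\mathbf{X}_6$ are calculated successively. The generalized complex-valued block Toeplitz QR decomposition for (39) is summarized in Algorithm 4.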
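The two elementary unitary transforms used in Appendices A and B can be sketched as follows. This is a minimal illustration, assuming a standard complex Householder reflector (the paper does not spell out how $\mathbf{J}_1$ and $\mathbf{J}_2$ are constructed) and the Givens form of (36); the function names `householder`, `givens`, and `solve_givens` are hypothetical.

```python
import cmath
import math
import numpy as np

def householder(v):
    """Unitary, Hermitian J with J @ v = alpha * e_1, where |alpha| = ||v||."""
    v = np.asarray(v, dtype=complex)
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.eye(len(v), dtype=complex)
    phase = v[0] / abs(v[0]) if v[0] != 0 else 1.0
    u = v.copy()
    u[0] += phase * norm                      # u = v + e^{j arg(v_0)} ||v|| e_1
    return np.eye(len(v), dtype=complex) - 2.0 * np.outer(u, u.conj()) / np.vdot(u, u)

def givens(theta, phi):
    """2-by-2 complex Givens rotation in the form used in (36) and (37)."""
    c, s = math.cos(theta), math.sin(theta)
    return np.array([[c, s * cmath.exp(1j * phi)],
                     [-s * cmath.exp(-1j * phi), c]])

def solve_givens(r_bar, z_bar):
    """Solve (36): given r_bar and z_bar, return theta, phi, x with G @ [x, z_bar] = [r_bar, 0].

    Because G is unitary, [x, z_bar] = G^H @ [r_bar, 0], which gives
    z_bar = sin(theta) * exp(-j*phi) * r_bar and x = cos(theta) * r_bar.
    """
    ratio = z_bar / r_bar
    theta = math.asin(min(abs(ratio), 1.0))   # clip guards against round-off above 1
    phi = -cmath.phase(ratio)
    return theta, phi, math.cos(theta) * r_bar
```

In the block recursion, `householder` would be applied to the first column of a stacked block such as $\begin{bmatrix} \mathbf{R}_{11}^H & \mathbf{T}_{-1}^H \end{bmatrix}^H$, and `solve_givens` then recovers one unknown diagonal entry at a time, as in the scalar case.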
Algorithm 4 Generalized CVB Toeplitz QR Decomposition
1: Input: $\mathbf{R}_{11}$, $\mathbf{z}^H$, $\mathbf{v}_2^H$, and $\mathbf{u}_1^H$;
2: $\mathbf{R}(1:P, 1:PN) = \begin{bmatrix} \mathbf{R}_{11} & \mathbf{z}^H \end{bmatrix}$;
3: for $n = 1 : N-1$ do
4:     for $p = 1 : P$ do
5:         $k = (n-1)P + p$;
6:         Rotate and Update: $\mathbf{u}_1^H$, $\mathbf{R}_2(k, k:\mathrm{end})$, and $\begin{bmatrix} \mathbf{z} & \mathbf{v}_2 \end{bmatrix}^H$ by computing the Householder matrix;
7:         Solve: $\mathbf{R}_1(n, n)$, $[\theta, \phi] = \mathrm{Givens}\left(\mathbf{R}_1(n, n), \mathbf{z}^H\right)$, and $\mathbf{R}_1(n, n:\mathrm{end})$;
8:         Update: $\mathbf{z}^H$;
9:         if $k < N-1$ then
10:            $\mathbf{R}_2(n+P, n+P:\mathrm{end}) = \mathbf{R}_1(n, n:\mathrm{end}-P)$;
11:        end if
12:    end for
13: end for
14: Output: $\mathbf{R}$.

REFERENCES

[1] N. Rajatheva, I. Atzeni, E. Bjornson, A. Bourdoux, S. Buzzi, J.-B. Dore, S. Erkucuk, M. Fuentes, K. Guan, Y. Hu et al., "White paper on broadband connectivity in 6G," arXiv preprint arXiv:2004.14247, 2020.
[2] Z. Zhang, Y. Xiao, Z. Ma, M. Xiao, Z. Ding, X. Lei, G. K. Karagiannidis, and P. Fan, "6G wireless networks: Vision, requirements, architecture, and key technologies," IEEE Vehicular Technology Magazine, vol. 14, no. 3, pp. 28–41, 2019.
[3] Z. Wang, J. Zhang, H. Du, E. Wei, B. Ai, D. Niyato, and M. Debbah, "Extremely large-scale MIMO: Fundamentals, challenges, solutions, and future directions," IEEE Wireless Communications, 2023.
[4] J. Liu, W. Zhang, and Y. Jiang, "Fast Computation of Zero-Forcing Precoding for Massive MIMO-OFDM Systems," IEEE Transactions on Signal Processing, 2024.
[5] Y.-W. Liang, R. Schober, and W. Gerstacker, "Time-domain transmit beamforming for MIMO-OFDM systems with finite rate feedback," IEEE Transactions on Communications, vol. 57, no. 9, pp. 2828–2838, 2009.
[6] D. Cescato and H. Bölcskei, "Algorithms for interpolation-based QR decomposition in MIMO-OFDM systems," IEEE Transactions on Signal Processing, vol. 59, no. 4, pp. 1719–1733, 2011.
[7] Y. Liu, G. Y. Li, W. Han, and Z. Zhong, "Low-complexity recursive convolutional precoding for OFDM-based large-scale antenna systems," IEEE Transactions on Wireless Communications, vol. 15, no. 7, pp. 4902–4913, 2016.
[8] S. Kashyap, C. Mollén, E. Björnson, and E. G. Larsson, "Frequency-domain interpolation of the zero-forcing matrix in massive MIMO-OFDM," in 2016 IEEE 17th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC). IEEE, 2016, pp. 1–5.
[9] C. Jeon, Z. Li, and C. Studer, "Approximate Gram-matrix interpolation for wideband massive MU-MIMO systems," IEEE Transactions on Vehicular Technology, vol. 69, no. 5, pp. 4677–4688, 2020.
[10] W. Hu, F. Li, and Y. Jiang, "Phase rotations of SVD-based precoders in MIMO-OFDM for improved channel estimation," IEEE Wireless Communications Letters, vol. 10, no. 8, pp. 1805–1809, 2021.
[11] 3GPP, "TS 38.211 V16.10.0: NR; Physical channels and modulation," www.3GPP.org, 2022.
[12] 3GPP, "TS 38.214 V16.10.0: NR; Physical layer procedures for data," www.3GPP.org, 2022.
[13] C. Shen and M. P. Fitz, "MIMO-OFDM beamforming for improved channel estimation," IEEE Journal on Selected Areas in Communications, vol. 26, no. 6, pp. 948–959, 2008.
[14] F. Jiang, Q. Li, and X. Chen, "Channel smoothing for 802.11ax beamformed MIMO-OFDM," IEEE Communications Letters, vol. 25, no. 10, pp. 3413–3417, 2021.
[15] E. Jeon, M. Ahn, S. Kim, W. B. Lee, and J. Kim, "Joint beamformer and beamformee design for channel smoothing in WLAN systems," in 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall). IEEE, 2020, pp. 1–6.
[16] F. Rusek and D. Fertonani, "Bounds on the information rate of intersymbol interference channels based on mismatched receivers," IEEE Transactions on Information Theory, vol. 58, no. 3, pp. 1470–1482, 2012.
[17] F. Rusek and A. Prlja, "Optimal channel shortening for MIMO and ISI channels," IEEE Transactions on Wireless Communications, vol. 11, no. 2, pp. 810–818, 2011.
[18] A. Modenini, F. Rusek, and G. Colavolpe, "Optimal transmit filters for ISI channels under channel shortening detection," IEEE Transactions on Communications, vol. 61, no. 12, pp. 4997–5005, 2013.
[19] S. Hu, X. Gao, and F. Rusek, "Linear precoder design for MIMO-ISI broadcasting channels under channel shortening detection," IEEE Signal Processing Letters, vol. 23, no. 9, pp. 1207–1211, 2016.
[20] R.-A. Pitaval, "Channel Shortening by Large Multiantenna Precoding in OFDM," IEEE Transactions on Communications, vol. 69, no. 5, pp. 2878–2893, 2021.
[21] J. Wang, Y. Jiang, and G. E. Sobelman, "Iterative computation of FIR MIMO MMSE-DFE with flexible complexity-performance tradeoff," IEEE Transactions on Signal Processing, vol. 61, no. 9, pp. 2394–2404, 2013.
[22] R. M. Gray et al., "Toeplitz and circulant matrices: A review," Foundations and Trends® in Communications and Information Theory, vol. 2, no. 3, pp. 155–239, 2006.
[23] D. Sweet, "Fast Toeplitz orthogonalization," Numerische Mathematik, vol. 43, no. 1, pp. 1–21, 1984.
[24] J. Chun, T. Kailath, and H. Lev-Ari, "Fast parallel algorithms for QR and triangular factorization," SIAM Journal on Scientific and Statistical Computing, vol. 8, no. 6, pp. 899–913, 1987.
[25] A. Bojanczyk, R. Brent, and F. De Hoog, "QR factorization of Toeplitz matrices," Numerische Mathematik, vol. 49, no. 1, pp. 81–94, 1986.
[26] Q. H. Spencer, A. L. Swindlehurst, and M. Haardt, "Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO channels," IEEE Transactions on Signal Processing, vol. 52, no. 2, pp. 461–471, 2004.
[27] H. Sung, S.-R. Lee, and I. Lee, "Generalized channel inversion methods for multiuser MIMO systems," IEEE Transactions on Communications, vol. 57, no. 11, pp. 3489–3499, 2009.
[28] L.-N. Tran, M. Juntti, and E.-K. Hong, "On the precoder design for block diagonalized MIMO broadcast channels," IEEE Communications Letters, vol. 16, no. 8, pp. 1165–1168, 2012.
[29] W. Li and M. Latva-aho, "An efficient channel block diagonalization method for generalized zero forcing assisted MIMO broadcasting systems," IEEE Transactions on Wireless Communications, vol. 10, no. 3, pp. 739–744, 2010.
[30] N. Boumal, An introduction to optimization on smooth manifolds. Cambridge University Press, 2023.
[31] L. Sun and M. R. McKay, "Eigen-based transceivers for the MIMO broadcast channel with semi-orthogonal user selection," IEEE Transactions on Signal Processing, vol. 58, no. 10, pp. 5246–5261, 2010.
[32] 3GPP, "TS 38.901 V16.1.0: Study on channel model for frequencies from 0.5 to 100 GHz," www.3GPP.org, 2020.
[33] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization algorithms on matrix manifolds. Princeton University Press, 2008.
