Robust subgroup-classifier learning and testing in change-plane regressions

Xu Liu1, Jian Huang2, Yong Zhou3 and Xiao Zhang4,   
1Shanghai University of Finance and Economics, Shanghai, China
2The Hong Kong Polytechnic University, Hong Kong SAR, China
3East China Normal University, Shanghai, China
4The Chinese University of Hong Kong-Shenzhen, Shenzhen, China
CONTACT [email protected]; The Chinese University of Hong Kong-Shenzhen, China
Abstract

Considered here are robust subgroup-classifier learning and testing in change-plane regressions with heavy-tailed errors, which can identify subgroups as a basis for making optimal recommendations for individualized treatment. A new subgroup classifier is proposed by smoothing the indicator function, which is learned by minimizing the smoothed Huber loss. Nonasymptotic properties and the Bahadur representation of estimators are established, in which the proposed estimators of the grouping difference parameter and baseline parameter achieve sub-Gaussian tails. The hypothesis test considered here belongs to the class of test problems for which some parameters are not identifiable under the null hypothesis. The classic supremum of the squared score test statistic may lose power in practice when the dimension of the grouping parameter is large, so to overcome this drawback and make full use of the data’s heavy-tailed error distribution, a robust weighted average of the squared score test statistic is proposed, which achieves a closed form when an appropriate weight is chosen. Asymptotic distributions of the proposed robust test statistic are derived under the null and alternative hypotheses. The proposed robust subgroup classifier and test statistic perform well on finite samples, and their performances are shown further by applying them to a medical dataset. The proposed procedure leads to the immediate application of recommending optimal individualized treatments.


Keywords: Gaussian-type deviation; Heavy-tailed data; Nonstandard test; Robust classifier; Subgroup classifier; Subgroup detection.

1 Introduction

When studying the risk of a disease outcome, there could be heterogeneity across subgroups characterized by covariates, meaning that the same treatment in different subpopulations may cause different treatment effects of predictors. In the presence of population heterogeneity in classical models, learning the subgroup classifier and testing the existence of subgroups associated with the risk-model heterogeneity are important for understanding better the different effects of predictors and modeling better the association of diseases with predictors. In precision medicine, this plays a core role in guiding personalized treatment to individuals in a population by identifying subgroups with different treatment effects on disease. There has been much previous research on learning the subgroup classifiers of individuals based on various models Foster et al., (2011); Wei and Kosorok, (2018); Huang et al., (2020); Li et al., (2021); Zhang et al., (2022).

Before learning the subgroup classifier, it is necessary to test for the existence of subgroups of individuals to address the potential risk of finding false-positive subgroups. This necessity not only arises from the data themselves but is also intrinsic to statistics, because the nonexistence of subgroups causes the identifiability problem when learning the subgroup classifier. However, this test problem belongs to the class of nonstandard tests with loss of identifiability under the null hypothesis; see Wald, (1943); Andrews and Ploberger, (1994, 1995); Davies, (1977); Song et al., (2009); Liu et al., (2024); Kang et al., (2024), among others. Therefore, the focus herein is on learning the subgroup classifier and testing for the existence of subgroups simultaneously, which offers more information and the potential for making the best recommendations for optimal individualized treatments and guiding future treatment modification and development.

Let {𝑽i=(yi,𝑿i,𝒁i,𝑼i),i=1,,n}formulae-sequencesubscript𝑽𝑖subscript𝑦𝑖subscript𝑿𝑖subscript𝒁𝑖subscript𝑼𝑖𝑖1𝑛\{{\boldsymbol{V}}_{i}=(y_{i},{\boldsymbol{X}}_{i},{\boldsymbol{Z}}_{i},{% \boldsymbol{U}}_{i}),i=1,\cdots,n\}{ bold_italic_V start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = ( italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_U start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) , italic_i = 1 , ⋯ , italic_n } be the observed data, which are n𝑛nitalic_n independent and identically distributed copies of 𝑽=(y,𝑿,𝒁,𝑼)𝑽𝑦𝑿𝒁𝑼{\boldsymbol{V}}=(y,{\boldsymbol{X}},{\boldsymbol{Z}},{\boldsymbol{U}})bold_italic_V = ( italic_y , bold_italic_X , bold_italic_Z , bold_italic_U ). Consider the regression model with change plane Lee et al., (2011); Zhang et al., (2022); Mukherjee et al., (2022); Liu et al., (2024)

yi=𝑿iT𝜶+𝒁iT𝜷𝟏(𝑼iT𝜸0)+ϵi,subscript𝑦𝑖superscriptsubscript𝑿𝑖T𝜶superscriptsubscript𝒁𝑖T𝜷1superscriptsubscript𝑼𝑖T𝜸0subscriptitalic-ϵ𝑖\displaystyle y_{i}={\boldsymbol{X}}_{i}^{\rm T}\boldsymbol{\alpha}+{% \boldsymbol{Z}}_{i}^{\rm T}\boldsymbol{\beta}{\boldsymbol{1}}({\boldsymbol{U}}% _{i}^{\rm T}\boldsymbol{\gamma}\geq 0)+\epsilon_{i},italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = bold_italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_α + bold_italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_β bold_1 ( bold_italic_U start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_γ ≥ 0 ) + italic_ϵ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , (1)

where 𝜶=(α1,,αp)TΘαp𝜶superscriptsubscript𝛼1subscript𝛼𝑝TsubscriptΘ𝛼superscript𝑝\boldsymbol{\alpha}=(\alpha_{1},\cdots,\alpha_{p})^{\rm T}\in\Theta_{\alpha}% \subseteq\mathbb{R}^{p}bold_italic_α = ( italic_α start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , ⋯ , italic_α start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ∈ roman_Θ start_POSTSUBSCRIPT italic_α end_POSTSUBSCRIPT ⊆ blackboard_R start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT, 𝜷=(β1,,βq)TΘβq𝜷superscriptsubscript𝛽1subscript𝛽𝑞TsubscriptΘ𝛽superscript𝑞\boldsymbol{\beta}=(\beta_{1},\cdots,\beta_{q})^{\rm T}\in\Theta_{\beta}% \subseteq\mathbb{R}^{q}bold_italic_β = ( italic_β start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , ⋯ , italic_β start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ∈ roman_Θ start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT ⊆ blackboard_R start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT and 𝜸=(γ1,,γr)TΘγr𝜸superscriptsubscript𝛾1subscript𝛾𝑟TsubscriptΘ𝛾superscript𝑟\boldsymbol{\gamma}=(\gamma_{1},\cdots,\gamma_{r})^{\rm T}\in\Theta_{\gamma}% \subseteq\mathbb{R}^{r}bold_italic_γ = ( italic_γ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , ⋯ , italic_γ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ∈ roman_Θ start_POSTSUBSCRIPT italic_γ end_POSTSUBSCRIPT ⊆ blackboard_R start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT are unknown parameters, and E(ϵi|𝑿i,𝒁i,𝑼i)=0Econditionalsubscriptitalic-ϵ𝑖subscript𝑿𝑖subscript𝒁𝑖subscript𝑼𝑖0{\mathrm{E}}(\epsilon_{i}|{\boldsymbol{X}}_{i},{\boldsymbol{Z}}_{i},{% \boldsymbol{U}}_{i})=0roman_E ( italic_ϵ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | bold_italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_U start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) = 0 and E(|ϵi|2+δ|𝑿i,𝒁i,𝑼i)=Mδ<Econditionalsuperscriptsubscriptitalic-ϵ𝑖2𝛿subscript𝑿𝑖subscript𝒁𝑖subscript𝑼𝑖subscript𝑀𝛿{\mathrm{E}}(|\epsilon_{i}|^{2+\delta}|{\boldsymbol{X}}_{i},{\boldsymbol{Z}}_{% i},{\boldsymbol{U}}_{i})=M_{\delta}<\inftyroman_E ( | italic_ϵ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT 2 + italic_δ end_POSTSUPERSCRIPT | bold_italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_U start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) = italic_M start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT < ∞ for some δ0𝛿0\delta\geq 0italic_δ ≥ 0. When δ=0𝛿0\delta=0italic_δ = 0, the error has a finite second moment, denoted by M0=E(ϵi2|𝑿i,𝒁i,𝑼i)subscript𝑀0Econditionalsuperscriptsubscriptitalic-ϵ𝑖2subscript𝑿𝑖subscript𝒁𝑖subscript𝑼𝑖M_{0}={\mathrm{E}}(\epsilon_{i}^{2}|{\boldsymbol{X}}_{i},{\boldsymbol{Z}}_{i},% {\boldsymbol{U}}_{i})italic_M start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = roman_E ( italic_ϵ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT | bold_italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_U start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ). For easy expression, let 𝜽=(𝜶T,𝜷T)T𝜽superscriptsuperscript𝜶Tsuperscript𝜷TT\boldsymbol{\theta}=(\boldsymbol{\alpha}^{\rm T},\boldsymbol{\beta}^{\rm T})^{% \rm T}bold_italic_θ = ( bold_italic_α start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , bold_italic_β start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT. Following the expressions in Liu et al., (2024), 𝑼𝑼{\boldsymbol{U}}bold_italic_U is called the grouping variable, 𝜸𝜸\boldsymbol{\gamma}bold_italic_γ is called the grouping parameter, 𝒁𝒁{\boldsymbol{Z}}bold_italic_Z is called the grouping difference variable, 𝜷𝜷\boldsymbol{\beta}bold_italic_β is called the grouping difference parameter, 𝑿𝑿{\boldsymbol{X}}bold_italic_X is called the baseline variable, and 𝜶𝜶\boldsymbol{\alpha}bold_italic_α is called the baseline parameter. Herein, the indicator function 𝟏(𝑼T𝜸0)1superscript𝑼T𝜸0{\boldsymbol{1}}({\boldsymbol{U}}^{\rm T}\boldsymbol{\gamma}\geq 0)bold_1 ( bold_italic_U start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_γ ≥ 0 ) is called the subgroup classifier.

The technology for collecting and processing data sets has improved considerably in recent years, and one is now more likely to encounter heavy-tailed or low-quality data, thereby causing the typical assumption of a Gaussian or sub-Gaussian distribution to fail. Therefore, new challenges arise compared with the classic methodology for modeling non-Gaussian or heavy-tailed data. Even for linear regression models with heavy-tailed errors, the ordinary least squares (OLS) estimators are suboptimal both theoretically and empirically. Instead, proposed herein is a robust estimator of subgroup classification by considering the change-plane model (1) with heavy-tailed errors. This paper addresses two important problems for model (1) with heavy-tailed errors, i.e., subgroup-classifier learning (Section 2) and subgroup testing for whether subgroups exist (Section 3).

1.1 Robust subgroup-classifier learning

Li et al., (2021) considered the change-plane model (1) with Gaussian errors. Also, Zhang et al., (2022) investigated a quantile regression with a change plane and derived the asymptotic normalities for the grouping difference parameter and the grouping parameter. However, although quantile or median regression models require no Gaussian or sub-Gaussian assumption, they essentially estimate the conditional quantile or median regression instead of the conditional mean regression. If the mean regression is of interest in practice, then these procedures are not feasible unless the error distribution is symmetric around zero, which may be too strong to cause the misspecification problem. See Fan et al., 2017b for some examples that demonstrate the distinction between conditional mean regression and conditional quantile or median regression.

Linear regression models with heavy-tailed errors are prevalent in the literature. Fan et al., 2017b proposed a robust estimator of high-dimensional mean regression in the absence of asymmetry and with light tail assumptions. Zhou et al., (2018) provided a robust M-estimation procedure with applications to dependence-adjusted multiple testing. Sun et al., (2020) and Wang et al., (2021) studied adaptive Huber regression for linear regression models with heavy-tailed errors. Chen and Zhou, (2020) investigated robust inference via multiplier bootstrap in multiple response regression models, constructing robust bootstrap confidence sets and addressing large-scale simultaneous hypothesis testing problems.

Refer to caption
Refer to caption
Figure 1: Estimation errors of parameter 𝛃𝛃\boldsymbol{\beta}bold_italic_β with L2subscript𝐿2L_{2}italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT norm (left) and accuracies of estimated subgroup classifier (right) for heavy-tailed errors generated from the Pareto distribution.

Mukherjee et al., (2022) studied the change-plane problem under heavy-tailed errors when 𝜶=0𝜶0\boldsymbol{\alpha}=0bold_italic_α = 0 and 𝑿=1𝑿1{\boldsymbol{X}}=1bold_italic_X = 1, which is a special case of the change-plane model (1), and they left the general change-plane model with heavy-tailed errors for future work. Herein, a new robust procedure is introduced to estimate parameters and consequently to learn the subgroup classifier. Figure 1 shows boxplots of the estimation errors of the parameter 𝜽=(𝜶T,𝜷T)T𝜽superscriptsuperscript𝜶Tsuperscript𝜷TT\boldsymbol{\theta}=(\boldsymbol{\alpha}^{\rm T},\boldsymbol{\beta}^{\rm T})^{% \rm T}bold_italic_θ = ( bold_italic_α start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , bold_italic_β start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT with L2subscript𝐿2L_{2}italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT norm and the accuracies of the estimated subgroup classifier, where the L2subscript𝐿2L_{2}italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT norm of the estimation errors is defined as 𝜽^τ,h𝜽normsubscript^𝜽𝜏superscript𝜽\|\hat{\boldsymbol{\theta}}_{\tau,h}-\boldsymbol{\theta}^{*}\|∥ over^ start_ARG bold_italic_θ end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT - bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ with the robust estimator 𝜽^τ,hsubscript^𝜽𝜏\hat{\boldsymbol{\theta}}_{\tau,h}over^ start_ARG bold_italic_θ end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT of the true grouping parameter 𝜽superscript𝜽\boldsymbol{\theta}^{*}bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, and the accuracy is defined as ACC=1n1i=1n|𝟏(𝑼iT𝜸^τ,h0)𝟏(𝑼iT𝜸0)|ACC1superscript𝑛1superscriptsubscript𝑖1𝑛1superscriptsubscript𝑼𝑖Tsubscript^𝜸𝜏01superscriptsubscript𝑼𝑖Tsuperscript𝜸0\mbox{ACC}=1-n^{-1}\sum_{i=1}^{n}\left|{\boldsymbol{1}}({\boldsymbol{U}}_{i}^{% \rm T}\hat{\boldsymbol{\gamma}}_{\tau,h}\geq 0)-{\boldsymbol{1}}({\boldsymbol{% U}}_{i}^{\rm T}\boldsymbol{\gamma}^{*}\geq 0)\right|ACC = 1 - italic_n start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT | bold_1 ( bold_italic_U start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT over^ start_ARG bold_italic_γ end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT ≥ 0 ) - bold_1 ( bold_italic_U start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_γ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ≥ 0 ) | with the robust estimator 𝜸^τ,hsubscript^𝜸𝜏\hat{\boldsymbol{\gamma}}_{\tau,h}over^ start_ARG bold_italic_γ end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT of the true parameter 𝜸superscript𝜸\boldsymbol{\gamma}^{*}bold_italic_γ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. Here, the settings are (p,q,r)=(3,3,3)𝑝𝑞𝑟333(p,q,r)=(3,3,3)( italic_p , italic_q , italic_r ) = ( 3 , 3 , 3 ) and n=(200,400,600)𝑛200400600n=(200,400,600)italic_n = ( 200 , 400 , 600 ) with 1000 repetitions, the heavy-tailed errors are generated from the Pareto distribution Par(2,1)𝑃𝑎𝑟21Par(2,1)italic_P italic_a italic_r ( 2 , 1 ) with shape parameter 2 and scale parameter 1, and X1=Z1=1subscript𝑋1subscript𝑍11X_{1}=Z_{1}=1italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = italic_Z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = 1 and (X2,,Xp)T=(Z2,,Zp)Tsuperscriptsubscript𝑋2subscript𝑋𝑝Tsuperscriptsubscript𝑍2subscript𝑍𝑝T(X_{2},\cdots,X_{p})^{\rm T}=(Z_{2},\cdots,Z_{p})^{\rm T}( italic_X start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , ⋯ , italic_X start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT = ( italic_Z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , ⋯ , italic_Z start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT and (U2,,Ur)Tsuperscriptsubscript𝑈2subscript𝑈𝑟T(U_{2},\cdots,U_{r})^{\rm T}( italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , ⋯ , italic_U start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT are generated independently from multivariate normal distributions N(𝟎p1,3𝑰p1)𝑁subscript0𝑝13subscript𝑰𝑝1N({\boldsymbol{0}}_{p-1},\sqrt{3}{\boldsymbol{I}}_{p-1})italic_N ( bold_0 start_POSTSUBSCRIPT italic_p - 1 end_POSTSUBSCRIPT , square-root start_ARG 3 end_ARG bold_italic_I start_POSTSUBSCRIPT italic_p - 1 end_POSTSUBSCRIPT ) and N(𝟎r1,3𝑰r1)𝑁subscript0𝑟13subscript𝑰𝑟1N({\boldsymbol{0}}_{r-1},\sqrt{3}{\boldsymbol{I}}_{r-1})italic_N ( bold_0 start_POSTSUBSCRIPT italic_r - 1 end_POSTSUBSCRIPT , square-root start_ARG 3 end_ARG bold_italic_I start_POSTSUBSCRIPT italic_r - 1 end_POSTSUBSCRIPT ), respectively; see Section 4 for details. As used by Zhang et al., (2022), the smooth function K(u)={1+exp(u)}1𝐾𝑢superscript1𝑢1K(u)=\{1+\exp(-u)\}^{-1}italic_K ( italic_u ) = { 1 + roman_exp ( - italic_u ) } start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT with smoothness parameter h=log(n)/n𝑛𝑛h=\sqrt{\log(n)/n}italic_h = square-root start_ARG roman_log ( italic_n ) / italic_n end_ARG is chosen, and the proposed robust estimation procedure (AHu) is compared with the method based on OLS (Li et al., 2021). Figure 1 sends the important message that in the presence of heavy tails, compared with the existing method (Li et al., 2021), the proposed robust estimators not only reduce the estimation error dramatically but also improve significantly the accuracy of the subgroup classifier.

1.2 Robust subgroup testing

Another goal of this paper is to test for the existence of subgroups, i.e.,

H0:𝜷=𝟎versusH1:𝜷𝟎.:subscript𝐻0𝜷0𝑣𝑒𝑟𝑠𝑢𝑠subscript𝐻1:𝜷0\displaystyle H_{0}:\boldsymbol{\beta}={\boldsymbol{0}}\quad versus\quad H_{1}% :\boldsymbol{\beta}\neq{\boldsymbol{0}}.italic_H start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT : bold_italic_β = bold_0 italic_v italic_e italic_r italic_s italic_u italic_s italic_H start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT : bold_italic_β ≠ bold_0 . (2)

Note that the grouping parameter 𝜸𝜸\boldsymbol{\gamma}bold_italic_γ is not identifiable under the null hypothesis.

The classic Wald-type test or score-based test is powerful in standard test problems when there is no identifiability problem in both the null and alternative hypotheses, but these common procedures are not feasible when nuisance parameters are present. Andrews and Ploberger, (1994) and Andrews and Ploberger, (1995) studied the weighted average exponential form, which was originally introduced by Wald, (1943). Davies, (1977) investigated well the supremum of the squared score test (SST) statistic for mixture models, which was applied by Song et al., (2009) and Kang et al., (2017) to semiparametric models in censoring data.

All the aforementioned testing methods are optimal tests based on the weighted average power criterion. However, because these optimal tests take the weighted exponential average of the classical tests over the grouping parametric space ΘγsubscriptΘ𝛾\Theta_{\gamma}roman_Θ start_POSTSUBSCRIPT italic_γ end_POSTSUBSCRIPT, they may not only not perform well in practice when the dimension of ΘγsubscriptΘ𝛾\Theta_{\gamma}roman_Θ start_POSTSUBSCRIPT italic_γ end_POSTSUBSCRIPT is large but also give rise to a heavy-burden calculation of the p-value or the critical value. Instead, Liu et al., (2024) introduced a new test statistic by taking the weighted average of the SST (WAST) over ΘγsubscriptΘ𝛾\Theta_{\gamma}roman_Θ start_POSTSUBSCRIPT italic_γ end_POSTSUBSCRIPT and removing both the inverse of the covariance and the cross-interaction terms to overcome the drawbacks of SST. Thanks to its closed form, WAST achieves more-accurate type-I errors and significantly improved power and hence dramatically reduced computational time as a byproduct.

Refer to caption
Refer to caption
Refer to caption
Figure 2: Powers of test statistics by proposed RWAST (rwast, red solid line), WAST (olsw, green dashed line), and SST (olss, blue dotted line) for (p,q,r)=(3,3,3)𝑝𝑞𝑟333(p,q,r)=(3,3,3)( italic_p , italic_q , italic_r ) = ( 3 , 3 , 3 ) and generating heavy-tailed errors from the Pareto distribution.

All the aforementioned test procedures require the important assumption of Gaussian or sub-Gaussian errors in the change-plane models, none of which apply to heavy-tailed data sets. Therefore, proposed herein is a robust test procedure based on WAST (Liu et al., 2024), called robust WAST (RWAST). Figure 2 shows the power curves of the proposed RWAST, WAST as introduced by Liu et al., (2024), and SST as considered by Davies, (1977); Kang et al., (2017). A total of 1000 bootstrap samples is set, and the other settings are the same as those in Section 1.1. With the nominal significance level α=0.05𝛼0.05\alpha=0.05italic_α = 0.05, the type-I errors for these three methods are (rwast,olsw,olss)=(0.042,0.065,0.031)rwastolswolss0.0420.0650.031(\mbox{rwast},\mbox{olsw},\mbox{olss})=(0.042,0.065,0.031)( rwast , olsw , olss ) = ( 0.042 , 0.065 , 0.031 ) for n=200𝑛200n=200italic_n = 200, (rwast,olsw,olss)=(0.043,0.047,0.118)rwastolswolss0.0430.0470.118(\mbox{rwast},\mbox{olsw},\mbox{olss})=(0.043,0.047,0.118)( rwast , olsw , olss ) = ( 0.043 , 0.047 , 0.118 ) for n=400𝑛400n=400italic_n = 400, and (rwast,olsw,olss)=(0.033,0.047,0.135)rwastolswolss0.0330.0470.135(\mbox{rwast},\mbox{olsw},\mbox{olss})=(0.033,0.047,0.135)( rwast , olsw , olss ) = ( 0.033 , 0.047 , 0.135 ) for n=600𝑛600n=600italic_n = 600. It follows that RWAST controls the type-I errors well, while those of SST based on quadratic loss are larger and deviate far from 0.05. Figure 2 shows that in the presence of heavy tails, the proposed RWAST achieves larger power in comparison with WAST based on the ordinary quadratic loss.

In summary, from the demonstration examples in Section 1.1 and Section 1.2, compared with the existing nonrobust methods for heavy-tailed data, the proposed robust estimation procedure is characterized by lower estimation errors and higher accuracy, and the robust test procedure has more-accurate type-I errors and larger power.

1.3 Main contributions

The main contributions of this paper are summarized as follows. First, the robust estimation procedure for the change-plane model (1) with heavy-tailed errors is investigated carefully. The proposed robust estimator adapts to the sample size, the robustification parameter in the Huber loss, the smoothness parameter when approximating the indicator function in the subgroup classifier, and the moments of errors. The sacrifices made in pursuit of robustness and smoothness are analyzed theoretically, with the bias involving the robustification parameter arising from the pursuit of robustness, and the one involving the smoothness parameter arising from the approximation to the indicator function. The nonasymptotic properties for parameters 𝜶𝜶\boldsymbol{\alpha}bold_italic_α and 𝜷𝜷\boldsymbol{\beta}bold_italic_β are established, as well as those for the grouping parameter. The theoretical results reveal that the proposed estimators of the grouping difference parameter and baseline parameter have Gaussian-type deviations (Devroye et al., 2016). Also provided is the nonasymptotic Bahadur representation of the proposed robust estimators, which is convenient for deriving the classical asymptotic results needed for statistical inference such as hypothesis tests and constructing confidence regions. Extensive simulation studies show that the proposed robust estimation procedure is superior to its several competitors.

Second, for the change-plane model with heavy-tailed errors, RWAST is proposed, which makes full use of the data’s heavy-tailed information and overcomes the drawbacks of loss of power in practice and the heavy computational burden of SST when the dimension of the grouping parameter is large. The asymptotic distributions of the proposed RWAST under the null and alternative hypotheses are established based on the theory of degenerate U-statistics. As with exponential average tests, the proposed asymptotic distributions are not standard (e.g., the normal or χ2superscript𝜒2\chi^{2}italic_χ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT distribution), so a novel bootstrap method that is easily implemented and theoretically guaranteed is introduced to mimic the critical value or p-value. Comprehensive simulation studies conducted with finite sample sizes and for various heavy-tailed error distributions show the excellent performance of the proposed RWAST, which improves the power significantly and reduces the computational burden dramatically.

In summary, a novel robust estimator is proposed that adapts to the sample size, dimension, robustification parameter, moments, and smoothness parameter in pursuit of the optimal tradeoff among bias, robustness, and smoothness. To the best of the author’s knowledge about change-plane analysis, the literature contains no nonasymptotic results with sub-Gaussian tails for parameters 𝜽𝜽\boldsymbol{\theta}bold_italic_θ and 𝜼𝜼\boldsymbol{\eta}bold_italic_η, and no nonasymptotic results with sub-exponential tails for the Bahadur representation of these parameters. Furthermore, a robust test procedure is proposed that improves on WAST in the change-plane model with heavy-tailed errors.

1.4 Notation and Organization of paper

Here, some useful notation is introduced for convenience of expression. For a vector 𝒗d𝒗superscript𝑑{\boldsymbol{v}}\in\mathbb{R}^{d}bold_italic_v ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT and a square matrix A=(aij)d×d𝐴subscript𝑎𝑖𝑗superscript𝑑𝑑A=(a_{ij})\in\mathbb{R}^{d\times d}italic_A = ( italic_a start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ) ∈ blackboard_R start_POSTSUPERSCRIPT italic_d × italic_d end_POSTSUPERSCRIPT, denote by 𝒗norm𝒗\|{\boldsymbol{v}}\|∥ bold_italic_v ∥ the Euclidean norm of 𝒗𝒗{\boldsymbol{v}}bold_italic_v, by trace(A)=i=1daiitrace𝐴superscriptsubscript𝑖1𝑑subscript𝑎𝑖𝑖\mbox{trace}(A)=\sum_{i=1}^{d}a_{ii}trace ( italic_A ) = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_i italic_i end_POSTSUBSCRIPT the trace of A𝐴Aitalic_A; 𝒗A2=i,jaijvivjsuperscriptsubscriptnorm𝒗𝐴2subscript𝑖𝑗subscript𝑎𝑖𝑗subscript𝑣𝑖subscript𝑣𝑗\|{\boldsymbol{v}}\|_{A}^{2}=\sum_{i,j}a_{ij}v_{i}v_{j}∥ bold_italic_v ∥ start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = ∑ start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_v start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT and 𝒗2=𝒗𝒗Tsuperscript𝒗tensor-productabsent2𝒗superscript𝒗T{\boldsymbol{v}}^{\otimes 2}={\boldsymbol{v}}{\boldsymbol{v}}^{\rm T}bold_italic_v start_POSTSUPERSCRIPT ⊗ 2 end_POSTSUPERSCRIPT = bold_italic_v bold_italic_v start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT. Denote by Ap=sup{A𝒙p:𝒙d,𝒙p=1}\|A\|_{p}=\sup\{\|A{\boldsymbol{x}}\|_{p}:{\boldsymbol{x}}\in\mathbb{R}^{d},\|% {\boldsymbol{x}}\|_{p}=1\}∥ italic_A ∥ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT = roman_sup { ∥ italic_A bold_italic_x ∥ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT : bold_italic_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT , ∥ bold_italic_x ∥ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT = 1 } the induced operator norm for a matrix A=(aij)m×d𝐴subscript𝑎𝑖𝑗superscript𝑚𝑑A=(a_{ij})\in\mathbb{R}^{m\times d}italic_A = ( italic_a start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ) ∈ blackboard_R start_POSTSUPERSCRIPT italic_m × italic_d end_POSTSUPERSCRIPT.

Denote by 𝐏𝐏\boldsymbol{\mathrm{P}}bold_P the ordinary probability measure such that 𝐏f=f𝑑𝐏𝐏𝑓𝑓differential-d𝐏\boldsymbol{\mathrm{P}}f=\int fd\boldsymbol{\mathrm{P}}bold_P italic_f = ∫ italic_f italic_d bold_P for any measurable function f𝑓fitalic_f, by nsubscript𝑛\mathbb{P}_{n}blackboard_P start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT the empirical measure of a sample of random elements from 𝐏𝐏\boldsymbol{\mathrm{P}}bold_P such that nf=n1i=1nf(𝑽i)subscript𝑛𝑓superscript𝑛1superscriptsubscript𝑖1𝑛𝑓subscript𝑽𝑖\mathbb{P}_{n}f=n^{-1}\sum_{i=1}^{n}f({\boldsymbol{V}}_{i})blackboard_P start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT italic_f = italic_n start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_f ( bold_italic_V start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ), and by 𝔾nsubscript𝔾𝑛{\mathbb{G}}_{n}blackboard_G start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT the empirical process indexed by a class {\cal F}caligraphic_F of measurable functions such that 𝔾nf=n(n𝐏)fsubscript𝔾𝑛𝑓𝑛subscript𝑛𝐏𝑓{\mathbb{G}}_{n}f=\sqrt{n}(\mathbb{P}_{n}-\boldsymbol{\mathrm{P}})fblackboard_G start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT italic_f = square-root start_ARG italic_n end_ARG ( blackboard_P start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - bold_P ) italic_f for any f𝑓f\in{\cal F}italic_f ∈ caligraphic_F. Let Lp(Q)superscript𝐿𝑝𝑄L^{p}(Q)italic_L start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT ( italic_Q ) be the space of all measurable functions f𝑓fitalic_f such that fQ,p:=(Q|f|p)1/p<assignsubscriptnorm𝑓𝑄𝑝superscript𝑄superscript𝑓𝑝1𝑝\|f\|_{Q,p}:=(Q|f|^{p})^{1/p}<\infty∥ italic_f ∥ start_POSTSUBSCRIPT italic_Q , italic_p end_POSTSUBSCRIPT := ( italic_Q | italic_f | start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 1 / italic_p end_POSTSUPERSCRIPT < ∞, where p[1,)𝑝1p\in[1,\infty)italic_p ∈ [ 1 , ∞ ) and (Q|f|p)1/psuperscript𝑄superscript𝑓𝑝1𝑝(Q|f|^{p})^{1/p}( italic_Q | italic_f | start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 1 / italic_p end_POSTSUPERSCRIPT denotes the essential supremum when p=𝑝p=\inftyitalic_p = ∞. Let N(ϵ,,Q,2)N(\epsilon,{\cal F},\|\cdot\|_{Q,2})italic_N ( italic_ϵ , caligraphic_F , ∥ ⋅ ∥ start_POSTSUBSCRIPT italic_Q , 2 end_POSTSUBSCRIPT ) be an ϵitalic-ϵ\epsilonitalic_ϵ-covering number of {\cal F}caligraphic_F with respect to the L2(Q)superscript𝐿2𝑄L^{2}(Q)italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_Q ) seminorm Q,2\|\cdot\|_{Q,2}∥ ⋅ ∥ start_POSTSUBSCRIPT italic_Q , 2 end_POSTSUBSCRIPT, where {\cal F}caligraphic_F is a class of measure functions and Q𝑄Qitalic_Q is finite discrete.

The remainder of this paper is organized as follows. Section 2 provides the robust estimators for the grouping difference parameter and grouping parameter as well as the subgroup classifier, and theorems reveal that these estimators achieve Gaussian-type deviations. Also derived is the Bahadur representation of the robust estimators, and it is shown that the remainder of the Bahadur representation achieves sub-Gaussian tails. Section 3 presents the RWAST statistic and establishes its limiting distributions under the null and alternative hypotheses. Section 4 reports the results of simulation studies conducted to evaluate the finite-sample performance of the proposed methods with competitors in the change-plane models with heavy-tailed errors. The performance of the proposed methods is illustrated further by applying them to a medical dataset in Section 5. Finally, Section 6 concludes with remarks and further extensions. The proofs are provided in the Supplementary Material, and an R package named “wasthub” is available at https://github.com/xliusufe/wasthub.

2 Robust subgroup-classifier learning

In this section, the subgroup classifier is learned to partition subjects into two subgroups, and nonasymptotic properties are provided for the robust estimators, whose deviations achieve sub-Gaussian tails. To adapt for different magnitudes of errors and to robustify the estimation, the Huber loss (Huber, 1964; Fan et al., 2017b ; Wang et al., 2021; Han et al., 2022) is considered, the definition of which begins this section.

Definition 2.1.

The Huber loss Lτ(u)subscript𝐿𝜏𝑢L_{\tau}(u)italic_L start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_u ) (Huber, 1964) is defined as

Lτ(u)={u2/2 if |u|τ,τ|u|τ2/2 otherwise,subscript𝐿𝜏𝑢casessuperscript𝑢22 if 𝑢𝜏𝜏𝑢superscript𝜏22 otherwise,\displaystyle L_{\tau}(u)=\left\{\begin{array}[]{ll}u^{2}/2&\text{ if }|u|\leq% \tau,\\ \tau|u|-\tau^{2}/2&\text{ otherwise,}\end{array}\right.italic_L start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_u ) = { start_ARRAY start_ROW start_CELL italic_u start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / 2 end_CELL start_CELL if | italic_u | ≤ italic_τ , end_CELL end_ROW start_ROW start_CELL italic_τ | italic_u | - italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / 2 end_CELL start_CELL otherwise, end_CELL end_ROW end_ARRAY (5)

where τ>0𝜏0\tau>0italic_τ > 0 is a tuning parameter called the robustification parameter (Sun et al., 2020; Chen and Zhou, 2020), which regulates the bias and robustness.

The Huber loss is a hybrid of the squared loss with small errors and absolute loss for large errors. Denoting by L(u)=u2/2𝐿𝑢superscript𝑢22L(u)=u^{2}/2italic_L ( italic_u ) = italic_u start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / 2 the ordinary quadratic loss, it is straightforward to see that L(u)=limτLτ(u)𝐿𝑢subscript𝜏subscript𝐿𝜏𝑢L(u)=\lim_{\tau\rightarrow\infty}L_{\tau}(u)italic_L ( italic_u ) = roman_lim start_POSTSUBSCRIPT italic_τ → ∞ end_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_u ).

2.1 Robust estimation

Rewrite model (1) as

yi=𝑿iT𝜶+𝒁iT𝜷𝟏(U1i+𝑼2iT𝜼0)+ϵi,subscript𝑦𝑖superscriptsubscript𝑿𝑖T𝜶superscriptsubscript𝒁𝑖T𝜷1subscript𝑈1𝑖superscriptsubscript𝑼2𝑖T𝜼0subscriptitalic-ϵ𝑖\displaystyle y_{i}={\boldsymbol{X}}_{i}^{\rm T}\boldsymbol{\alpha}+{% \boldsymbol{Z}}_{i}^{\rm T}\boldsymbol{\beta}{\boldsymbol{1}}(U_{1i}+{% \boldsymbol{U}}_{2i}^{\rm T}\boldsymbol{\eta}\geq 0)+\epsilon_{i},italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = bold_italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_α + bold_italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_β bold_1 ( italic_U start_POSTSUBSCRIPT 1 italic_i end_POSTSUBSCRIPT + bold_italic_U start_POSTSUBSCRIPT 2 italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_η ≥ 0 ) + italic_ϵ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , (6)

where 𝑼i=(U1i,𝑼2iT)Tsubscript𝑼𝑖superscriptsubscript𝑈1𝑖superscriptsubscript𝑼2𝑖TT{\boldsymbol{U}}_{i}=(U_{1i},{\boldsymbol{U}}_{2i}^{\rm T})^{\rm T}bold_italic_U start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = ( italic_U start_POSTSUBSCRIPT 1 italic_i end_POSTSUBSCRIPT , bold_italic_U start_POSTSUBSCRIPT 2 italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT, 𝜼=γ11𝜸1𝜼superscriptsubscript𝛾11subscript𝜸1\boldsymbol{\eta}=\gamma_{1}^{-1}\boldsymbol{\gamma}_{-1}bold_italic_η = italic_γ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_italic_γ start_POSTSUBSCRIPT - 1 end_POSTSUBSCRIPT with 𝜸1=(γ2,,γr)Tsubscript𝜸1superscriptsubscript𝛾2subscript𝛾𝑟T\boldsymbol{\gamma}_{-1}=(\gamma_{2},\cdots,\gamma_{r})^{\rm T}bold_italic_γ start_POSTSUBSCRIPT - 1 end_POSTSUBSCRIPT = ( italic_γ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , ⋯ , italic_γ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT. To avoid the identifiability problem for 𝜼𝜼\boldsymbol{\eta}bold_italic_η, 𝜷𝟎𝜷0\boldsymbol{\beta}\neq{\boldsymbol{0}}bold_italic_β ≠ bold_0 is assumed in this section. Denote 𝜻=(𝜶T,𝜷T,𝜼T)TΘζ𝜻superscriptsuperscript𝜶Tsuperscript𝜷Tsuperscript𝜼TTsubscriptΘ𝜁\boldsymbol{\zeta}=(\boldsymbol{\alpha}^{\rm T},\boldsymbol{\beta}^{\rm T},% \boldsymbol{\eta}^{\rm T})^{\rm T}\in\Theta_{\zeta}bold_italic_ζ = ( bold_italic_α start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , bold_italic_β start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , bold_italic_η start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ∈ roman_Θ start_POSTSUBSCRIPT italic_ζ end_POSTSUBSCRIPT, where Θζ=Θα×Θβ×ΘηsubscriptΘ𝜁subscriptΘ𝛼subscriptΘ𝛽subscriptΘ𝜂\Theta_{\zeta}=\Theta_{\alpha}\times\Theta_{\beta}\times\Theta_{\eta}roman_Θ start_POSTSUBSCRIPT italic_ζ end_POSTSUBSCRIPT = roman_Θ start_POSTSUBSCRIPT italic_α end_POSTSUBSCRIPT × roman_Θ start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT × roman_Θ start_POSTSUBSCRIPT italic_η end_POSTSUBSCRIPT is the product space.

Because the indicator function 𝟏(U1+𝑼2T𝜼0)1subscript𝑈1superscriptsubscript𝑼2T𝜼0{\boldsymbol{1}}(U_{1}+{\boldsymbol{U}}_{2}^{\rm T}\boldsymbol{\eta}\geq 0)bold_1 ( italic_U start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_η ≥ 0 ) is not differentiable, it is natural to approximate it by a smooth function K(u)𝐾𝑢K(u)italic_K ( italic_u ) satisfying

limu+K(u)=1andlimuK(u)=0.subscript𝑢𝐾𝑢1andsubscript𝑢𝐾𝑢0\displaystyle\lim_{u\rightarrow+\infty}K(u)=1~{}\mbox{and}~{}\lim_{u% \rightarrow-\infty}K(u)=0.roman_lim start_POSTSUBSCRIPT italic_u → + ∞ end_POSTSUBSCRIPT italic_K ( italic_u ) = 1 and roman_lim start_POSTSUBSCRIPT italic_u → - ∞ end_POSTSUBSCRIPT italic_K ( italic_u ) = 0 .

Note that this smooth function characterizes the cumulative distribution function instead of the density function; see Seo and Linton, (2007); Li et al., (2021); Zhang et al., (2022); Mukherjee et al., (2020) for more details. The literature contains many commonly used smooth functions, such as the cumulative distribution function of standard normal distribution K(u)=Φ(u)𝐾𝑢Φ𝑢K(u)=\Phi(u)italic_K ( italic_u ) = roman_Φ ( italic_u ), the sigmoid function K(u)={1+exp(u)}1𝐾𝑢superscript1𝑢1K(u)=\{1+\exp(-u)\}^{-1}italic_K ( italic_u ) = { 1 + roman_exp ( - italic_u ) } start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT, and the mixture of the cumulative distribution function and density of standard normal distribution K(u)=Φ(u)+uϕ(u)𝐾𝑢Φ𝑢𝑢italic-ϕ𝑢K(u)=\Phi(u)+u\phi(u)italic_K ( italic_u ) = roman_Φ ( italic_u ) + italic_u italic_ϕ ( italic_u ). Thus, model (6) can be approximated by

yi=𝑿iT𝜶+𝒁iT𝜷Kh(U1i+𝑼2iT𝜼)+ϵi,subscript𝑦𝑖superscriptsubscript𝑿𝑖T𝜶superscriptsubscript𝒁𝑖T𝜷subscript𝐾subscript𝑈1𝑖superscriptsubscript𝑼2𝑖T𝜼subscriptitalic-ϵ𝑖\displaystyle y_{i}={\boldsymbol{X}}_{i}^{\rm T}\boldsymbol{\alpha}+{% \boldsymbol{Z}}_{i}^{\rm T}\boldsymbol{\beta}K_{h}(U_{1i}+{\boldsymbol{U}}_{2i% }^{\rm T}\boldsymbol{\eta})+\epsilon_{i},italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = bold_italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_α + bold_italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_β italic_K start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ( italic_U start_POSTSUBSCRIPT 1 italic_i end_POSTSUBSCRIPT + bold_italic_U start_POSTSUBSCRIPT 2 italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_η ) + italic_ϵ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , (7)

where Kh(u)=K(u/h)subscript𝐾𝑢𝐾𝑢K_{h}(u)=K(u/h)italic_K start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ( italic_u ) = italic_K ( italic_u / italic_h ), and hhitalic_h is a predetermined tuning parameter associated with n𝑛nitalic_n satisfying limnh=0subscript𝑛0\lim_{n\rightarrow\infty}h=0roman_lim start_POSTSUBSCRIPT italic_n → ∞ end_POSTSUBSCRIPT italic_h = 0 , called the smoothness parameter.

For any τ>0𝜏0\tau>0italic_τ > 0, let 𝜻τ,hsuperscriptsubscript𝜻𝜏\boldsymbol{\zeta}_{\tau,h}^{*}bold_italic_ζ start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT be the minimizer defined as

𝜻τ,h=argmin𝜻Θζ𝐏Lτ(y𝑿T𝜶𝒁T𝜷Kh(U1+𝑼2T𝜼)),superscriptsubscript𝜻𝜏subscriptargmin𝜻subscriptΘ𝜁𝐏subscript𝐿𝜏𝑦superscript𝑿T𝜶superscript𝒁T𝜷subscript𝐾subscript𝑈1superscriptsubscript𝑼2T𝜼\displaystyle\boldsymbol{\zeta}_{\tau,h}^{*}=\mathop{\rm argmin}_{\boldsymbol{% \zeta}\in\Theta_{\zeta}}\boldsymbol{\mathrm{P}}L_{\tau}(y-{\boldsymbol{X}}^{% \rm T}\boldsymbol{\alpha}-{\boldsymbol{Z}}^{\rm T}\boldsymbol{\beta}K_{h}(U_{1% }+{\boldsymbol{U}}_{2}^{\rm T}\boldsymbol{\eta})),bold_italic_ζ start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = roman_argmin start_POSTSUBSCRIPT bold_italic_ζ ∈ roman_Θ start_POSTSUBSCRIPT italic_ζ end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_P italic_L start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_y - bold_italic_X start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_α - bold_italic_Z start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_β italic_K start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ( italic_U start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_η ) ) , (8)

which approximates the minimizer

𝜻τ=argmin𝜻Θζ𝐏Lτ(y𝑿T𝜶𝒁T𝜷𝟏(U1+𝑼2T𝜼0)).superscriptsubscript𝜻𝜏subscriptargmin𝜻subscriptΘ𝜁𝐏subscript𝐿𝜏𝑦superscript𝑿T𝜶superscript𝒁T𝜷1subscript𝑈1superscriptsubscript𝑼2T𝜼0\displaystyle\boldsymbol{\zeta}_{\tau}^{*}=\mathop{\rm argmin}_{\boldsymbol{% \zeta}\in\Theta_{\zeta}}\boldsymbol{\mathrm{P}}L_{\tau}(y-{\boldsymbol{X}}^{% \rm T}\boldsymbol{\alpha}-{\boldsymbol{Z}}^{\rm T}\boldsymbol{\beta}{% \boldsymbol{1}}(U_{1}+{\boldsymbol{U}}_{2}^{\rm T}\boldsymbol{\eta}\geq 0)).bold_italic_ζ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = roman_argmin start_POSTSUBSCRIPT bold_italic_ζ ∈ roman_Θ start_POSTSUBSCRIPT italic_ζ end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_P italic_L start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_y - bold_italic_X start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_α - bold_italic_Z start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_β bold_1 ( italic_U start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_η ≥ 0 ) ) . (9)

As did Sun et al., (2020), 𝜻τsuperscriptsubscript𝜻𝜏\boldsymbol{\zeta}_{\tau}^{*}bold_italic_ζ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT is called the Huber coefficient, which usually distinguishes from the true parameter 𝜻superscript𝜻\boldsymbol{\zeta}^{*}bold_italic_ζ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. Measured by 𝜻τ𝜻normsuperscriptsubscript𝜻𝜏superscript𝜻\|\boldsymbol{\zeta}_{\tau}^{*}-\boldsymbol{\zeta}^{*}\|∥ bold_italic_ζ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_ζ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥, the Huber error is caused by the robustification for the heavy-tailed errors, while the distance 𝜻τ,h𝜻normsuperscriptsubscript𝜻𝜏superscript𝜻\|\boldsymbol{\zeta}_{\tau,h}^{*}-\boldsymbol{\zeta}^{*}\|∥ bold_italic_ζ start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_ζ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ is a consequence of both robustification and smoothness. Theorem 2 reveals that 𝜻τ,h𝜻normsuperscriptsubscript𝜻𝜏superscript𝜻\|\boldsymbol{\zeta}_{\tau,h}^{*}-\boldsymbol{\zeta}^{*}\|∥ bold_italic_ζ start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_ζ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ is controlled by both τ𝜏\tauitalic_τ and hhitalic_h, with hhitalic_h playing the role of the bandwidth in the nonparametric area.

Minimizing the empirical loss in (8) produces the robust estimator of interest, i.e.,

𝜻^τ,h=argmin𝜻ΘζnLτ(y𝑿T𝜶𝒁T𝜷Kh(U1+𝑼2T𝜼)).subscript^𝜻𝜏subscriptargmin𝜻subscriptΘ𝜁subscript𝑛subscript𝐿𝜏𝑦superscript𝑿T𝜶superscript𝒁T𝜷subscript𝐾subscript𝑈1superscriptsubscript𝑼2T𝜼\displaystyle\hat{\boldsymbol{\zeta}}_{\tau,h}=\mathop{\rm argmin}_{% \boldsymbol{\zeta}\in\Theta_{\zeta}}\mathbb{P}_{n}L_{\tau}(y-{\boldsymbol{X}}^% {\rm T}\boldsymbol{\alpha}-{\boldsymbol{Z}}^{\rm T}\boldsymbol{\beta}K_{h}(U_{% 1}+{\boldsymbol{U}}_{2}^{\rm T}\boldsymbol{\eta})).over^ start_ARG bold_italic_ζ end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT = roman_argmin start_POSTSUBSCRIPT bold_italic_ζ ∈ roman_Θ start_POSTSUBSCRIPT italic_ζ end_POSTSUBSCRIPT end_POSTSUBSCRIPT blackboard_P start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_y - bold_italic_X start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_α - bold_italic_Z start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_β italic_K start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ( italic_U start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_η ) ) . (10)

From (8), (9), and (10), the total estimation error 𝜻^τ,h𝜻normsubscript^𝜻𝜏superscript𝜻\|\hat{\boldsymbol{\zeta}}_{\tau,h}-\boldsymbol{\zeta}^{*}\|∥ over^ start_ARG bold_italic_ζ end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT - bold_italic_ζ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ can be decomposed into three parts, i.e.,

𝜻^τ,h𝜻Total error𝜻^τ,h𝜻τ,hestimation error+𝜻τ,h𝜻τsmooth error+𝜻τ𝜻Huber error.subscriptnormsubscript^𝜻𝜏superscript𝜻Total errorsubscriptnormsubscript^𝜻𝜏superscriptsubscript𝜻𝜏estimation errorsubscriptnormsuperscriptsubscript𝜻𝜏superscriptsubscript𝜻𝜏smooth errorsubscriptnormsuperscriptsubscript𝜻𝜏superscript𝜻Huber error\displaystyle\underbrace{\|\hat{\boldsymbol{\zeta}}_{\tau,h}-\boldsymbol{\zeta% }^{*}\|}_{\mbox{Total error}}\leq\underbrace{\|\hat{\boldsymbol{\zeta}}_{\tau,% h}-\boldsymbol{\zeta}_{\tau,h}^{*}\|}_{\mbox{estimation error}}+\underbrace{\|% \boldsymbol{\zeta}_{\tau,h}^{*}-\boldsymbol{\zeta}_{\tau}^{*}\|}_{\mbox{smooth% error}}+\underbrace{\|\boldsymbol{\zeta}_{\tau}^{*}-\boldsymbol{\zeta}^{*}\|}% _{\mbox{Huber error}}.under⏟ start_ARG ∥ over^ start_ARG bold_italic_ζ end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT - bold_italic_ζ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ end_ARG start_POSTSUBSCRIPT Total error end_POSTSUBSCRIPT ≤ under⏟ start_ARG ∥ over^ start_ARG bold_italic_ζ end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT - bold_italic_ζ start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ end_ARG start_POSTSUBSCRIPT estimation error end_POSTSUBSCRIPT + under⏟ start_ARG ∥ bold_italic_ζ start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_ζ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ end_ARG start_POSTSUBSCRIPT smooth error end_POSTSUBSCRIPT + under⏟ start_ARG ∥ bold_italic_ζ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_ζ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ end_ARG start_POSTSUBSCRIPT Huber error end_POSTSUBSCRIPT . (11)

It is natural to use the alternating strategy to obtain the estimate, denoted by 𝜻^=(𝜶^T,𝜷^T,𝜼^T)T^𝜻superscriptsuperscript^𝜶Tsuperscript^𝜷Tsuperscript^𝜼TT\hat{\boldsymbol{\zeta}}=(\hat{\boldsymbol{\alpha}}^{\rm T},\hat{\boldsymbol{% \beta}}^{\rm T},\hat{\boldsymbol{\eta}}^{\rm T})^{\rm T}over^ start_ARG bold_italic_ζ end_ARG = ( over^ start_ARG bold_italic_α end_ARG start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_β end_ARG start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_η end_ARG start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT. Specifically, the parameters (𝜶T,𝜷T)Tsuperscriptsuperscript𝜶Tsuperscript𝜷TT(\boldsymbol{\alpha}^{\rm T},\boldsymbol{\beta}^{\rm T})^{\rm T}( bold_italic_α start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , bold_italic_β start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT and 𝜼𝜼\boldsymbol{\eta}bold_italic_η can be estimated iteratively as follows. For given 𝜼(k)superscript𝜼𝑘\boldsymbol{\eta}^{(k)}bold_italic_η start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT, 𝜶(k+1)superscript𝜶𝑘1\boldsymbol{\alpha}^{(k+1)}bold_italic_α start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT and 𝜷(k+1)superscript𝜷𝑘1\boldsymbol{\beta}^{(k+1)}bold_italic_β start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT are obtained by minimizing

nLτ(y𝑿T𝜶𝒁T𝜷Kh(U1+𝑼2T𝜼(k))),subscript𝑛subscript𝐿𝜏𝑦superscript𝑿T𝜶superscript𝒁T𝜷subscript𝐾subscript𝑈1superscriptsubscript𝑼2Tsuperscript𝜼𝑘\displaystyle\mathbb{P}_{n}L_{\tau}(y-{\boldsymbol{X}}^{\rm T}\boldsymbol{% \alpha}-{\boldsymbol{Z}}^{\rm T}\boldsymbol{\beta}K_{h}(U_{1}+{\boldsymbol{U}}% _{2}^{\rm T}\boldsymbol{\eta}^{(k)})),blackboard_P start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_y - bold_italic_X start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_α - bold_italic_Z start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_β italic_K start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ( italic_U start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_η start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ) ) ,

and for given 𝜶(k+1)superscript𝜶𝑘1\boldsymbol{\alpha}^{(k+1)}bold_italic_α start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT and 𝜷(k+1)superscript𝜷𝑘1\boldsymbol{\beta}^{(k+1)}bold_italic_β start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT, 𝜸(k+1)superscript𝜸𝑘1\boldsymbol{\gamma}^{(k+1)}bold_italic_γ start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT is estimated by minimizing

nLτ(y𝑿T𝜶(k+1)𝒁T𝜷(k+1)Kh(U1+𝑼2T𝜼)).subscript𝑛subscript𝐿𝜏𝑦superscript𝑿Tsuperscript𝜶𝑘1superscript𝒁Tsuperscript𝜷𝑘1subscript𝐾subscript𝑈1superscriptsubscript𝑼2T𝜼\displaystyle\mathbb{P}_{n}L_{\tau}(y-{\boldsymbol{X}}^{\rm T}\boldsymbol{% \alpha}^{(k+1)}-{\boldsymbol{Z}}^{\rm T}\boldsymbol{\beta}^{(k+1)}K_{h}(U_{1}+% {\boldsymbol{U}}_{2}^{\rm T}\boldsymbol{\eta})).blackboard_P start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_y - bold_italic_X start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_α start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT - bold_italic_Z start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_β start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT italic_K start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ( italic_U start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_η ) ) .

Iterating these two maximizers leads to the desired robust estimator. The above alternating strategy is summarized in Algorithm A in Appendix A of the Supplementary Material.

2.2 Nonasymptotic properties

This section begins with assumptions needed to establish the nonasymptotic properties. Let 𝑽¯¯𝑽{\bar{\boldsymbol{V}}}over¯ start_ARG bold_italic_V end_ARG be 𝑽𝑽{\boldsymbol{V}}bold_italic_V by removing U1subscript𝑈1U_{1}italic_U start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, i.e., 𝑽¯=(y,𝑿,𝒁,𝑼2)){\bar{\boldsymbol{V}}}=(y,{\boldsymbol{X}},{\boldsymbol{Z}},{\boldsymbol{U}}_{% 2}))over¯ start_ARG bold_italic_V end_ARG = ( italic_y , bold_italic_X , bold_italic_Z , bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ), and ω¯(𝜼)=U1+𝑼2T𝜼¯𝜔𝜼subscript𝑈1superscriptsubscript𝑼2T𝜼\bar{\omega}(\boldsymbol{\eta})=U_{1}+{\boldsymbol{U}}_{2}^{\rm T}\boldsymbol{\eta}over¯ start_ARG italic_ω end_ARG ( bold_italic_η ) = italic_U start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_η and ω¯=ω¯(𝜼)¯𝜔¯𝜔superscript𝜼\bar{\omega}=\bar{\omega}(\boldsymbol{\eta}^{*})over¯ start_ARG italic_ω end_ARG = over¯ start_ARG italic_ω end_ARG ( bold_italic_η start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ).

  1. (A1)

    The conditional random vectors E(𝑿|𝑼2)pEconditional𝑿subscript𝑼2superscript𝑝{\mathrm{E}}({\boldsymbol{X}}|{\boldsymbol{U}}_{2})\in\mathbb{R}^{p}roman_E ( bold_italic_X | bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ∈ blackboard_R start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT and E(𝒁|𝑼2)qEconditional𝒁subscript𝑼2superscript𝑞{\mathrm{E}}({\boldsymbol{Z}}|{\boldsymbol{U}}_{2})\in\mathbb{R}^{q}roman_E ( bold_italic_Z | bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ∈ blackboard_R start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT given 𝑼2subscript𝑼2{\boldsymbol{U}}_{2}bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT are sub-Gaussian, and 𝑼2subscript𝑼2{\boldsymbol{U}}_{2}bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT is sub-Gaussian. There is a universal constant K1>0subscript𝐾10K_{1}>0italic_K start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > 0 satisfying E(𝑿|𝑼2)ψ2K1\|{\mathrm{E}}({\boldsymbol{X}}|{\boldsymbol{U}}_{2})\|_{\psi_{2}}\leq K_{1}∥ roman_E ( bold_italic_X | bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ∥ start_POSTSUBSCRIPT italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ≤ italic_K start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, E(𝒁|𝑼2)ψ2K1\|{\mathrm{E}}({\boldsymbol{Z}}|{\boldsymbol{U}}_{2})\|_{\psi_{2}}\leq K_{1}∥ roman_E ( bold_italic_Z | bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ∥ start_POSTSUBSCRIPT italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ≤ italic_K start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, and 𝑼2ψ2K1subscriptnormsubscript𝑼2subscript𝜓2subscript𝐾1\|{\boldsymbol{U}}_{2}\|_{\psi_{2}}\leq K_{1}∥ bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ≤ italic_K start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT. For any 𝒖2r1subscript𝒖2superscript𝑟1{\boldsymbol{u}}_{2}\in\mathbb{R}^{r-1}bold_italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_r - 1 end_POSTSUPERSCRIPT, the matrices E(𝑿𝑿T|𝑼2=𝒖2)Econditional𝑿superscript𝑿Tsubscript𝑼2subscript𝒖2{\mathrm{E}}({\boldsymbol{X}}{\boldsymbol{X}}^{\rm T}|{\boldsymbol{U}}_{2}={% \boldsymbol{u}}_{2})roman_E ( bold_italic_X bold_italic_X start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT | bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = bold_italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ), E(𝑿𝑿T|𝑼2=𝒖2)Econditional𝑿superscript𝑿Tsubscript𝑼2subscript𝒖2{\mathrm{E}}({\boldsymbol{X}}{\boldsymbol{X}}^{\rm T}|{\boldsymbol{U}}_{2}={% \boldsymbol{u}}_{2})roman_E ( bold_italic_X bold_italic_X start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT | bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = bold_italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ), and E(𝑼2𝑼2T)Esubscript𝑼2superscriptsubscript𝑼2T{\mathrm{E}}({\boldsymbol{U}}_{2}{\boldsymbol{U}}_{2}^{\rm T})roman_E ( bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ) are uniformly positive definite, and there is a universal constant K0>0subscript𝐾00K_{0}>0italic_K start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT > 0 satisfying λmin(E(𝑿𝑿T|𝑼2=𝒖2))K0subscript𝜆Econditional𝑿superscript𝑿Tsubscript𝑼2subscript𝒖2subscript𝐾0\lambda_{\min}({\mathrm{E}}({\boldsymbol{X}}{\boldsymbol{X}}^{\rm T}|{% \boldsymbol{U}}_{2}={\boldsymbol{u}}_{2}))\geq K_{0}italic_λ start_POSTSUBSCRIPT roman_min end_POSTSUBSCRIPT ( roman_E ( bold_italic_X bold_italic_X start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT | bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = bold_italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ) ≥ italic_K start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, λmin(E(𝒁𝒁T|𝑼2=𝒖2))K0subscript𝜆Econditional𝒁superscript𝒁Tsubscript𝑼2subscript𝒖2subscript𝐾0\lambda_{\min}({\mathrm{E}}({\boldsymbol{Z}}{\boldsymbol{Z}}^{\rm T}|{% \boldsymbol{U}}_{2}={\boldsymbol{u}}_{2}))\geq K_{0}italic_λ start_POSTSUBSCRIPT roman_min end_POSTSUBSCRIPT ( roman_E ( bold_italic_Z bold_italic_Z start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT | bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = bold_italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ) ≥ italic_K start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, and λmin(E(𝑼2𝑼2T))K0subscript𝜆Esubscript𝑼2superscriptsubscript𝑼2Tsubscript𝐾0\lambda_{\min}({\mathrm{E}}({\boldsymbol{U}}_{2}{\boldsymbol{U}}_{2}^{\rm T}))% \geq K_{0}italic_λ start_POSTSUBSCRIPT roman_min end_POSTSUBSCRIPT ( roman_E ( bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ) ) ≥ italic_K start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT.

  2. (A2)

    The error variable ϵitalic-ϵ\epsilonitalic_ϵ is independent of (𝑿T,𝒁T,𝑼T)Tsuperscriptsuperscript𝑿Tsuperscript𝒁Tsuperscript𝑼TT({\boldsymbol{X}}^{\rm T},{\boldsymbol{Z}}^{\rm T},{\boldsymbol{U}}^{\rm T})^{% \rm T}( bold_italic_X start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , bold_italic_Z start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , bold_italic_U start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT and satisfies E(ϵ)=0Eitalic-ϵ0{\mathrm{E}}(\epsilon)=0roman_E ( italic_ϵ ) = 0 and E(|ϵ|2+δ)=Mδ<Esuperscriptitalic-ϵ2𝛿subscript𝑀𝛿{\mathrm{E}}(|\epsilon|^{2+\delta})=M_{\delta}<\inftyroman_E ( | italic_ϵ | start_POSTSUPERSCRIPT 2 + italic_δ end_POSTSUPERSCRIPT ) = italic_M start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT < ∞ with δ0𝛿0\delta\geq 0italic_δ ≥ 0. Denote M0=E(|ϵ|2)subscript𝑀0Esuperscriptitalic-ϵ2M_{0}={\mathrm{E}}(|\epsilon|^{2})italic_M start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = roman_E ( | italic_ϵ | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ).

  3. (A3)

    0<E[𝟏(𝑼T𝜸0)]<10Edelimited-[]1superscript𝑼T𝜸010<{\mathrm{E}}[{\boldsymbol{1}}({\boldsymbol{U}}^{\rm T}\boldsymbol{\gamma}% \geq 0)]<10 < roman_E [ bold_1 ( bold_italic_U start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_γ ≥ 0 ) ] < 1 for any 𝜸Θγ𝜸subscriptΘ𝛾\boldsymbol{\gamma}\in\Theta_{\gamma}bold_italic_γ ∈ roman_Θ start_POSTSUBSCRIPT italic_γ end_POSTSUBSCRIPT, and there is a constant δu2>0subscript𝛿subscript𝑢20\delta_{u_{2}}>0italic_δ start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT > 0 satisfying sup𝜼𝕊r1E[𝑼2T𝜼𝟏(𝑼2T𝜼0)]δu2subscriptsupremum𝜼superscript𝕊𝑟1Edelimited-[]superscriptsubscript𝑼2T𝜼1superscriptsubscript𝑼2T𝜼0subscript𝛿subscript𝑢2\sup_{\boldsymbol{\eta}\in\mathbb{S}^{r-1}}{\mathrm{E}}[{\boldsymbol{U}}_{2}^{% \rm T}\boldsymbol{\eta}{\boldsymbol{1}}({\boldsymbol{U}}_{2}^{\rm T}% \boldsymbol{\eta}\geq 0)]\geq\delta_{u_{2}}roman_sup start_POSTSUBSCRIPT bold_italic_η ∈ blackboard_S start_POSTSUPERSCRIPT italic_r - 1 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT roman_E [ bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_η bold_1 ( bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_η ≥ 0 ) ] ≥ italic_δ start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT.

  4. (A4)

    For almost every 𝒖2subscript𝒖2{\boldsymbol{u}}_{2}bold_italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, the density of U1subscript𝑈1U_{1}italic_U start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT conditional on 𝑼2=𝒖2subscript𝑼2subscript𝒖2{\boldsymbol{U}}_{2}={\boldsymbol{u}}_{2}bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = bold_italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT is everywhere positive. The conditional density fϖ|𝒗¯(ϖ)subscript𝑓conditionalitalic-ϖ¯𝒗italic-ϖf_{\varpi|\bar{{\boldsymbol{v}}}}(\varpi)italic_f start_POSTSUBSCRIPT italic_ϖ | over¯ start_ARG bold_italic_v end_ARG end_POSTSUBSCRIPT ( italic_ϖ ) of ω¯¯𝜔\bar{\omega}over¯ start_ARG italic_ω end_ARG given 𝑽¯¯𝑽{\bar{\boldsymbol{V}}}over¯ start_ARG bold_italic_V end_ARG has continuous derivative, and there is a constant κf>0subscript𝜅𝑓0\kappa_{f}>0italic_κ start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT > 0 such that fϖ|𝒗¯(ϖ)subscript𝑓conditionalitalic-ϖ¯𝒗italic-ϖf_{\varpi|\bar{{\boldsymbol{v}}}}(\varpi)italic_f start_POSTSUBSCRIPT italic_ϖ | over¯ start_ARG bold_italic_v end_ARG end_POSTSUBSCRIPT ( italic_ϖ ) and |fϖ|𝒗¯(ϖ)|superscriptsubscript𝑓conditionalitalic-ϖ¯𝒗italic-ϖ|f_{\varpi|\bar{{\boldsymbol{v}}}}^{\prime}(\varpi)|| italic_f start_POSTSUBSCRIPT italic_ϖ | over¯ start_ARG bold_italic_v end_ARG end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_ϖ ) | are uniformly bounded from above by κfsubscript𝜅𝑓\kappa_{f}italic_κ start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT over (ϖ,𝒗¯)italic-ϖ¯𝒗(\varpi,\bar{{\boldsymbol{v}}})( italic_ϖ , over¯ start_ARG bold_italic_v end_ARG ). fϖ|𝒗¯(0)subscript𝑓conditionalitalic-ϖ¯𝒗0f_{\varpi|\bar{{\boldsymbol{v}}}}(0)italic_f start_POSTSUBSCRIPT italic_ϖ | over¯ start_ARG bold_italic_v end_ARG end_POSTSUBSCRIPT ( 0 ) is uniformly bounded from below by δf0subscript𝛿subscript𝑓0\delta_{f_{0}}italic_δ start_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT over 𝒗¯¯𝒗\bar{{\boldsymbol{v}}}over¯ start_ARG bold_italic_v end_ARG, and Fϖ|𝒗¯(0)subscript𝐹conditionalitalic-ϖ¯𝒗0F_{\varpi|\bar{{\boldsymbol{v}}}}(0)italic_F start_POSTSUBSCRIPT italic_ϖ | over¯ start_ARG bold_italic_v end_ARG end_POSTSUBSCRIPT ( 0 ) is uniformly bounded from above by κFsubscript𝜅𝐹\kappa_{F}italic_κ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT over 𝒗¯¯𝒗\bar{{\boldsymbol{v}}}over¯ start_ARG bold_italic_v end_ARG, where δf0>0subscript𝛿subscript𝑓00\delta_{f_{0}}>0italic_δ start_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT > 0 and 0<κF<10subscript𝜅𝐹10<\kappa_{F}<10 < italic_κ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT < 1 are constants.

  5. (A5)

    The smooth function K()𝐾K(\cdot)italic_K ( ⋅ ) is twice differentiable and K(t)=1K(t)𝐾𝑡1𝐾𝑡K(-t)=1-K(t)italic_K ( - italic_t ) = 1 - italic_K ( italic_t ). K()superscript𝐾K^{\prime}(\cdot)italic_K start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( ⋅ ) is symmetric around zero. Moveover, there is a universal constant κk>0subscript𝜅𝑘0\kappa_{k}>0italic_κ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT > 0 satisfying max{supt|K(t)|,supt|K′′(t)|,|K(t)|j𝑑t,|K′′(t)|j𝑑t,|t||K(t)|j𝑑t}κksubscriptsupremum𝑡superscript𝐾𝑡subscriptsupremum𝑡superscript𝐾′′𝑡superscriptsuperscript𝐾𝑡𝑗differential-d𝑡superscriptsuperscript𝐾′′𝑡𝑗differential-d𝑡𝑡superscriptsuperscript𝐾𝑡𝑗differential-d𝑡subscript𝜅𝑘\max\left\{\sup_{t\in\mathbb{R}}|K^{\prime}(t)|,\sup_{t\in\mathbb{R}}|K^{% \prime\prime}(t)|,\int|K^{\prime}(t)|^{j}dt,\int|K^{\prime\prime}(t)|^{j}dt,% \int|t||K^{\prime}(t)|^{j}dt\right\}\leq\kappa_{k}roman_max { roman_sup start_POSTSUBSCRIPT italic_t ∈ blackboard_R end_POSTSUBSCRIPT | italic_K start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_t ) | , roman_sup start_POSTSUBSCRIPT italic_t ∈ blackboard_R end_POSTSUBSCRIPT | italic_K start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ( italic_t ) | , ∫ | italic_K start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_t ) | start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT italic_d italic_t , ∫ | italic_K start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ( italic_t ) | start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT italic_d italic_t , ∫ | italic_t | | italic_K start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_t ) | start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT italic_d italic_t } ≤ italic_κ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT with j=1,2𝑗12j=1,2italic_j = 1 , 2.

Remark 1.

Assumptions (A1)–(A5) are mild conditions for deriving the nonasymptotic bounds in Theorems 14 below. Assumption (A1) is the moment condition for covariates. Assumption (A2) is imposed to control the moment of error and to yield the adaptive nonasymptotic upper bounds; see Fan et al., 2017b ; Wang et al., (2021); Han et al., (2022). Assumption (A3) is mild and easily verified in practice, and it is usually imposed in change-plane analysis; see Kang et al., (2017); Liu et al., (2024). Assumption (A4) is required to establish nonasymptotic properties in dealing with the indicator function; see Horowitz, (1993); Zhang et al., (2022). By Lemmas C1–C3 in Appendix C of the Supplementary Material, Assumption (A5) holds for commonly used smoothing functions such as (i) the cumulative distribution function of standard normal distribution K(u)=Φ(u)𝐾𝑢Φ𝑢K(u)=\Phi(u)italic_K ( italic_u ) = roman_Φ ( italic_u ), (ii) the sigmoid function K(u)={1+exp(u)}1𝐾𝑢superscript1𝑢1K(u)=\{1+\exp(-u)\}^{-1}italic_K ( italic_u ) = { 1 + roman_exp ( - italic_u ) } start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT, and (iii) the function K(u)=Φ(u)+uϕ(u)𝐾𝑢Φ𝑢𝑢italic-ϕ𝑢K(u)=\Phi(u)+u\phi(u)italic_K ( italic_u ) = roman_Φ ( italic_u ) + italic_u italic_ϕ ( italic_u ); see Horowitz, (1993); Zhang et al., (2022).

Theorem 1.

Let 𝛉=(𝛂T,𝛃T)T𝛉superscriptsuperscript𝛂Tsuperscript𝛃TT\boldsymbol{\theta}=(\boldsymbol{\alpha}^{\rm T},\boldsymbol{\beta}^{\rm T})^{% \rm T}bold_italic_θ = ( bold_italic_α start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , bold_italic_β start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT. If Assumptions (A1)–(A4) hold, then for some δ0𝛿0\delta\geq 0italic_δ ≥ 0, the minimizer 𝛇τ=((𝛉τ)T,(𝛈τ)T)Tsuperscriptsubscript𝛇𝜏superscriptsuperscriptsuperscriptsubscript𝛉𝜏Tsuperscriptsuperscriptsubscript𝛈𝜏TT\boldsymbol{\zeta}_{\tau}^{*}=((\boldsymbol{\theta}_{\tau}^{*})^{\rm T},(% \boldsymbol{\eta}_{\tau}^{*})^{\rm T})^{\rm T}bold_italic_ζ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = ( ( bold_italic_θ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , ( bold_italic_η start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT given in (9) satisfies

𝜽τ𝜽5K12Mδ+C𝜽2+δK12+3δ/2K0(1δF)(1+δ)τ1+δnormsuperscriptsubscript𝜽𝜏superscript𝜽5subscript𝐾12subscript𝑀𝛿𝐶superscriptnormsuperscript𝜽2𝛿superscriptsubscript𝐾123𝛿2subscript𝐾01subscript𝛿𝐹1𝛿superscript𝜏1𝛿\displaystyle\|\boldsymbol{\theta}_{\tau}^{*}-\boldsymbol{\theta}^{*}\|\leq 5K% _{1}\frac{2M_{\delta}+C\|\boldsymbol{\theta}^{*}\|^{2+\delta}K_{1}^{2+3\delta/% 2}}{K_{0}(1-\delta_{F})(1+\delta)\tau^{1+\delta}}∥ bold_italic_θ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ ≤ 5 italic_K start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT divide start_ARG 2 italic_M start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT + italic_C ∥ bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ start_POSTSUPERSCRIPT 2 + italic_δ end_POSTSUPERSCRIPT italic_K start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 + 3 italic_δ / 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_K start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( 1 - italic_δ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT ) ( 1 + italic_δ ) italic_τ start_POSTSUPERSCRIPT 1 + italic_δ end_POSTSUPERSCRIPT end_ARG

and

𝜼τ𝜼normsuperscriptsubscript𝜼𝜏superscript𝜼absent\displaystyle\|\boldsymbol{\eta}_{\tau}^{*}-\boldsymbol{\eta}^{*}\|\leq∥ bold_italic_η start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_η start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ ≤ 160K12K0δf0δu2𝜷2{2Mδ+C𝜽2+δK12+3δ/2K0(1δF)(1+δ)τ1+δ}2,160superscriptsubscript𝐾12subscript𝐾0subscript𝛿subscript𝑓0subscript𝛿subscript𝑢2superscriptnormsuperscript𝜷2superscript2subscript𝑀𝛿𝐶superscriptnormsuperscript𝜽2𝛿superscriptsubscript𝐾123𝛿2subscript𝐾01subscript𝛿𝐹1𝛿superscript𝜏1𝛿2\displaystyle\frac{160K_{1}^{2}}{K_{0}\delta_{f_{0}}\delta_{u_{2}}\|% \boldsymbol{\beta}^{*}\|^{2}}\left\{\frac{2M_{\delta}+C\|\boldsymbol{\theta}^{% *}\|^{2+\delta}K_{1}^{2+3\delta/2}}{K_{0}(1-\delta_{F})(1+\delta)\tau^{1+% \delta}}\right\}^{2},divide start_ARG 160 italic_K start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_K start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT italic_δ start_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_δ start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∥ bold_italic_β start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG { divide start_ARG 2 italic_M start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT + italic_C ∥ bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ start_POSTSUPERSCRIPT 2 + italic_δ end_POSTSUPERSCRIPT italic_K start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 + 3 italic_δ / 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_K start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( 1 - italic_δ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT ) ( 1 + italic_δ ) italic_τ start_POSTSUPERSCRIPT 1 + italic_δ end_POSTSUPERSCRIPT end_ARG } start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ,

where C>0𝐶0C>0italic_C > 0 is a constant.

Theorem 1 states that the Huber error is of order τ(1+δ)superscript𝜏1𝛿\tau^{-(1+\delta)}italic_τ start_POSTSUPERSCRIPT - ( 1 + italic_δ ) end_POSTSUPERSCRIPT for 𝜽τ𝜽normsuperscriptsubscript𝜽𝜏superscript𝜽\|\boldsymbol{\theta}_{\tau}^{*}-\boldsymbol{\theta}^{*}\|∥ bold_italic_θ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ and of order τ(2+2δ)superscript𝜏22𝛿\tau^{-(2+2\delta)}italic_τ start_POSTSUPERSCRIPT - ( 2 + 2 italic_δ ) end_POSTSUPERSCRIPT for 𝜼𝜼τnormsuperscript𝜼superscriptsubscript𝜼𝜏\|\boldsymbol{\eta}^{*}-\boldsymbol{\eta}_{\tau}^{*}\|∥ bold_italic_η start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_η start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥. As τ𝜏\tauitalic_τ tends to infinity, because the Huber loss becomes the ordinary quadratic loss, the Huber error vanishes as expected. The next theorem studies the smooth error and Huber loss together.

Theorem 2.

If Assumptions (A1)–(A5) hold, then for some δ0𝛿0\delta\geq 0italic_δ ≥ 0 and h=o(1)𝑜1h=o(1)italic_h = italic_o ( 1 ), the minimizer 𝛇τ,h=((𝛉τ,h)T,(𝛈τ,h)T)Tsuperscriptsubscript𝛇𝜏superscriptsuperscriptsuperscriptsubscript𝛉𝜏Tsuperscriptsuperscriptsubscript𝛈𝜏TT\boldsymbol{\zeta}_{\tau,h}^{*}=((\boldsymbol{\theta}_{\tau,h}^{*})^{\rm T},(% \boldsymbol{\eta}_{\tau,h}^{*})^{\rm T})^{\rm T}bold_italic_ζ start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = ( ( bold_italic_θ start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , ( bold_italic_η start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT given in (8) satisfies

𝜽τ,h𝜽16K1{2Mδ+C𝜽2+δK12+3δ/2K0(1δF)(1+δ)τ1+δ+κfκk𝜷K0(1δF)h}normsuperscriptsubscript𝜽𝜏superscript𝜽16subscript𝐾12subscript𝑀𝛿𝐶superscriptnormsuperscript𝜽2𝛿superscriptsubscript𝐾123𝛿2subscript𝐾01subscript𝛿𝐹1𝛿superscript𝜏1𝛿subscript𝜅𝑓subscript𝜅𝑘normsuperscript𝜷subscript𝐾01subscript𝛿𝐹\displaystyle\|\boldsymbol{\theta}_{\tau,h}^{*}-\boldsymbol{\theta}^{*}\|\leq 1% 6K_{1}\left\{\frac{2M_{\delta}+C\|\boldsymbol{\theta}^{*}\|^{2+\delta}K_{1}^{2% +3\delta/2}}{K_{0}(1-\delta_{F})(1+\delta)\tau^{1+\delta}}+\frac{\kappa_{f}% \kappa_{k}\|\boldsymbol{\beta}^{*}\|}{K_{0}(1-\delta_{F})}h\right\}∥ bold_italic_θ start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ ≤ 16 italic_K start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT { divide start_ARG 2 italic_M start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT + italic_C ∥ bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ start_POSTSUPERSCRIPT 2 + italic_δ end_POSTSUPERSCRIPT italic_K start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 + 3 italic_δ / 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_K start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( 1 - italic_δ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT ) ( 1 + italic_δ ) italic_τ start_POSTSUPERSCRIPT 1 + italic_δ end_POSTSUPERSCRIPT end_ARG + divide start_ARG italic_κ start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT italic_κ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ bold_italic_β start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ end_ARG start_ARG italic_K start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( 1 - italic_δ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT ) end_ARG italic_h }

and

𝜼τ,h𝜼642K12κfκkδf0δu2min{𝜷,𝜷2}{2Mδ+C𝜽2+δK12+3δ/2K0(1δF)(1+δ)τ1+δ+κfκk𝜷K0(1δF)h}2,normsuperscriptsubscript𝜼𝜏superscript𝜼superscript642superscriptsubscript𝐾12subscript𝜅𝑓subscript𝜅𝑘subscript𝛿subscript𝑓0subscript𝛿subscript𝑢2normsuperscript𝜷superscriptnormsuperscript𝜷2superscript2subscript𝑀𝛿𝐶superscriptnormsuperscript𝜽2𝛿superscriptsubscript𝐾123𝛿2subscript𝐾01subscript𝛿𝐹1𝛿superscript𝜏1𝛿subscript𝜅𝑓subscript𝜅𝑘normsuperscript𝜷subscript𝐾01subscript𝛿𝐹2\displaystyle\|\boldsymbol{\eta}_{\tau,h}^{*}-\boldsymbol{\eta}^{*}\|\leq\frac% {64^{2}K_{1}^{2}\kappa_{f}\kappa_{k}}{\delta_{f_{0}}\delta_{u_{2}}\min\{\|% \boldsymbol{\beta}^{*}\|,\|\boldsymbol{\beta}^{*}\|^{2}\}}\left\{\frac{2M_{% \delta}+C\|\boldsymbol{\theta}^{*}\|^{2+\delta}K_{1}^{2+3\delta/2}}{K_{0}(1-% \delta_{F})(1+\delta)\tau^{1+\delta}}+\frac{\kappa_{f}\kappa_{k}\|\boldsymbol{% \beta}^{*}\|}{K_{0}(1-\delta_{F})}h\right\}^{2},∥ bold_italic_η start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_η start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ ≤ divide start_ARG 64 start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_K start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_κ start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT italic_κ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG start_ARG italic_δ start_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_δ start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT roman_min { ∥ bold_italic_β start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ , ∥ bold_italic_β start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT } end_ARG { divide start_ARG 2 italic_M start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT + italic_C ∥ bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ start_POSTSUPERSCRIPT 2 + italic_δ end_POSTSUPERSCRIPT italic_K start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 + 3 italic_δ / 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_K start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( 1 - italic_δ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT ) ( 1 + italic_δ ) italic_τ start_POSTSUPERSCRIPT 1 + italic_δ end_POSTSUPERSCRIPT end_ARG + divide start_ARG italic_κ start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT italic_κ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ bold_italic_β start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ end_ARG start_ARG italic_K start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( 1 - italic_δ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT ) end_ARG italic_h } start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ,

where C>0𝐶0C>0italic_C > 0 is a constant.

Theorem 2 reveals that 𝜽τ,h𝜽normsuperscriptsubscript𝜽𝜏superscript𝜽\|\boldsymbol{\theta}_{\tau,h}^{*}-\boldsymbol{\theta}^{*}\|∥ bold_italic_θ start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ and 𝜼τ,h𝜼normsuperscriptsubscript𝜼𝜏superscript𝜼\|\boldsymbol{\eta}_{\tau,h}^{*}-\boldsymbol{\eta}^{*}\|∥ bold_italic_η start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_η start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ are associated with both the Huber error and the smooth error. The smoothness parameter hhitalic_h can be of order τ(1+δ)superscript𝜏1𝛿\tau^{-(1+\delta)}italic_τ start_POSTSUPERSCRIPT - ( 1 + italic_δ ) end_POSTSUPERSCRIPT, and because Kh(t)subscript𝐾𝑡K_{h}(t)italic_K start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ( italic_t ) approximates 𝟏(t0)1𝑡0{\boldsymbol{1}}(t\geq 0)bold_1 ( italic_t ≥ 0 ) as hhitalic_h tends to zero, the upper bounds in Theorem 2 are of the same order as those in Theorem 1 when h00h\rightarrow 0italic_h → 0. Thus, the deviations 𝜽τ,h𝜽normsuperscriptsubscript𝜽𝜏superscript𝜽\|\boldsymbol{\theta}_{\tau,h}^{*}-\boldsymbol{\theta}^{*}\|∥ bold_italic_θ start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ and 𝜼τ,h𝜼normsuperscriptsubscript𝜼𝜏superscript𝜼\|\boldsymbol{\eta}_{\tau,h}^{*}-\boldsymbol{\eta}^{*}\|∥ bold_italic_η start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - bold_italic_η start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ are sacrifices in pursuit of robustification and smoothness. The next theorem provides the exponential-type deviations for the baseline and grouping difference parameters and grouping parameter.

Theorem 3.

If Assumptions (A1)–(A5) hold, then for some δ𝛿\deltaitalic_δ and any t>0𝑡0t>0italic_t > 0, when h2n/(p+q+r)superscript2𝑛𝑝𝑞𝑟h^{2}n/(p+q+r)\rightarrow\inftyitalic_h start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_n / ( italic_p + italic_q + italic_r ) → ∞ and h=o(1)𝑜1h=o(1)italic_h = italic_o ( 1 ), the estimator 𝛇^τ,h=(𝛉^τ,hT,𝛈^τ,hT)Tsubscript^𝛇𝜏superscriptsuperscriptsubscript^𝛉𝜏Tsuperscriptsubscript^𝛈𝜏TT\hat{\boldsymbol{\zeta}}_{\tau,h}=(\hat{\boldsymbol{\theta}}_{\tau,h}^{\rm T},% \hat{\boldsymbol{\eta}}_{\tau,h}^{\rm T})^{\rm T}over^ start_ARG bold_italic_ζ end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT = ( over^ start_ARG bold_italic_θ end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_η end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT given in (10) satisfies, with probability at least 121exp(t)121𝑡1-21\exp(-t)1 - 21 roman_exp ( - italic_t ),

𝜽^τ,h𝜽32K0(1κF)a(n,τ)delimited-∥∥subscript^𝜽𝜏superscript𝜽32subscript𝐾01subscript𝜅𝐹𝑎𝑛𝜏\displaystyle\begin{split}\|\hat{\boldsymbol{\theta}}_{\tau,h}-\boldsymbol{% \theta}^{*}\|\leq&\frac{32}{K_{0}(1-\kappa_{F})}a(n,\tau)\end{split}start_ROW start_CELL ∥ over^ start_ARG bold_italic_θ end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT - bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ ≤ end_CELL start_CELL divide start_ARG 32 end_ARG start_ARG italic_K start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( 1 - italic_κ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT ) end_ARG italic_a ( italic_n , italic_τ ) end_CELL end_ROW (12)

and

𝜼^τ,h𝜼32K0𝜷δf0δkK0(1κF)h1/2a(n,τ),delimited-∥∥subscript^𝜼𝜏superscript𝜼32subscript𝐾0normsuperscript𝜷subscript𝛿subscript𝑓0subscript𝛿𝑘subscript𝐾01subscript𝜅𝐹superscript12𝑎𝑛𝜏\displaystyle\begin{split}\|\hat{\boldsymbol{\eta}}_{\tau,h}-\boldsymbol{\eta}% ^{*}\|\leq&\frac{32}{K_{0}\|\boldsymbol{\beta}^{*}\|\sqrt{\delta_{f_{0}}\delta% _{k}K_{0}(1-\kappa_{F})}}h^{1/2}a(n,\tau),\end{split}start_ROW start_CELL ∥ over^ start_ARG bold_italic_η end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT - bold_italic_η start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ ≤ end_CELL start_CELL divide start_ARG 32 end_ARG start_ARG italic_K start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ∥ bold_italic_β start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ square-root start_ARG italic_δ start_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_K start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( 1 - italic_κ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT ) end_ARG end_ARG italic_h start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT italic_a ( italic_n , italic_τ ) , end_CELL end_ROW (13)

where ν0subscript𝜈0\nu_{0}italic_ν start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT is a constant depending on only the constants {κF,κf,κk,K1,𝛏}subscript𝜅𝐹subscript𝜅𝑓subscript𝜅𝑘subscript𝐾1normsuperscript𝛏\{\kappa_{F},\kappa_{f},\kappa_{k},K_{1},\|\boldsymbol{\xi}^{*}\|\}{ italic_κ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT , italic_κ start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT , italic_κ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_K start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , ∥ bold_italic_ξ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ }, and

a(n,τ)=3ν0(p+q+r+2t)/n+22+δMδ{(3+δ)K1𝜷}2+δτ1+δ.𝑎𝑛𝜏3subscript𝜈0𝑝𝑞𝑟2𝑡𝑛superscript22𝛿subscript𝑀𝛿superscript3𝛿subscript𝐾1normsuperscript𝜷2𝛿superscript𝜏1𝛿\displaystyle a(n,\tau)=\sqrt{3\nu_{0}(p+q+r+2t)/n}+\frac{2^{2+\delta}M_{% \delta}\{(3+\delta)K_{1}\|\boldsymbol{\beta}^{*}\|\}^{2+\delta}}{\tau^{1+% \delta}}.italic_a ( italic_n , italic_τ ) = square-root start_ARG 3 italic_ν start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( italic_p + italic_q + italic_r + 2 italic_t ) / italic_n end_ARG + divide start_ARG 2 start_POSTSUPERSCRIPT 2 + italic_δ end_POSTSUPERSCRIPT italic_M start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT { ( 3 + italic_δ ) italic_K start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∥ bold_italic_β start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ } start_POSTSUPERSCRIPT 2 + italic_δ end_POSTSUPERSCRIPT end_ARG start_ARG italic_τ start_POSTSUPERSCRIPT 1 + italic_δ end_POSTSUPERSCRIPT end_ARG .

With appropriate choice of τ𝜏\tauitalic_τ with δ=0𝛿0\delta=0italic_δ = 0, such as τ=O((n/t)1/2)𝜏𝑂superscript𝑛𝑡12\tau=O((n/t)^{1/2})italic_τ = italic_O ( ( italic_n / italic_t ) start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT ) with t=log(n)𝑡𝑛t=\log(n)italic_t = roman_log ( italic_n ), the nonasymptotic property of the sub-Gaussian estimator 𝜽^τ,hsubscript^𝜽𝜏\hat{\boldsymbol{\theta}}_{\tau,h}over^ start_ARG bold_italic_θ end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT in Theorem 3 demonstrates that the deviation 𝜽^τ,h𝜽normsubscript^𝜽𝜏superscript𝜽\|\hat{\boldsymbol{\theta}}_{\tau,h}-\boldsymbol{\theta}^{*}\|∥ over^ start_ARG bold_italic_θ end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT - bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ adapts to the sample size, dimension, robustification parameter τ𝜏\tauitalic_τ, and moments in pursuit of the optimal tradeoff between bias and robustness, and the deviation 𝜼^τ,h𝜼normsubscript^𝜼𝜏superscript𝜼\|\hat{\boldsymbol{\eta}}_{\tau,h}-\boldsymbol{\eta}^{*}\|∥ over^ start_ARG bold_italic_η end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT - bold_italic_η start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ adapts to the extra smooth parameter hhitalic_h to achieve smoothness. Adaptation to the robustification parameter τ𝜏\tauitalic_τ is caused by pursuing the robustness for linear regression with heavy-tailed errors. The deviations 𝜽^τ,h𝜽normsubscript^𝜽𝜏superscript𝜽\|\hat{\boldsymbol{\theta}}_{\tau,h}-\boldsymbol{\theta}^{*}\|∥ over^ start_ARG bold_italic_θ end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT - bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ and 𝜼^τ,h𝜼normsubscript^𝜼𝜏superscript𝜼\|\hat{\boldsymbol{\eta}}_{\tau,h}-\boldsymbol{\eta}^{*}\|∥ over^ start_ARG bold_italic_η end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT - bold_italic_η start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ coincide with the smoothed OLS estimator when τ𝜏\tau\rightarrow\inftyitalic_τ → ∞, because a(n,τ)=3ν0(p+q+r+2t)/n𝑎𝑛𝜏3subscript𝜈0𝑝𝑞𝑟2𝑡𝑛a(n,\tau)=\sqrt{3\nu_{0}(p+q+r+2t)/n}italic_a ( italic_n , italic_τ ) = square-root start_ARG 3 italic_ν start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( italic_p + italic_q + italic_r + 2 italic_t ) / italic_n end_ARG as τ𝜏\tau\rightarrow\inftyitalic_τ → ∞. The next theorem derives the nonasymptotic Bahadur representation of the robust estimators given in (10).

Theorem 4.

Let 𝐖(𝛈)=(𝐗T,𝐙T𝟏(ω¯(𝛈)0))T𝐖𝛈superscriptsuperscript𝐗Tsuperscript𝐙T1¯𝜔𝛈0T{\boldsymbol{W}}(\boldsymbol{\eta})=({\boldsymbol{X}}^{\rm T},{\boldsymbol{Z}}% ^{\rm T}{\boldsymbol{1}}(\bar{\omega}(\boldsymbol{\eta})\geq 0))^{\rm T}bold_italic_W ( bold_italic_η ) = ( bold_italic_X start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , bold_italic_Z start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_1 ( over¯ start_ARG italic_ω end_ARG ( bold_italic_η ) ≥ 0 ) ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT and 𝐖~h(𝛈)=(𝐗T,𝐙TKh(ω¯(𝛈))T{\widetilde{\boldsymbol{W}}}_{h}(\boldsymbol{\eta})=({\boldsymbol{X}}^{\rm T},% {\boldsymbol{Z}}^{\rm T}K_{h}(\bar{\omega}(\boldsymbol{\eta}))^{\rm T}over~ start_ARG bold_italic_W end_ARG start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ( bold_italic_η ) = ( bold_italic_X start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , bold_italic_Z start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT italic_K start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ( over¯ start_ARG italic_ω end_ARG ( bold_italic_η ) ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT, where ω¯(𝛈)=U1+𝐔2Tβ¯𝜔𝛈subscript𝑈1superscriptsubscript𝐔2T𝛽\bar{\omega}(\boldsymbol{\eta})=U_{1}+{\boldsymbol{U}}_{2}^{\rm T}\betaover¯ start_ARG italic_ω end_ARG ( bold_italic_η ) = italic_U start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT italic_β. If the assumptions in Theorem 3 are satisfied, then with probability at least 132exp(t)132𝑡1-32\exp(-t)1 - 32 roman_exp ( - italic_t ) we have

ΣW1/2(𝜽^τ,h𝜽)ΣW1/2n{𝑾~h(𝜼)ψτ(y𝑾~h(𝜼)T𝜽)}{ν1(p+q+1+2t)/n+ν2h1/2}a(n,τ)delimited-∥∥superscriptsubscriptΣ𝑊12subscript^𝜽𝜏superscript𝜽superscriptsubscriptΣ𝑊12subscript𝑛subscript~𝑾superscript𝜼subscript𝜓𝜏𝑦subscript~𝑾superscriptsuperscript𝜼Tsuperscript𝜽subscript𝜈1𝑝𝑞12𝑡𝑛subscript𝜈2superscript12𝑎𝑛𝜏\displaystyle\begin{split}\bigg{\|}\Sigma_{W}^{1/2}(\hat{\boldsymbol{\theta}}_% {\tau,h}-\boldsymbol{\theta}^{*})-&\Sigma_{W}^{-1/2}\mathbb{P}_{n}\left\{{% \widetilde{\boldsymbol{W}}}_{h}(\boldsymbol{\eta}^{*})\psi_{\tau}(y-{% \widetilde{\boldsymbol{W}}}_{h}(\boldsymbol{\eta}^{*})^{\rm T}\boldsymbol{% \theta}^{*})\right\}\bigg{\|}\\ \leq&\left\{\nu_{1}\sqrt{(p+q+1+2t)/n}+\nu_{2}h^{1/2}\right\}a(n,\tau)\end{split}start_ROW start_CELL ∥ roman_Σ start_POSTSUBSCRIPT italic_W end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT ( over^ start_ARG bold_italic_θ end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT - bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) - end_CELL start_CELL roman_Σ start_POSTSUBSCRIPT italic_W end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT blackboard_P start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT { over~ start_ARG bold_italic_W end_ARG start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ( bold_italic_η start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) italic_ψ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_y - over~ start_ARG bold_italic_W end_ARG start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ( bold_italic_η start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) } ∥ end_CELL end_ROW start_ROW start_CELL ≤ end_CELL start_CELL { italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT square-root start_ARG ( italic_p + italic_q + 1 + 2 italic_t ) / italic_n end_ARG + italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_h start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT } italic_a ( italic_n , italic_τ ) end_CELL end_ROW (14)

and

ΣU21/2(𝜼^τ,h𝜼)hΣU21/2n{𝑼2𝒁T𝜷Kh(U1+𝑼2T𝜼)ψτ(y𝑾~h(𝜼)T𝜽)}{ν3(r+2t)/(nh)+ν4(h+τ(1+δ))}h1/2a(n,τ),delimited-∥∥superscriptsubscriptΣsubscript𝑈212subscript^𝜼𝜏superscript𝜼superscriptsubscriptΣsubscript𝑈212subscript𝑛subscript𝑼2superscript𝒁Tsuperscript𝜷subscript𝐾subscript𝑈1superscriptsubscript𝑼2Tsuperscript𝜼subscript𝜓𝜏𝑦subscript~𝑾superscriptsuperscript𝜼Tsuperscript𝜽subscript𝜈3𝑟2𝑡𝑛subscript𝜈4superscript𝜏1𝛿superscript12𝑎𝑛𝜏\displaystyle\begin{split}\bigg{\|}\Sigma_{U_{2}}^{1/2}(\hat{\boldsymbol{\eta}% }_{\tau,h}-\boldsymbol{\eta}^{*})-&h\Sigma_{U_{2}}^{-1/2}\mathbb{P}_{n}\left\{% {\boldsymbol{U}}_{2}{\boldsymbol{Z}}^{\rm T}\boldsymbol{\beta}^{*}K_{h}(U_{1}+% {\boldsymbol{U}}_{2}^{\rm T}\boldsymbol{\eta}^{*})\psi_{\tau}(y-{\widetilde{% \boldsymbol{W}}}_{h}(\boldsymbol{\eta}^{*})^{\rm T}\boldsymbol{\theta}^{*})% \right\}\bigg{\|}\\ \leq&\left\{\nu_{3}\sqrt{(r+2t)/(nh)}+\nu_{4}\left(h+\tau^{-(1+\delta)}\right)% \right\}h^{1/2}a(n,\tau),\end{split}start_ROW start_CELL ∥ roman_Σ start_POSTSUBSCRIPT italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT ( over^ start_ARG bold_italic_η end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT - bold_italic_η start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) - end_CELL start_CELL italic_h roman_Σ start_POSTSUBSCRIPT italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT blackboard_P start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT { bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT bold_italic_Z start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_β start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT italic_K start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ( italic_U start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_η start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) italic_ψ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_y - over~ start_ARG bold_italic_W end_ARG start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ( bold_italic_η start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) } ∥ end_CELL end_ROW start_ROW start_CELL ≤ end_CELL start_CELL { italic_ν start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT square-root start_ARG ( italic_r + 2 italic_t ) / ( italic_n italic_h ) end_ARG + italic_ν start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT ( italic_h + italic_τ start_POSTSUPERSCRIPT - ( 1 + italic_δ ) end_POSTSUPERSCRIPT ) } italic_h start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT italic_a ( italic_n , italic_τ ) , end_CELL end_ROW (15)

where a(n,τ)𝑎𝑛𝜏a(n,\tau)italic_a ( italic_n , italic_τ ) is as defined in Theorem 3, ν1>0subscript𝜈10\nu_{1}>0italic_ν start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > 0, ν2>0subscript𝜈20\nu_{2}>0italic_ν start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT > 0, ν3>0subscript𝜈30\nu_{3}>0italic_ν start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT > 0, and ν4>0subscript𝜈40\nu_{4}>0italic_ν start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT > 0 are constants depending on only the constants {κF,κf,κk,K0,K1,𝛏}subscript𝜅𝐹subscript𝜅𝑓subscript𝜅𝑘subscript𝐾0subscript𝐾1normsuperscript𝛏\{\kappa_{F},\kappa_{f},\kappa_{k},K_{0},K_{1},\|\boldsymbol{\xi}^{*}\|\}{ italic_κ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT , italic_κ start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT , italic_κ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_K start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_K start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , ∥ bold_italic_ξ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ }, and

ΣW=𝐏{𝑾(𝜼)𝑾(𝜼)T},ΣU2=K(0)𝐏{fω¯|𝒗¯(0)(𝒁T𝜷)2𝑼2𝑼2T}.formulae-sequencesubscriptΣ𝑊𝐏𝑾superscript𝜼𝑾superscriptsuperscript𝜼TsubscriptΣsubscript𝑈2superscript𝐾0𝐏subscript𝑓conditional¯𝜔¯𝒗0superscriptsuperscript𝒁Tsuperscript𝜷2subscript𝑼2superscriptsubscript𝑼2T\displaystyle\Sigma_{W}=\boldsymbol{\mathrm{P}}\left\{{\boldsymbol{W}}(% \boldsymbol{\eta}^{*}){\boldsymbol{W}}(\boldsymbol{\eta}^{*})^{\rm T}\right\},% \quad\Sigma_{U_{2}}=K^{\prime}(0)\boldsymbol{\mathrm{P}}\left\{f_{\bar{\omega}% |{\bar{\boldsymbol{v}}}}(0)({\boldsymbol{Z}}^{\rm T}\boldsymbol{\beta}^{*})^{2% }{\boldsymbol{U}}_{2}{\boldsymbol{U}}_{2}^{\rm T}\right\}.roman_Σ start_POSTSUBSCRIPT italic_W end_POSTSUBSCRIPT = bold_P { bold_italic_W ( bold_italic_η start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) bold_italic_W ( bold_italic_η start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT } , roman_Σ start_POSTSUBSCRIPT italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT = italic_K start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( 0 ) bold_P { italic_f start_POSTSUBSCRIPT over¯ start_ARG italic_ω end_ARG | over¯ start_ARG bold_italic_v end_ARG end_POSTSUBSCRIPT ( 0 ) ( bold_italic_Z start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_β start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT bold_italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT } .

Theorem 4 shows that the remainder of the Bahadur representation of 𝜽^τ,h𝜽normsubscript^𝜽𝜏superscript𝜽\|\hat{\boldsymbol{\theta}}_{\tau,h}-\boldsymbol{\theta}^{*}\|∥ over^ start_ARG bold_italic_θ end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT - bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ achieves the rate h1/2a(n,τ)superscript12𝑎𝑛𝜏h^{1/2}a(n,\tau)italic_h start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT italic_a ( italic_n , italic_τ ), which is the same as that for 𝜼^τ,h𝜼normsubscript^𝜼𝜏superscript𝜼\|\hat{\boldsymbol{\eta}}_{\tau,h}-\boldsymbol{\eta}^{*}\|∥ over^ start_ARG bold_italic_η end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT - bold_italic_η start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥. Because of the rate restriction h2n/(p+q+r)superscript2𝑛𝑝𝑞𝑟h^{2}n/(p+q+r)\rightarrow\inftyitalic_h start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_n / ( italic_p + italic_q + italic_r ) → ∞ and h=o(1)𝑜1h=o(1)italic_h = italic_o ( 1 ), the remainder of the Bahadur representation in inequality (15) does not exhibit subexponential behavior as considered by Sun et al., (2020); Chen and Zhou, (2020). This reason is that there is a change plane involved a smooth function in model (1). To the best of the authors’ knowledge, this is the first time that this type of nonasymptotic Bahadur representation has been reported in the literature, especially for the robust estimator 𝜼^τ,hsubscript^𝜼𝜏\hat{\boldsymbol{\eta}}_{\tau,h}over^ start_ARG bold_italic_η end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT of the grouping parameter, with previous studies reporting only polynomial-type deviation bounds; see Liu et al., (2024); Zhang et al., (2022). It is convenient to derive the classical asymptotic results from the Bahadur representation, and Theorems 3 and 4 show that the robustification parameter τ𝜏\tauitalic_τ and the smoothness parameter hhitalic_h play the same role as bandwidth in constructing classical nonparametric estimators.

2.3 Implementation

The theoretical properties in Section 2.2 guarantee that the robust estimation performs well with appropriate choices of the robustification parameter τ𝜏\tauitalic_τ and the smoothness parameter hhitalic_h. Because the robustification parameter τ𝜏\tauitalic_τ is treated as a tuning parameter to balance bias and robustness, it is natural to consider using the cross-validation (CV) method to select an appropriate τ𝜏\tauitalic_τ in practice. However, as noted by Chen and Zhou, (2020) and Wang et al., (2021), because Mδ=E(|ϵ|2+δ)subscript𝑀𝛿Esuperscriptitalic-ϵ2𝛿M_{\delta}=\textrm{E}(|\epsilon|^{2+\delta})italic_M start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT = E ( | italic_ϵ | start_POSTSUPERSCRIPT 2 + italic_δ end_POSTSUPERSCRIPT ) is typically unknown in practice, its empirical OLS estimator M^δ=(npq)1i=1n(yi𝑿iT𝜶^τ,h𝒁iT𝜷^τ,h𝟏(𝑼iTγ^τ,h0))2+δsubscript^𝑀𝛿superscript𝑛𝑝𝑞1superscriptsubscript𝑖1𝑛superscriptsubscript𝑦𝑖superscriptsubscript𝑿𝑖Tsubscript^𝜶𝜏superscriptsubscript𝒁𝑖Tsubscript^𝜷𝜏1superscriptsubscript𝑼𝑖Tsubscript^𝛾𝜏02𝛿\hat{M}_{\delta}=(n-p-q)^{-1}\sum_{i=1}^{n}(y_{i}-{\boldsymbol{X}}_{i}^{\rm T}% \hat{\boldsymbol{\alpha}}_{\tau,h}-{\boldsymbol{Z}}_{i}^{\rm T}\hat{% \boldsymbol{\beta}}_{\tau,h}{\boldsymbol{1}}({\boldsymbol{U}}_{i}^{\rm T}\hat{% \gamma}_{\tau,h}\geq 0))^{2+\delta}over^ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT = ( italic_n - italic_p - italic_q ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ( italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT over^ start_ARG bold_italic_α end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT - bold_italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT over^ start_ARG bold_italic_β end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT bold_1 ( bold_italic_U start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT over^ start_ARG italic_γ end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT ≥ 0 ) ) start_POSTSUPERSCRIPT 2 + italic_δ end_POSTSUPERSCRIPT is poor when the errors are heavy-tailed. Instead, there are two good alternatives (Chen and Zhou, 2020): (i) an adaptive technique based on Lepski’s method Lepskii, (1992) and (ii) a Huber-type method by solving a so-called censored equation Hahn et al., (1990); see Chen and Zhou, (2020); Wang et al., (2021) for details. For the smoothness parameter hhitalic_h, the CV method is always a natural choice for selecting an appropriate one. Alternatively, from the theoretical conditions on the smoothness parameter hhitalic_h, a rule of thumb hn=chσ^ulog(n)/nsubscript𝑛subscript𝑐subscript^𝜎𝑢𝑛𝑛h_{n}=c_{h}\hat{\sigma}_{u}\log(n)/\sqrt{n}italic_h start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = italic_c start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT roman_log ( italic_n ) / square-root start_ARG italic_n end_ARG suggested by Seo and Linton, (2007), Li et al., (2021), and Zhang et al., (2022) is recommended by considering the computation reduction, where chsubscript𝑐c_{h}italic_c start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT is a constant and σ^u=(nr)1i=1n(U1i+𝑼2iT𝜼^ols)2subscript^𝜎𝑢superscript𝑛𝑟1superscriptsubscript𝑖1𝑛superscriptsubscript𝑈1𝑖superscriptsubscript𝑼2𝑖Tsuperscript^𝜼𝑜𝑙𝑠2\hat{\sigma}_{u}=\sqrt{(n-r)^{-1}\sum_{i=1}^{n}(U_{1i}+{\boldsymbol{U}}_{2i}^{% \rm T}\hat{\boldsymbol{\eta}}^{ols})^{2}}over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT = square-root start_ARG ( italic_n - italic_r ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ( italic_U start_POSTSUBSCRIPT 1 italic_i end_POSTSUBSCRIPT + bold_italic_U start_POSTSUBSCRIPT 2 italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT over^ start_ARG bold_italic_η end_ARG start_POSTSUPERSCRIPT italic_o italic_l italic_s end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG is the estimated standard deviation of 𝑼T𝜸superscript𝑼T𝜸{\boldsymbol{U}}^{\rm T}\boldsymbol{\gamma}bold_italic_U start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_γ, with (𝜶^ols,𝜷^ols,𝜼^ols)superscript^𝜶𝑜𝑙𝑠superscript^𝜷𝑜𝑙𝑠superscript^𝜼𝑜𝑙𝑠(\hat{\boldsymbol{\alpha}}^{ols},\hat{\boldsymbol{\beta}}^{ols},\hat{% \boldsymbol{\eta}}^{ols})( over^ start_ARG bold_italic_α end_ARG start_POSTSUPERSCRIPT italic_o italic_l italic_s end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_β end_ARG start_POSTSUPERSCRIPT italic_o italic_l italic_s end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_η end_ARG start_POSTSUPERSCRIPT italic_o italic_l italic_s end_POSTSUPERSCRIPT ) being the estimator of (𝜶,𝜷,𝜼)𝜶𝜷𝜼(\boldsymbol{\alpha},\boldsymbol{\beta},\boldsymbol{\eta})( bold_italic_α , bold_italic_β , bold_italic_η ) by using ordinary quadratic loss.

Attention now turns to the implementation for subgroup-classifier learning, with the parameters 𝜽=(𝜶T,𝜷T)T𝜽superscriptsuperscript𝜶Tsuperscript𝜷TT\boldsymbol{\theta}=(\boldsymbol{\alpha}^{\rm T},\boldsymbol{\beta}^{\rm T})^{% \rm T}bold_italic_θ = ( bold_italic_α start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , bold_italic_β start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT and 𝜼𝜼\boldsymbol{\eta}bold_italic_η estimated iteratively as follows. Let τ(𝜶,𝜷,𝜼)subscript𝜏𝜶𝜷𝜼\ell_{\tau}(\boldsymbol{\alpha},\boldsymbol{\beta},\boldsymbol{\eta})roman_ℓ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( bold_italic_α , bold_italic_β , bold_italic_η ) be the loss function for model (1), i.e.,

τ(𝜶,𝜷,𝜼)=i=1nLτ(yi𝑿iT𝜶𝒁iT𝜷𝟏(U1i+𝑼2iT𝜼0)),subscript𝜏𝜶𝜷𝜼superscriptsubscript𝑖1𝑛subscript𝐿𝜏subscript𝑦𝑖superscriptsubscript𝑿𝑖T𝜶superscriptsubscript𝒁𝑖T𝜷1subscript𝑈1𝑖superscriptsubscript𝑼2𝑖T𝜼0\displaystyle\ell_{\tau}(\boldsymbol{\alpha},\boldsymbol{\beta},\boldsymbol{% \eta})=\sum_{i=1}^{n}L_{\tau}\left(y_{i}-{\boldsymbol{X}}_{i}^{\rm T}% \boldsymbol{\alpha}-{\boldsymbol{Z}}_{i}^{\rm T}\boldsymbol{\beta}{\boldsymbol% {1}}(U_{1i}+{\boldsymbol{U}}_{2i}^{\rm T}\boldsymbol{\eta}\geq 0)\right),roman_ℓ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( bold_italic_α , bold_italic_β , bold_italic_η ) = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_L start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_α - bold_italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_β bold_1 ( italic_U start_POSTSUBSCRIPT 1 italic_i end_POSTSUBSCRIPT + bold_italic_U start_POSTSUBSCRIPT 2 italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_η ≥ 0 ) ) ,

and let the smoothed loss function be

~τ,h(𝜶,𝜷,𝜼)=i=1nLτ(yi𝑿iT𝜶𝒁iT𝜷Kh(U1i+𝑼2iT𝜼0)).subscript~𝜏𝜶𝜷𝜼superscriptsubscript𝑖1𝑛subscript𝐿𝜏subscript𝑦𝑖superscriptsubscript𝑿𝑖T𝜶superscriptsubscript𝒁𝑖T𝜷subscript𝐾subscript𝑈1𝑖superscriptsubscript𝑼2𝑖T𝜼0\displaystyle\tilde{\ell}_{\tau,h}(\boldsymbol{\alpha},\boldsymbol{\beta},% \boldsymbol{\eta})=\sum_{i=1}^{n}L_{\tau}\left(y_{i}-{\boldsymbol{X}}_{i}^{\rm T% }\boldsymbol{\alpha}-{\boldsymbol{Z}}_{i}^{\rm T}\boldsymbol{\beta}K_{h}(U_{1i% }+{\boldsymbol{U}}_{2i}^{\rm T}\boldsymbol{\eta}\geq 0)\right).over~ start_ARG roman_ℓ end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT ( bold_italic_α , bold_italic_β , bold_italic_η ) = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_L start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_α - bold_italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_β italic_K start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT ( italic_U start_POSTSUBSCRIPT 1 italic_i end_POSTSUBSCRIPT + bold_italic_U start_POSTSUBSCRIPT 2 italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_η ≥ 0 ) ) . (16)

For given 𝜼(k)superscript𝜼𝑘\boldsymbol{\eta}^{(k)}bold_italic_η start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT, one obtains 𝜶(k+1)superscript𝜶𝑘1\boldsymbol{\alpha}^{(k+1)}bold_italic_α start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT and 𝜷(k+1)superscript𝜷𝑘1\boldsymbol{\beta}^{(k+1)}bold_italic_β start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT by minimizing the smoothed loss function

(𝜶(k+1),𝜷(k+1))=argmin𝜶Θα,𝜷Θβ~τ,h(𝜶,𝜷,𝜼(k)),superscript𝜶𝑘1superscript𝜷𝑘1subscriptargminformulae-sequence𝜶subscriptΘ𝛼𝜷subscriptΘ𝛽subscript~𝜏𝜶𝜷superscript𝜼𝑘\displaystyle(\boldsymbol{\alpha}^{(k+1)},\boldsymbol{\beta}^{(k+1)})=\mathop{% \rm argmin}_{\boldsymbol{\alpha}\in\Theta_{\alpha},\boldsymbol{\beta}\in\Theta% _{\beta}}\tilde{\ell}_{\tau,h}(\boldsymbol{\alpha},\boldsymbol{\beta},% \boldsymbol{\eta}^{(k)}),( bold_italic_α start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT , bold_italic_β start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT ) = roman_argmin start_POSTSUBSCRIPT bold_italic_α ∈ roman_Θ start_POSTSUBSCRIPT italic_α end_POSTSUBSCRIPT , bold_italic_β ∈ roman_Θ start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT end_POSTSUBSCRIPT over~ start_ARG roman_ℓ end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT ( bold_italic_α , bold_italic_β , bold_italic_η start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ) ,

and for given 𝜶(k+1)superscript𝜶𝑘1\boldsymbol{\alpha}^{(k+1)}bold_italic_α start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT and 𝜷(k+1)superscript𝜷𝑘1\boldsymbol{\beta}^{(k+1)}bold_italic_β start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT, one estimates 𝜼(k+1)superscript𝜼𝑘1\boldsymbol{\eta}^{(k+1)}bold_italic_η start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT by

𝜼(k+1)=argmin𝜼Θη~τ,h(𝜶(k+1),𝜷(k+1),𝜼).superscript𝜼𝑘1subscriptargmin𝜼subscriptΘ𝜂subscript~𝜏superscript𝜶𝑘1superscript𝜷𝑘1𝜼\displaystyle\boldsymbol{\eta}^{(k+1)}=\mathop{\rm argmin}_{\boldsymbol{\eta}% \in\Theta_{\eta}}\tilde{\ell}_{\tau,h}(\boldsymbol{\alpha}^{(k+1)},\boldsymbol% {\beta}^{(k+1)},\boldsymbol{\eta}).bold_italic_η start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT = roman_argmin start_POSTSUBSCRIPT bold_italic_η ∈ roman_Θ start_POSTSUBSCRIPT italic_η end_POSTSUBSCRIPT end_POSTSUBSCRIPT over~ start_ARG roman_ℓ end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT ( bold_italic_α start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT , bold_italic_β start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT , bold_italic_η ) .

Iterating these two minimizers leads to the desired robust estimators. These are summarized with the multiplier bootstrap calibration in Algorithm 1 in Appendix A of the Supplementary Material, which provides the strategy for estimating the confidence intervals for the estimators 𝜶^^𝜶\hat{\boldsymbol{\alpha}}over^ start_ARG bold_italic_α end_ARG, 𝜷^^𝜷\hat{\boldsymbol{\beta}}over^ start_ARG bold_italic_β end_ARG, and 𝜼^^𝜼\hat{\boldsymbol{\eta}}over^ start_ARG bold_italic_η end_ARG.

Note that herein, 𝜶(k+1)superscript𝜶𝑘1\boldsymbol{\alpha}^{(k+1)}bold_italic_α start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT and 𝜷(k+1)superscript𝜷𝑘1\boldsymbol{\beta}^{(k+1)}bold_italic_β start_POSTSUPERSCRIPT ( italic_k + 1 ) end_POSTSUPERSCRIPT for given 𝜼(k)superscript𝜼𝑘\boldsymbol{\eta}^{(k)}bold_italic_η start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT are obtained by a robust data-adaptive method proposed by Wang et al., (2021). Specifically, 𝜽=(𝜶T,𝜷T)T𝜽superscriptsuperscript𝜶Tsuperscript𝜷TT\boldsymbol{\theta}=(\boldsymbol{\alpha}^{\rm T},\boldsymbol{\beta}^{\rm T})^{% \rm T}bold_italic_θ = ( bold_italic_α start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , bold_italic_β start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT is estimated and τ𝜏\tauitalic_τ is calibrated simultaneously by solving the following system of equations:

{i=1nψτ(yi𝑾iT𝜽)𝑾i=0,(τ2n)1i=1nmin{(yi𝑾iT𝜽)2,τ2}n1(d+z)=0,casessuperscriptsubscript𝑖1𝑛subscript𝜓𝜏subscript𝑦𝑖superscriptsubscript𝑾𝑖T𝜽subscript𝑾𝑖0superscriptsuperscript𝜏2𝑛1superscriptsubscript𝑖1𝑛superscriptsubscript𝑦𝑖superscriptsubscript𝑾𝑖T𝜽2superscript𝜏2superscript𝑛1𝑑𝑧0\displaystyle\left\{\begin{array}[]{l}\sum_{i=1}^{n}\psi_{\tau}(y_{i}-{% \boldsymbol{W}}_{i}^{\rm T}\boldsymbol{\theta}){\boldsymbol{W}}_{i}=0,\\ (\tau^{2}n)^{-1}\sum_{i=1}^{n}\min\{(y_{i}-{\boldsymbol{W}}_{i}^{\rm T}% \boldsymbol{\theta})^{2},\tau^{2}\}-n^{-1}(d+z)=0,\end{array}\right.{ start_ARRAY start_ROW start_CELL ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_ψ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_italic_W start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_θ ) bold_italic_W start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = 0 , end_CELL end_ROW start_ROW start_CELL ( italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_n ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT roman_min { ( italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_italic_W start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_θ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT } - italic_n start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_d + italic_z ) = 0 , end_CELL end_ROW end_ARRAY

where d=p+q1𝑑𝑝𝑞1d=p+q-1italic_d = italic_p + italic_q - 1, 𝑾i=(𝑿iT,𝒁iT𝟏(U1i+𝑼2iT𝜼(k)0))Tsubscript𝑾𝑖superscriptsuperscriptsubscript𝑿𝑖Tsuperscriptsubscript𝒁𝑖T1subscript𝑈1𝑖superscriptsubscript𝑼2𝑖Tsuperscript𝜼𝑘0T{\boldsymbol{W}}_{i}=({\boldsymbol{X}}_{i}^{\rm T},{\boldsymbol{Z}}_{i}^{\rm T% }{\boldsymbol{1}}(U_{1i}+{\boldsymbol{U}}_{2i}^{\rm T}\boldsymbol{\eta}^{(k)}% \geq 0))^{\rm T}bold_italic_W start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = ( bold_italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , bold_italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_1 ( italic_U start_POSTSUBSCRIPT 1 italic_i end_POSTSUBSCRIPT + bold_italic_U start_POSTSUBSCRIPT 2 italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_η start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ≥ 0 ) ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT, and z=log(n)𝑧𝑛z=\log(n)italic_z = roman_log ( italic_n ) as suggested by Wang et al., (2021). The initial values 𝜽(0)=𝜽(ols)superscript𝜽0superscript𝜽𝑜𝑙𝑠\boldsymbol{\theta}^{(0)}=\boldsymbol{\theta}^{(ols)}bold_italic_θ start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT = bold_italic_θ start_POSTSUPERSCRIPT ( italic_o italic_l italic_s ) end_POSTSUPERSCRIPT and τ(0)=σ^ϵn/(d+z)superscript𝜏0subscript^𝜎italic-ϵ𝑛𝑑𝑧\tau^{(0)}=\hat{\sigma}_{\epsilon}\sqrt{n/(d+z)}italic_τ start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT = over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT square-root start_ARG italic_n / ( italic_d + italic_z ) end_ARG are set using the ordinary quadratic loss, where σ^ϵ2=(npqr)1i=1n(y𝑿iT𝜶^ols𝒁iT𝜷^ols𝟏(U1i+𝑼2iT𝜼^ols0))2superscriptsubscript^𝜎italic-ϵ2superscript𝑛𝑝𝑞𝑟1superscriptsubscript𝑖1𝑛superscript𝑦superscriptsubscript𝑿𝑖Tsuperscript^𝜶𝑜𝑙𝑠superscriptsubscript𝒁𝑖Tsuperscript^𝜷𝑜𝑙𝑠1subscript𝑈1𝑖superscriptsubscript𝑼2𝑖Tsuperscript^𝜼𝑜𝑙𝑠02\hat{\sigma}_{\epsilon}^{2}=(n-p-q-r)^{-1}\sum_{i=1}^{n}(y-{\boldsymbol{X}}_{i% }^{\rm T}\hat{\boldsymbol{\alpha}}^{ols}-{\boldsymbol{Z}}_{i}^{\rm T}\hat{% \boldsymbol{\beta}}^{ols}{\boldsymbol{1}}(U_{1i}+{\boldsymbol{U}}_{2i}^{\rm T}% \hat{\boldsymbol{\eta}}^{ols}\geq 0))^{2}over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_ϵ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = ( italic_n - italic_p - italic_q - italic_r ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ( italic_y - bold_italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT over^ start_ARG bold_italic_α end_ARG start_POSTSUPERSCRIPT italic_o italic_l italic_s end_POSTSUPERSCRIPT - bold_italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT over^ start_ARG bold_italic_β end_ARG start_POSTSUPERSCRIPT italic_o italic_l italic_s end_POSTSUPERSCRIPT bold_1 ( italic_U start_POSTSUBSCRIPT 1 italic_i end_POSTSUBSCRIPT + bold_italic_U start_POSTSUBSCRIPT 2 italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT over^ start_ARG bold_italic_η end_ARG start_POSTSUPERSCRIPT italic_o italic_l italic_s end_POSTSUPERSCRIPT ≥ 0 ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT, with (𝜶^ols,𝜷^ols,𝜼ols)superscript^𝜶𝑜𝑙𝑠superscript^𝜷𝑜𝑙𝑠superscript𝜼𝑜𝑙𝑠(\hat{\boldsymbol{\alpha}}^{ols},\hat{\boldsymbol{\beta}}^{ols},\boldsymbol{% \eta}^{ols})( over^ start_ARG bold_italic_α end_ARG start_POSTSUPERSCRIPT italic_o italic_l italic_s end_POSTSUPERSCRIPT , over^ start_ARG bold_italic_β end_ARG start_POSTSUPERSCRIPT italic_o italic_l italic_s end_POSTSUPERSCRIPT , bold_italic_η start_POSTSUPERSCRIPT italic_o italic_l italic_s end_POSTSUPERSCRIPT ) being the estimator of (𝜶,𝜷,𝜼)𝜶𝜷𝜼(\boldsymbol{\alpha},\boldsymbol{\beta},\boldsymbol{\eta})( bold_italic_α , bold_italic_β , bold_italic_η ).

3 Robust subgroup testing

Before learning the subgroup classifier, it is of interest to test for the existence of subgroups, which guarantees avoidance of the identifiability problem of 𝜼𝜼\boldsymbol{\eta}bold_italic_η. This section considers the test problem (2). Recall the loss function in (9), i.e.,

Lτ(y𝑿T𝜶𝒁T𝜷𝟏(𝑼T𝜸0)),subscript𝐿𝜏𝑦superscript𝑿T𝜶superscript𝒁T𝜷1superscript𝑼T𝜸0\displaystyle L_{\tau}(y-{\boldsymbol{X}}^{\rm T}\boldsymbol{\alpha}-{% \boldsymbol{Z}}^{\rm T}\boldsymbol{\beta}{\boldsymbol{1}}({\boldsymbol{U}}^{% \rm T}\boldsymbol{\gamma}\geq 0)),italic_L start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_y - bold_italic_X start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_α - bold_italic_Z start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_β bold_1 ( bold_italic_U start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_γ ≥ 0 ) ) , (17)

the derivative of which with respect to 𝜷𝜷\boldsymbol{\beta}bold_italic_β under the alternative hypothesis is

φ(𝑽,𝜶,𝜷,𝜸)=𝒁𝟏(𝑼T𝜸0)ψτ(y𝑿T𝜶𝒁T𝜷𝟏(𝑼T𝜸0)),𝜑𝑽𝜶𝜷𝜸𝒁1superscript𝑼T𝜸0subscript𝜓𝜏𝑦superscript𝑿T𝜶superscript𝒁T𝜷1superscript𝑼T𝜸0\displaystyle\varphi({\boldsymbol{V}},\boldsymbol{\alpha},\boldsymbol{\beta},% \boldsymbol{\gamma})={\boldsymbol{Z}}{\boldsymbol{1}}({\boldsymbol{U}}^{\rm T}% \boldsymbol{\gamma}\geq 0)\psi_{\tau}(y-{\boldsymbol{X}}^{\rm T}\boldsymbol{% \alpha}-{\boldsymbol{Z}}^{\rm T}\boldsymbol{\beta}{\boldsymbol{1}}({% \boldsymbol{U}}^{\rm T}\boldsymbol{\gamma}\geq 0)),italic_φ ( bold_italic_V , bold_italic_α , bold_italic_β , bold_italic_γ ) = bold_italic_Z bold_1 ( bold_italic_U start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_γ ≥ 0 ) italic_ψ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_y - bold_italic_X start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_α - bold_italic_Z start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_β bold_1 ( bold_italic_U start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_γ ≥ 0 ) ) ,

and with respect to 𝜶𝜶\boldsymbol{\alpha}bold_italic_α under the null hypothesis is

φ0(𝑽,𝜶)=𝑿ψτ(y𝑿T𝜶),subscript𝜑0𝑽𝜶𝑿subscript𝜓𝜏𝑦superscript𝑿T𝜶\displaystyle\varphi_{0}({\boldsymbol{V}},\boldsymbol{\alpha})={\boldsymbol{X}% }\psi_{\tau}(y-{\boldsymbol{X}}^{\rm T}\boldsymbol{\alpha}),italic_φ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( bold_italic_V , bold_italic_α ) = bold_italic_X italic_ψ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_y - bold_italic_X start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_α ) ,

where ψτ(u)subscript𝜓𝜏𝑢\psi_{\tau}(u)italic_ψ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_u ) is the first derivative of the Huber loss (5), defined as ψτ(u)=sgn(u)min{|u|,τ}subscript𝜓𝜏𝑢sgn𝑢𝑢𝜏\psi_{\tau}(u)=\hbox{sgn}(u)\min\{|u|,\tau\}italic_ψ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_u ) = sgn ( italic_u ) roman_min { | italic_u | , italic_τ }.

3.1 Robust estimation under null hypothesis

Under the null hypothesis, model (1) reduces to the ordinary linear regression model with heavy-tailed errors, i.e.,

yi=𝑿iT𝜶+ϵi.subscript𝑦𝑖superscriptsubscript𝑿𝑖T𝜶subscriptitalic-ϵ𝑖\displaystyle y_{i}={\boldsymbol{X}}_{i}^{\rm T}\boldsymbol{\alpha}+\epsilon_{% i}.italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = bold_italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_α + italic_ϵ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT . (18)

Parametric estimation in model (18) is well-studied in the literature; see Huber, (1964, 1973); Fan et al., 2017b ; Sun et al., (2020); Wang et al., (2021); Han et al., (2022); Chen and Zhou, (2020); Zhou et al., (2018), among others. Let 𝜶^τsubscript^𝜶𝜏\hat{\boldsymbol{\alpha}}_{\tau}over^ start_ARG bold_italic_α end_ARG start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT be the estimate of 𝜶τsubscript𝜶𝜏\boldsymbol{\alpha}_{\tau}bold_italic_α start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT under the null hypothesis, i.e.,

𝜶^τ=argmin𝜶ΘαnLτ(y𝑿T𝜶).subscript^𝜶𝜏subscriptargmin𝜶subscriptΘ𝛼subscript𝑛subscript𝐿𝜏𝑦superscript𝑿T𝜶\displaystyle\hat{\boldsymbol{\alpha}}_{\tau}=\mathop{\rm argmin}_{\boldsymbol% {\alpha}\in\Theta_{\alpha}}\mathbb{P}_{n}L_{\tau}(y-{\boldsymbol{X}}^{\rm T}% \boldsymbol{\alpha}).over^ start_ARG bold_italic_α end_ARG start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT = roman_argmin start_POSTSUBSCRIPT bold_italic_α ∈ roman_Θ start_POSTSUBSCRIPT italic_α end_POSTSUBSCRIPT end_POSTSUBSCRIPT blackboard_P start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_y - bold_italic_X start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_α ) . (19)

Before establishing asymptotic properties for the robust estimator under the null hypothesis and the test statistic, the following required assumptions are made.

  1. (B1)

    The random vector 𝑿𝑿{\boldsymbol{X}}bold_italic_X is sub-Gaussian, J={𝐏[𝑿𝑿T]}1p×p𝐽superscript𝐏delimited-[]𝑿superscript𝑿T1superscript𝑝𝑝J=\{\boldsymbol{\mathrm{P}}[{\boldsymbol{X}}{\boldsymbol{X}}^{\rm T}]\}^{-1}% \in\mathbb{R}^{p\times p}italic_J = { bold_P [ bold_italic_X bold_italic_X start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ] } start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_p × italic_p end_POSTSUPERSCRIPT is a finite and positive definite deterministic matrix, and there is a universal constant K1>0subscript𝐾10K_{1}>0italic_K start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > 0 satisfying 𝑿ψ2K1subscriptnorm𝑿subscript𝜓2subscript𝐾1\|{\boldsymbol{X}}\|_{\psi_{2}}\leq K_{1}∥ bold_italic_X ∥ start_POSTSUBSCRIPT italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ≤ italic_K start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, 𝒁ψ2K1subscriptnorm𝒁subscript𝜓2subscript𝐾1\|{\boldsymbol{Z}}\|_{\psi_{2}}\leq K_{1}∥ bold_italic_Z ∥ start_POSTSUBSCRIPT italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ≤ italic_K start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT.

  2. (B2)

    The error variable ϵitalic-ϵ\epsilonitalic_ϵ is independent of (𝑿T,𝒁T,𝑼T)Tsuperscriptsuperscript𝑿Tsuperscript𝒁Tsuperscript𝑼TT({\boldsymbol{X}}^{\rm T},{\boldsymbol{Z}}^{\rm T},{\boldsymbol{U}}^{\rm T})^{% \rm T}( bold_italic_X start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , bold_italic_Z start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , bold_italic_U start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT and satisfies 𝐏(ϵ)=0𝐏italic-ϵ0\boldsymbol{\mathrm{P}}(\epsilon)=0bold_P ( italic_ϵ ) = 0 and 𝐏(|ϵ|2+δ)=Mδ<𝐏superscriptitalic-ϵ2𝛿subscript𝑀𝛿\boldsymbol{\mathrm{P}}(|\epsilon|^{2+\delta})=M_{\delta}<\inftybold_P ( | italic_ϵ | start_POSTSUPERSCRIPT 2 + italic_δ end_POSTSUPERSCRIPT ) = italic_M start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT < ∞ with δ0𝛿0\delta\geq 0italic_δ ≥ 0.

  3. (B3)

    0<𝐏[𝟏(𝑼T𝜸0)]<10𝐏delimited-[]1superscript𝑼T𝜸010<\boldsymbol{\mathrm{P}}[{\boldsymbol{1}}({\boldsymbol{U}}^{\rm T}\boldsymbol% {\gamma}\geq 0)]<10 < bold_P [ bold_1 ( bold_italic_U start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_γ ≥ 0 ) ] < 1 for any 𝜸Θγr𝜸subscriptΘ𝛾superscript𝑟\boldsymbol{\gamma}\in\Theta_{\gamma}\subseteq\mathbb{R}^{r}bold_italic_γ ∈ roman_Θ start_POSTSUBSCRIPT italic_γ end_POSTSUBSCRIPT ⊆ blackboard_R start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT.

Remark 2.

Assumption (B1) is the moment condition of covariates for establishing the nonasymptotic properties under the null hypothesis and deriving the asymptotic distributions. Assumption (B2) is the same as Assumption (A2), and Assumption (B3) is the same as Assumption (A3).

Lemma 1.

(Chen and Zhou, (2020)) If Assumptions (B1)–(B3) hold, then for any t>0𝑡0t>0italic_t > 0 and vv2+δ1/(2+δ)𝑣superscriptsubscript𝑣2𝛿12𝛿v\geq v_{2+\delta}^{1/(2+\delta)}italic_v ≥ italic_v start_POSTSUBSCRIPT 2 + italic_δ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 / ( 2 + italic_δ ) end_POSTSUPERSCRIPT, the estimator 𝛂^τsubscript^𝛂𝜏\hat{\boldsymbol{\alpha}}_{\tau}over^ start_ARG bold_italic_α end_ARG start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT given in (19) with τ=v(np+t)1/(2+δ)𝜏𝑣superscript𝑛𝑝𝑡12𝛿\tau=v(\frac{n}{p+t})^{1/(2+\delta)}italic_τ = italic_v ( divide start_ARG italic_n end_ARG start_ARG italic_p + italic_t end_ARG ) start_POSTSUPERSCRIPT 1 / ( 2 + italic_δ ) end_POSTSUPERSCRIPT satisfies

𝐏{J1/2(𝜶^τ𝜶)c1vp+tn}2etand𝐏{J1/2(𝜶^τ𝜶)J1/2n𝑿ψτ(ϵ)c2vp+tn}2et𝐏delimited-∥∥superscript𝐽12subscript^𝜶𝜏superscript𝜶subscript𝑐1𝑣𝑝𝑡𝑛2superscript𝑒𝑡and𝐏delimited-∥∥superscript𝐽12subscript^𝜶𝜏superscript𝜶superscript𝐽12subscript𝑛𝑿subscript𝜓𝜏italic-ϵsubscript𝑐2𝑣𝑝𝑡𝑛2superscript𝑒𝑡\displaystyle\begin{split}\boldsymbol{\mathrm{P}}\left\{\left\|J^{-1/2}(\hat{% \boldsymbol{\alpha}}_{\tau}-\boldsymbol{\alpha}^{*})\right\|\geq c_{1}v\sqrt{% \frac{p+t}{n}}\right\}\leq&2e^{t}\\ \mbox{and}~{}~{}\boldsymbol{\mathrm{P}}\left\{\left\|J^{-1/2}(\hat{\boldsymbol% {\alpha}}_{\tau}-\boldsymbol{\alpha}^{*})-J^{-1/2}\mathbb{P}_{n}{\boldsymbol{X% }}\psi_{\tau}(\epsilon)\right\|\geq c_{2}v\sqrt{\frac{p+t}{n}}\right\}\leq&2e^% {t}\end{split}start_ROW start_CELL bold_P { ∥ italic_J start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT ( over^ start_ARG bold_italic_α end_ARG start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT - bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ∥ ≥ italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_v square-root start_ARG divide start_ARG italic_p + italic_t end_ARG start_ARG italic_n end_ARG end_ARG } ≤ end_CELL start_CELL 2 italic_e start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL and bold_P { ∥ italic_J start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT ( over^ start_ARG bold_italic_α end_ARG start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT - bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) - italic_J start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT blackboard_P start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT bold_italic_X italic_ψ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_ϵ ) ∥ ≥ italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_v square-root start_ARG divide start_ARG italic_p + italic_t end_ARG start_ARG italic_n end_ARG end_ARG } ≤ end_CELL start_CELL 2 italic_e start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT end_CELL end_ROW (20)

as along as nc3(p+t)𝑛subscript𝑐3𝑝𝑡n\geq c_{3}(p+t)italic_n ≥ italic_c start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ( italic_p + italic_t ), where c1subscript𝑐1c_{1}italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPTc3subscript𝑐3c_{3}italic_c start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT are constants depending on only K1subscript𝐾1K_{1}italic_K start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT.

3.2 Robust test statistic

As discussed by Liu et al., (2024), for any known 𝜸Θγ𝜸subscriptΘ𝛾\boldsymbol{\gamma}\in\Theta_{\gamma}bold_italic_γ ∈ roman_Θ start_POSTSUBSCRIPT italic_γ end_POSTSUBSCRIPT, it is natural to consider an SST statistic for testing 𝜷=𝟎𝜷0\boldsymbol{\beta}={\boldsymbol{0}}bold_italic_β = bold_0, i.e.,

T~n(𝜸)=n1nφ(𝑽,𝜶^τ,𝟎,𝜸)V~(𝜸)12,subscript~𝑇𝑛𝜸superscript𝑛1subscriptsuperscriptnormsubscript𝑛𝜑𝑽subscript^𝜶𝜏0𝜸2~𝑉superscript𝜸1\displaystyle\tilde{T}_{n}(\boldsymbol{\gamma})=n^{-1}\|\mathbb{P}_{n}\varphi(% {\boldsymbol{V}},\hat{\boldsymbol{\alpha}}_{\tau},{\boldsymbol{0}},\boldsymbol% {\gamma})\|^{2}_{{\tilde{V}}(\boldsymbol{\gamma})^{-1}},over~ start_ARG italic_T end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( bold_italic_γ ) = italic_n start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∥ blackboard_P start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT italic_φ ( bold_italic_V , over^ start_ARG bold_italic_α end_ARG start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT , bold_0 , bold_italic_γ ) ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT over~ start_ARG italic_V end_ARG ( bold_italic_γ ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT , (21)

where 𝜶^τsubscript^𝜶𝜏\hat{\boldsymbol{\alpha}}_{\tau}over^ start_ARG bold_italic_α end_ARG start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT is given in (19) and V~(𝜸)=n{φ(𝑽,𝜶^τ,𝟎,𝜸)G^(𝜸)J^φ0(𝑽,𝜶^τ)}2~𝑉𝜸subscript𝑛superscript𝜑𝑽subscript^𝜶𝜏0𝜸^𝐺𝜸^𝐽subscript𝜑0𝑽subscript^𝜶𝜏tensor-productabsent2{\tilde{V}}(\boldsymbol{\gamma})=\mathbb{P}_{n}\{\varphi({\boldsymbol{V}},\hat% {\boldsymbol{\alpha}}_{\tau},{\boldsymbol{0}},\boldsymbol{\gamma})-\hat{G}(% \boldsymbol{\gamma})\hat{J}\varphi_{0}({\boldsymbol{V}},\hat{\boldsymbol{% \alpha}}_{\tau})\}^{\otimes 2}over~ start_ARG italic_V end_ARG ( bold_italic_γ ) = blackboard_P start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT { italic_φ ( bold_italic_V , over^ start_ARG bold_italic_α end_ARG start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT , bold_0 , bold_italic_γ ) - over^ start_ARG italic_G end_ARG ( bold_italic_γ ) over^ start_ARG italic_J end_ARG italic_φ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( bold_italic_V , over^ start_ARG bold_italic_α end_ARG start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ) } start_POSTSUPERSCRIPT ⊗ 2 end_POSTSUPERSCRIPT. Here, G^(𝜸)^𝐺𝜸{\hat{G}}(\boldsymbol{\gamma})over^ start_ARG italic_G end_ARG ( bold_italic_γ ) and J^^𝐽{\hat{J}}over^ start_ARG italic_J end_ARG are consistent estimators of G(𝜸)𝐺𝜸G(\boldsymbol{\gamma})italic_G ( bold_italic_γ ) and J𝐽Jitalic_J, respectively, where J𝐽Jitalic_J is as defined in Assumption (B1) and

G(𝜸)=𝐏{𝒁𝑿T𝟏(𝑼T𝜸0)𝟏(|y𝑿T𝜶|τ)}q×p.𝐺𝜸𝐏𝒁superscript𝑿T1superscript𝑼T𝜸01𝑦superscript𝑿Tsuperscript𝜶𝜏superscript𝑞𝑝\displaystyle G(\boldsymbol{\gamma})=\boldsymbol{\mathrm{P}}\{{\boldsymbol{Z}}% {\boldsymbol{X}}^{\rm T}{\boldsymbol{1}}({\boldsymbol{U}}^{\rm T}\boldsymbol{% \gamma}\geq 0){\boldsymbol{1}}(|y-{\boldsymbol{X}}^{\rm T}\boldsymbol{\alpha}^% {*}|\leq\tau)\}\in\mathbb{R}^{q\times p}.italic_G ( bold_italic_γ ) = bold_P { bold_italic_Z bold_italic_X start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_1 ( bold_italic_U start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_γ ≥ 0 ) bold_1 ( | italic_y - bold_italic_X start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT | ≤ italic_τ ) } ∈ blackboard_R start_POSTSUPERSCRIPT italic_q × italic_p end_POSTSUPERSCRIPT .
Lemma 2.

If Assumptions (B1)–(B3) hold, then for any fixed 𝛄Θγ𝛄subscriptΘ𝛾\boldsymbol{\gamma}\in\Theta_{\gamma}bold_italic_γ ∈ roman_Θ start_POSTSUBSCRIPT italic_γ end_POSTSUBSCRIPT, T~n(𝛄)subscript~𝑇𝑛𝛄\tilde{T}_{n}(\boldsymbol{\gamma})over~ start_ARG italic_T end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( bold_italic_γ ) converges in distribution to a χ2superscript𝜒2\chi^{2}italic_χ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT one with q𝑞qitalic_q degrees of freedom under H0subscript𝐻0H_{0}italic_H start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT as n𝑛n\rightarrow\inftyitalic_n → ∞.

Although there is an unknown parameter 𝜸𝜸\boldsymbol{\gamma}bold_italic_γ that prevents T~n(𝜸)subscript~𝑇𝑛𝜸\tilde{T}_{n}(\boldsymbol{\gamma})over~ start_ARG italic_T end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( bold_italic_γ ) from being used directly in practice, Lemma 2 reveals essentially that the asymptotic distribution of T~n(𝜸)subscript~𝑇𝑛𝜸\tilde{T}_{n}(\boldsymbol{\gamma})over~ start_ARG italic_T end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( bold_italic_γ ) is free of the nuisance parameter 𝜸𝜸\boldsymbol{\gamma}bold_italic_γ. Thus, the supremum and the weighted average of T~n(𝜸)subscript~𝑇𝑛𝜸\tilde{T}_{n}(\boldsymbol{\gamma})over~ start_ARG italic_T end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( bold_italic_γ ) over 𝜸𝜸\boldsymbol{\gamma}bold_italic_γ should guarantee the correct type-I errors, which motivates constructing the robust test statistic based on the weighted average of T~n(𝜸)subscript~𝑇𝑛𝜸\tilde{T}_{n}(\boldsymbol{\gamma})over~ start_ARG italic_T end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( bold_italic_γ ) over the parametric space ΘγsubscriptΘ𝛾\Theta_{\gamma}roman_Θ start_POSTSUBSCRIPT italic_γ end_POSTSUBSCRIPT.

Fan et al., 2017a studied the supremum of the SST statistic T~n(𝜸)subscript~𝑇𝑛𝜸\tilde{T}_{n}(\boldsymbol{\gamma})over~ start_ARG italic_T end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( bold_italic_γ ) (SST) over the grouping parameter 𝜸𝜸\boldsymbol{\gamma}bold_italic_γ for a semiparametric model, i.e.,

T~n=sup𝜸Θγ{n1nφ(𝑽,𝜶^τ,𝟎,𝜸)V~(𝜸)12}.subscript~𝑇𝑛subscriptsupremum𝜸subscriptΘ𝛾conditional-setsuperscript𝑛1evaluated-atsubscript𝑛𝜑𝑽subscript^𝜶𝜏0𝜸~𝑉superscript𝜸12\displaystyle\tilde{T}_{n}=\sup_{\boldsymbol{\gamma}\in\Theta_{\gamma}}\left\{% n^{-1}\|\mathbb{P}_{n}\varphi({\boldsymbol{V}},\hat{\boldsymbol{\alpha}}_{\tau% },{\boldsymbol{0}},\boldsymbol{\gamma})\|^{2}_{{\tilde{V}}(\boldsymbol{\gamma}% )^{-1}}\right\}.over~ start_ARG italic_T end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = roman_sup start_POSTSUBSCRIPT bold_italic_γ ∈ roman_Θ start_POSTSUBSCRIPT italic_γ end_POSTSUBSCRIPT end_POSTSUBSCRIPT { italic_n start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∥ blackboard_P start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT italic_φ ( bold_italic_V , over^ start_ARG bold_italic_α end_ARG start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT , bold_0 , bold_italic_γ ) ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT over~ start_ARG italic_V end_ARG ( bold_italic_γ ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT } . (22)

The test statistic T~nsubscript~𝑇𝑛\tilde{T}_{n}over~ start_ARG italic_T end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT has been investigated widely in the literature; see Andrews and Ploberger, (1994, 1995); Davies, (1977); Song et al., (2009); Shen and Qu, (2020); Liu et al., (2024). It is easy to extent the SST to model (1) with heavy-tailed errors according to the Bahadur representation in Theorem 4.

When the dimension of the parametric space ΘγsubscriptΘ𝛾\Theta_{\gamma}roman_Θ start_POSTSUBSCRIPT italic_γ end_POSTSUBSCRIPT is large, searching for the supremum value over ΘγsubscriptΘ𝛾\Theta_{\gamma}roman_Θ start_POSTSUBSCRIPT italic_γ end_POSTSUBSCRIPT may cause T~nsubscript~𝑇𝑛\tilde{T}_{n}over~ start_ARG italic_T end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT to lose power in practice and is time-consuming computationally. To avoid these drawbacks, proposed in this section is a robust test procedure that is a type of WAST statistic first introduced by Liu et al., (2024).

Proposed herein is RWAST, i.e.,

Tn=1n(n1)ijωij𝒁iT𝒁jψτ(yi𝑿iT𝜶^τ)ψτ(yj𝑿jT𝜶^τ),subscript𝑇𝑛1𝑛𝑛1subscript𝑖𝑗subscript𝜔𝑖𝑗superscriptsubscript𝒁𝑖Tsubscript𝒁𝑗subscript𝜓𝜏subscript𝑦𝑖superscriptsubscript𝑿𝑖Tsubscript^𝜶𝜏subscript𝜓𝜏subscript𝑦𝑗superscriptsubscript𝑿𝑗Tsubscript^𝜶𝜏\displaystyle T_{n}=\frac{1}{n(n-1)}\sum_{i\neq j}\omega_{ij}{\boldsymbol{Z}}_% {i}^{\rm T}{\boldsymbol{Z}}_{j}\psi_{\tau}(y_{i}-{\boldsymbol{X}}_{i}^{\rm T}% \hat{\boldsymbol{\alpha}}_{\tau})\psi_{\tau}(y_{j}-{\boldsymbol{X}}_{j}^{\rm T% }\hat{\boldsymbol{\alpha}}_{\tau}),italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_n ( italic_n - 1 ) end_ARG ∑ start_POSTSUBSCRIPT italic_i ≠ italic_j end_POSTSUBSCRIPT italic_ω start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT bold_italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_Z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_ψ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT over^ start_ARG bold_italic_α end_ARG start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ) italic_ψ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - bold_italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT over^ start_ARG bold_italic_α end_ARG start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ) , (23)

where

ωij=14+12πarctan(ϱij1ϱij2)ifij,formulae-sequencesubscript𝜔𝑖𝑗1412𝜋subscriptitalic-ϱ𝑖𝑗1superscriptsubscriptitalic-ϱ𝑖𝑗2if𝑖𝑗\displaystyle\omega_{ij}=\frac{1}{4}+\frac{1}{2\pi}\arctan\left(\frac{\varrho_% {ij}}{\sqrt{1-\varrho_{ij}^{2}}}\right)\quad\mbox{if}~{}i\neq j,italic_ω start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG 4 end_ARG + divide start_ARG 1 end_ARG start_ARG 2 italic_π end_ARG roman_arctan ( divide start_ARG italic_ϱ start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT end_ARG start_ARG square-root start_ARG 1 - italic_ϱ start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG end_ARG ) if italic_i ≠ italic_j , (24)

and ϱij=𝑼iT𝑼j(𝑼i𝑼j)1subscriptitalic-ϱ𝑖𝑗superscriptsubscript𝑼𝑖Tsubscript𝑼𝑗superscriptnormsubscript𝑼𝑖normsubscript𝑼𝑗1\varrho_{ij}={\boldsymbol{U}}_{i}^{\rm T}{\boldsymbol{U}}_{j}(\|{\boldsymbol{U% }}_{i}\|\|{\boldsymbol{U}}_{j}\|)^{-1}italic_ϱ start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = bold_italic_U start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_U start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( ∥ bold_italic_U start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∥ ∥ bold_italic_U start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∥ ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT. As noted by Liu et al., (2024), there is a Bayesian explanation for the weight ωijsubscript𝜔𝑖𝑗\omega_{ij}italic_ω start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT. In fact, Lemma D1 in the Supplementary Material shows that

ωij=𝜸Θγ𝟏(𝑼iT𝜸0)𝟏(𝑼jT𝜸0)w(𝜸)𝑑𝜸,subscript𝜔𝑖𝑗subscript𝜸subscriptΘ𝛾1superscriptsubscript𝑼𝑖T𝜸01superscriptsubscript𝑼𝑗T𝜸0𝑤𝜸differential-d𝜸\displaystyle\omega_{ij}=\int_{\boldsymbol{\gamma}\in\Theta_{\gamma}}{% \boldsymbol{1}}({\boldsymbol{U}}_{i}^{\rm T}\boldsymbol{\gamma}\geq 0){% \boldsymbol{1}}({\boldsymbol{U}}_{j}^{\rm T}\boldsymbol{\gamma}\geq 0)w(% \boldsymbol{\gamma})d\boldsymbol{\gamma},italic_ω start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = ∫ start_POSTSUBSCRIPT bold_italic_γ ∈ roman_Θ start_POSTSUBSCRIPT italic_γ end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_1 ( bold_italic_U start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_γ ≥ 0 ) bold_1 ( bold_italic_U start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_γ ≥ 0 ) italic_w ( bold_italic_γ ) italic_d bold_italic_γ ,

where w(𝜸)𝑤𝜸w(\boldsymbol{\gamma})italic_w ( bold_italic_γ ) is the standard multi-Gaussian density and can be chosen as another weight satisfying w(𝜸)0𝑤𝜸0w(\boldsymbol{\gamma})\geq 0italic_w ( bold_italic_γ ) ≥ 0 for all 𝜸Θγ𝜸subscriptΘ𝛾\boldsymbol{\gamma}\in\Theta_{\gamma}bold_italic_γ ∈ roman_Θ start_POSTSUBSCRIPT italic_γ end_POSTSUBSCRIPT and 𝜸Θγw(𝜸)𝑑𝜸=1subscript𝜸subscriptΘ𝛾𝑤𝜸differential-d𝜸1\int_{\boldsymbol{\gamma}\in\Theta_{\gamma}}w(\boldsymbol{\gamma})d\boldsymbol% {\gamma}=1∫ start_POSTSUBSCRIPT bold_italic_γ ∈ roman_Θ start_POSTSUBSCRIPT italic_γ end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_w ( bold_italic_γ ) italic_d bold_italic_γ = 1. In the Bayesian motivation, the grouping parameter 𝜸𝜸\boldsymbol{\gamma}bold_italic_γ has a prior with density w(𝜸)𝑤𝜸w(\boldsymbol{\gamma})italic_w ( bold_italic_γ ). Because the goal herein is to test for the existence of subgroups instead of estimating the grouping parameter, the difference is that there is no requirement for the posterior distribution.

The choice of the weight affect the computation of the test statistic because of the numerical integration over qsuperscript𝑞\mathbb{R}^{q}blackboard_R start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT. Thus, taking the weight as the standard multi-Gaussian density offers good performance in practice, as illustrated in the simulation studies in Section 4 and the case studies in Section 5. To illustrate the performance in robust regression, numerical studies were conducted to investigate the sensitivity of the weight’s choice by comparing ωijsubscript𝜔𝑖𝑗\omega_{ij}italic_ω start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT with the closed form in (24) and the approximated ωijsubscript𝜔𝑖𝑗\omega_{ij}italic_ω start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT in (E.3) of Appendix E.3 in the Supplementary Material. From the numerical results, compared with the approximated ωijsubscript𝜔𝑖𝑗\omega_{ij}italic_ω start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT, the test statistic with ωijsubscript𝜔𝑖𝑗\omega_{ij}italic_ω start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT in (24) has higher power uniformly and takes only 10% of the time computationally when N=10 000𝑁10000N=10\,000italic_N = 10 000. This is a strong recommendation to use the closed-form RWAST in (24).

To establish the asymptotic distribution of RWAST, additional notation is introduced below. Denote the kernel of a U-statistic under the null hypothesis by

h(𝑽i,𝑽j)=ωij𝒁iT𝒁jψτ(yi𝑿iT𝜶)ψτ(yj𝑿jT𝜶)+φ0(𝑽i,𝜶)TKj+KiTφ0(𝑽j,𝜶)+φ0(𝑽i,𝜶)THφ0(𝑽j,𝜶),subscript𝑽𝑖subscript𝑽𝑗subscript𝜔𝑖𝑗superscriptsubscript𝒁𝑖Tsubscript𝒁𝑗subscript𝜓𝜏subscript𝑦𝑖superscriptsubscript𝑿𝑖Tsuperscript𝜶subscript𝜓𝜏subscript𝑦𝑗superscriptsubscript𝑿𝑗Tsuperscript𝜶subscript𝜑0superscriptsubscript𝑽𝑖superscript𝜶Tsubscript𝐾𝑗superscriptsubscript𝐾𝑖Tsubscript𝜑0subscript𝑽𝑗superscript𝜶subscript𝜑0superscriptsubscript𝑽𝑖superscript𝜶T𝐻subscript𝜑0subscript𝑽𝑗superscript𝜶\displaystyle\begin{split}h({\boldsymbol{V}}_{i},{\boldsymbol{V}}_{j})=&\omega% _{ij}{\boldsymbol{Z}}_{i}^{\rm T}{\boldsymbol{Z}}_{j}\psi_{\tau}(y_{i}-{% \boldsymbol{X}}_{i}^{\rm T}\boldsymbol{\alpha}^{*})\psi_{\tau}(y_{j}-{% \boldsymbol{X}}_{j}^{\rm T}\boldsymbol{\alpha}^{*})\\ &+\varphi_{0}({\boldsymbol{V}}_{i},\boldsymbol{\alpha}^{*})^{\rm T}K_{j}+K_{i}% ^{\rm T}\varphi_{0}({\boldsymbol{V}}_{j},\boldsymbol{\alpha}^{*})+\varphi_{0}(% {\boldsymbol{V}}_{i},\boldsymbol{\alpha}^{*})^{\rm T}H\varphi_{0}({\boldsymbol% {V}}_{j},\boldsymbol{\alpha}^{*}),\end{split}start_ROW start_CELL italic_h ( bold_italic_V start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_V start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) = end_CELL start_CELL italic_ω start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT bold_italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_Z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_ψ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) italic_ψ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - bold_italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL + italic_φ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( bold_italic_V start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT italic_K start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + italic_K start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT italic_φ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( bold_italic_V start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) + italic_φ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( bold_italic_V start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT italic_H italic_φ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( bold_italic_V start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) , end_CELL end_ROW (25)

where φ0(𝑽,𝜶)=𝑿ψτ(y𝑿T𝜶)subscript𝜑0𝑽𝜶𝑿subscript𝜓𝜏𝑦superscript𝑿T𝜶\varphi_{0}({\boldsymbol{V}},\boldsymbol{\alpha})={\boldsymbol{X}}\psi_{\tau}(% y-{\boldsymbol{X}}^{\rm T}\boldsymbol{\alpha})italic_φ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( bold_italic_V , bold_italic_α ) = bold_italic_X italic_ψ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_y - bold_italic_X start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_α ), and

H=𝜸ΘγJTG(𝜸)TG(𝜸)Jw(𝜸)𝑑𝜸andKi=𝜸ΘγJTG(𝜸)Tφ(𝑽i,𝜶,𝟎,𝜸)w(𝜸)𝑑𝜸𝐻subscript𝜸subscriptΘ𝛾superscript𝐽T𝐺superscript𝜸T𝐺𝜸𝐽𝑤𝜸differential-d𝜸andsubscript𝐾𝑖subscript𝜸subscriptΘ𝛾superscript𝐽T𝐺superscript𝜸T𝜑subscript𝑽𝑖superscript𝜶0𝜸𝑤𝜸differential-d𝜸\displaystyle H=\int_{\boldsymbol{\gamma}\in\Theta_{\gamma}}J^{\rm T}G(% \boldsymbol{\gamma})^{\rm T}G(\boldsymbol{\gamma})Jw(\boldsymbol{\gamma})d% \boldsymbol{\gamma}~{}\mbox{and}~{}K_{i}=\int_{\boldsymbol{\gamma}\in\Theta_{% \gamma}}J^{\rm T}G(\boldsymbol{\gamma})^{\rm T}\varphi({\boldsymbol{V}}_{i},% \boldsymbol{\alpha}^{*},{\boldsymbol{0}},\boldsymbol{\gamma})w(\boldsymbol{% \gamma})d\boldsymbol{\gamma}italic_H = ∫ start_POSTSUBSCRIPT bold_italic_γ ∈ roman_Θ start_POSTSUBSCRIPT italic_γ end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_J start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT italic_G ( bold_italic_γ ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT italic_G ( bold_italic_γ ) italic_J italic_w ( bold_italic_γ ) italic_d bold_italic_γ and italic_K start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = ∫ start_POSTSUBSCRIPT bold_italic_γ ∈ roman_Θ start_POSTSUBSCRIPT italic_γ end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_J start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT italic_G ( bold_italic_γ ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT italic_φ ( bold_italic_V start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , bold_0 , bold_italic_γ ) italic_w ( bold_italic_γ ) italic_d bold_italic_γ

with φ(𝑽,𝜶,𝜷,𝜸)=𝒁𝟏(𝑼T𝜸0)ψτ(y𝑿T𝜶𝒁T𝜷𝟏(𝑼T𝜸0))𝜑𝑽𝜶𝜷𝜸𝒁1superscript𝑼T𝜸0subscript𝜓𝜏𝑦superscript𝑿T𝜶superscript𝒁T𝜷1superscript𝑼T𝜸0\varphi({\boldsymbol{V}},\boldsymbol{\alpha},\boldsymbol{\beta},\boldsymbol{% \gamma})={\boldsymbol{Z}}{\boldsymbol{1}}({\boldsymbol{U}}^{\rm T}\boldsymbol{% \gamma}\geq 0)\psi_{\tau}(y-{\boldsymbol{X}}^{\rm T}\boldsymbol{\alpha}-{% \boldsymbol{Z}}^{\rm T}\boldsymbol{\beta}{\boldsymbol{1}}({\boldsymbol{U}}^{% \rm T}\boldsymbol{\gamma}\geq 0))italic_φ ( bold_italic_V , bold_italic_α , bold_italic_β , bold_italic_γ ) = bold_italic_Z bold_1 ( bold_italic_U start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_γ ≥ 0 ) italic_ψ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_y - bold_italic_X start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_α - bold_italic_Z start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_β bold_1 ( bold_italic_U start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_γ ≥ 0 ) ).

Theorem 5.

If Assumptions (B1)–(B3) hold, then under the null hypothesis, we have

nTnμ0ν,superscript𝑛subscript𝑇𝑛subscript𝜇0𝜈\displaystyle nT_{n}-\mu_{0}\stackrel{{\scriptstyle\cal{L}}}{{\longrightarrow}% }\nu,italic_n italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_RELOP SUPERSCRIPTOP start_ARG ⟶ end_ARG start_ARG caligraphic_L end_ARG end_RELOP italic_ν ,

where superscript\stackrel{{\scriptstyle\cal{L}}}{{\longrightarrow}}start_RELOP SUPERSCRIPTOP start_ARG ⟶ end_ARG start_ARG caligraphic_L end_ARG end_RELOP denotes convergence in distribution, μ0=n{𝐏ψτ(ϵ)}2𝛄Θγ{𝐏𝐙𝟏(𝐔T𝛄0)}2w(𝛄)𝑑𝛄+𝐏[φ0(𝐕,𝛂)THφ0(𝐕,𝛂)]+2𝐏[φ0(𝐕1,𝛂)TK1]subscript𝜇0𝑛superscript𝐏subscript𝜓𝜏italic-ϵ2subscript𝛄subscriptΘ𝛾superscript𝐏𝐙1superscript𝐔T𝛄02𝑤𝛄differential-d𝛄𝐏delimited-[]subscript𝜑0superscript𝐕superscript𝛂T𝐻subscript𝜑0𝐕superscript𝛂2𝐏delimited-[]subscript𝜑0superscriptsubscript𝐕1superscript𝛂Tsubscript𝐾1\mu_{0}=n\{\boldsymbol{\mathrm{P}}\psi_{\tau}(\epsilon)\}^{2}\int_{\boldsymbol% {\gamma}\in\Theta_{\gamma}}\{\boldsymbol{\mathrm{P}}{\boldsymbol{Z}}{% \boldsymbol{1}}({\boldsymbol{U}}^{\rm T}\boldsymbol{\gamma}\geq 0)\}^{2}w(% \boldsymbol{\gamma})d\boldsymbol{\gamma}+\boldsymbol{\mathrm{P}}[\varphi_{0}({% \boldsymbol{V}},\boldsymbol{\alpha}^{*})^{\rm T}H\varphi_{0}({\boldsymbol{V}},% \boldsymbol{\alpha}^{*})]+2\boldsymbol{\mathrm{P}}[\varphi_{0}({\boldsymbol{V}% }_{1},\boldsymbol{\alpha}^{*})^{\rm T}K_{1}]italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = italic_n { bold_P italic_ψ start_POSTSUBSCRIPT italic_τ end_POSTSUBSCRIPT ( italic_ϵ ) } start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∫ start_POSTSUBSCRIPT bold_italic_γ ∈ roman_Θ start_POSTSUBSCRIPT italic_γ end_POSTSUBSCRIPT end_POSTSUBSCRIPT { bold_P bold_italic_Z bold_1 ( bold_italic_U start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_γ ≥ 0 ) } start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_w ( bold_italic_γ ) italic_d bold_italic_γ + bold_P [ italic_φ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( bold_italic_V , bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT italic_H italic_φ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( bold_italic_V , bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ] + 2 bold_P [ italic_φ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( bold_italic_V start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT italic_K start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ], ν𝜈\nuitalic_ν is a random variable of the form ν=j=1λj(χ1j21)𝜈superscriptsubscript𝑗1subscript𝜆𝑗subscriptsuperscript𝜒21𝑗1\nu=\sum_{j=1}^{\infty}\lambda_{j}(\chi^{2}_{1j}-1)italic_ν = ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_χ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 italic_j end_POSTSUBSCRIPT - 1 ), and χ112,χ122,subscriptsuperscript𝜒211subscriptsuperscript𝜒212\chi^{2}_{11},\chi^{2}_{12},\cdotsitalic_χ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT , italic_χ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 12 end_POSTSUBSCRIPT , ⋯ are independent χ12subscriptsuperscript𝜒21\chi^{2}_{1}italic_χ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT variables, i.e., ν𝜈\nuitalic_ν has the characteristic function

𝐏[eitν]=j=1(12itλj)12eitλj.𝐏delimited-[]superscript𝑒𝑖𝑡𝜈superscriptsubscriptproduct𝑗1superscript12𝑖𝑡subscript𝜆𝑗12superscript𝑒𝑖𝑡subscript𝜆𝑗\displaystyle\boldsymbol{\mathrm{P}}\left[e^{it\nu}\right]=\prod_{j=1}^{\infty% }(1-2it\lambda_{j})^{-\frac{1}{2}}e^{-it\lambda_{j}}.bold_P [ italic_e start_POSTSUPERSCRIPT italic_i italic_t italic_ν end_POSTSUPERSCRIPT ] = ∏ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ( 1 - 2 italic_i italic_t italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT italic_e start_POSTSUPERSCRIPT - italic_i italic_t italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUPERSCRIPT .

Here, i=1𝑖1i=\sqrt{-1}italic_i = square-root start_ARG - 1 end_ARG is the imaginary unit, and {λj}subscript𝜆𝑗\{\lambda_{j}\}{ italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT } are the eigenvalues of the kernel h(𝐯1,𝐯2)subscript𝐯1subscript𝐯2h({\boldsymbol{v}}_{1},{\boldsymbol{v}}_{2})italic_h ( bold_italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) under f(𝐯,𝛂,𝟎,𝛄)𝑓𝐯superscript𝛂0superscript𝛄f({\boldsymbol{v}},\boldsymbol{\alpha}^{*},{\boldsymbol{0}},\boldsymbol{\gamma% }^{*})italic_f ( bold_italic_v , bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , bold_0 , bold_italic_γ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ), i.e., they are the solutions of λjgj(𝐯2)=0h(𝐯1,𝐯2)gj(𝐯1)f(𝐯1,𝛂,𝟎,𝛄)𝑑𝐯1subscript𝜆𝑗subscript𝑔𝑗subscript𝐯2superscriptsubscript0subscript𝐯1subscript𝐯2subscript𝑔𝑗subscript𝐯1𝑓subscript𝐯1superscript𝛂0superscript𝛄differential-dsubscript𝐯1\lambda_{j}g_{j}({\boldsymbol{v}}_{2})=\int_{0}^{\infty}h({\boldsymbol{v}}_{1}% ,{\boldsymbol{v}}_{2})g_{j}({\boldsymbol{v}}_{1})\\ f({\boldsymbol{v}}_{1},\boldsymbol{\alpha}^{*},{\boldsymbol{0}},\boldsymbol{% \gamma}^{*})d{\boldsymbol{v}}_{1}italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_g start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( bold_italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) = ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT italic_h ( bold_italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) italic_g start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( bold_italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) italic_f ( bold_italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , bold_0 , bold_italic_γ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) italic_d bold_italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT for nonzero gjsubscript𝑔𝑗g_{j}italic_g start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, where f(𝐯,𝛂,𝛃,𝛄)𝑓𝐯𝛂𝛃𝛄f({\boldsymbol{v}},\boldsymbol{\alpha},\boldsymbol{\beta},\boldsymbol{\gamma})italic_f ( bold_italic_v , bold_italic_α , bold_italic_β , bold_italic_γ ) is the density of 𝐕𝐕{\boldsymbol{V}}bold_italic_V.

Investigated next is the power performance of the proposed test statistic under two types of alternative hypotheses under which subgroups exist. Considered first is the global alternative denoted by H1gsubscript𝐻1𝑔H_{1g}italic_H start_POSTSUBSCRIPT 1 italic_g end_POSTSUBSCRIPT: 𝜷=𝝃𝜷𝝃\boldsymbol{\beta}=\boldsymbol{\xi}bold_italic_β = bold_italic_ξ, where 𝝃Θ𝜷\{𝟎}𝝃\subscriptΘ𝜷0\boldsymbol{\xi}\in\Theta_{\boldsymbol{\beta}}\backslash\{{\boldsymbol{0}}\}bold_italic_ξ ∈ roman_Θ start_POSTSUBSCRIPT bold_italic_β end_POSTSUBSCRIPT \ { bold_0 } is fixed. Theorem 6 provides the asymptotic distribution of the test statistic Tnsubscript𝑇𝑛T_{n}italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT under the global alternative.

Theorem 6.

If Assumptions (B1)–(B3) hold, then under the global alternative H1gsubscript𝐻1𝑔H_{1g}italic_H start_POSTSUBSCRIPT 1 italic_g end_POSTSUBSCRIPT, we have

n(Tnμ1)𝒩(0,σξ2),superscript𝑛subscript𝑇𝑛subscript𝜇1𝒩0subscriptsuperscript𝜎2𝜉\displaystyle\sqrt{n}(T_{n}-\mu_{1})\stackrel{{\scriptstyle\cal{L}}}{{% \longrightarrow}}\mathcal{N}(0,\sigma^{2}_{\xi}),square-root start_ARG italic_n end_ARG ( italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) start_RELOP SUPERSCRIPTOP start_ARG ⟶ end_ARG start_ARG caligraphic_L end_ARG end_RELOP caligraphic_N ( 0 , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ξ end_POSTSUBSCRIPT ) ,

where μ1=𝐏[h(𝐕1,𝐕2)]subscript𝜇1𝐏delimited-[]subscript𝐕1subscript𝐕2\mu_{1}=\boldsymbol{\mathrm{P}}[h({\boldsymbol{V}}_{1},{\boldsymbol{V}}_{2})]italic_μ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = bold_P [ italic_h ( bold_italic_V start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_italic_V start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ] and σξ2=4Var(𝐏[h(𝐕1,𝐕2)|𝐕1])subscriptsuperscript𝜎2𝜉4Var𝐏delimited-[]conditionalsubscript𝐕1subscript𝐕2subscript𝐕1\sigma^{2}_{\xi}=4\hbox{Var}(\boldsymbol{\mathrm{P}}[h({\boldsymbol{V}}_{1},{% \boldsymbol{V}}_{2})|{\boldsymbol{V}}_{1}])italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_ξ end_POSTSUBSCRIPT = 4 Var ( bold_P [ italic_h ( bold_italic_V start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_italic_V start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) | bold_italic_V start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ] ).

To derive the asymptotic distribution under the local alternative hypothesis, denoted by H1l:𝜷=n1/2𝝃:subscript𝐻1𝑙𝜷superscript𝑛12𝝃H_{1l}:\boldsymbol{\beta}=n^{-1/2}\boldsymbol{\xi}italic_H start_POSTSUBSCRIPT 1 italic_l end_POSTSUBSCRIPT : bold_italic_β = italic_n start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT bold_italic_ξ, additional assumptions are required, where 𝝃Θβ𝝃subscriptΘ𝛽\boldsymbol{\xi}\in\Theta_{\beta}bold_italic_ξ ∈ roman_Θ start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT is a fixed vector.

  1. (B4)

    There is a positive function b(𝒗,𝝃)𝑏𝒗𝝃b({\boldsymbol{v}},\boldsymbol{\xi})italic_b ( bold_italic_v , bold_italic_ξ ) of 𝒗𝒗{\boldsymbol{v}}bold_italic_v relying on 𝜶superscript𝜶\boldsymbol{\alpha}^{*}bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, 𝜸superscript𝜸\boldsymbol{\gamma}^{*}bold_italic_γ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT such that

    |𝝃Tf(𝒗;𝜶,rn𝝃,𝜸)𝜷f(𝒗;𝜶,𝟎,𝜸)|b(𝒗,𝝃),superscript𝝃T𝑓𝒗superscript𝜶subscript𝑟𝑛𝝃superscript𝜸𝜷𝑓𝒗superscript𝜶0superscript𝜸𝑏𝒗𝝃\displaystyle\left|\boldsymbol{\xi}^{\rm T}\frac{\partial f({\boldsymbol{v}};% \boldsymbol{\alpha}^{*},r_{n}\boldsymbol{\xi},\boldsymbol{\gamma}^{*})\partial% \boldsymbol{\beta}}{f({\boldsymbol{v}};\boldsymbol{\alpha}^{*},{\boldsymbol{0}% },\boldsymbol{\gamma}^{*})}\right|\leq b({\boldsymbol{v}},\boldsymbol{\xi}),| bold_italic_ξ start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT divide start_ARG ∂ italic_f ( bold_italic_v ; bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , italic_r start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT bold_italic_ξ , bold_italic_γ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ∂ bold_italic_β end_ARG start_ARG italic_f ( bold_italic_v ; bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , bold_0 , bold_italic_γ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) end_ARG | ≤ italic_b ( bold_italic_v , bold_italic_ξ ) ,

    and 𝐏[b(𝑽,𝝃)2]𝐏delimited-[]𝑏superscript𝑽𝝃2\boldsymbol{\mathrm{P}}[b({\boldsymbol{V}},\boldsymbol{\xi})^{2}]bold_P [ italic_b ( bold_italic_V , bold_italic_ξ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] and 𝐏[ϕk(𝑽)2b(𝑽,𝝃)]𝐏delimited-[]subscriptitalic-ϕ𝑘superscript𝑽2𝑏𝑽𝝃\boldsymbol{\mathrm{P}}[\phi_{k}({\boldsymbol{V}})^{2}b({\boldsymbol{V}},% \boldsymbol{\xi})]bold_P [ italic_ϕ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( bold_italic_V ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_b ( bold_italic_V , bold_italic_ξ ) ] for all k=1,,𝑘1k=1,\cdots,italic_k = 1 , ⋯ , are bounded by C𝝃subscript𝐶𝝃C_{\boldsymbol{\xi}}italic_C start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT, where 𝝃Θβ𝝃subscriptΘ𝛽\boldsymbol{\xi}\in\Theta_{\beta}bold_italic_ξ ∈ roman_Θ start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT, rn=o(1)subscript𝑟𝑛𝑜1r_{n}=o(1)italic_r start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = italic_o ( 1 ), C𝝃>0subscript𝐶𝝃0C_{\boldsymbol{\xi}}>0italic_C start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT > 0 is a constant relying on 𝝃𝝃\boldsymbol{\xi}bold_italic_ξ, ϕk()subscriptitalic-ϕ𝑘\phi_{k}(\cdot)italic_ϕ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( ⋅ ) is as defined in Theorem 5, and 𝑽𝑽{\boldsymbol{V}}bold_italic_V is generated from the null distribution with density f(𝒗;𝜶,𝟎,𝜸)𝑓𝒗superscript𝜶0superscript𝜸f({\boldsymbol{v}};\boldsymbol{\alpha}^{*},{\boldsymbol{0}},\boldsymbol{\gamma% }^{*})italic_f ( bold_italic_v ; bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , bold_0 , bold_italic_γ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ).

Theorem 7.

If Assumptions (B1)–(B4) hold, then under the local alternative hypothesis H1lsubscript𝐻1𝑙H_{1l}italic_H start_POSTSUBSCRIPT 1 italic_l end_POSTSUBSCRIPT, i.e., 𝛃=n1/2𝛏𝛃superscript𝑛12𝛏\boldsymbol{\beta}=n^{-1/2}\boldsymbol{\xi}bold_italic_β = italic_n start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT bold_italic_ξ with a fixed vector 𝛏Θβ𝛏subscriptΘ𝛽\boldsymbol{\xi}\in\Theta_{\beta}bold_italic_ξ ∈ roman_Θ start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT, we have

nTnμ0ν,superscript𝑛subscript𝑇𝑛subscript𝜇0𝜈\displaystyle nT_{n}-\mu_{0}\stackrel{{\scriptstyle\cal{L}}}{{\longrightarrow}% }\nu,italic_n italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_RELOP SUPERSCRIPTOP start_ARG ⟶ end_ARG start_ARG caligraphic_L end_ARG end_RELOP italic_ν ,

where μ0subscript𝜇0\mu_{0}italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT is as defined in Theorem 5, ν𝜈\nuitalic_ν is a random variable of the form ν=j=1λj(χ1j2(μaj)1)𝜈superscriptsubscript𝑗1subscript𝜆𝑗subscriptsuperscript𝜒21𝑗subscript𝜇𝑎𝑗1\nu=\sum_{j=1}^{\infty}\lambda_{j}(\chi^{2}_{1j}(\mu_{aj})-1)italic_ν = ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_χ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 italic_j end_POSTSUBSCRIPT ( italic_μ start_POSTSUBSCRIPT italic_a italic_j end_POSTSUBSCRIPT ) - 1 ), and χ112(μa1),χ122(μa2),subscriptsuperscript𝜒211subscript𝜇𝑎1subscriptsuperscript𝜒212subscript𝜇𝑎2\chi^{2}_{11}(\mu_{a1}),\chi^{2}_{12}(\mu_{a2}),\cdotsitalic_χ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT ( italic_μ start_POSTSUBSCRIPT italic_a 1 end_POSTSUBSCRIPT ) , italic_χ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 12 end_POSTSUBSCRIPT ( italic_μ start_POSTSUBSCRIPT italic_a 2 end_POSTSUBSCRIPT ) , ⋯ are independent noncentral χ12subscriptsuperscript𝜒21\chi^{2}_{1}italic_χ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT variables, i.e., ν𝜈\nuitalic_ν has the characteristic function

𝐏[eitν]=j=1(12itλj)12exp(itλj+itλjμaj12itλj).𝐏delimited-[]superscript𝑒𝑖𝑡𝜈superscriptsubscriptproduct𝑗1superscript12𝑖𝑡subscript𝜆𝑗12𝑖𝑡subscript𝜆𝑗𝑖𝑡subscript𝜆𝑗subscript𝜇𝑎𝑗12𝑖𝑡subscript𝜆𝑗\displaystyle\boldsymbol{\mathrm{P}}\left[e^{it\nu}\right]=\prod_{j=1}^{\infty% }(1-2it\lambda_{j})^{-\frac{1}{2}}\exp\left(-it\lambda_{j}+\frac{it\lambda_{j}% \mu_{aj}}{1-2it\lambda_{j}}\right).bold_P [ italic_e start_POSTSUPERSCRIPT italic_i italic_t italic_ν end_POSTSUPERSCRIPT ] = ∏ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ( 1 - 2 italic_i italic_t italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT roman_exp ( - italic_i italic_t italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + divide start_ARG italic_i italic_t italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_μ start_POSTSUBSCRIPT italic_a italic_j end_POSTSUBSCRIPT end_ARG start_ARG 1 - 2 italic_i italic_t italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG ) .

Here, {λj}subscript𝜆𝑗\{\lambda_{j}\}{ italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT } are the eigenvalues of the kernel h(𝐯1,𝐯2)subscript𝐯1subscript𝐯2h({\boldsymbol{v}}_{1},{\boldsymbol{v}}_{2})italic_h ( bold_italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) defined in (25) under f(𝐯,𝛂,𝟎,𝛄)𝑓𝐯superscript𝛂0superscript𝛄f({\boldsymbol{v}},\boldsymbol{\alpha}^{*},{\boldsymbol{0}},\boldsymbol{\gamma% }^{*})italic_f ( bold_italic_v , bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , bold_0 , bold_italic_γ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ), i.e., they are the solutions of λjgj(𝐯2)=0h(𝐯1,𝐯2)gj(𝐯1)f(𝐯1,𝛂,𝟎,𝛄)𝑑𝐯1subscript𝜆𝑗subscript𝑔𝑗subscript𝐯2superscriptsubscript0subscript𝐯1subscript𝐯2subscript𝑔𝑗subscript𝐯1𝑓subscript𝐯1superscript𝛂0superscript𝛄differential-dsubscript𝐯1\lambda_{j}g_{j}({\boldsymbol{v}}_{2})=\int_{0}^{\infty}h({\boldsymbol{v}}_{1}% ,{\boldsymbol{v}}_{2})g_{j}({\boldsymbol{v}}_{1})f({\boldsymbol{v}}_{1},% \boldsymbol{\alpha}^{*},{\boldsymbol{0}},\boldsymbol{\gamma}^{*})d{\boldsymbol% {v}}_{1}italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_g start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( bold_italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) = ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT italic_h ( bold_italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) italic_g start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( bold_italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) italic_f ( bold_italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , bold_0 , bold_italic_γ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) italic_d bold_italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT for nonzero gjsubscript𝑔𝑗g_{j}italic_g start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, and each noncentrality parameter of χ1j2(μaj)subscriptsuperscript𝜒21𝑗subscript𝜇𝑎𝑗\chi^{2}_{1j}(\mu_{aj})italic_χ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 italic_j end_POSTSUBSCRIPT ( italic_μ start_POSTSUBSCRIPT italic_a italic_j end_POSTSUBSCRIPT ) is

μaj=𝐏[ϕj(𝑽0)𝝃Tlog(f(𝑽0,𝜶,𝟎,𝜸))/𝜷],j=1,2,,formulae-sequencesubscript𝜇𝑎𝑗𝐏delimited-[]subscriptitalic-ϕ𝑗subscript𝑽0superscript𝝃T𝑓subscript𝑽0superscript𝜶0superscript𝜸𝜷𝑗12\displaystyle\mu_{aj}=\boldsymbol{\mathrm{P}}\left[\phi_{j}({\boldsymbol{V}}_{% 0})\boldsymbol{\xi}^{\rm T}\partial\log(f({\boldsymbol{V}}_{0},\boldsymbol{% \alpha}^{*},{\boldsymbol{0}},\boldsymbol{\gamma}^{*}))/\partial\boldsymbol{% \beta}\right],\quad j=1,2,\cdots,italic_μ start_POSTSUBSCRIPT italic_a italic_j end_POSTSUBSCRIPT = bold_P [ italic_ϕ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( bold_italic_V start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) bold_italic_ξ start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ∂ roman_log ( italic_f ( bold_italic_V start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , bold_0 , bold_italic_γ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ) / ∂ bold_italic_β ] , italic_j = 1 , 2 , ⋯ ,

where {ϕj(𝐯)}subscriptitalic-ϕ𝑗𝐯\{\phi_{j}({\boldsymbol{v}})\}{ italic_ϕ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( bold_italic_v ) } denotes orthonormal eigenfunctions corresponding to the eigenvalues {λj}subscript𝜆𝑗\{\lambda_{j}\}{ italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT }, and 𝐕0subscript𝐕0{\boldsymbol{V}}_{0}bold_italic_V start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT is generated from the null distribution f(𝐯,𝛂,𝟎,𝛄)𝑓𝐯superscript𝛂0superscript𝛄f({\boldsymbol{v}},\boldsymbol{\alpha}^{*},{\boldsymbol{0}},\boldsymbol{\gamma% }^{*})italic_f ( bold_italic_v , bold_italic_α start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , bold_0 , bold_italic_γ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ).

Denote by Fνsubscript𝐹𝜈F_{\nu}italic_F start_POSTSUBSCRIPT italic_ν end_POSTSUBSCRIPT the cumulative distribution function of ν𝜈\nuitalic_ν. It follows from Theorem 7 that the power function of nTnμ0𝑛subscript𝑇𝑛subscript𝜇0nT_{n}-\mu_{0}italic_n italic_T start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT can be approximated theoretically by Fνsubscript𝐹𝜈F_{\nu}italic_F start_POSTSUBSCRIPT italic_ν end_POSTSUBSCRIPT, and the proof of Theorem 7 shows that 0<𝐏h(𝑽1,𝑽2)=j=1λjμaj+o(1)0𝐏subscript𝑽1subscript𝑽2superscriptsubscript𝑗1subscript𝜆𝑗subscript𝜇𝑎𝑗𝑜10<\boldsymbol{\mathrm{P}}h({\boldsymbol{V}}_{1},{\boldsymbol{V}}_{2})=\sum_{j=% 1}^{\infty}\lambda_{j}\mu_{aj}+o(1)0 < bold_P italic_h ( bold_italic_V start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_italic_V start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) = ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_μ start_POSTSUBSCRIPT italic_a italic_j end_POSTSUBSCRIPT + italic_o ( 1 ) under the local alternative hypothesis. The additional mean μ1subscript𝜇1\mu_{1}italic_μ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT under H1gsubscript𝐻1𝑔H_{1g}italic_H start_POSTSUBSCRIPT 1 italic_g end_POSTSUBSCRIPT (or j=1λjμajsuperscriptsubscript𝑗1subscript𝜆𝑗subscript𝜇𝑎𝑗\sum_{j=1}^{\infty}\lambda_{j}\mu_{aj}∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_μ start_POSTSUBSCRIPT italic_a italic_j end_POSTSUBSCRIPT under H1lsubscript𝐻1𝑙H_{1l}italic_H start_POSTSUBSCRIPT 1 italic_l end_POSTSUBSCRIPT) can be viewed as a measure of the difference between H0subscript𝐻0H_{0}italic_H start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and H1gsubscript𝐻1𝑔H_{1g}italic_H start_POSTSUBSCRIPT 1 italic_g end_POSTSUBSCRIPT (or H1lsubscript𝐻1𝑙H_{1l}italic_H start_POSTSUBSCRIPT 1 italic_l end_POSTSUBSCRIPT). Fνsubscript𝐹𝜈F_{\nu}italic_F start_POSTSUBSCRIPT italic_ν end_POSTSUBSCRIPT is difficult to use in practice because it is not common. In Appendix B of the Supplementary Material, a novel bootstrap method is recommended for calculating the critical value or p-value, of which the asymptotic consistency is established in Theorem B1 in the Supplementary Material.

4 Simulation studies

Consider the change-plane model (1)

yi=𝑿iT𝜶+𝑿iT𝜷𝟏(𝑼iT𝜸0)+ϵi,subscript𝑦𝑖superscriptsubscript𝑿𝑖T𝜶superscriptsubscript𝑿𝑖T𝜷1superscriptsubscript𝑼𝑖T𝜸0subscriptitalic-ϵ𝑖\displaystyle y_{i}={\boldsymbol{X}}_{i}^{\rm T}\boldsymbol{\alpha}+{% \boldsymbol{X}}_{i}^{\rm T}\boldsymbol{\beta}{\boldsymbol{1}}({\boldsymbol{U}}% _{i}^{\rm T}\boldsymbol{\gamma}\geq 0)+\epsilon_{i},italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = bold_italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_α + bold_italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_β bold_1 ( bold_italic_U start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_γ ≥ 0 ) + italic_ϵ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ,

where X1i=1subscript𝑋1𝑖1X_{1i}=1italic_X start_POSTSUBSCRIPT 1 italic_i end_POSTSUBSCRIPT = 1 and U1i=1subscript𝑈1𝑖1U_{1i}=1italic_U start_POSTSUBSCRIPT 1 italic_i end_POSTSUBSCRIPT = 1, and (X2i,,Xpi)Tsuperscriptsubscript𝑋2𝑖subscript𝑋𝑝𝑖T(X_{2i},\cdots,X_{pi})^{\rm T}( italic_X start_POSTSUBSCRIPT 2 italic_i end_POSTSUBSCRIPT , ⋯ , italic_X start_POSTSUBSCRIPT italic_p italic_i end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT and (U2i,,Uri)Tsuperscriptsubscript𝑈2𝑖subscript𝑈𝑟𝑖T(U_{2i},\cdots,U_{ri})^{\rm T}( italic_U start_POSTSUBSCRIPT 2 italic_i end_POSTSUBSCRIPT , ⋯ , italic_U start_POSTSUBSCRIPT italic_r italic_i end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT are generated independently from multivariate normal distributions N(𝟎p1,2𝑰p1)𝑁subscript0𝑝12subscript𝑰𝑝1N({\boldsymbol{0}}_{p-1},\sqrt{2}{\boldsymbol{I}}_{p-1})italic_N ( bold_0 start_POSTSUBSCRIPT italic_p - 1 end_POSTSUBSCRIPT , square-root start_ARG 2 end_ARG bold_italic_I start_POSTSUBSCRIPT italic_p - 1 end_POSTSUBSCRIPT ) and N(𝟎r1,2𝑰r1)𝑁subscript0𝑟12subscript𝑰𝑟1N({\boldsymbol{0}}_{r-1},\sqrt{2}{\boldsymbol{I}}_{r-1})italic_N ( bold_0 start_POSTSUBSCRIPT italic_r - 1 end_POSTSUBSCRIPT , square-root start_ARG 2 end_ARG bold_italic_I start_POSTSUBSCRIPT italic_r - 1 end_POSTSUBSCRIPT ), respectively. The error ϵisubscriptitalic-ϵ𝑖\epsilon_{i}italic_ϵ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is generated from following eight distributions: (i) N(0,2)𝑁02N(0,\sqrt{2})italic_N ( 0 , square-root start_ARG 2 end_ARG ); (ii) t2subscript𝑡2t_{2}italic_t start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT; (iii) Pareto distribution Par(2,1)𝑃𝑎𝑟21Par(2,1)italic_P italic_a italic_r ( 2 , 1 ) with shape parameter 2 and scale parameter 1; (iv) Weibull distribution Weib(0.75,0.75)𝑊𝑒𝑖𝑏0.750.75Weib(0.75,0.75)italic_W italic_e italic_i italic_b ( 0.75 , 0.75 ) with shape parameter 0.75 and scale parameter 0.75. For the limit of space, we put other four error’s distributions in Appendix E of the Supplementary Material: (v) Gaussian mixture; (vi) mixture of t2subscript𝑡2t_{2}italic_t start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and Weibull Weib(0.75,0.75)𝑊𝑒𝑖𝑏0.750.75Weib(0.75,0.75)italic_W italic_e italic_i italic_b ( 0.75 , 0.75 ); (vii) mixture of Pareto Par(2,1)𝑃𝑎𝑟21Par(2,1)italic_P italic_a italic_r ( 2 , 1 ) and N(0,2)𝑁02N(0,\sqrt{2})italic_N ( 0 , square-root start_ARG 2 end_ARG ); and (viii) mixture of lognormal exp(N(0,1))𝑁01\exp(N(0,1))roman_exp ( italic_N ( 0 , 1 ) ) and N(0,2)𝑁02N(0,\sqrt{2})italic_N ( 0 , square-root start_ARG 2 end_ARG ). (γ2,,γq)T=(1,2,,2)Tsuperscriptsubscript𝛾2subscript𝛾𝑞Tsuperscript122T(\gamma_{2},\cdots,\gamma_{q})^{\rm T}=(1,2,\cdots,2)^{\rm T}( italic_γ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , ⋯ , italic_γ start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT = ( 1 , 2 , ⋯ , 2 ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT is set under H1subscript𝐻1H_{1}italic_H start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, and γ1subscript𝛾1\gamma_{1}italic_γ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT is chosen as the negative of 35% percentile of U2γ2++Uqγrsubscript𝑈2subscript𝛾2subscript𝑈𝑞subscript𝛾𝑟U_{2}\gamma_{2}+\cdots+U_{q}\gamma_{r}italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT + ⋯ + italic_U start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT, which means that 𝑼T𝜸superscript𝑼T𝜸{\boldsymbol{U}}^{\rm T}\boldsymbol{\gamma}bold_italic_U start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_γ divides the population into two groups with 65% and 35% observations, respectively. To save space, we only present simulation results of robust subgroup-classifier learning, and we put performance of robust subgroup testing in Appendix E2 of the Supplementary Material.

The settings used herein are 𝜶=(5,0.5,,0.5)pand𝜷=(0.5,,0.5)q,formulae-sequence𝜶50.50.5superscript𝑝and𝜷0.50.5superscript𝑞\boldsymbol{\alpha}=(5,0.5,\cdots,0.5)\in\mathbb{R}^{p}\quad\mbox{and}\quad% \boldsymbol{\beta}=(0.5,\cdots,0.5)\in\mathbb{R}^{q},bold_italic_α = ( 5 , 0.5 , ⋯ , 0.5 ) ∈ blackboard_R start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT and bold_italic_β = ( 0.5 , ⋯ , 0.5 ) ∈ blackboard_R start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT , with the sigmoid function K(u)={1+exp(u)}1𝐾𝑢superscript1𝑢1K(u)=\{1+\exp(-u)\}^{-1}italic_K ( italic_u ) = { 1 + roman_exp ( - italic_u ) } start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT chosen as the smooth function. Finite-sample studies were performed for different smooth functions K(u)𝐾𝑢K(u)italic_K ( italic_u ) (such as K(u)=Φ(u)𝐾𝑢Φ𝑢K(u)=\Phi(u)italic_K ( italic_u ) = roman_Φ ( italic_u ) and K(u)=Φ(u)+uϕ(u)𝐾𝑢Φ𝑢𝑢italic-ϕ𝑢K(u)=\Phi(u)+u\phi(u)italic_K ( italic_u ) = roman_Φ ( italic_u ) + italic_u italic_ϕ ( italic_u )), but the results were similar and so are omitted here. For comparison, three strategies are considered: (i) the proposed adaptive procedure (AHu), (ii) the classic Huber method (Hub), with the robustness parameter τ𝜏\tauitalic_τ selected by τ0median{𝒚median(𝒚)}/Φ1(0.75)subscript𝜏0median𝒚median𝒚superscriptΦ10.75\tau_{0}\mbox{median}\{{\boldsymbol{y}}-\mbox{median}({\boldsymbol{y}})\}/\Phi% ^{-1}(0.75)italic_τ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT median { bold_italic_y - median ( bold_italic_y ) } / roman_Φ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( 0.75 ) with τ0=1.345subscript𝜏01.345\tau_{0}=1.345italic_τ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = 1.345 and 𝒚=(y1,,yn)T𝒚superscriptsubscript𝑦1subscript𝑦𝑛T{\boldsymbol{y}}=(y_{1},\cdots,y_{n})^{\rm T}bold_italic_y = ( italic_y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , ⋯ , italic_y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT, as suggested in Wang et al., (2021)), and (iii) the estimation method based on the ordinary quadratic loss considered in Li et al., (2021) (OLS). Here, the subscript n𝑛nitalic_n in AHUn, Hubn and OLSn stands for the sample size with n=200,400,600𝑛200400600n=200,400,600italic_n = 200 , 400 , 600.

Figure 3 shows boxplots of the L2subscript𝐿2L_{2}italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT-norm of the estimation errors of the parameter 𝜽=(𝜶T,𝜷T)T𝜽superscriptsuperscript𝜶Tsuperscript𝜷TT\boldsymbol{\theta}=(\boldsymbol{\alpha}^{\rm T},\boldsymbol{\beta}^{\rm T})^{% \rm T}bold_italic_θ = ( bold_italic_α start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , bold_italic_β start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT with different errors and with (p,q,r)=(3,3,3)𝑝𝑞𝑟333(p,q,r)=(3,3,3)( italic_p , italic_q , italic_r ) = ( 3 , 3 , 3 ), where the L2subscript𝐿2L_{2}italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT-norm is defined as L2=𝜽^τ,h𝜽2subscript𝐿2superscriptnormsubscript^𝜽𝜏superscript𝜽2L_{2}=\|\hat{\boldsymbol{\theta}}_{\tau,h}-\boldsymbol{\theta}^{*}\|^{2}italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = ∥ over^ start_ARG bold_italic_θ end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT - bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT, where 𝜽^τ,hsubscript^𝜽𝜏\hat{\boldsymbol{\theta}}_{\tau,h}over^ start_ARG bold_italic_θ end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT is the robust estimator of the true parameter 𝜽superscript𝜽\boldsymbol{\theta}^{*}bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. Figure 4 shows boxplots of the accuracy (ACC) of subgroup identification with different errors and with (p,q,r)=(3,3,3)𝑝𝑞𝑟333(p,q,r)=(3,3,3)( italic_p , italic_q , italic_r ) = ( 3 , 3 , 3 ). Here, ACC is defined as

ACC=1n1i=1n|𝟏(U1i+𝑼2iT𝜼^τ,h0)𝟏(U1i+𝑼2iT𝜼0)|ACC1superscript𝑛1superscriptsubscript𝑖1𝑛1subscript𝑈1𝑖superscriptsubscript𝑼2𝑖Tsubscript^𝜼𝜏01subscript𝑈1𝑖superscriptsubscript𝑼2𝑖Tsuperscript𝜼0\mbox{ACC}=1-n^{-1}\sum_{i=1}^{n}\left|{\boldsymbol{1}}(U_{1i}+{\boldsymbol{U}% }_{2i}^{\rm T}\hat{\boldsymbol{\eta}}_{\tau,h}\geq 0)-{\boldsymbol{1}}(U_{1i}+% {\boldsymbol{U}}_{2i}^{\rm T}\boldsymbol{\eta}^{*}\geq 0)\right|ACC = 1 - italic_n start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT | bold_1 ( italic_U start_POSTSUBSCRIPT 1 italic_i end_POSTSUBSCRIPT + bold_italic_U start_POSTSUBSCRIPT 2 italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT over^ start_ARG bold_italic_η end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT ≥ 0 ) - bold_1 ( italic_U start_POSTSUBSCRIPT 1 italic_i end_POSTSUBSCRIPT + bold_italic_U start_POSTSUBSCRIPT 2 italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_η start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ≥ 0 ) |

with the robust estimator 𝜼^τ,hsubscript^𝜼𝜏\hat{\boldsymbol{\eta}}_{\tau,h}over^ start_ARG bold_italic_η end_ARG start_POSTSUBSCRIPT italic_τ , italic_h end_POSTSUBSCRIPT of the true parameter 𝜼superscript𝜼\boldsymbol{\eta}^{*}bold_italic_η start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT.

Figures 3 and 4 show that the L2subscript𝐿2L_{2}italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT-norms of the estimation errors decrease and the accuracies of subgroup identification increase as the sample size n𝑛nitalic_n grows, which verifies the established theoretical results. Compared with the ordinary quadratic loss, the proposed estimation procedure achieves a uniformly lower median of the L2subscript𝐿2L_{2}italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT-norm of the estimation errors and higher accuracy of subgroup identification for all heavy-tailed distributions except for the t2subscript𝑡2t_{2}italic_t start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT distribution. The three methods are comparable for the symmetric distributions, such as the Gaussian and t2subscript𝑡2t_{2}italic_t start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT distributions. To limit the number of pages, the boxplots of the L2subscript𝐿2L_{2}italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT-norm of the estimation errors of the parameter 𝜽𝜽\boldsymbol{\theta}bold_italic_θ and accuracy of subgroup identification for (p,q,r)=(3,3,11),(6,6,3)𝑝𝑞𝑟3311663(p,q,r)=(3,3,11),(6,6,3)( italic_p , italic_q , italic_r ) = ( 3 , 3 , 11 ) , ( 6 , 6 , 3 ), and (6,6,11)6611(6,6,11)( 6 , 6 , 11 ) are shown in Figures E2–E4 and E6–E8 in Appendix  E.1 of the Supplementary Material.

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 3: L2subscript𝐿2L_{2}italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT-norm of estimation errors of parameter 𝛉=(𝛂T,𝛃T)T𝛉superscriptsuperscript𝛂Tsuperscript𝛃TT\boldsymbol{\theta}=(\boldsymbol{\alpha}^{\rm T},\boldsymbol{\beta}^{\rm T})^{% \rm T}bold_italic_θ = ( bold_italic_α start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT , bold_italic_β start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT with four different error distributions. Here, (p,q,r)=(3,3,3)𝑝𝑞𝑟333(p,q,r)=(3,3,3)( italic_p , italic_q , italic_r ) = ( 3 , 3 , 3 ).
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 4: Accuracy of subgroup identification with four different error distributions. Here, (p,q,r)=(3,3,3)𝑝𝑞𝑟333(p,q,r)=(3,3,3)( italic_p , italic_q , italic_r ) = ( 3 , 3 , 3 ).

5 Case study

The proposed procedure is applied to cancer data for skin cutaneous melanoma (SKCM) downloaded from the Cancer Genome Atlas https://tcga-data.nci.nih.gov/. SKCM is one of the most aggressive types of cancer, and its incidence, mortality, and disease burden are increasing annually (Siegel et al., 2021; Xiong et al., 2019). It is believed that CREG1, TMEM201, and CCL8 are skin-cancer susceptibility genes (Hu et al., 2020), and the goal is to identify the subgroup of Breslow’s thickness based on mutations of those sensitive genes.

Consideration is given to the three environmental factors of (i) gender, (ii) age at diagnosis, and (iii) Clark level at diagnosis (CLAD), all of which have been studied extensively in the literature. Studies such as those by Dickson and Gershenwald, (2011) have found that these three environmental factors have positive effects. After removing missing values, there are 253 subjects, and the SKCM data are modeled by applying the change-plane model (1)

Yi=𝑿iT𝜶+𝑿iT𝜷𝟏(𝑼iT𝜸0)+ϵi,i=1,,253,formulae-sequencesubscript𝑌𝑖superscriptsubscript𝑿𝑖T𝜶superscriptsubscript𝑿𝑖T𝜷1superscriptsubscript𝑼𝑖T𝜸0subscriptitalic-ϵ𝑖𝑖1253\displaystyle Y_{i}={\boldsymbol{X}}_{i}^{\rm T}\boldsymbol{\alpha}+{% \boldsymbol{X}}_{i}^{\rm T}\boldsymbol{\beta}{\boldsymbol{1}}({\boldsymbol{U}}% _{i}^{\rm T}\boldsymbol{\gamma}\geq 0)+\epsilon_{i},\quad i=1,\cdots,253,italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = bold_italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_α + bold_italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_β bold_1 ( bold_italic_U start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT bold_italic_γ ≥ 0 ) + italic_ϵ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_i = 1 , ⋯ , 253 ,

where 𝑿i=(1,AGEi,GENDERi,CLADi)Tsubscript𝑿𝑖superscript1subscriptAGE𝑖subscriptGENDER𝑖subscriptCLAD𝑖T{\boldsymbol{X}}_{i}=(1,\mbox{AGE}_{i},\mbox{GENDER}_{i},\mbox{CLAD}_{i})^{\rm T}bold_italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = ( 1 , AGE start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , GENDER start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , CLAD start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT, and 𝑼i=(1,CREG1i,TMEM201i,CCL8i)Tsubscript𝑼𝑖superscript1subscriptCREG1𝑖subscriptTMEM201𝑖subscriptCCL8𝑖T{\boldsymbol{U}}_{i}=(1,\mbox{CREG1}_{i},\mbox{TMEM201}_{i},\mbox{CCL8}_{i})^{% \rm T}bold_italic_U start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = ( 1 , CREG1 start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , TMEM201 start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , CCL8 start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT. The three important genes CREG1, TMEM201, and CCL8 are high correlated with breast cancer (Hu et al., 2020).

Based on B=5000𝐵5000B=5000italic_B = 5000 bootstrap samples and the Huber loss, the p-value with the proposed RWAST is 0.0020.0020.0020.002, which implies that based on the proposed method, there is a strong evidence for rejecting the null hypothesis. However, based on the ordinary quadratic loss, the p-values with WAST and SST are 0.6454 and 0.0286, respectively, where B=5000𝐵5000B=5000italic_B = 5000 and K=2000𝐾2000K=2000italic_K = 2000. Therefore, there is no evidence for rejecting the null hypothesis for WAST and SST based on the ordinary quadratic loss.

The parametric estimators are listed in Table 1, from which the indicator function 𝟏(𝑼iT𝜸^0)1superscriptsubscript𝑼𝑖T^𝜸0{\boldsymbol{1}}({\boldsymbol{U}}_{i}^{\rm T}\hat{\boldsymbol{\gamma}}\geq 0)bold_1 ( bold_italic_U start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_T end_POSTSUPERSCRIPT over^ start_ARG bold_italic_γ end_ARG ≥ 0 ) partitions the population into two subgroups with 121 and 132 subjects based on the Huber loss, and two subgroups with 91 and 162 subjects based on the ordinary quadratic loss. Therefore, there would appear to be a subgroup with a higher chance of skin cancer based on mutations of these three genes CREG1, TMEM201, and CCL8.

Table 1: Estimates of parameter if null hypothesis has been rejected.
Change plane Parameter Intercept AGE GENDER CLAD Intercept CREG1 TMEM201 CCL8
OLS 𝜶^^𝜶\hat{\boldsymbol{\alpha}}over^ start_ARG bold_italic_α end_ARG 0.0370.037-0.037- 0.037 0.0380.038-0.038- 0.038 0.002 0.050
𝜷^^𝜷\hat{\boldsymbol{\beta}}over^ start_ARG bold_italic_β end_ARG 0.022 0.024 0.008 0.0070.007-0.007- 0.007
𝜸^^𝜸\hat{\boldsymbol{\gamma}}over^ start_ARG bold_italic_γ end_ARG 1.000 0.179 0.6560.656-0.656- 0.656 4.970
Huber 𝜶^^𝜶\hat{\boldsymbol{\alpha}}over^ start_ARG bold_italic_α end_ARG 3.525 0.029 0.079 0.064
𝜷^^𝜷\hat{\boldsymbol{\beta}}over^ start_ARG bold_italic_β end_ARG 0.012 0.014 0.0260.026-0.026- 0.026 0.0450.045-0.045- 0.045
𝜸^^𝜸\hat{\boldsymbol{\gamma}}over^ start_ARG bold_italic_γ end_ARG 1.000 0.384 2.0932.093-2.093- 2.093 4.860

6 Conclusion

Considered herein were subgroup classification and subgroup tests for change-plane models with heavy-tailed errors, which offer help in (i) narrowing down populations for modeling and (ii) providing recommendations for optimal individualized treatments in practice. A novel subgroup classifier was proposed by smoothing the indicator function and minimizing a smoothed Huber loss. Nonasymptotic properties were derived and nonasymptotic Bahadur representation was provided, in which the estimators of the parameters 𝜶𝜶\boldsymbol{\alpha}bold_italic_α and 𝜷𝜷\boldsymbol{\beta}bold_italic_β achieve sub-Gaussian tails.

The novel test statistic RWAST was introduced to test whether subgroups of individuals exist. In a comparison with WAST Liu et al., (2024) and SST Andrews and Ploberger, (1994, 1995); Song et al., (2009); Fan et al., 2017a based on the ordinary change-plane regression with non-heavy-tailed errors, the proposed test statistic improved the power significantly because it is robust with the assistance of the Huber loss and avoids the drawbacks of taking the supremum over the parametric space ΘγsubscriptΘ𝛾\Theta_{\gamma}roman_Θ start_POSTSUBSCRIPT italic_γ end_POSTSUBSCRIPT when its dimension is large. Asymptotic distributions were derived under the null and alternative hypotheses. As studied by Liu et al., (2024) and Huang et al., (2020), it is easy to extend the proposed robust estimation procedure and RWAST to change-plane regression with multiple change planes with heavy-tailed errors.

In the age of big data, it is interesting to consider high-dimensional change-plane regression models and to provide high-dimensional robust estimation procedures and test statistics. As noted by Liu et al., (2024), the proposed RWAST can be applied to change-plane regression with high-dimensional grouping parameter 𝜸𝜸\boldsymbol{\gamma}bold_italic_γ. However, it remains open to provide estimation procedures for change-plane regression with high-dimensional 𝜸𝜸\boldsymbol{\gamma}bold_italic_γ. A possible strategy is to penalize the loss function with 𝜽1subscriptnorm𝜽1\|\boldsymbol{\theta}\|_{1}∥ bold_italic_θ ∥ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT under the assumption of sparsity.

Supplementary Material

Appendix A includes Algorithm 1 for implementation in Section 2.3. Appendix B provides the computation of critical value in Section 3. Appendix C provides the proofs of Theorems 14 and the related Lemmas. Appendix D provides the proofs of Theorems 57. Appendix E provides additional simulation studies to illustrate the performance of the proposed estimation and test prcedures.

Acknowledgments

This work was supported in part by the NSFC (12271329, 72331005).

References

  • Andrews and Ploberger, (1994) Andrews, D. W. K. and Ploberger, W. (1994). Optimal tests when a nuisance parameter is present only under the alternative. Econometrica, 62(6):1383–1414.
  • Andrews and Ploberger, (1995) Andrews, D. W. K. and Ploberger, W. (1995). Admissibility of the Likelihood Ratio Test When a Nuisance Parameter is Present Only Under the Alternative. The Annals of Statistics, 23(5):1609 – 1629.
  • Chen and Zhou, (2020) Chen, X. and Zhou, W.-X. (2020). Robust inference via multiplier bootstrap. The Annals of Statistics, 48(3):1665 – 1691.
  • Davies, (1977) Davies, R. B. (1977). Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika, 64(2):247–254.
  • Devroye et al., (2016) Devroye, L., Lerasle, M., Lugosi, G., and Oliveira, R. I. (2016). Sub-Gaussian mean estimators. The Annals of Statistics, 44(6):2695 – 2725.
  • Dickson and Gershenwald, (2011) Dickson, P. V. and Gershenwald, J. E. (2011). Staging and prognosis of cutaneous melanoma. Surgical oncology clinics of North America, 20(1):1–17.
  • (7) Fan, A., Rui, S., and Lu, W. (2017a). Change-plane analysis for subgroup detection and sample size calculation. Journal of the American Statistical Association, 112:769–778.
  • (8) Fan, J., Li, Q., and Wang, Y. (2017b). Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(1):247–265.
  • Foster et al., (2011) Foster, J. C., Taylor, J., and Ruberg, S. J. (2011). Subgroup identification from randomized clinical trial data. Statistics in Medicine, 30(24):2867–2880.
  • Hahn et al., (1990) Hahn, M. G., Kuelbs, J., and Weiner, D. C. (1990). The Asymptotic Joint Distribution of Self-Normalized Censored Sums and Sums of Squares. The Annals of Probability, 18(3):1284 – 1341.
  • Han et al., (2022) Han, D., Huang, J., Lin, Y., and Shen, G. (2022). Robust post-selection inference of high-dimensional mean regression with heavy-tailed asymmetric or heteroskedastic errors. Journal of Econometrics, 230(2):416 – 431.
  • Horowitz, (1993) Horowitz, J. L. (1993). Optimal rates of convergence of parameter estimators in the binary response model with weak distributional assumptions. Econometric Theory, 9(1):1–18.
  • Hu et al., (2020) Hu, B., Wei, Q., Zhou, C., Ju, M., Wang, L., Chen, L., Li, Z., Wei, M., He, M., and Zhao, L. (2020). Analysis of immune subtypes based on immunogenomic profiling identifies prognostic signature for cutaneous melanoma. International Immunopharmacology, 89:107162.
  • Huang et al., (2020) Huang, Y., Cho, J., and Fong, Y. (2020). Threshold-based subgroup testing in logistic regression models in two-phase sampling designs. Applied Statistics, 70(2).
  • Huber, (1964) Huber, P. J. (1964). Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35(4):73–101.
  • Huber, (1973) Huber, P. J. (1973). Robust Regression: Asymptotics, Conjectures and Monte Carlo. The Annals of Statistics, 1(5):799 – 821.
  • Kang et al., (2024) Kang, C., Cho, H., Song, R., Banerjee, M., Laber, E. B., and Kosorok, M. R. (2024). Inference for change-plane regression. arXiv.org/abs/2206.06140.
  • Kang et al., (2017) Kang, S., Lu, W., and Song, R. (2017). Subgroup detection and sample size calculation with proportional hazards regression for survival data. Statistics in Medicine, 36:4646 – 4659.
  • Lee et al., (2011) Lee, S., Seo, M. H., and Shin, Y. (2011). Testing for threshold effects in regression models. Journal of the American Statistical Association, 106(493):220–231.
  • Lepskii, (1992) Lepskii, O. V. (1992). Asymptotically minimax adaptive estimation. i: Upper bounds. optimally adaptive estimates. Theory of Probability & Its Applications, 36(4):682–697.
  • Li et al., (2021) Li, J., Li, Y., Jin, B., and Kosorok, M. R. (2021). Multithreshold change plane model: Estimation theory and applications in subgroup identification. Statistics in Medicine, 40(15):3440–3459.
  • Liu et al., (2024) Liu, X., Huang, J., Zhou, Y., Zhang, F., and Panpan, R. (2024). Efficient subgroup testing in change-planemodels. http://arxiv.org/abs/2408.00602.
  • Mukherjee et al., (2020) Mukherjee, D., Banerjee, M., Mukherjee, D., and Ritov, Y. (2020). Asymptotic normality of a linear threshold estimator in fixed dimension with near-optimal rate. arXiv preprint arXiv:2001.06955.
  • Mukherjee et al., (2022) Mukherjee, D., Banerjee, M., and Ritov, Y. (2022). On robust learning in the canonical change point problem under heavy tailed errors in finite and growing dimensions. Electronic Journal of Statistics, 16(1):1153 – 1252.
  • Seo and Linton, (2007) Seo, M. H. and Linton, O. (2007). A smoothed least squares estimator for threshold regression models. Journal of Econometrics, 141(2):704–735.
  • Shen and Qu, (2020) Shen, J. and Qu, A. (2020). Subgroup analysis based on structured mixedeffects models for longitudinal data. Journal of Biopharmaceutical Statistics, 30(4):607–622.
  • Siegel et al., (2021) Siegel, R. L., Miller, K. D., Fuchs, H. E., and Jemal, A. (2021). Cancer statistics, 2021. CA: A Cancer Journal for Clinicians, 71(1):7–33.
  • Song et al., (2009) Song, R., Kosorok, M. R., and Fine, J. P. (2009). On asymptotically optimal tests under loss of identifiability in semiparametric models. The Annals of Statistics, 37:2409 – 2444.
  • Sun et al., (2020) Sun, Q., Zhou, W.-X., and Fan, J. (2020). Adaptive huber regression. Journal of the American Statistical Association, 115(529):254–265. PMID: 33139964.
  • Wald, (1943) Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society, 54(3):426–482.
  • Wang et al., (2021) Wang, L., Zheng, C., Zhou, W., and Zhou, W.-X. (2021). A new principle for tuning-free huber regression. Statistica Sinica, 31:2153–2177.
  • Wei and Kosorok, (2018) Wei, S. and Kosorok, M. R. (2018). The change-plane Cox model. Biometrika, 105(4):891–903.
  • Xiong et al., (2019) Xiong, J., Bing, Z., and Guo, S. (2019). Observed survival interval: A supplement to tcga pan-cancer clinical data resource. Cancers, 11(3).
  • Zhang et al., (2022) Zhang, Y., Wang, J. W., and Zhu, Z. (2022). Single-index thresholding in quantile regression. Journal of the American Statistical Association, 117(540):2222–2237.
  • Zhou et al., (2018) Zhou, W.-X., Bose, K., Fan, J., and Liu, H. (2018). A new perspective on robust M𝑀Mitalic_M-estimation: Finite sample theory and applications to dependence-adjusted multiple testing. The Annals of Statistics, 46(5):1904 – 1931.