Trimmed Sample Means For Uniform Mean Estimation and Regression
$$\mathbb{P}\left(\left|\bar{X}_n - \mu\right| \le r(\alpha,n)\right) \ge 1-\alpha$$
Sub-Gaussian goal: $r(\alpha,n) \sim \sigma\sqrt{\frac{2\log(2/\alpha)}{n}}$ vs. what Chebyshev gives for the empirical mean: $r(\alpha,n) \approx \frac{\sigma}{\sqrt{\alpha n}}$.
Catoni (AnIHP’12) + Lee/Valiant (STOC’21)
Both achieve $r(\alpha,n) \sim \sigma\sqrt{\frac{2\log(2/\alpha)}{n}}$.
The main point
Contaminated sample: we observe $X'_1,\dots,X'_n$ such that
$$\#\{i \le n : X'_i \neq X_i\} \le \varepsilon n.$$
$$\bar{X}_{n,k} := \frac{1}{n-2k}\sum_{i=k+1}^{n-k} X_{(i)},$$
where $X_{(1)} \le \dots \le X_{(n)}$ are the order statistics of the sample.
How to compute a trimmed mean
𝑛 = 8 and 𝑘 = 2
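A minimal sketch in Python of the computation above; the sample values are made up for illustration:

```python
import numpy as np

def trimmed_mean(x, k):
    """Sort, drop the k smallest and k largest points, average the rest."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    assert n > 2 * k, "need n > 2k"
    return x[k:n - k].mean()

# n = 8, k = 2: the 2 smallest and 2 largest values are discarded,
# so the estimate is the average of the 4 middle order statistics.
sample = [0.9, -1.3, 7.2, 0.1, 0.4, -0.2, 1.1, 250.0]  # hypothetical data
print(trimmed_mean(sample, k=2))
```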
$$r(\alpha,n,\varepsilon) := C\inf_{1<p\le 2}\nu_p^{1/p}\left(\frac{\log(1/\alpha)}{n}\right)^{1-\frac{1}{p}} + C\inf_{q>1}\nu_q^{1/q}\,\varepsilon^{1-\frac{1}{q}}$$
(with $\nu_p := \mathbb{E}|X-\mu|^p$). Then, for a trimming level $k$ of order $\varepsilon n + \log(1/\alpha)$,
$$\mathbb{P}\left(\left|\bar{X}_{n,k}-\mu\right| \le r(\alpha,n,\varepsilon)\right) \ge 1-\alpha.$$
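To see the point of the theorem numerically, here is a toy Python check under an assumed setup that is not from the talk's own experiments: symmetric heavy-tailed data with an $\varepsilon$-fraction of points replaced by a large constant.

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps, mu = 10_000, 0.05, 0.0
x = rng.standard_t(2.5, size=n) + mu   # heavy-tailed clean data with mean mu
x[: int(eps * n)] = 1e6                # adversarially corrupt eps * n points
k = int(2 * eps * n)                   # trimming level of order eps * n

xs = np.sort(x)
print(abs(x.mean() - mu))              # empirical mean: ruined (~5e4)
print(abs(xs[k:n - k].mean() - mu))    # trimmed mean: still accurate
```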
Better than your MoM
Median of Means for 1d data
Break data into 𝑘 blocks, take averages of blocks
& then take the median of these averages.
Requires $k \ge 2\varepsilon n \vee \log(1/\alpha)$ blocks for robustness + high prob.
$$r_{\mathrm{MoM}}(\alpha,n,\varepsilon) := C\inf_{1<p\le 2}\nu_p^{1/p}\left(\varepsilon + \frac{\log(1/\alpha)}{n}\right)^{1-\frac{1}{p}}.$$
The $\varepsilon$-dependence is stuck at exponent $1-1/p$ with $p \le 2$, whereas the trimmed mean can exploit higher moments through its $\inf_{q>1}$ term.
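For comparison, a minimal Python sketch of the 1-d MoM procedure just described; the block count k is left to the caller, as in the condition above.

```python
import numpy as np

def median_of_means(x, k):
    """Split the data into k blocks, average each block,
    and return the median of the block averages."""
    blocks = np.array_split(np.asarray(x, dtype=float), k)
    return np.median([b.mean() for b in blocks])
```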
Can trimmed means lead to
improvements more generally?
Joint with Lucas Resende – IMPA
arXiv:2302.06710
What we did
Trimmed means give nearly optimal results for 2 problems:
uniform mean estimation + regression with mean sq. error.
Goal
Estimate $Pf = \mathbb{E}_{X\sim P}\, f(X)$ for each $f \in \mathcal{F}$ with small worst-case error:
$$\mathrm{Loss}\left(\{Pf\}_{f\in\mathcal{F}}, \{\widehat{E}f\}_{f\in\mathcal{F}}\right) := \sup_{f\in\mathcal{F}} \left|Pf - \widehat{E}f\right|.$$
Applications
M-estimation/regression
We’ll see an example soon.
$$\widehat{T}_{n,k}f := \frac{1}{n-2k}\sum_{i=k+1}^{n-k} f\left(X'_{(i),f}\right),$$
where, for each $f$, the sample is reordered so that $f(X'_{(1),f}) \le \dots \le f(X'_{(n),f})$.
$$\mathbb{P}\left(\sup\left\{\left|\widehat{T}_{n,k}f - Pf\right| : f\in\mathcal{F}\right\} \le R(\alpha,n,\varepsilon)\right) \ge 1-\alpha,$$
where
$$R(\alpha,n,\varepsilon) := C\,\mathrm{EmpC}(\mathcal{F}) + C\, r_{\mathcal{F}}(\alpha,n,\varepsilon).$$
If $\forall f\in\mathcal{F}:\ \#\{i\le n : |f(X_i) - Pf| > M\} \le k - \varepsilon n$, then
$$\left|\widehat{T}_{n,k}f - Pf - P_n\,\tau_M(f - Pf)\right| \le \frac{CMk}{n},$$
where $P_n$ is the empirical distribution and $\tau_M(x) := (x\wedge M)\vee(-M)$ truncates at level $M$.
Improved vector mean estimation
Theorem (O. & Resende 2023)
If $\mathbb{X} = \mathbb{R}^d$, $\exists\,\widehat{\mu}_{n,k}$ estimator of the mean $\mu$ s.t. with prob. $\ge 1-\alpha$:
$$\left\|\widehat{\mu}_{n,k} - \mu\right\| \le C\,\mathbb{E}_{X_{1:n}\sim_{\mathrm{iid}} P}\left\|\bar{X}_n - \mu\right\| + C\inf_{1<p\le 2}\nu_p^{1/p}\left(\frac{\log(1/\alpha)}{n}\right)^{1-\frac{1}{p}} + C\inf_{q>1}\nu_q^{1/q}\,\varepsilon^{1-\frac{1}{q}},$$
where
$$\nu_p := \sup\left\{\mathbb{E}_{X\sim P}\,\left|\langle X-\mu, f\rangle\right|^p : f\in\mathbb{R}^d,\ \|f\|_* \le 1\right\}.$$
Regression with squared loss
Given
I.i.d. (possibly corrupted) sample of pairs $(X,Y) \in \mathbb{X}\times\mathbb{R}$ with law $P$
Family ℱ of functions from 𝕏 to ℝ.
Goal
Estimate the best fit of $Y$ from $f(X)$:
$$f_{\mathrm{best}} := \arg\min_{f\in\mathcal{F}} \mathbb{E}_{(X,Y)\sim P}\,\big(Y - f(X)\big)^2 \quad \text{vs. an estimator } \widehat{f}\in\mathcal{F},$$
$$\mathrm{Loss}(f_{\mathrm{best}}, \widehat{f}\,) = \mathbb{E}_{(X,Y)\sim P}\,\big(\widehat{f}(X) - f_{\mathrm{best}}(X)\big)^2.$$
Results on regression
Setup
$(\mathbb{X}\times\mathbb{R},\ \mathcal{X}\times\mathcal{B}(\mathbb{R}),\ P)$ a probability space
$Z_{1:n} := (Z_1,\dots,Z_n) \sim P$ i.i.d., with each $Z_i = (X_i, Y_i) \in \mathbb{X}\times\mathbb{R}$.
Contaminated $Z'_{1:n}$ satisfying $\#\{i \le n : Z'_i \neq Z_i\} \le \varepsilon n$.
$\mathcal{F} :=$ some functions $f:\mathbb{X}\to\mathbb{R}$; set $\ell_f(x,y) := (y - f(x))^2$.
$$\widehat{T}_{n,k}(\ell_f - \ell_g) := \frac{1}{n-2k}\sum_{i=k+1}^{n-k} (\ell_f - \ell_g)\left(X'_{(i),\ell_f-\ell_g},\ Y'_{(i),\ell_f-\ell_g}\right),$$
where the pairs are reordered so that the increments $(\ell_f - \ell_g)(X'_{(i)}, Y'_{(i)})$ are nondecreasing in $i$.
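A minimal Python sketch of this trimmed increment; here f and g stand for two candidate regression functions, and the function name is mine, not the paper's notation.

```python
import numpy as np

def trimmed_increment(X, Y, f, g, k):
    """Trimmed mean of the loss increments (ell_f - ell_g)(X_i, Y_i):
    sort the n increment values, drop the k smallest and k largest,
    and average the middle n - 2k."""
    inc = (Y - f(X)) ** 2 - (Y - g(X)) ** 2
    inc = np.sort(inc)
    n = inc.size
    return inc[k:n - k].mean()
```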
$Y_i = \langle \beta_{\mathrm{true}}, X_i \rangle + \xi_i$
Assumptions
2nd moment + small ball on $X$
$p$-th moment bound on $\xi_i$ ($1 < p \le 2$)
Result: with probability $\ge 1-\alpha$,
$$\left\|\widehat{\beta} - \beta_{\mathrm{true}}\right\|_{2,P} \le C_P\left(\frac{d + \log(1/\alpha)}{n} + \varepsilon\right)^{1-\frac{1}{p}}.$$
Linear regression with random design
Heuristic - alternating minimization/maximization
Performs quite well in experiments.
Set initial $\widehat{\beta}_0, \widehat{\beta}_1 \in \mathbb{R}^d$ arbitrarily.
Repeat until convergence (see the sketch below):
Trim the increments $\ell_{\widehat{\beta}_0}(X'_i, Y'_i) - \ell_{\widehat{\beta}_1}(X'_i, Y'_i)$.
Choose one of $\widehat{\beta}_0$ or $\widehat{\beta}_1$ to update.
Perform OLS on the trimmed sample to obtain the new $\widehat{\beta}_0$ or $\widehat{\beta}_1$.
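A Python sketch of this heuristic under my own simplifications, which the talk does not pin down: a fixed iteration budget instead of a convergence test, and simple alternation of which candidate gets updated.

```python
import numpy as np

def tm_linear_regression(X, Y, k, n_iter=50, seed=0):
    """Alternating heuristic: trim the sample where the loss increment
    between the two current candidates is extreme, then refit one
    candidate by OLS on the surviving points."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    beta = [rng.standard_normal(d), rng.standard_normal(d)]
    for t in range(n_iter):
        inc = (Y - X @ beta[0]) ** 2 - (Y - X @ beta[1]) ** 2
        keep = np.argsort(inc)[k:n - k]    # drop k smallest and k largest increments
        j = t % 2                          # alternate which candidate is updated
        beta[j] = np.linalg.lstsq(X[keep], Y[keep], rcond=None)[0]
    return beta[0]
```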
Experiments vs. Median-of-means
[Figure: $\|\widehat{\beta}_n - \beta\|_2$ (log scale) versus contamination level, comparing TM, MOM and OLS.]
Linear regression with normal errors and contamination.
Experiments vs. Median-of-means
[Figure: $\|\widehat{\beta}_n - \beta\|_2$ (log scale) versus contamination level, comparing TM, MOM and OLS.]
Linear regression with Student(1) errors and contamination.
Conclusion
Theory says trimmed means give the best-known estimators
for the problems we consider. Dependence on 𝜀 is optimal.
Gaussian approximation of the trimmed process: upcoming work by L. Resende.