An Upper Confidence Bound Approach to Estimating the Maximum Mean

Kun Zhang111[email protected] Institute of Statistics and Big Data, Renmin University of China, Beijing, China 100872 Guangwu Liu222[email protected] Department of Management Sciences, College of Business, City University of Hong Kong, Kowloon Tong, Hong Kong Wen Shi333[email protected] Business School, Central South University, Changsha, China 410083
(August 8, 2024)
Abstract

Estimating the maximum mean finds a variety of applications in practice. In this paper, we study estimation of the maximum mean using an upper confidence bound (UCB) approach where the sampling budget is adaptively allocated to one of the systems. We study in depth the existing grand average (GA) estimator, and propose a new largest-size average (LSA) estimator. Specifically, we establish statistical guarantees, including strong consistency, asymptotic mean squared errors, and central limit theorems (CLTs) for both estimators, which are new to the literature. We show that LSA is preferable over GA, as the bias of the former decays at a rate much faster than that of the latter when sample size increases. By using the CLTs, we further construct asymptotically valid confidence intervals for the maximum mean, and propose a single hypothesis test for a multiple comparison problem with application to clinical trials. Statistical efficiency of the resulting point and interval estimates and the proposed single hypothesis test is demonstrated via numerical examples.

1 Introduction

Estimating the maximum mean of a number of stochastic systems finds a variety of applications in broad areas, ranging from financial risk measurement and evaluation of clinical trials to Markov decision processes and reinforcement learning. In financial risk measurement, coherent risk measures such as expected shortfall play an important role in determining risk capital charge for financial institutions, which is of great practical importance to both risk managers and regulators. In general, a coherent risk measure can be written as the maximum of expected losses (or means) under different probability measures. In practice, its estimation in practice often involves intensive simulation, and developing efficient methods for estimating coherent risk measures has been a topic of interest in simulation community. To improve upon a two-stage procedure in Chen and Dudewicz (1976) for constructing confidence intervals (CIs) for the maximum mean, Lesnevski et al. (2007) proposed a procedure for constructing a two-sided, fixed-width CI for a coherent risk measure, by combining stage-wise sampling and screening based on ranking-and-selection techniques. Estimating the maximum mean also finds applications in evaluation of clinical trials. One of the goals of clinical trials is to estimate the best treatment effect among a number of treatments with unknown mean effects. Estimates of the best mean effect provide useful information into the level of effectiveness of the best treatment.

In both applications in financial risk measurement and clinical trials, the accuracy of the maximum mean estimates is of great concern. For instance, FDA (U.S. Food and Drug Administration) is concerned about the bias and reliability of the estimates of treatment effects when using adaptive designs in clinical trials (FDA 2019). To address such concerns, statistical inference about the maximum mean is highly desirable, providing not only point estimates, but also CIs that offer information about the precision of the estimates. To provide theoretical guarantees for the CIs, central limit theorems (CLTs) are typically needed, which is part of the agenda in our study. In this paper, we aim to develop efficient estimators for the maximum mean, and provide theoretical results on the properties of the estimators, such as their biases, variances, and CLTs.

CLTs for maximum mean estimators can also be applied to a fundamental hypothesis testing problem in clinical trials, i.e., comparing K1𝐾1K-1italic_K - 1 treatments (with unknown mean effects denoted by μ1,,μK1subscript𝜇1subscript𝜇𝐾1\mu_{1},\ldots,\mu_{K-1}italic_μ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_μ start_POSTSUBSCRIPT italic_K - 1 end_POSTSUBSCRIPT) to a control treatment with known mean effect μ0subscript𝜇0\mu_{0}italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, aiming to determine whether any of the K1𝐾1K-1italic_K - 1 treatments outperforms the control; see Dmitrienko and Hsu (2006) and references therein. In the current practice, a commonly used approach is to simultaneously test K1𝐾1K-1italic_K - 1 null hypotheses: μkμ0subscript𝜇𝑘subscript𝜇0\mu_{k}\leq\mu_{0}italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ≤ italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, k=1,,K1𝑘1𝐾1k=1,\ldots,K-1italic_k = 1 , … , italic_K - 1, where Bonferroni correction is often applied to control the type I error. However, it is well known that this approach may be conservative especially when K𝐾Kitalic_K is large, and it comes at the cost of increasing the probability of type II errors. With CLTs for estimators of the maximum mean μ=max(μ0,μ1,,μK1)superscript𝜇subscript𝜇0subscript𝜇1subscript𝜇𝐾1\mu^{*}=\max(\mu_{0},\mu_{1},\ldots,\mu_{K-1})italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = roman_max ( italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_μ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_μ start_POSTSUBSCRIPT italic_K - 1 end_POSTSUBSCRIPT ), we propose an alternative approach by testing only a single hypothesis: μ=μ0superscript𝜇subscript𝜇0\mu^{*}=\mu_{0}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, thus avoiding the use of Bonferroni correction and its subsequent drawbacks. The CLTs provide theoretical guarantees for the validity of the proposed single hypothesis test that is numerically shown to significantly outperform the multiple hypotheses approach.

Maximum mean estimation has also been studied in the literature on Markov decision processes (MDPs) and reinforcement learning, where the value function for a given state is usually the maximum of multiple expected values taken for different actions. Estimation of the value function is a fundamental sub-task in many solution methods for MDPs and reinforcement learning; see, e.g., Chang et al. (2005), van Hasselt (2010), and Fu (2017). A double (also referred to as cross-validation) estimator was proposed in van Hasselt (2010, 2013) based on two disjoint sets of samples, where one set of samples are used to select the best system (i.e., the system with the maximum mean) and the estimator is given by the average of the selected system using another set of samples. D’Eramo et al. (2016) proposed an estimator by using a weighted average of the sample means of individual systems, where the weight of a system is set as a Gaussian approximation of the probability that this system is the best. The double estimator leads to double Q-learning algorithms, and empirical studies in van Hasselt (2010) and van Hasselt et al. (2016) suggest that these algorithms may outperform conventional Q-learning and deep reinforcement learning algorithms. However, given that maximum mean estimation is only an intermediate module but not the ultimate goal in MDPs and reinforcement learning, how the estimation errors propagate through the state-action pairs and affect the optimal policies is beyond the scope of this paper and requires dedicated future research.

Estimating the maximum mean is closely related to the problem of selecting the best system, an area of active research in simulation community; see, e.g., Kim and Nelson (2006) for an overview. Arguably, the problem of estimating the maximum mean is more difficult than selecting the best system (Lesnevski et al. 2007), because the former requires a careful balance between two goals, including selecting the best system and allocating as many samples as possible to the best system. These two goals are not necessarily complementary, because correctly selecting the best system requires some samples to be allocated to all systems, which limits the achievement of the second goal.

One of the key issues in estimating the maximum mean is how to sample from individual systems. In this regard, both static sampling (van Hasselt 2010, 2013; D’Eramo et al. 2016) and adaptive sampling (Chang et al. 2005; Kocsis and Szepesvári 2006) have been studied in the literature. The former refers to sampling policies that maintain a pre-specified sampling frequency for each system, while the latter sequentially adjusts sampling frequencies based on historical samples drawn from the systems. Many adaptive sampling methods such as the upper confidence bound (UCB) policy and Thompson sampling share a common feature that higher sampling frequencies are geared towards systems with larger means. When estimating the maximum mean, this feature implies a smaller variance for the resulting estimator because more samples are allocated to the target system, thus offering better computational efficiency. Adaptive sampling/design using policies like UCB offers benefits beyond computational efficiency in the application of clinical trials, where each patient is assigned to one of the K𝐾Kitalic_K treatments to assess treatment effects. In this application, other than assessing treatment effects, another important goal is to treat patients as effectively as possible during the trials. Therefore, for the well-being of patients, it is desirable to adopt adaptive designs like the UCB policy so that more patients are assigned to more effective treatments. Interested readers are referred to Villar et al. (2015) and Robertson and Wason (2019) for recent work on adaptive designs in clinical trials.

In this paper, we study maximum mean estimation based on the UCB sampling policy. UCB is one of the most celebrated adaptive sampling methods used in online decision making with learning, which has been extensively used as a tool to balance the trade-off between exploration and exploitation during the course of learning; see, e.g., Agrawal (1995) and Auer et al. (2002) for early work on multi-armed bandit problems, and Hao et al. (2019) and references therein for recent related work. Via the UCB sampling policy, the expected amount of sampling budget allocated to the non-maximum systems is at most of a logarithmic order of the total budget, which implies that the majority of sampling budget is allocated to the system with the maximum mean. Within the UCB framework, one of the estimators being considered is the average of all the samples, no matter drawn from the system with the maximum mean or not. We refer to it as the grand average (GA) estimator. The rationale of the GA estimator stems from the fact that it is dominated by the sample average of the system with the maximum mean, from which the majority of the samples are drawn. The GA estimator is not new in the literature, and has served as a key ingredient for algorithms in MDPs; see, e.g., Chang et al. (2005) and Kocsis and Szepesvári (2006). However, to the best of our knowledge, very little is known about the statistical properties of the GA estimator, except its asymptotic unbiasedness. In this paper, we fill this gap by providing statistical guarantees for the GA estimator, including its strong consistency, mean squared error (MSE), and CLT. On the downside, a well-known drawback of the GA estimator is its bias that results from taking grand average of all systems, including those with smaller means. While this bias gradually dies out as sample size increases, its impact in finite-sample settings cannot be neglected.

As an attempt to remedy the above drawback, we propose a new estimator by taking the sample average of the system that has been allocated the largest amount of samples, referred to as the largest-size average (LSA) estimator. This estimator is intuitively appealing, as the number of allocated samples to the system with the maximum mean dominates those allocated to other systems, ensuring a high probability that the system identified by LSA is exactly the one with the maximum mean. LSA avoids taking average of systems with smaller means, and it is thus expected to have a smaller bias compared to the GA estimator.

To summarize, we make the following contributions in this paper.

  • We provide statistical guarantees for the GA estimator, including its strong consistency, MSE, and CLT, which are new to the literature to the best of our knowledge.

  • We develop a new LSA estimator for estimating the maximum mean with statistical guarantees, and show that it is preferable over the GA estimator, due to its bias that converges at least as fast as νn/n3/2subscript𝜈𝑛superscript𝑛32\nu_{n}/n^{3/2}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT / italic_n start_POSTSUPERSCRIPT 3 / 2 end_POSTSUPERSCRIPT, a rate much faster than that (νn/nsubscript𝜈𝑛𝑛\nu_{n}/nitalic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT / italic_n) of the GA estimator, where n𝑛nitalic_n and νnsubscript𝜈𝑛\nu_{n}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT denote the total sample size and exploration rate, respectively. While polynomial exploration rates are allowed in our setup, both theoretical results and empirical evidence suggest that logarithmic exploration rate is asymptotically optimal when minimizing the MSEs of both the GA and LSA estimators.

  • Using the established CLTs, we construct asymptotically valid GA and LSA CIs for the maximum mean, and propose a single hypothesis test for a multiple comparison problem with application to clinical trials. Numerical experiments confirm the validity of the CIs, and show that the proposed single test exhibits lower type I error and higher statistical power, compared to a benchmarking test with multiple hypotheses.

The rest of the paper is organized as follows. In Section 2, we formulate the problem of estimating the maximum mean. The GA and LSA estimators, and a single hypothesis test for the multiple comparison problem are presented in Section 3. Theoretical analysis of the estimators is provided in Section 4. A numerical study is presented in Section 5 to demonstrate the efficiency of the estimators and the proposed single hypothesis test, followed by conclusions in Section 6.

2 Problem and Backgrounds

Consider K𝐾Kitalic_K stochastic systems, with their performances denoted by random variables {Xk,k=1,,K}formulae-sequencesubscript𝑋𝑘𝑘1𝐾\{X_{k},k=1,\ldots,K\}{ italic_X start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_k = 1 , … , italic_K }. Let μk=𝔼(Xk)subscript𝜇𝑘𝔼subscript𝑋𝑘\mu_{k}={\mathbb{E}}(X_{k})italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = blackboard_E ( italic_X start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ). For each k𝑘kitalic_k, we assume that Xksubscript𝑋𝑘X_{k}italic_X start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT follows an unknown probability distribution, while independent samples can be drawn. This assumption is usually satisfied in many practical applications. For instance, Xksubscript𝑋𝑘X_{k}italic_X start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT may represent the stochastic performance of a complex simulation model and follows an unknown probability distribution, while samples of Xksubscript𝑋𝑘X_{k}italic_X start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT can be generated via simulation. We make the following assumption on Xksubscript𝑋𝑘X_{k}italic_X start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT’s.

Assumption 1.

For any k=1,,K𝑘1𝐾k=1,\ldots,Kitalic_k = 1 , … , italic_K, Xksubscript𝑋𝑘X_{k}italic_X start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT follows a sub-Gaussian distribution, i.e., there exists a sub-Gaussian parameter γksubscript𝛾𝑘\gamma_{k}italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT such that 𝔼[exp(λ(Xkμk))]exp(γk2λ2/2)𝔼delimited-[]𝜆subscript𝑋𝑘subscript𝜇𝑘superscriptsubscript𝛾𝑘2superscript𝜆22{\mathbb{E}}[\exp\left(\lambda(X_{k}-\mu_{k})\right)]\leq\exp\left(\gamma_{k}^% {2}\lambda^{2}/2\right)blackboard_E [ roman_exp ( italic_λ ( italic_X start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ) ] ≤ roman_exp ( italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_λ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / 2 ), for any λ𝜆\lambda\in{\mathbb{R}}italic_λ ∈ blackboard_R.

Assumption 1 is widely used in the literature on machine learning. Many commonly used distributions, such as those with bounded supports and normal distributions, are sub-Gaussian. Assumption 1 ensures the existence of a Hoeffding bound that serves as the basis for the theoretical analysis in this paper. This bound is presented in Lemma A1 in the appendix.

In this paper, we are interested in estimating the maximum mean of the K𝐾Kitalic_K systems, defined as

μmaxk=1,,Kμk.superscript𝜇subscript𝑘1𝐾subscript𝜇𝑘\mu^{*}\triangleq\max_{k=1,\ldots,K}\mu_{k}.italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ≜ roman_max start_POSTSUBSCRIPT italic_k = 1 , … , italic_K end_POSTSUBSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT .

Let ksuperscript𝑘k^{*}italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT denote the index of the system with the largest mean, i.e., μ=μksuperscript𝜇subscript𝜇superscript𝑘\mu^{*}=\mu_{k^{*}}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT. Without loss of generality, ksuperscript𝑘k^{*}italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT is assumed to be unique.

Assumption 2.

The system with the largest mean is unique, i.e., μ>μksuperscript𝜇subscript𝜇𝑘\mu^{*}>\mu_{k}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT > italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT for any kk𝑘superscript𝑘k\neq k^{*}italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT.

Assumption 2 implies that the mean gap between systems ksuperscript𝑘k^{*}italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT and k𝑘kitalic_k, given by ΔkμμksubscriptΔ𝑘superscript𝜇subscript𝜇𝑘\Delta_{k}\triangleq\mu^{*}-\mu_{k}roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ≜ italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, is positive for any kk𝑘superscript𝑘k\neq k^{*}italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT.

Given samples of Xksubscript𝑋𝑘X_{k}italic_X start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, denoted by {Xk,j,j=1,,nk}formulae-sequencesubscript𝑋𝑘𝑗𝑗1subscript𝑛𝑘\{X_{k,j},j=1,\ldots,n_{k}\}{ italic_X start_POSTSUBSCRIPT italic_k , italic_j end_POSTSUBSCRIPT , italic_j = 1 , … , italic_n start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT }, k=1,,K𝑘1𝐾k=1,\ldots,Kitalic_k = 1 , … , italic_K, a straightforward estimator of μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, referred to as the maximum estimator (see, e.g., D’Eramo et al. 2016), is given by

maxk=1,,KX¯k,subscript𝑘1𝐾subscript¯𝑋𝑘\max_{k=1,\ldots,K}\bar{X}_{k},roman_max start_POSTSUBSCRIPT italic_k = 1 , … , italic_K end_POSTSUBSCRIPT over¯ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ,

where X¯k[nk]j=1nkXk,j/nksubscript¯𝑋𝑘delimited-[]subscript𝑛𝑘superscriptsubscript𝑗1subscript𝑛𝑘subscript𝑋𝑘𝑗subscript𝑛𝑘\bar{X}_{k}[n_{k}]\triangleq\sum_{j=1}^{n_{k}}X_{k,j}/n_{k}over¯ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT [ italic_n start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ] ≜ ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_k , italic_j end_POSTSUBSCRIPT / italic_n start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is the sample average of Xksubscript𝑋𝑘X_{k}italic_X start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT. For notational convenience, where no confusion arises, we write simply X¯ksubscript¯𝑋𝑘\bar{X}_{k}over¯ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT in places for X¯k[nk]subscript¯𝑋𝑘delimited-[]subscript𝑛𝑘\bar{X}_{k}[n_{k}]over¯ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT [ italic_n start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ].

It has been well known that the maximum estimator has a positive bias and may overestimate μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT; see, e.g., Smith and Winkler (2006) and van Hasselt (2010). The positive bias may lead to adverse effects in various applications. For instance, in the estimation of coherent risk measures, it may lead to overestimation of risk, resulting in unduly high capital charges for risky activities (Lesnevski et al. 2007). In reinforcement learning, as errors propagate over all the state-action pairs, the positive bias may negatively affect the speed of learning (van Hasselt 2010). More importantly, when using the maximum estimator, it is not clear how to allocate sampling budget to the systems, i.e., setting the values of nksubscript𝑛𝑘n_{k}italic_n start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT’s, so as to ensure reasonable accuracy. Therefore, it is of both theoretical and practical interests to develop alternative estimators of μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT.

3 An Upper Confidence Bound Approach

Sampling from the K𝐾Kitalic_K systems is essential for constructing any estimator of μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. In developing an efficient estimator, two issues are of major concern. The first issue is on how to efficiently allocate the sampling budget, while the second issue is on methods of constructing an estimator such that statistical guarantees can be established.

To address the first issue, adaptive sampling policies have been studied in the literature. A popular adaptive sampling policy is the UCB policy that was originally proposed for the multi-armed bandit (MAB) problem; see Auer et al. (2002). In traditional MAB problem, in each round, one of the K𝐾Kitalic_K systems is chosen and a random sample (reward) is drawn from the chosen system, and the decision maker aims to maximize the total rewards collected. At the heart of the MAB problem is to find an adaptive sampling policy that decides which system to sample from in each round so as to balance the tradeoff between exploration and exploitation. Among various sampling policies for MAB, it has been well known that the UCB policy achieves the optimal rate of regret, defined as the expected loss due to the fact that a policy does not always choose the system with the highest expected reward; see Lai and Robbin (1985) and Auer et al. (2002). Specifically, let Tk(n)subscript𝑇𝑘𝑛T_{k}(n)italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) denote the number of samples drawn from system k𝑘kitalic_k during the first n𝑛nitalic_n rounds, for k=1,,K𝑘1𝐾k=1,\dots,Kitalic_k = 1 , … , italic_K. The UCB sampling policy is described in Algorithm 1.

Algorithm 1: UCB Sampling Policy

  1. 1.

    Initialization: During the first K𝐾Kitalic_K rounds, draw a sample from each system.

  2. 2.

    Repeat: For nK+1𝑛𝐾1n\geq K+1italic_n ≥ italic_K + 1, draw a sample from the system indexed by

    κn=argmaxk{1,,K}{X¯k[Tk(n1)]+2lognTk(n1)},subscript𝜅𝑛argsubscript𝑘1𝐾subscript¯𝑋𝑘delimited-[]subscript𝑇𝑘𝑛12𝑛subscript𝑇𝑘𝑛1\kappa_{n}={\text{arg}\max}_{k\in\{1,\dots,K\}}\left\{\bar{X}_{k}[T_{k}(n-1)]+% \sqrt{2\log n\over T_{k}(n-1)}\right\},italic_κ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = arg roman_max start_POSTSUBSCRIPT italic_k ∈ { 1 , … , italic_K } end_POSTSUBSCRIPT { over¯ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT [ italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n - 1 ) ] + square-root start_ARG divide start_ARG 2 roman_log italic_n end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n - 1 ) end_ARG end_ARG } , (1)

    where X¯k[Tk(n1)]subscript¯𝑋𝑘delimited-[]subscript𝑇𝑘𝑛1\bar{X}_{k}[T_{k}(n-1)]over¯ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT [ italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n - 1 ) ] denotes the sample mean of system k𝑘kitalic_k during the first (n1)𝑛1(n-1)( italic_n - 1 ) rounds.

    Update the sample average of system κnsubscript𝜅𝑛\kappa_{n}italic_κ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT.

The UCB policy offers important insights into the adaptive allocation of sampling budget. Essentially, it balances the tradeoff between exploration and exploitation using the UCB defined on the right-hand-side (RHS) of (1). On the one hand, systems with higher on-going sample averages have a higher chance to be chosen in the current round, contributing to higher total rewards. On the other hand, systems that are chosen less frequently during the previous rounds, i.e., Tk(n1)subscript𝑇𝑘𝑛1T_{k}(n-1)italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n - 1 ) being smaller, may also have a sufficient chance to be chosen so that such systems will be sufficiently explored. The function logn𝑛\log nroman_log italic_n in (1) can be interpreted as the exploration rate that controls the speed of exploring systems that may not have the largest mean.

In the MAB context, the exploration rate is set to be logn𝑛\log nroman_log italic_n simply because it leads to the optimal rate of regret. However, it should be pointed out that when the objective is not to minimize the regret, as the case in our problem setting, it is not yet clear whether logarithmic exploration rate is optimal. For the sake of generality, we henceforth allow the exploration rate, denoted by νnsubscript𝜈𝑛\nu_{n}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT, to take different forms as a function of n𝑛nitalic_n, and refer to the resulting UCB policy as a generalized UCB (GUCB) policy, which is described in Algorithm 2.

Algorithm 2 (GUCB): Generalized UCB Sampling Policy

  1. 1.

    Initialization: During the first K𝐾Kitalic_K rounds, draw a sample from each system.

  2. 2.

    Repeat: For nK+1𝑛𝐾1n\geq K+1italic_n ≥ italic_K + 1, draw a sample from the system indexed by

    In=argmaxk{1,,K}{X¯k[Tk(n1)]+2νnTk(n1)},subscript𝐼𝑛argsubscript𝑘1𝐾subscript¯𝑋𝑘delimited-[]subscript𝑇𝑘𝑛12subscript𝜈𝑛subscript𝑇𝑘𝑛1I_{n}={\text{arg}\max}_{k\in\{1,\dots,K\}}\left\{\bar{X}_{k}[T_{k}(n-1)]+\sqrt% {2\nu_{n}\over T_{k}(n-1)}\right\},italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = arg roman_max start_POSTSUBSCRIPT italic_k ∈ { 1 , … , italic_K } end_POSTSUBSCRIPT { over¯ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT [ italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n - 1 ) ] + square-root start_ARG divide start_ARG 2 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n - 1 ) end_ARG end_ARG } ,

    where X¯k[Tk(n1)]subscript¯𝑋𝑘delimited-[]subscript𝑇𝑘𝑛1\bar{X}_{k}[T_{k}(n-1)]over¯ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT [ italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n - 1 ) ] denotes the sample mean of system k𝑘kitalic_k during the first (n1)𝑛1(n-1)( italic_n - 1 ) rounds.

    Update the sample average of system Insubscript𝐼𝑛I_{n}italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT.

3.1 Grand Average (GA) Estimator

Before developing estimators for μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, we highlight some key properties of the GUCB policy. To convey the main idea, for a while we focus on a special case where the exploration rate νn=lognsubscript𝜈𝑛𝑛\nu_{n}=\log nitalic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = roman_log italic_n. In this case, it has been well known that (see Auer et al. 2002), for system k𝑘kitalic_k (kk𝑘superscript𝑘k\neq k^{*}italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT),

𝔼[Tk(n)]Clogn,𝔼delimited-[]subscript𝑇𝑘𝑛𝐶𝑛{\mathbb{E}}[T_{k}(n)]\leq C\log n,blackboard_E [ italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ] ≤ italic_C roman_log italic_n ,

for some constant C𝐶Citalic_C that depends on ΔksubscriptΔ𝑘\Delta_{k}roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT. Then, it can be easily seen that 𝔼[Tk(n)]𝔼delimited-[]subscript𝑇superscript𝑘𝑛{\mathbb{E}}[T_{k^{*}}(n)]blackboard_E [ italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) ], the expected number of times system ksuperscript𝑘k^{*}italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT is chosen, is of order n𝑛nitalic_n during the first n𝑛nitalic_n rounds, because the summation of Tk(n)subscript𝑇𝑘𝑛T_{k}(n)italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n )’s equals n𝑛nitalic_n.

In other words, it is expected that among the first n𝑛nitalic_n rounds, system ksuperscript𝑘k^{*}italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT is chosen in the majority of rounds. Recall that our objective is to estimate μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, the mean of system ksuperscript𝑘k^{*}italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. It is, therefore, reasonable to use the grand average of the samples of all the n𝑛nitalic_n rounds as an estimator of μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, i.e., the GA estimator. Although it uses the samples that are not drawn from system ksuperscript𝑘k^{*}italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, the validity of the GA estimator can be justified by the fact that the estimation error due to such samples may phase out when n𝑛nitalic_n is sufficiently large, as the number of such samples is negligible compared to those drawn from system ksuperscript𝑘k^{*}italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT.

Specifically, the GA estimator is given by

M~n=1nk=1Kj=1Tk(n)Xk,j,subscript~𝑀𝑛1𝑛superscriptsubscript𝑘1𝐾superscriptsubscript𝑗1subscript𝑇𝑘𝑛subscript𝑋𝑘𝑗\widetilde{M}_{n}=\frac{1}{n}\sum_{k=1}^{K}\sum_{j=1}^{T_{k}(n)}X_{k,j},over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_k , italic_j end_POSTSUBSCRIPT , (2)

where {Xk,j,j=1,,Tk(n)}formulae-sequencesubscript𝑋𝑘𝑗𝑗1subscript𝑇𝑘𝑛\{X_{k,j},j=1,\ldots,T_{k}(n)\}{ italic_X start_POSTSUBSCRIPT italic_k , italic_j end_POSTSUBSCRIPT , italic_j = 1 , … , italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) } denotes the samples drawn from system k𝑘kitalic_k during the first n𝑛nitalic_n rounds using the GUCB policy in Algorithm 2.

The GA estimator is not new in the literature, which has served as a key ingredient for algorithms for MDPs and reinforcement learning under the special case that νn=lognsubscript𝜈𝑛𝑛\nu_{n}=\log nitalic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = roman_log italic_n. However, research on the statistical properties of the estimator has been underdeveloped. To the best of our knowledge, only the asymptotic unbiasedness of the estimator has been established; see, e.g., Kocsis and Szepesvári (2006), while other statistical properties seem to be missing. One of the contributions of this paper is to fill this gap. In particular, we establish strong consistency, asymptotic MSE, and CLT for the GA estimator, which shall be discussed in detail in Section 4.

By taking grand average of the samples of all the systems including those with means smaller than μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, the GA estimator is naturally negatively biased. While this bias gradually dies out as the sample size n𝑛nitalic_n increases, its impact cannot be neglected with finite samples, which has been considered as a main drawback of the GA estimator.

3.2 Largest-Size Average (LSA) Estimator

As an attempt to remedy the bias drawback of the GA estimator, we propose a new estimator based on a key observation that the number of samples drawn from system ksuperscript𝑘k^{*}italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT usually dominates those from other systems. We therefore define, at the n𝑛nitalic_nth round,

Inargmaxk=1,,KTk(n).superscriptsubscript𝐼𝑛subscript𝑘1𝐾subscript𝑇𝑘𝑛I_{n}^{*}\triangleq\mathop{\arg\max}_{k=1,...,K}T_{k}\left(n\right).italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ≜ start_BIGOP roman_arg roman_max end_BIGOP start_POSTSUBSCRIPT italic_k = 1 , … , italic_K end_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) . (3)

In other words, Insuperscriptsubscript𝐼𝑛I_{n}^{*}italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT is the index of the system from which the GUCB policy has sampled most frequently so far. It is reasonable to expect that it is very likely to observe In=ksuperscriptsubscript𝐼𝑛superscript𝑘I_{n}^{*}=k^{*}italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. Recall that μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT is the mean of the performance of system ksuperscript𝑘k^{*}italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. It is, therefore, reasonable to propose a new estimator of μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT by taking sample average of the system indexed by Insuperscriptsubscript𝐼𝑛I_{n}^{*}italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, i.e., the LSA estimator. Specifically, the LSA estimator is given by

MIn=1TIn(n)j=1TIn(n)XIn,j,subscript𝑀superscriptsubscript𝐼𝑛1subscript𝑇superscriptsubscript𝐼𝑛𝑛superscriptsubscript𝑗1subscript𝑇superscriptsubscript𝐼𝑛𝑛subscript𝑋superscriptsubscript𝐼𝑛𝑗M_{I_{n}^{*}}=\frac{1}{T_{I_{n}^{*}}(n)}\sum_{j=1}^{T_{I_{n}^{*}}(n)}X_{I_{n}^% {*},j},italic_M start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , italic_j end_POSTSUBSCRIPT , (4)

where the samples are drawn using the GUCB policy in Algorithm 2.

The LSA estimator and the maximum estimator share a similar structure, both of them being sample averages of certain identified systems. However, the way of identifying the target system differs significantly. The maximum estimator chooses the system with the largest sample average and uses directly this sample average to estimate μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. Instead, the LSA estimator chooses the system from which the GUCB policy has sampled most frequently so far, and uses the sample average of this system to estimate μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. The latter seems to be more intuitively appealing in that it is virtually an unbiased sample-mean estimator conditioning on the event that system ksuperscript𝑘k^{*}italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT is correctly identified. Arguably, correctly identifying ksuperscript𝑘k^{*}italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT may be relatively easy as the number of rounds GUCB samples from system ksuperscript𝑘k^{*}italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT is usually overwhelmingly larger than those from other systems.

Compared to the GA estimator, the LSA estimator avoids taking averages of those systems with smaller means with a high probability. It is thus expected that the LSA estimator may have smaller bias, which shall be shown theoretically in Section 4.

3.3 Hypothesis Testing using Maximum Mean Estimators

In what follows, we demonstrate how the maximum mean estimators may lead to a new approach to a multiple comparison problem that finds important application in clinical trials.

In a clinical trial, suppose that one is interested in comparing K1𝐾1K-1italic_K - 1 treatments with unknown mean effects μ1,,μK1subscript𝜇1subscript𝜇𝐾1\mu_{1},\ldots,\mu_{K-1}italic_μ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_μ start_POSTSUBSCRIPT italic_K - 1 end_POSTSUBSCRIPT to a control treatment with known mean effect μ0subscript𝜇0\mu_{0}italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, where K2𝐾2K\geq 2italic_K ≥ 2 and a higher mean effect implies a better treatment. In the current practice, a commonly used approach to this comparison problem is to test simultaneously K1𝐾1K-1italic_K - 1 null hypotheses H0,ksubscript𝐻0𝑘H_{0,k}italic_H start_POSTSUBSCRIPT 0 , italic_k end_POSTSUBSCRIPT’s against alternatives H1,ksubscript𝐻1𝑘H_{1,k}italic_H start_POSTSUBSCRIPT 1 , italic_k end_POSTSUBSCRIPT’s, where

H0,k:μkμ0v.s.H1,k:μk>μ0,k=1,,K1.\displaystyle H_{0,k}:\mu_{k}\leq\mu_{0}\quad v.s.\quad H_{1,k}:\mu_{k}>\mu_{0% },\quad k=1,\ldots,K-1.italic_H start_POSTSUBSCRIPT 0 , italic_k end_POSTSUBSCRIPT : italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ≤ italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT italic_v . italic_s . italic_H start_POSTSUBSCRIPT 1 , italic_k end_POSTSUBSCRIPT : italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT > italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_k = 1 , … , italic_K - 1 . (5)

With the above multiple hypotheses framework, a meta null hypothesis is set as H0,ksubscript𝐻0𝑘H_{0,k}italic_H start_POSTSUBSCRIPT 0 , italic_k end_POSTSUBSCRIPT’s being true simultaneously for all k=1,,K1𝑘1𝐾1k=1,\ldots,K-1italic_k = 1 , … , italic_K - 1, which is rejected if any of H0,ksubscript𝐻0𝑘H_{0,k}italic_H start_POSTSUBSCRIPT 0 , italic_k end_POSTSUBSCRIPT’s is rejected, implying that at least one of the K1𝐾1K-1italic_K - 1 treatments outperforms the control with certain confidence. Suppose that the overall type I error of this test with multiple hypotheses is set to be α𝛼\alphaitalic_α. In this case, type I errors associated with individual hypotheses cannot be set as α𝛼\alphaitalic_α, in order to control the overall type I error. Instead, Bonferroni correction is often applied, which assigns a type I error of α/(K1)𝛼𝐾1\alpha/(K-1)italic_α / ( italic_K - 1 ) to the K1𝐾1K-1italic_K - 1 individual tests. By doing so, it can be verified that the family-wise error rate (FWER), defined as the probability of rejecting at least one true H0,ksubscript𝐻0𝑘H_{0,k}italic_H start_POSTSUBSCRIPT 0 , italic_k end_POSTSUBSCRIPT is not larger than α𝛼\alphaitalic_α, provided that these individual tests are independent. To see this, suppose that there are m0subscript𝑚0m_{0}italic_m start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT true null hypotheses that are assumed to be {H0,1,,H0,m0}subscript𝐻01subscript𝐻0subscript𝑚0\{H_{0,1},\ldots,H_{0,m_{0}}\}{ italic_H start_POSTSUBSCRIPT 0 , 1 end_POSTSUBSCRIPT , … , italic_H start_POSTSUBSCRIPT 0 , italic_m start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT } without loss of generality, where m0subscript𝑚0m_{0}italic_m start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT takes values in {1,,K1}1𝐾1\{1,\ldots,K-1\}{ 1 , … , italic_K - 1 } and is unknown to the researcher. Then,

FWER=Pr(i=1m0{Rejecting H0,i})i=1m0Pr(Rejecting H0,i)=m0αK1α.FWERPrsuperscriptsubscript𝑖1subscript𝑚0Rejecting H0,isuperscriptsubscript𝑖1subscript𝑚0PrRejecting H0,isubscript𝑚0𝛼𝐾1𝛼\displaystyle\mbox{FWER}=\Pr\left(\cup_{i=1}^{m_{0}}\{\mbox{Rejecting $H_{0,i}% $}\}\right)\leq\sum_{i=1}^{m_{0}}\Pr\left(\mbox{Rejecting $H_{0,i}$}\right)=m_% {0}{\alpha\over K-1}\leq\alpha.FWER = roman_Pr ( ∪ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT { Rejecting italic_H start_POSTSUBSCRIPT 0 , italic_i end_POSTSUBSCRIPT } ) ≤ ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT roman_Pr ( Rejecting italic_H start_POSTSUBSCRIPT 0 , italic_i end_POSTSUBSCRIPT ) = italic_m start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT divide start_ARG italic_α end_ARG start_ARG italic_K - 1 end_ARG ≤ italic_α .

While it is easy to implement, Bonferroni correction has been criticized, due to its conservativeness, i.e., a large gap between FWER and α𝛼\alphaitalic_α especially when K𝐾Kitalic_K is large, and the drawback that it usually increases the probability of type II errors. It is thus desirable to develop other procedures for the comparison of treatments that avoid Bonferroni correction.

We note that with the estimators of the maximum mean, a more appealing approach can be developed for this multiple comparison problem. More specifically, in contrast to the multiple hypotheses formulation, we propose a single hypothesis test, with

H0:maxk=0,,K1μk=μ0v.s.H1:maxk=0,,K1μk>μ0.\displaystyle H_{0}:\max_{k=0,\ldots,K-1}\mu_{k}=\mu_{0}\quad v.s.\quad H_{1}:% \max_{k=0,\ldots,K-1}\mu_{k}>\mu_{0}.italic_H start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT : roman_max start_POSTSUBSCRIPT italic_k = 0 , … , italic_K - 1 end_POSTSUBSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT italic_v . italic_s . italic_H start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT : roman_max start_POSTSUBSCRIPT italic_k = 0 , … , italic_K - 1 end_POSTSUBSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT > italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT . (6)

Relabelling μ0subscript𝜇0\mu_{0}italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT as μKsubscript𝜇𝐾\mu_{K}italic_μ start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT, we note that the GA or LSA estimator, M~nsubscript~𝑀𝑛\widetilde{M}_{n}over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT or MInsubscript𝑀superscriptsubscript𝐼𝑛M_{I_{n}^{*}}italic_M start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT, can be used to estimate maxk=0,,K1μkmaxk=1,,Kμksubscript𝑘0𝐾1subscript𝜇𝑘subscript𝑘1𝐾subscript𝜇𝑘\max_{k=0,\ldots,K-1}\mu_{k}\equiv\max_{k=1,\ldots,K}\mu_{k}roman_max start_POSTSUBSCRIPT italic_k = 0 , … , italic_K - 1 end_POSTSUBSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ≡ roman_max start_POSTSUBSCRIPT italic_k = 1 , … , italic_K end_POSTSUBSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT. Essentially, by comparing M~nsubscript~𝑀𝑛\widetilde{M}_{n}over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT (or MInsubscript𝑀superscriptsubscript𝐼𝑛M_{I_{n}^{*}}italic_M start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT) to μ0subscript𝜇0\mu_{0}italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, one may be able to decide whether to reject H0subscript𝐻0H_{0}italic_H start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT or not. Nevertheless, one needs to know the asymptotic distribution of M~nsubscript~𝑀𝑛\widetilde{M}_{n}over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT (or MInsubscript𝑀superscriptsubscript𝐼𝑛M_{I_{n}^{*}}italic_M start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT), in order to formalize the test. We shall show in Section 4, more specifically, Theorems 5 and 8, that both M~nsubscript~𝑀𝑛\widetilde{M}_{n}over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT and MInsubscript𝑀superscriptsubscript𝐼𝑛M_{I_{n}^{*}}italic_M start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT follow asymptotic normal distributions, thus providing theoretical support for the proposed single hypothesis test. Exact forms of the testing statistics resulting from the GA and LSA estimators are delayed to (11) and (14) in the subsequent section.

4 Theoretical Analysis

Throughout the paper, we denote γ¯=max{γ1,,γK}¯𝛾subscript𝛾1subscript𝛾𝐾\bar{\gamma}=\max\{\gamma_{1},...,\gamma_{K}\}over¯ start_ARG italic_γ end_ARG = roman_max { italic_γ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_γ start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT }, where γksubscript𝛾𝑘\gamma_{k}italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT’s are the sub-Gaussian parameter defined in Assumption 1. To facilitate analysis, we first show a result on Tk(n)subscript𝑇𝑘𝑛T_{k}(n)italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ).

Proposition 1.

Suppose that Assumptions 1 and 2 hold, and νn4γ¯2lognsubscript𝜈𝑛4superscript¯𝛾2𝑛\nu_{n}\geq 4\bar{\gamma}^{2}\log nitalic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ≥ 4 over¯ start_ARG italic_γ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_log italic_n. Then, for kk𝑘superscript𝑘k\neq k^{*}italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT and any positive integer p𝑝pitalic_p,

𝔼Tk(n)p{8νnΔk2+5ifp=1;{8νnΔk2+5}2ifp=2;{8νnΔk2+1+[6log(n+1)+O(1)]13}3ifp=3;{8νnΔk2+1+[2pp3(n+1)p3+O(np4)]1p}pifp4,𝔼subscript𝑇𝑘superscript𝑛𝑝cases8subscript𝜈𝑛superscriptsubscriptΔ𝑘25𝑖𝑓𝑝1superscript8subscript𝜈𝑛superscriptsubscriptΔ𝑘252𝑖𝑓𝑝2superscript8subscript𝜈𝑛superscriptsubscriptΔ𝑘21superscriptdelimited-[]6𝑛1𝑂1133𝑖𝑓𝑝3superscript8subscript𝜈𝑛superscriptsubscriptΔ𝑘21superscriptdelimited-[]2𝑝𝑝3superscript𝑛1𝑝3𝑂superscript𝑛𝑝41𝑝𝑝𝑖𝑓𝑝4\displaystyle{\mathbb{E}}T_{k}(n)^{p}\leq\begin{cases}\displaystyle\frac{8\nu_% {n}}{\Delta_{k}^{2}}+5&if\ p=1;\\ \displaystyle\left\{\frac{8\nu_{n}}{\Delta_{k}^{2}}+5\right\}^{2}&if\ p=2;\\ \displaystyle\left\{\frac{8\nu_{n}}{\Delta_{k}^{2}}+1+\left[6\log\left(n+1% \right)+O(1)\right]^{\frac{1}{3}}\right\}^{3}&if\ p=3;\\ \displaystyle\left\{\frac{8\nu_{n}}{\Delta_{k}^{2}}+1+\left[\frac{2p}{p-3}(n+1% )^{p-3}+O\left(n^{p-4}\right)\right]^{\frac{1}{p}}\right\}^{p}&if\ p\geq 4,% \end{cases}blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT ≤ { start_ROW start_CELL divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 5 end_CELL start_CELL italic_i italic_f italic_p = 1 ; end_CELL end_ROW start_ROW start_CELL { divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 5 } start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_CELL start_CELL italic_i italic_f italic_p = 2 ; end_CELL end_ROW start_ROW start_CELL { divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 1 + [ 6 roman_log ( italic_n + 1 ) + italic_O ( 1 ) ] start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 3 end_ARG end_POSTSUPERSCRIPT } start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT end_CELL start_CELL italic_i italic_f italic_p = 3 ; end_CELL end_ROW start_ROW start_CELL { divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 1 + [ divide start_ARG 2 italic_p end_ARG start_ARG italic_p - 3 end_ARG ( italic_n + 1 ) start_POSTSUPERSCRIPT italic_p - 3 end_POSTSUPERSCRIPT + italic_O ( italic_n start_POSTSUPERSCRIPT italic_p - 4 end_POSTSUPERSCRIPT ) ] start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG italic_p end_ARG end_POSTSUPERSCRIPT } start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT end_CELL start_CELL italic_i italic_f italic_p ≥ 4 , end_CELL end_ROW

where Δk>0subscriptΔ𝑘0\Delta_{k}>0roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT > 0 for kk𝑘superscript𝑘k\neq k^{*}italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT due to Assumption 2, and the notation O()𝑂O(\cdot)italic_O ( ⋅ ) means that lim supnan/bnCsubscriptlimit-supremum𝑛subscript𝑎𝑛subscript𝑏𝑛𝐶\limsup_{n\rightarrow\infty}a_{n}/b_{n}\leq Clim sup start_POSTSUBSCRIPT italic_n → ∞ end_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT / italic_b start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ≤ italic_C for some constant C𝐶Citalic_C if an=O(bn)subscript𝑎𝑛𝑂subscript𝑏𝑛a_{n}=O(b_{n})italic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = italic_O ( italic_b start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ).

Proposition 1 provides an upper bound for the p𝑝pitalic_pth moment of Tk(n)subscript𝑇𝑘𝑛T_{k}(n)italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ), the number of samples drawn from system k𝑘kitalic_k during the first n𝑛nitalic_n rounds, for kk𝑘superscript𝑘k\neq k^{*}italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. This upper bound relies on the exploration rate νnsubscript𝜈𝑛\nu_{n}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT, under the assumption that the exploration rate is sufficiently fast, i.e., at least as fast as the logarithmic order. In a special case when νn=4γ¯2lognsubscript𝜈𝑛4superscript¯𝛾2𝑛\nu_{n}=4\bar{\gamma}^{2}\log nitalic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = 4 over¯ start_ARG italic_γ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_log italic_n and p=1𝑝1p=1italic_p = 1, the result is the same as that in Theorem 1 of Auer et al. (2002).

Essentially, Proposition 1 gives an upper bound on the rate at which the p𝑝pitalic_pth moment of Tk(n)subscript𝑇𝑘𝑛T_{k}(n)italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) converges to \infty as n𝑛n\rightarrow\inftyitalic_n → ∞ for kk𝑘superscript𝑘k\neq k^{*}italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. It implies that the sampling ratios Tk(n)/nsubscript𝑇𝑘𝑛𝑛T_{k}(n)/nitalic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) / italic_n may satisfy certain convergence properties. In particular, strong consistency of the sampling ratios are summarized in the following theorem, whose proof is provided in Section A.3 of the appendix.

Theorem 1.

Suppose that Assumptions 1 and 2 hold, and νn[αlogn,n1δ]subscript𝜈𝑛𝛼𝑛superscript𝑛1𝛿\nu_{n}\in\left[\alpha\log n,n^{1-\delta}\right]italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∈ [ italic_α roman_log italic_n , italic_n start_POSTSUPERSCRIPT 1 - italic_δ end_POSTSUPERSCRIPT ] with 0<δ<10𝛿10<\delta<10 < italic_δ < 1 and α4γ¯2𝛼4superscript¯𝛾2\alpha\geq 4\bar{\gamma}^{2}italic_α ≥ 4 over¯ start_ARG italic_γ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT. Then, as n𝑛n\to\inftyitalic_n → ∞,

Tk(n)na.s.{0forkk;1fork=k,\displaystyle\frac{T_{k}(n)}{n}\overset{a.s.}{\longrightarrow}\begin{cases}0&% for\ k\neq k^{*};\\ 1&for\ k=k^{*},\end{cases}divide start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG italic_n end_ARG start_OVERACCENT italic_a . italic_s . end_OVERACCENT start_ARG ⟶ end_ARG { start_ROW start_CELL 0 end_CELL start_CELL italic_f italic_o italic_r italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ; end_CELL end_ROW start_ROW start_CELL 1 end_CELL start_CELL italic_f italic_o italic_r italic_k = italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , end_CELL end_ROW

where the notation a.s.\overset{a.s.}{\longrightarrow}start_OVERACCENT italic_a . italic_s . end_OVERACCENT start_ARG ⟶ end_ARG denotes convergence almost surely (or with probability 1).

Theorem 1 implies that in the long run, the number of samples drawn from system ksuperscript𝑘k^{*}italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT dominates the total numbers of samples drawn from other systems, in the almost sure sense. This result suggests that, it is natural to use Insuperscriptsubscript𝐼𝑛I_{n}^{*}italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, defined in (3), as an estimator of the unknown ksuperscript𝑘k^{*}italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. Strong consistency of Insuperscriptsubscript𝐼𝑛I_{n}^{*}italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT can also be established, which is summarized in the following theorem. Proof of the theorem is provided in Section A.4 of the appendix.

Theorem 2.

Under the same conditions of Theorem 1, as n𝑛n\to\inftyitalic_n → ∞,

Ina.s.k.\displaystyle I^{*}_{n}\overset{a.s.}{\longrightarrow}k^{*}.italic_I start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_OVERACCENT italic_a . italic_s . end_OVERACCENT start_ARG ⟶ end_ARG italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT .

4.1 GA Estimator

In what follows, we establish strong consistency, asymptotic MSE and asymptotic normality for the GA estimator M~nsubscript~𝑀𝑛\widetilde{M}_{n}over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT. Let XIj,jsubscript𝑋subscript𝐼𝑗𝑗X_{I_{j},j}italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j end_POSTSUBSCRIPT denote the sample drawn at the j𝑗jitalic_jth round. Note that

M~nsubscript~𝑀𝑛\displaystyle\widetilde{M}_{n}over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT =\displaystyle== 1nk=1KTk(n)X¯k[Tk(n)]=1nj=1nXIj,j=1nj=1n(XIj,jμIj)+1nj=1nμIj.1𝑛superscriptsubscript𝑘1𝐾subscript𝑇𝑘𝑛subscript¯𝑋𝑘delimited-[]subscript𝑇𝑘𝑛1𝑛superscriptsubscript𝑗1𝑛subscript𝑋subscript𝐼𝑗𝑗1𝑛superscriptsubscript𝑗1𝑛subscript𝑋subscript𝐼𝑗𝑗subscript𝜇subscript𝐼𝑗1𝑛superscriptsubscript𝑗1𝑛subscript𝜇subscript𝐼𝑗\displaystyle\frac{1}{n}\sum_{k=1}^{K}T_{k}(n)\bar{X}_{k}[T_{k}(n)]=\frac{1}{n% }\sum_{j=1}^{n}X_{I_{j},j}=\frac{1}{n}\sum_{j=1}^{n}\left(X_{I_{j},j}-\mu_{I_{% j}}\right)+\frac{1}{n}\sum_{j=1}^{n}\mu_{I_{j}}.divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) over¯ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT [ italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ] = divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ( italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) + divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT .

Define

Znj=1n(XIj,jμIj).subscript𝑍𝑛superscriptsubscript𝑗1𝑛subscript𝑋subscript𝐼𝑗𝑗subscript𝜇subscript𝐼𝑗\displaystyle Z_{n}\triangleq\sum_{j=1}^{n}\left(X_{I_{j},j}-\mu_{I_{j}}\right).italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ≜ ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ( italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) .

Then,

M~n=Znn+1nj=1nμIj.subscript~𝑀𝑛subscript𝑍𝑛𝑛1𝑛superscriptsubscript𝑗1𝑛subscript𝜇subscript𝐼𝑗\displaystyle\widetilde{M}_{n}=\frac{Z_{n}}{n}+\frac{1}{n}\sum_{j=1}^{n}\mu_{I% _{j}}.over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG italic_n end_ARG + divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT . (7)

Let nsubscript𝑛{\cal F}_{n}caligraphic_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT be the σ𝜎\sigmaitalic_σ-field generated by the first n𝑛nitalic_n samples for n1𝑛1n\geq 1italic_n ≥ 1, and 0={Ω,}subscript0Ω{\cal F}_{0}=\{\Omega,\emptyset\}caligraphic_F start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = { roman_Ω , ∅ }. Note that Insubscript𝐼𝑛I_{n}italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT is n1subscript𝑛1{\cal F}_{n-1}caligraphic_F start_POSTSUBSCRIPT italic_n - 1 end_POSTSUBSCRIPT-measurable. It follows that for n1𝑛1n\geq 1italic_n ≥ 1,

𝔼[XIn,n|n1]=μIn,andthen𝔼XIn,n=𝔼μIn.formulae-sequence𝔼delimited-[]conditionalsubscript𝑋subscript𝐼𝑛𝑛subscript𝑛1subscript𝜇subscript𝐼𝑛andthen𝔼subscript𝑋subscript𝐼𝑛𝑛𝔼subscript𝜇subscript𝐼𝑛\displaystyle{\mathbb{E}}\left[\left.X_{I_{n},n}\right|{\cal F}_{n-1}\right]=% \mu_{I_{n}},\ {\rm and\ then}\ {\mathbb{E}}X_{I_{n},n}={\mathbb{E}}\mu_{I_{n}}.blackboard_E [ italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_n end_POSTSUBSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_n - 1 end_POSTSUBSCRIPT ] = italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT , roman_and roman_then blackboard_E italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_n end_POSTSUBSCRIPT = blackboard_E italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT .

Therefore, 𝔼Zn=0𝔼subscript𝑍𝑛0{\mathbb{E}}Z_{n}=0blackboard_E italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = 0 and

𝔼[Zn|n1]=Zn1+𝔼[XIn,nμIn|n1]=Zn1,𝔼delimited-[]conditionalsubscript𝑍𝑛subscript𝑛1subscript𝑍𝑛1𝔼delimited-[]subscript𝑋subscript𝐼𝑛𝑛conditionalsubscript𝜇subscript𝐼𝑛subscript𝑛1subscript𝑍𝑛1\displaystyle{\mathbb{E}}\left[\left.Z_{n}\right|{\cal F}_{n-1}\right]=Z_{n-1}% +{\mathbb{E}}\left[\left.X_{I_{n},n}-\mu_{I_{n}}\right|{\cal F}_{n-1}\right]=Z% _{n-1},blackboard_E [ italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_n - 1 end_POSTSUBSCRIPT ] = italic_Z start_POSTSUBSCRIPT italic_n - 1 end_POSTSUBSCRIPT + blackboard_E [ italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_n end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_n - 1 end_POSTSUBSCRIPT ] = italic_Z start_POSTSUBSCRIPT italic_n - 1 end_POSTSUBSCRIPT ,

i.e., Znsubscript𝑍𝑛Z_{n}italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT is a martingale with mean 0.

Note that Xkμksubscript𝑋𝑘subscript𝜇𝑘X_{k}-\mu_{k}italic_X start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is sub-Gaussian with parameter γksubscript𝛾𝑘\gamma_{k}italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT for k=1,,K𝑘1𝐾k=1,...,Kitalic_k = 1 , … , italic_K. Because In+1subscript𝐼𝑛1I_{n+1}italic_I start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT is nsubscript𝑛{\cal F}_{n}caligraphic_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT-measurable,

𝔼[exp{s(XIn+1,n+1μIn+1)}|n]exp{γ¯2s22}, for s,formulae-sequence𝔼delimited-[]conditional𝑠subscript𝑋subscript𝐼𝑛1𝑛1subscript𝜇subscript𝐼𝑛1subscript𝑛superscript¯𝛾2superscript𝑠22 for 𝑠\displaystyle{\mathbb{E}}\left[\left.\exp\{s(X_{I_{n+1},n+1}-\mu_{I_{n+1}})\}% \right|{\cal F}_{n}\right]\leq\exp\left\{\frac{\bar{\gamma}^{2}s^{2}}{2}\right% \},\mbox{ for }s\in{\mathbb{R}},blackboard_E [ roman_exp { italic_s ( italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT , italic_n + 1 end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) } | caligraphic_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] ≤ roman_exp { divide start_ARG over¯ start_ARG italic_γ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_s start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 2 end_ARG } , for italic_s ∈ blackboard_R ,

implies that XIn+1,n+1μIn+1subscript𝑋subscript𝐼𝑛1𝑛1subscript𝜇subscript𝐼𝑛1X_{I_{n+1},n+1}-\mu_{I_{n+1}}italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT , italic_n + 1 end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT is sub-Gaussian with parameter γ¯¯𝛾\bar{\gamma}over¯ start_ARG italic_γ end_ARG.

Recall that by the definition of Znsubscript𝑍𝑛Z_{n}italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT,

Zn+1Zn=XIn+1,n+1μIn+1.subscript𝑍𝑛1subscript𝑍𝑛subscript𝑋subscript𝐼𝑛1𝑛1subscript𝜇subscript𝐼𝑛1\displaystyle Z_{n+1}-Z_{n}=X_{I_{n+1},n+1}-\mu_{I_{n+1}}.italic_Z start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT - italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT , italic_n + 1 end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT .

Then, we have

(|Zn+1Zn|>a|n)2exp{a22γ¯2},subscript𝑍𝑛1subscript𝑍𝑛conditional𝑎subscript𝑛2superscript𝑎22superscript¯𝛾2\displaystyle{\mathbb{P}}\left(\left.|Z_{n+1}-Z_{n}|>a\right|{\cal F}_{n}% \right)\leq 2\exp\left\{-\frac{a^{2}}{2\bar{\gamma}^{2}}\right\},blackboard_P ( | italic_Z start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT - italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | > italic_a | caligraphic_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ≤ 2 roman_exp { - divide start_ARG italic_a start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 2 over¯ start_ARG italic_γ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG } ,

by the Hoeffding bound in Lemma A1.

Further note that {Zn+1Zn,n=1,2,}formulae-sequencesubscript𝑍𝑛1subscript𝑍𝑛𝑛12\{Z_{n+1}-Z_{n},n=1,2,...\}{ italic_Z start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT - italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_n = 1 , 2 , … } is a martingale difference sequence, the condition of Theorem 2 in Shamir (2011) is thus satisfied. Applying this theorem, for any δ>0𝛿0\delta>0italic_δ > 0, it holds that with probability at least 1δ1𝛿1-\delta1 - italic_δ,

Znn56log(1/δ)n/(2γ¯2)=112γ¯2log(1/δ)n.subscript𝑍𝑛𝑛561𝛿𝑛2superscript¯𝛾2112superscript¯𝛾21𝛿𝑛\displaystyle\frac{Z_{n}}{n}\leq\sqrt{\frac{56\log(1/\delta)}{n/(2\bar{\gamma}% ^{2})}}=\sqrt{\frac{112\bar{\gamma}^{2}\log(1/\delta)}{n}}.divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG italic_n end_ARG ≤ square-root start_ARG divide start_ARG 56 roman_log ( 1 / italic_δ ) end_ARG start_ARG italic_n / ( 2 over¯ start_ARG italic_γ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_ARG end_ARG = square-root start_ARG divide start_ARG 112 over¯ start_ARG italic_γ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_log ( 1 / italic_δ ) end_ARG start_ARG italic_n end_ARG end_ARG .

Let ε=112γ¯2log(1/δ)n𝜀112superscript¯𝛾21𝛿𝑛\varepsilon=\sqrt{\frac{112\bar{\gamma}^{2}\log(1/\delta)}{n}}italic_ε = square-root start_ARG divide start_ARG 112 over¯ start_ARG italic_γ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_log ( 1 / italic_δ ) end_ARG start_ARG italic_n end_ARG end_ARG. Then, δ=exp{nε2112γ¯2}𝛿𝑛superscript𝜀2112superscript¯𝛾2\delta=\exp\left\{-\frac{n\varepsilon^{2}}{112\bar{\gamma}^{2}}\right\}italic_δ = roman_exp { - divide start_ARG italic_n italic_ε start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 112 over¯ start_ARG italic_γ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG } and

(Znn>ε)δ=exp{nε2112γ¯2}.subscript𝑍𝑛𝑛𝜀𝛿𝑛superscript𝜀2112superscript¯𝛾2\displaystyle{\mathbb{P}}\left(\frac{Z_{n}}{n}>\varepsilon\right)\leq\delta=% \exp\left\{-\frac{n\varepsilon^{2}}{112\bar{\gamma}^{2}}\right\}.blackboard_P ( divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG italic_n end_ARG > italic_ε ) ≤ italic_δ = roman_exp { - divide start_ARG italic_n italic_ε start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 112 over¯ start_ARG italic_γ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG } .

Applying the same argument to the sequence {Zn,n=1,2,}formulae-sequencesubscript𝑍𝑛𝑛12\{-Z_{n},n=1,2,...\}{ - italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_n = 1 , 2 , … }, symmetrically we have

(Znn<ε)δ=exp{nε2112γ¯2}.subscript𝑍𝑛𝑛𝜀𝛿𝑛superscript𝜀2112superscript¯𝛾2\displaystyle{\mathbb{P}}\left(\frac{Z_{n}}{n}<-\varepsilon\right)\leq\delta=% \exp\left\{-\frac{n\varepsilon^{2}}{112\bar{\gamma}^{2}}\right\}.blackboard_P ( divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG italic_n end_ARG < - italic_ε ) ≤ italic_δ = roman_exp { - divide start_ARG italic_n italic_ε start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 112 over¯ start_ARG italic_γ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG } .

Therefore,

(|Znn|>ε)2exp{nε2112γ¯2},subscript𝑍𝑛𝑛𝜀2𝑛superscript𝜀2112superscript¯𝛾2\displaystyle{\mathbb{P}}\left(\left|\frac{Z_{n}}{n}\right|>\varepsilon\right)% \leq 2\exp\left\{-\frac{n\varepsilon^{2}}{112\bar{\gamma}^{2}}\right\},blackboard_P ( | divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG italic_n end_ARG | > italic_ε ) ≤ 2 roman_exp { - divide start_ARG italic_n italic_ε start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 112 over¯ start_ARG italic_γ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG } ,

and moreover,

n=1(|Zn/n|>ε)<.superscriptsubscript𝑛1subscript𝑍𝑛𝑛𝜀\displaystyle\sum_{n=1}^{\infty}{\mathbb{P}}\left(\left|Z_{n}/n\right|>% \varepsilon\right)<\infty.∑ start_POSTSUBSCRIPT italic_n = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT blackboard_P ( | italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT / italic_n | > italic_ε ) < ∞ .

By Borel-Cantelli lemma, Zn/na.s.0Z_{n}/n\overset{a.s.}{\longrightarrow}0italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT / italic_n start_OVERACCENT italic_a . italic_s . end_OVERACCENT start_ARG ⟶ end_ARG 0 as n𝑛n\to\inftyitalic_n → ∞. i.e., the first term on the RHS of (7) converges to 0 almost surely.

Moreover, the second term on the RHS of (7) satisfies

1nj=1nμIj=1nk=1KTk(n)μka.s.μk,\displaystyle\frac{1}{n}\sum_{j=1}^{n}\mu_{I_{j}}=\frac{1}{n}\sum_{k=1}^{K}T_{% k}(n)\mu_{k}\overset{a.s.}{\longrightarrow}\mu_{k^{*}},divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_OVERACCENT italic_a . italic_s . end_OVERACCENT start_ARG ⟶ end_ARG italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ,

where the convergence follows from Theorem 1 and the continuous mapping theorem. Therefore, by (7), we establish strong consistency of M~nsubscript~𝑀𝑛\widetilde{M}_{n}over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT that is summarized as follows.

Theorem 3.

Under the same conditions of Theorem 1, M~nsubscript~𝑀𝑛\widetilde{M}_{n}over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT is a strongly consistent estimator of μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, i.e., as n𝑛n\to\inftyitalic_n → ∞,

M~na.s.μ.\displaystyle\widetilde{M}_{n}\overset{a.s.}{\longrightarrow}\mu^{*}.over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_OVERACCENT italic_a . italic_s . end_OVERACCENT start_ARG ⟶ end_ARG italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT .

Theorem 3 ensures that the GA estimator, M~nsubscript~𝑀𝑛\widetilde{M}_{n}over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT, converges to μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT with probability 1, as n𝑛n\rightarrow\inftyitalic_n → ∞, which serves as an important theoretical guarantee for the estimator. To further understand its probabilistic behavior, it is desirable to conduct error analysis, especially on its bias and variance. In the following theorem, we establish upper bounds for its bias, variance and MSE. The proof of the theorem is provided in Section A.5 of the appendix.

Theorem 4.

Suppose that Assumptions 1 and 2 hold, and νn[αlogn,n12δ]subscript𝜈𝑛𝛼𝑛superscript𝑛12𝛿\nu_{n}\in\left[\alpha\log n,n^{\frac{1}{2}-\delta}\right]italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∈ [ italic_α roman_log italic_n , italic_n start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG - italic_δ end_POSTSUPERSCRIPT ] with 0<δ<10𝛿10<\delta<10 < italic_δ < 1 and α4γ¯2𝛼4superscript¯𝛾2\alpha\geq 4\bar{\gamma}^{2}italic_α ≥ 4 over¯ start_ARG italic_γ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT. Let c𝑐citalic_c be a constant satisfying 8νnΔk2+5cνn8subscript𝜈𝑛superscriptsubscriptΔ𝑘25𝑐subscript𝜈𝑛\frac{8\nu_{n}}{\Delta_{k}^{2}}+5\leq c\nu_{n}divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 5 ≤ italic_c italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT for kk𝑘superscript𝑘k\neq k^{*}italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, and μ¯2=max{μ12,,μK2}superscript¯𝜇2superscriptsubscript𝜇12superscriptsubscript𝜇𝐾2\bar{\mu}^{2}=\max\{\mu_{1}^{2},...,\mu_{K}^{2}\}over¯ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = roman_max { italic_μ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , … , italic_μ start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT }. Then,

|Bias(M~n)|kk1n(8νnΔk2+5Δk)=O(νnn),Var[M~n]2Kσk2n+2K3(μ¯2+1)n2c2νn2=O(1n),formulae-sequenceBiassubscript~𝑀𝑛subscript𝑘superscript𝑘1𝑛8subscript𝜈𝑛superscriptsubscriptΔ𝑘25subscriptΔ𝑘𝑂subscript𝜈𝑛𝑛Vardelimited-[]subscript~𝑀𝑛2𝐾superscriptsubscript𝜎superscript𝑘2𝑛2superscript𝐾3superscript¯𝜇21superscript𝑛2superscript𝑐2superscriptsubscript𝜈𝑛2𝑂1𝑛\displaystyle\left|{\rm Bias}\left(\widetilde{M}_{n}\right)\right|\leq\sum_{k% \neq k^{*}}\frac{1}{n}\left(\frac{8\nu_{n}}{\Delta_{k}^{2}}+5\Delta_{k}\right)% =O\left(\frac{\nu_{n}}{n}\right),\quad{\rm Var}\left[\widetilde{M}_{n}\right]% \leq\frac{2K\sigma_{k^{*}}^{2}}{n}+\frac{2K^{3}(\bar{\mu}^{2}+1)}{n^{2}}c^{2}% \nu_{n}^{2}=O\left(\frac{1}{n}\right),| roman_Bias ( over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) | ≤ ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ( divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 5 roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) = italic_O ( divide start_ARG italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG italic_n end_ARG ) , roman_Var [ over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] ≤ divide start_ARG 2 italic_K italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_n end_ARG + divide start_ARG 2 italic_K start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT ( over¯ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 1 ) end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG italic_c start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = italic_O ( divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ) ,

and thus

MSE(M~n)2Kσk2n+o(1n),MSEsubscript~𝑀𝑛2𝐾superscriptsubscript𝜎superscript𝑘2𝑛𝑜1𝑛\displaystyle{\rm MSE}\left(\widetilde{M}_{n}\right)\leq{2K\sigma_{k^{*}}^{2}% \over n}+o\left({1\over n}\right),roman_MSE ( over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ≤ divide start_ARG 2 italic_K italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_n end_ARG + italic_o ( divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ) ,

where the notation o()𝑜o(\cdot)italic_o ( ⋅ ) means that limnan/bn=0subscript𝑛subscript𝑎𝑛subscript𝑏𝑛0\lim_{n\rightarrow\infty}a_{n}/b_{n}=0roman_lim start_POSTSUBSCRIPT italic_n → ∞ end_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT / italic_b start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = 0 if an=o(bn)subscript𝑎𝑛𝑜subscript𝑏𝑛a_{n}=o(b_{n})italic_a start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = italic_o ( italic_b start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ).

Theorem 4 shows that the MSE of the GA estimator converges to 00 at a rate of n1superscript𝑛1n^{-1}italic_n start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT. In particular, the variance is the dominant term in the MSE, compared to the square of the bias, which is of order νn2/n2superscriptsubscript𝜈𝑛2superscript𝑛2\nu_{n}^{2}/n^{2}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT. This result implies that when the exploration rate takes value in the range νn[αlogn,n1/2δ]subscript𝜈𝑛𝛼𝑛superscript𝑛12𝛿\nu_{n}\in\left[\alpha\log n,n^{1/2-\delta}\right]italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∈ [ italic_α roman_log italic_n , italic_n start_POSTSUPERSCRIPT 1 / 2 - italic_δ end_POSTSUPERSCRIPT ] with 0<δ<1/20𝛿120<\delta<1/20 < italic_δ < 1 / 2, the bias of the GA estimator is negligible compared to its variance, making it possible to construct CIs by ignoring its bias. Furthermore, as suggested by Theorem 4, the leading term of the variance does not depend on νnsubscript𝜈𝑛\nu_{n}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT. It is thus asymptotically optimal to choose logarithmic νnsubscript𝜈𝑛\nu_{n}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT, the slowest possible rate, so as to reduce the asymptotic bias as far as possible. The superiority of this choice of νnsubscript𝜈𝑛\nu_{n}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT is also confirmed by the numerical experiments in Section 5.

To provide a formal theoretical support for asymptotically valid CIs and the single hypothesis test in (6), we establish a CLT for M~nsubscript~𝑀𝑛\widetilde{M}_{n}over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT in the following theorem, whose proof is provided in Section A.6 of the appendix.

Theorem 5.

Suppose that Assumptions 1 and 2 hold, and νn[αlogn,n12δ]subscript𝜈𝑛𝛼𝑛superscript𝑛12𝛿\nu_{n}\in\left[\alpha\log n,n^{\frac{1}{2}-\delta}\right]italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∈ [ italic_α roman_log italic_n , italic_n start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG - italic_δ end_POSTSUPERSCRIPT ] with 0<δ<10𝛿10<\delta<10 < italic_δ < 1 and α4γ¯2𝛼4superscript¯𝛾2\alpha\geq 4\bar{\gamma}^{2}italic_α ≥ 4 over¯ start_ARG italic_γ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT. Then,

n(M~nμ)N(0,σk2),asn,formulae-sequence𝑛subscript~𝑀𝑛superscript𝜇𝑁0superscriptsubscript𝜎superscript𝑘2𝑎𝑠𝑛\displaystyle\sqrt{n}\left(\widetilde{M}_{n}-\mu^{*}\right)\Rightarrow N(0,% \sigma_{k^{*}}^{2}),\ \ as\ \ n\rightarrow\infty,square-root start_ARG italic_n end_ARG ( over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ⇒ italic_N ( 0 , italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) , italic_a italic_s italic_n → ∞ ,

where \Rightarrow denotes convergence in distribution.

Theorem 5 shows that the GA estimator is asymptotically normally distributed with mean μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT and variance σk2/nsuperscriptsubscript𝜎superscript𝑘2𝑛\sigma_{k^{*}}^{2}/nitalic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / italic_n. Based on Theorem 5, an asymptotically valid CI can be constructed for μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. To do so, a remaining issue is on how to estimate the unknown σk2superscriptsubscript𝜎superscript𝑘2\sigma_{k^{*}}^{2}italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT. In light of the proof of Theorem 5, we estimate σk2superscriptsubscript𝜎superscript𝑘2\sigma_{k^{*}}^{2}italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT using

σ~n2=1nk=1KTk(n)σ^k2,superscriptsubscript~𝜎𝑛21𝑛superscriptsubscript𝑘1𝐾subscript𝑇𝑘𝑛superscriptsubscript^𝜎𝑘2\displaystyle\widetilde{\sigma}_{n}^{2}=\frac{1}{n}\sum_{k=1}^{K}T_{k}(n)\hat{% \sigma}_{k}^{2},over~ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (8)

where for k=1,,K𝑘1𝐾k=1,...,Kitalic_k = 1 , … , italic_K,

σ^k2=1Tk(n)j=1Tk(n)Xk,j2[1Tk(n)j=1Tk(n)Xk,j]2.superscriptsubscript^𝜎𝑘21subscript𝑇𝑘𝑛superscriptsubscript𝑗1subscript𝑇𝑘𝑛superscriptsubscript𝑋𝑘𝑗2superscriptdelimited-[]1subscript𝑇𝑘𝑛superscriptsubscript𝑗1subscript𝑇𝑘𝑛subscript𝑋𝑘𝑗2\displaystyle\hat{\sigma}_{k}^{2}=\frac{1}{T_{k}(n)}\sum_{j=1}^{T_{k}(n)}X_{k,% j}^{2}-\left[\frac{1}{T_{k}(n)}\sum_{j=1}^{T_{k}(n)}X_{k,j}\right]^{2}.over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_k , italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - [ divide start_ARG 1 end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_k , italic_j end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . (9)

It can be shown that σ~n2superscriptsubscript~𝜎𝑛2\widetilde{\sigma}_{n}^{2}over~ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT converges to σk2superscriptsubscript𝜎superscript𝑘2\sigma_{k^{*}}^{2}italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT in probability, as n𝑛n\rightarrow\inftyitalic_n → ∞. This result is summarized in the following proposition, whose proof is provided in Section A.7 of the appendix.

Proposition 2.

Under the same conditions of Theorem 5, as n𝑛n\rightarrow\inftyitalic_n → ∞,

σ~n2σk2,inprobability.superscriptsubscript~𝜎𝑛2superscriptsubscript𝜎superscript𝑘2inprobability\displaystyle\widetilde{\sigma}_{n}^{2}\to\sigma_{k^{*}}^{2},\ \ {\rm in\ % probability}.over~ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT → italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , roman_in roman_probability .

From Theorem 5 and Proposition 2, it follows that

nσ~n1(M~nμ)N(0,1).𝑛superscriptsubscript~𝜎𝑛1subscript~𝑀𝑛superscript𝜇𝑁01\displaystyle\sqrt{n}\widetilde{\sigma}_{n}^{-1}\left(\widetilde{M}_{n}-\mu^{*% }\right)\Rightarrow N(0,1).square-root start_ARG italic_n end_ARG over~ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ⇒ italic_N ( 0 , 1 ) .

Then, an asymptotically valid 100(1β)%percent1𝛽(1-\beta)\%( 1 - italic_β ) % CI of μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT is given by

(M~nz1β/2σ~n/n,M~n+z1β/2σ~n/n),subscript~𝑀𝑛subscript𝑧1𝛽2subscript~𝜎𝑛𝑛subscript~𝑀𝑛subscript𝑧1𝛽2subscript~𝜎𝑛𝑛\displaystyle\left(\widetilde{M}_{n}-z_{1-\beta/2}\widetilde{\sigma}_{n}/\sqrt% {n},\quad\widetilde{M}_{n}+z_{1-\beta/2}\widetilde{\sigma}_{n}/\sqrt{n}\right),( over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_z start_POSTSUBSCRIPT 1 - italic_β / 2 end_POSTSUBSCRIPT over~ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT / square-root start_ARG italic_n end_ARG , over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT + italic_z start_POSTSUBSCRIPT 1 - italic_β / 2 end_POSTSUBSCRIPT over~ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT / square-root start_ARG italic_n end_ARG ) , (10)

where z1β/2subscript𝑧1𝛽2z_{1-\beta/2}italic_z start_POSTSUBSCRIPT 1 - italic_β / 2 end_POSTSUBSCRIPT is the 1β/21𝛽21-\beta/21 - italic_β / 2 quantile of the standard normal distribution.

The CLT in Theorem 5 and the variance estimator σ~n2superscriptsubscript~𝜎𝑛2\widetilde{\sigma}_{n}^{2}over~ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT in Proposition 2 also serve as the theoretical foundation for the proposed single hypothesis test in Section 3.3. More specifically, the following testing statistics can be constructed:

U~nn(M~nμ0)σ~n.subscript~𝑈𝑛𝑛subscript~𝑀𝑛subscript𝜇0subscript~𝜎𝑛\widetilde{U}_{n}\triangleq{\sqrt{n}\left(\widetilde{M}_{n}-\mu_{0}\right)% \over\widetilde{\sigma}_{n}}.over~ start_ARG italic_U end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ≜ divide start_ARG square-root start_ARG italic_n end_ARG ( over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) end_ARG start_ARG over~ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG . (11)

Note that under either the null or alternative hypothesis, maxk=0,,K1μkμ0subscript𝑘0𝐾1subscript𝜇𝑘subscript𝜇0\max_{k=0,\ldots,K-1}\mu_{k}\geq\mu_{0}roman_max start_POSTSUBSCRIPT italic_k = 0 , … , italic_K - 1 end_POSTSUBSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ≥ italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT. Therefore, a one-side test can be conducted using the GA-based statistics U~nsubscript~𝑈𝑛\widetilde{U}_{n}over~ start_ARG italic_U end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT. In particular, the null hypothesis is rejected when U~n>z1αsubscript~𝑈𝑛subscript𝑧1𝛼\widetilde{U}_{n}>z_{1-\alpha}over~ start_ARG italic_U end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT > italic_z start_POSTSUBSCRIPT 1 - italic_α end_POSTSUBSCRIPT, suggesting that at least one of the considered treatments outperform the control treatment, where α𝛼\alphaitalic_α denotes the overall type I error.

4.2 LSA Estimator

Strong consistency of the LSA estimator is stated in the following theorem, whose proof is provided in Section A.8 of the appendix.

Theorem 6.

Suppose that Assumptions 1 and 2 hold, and νn[αlogn,n1δ]subscript𝜈𝑛𝛼𝑛superscript𝑛1𝛿\nu_{n}\in\left[\alpha\log n,n^{1-\delta}\right]italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∈ [ italic_α roman_log italic_n , italic_n start_POSTSUPERSCRIPT 1 - italic_δ end_POSTSUPERSCRIPT ] with 0<δ<10𝛿10<\delta<10 < italic_δ < 1 and α4γ¯2𝛼4superscript¯𝛾2\alpha\geq 4\bar{\gamma}^{2}italic_α ≥ 4 over¯ start_ARG italic_γ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT. Then, as n𝑛n\to\inftyitalic_n → ∞,

MIna.s.μ.\displaystyle M_{I^{*}_{n}}\overset{a.s.}{\longrightarrow}\mu^{*}.italic_M start_POSTSUBSCRIPT italic_I start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_OVERACCENT italic_a . italic_s . end_OVERACCENT start_ARG ⟶ end_ARG italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT .

Theorem 6 shows that the LSA estimator converges to μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT as n𝑛n\rightarrow\inftyitalic_n → ∞. To better understand its asymptotic error, we establish upper bounds for its asymptotic bias, variance and MSE, which are summarized in the following theorem with proof provided in Section A.9 of the appendix.

Theorem 7.

Under the same conditions of Theorem 6, Bias(MIn)=𝔼[MInμk]<0Biassubscript𝑀superscriptsubscript𝐼𝑛𝔼delimited-[]subscript𝑀superscriptsubscript𝐼𝑛subscript𝜇superscript𝑘0{\rm Bias}(M_{I_{n}^{*}})={\mathbb{E}}\left[M_{I_{n}^{*}}-\mu_{k^{*}}\right]<0roman_Bias ( italic_M start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) = blackboard_E [ italic_M start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ] < 0 for sufficiently large n𝑛nitalic_n, and there exist constants c1,c2subscript𝑐1subscript𝑐2c_{1},c_{2}italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and c3subscript𝑐3c_{3}italic_c start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT, such that

|Bias(MIn)|K2σk2c2νn+c3n(nKc1νn)+o(νnn3/2)=O(νnn3/2),Biassubscript𝑀superscriptsubscript𝐼𝑛superscript𝐾2superscriptsubscript𝜎superscript𝑘2subscript𝑐2subscript𝜈𝑛subscript𝑐3𝑛𝑛𝐾subscript𝑐1subscript𝜈𝑛𝑜subscript𝜈𝑛superscript𝑛32𝑂subscript𝜈𝑛superscript𝑛32\displaystyle\left|{\rm Bias}\left(M_{I_{n}^{*}}\right)\right|\leq{K^{2}\sigma% _{k^{*}}^{2}c_{2}\nu_{n}+c_{3}\over\sqrt{n}\left(n-Kc_{1}\nu_{n}\right)}+o% \left({\nu_{n}\over n^{3/2}}\right)=O\left({\nu_{n}\over n^{3/2}}\right),| roman_Bias ( italic_M start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) | ≤ divide start_ARG italic_K start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT + italic_c start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_ARG start_ARG square-root start_ARG italic_n end_ARG ( italic_n - italic_K italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) end_ARG + italic_o ( divide start_ARG italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 3 / 2 end_POSTSUPERSCRIPT end_ARG ) = italic_O ( divide start_ARG italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 3 / 2 end_POSTSUPERSCRIPT end_ARG ) ,
Var[MIn]σk2K2n+o(1n),Vardelimited-[]subscript𝑀superscriptsubscript𝐼𝑛superscriptsubscript𝜎superscript𝑘2superscript𝐾2𝑛𝑜1𝑛\displaystyle{\rm Var}\left[M_{I_{n}^{*}}\right]\leq{\sigma_{k^{*}}^{2}K^{2}% \over n}+o\left({1\over n}\right),roman_Var [ italic_M start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ] ≤ divide start_ARG italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_K start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_n end_ARG + italic_o ( divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ) ,

and thus

MSE[MIn]σk2K2n+O(νn2n3)=O(1n).MSEdelimited-[]subscript𝑀superscriptsubscript𝐼𝑛superscriptsubscript𝜎superscript𝑘2superscript𝐾2𝑛𝑂superscriptsubscript𝜈𝑛2superscript𝑛3𝑂1𝑛\displaystyle{\rm MSE}\left[M_{I_{n}^{*}}\right]\leq{\sigma_{k^{*}}^{2}K^{2}% \over n}+O\left(\frac{\nu_{n}^{2}}{n^{3}}\right)=O\left(\frac{1}{n}\right).roman_MSE [ italic_M start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ] ≤ divide start_ARG italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_K start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_n end_ARG + italic_O ( divide start_ARG italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT end_ARG ) = italic_O ( divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ) .

Theorem 7 shows that the bias of LSA estimator decays at a rate at least as fast as νn/n3/2subscript𝜈𝑛superscript𝑛32\nu_{n}/n^{3/2}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT / italic_n start_POSTSUPERSCRIPT 3 / 2 end_POSTSUPERSCRIPT, which is much faster than that of GA estimator (see Theorem 4). Interestingly, the rate of convergence of its bias can be faster than n1superscript𝑛1n^{-1}italic_n start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT if νn=o(n1/2)subscript𝜈𝑛𝑜superscript𝑛12\nu_{n}=o(n^{1/2})italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = italic_o ( italic_n start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT ). Furthermore, as suggested by Theorem 7, the leading term of the variance does not depend on νnsubscript𝜈𝑛\nu_{n}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT. It is thus asymptotically optimal to choose logarithmic νnsubscript𝜈𝑛\nu_{n}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT, the slowest possible rate, so as to reduce the asymptotic bias as far as possible. The superiority of this choice of νnsubscript𝜈𝑛\nu_{n}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT is also confirmed by the numerical experiments in Section 5.

It should be remarked that in Theorems 7 and 4, the dominant coefficients of the variance bounds for the LSA and GA estimators, i.e., σk2K2superscriptsubscript𝜎superscript𝑘2superscript𝐾2\sigma_{k^{*}}^{2}K^{2}italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_K start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT and 2Kσk22𝐾superscriptsubscript𝜎superscript𝑘22K\sigma_{k^{*}}^{2}2 italic_K italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT, are not tight and can be further reduced with more elaborate analysis. These coefficients do not imply that the GA estimator has a smaller variance, and in fact, numerical results in Section 5 suggest the opposite.

Similar to the case for GA, we establish a CLT for the LSA estimator, which is stated in the following theorem. Proof of the theorem is provided in Section A.10 of the appendix.

Theorem 8.

Under the same conditions of Theorem 6,

n(MInμ)N(0,σk2),asn.formulae-sequence𝑛subscript𝑀subscriptsuperscript𝐼𝑛superscript𝜇𝑁0superscriptsubscript𝜎superscript𝑘2𝑎𝑠𝑛\displaystyle\sqrt{n}\left(M_{I^{*}_{n}}-\mu^{*}\right)\Rightarrow N(0,\sigma_% {k^{*}}^{2}),\ \ as\ \ n\to\infty.square-root start_ARG italic_n end_ARG ( italic_M start_POSTSUBSCRIPT italic_I start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ⇒ italic_N ( 0 , italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) , italic_a italic_s italic_n → ∞ .

Theorem 8 shows the asymptotic normality of the LSA estimator, whose rate of convergence is n1/2superscript𝑛12n^{-1/2}italic_n start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT. From its proof, it can also be seen that if replacing n𝑛\sqrt{n}square-root start_ARG italic_n end_ARG by TIn(n)subscript𝑇subscriptsuperscript𝐼𝑛𝑛\sqrt{T_{I^{*}_{n}}(n)}square-root start_ARG italic_T start_POSTSUBSCRIPT italic_I start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG, the CLT still holds.

While both CLTs for the GA and LSA estimators (in Theorems 5 and 8, respectively) have the same rate of convergence, it should be emphasized that they require different conditions on the exploration rate. Specifically, CLT holds for the LSA estimator so long as the exploration rate νnsubscript𝜈𝑛\nu_{n}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT is within the range of [αlogn,n1δ]𝛼𝑛superscript𝑛1𝛿\left[\alpha\log n,n^{1-\delta}\right][ italic_α roman_log italic_n , italic_n start_POSTSUPERSCRIPT 1 - italic_δ end_POSTSUPERSCRIPT ], while CLT of the GA estimator can only be guaranteed under a more restricted condition that νn[αlogn,n1/2δ]subscript𝜈𝑛𝛼𝑛superscript𝑛12𝛿\nu_{n}\in\left[\alpha\log n,n^{{1/2}-\delta}\right]italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∈ [ italic_α roman_log italic_n , italic_n start_POSTSUPERSCRIPT 1 / 2 - italic_δ end_POSTSUPERSCRIPT ].

In what follows, we construct an asymptotically valid CI for μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT based on Theorem 8. To do this, we propose to estimate the unknown variance σk2superscriptsubscript𝜎superscript𝑘2\sigma_{k^{*}}^{2}italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT by

σ¯n2=1TIn(n)j=1TIn(n)XIn,j2[1TIn(n)j=1TIn(n)XIn,j]2.superscriptsubscript¯𝜎𝑛21subscript𝑇subscriptsuperscript𝐼𝑛𝑛superscriptsubscript𝑗1subscript𝑇subscriptsuperscript𝐼𝑛𝑛superscriptsubscript𝑋subscriptsuperscript𝐼𝑛𝑗2superscriptdelimited-[]1subscript𝑇subscriptsuperscript𝐼𝑛𝑛superscriptsubscript𝑗1subscript𝑇subscriptsuperscript𝐼𝑛𝑛subscript𝑋subscriptsuperscript𝐼𝑛𝑗2\displaystyle\bar{\sigma}_{n}^{2}=\frac{1}{T_{I^{*}_{n}}(n)}\sum_{j=1}^{T_{I^{% *}_{n}}(n)}X_{I^{*}_{n},j}^{2}-\left[\frac{1}{T_{I^{*}_{n}}(n)}\sum_{j=1}^{T_{% I^{*}_{n}}(n)}X_{I^{*}_{n},j}\right]^{2}.over¯ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_I start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_I start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_I start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - [ divide start_ARG 1 end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_I start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_I start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_I start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_j end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT .

Strong consistency of the estimator σ¯n2superscriptsubscript¯𝜎𝑛2\bar{\sigma}_{n}^{2}over¯ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT is established in the following proposition, whose proof is provided in Section A.11 of the appendix.

Proposition 3.

Under the same conditions of Theorem 6,

σ¯n2a.s.σk2,asn.\displaystyle\bar{\sigma}_{n}^{2}\overset{a.s.}{\longrightarrow}\sigma_{k^{*}}% ^{2},\quad{\mbox{a}s}\ n\rightarrow\infty.over¯ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_OVERACCENT italic_a . italic_s . end_OVERACCENT start_ARG ⟶ end_ARG italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , a italic_s italic_n → ∞ . (12)

Similar to (10), an asymptotically valid 100(1β)%percent1𝛽(1-\beta)\%( 1 - italic_β ) % CI of μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT is given by

(MInz1β/2σ¯n/n,MIn+z1β/2σ¯n/n).subscript𝑀subscriptsuperscript𝐼𝑛subscript𝑧1𝛽2subscript¯𝜎𝑛𝑛subscript𝑀subscriptsuperscript𝐼𝑛subscript𝑧1𝛽2subscript¯𝜎𝑛𝑛\displaystyle\left(M_{I^{*}_{n}}-z_{1-\beta/2}\bar{\sigma}_{n}/\sqrt{n},\quad M% _{I^{*}_{n}}+z_{1-\beta/2}\bar{\sigma}_{n}/\sqrt{n}\right).( italic_M start_POSTSUBSCRIPT italic_I start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_z start_POSTSUBSCRIPT 1 - italic_β / 2 end_POSTSUBSCRIPT over¯ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT / square-root start_ARG italic_n end_ARG , italic_M start_POSTSUBSCRIPT italic_I start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT + italic_z start_POSTSUBSCRIPT 1 - italic_β / 2 end_POSTSUBSCRIPT over¯ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT / square-root start_ARG italic_n end_ARG ) . (13)

Similar to the discussion in Section 4.1, for the proposed single hypothesis test in Section 3.3, we construct the following LSA-based testing statistics:

Unn(MInμ0)σ¯n,superscriptsubscript𝑈𝑛𝑛subscript𝑀superscriptsubscript𝐼𝑛subscript𝜇0subscript¯𝜎𝑛U_{n}^{*}\triangleq{\sqrt{n}\left(M_{I_{n}^{*}}-\mu_{0}\right)\over\bar{\sigma% }_{n}},italic_U start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ≜ divide start_ARG square-root start_ARG italic_n end_ARG ( italic_M start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) end_ARG start_ARG over¯ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG , (14)

and reject the null hypothesis when Un>z1αsuperscriptsubscript𝑈𝑛subscript𝑧1𝛼U_{n}^{*}>z_{1-\alpha}italic_U start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT > italic_z start_POSTSUBSCRIPT 1 - italic_α end_POSTSUBSCRIPT, where α𝛼\alphaitalic_α denotes the overall type I error.

5 Numerical Study

We consider three examples. The first example aims to illustrate the asymptotic behaviors of the GA and LSA estimators, including their errors and rates of convergence. While it has been argued in Sections 4.1 and 4.2 that logarithmic exploration rate is asymptotically optimal when minimizing MSEs of the GA and LSA estimators, we numerically confirm it using the first set of experiments. For the rest of the experiments in all examples, we choose logarithmic exploration rate given its superiority on both theoretical and empirical grounds.

In the second example, we consider an application of the GA and LSA estimator to the estimation of coherent risk measures, and compare our constructed GA and LSA CIs to the fixed-width CIs produced by the procedure of Lesnevski et al. (2007).

In the third example, we consider a clinical trial using the GA- and LSA-based single hypothesis tests proposed in (6), aiming to demonstrate their relative merits compared to an existing test with multiple hypotheses in the form of (5).

5.1 Implementation Issues

In the numerical experiments, we observed that the bias of the GA estimator with finite samples could be quite large, especially when the variances of the systems are large, or the exploration rate νnsubscript𝜈𝑛\nu_{n}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT is large. This phenomenon is intuitively explained by that there may be too many samples drawn from systems other than ksuperscript𝑘k^{*}italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, leading to a significant underestimation of the maximum mean. However, this adverse effect dies out gradually in the long run. As a means to alleviate this adverse effect in finite-sample setting, we propose to choose a certain amount of samples as a warm-up period and discard the warm-up samples in the estimation. Numerical experiments suggest that a small amount of warm-up samples, e.g. 10%percent1010\%10 % of the total sampling budget, may greatly improve the estimation quality of the GA estimator. Throughout our experiments, we set 10% of the budget as the warm-up period for the GA estimator.

By contrast, the LSA estimator is insensitive to the warm-up samples. Whether to keep or to discard these samples does not have much impact on the quality of the LSA estimator. During the implementation, we discard the warm-up period samples for the LSA estimator as well, to be parallel to the setting for the GA estimator.

From an implementation perspective, it has also been documented that slightly tuning the UCB policy by taking into consideration of the variances of the systems may substantially improve the performances of algorithms for multi-armed bandit problems; see, e.g., Auer et al. (2002). We observe the same phenomenon when estimating the maximum mean. Following the same idea of Auer et al. (2002), we propose to replace the UCB, i.e., 2νn/Tk(n1)2subscript𝜈𝑛subscript𝑇𝑘𝑛1\sqrt{2\nu_{n}/T_{k}(n-1)}square-root start_ARG 2 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT / italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n - 1 ) end_ARG, by

2νnTk(n1)Vk,n2subscript𝜈𝑛subscript𝑇𝑘𝑛1subscript𝑉𝑘𝑛\sqrt{{2\nu_{n}\over T_{k}(n-1)}V_{k,n}}square-root start_ARG divide start_ARG 2 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n - 1 ) end_ARG italic_V start_POSTSUBSCRIPT italic_k , italic_n end_POSTSUBSCRIPT end_ARG

for k=1,,K𝑘1𝐾k=1,\ldots,Kitalic_k = 1 , … , italic_K during the implementation, where

Vk,n=1Tk(n1)i=1Tk(n1)Xk,j2X¯k2[Tk(n1)]+2νnTk(n1)subscript𝑉𝑘𝑛1subscript𝑇𝑘𝑛1superscriptsubscript𝑖1subscript𝑇𝑘𝑛1superscriptsubscript𝑋𝑘𝑗2superscriptsubscript¯𝑋𝑘2delimited-[]subscript𝑇𝑘𝑛12subscript𝜈𝑛subscript𝑇𝑘𝑛1V_{k,n}=\frac{1}{T_{k}(n-1)}\sum_{i=1}^{T_{k}(n-1)}X_{k,j}^{2}-\bar{X}_{k}^{2}% [T_{k}(n-1)]+\sqrt{2\nu_{n}\over T_{k}(n-1)}italic_V start_POSTSUBSCRIPT italic_k , italic_n end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n - 1 ) end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n - 1 ) end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_k , italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - over¯ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT [ italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n - 1 ) ] + square-root start_ARG divide start_ARG 2 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n - 1 ) end_ARG end_ARG

is an estimate of the UCB of the variance of system k𝑘kitalic_k.

5.2 Maximum of K𝐾Kitalic_K Normal Means

Consider K𝐾Kitalic_K systems, with samples from normal distributions with unknown means (μ1,μ2,,μK)subscript𝜇1subscript𝜇2subscript𝜇𝐾(\mu_{1},\mu_{2},\dots,\mu_{K})( italic_μ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_μ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , italic_μ start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ). Parameter configuration is similar to that of Fan et al. (2016), who focused on selecting the system with the largest mean rather than estimating the maximum mean. More specifically, the means (μ1,μ2,,μK)subscript𝜇1subscript𝜇2subscript𝜇𝐾(\mu_{1},\mu_{2},\dots,\mu_{K})( italic_μ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_μ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , italic_μ start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ) of systems are set to be μk=1.5+0.5ksubscript𝜇𝑘1.50.5𝑘\mu_{k}=1.5+0.5kitalic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = 1.5 + 0.5 italic_k, k=1,,K𝑘1𝐾k=1,\ldots,Kitalic_k = 1 , … , italic_K, and the standard deviations are set to be σk=μksubscript𝜎𝑘subscript𝜇𝑘\sigma_{k}=\mu_{k}italic_σ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT for all systems.

We examine the performances of different estimators when estimating μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, by considering their biases, standard deviations, MSEs, and RRMSEs, where RRMSE is defined as the percentage of the root MSE to the true maximum mean. Results reported are based on 104superscript10410^{4}10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT independent replications unless otherwise stated.

In the first set of experiments, let K=20𝐾20K=20italic_K = 20 with the total sampling budget n𝑛nitalic_n ranging between 104superscript10410^{4}10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT and 107superscript10710^{7}10 start_POSTSUPERSCRIPT 7 end_POSTSUPERSCRIPT. We compare biases, MSEs, and coverage probabilities for GA and LSA estimators with respect to (w.r.t.) νn=lognsubscript𝜈𝑛𝑛\nu_{n}=\log nitalic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = roman_log italic_n, n1/2superscript𝑛12n^{1/2}italic_n start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT, and n2/3superscript𝑛23n^{2/3}italic_n start_POSTSUPERSCRIPT 2 / 3 end_POSTSUPERSCRIPT. The results are summarized in Figures 1, 2, and 3, illustrating the rates of convergence of absolute biases and MSEs, and coverage probabilities of the 90% CIs, for the GA (left panel) and LSA (right panel) estimators.

  • Figure 1 shows that for different choices of νnsubscript𝜈𝑛\nu_{n}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT, the bias of the LSA estimator is substantially smaller than that of the GA estimator, consistent with Theorems 4 and 7 that the former converges to 00 at a rate of at least νn/n3/2subscript𝜈𝑛superscript𝑛32\nu_{n}/n^{3/2}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT / italic_n start_POSTSUPERSCRIPT 3 / 2 end_POSTSUPERSCRIPT, much faster than the latter (with a rate of νn/nsubscript𝜈𝑛𝑛\nu_{n}/nitalic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT / italic_n). Furthermore, it is observed that logarithmic νnsubscript𝜈𝑛\nu_{n}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT leads to the smallest biases among the three choices, for both the GA and LSA estimators.

  • Figure 2 shows that the MSE of the LSA estimator converges approximately to 00 at a rate of n1superscript𝑛1n^{-1}italic_n start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT for all choice of νnsubscript𝜈𝑛\nu_{n}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT. For the GA estimator, the same rate is observed for logarithmic νnsubscript𝜈𝑛\nu_{n}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT, while the rate tends to be slower when νn=n1/2subscript𝜈𝑛superscript𝑛12\nu_{n}=n^{1/2}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = italic_n start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT and νn=n2/3subscript𝜈𝑛superscript𝑛23\nu_{n}=n^{2/3}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = italic_n start_POSTSUPERSCRIPT 2 / 3 end_POSTSUPERSCRIPT. This coincides with the requirement in Theorems 4 that νnsubscript𝜈𝑛\nu_{n}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT must be slower than n1/2superscript𝑛12n^{1/2}italic_n start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT for the GA estimator.

  • Figure 3 shows the LSA CIs outperform the GA ones substantially in terms of coverage probabilities, especially when sample sizes are relatively small, due to the significantly smaller bias of the LSA estimator. When νn=lognsubscript𝜈𝑛𝑛\nu_{n}=\log nitalic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = roman_log italic_n, both the GA and LSA CIs achieve the nominal coverage probability with sufficiently large sample sizes. However, the GA CIs fail to do so for νn=n1/2subscript𝜈𝑛superscript𝑛12\nu_{n}=n^{1/2}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = italic_n start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT and νn=n2/3subscript𝜈𝑛superscript𝑛23\nu_{n}=n^{2/3}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = italic_n start_POSTSUPERSCRIPT 2 / 3 end_POSTSUPERSCRIPT that violate the requirement in Theorem 5.

Refer to caption
Figure 1: (Color online) Convergence of estimated absolute biases.
Refer to caption
Figure 2: (Color online) Convergence rate of estimated MSEs.
Refer to caption
Figure 3: (Color online) Observed coverage probabilities of the 90% confidence intervals with respect to different sample sizes.

Given the superiority of logarithmic exploration rate, we set νn=lognsubscript𝜈𝑛𝑛\nu_{n}=\log nitalic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = roman_log italic_n in the rest of the experiments in Section 5.

In the second set of experiments, we fix K=20𝐾20K=20italic_K = 20, and compare the GA and LSA estimators to four competing estimators given as follows. Consider two maximum average estimators that take the maximum of the sample means of the K𝐾Kitalic_K systems. In particular, the first is a maximum average estimator based on the GUCB adaptive sampling policy, referred to as adaptive maximum average (AMA) estimator, while the other is the maximum average estimator based on a simple static sampling policy that randomly allocates each sample to one of the systems with equal probabilities, referred to as simple maximum average (SMA) estimator. Moreover, we consider two heuristic estimators in the form of the sample average of certain selected system resulting from existing best arm identification methods. In particular, we consider two Bayesian best arm identification methods, including the top-two probability sampling (TTPS) method of Russo (2020), and the top-two expected improvement (TTEI) of Qin et al. (2017). For these two estimators, we assume known variances of the systems as required by the TTPS and TTEI methods, and set the tuning parameter β𝛽\betaitalic_β to be the default value of 0.50.50.50.5, following Russo (2020) and Qin et al. (2017).

It should be pointed out that, in spite of being reasonable estimators, asymptotic properties of the AMA, TTPS and TTEI estimators, such as their biases, MSEs and central limit theorems, are not known in the literature. The comparison of these estimators in our experiments serves merely to numerically examine their merits, while development of theoretical underpinnings is a topic that deserves future investigation.

Relative biases and standard deviations as percentages of the true maximum mean, and RRMSEs are summarized in Table 1444For cases of n=104𝑛superscript104n=10^{4}italic_n = 10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT and 105superscript10510^{5}10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT, reported errors are estimated based on 105superscript10510^{5}10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT independent replications.. From Table 1, it can be seen that all the considered estimators converge as sample size increases. Among them, LSA and AMA have almost the same biases and variances, and perform the best with the smallest MSEs. Their biases are also smaller or comparable to other estimators. GA and SMA suffer from a disadvantage of significantly larger biases compared to other estimators. TTPS and TTEI have biases comparable to those of LSA and AMA, but with larger variances. It is also interesting to observe that while SMA is positively biased, AMA is negatively biased, suggesting that the dependence structure among the samples drawn from the UCB policy may lead to a negative bias.

Table 1: Relative bias (%), relative standard deviation (%) and RRMSE (%) of different estimators
LSA GA
n𝑛nitalic_n 104superscript10410^{4}10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT 105superscript10510^{5}10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT 106superscript10610^{6}10 start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT 104superscript10410^{4}10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT 105superscript10510^{5}10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT 106superscript10610^{6}10 start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT
Relative bias --0.12 --2.7e-03 --1.3e-03 --2.38 --0.40 --0.03
Relative stdev 1.28 0.33 0.10 1.31 0.36 0.10
RRMSE 1.28 0.33 0.10 3.03 0.54 0.11
AMA SMA
n𝑛nitalic_n 104superscript10410^{4}10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT 105superscript10510^{5}10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT 106superscript10610^{6}10 start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT 104superscript10410^{4}10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT 105superscript10510^{5}10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT 106superscript10610^{6}10 start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT
Relative bias 0.120.12-0.12- 0.12 --2.7e-03 --1.3e-03 0.98 0.01 3.0e-05
Relative stdev 1.27 0.33 0.10 3.76 1.40 0.45
RRMSE 1.27 0.33 0.10 3.89 1.40 0.45
TTPS TTEI
n𝑛nitalic_n 104superscript10410^{4}10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT 105superscript10510^{5}10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT 106superscript10610^{6}10 start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT 104superscript10410^{4}10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT 105superscript10510^{5}10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT 106superscript10610^{6}10 start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT
Relative bias --9.0e-03 --7.3e-03 --6.1e-03 0.140.14-0.14- 0.14 --2.7e-03 --7.5e-04
Relative stdev 1.51 0.49 0.14 1.77 0.45 0.14
RRMSE 1.51 0.50 0.14 1.77 0.45 0.14

5.3 Simulated CIs for Coherent Risk Measures

Consider a risk measurement application in which we are interested in estimating a coherent risk measure that is the maximum of the expected discounted portfolio loss taken over K𝐾Kitalic_K probability distributions. Lesnevski et al. (2007) proposed a simulation-based procedure to construct a fixed-width CI for the risk measure, and used an example with K=256𝐾256K=256italic_K = 256 to demonstrate the performance of the constructed CIs. For the sake of completeness, we briefly describe this example as follows.

A coherent risk measure can be written as

sup𝒫𝔼[Y/r],subscriptsupremum𝒫subscript𝔼delimited-[]𝑌𝑟\displaystyle\sup_{{\mathbb{P}}\in{\cal P}}{\mathbb{E}}_{\mathbb{P}}[-Y/r],roman_sup start_POSTSUBSCRIPT blackboard_P ∈ caligraphic_P end_POSTSUBSCRIPT blackboard_E start_POSTSUBSCRIPT blackboard_P end_POSTSUBSCRIPT [ - italic_Y / italic_r ] ,

where Y𝑌Yitalic_Y is the value of a portfolio at a future time horizon, 1/r1𝑟1/r1 / italic_r is a stochastic discount factor which represents the time value of money, and 𝒫𝒫{\cal P}caligraphic_P is a set of probability measures simplified by 𝒫={1,,K}𝒫subscript1subscript𝐾{\cal P}=\{{\mathbb{P}}_{1},...,{\mathbb{P}}_{K}\}caligraphic_P = { blackboard_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , blackboard_P start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT }. Denote X=Y/r𝑋𝑌𝑟X=-Y/ritalic_X = - italic_Y / italic_r and μk=𝔼k[X]subscript𝜇𝑘subscript𝔼subscript𝑘delimited-[]𝑋\mu_{k}={\mathbb{E}}_{{\mathbb{P}}_{k}}[X]italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = blackboard_E start_POSTSUBSCRIPT blackboard_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_X ]. Let Xisubscript𝑋𝑖X_{i}italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT be a random variable whose distribution under a probability measure {\mathbb{P}}blackboard_P is the same as that of X𝑋Xitalic_X under ksubscript𝑘{\mathbb{P}}_{k}blackboard_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, i.e., (Xkx)=k(Xx)subscript𝑋𝑘𝑥subscript𝑘𝑋𝑥{\mathbb{P}}(X_{k}\leq x)={\mathbb{P}}_{k}(X\leq x)blackboard_P ( italic_X start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ≤ italic_x ) = blackboard_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_X ≤ italic_x ). Then the coherent risk measure is exactly the maximum mean:

maxk=1,,K𝔼Xk.subscript𝑘1𝐾𝔼subscript𝑋𝑘\displaystyle\max_{k=1,...,K}{\mathbb{E}}X_{k}.roman_max start_POSTSUBSCRIPT italic_k = 1 , … , italic_K end_POSTSUBSCRIPT blackboard_E italic_X start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT . (15)

Consider estimating the coherent risk measure of a portfolio consisting of European-style call and put options written on three assets with price Sj(t)subscript𝑆𝑗𝑡S_{j}(t)italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) at time t𝑡titalic_t for j=1,2,3𝑗123j=1,2,3italic_j = 1 , 2 , 3. Let S0(t)subscript𝑆0𝑡S_{0}(t)italic_S start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( italic_t ) denote a market index. All options in the portfolio have a common terminal time T𝑇Titalic_T. For each of j=0,1,2,3𝑗0123j=0,1,2,3italic_j = 0 , 1 , 2 , 3, Sj(t)subscript𝑆𝑗𝑡S_{j}(t)italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) follows geometric Brownian motion with drift djsubscript𝑑𝑗d_{j}italic_d start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT and volatility ζjsubscript𝜁𝑗\zeta_{j}italic_ζ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, and therefore logSj(t)=logSj(0)+(djζj2/2)t+ζjWjtsubscript𝑆𝑗𝑡subscript𝑆𝑗0subscript𝑑𝑗superscriptsubscript𝜁𝑗22𝑡subscript𝜁𝑗subscript𝑊𝑗𝑡\log S_{j}(t)=\log S_{j}(0)+(d_{j}-\zeta_{j}^{2}/2)t+\zeta_{j}W_{j}\sqrt{t}roman_log italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( italic_t ) = roman_log italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( 0 ) + ( italic_d start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - italic_ζ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / 2 ) italic_t + italic_ζ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_W start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT square-root start_ARG italic_t end_ARG, where Wjsubscript𝑊𝑗W_{j}italic_W start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT is a standard normal random variable. The assets are dependent in the sense that W0=Z0subscript𝑊0subscript𝑍0W_{0}=Z_{0}italic_W start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, and Wj=λjZ0+1λj2Zjsubscript𝑊𝑗subscript𝜆𝑗subscript𝑍01superscriptsubscript𝜆𝑗2subscript𝑍𝑗W_{j}=\lambda_{j}Z_{0}+\sqrt{1-\lambda_{j}^{2}}Z_{j}italic_W start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + square-root start_ARG 1 - italic_λ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG italic_Z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT for j=1,2,3𝑗123j=1,2,3italic_j = 1 , 2 , 3, where under probability measure {\mathbb{P}}blackboard_P, Z0subscript𝑍0Z_{0}italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, Z1subscript𝑍1Z_{1}italic_Z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, Z2subscript𝑍2Z_{2}italic_Z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, and Z3subscript𝑍3Z_{3}italic_Z start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT are independent standard normal random variables. In this model, Z0subscript𝑍0Z_{0}italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT corresponds to the market factor common to all assets, while Z1subscript𝑍1Z_{1}italic_Z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, Z2subscript𝑍2Z_{2}italic_Z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, and Z3subscript𝑍3Z_{3}italic_Z start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT are idiosyncratic factors corresponding to each individual asset.

Denote by ΦΦ\Phiroman_Φ the standard normal distribution function. Each Zjsubscript𝑍𝑗Z_{j}italic_Z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT for j=0,1,2,3𝑗0123j=0,1,2,3italic_j = 0 , 1 , 2 , 3 is sampled restricted on four cases with equal probabilities: sampled conditional on exceeding Φ1(11/20)superscriptΦ11120\Phi^{-1}(1-1/\sqrt{20})roman_Φ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( 1 - 1 / square-root start_ARG 20 end_ARG ), sampled conditional on being below Φ1(1/20)superscriptΦ1120\Phi^{-1}(1/\sqrt{20})roman_Φ start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( 1 / square-root start_ARG 20 end_ARG ), sampled conditional on being between these two numbers, and sampled with no restriction. Therefore, K=44=256𝐾superscript44256K=4^{4}=256italic_K = 4 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT = 256 in (15), and each Xksubscript𝑋𝑘X_{k}italic_X start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT denotes the value of the portfolio corresponding to a combination of these four cases. The parameters are set as follows: T𝑇Titalic_T is one week, ζ1=39.8%subscript𝜁1percent39.8\zeta_{1}=39.8\%italic_ζ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = 39.8 %, ζ2=19.3%subscript𝜁2percent19.3\zeta_{2}=19.3\%italic_ζ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = 19.3 %, ζ3=27.0%subscript𝜁3percent27.0\zeta_{3}=27.0\%italic_ζ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT = 27.0 %, λ1=0.617subscript𝜆10.617\lambda_{1}=0.617italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = 0.617, λ2=0.368subscript𝜆20.368\lambda_{2}=0.368italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = 0.368, λ3=0.785subscript𝜆30.785\lambda_{3}=0.785italic_λ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT = 0.785, and dj=0subscript𝑑𝑗0d_{j}=0italic_d start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = 0 and Sj(0)=100subscript𝑆𝑗0100S_{j}(0)=100italic_S start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ( 0 ) = 100 for j=1,2,3𝑗123j=1,2,3italic_j = 1 , 2 , 3. The amounts of options in the portfolio could be referred to Table 1 in Lesnevski et al. (2007).

We estimate the coherent risk measure using the GA and LSA estimators. In the experiments, we examine the performance of the estimators, and compare the CIs to those of Lesnevski et al. (2007), which are constructed via the so-called “two-stage algorithm with screening” in the literature. When implementing this algorithm, we use exactly the same algorithm parameters as in Lesnevski et al. (2007).

The simulation-based procedure of Lesnevski et al. (2007) sets a certain amount of initial samples to enable the use of ranking-and-selection techniques. In our implementation, we set the same amount of warm-up samples. Moreover, a fixed width of the CI has to be set at the beginning of their procedure, the total sample size required to stop the process may thus vary across different replications. For the sake of fairness in the comparison, we first run the procedure of Lesnevski et al. (2007), and the same sample size is then used to construct the GA and LSA CIs. We then compare the width of our CIs to that of Lesnevski et al. (2007).

Relative errors of the GA and LSA estimators and coverage probabilities of their 95%percent9595\%95 % CIs w.r.t. different sample sizes are presented in Table 2. From the table, it can be seen that both the GA and LSA estimators exhibit high accuracy. In particular, when the sample size is larger than 105superscript10510^{5}10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT, RRMSEs of both estimators are less than 1%percent11\%1 %, which are contributed mainly by the variances while their biases are negligible. However, when the sample size is relatively small, e.g., n=104𝑛superscript104n=10^{4}italic_n = 10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT, the bias of the LSA estimator is substantially lower than that of the GA estimator, which also explains the low coverage probabilities of the GA CIs in this case. This result suggests that the LSA estimator is preferable over the GA estimator, especially for small or medium sample sizes.

Table 2: Relative bias (%), standard deviation (%) and RRMSE (%), and coverage probabilities (%)
LSA GA
n𝑛nitalic_n 104superscript10410^{4}10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT 105superscript10510^{5}10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT 106superscript10610^{6}10 start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT 107superscript10710^{7}10 start_POSTSUPERSCRIPT 7 end_POSTSUPERSCRIPT 104superscript10410^{4}10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT 105superscript10510^{5}10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT 106superscript10610^{6}10 start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT 107superscript10710^{7}10 start_POSTSUPERSCRIPT 7 end_POSTSUPERSCRIPT
Bias 0.070.07-0.07- 0.07 0.010.01-0.01- 0.01 0.000.00-0.00- 0.00 0.000.000.000.00 10.6410.64-10.64- 10.64 0.480.48-0.48- 0.48 0.030.03-0.03- 0.03 0.000.00-0.00- 0.00
Stdev 1.21 0.29 0.09 0.03 1.54 0.31 0.09 0.03
RRMSE 1.21 0.29 0.09 0.03 10.75 0.57 0.10 0.03
Cover. prob. 93.1 95.7 95.3 95.8 0 60.6 93.0 95.6

In the comparison of CIs, we observed that the widths of the GA and the LSA CIs are quite close as they have similar variances. Hence, we only report the result of the LSA CIs when comparing their widths to the method of Lesnevski et al. (2007). The comparison results are summarized in Figure 4, w.r.t. varying settings of the fixed width by Lesnevski et al. (2007). From the figure, it can be seen that the LSA CIs achieve the nominal coverage probability (95%percent9595\%95 %), while the CIs constructed by Lesnevski et al. (2007) tend to be conservative, yielding coverage probabilities close to 100%percent100100\%100 %. The ratio between the width of the CI of Lesnevski et al. (2007) and that of the LSA CIs is also reported. It can be observed that the widths of the CIs of Lesnevski et al. (2007) are more than 2 times larger than LSA CIs, especially when the fixed width is set to be large. The ratio can be as large as 4 times when the fixed width is large, implying that the LSA method produces much narrower CIs and thus has better quality.

Refer to caption
Figure 4: (Color online) Comparison of CIs

5.4 Hypothesis Testing in Clinical Trials

In a clinical trial, suppose that one is interested in comparing K1𝐾1K-1italic_K - 1 treatments with unknown mean effects μ1,,μK1subscript𝜇1subscript𝜇𝐾1\mu_{1},\ldots,\mu_{K-1}italic_μ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_μ start_POSTSUBSCRIPT italic_K - 1 end_POSTSUBSCRIPT to a control treatment with known mean effect μ0subscript𝜇0\mu_{0}italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT. To examine the performances of different hypothesis tests, we consider respectively the GA-based and LSA-based testing statistics in (11) and (14), based on the proposed single hypothesis test in (6). We further consider a benchmarking test with multiple hypotheses in the form of (5). The benchmarking test uses a fixed randomization design that randomly assigns a patient to the treatments with equal probabilities, referred to as FR test.

Our numerical setting is the same as Mozgunov and Jaki (2020), where the outcomes of treatments for patients follow Bernoulli distributions with success probabilities (μ0,μ1,,μK1)subscript𝜇0subscript𝜇1subscript𝜇𝐾1(\mu_{0},\mu_{1},\ldots,\mu_{K-1})( italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_μ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_μ start_POSTSUBSCRIPT italic_K - 1 end_POSTSUBSCRIPT ), K=4𝐾4K=4italic_K = 4, n=423𝑛423n=423italic_n = 423, and overall type I error equal to 5%percent55\%5 %. In addition to determining whether some treatments outperform the control, another important goal in the clinical trial is to treat the patients as effectively as possible during the trial (Villar et al. 2015). To quantify to what extent this goal is achieved, two measures are commonly used, including the expected proportion of patients assigned to the best treatment (POB) and the expected number of patient success (ENS); see, e.g., Mozgunov and Jaki (2020).

To measure the performance of each test, we estimate its POB and ENS. Furthermore, for a case with true means (μ0,μ1,μ2,μ3)=(0.31,0.27,0.28,0.29)subscript𝜇0subscript𝜇1subscript𝜇2subscript𝜇30.310.270.280.29(\mu_{0},\mu_{1},\mu_{2},\mu_{3})=(0.31,0.27,0.28,0.29)( italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_μ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_μ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_μ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ) = ( 0.31 , 0.27 , 0.28 , 0.29 ) where H0subscript𝐻0H_{0}italic_H start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT is true, we estimate its overall type I error, while we estimate its power for another case with true means (μ0,μ1,μ2,μ3)=(0.3,0.3,0.3,0.5)subscript𝜇0subscript𝜇1subscript𝜇2subscript𝜇30.30.30.30.5(\mu_{0},\mu_{1},\mu_{2},\mu_{3})=(0.3,0.3,0.3,0.5)( italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_μ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_μ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_μ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ) = ( 0.3 , 0.3 , 0.3 , 0.5 ) where H0subscript𝐻0H_{0}italic_H start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT is false. Comparison results are summarized in Table 3, based on 104superscript10410^{4}10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT independent replications.

It can be seen from Table 3 that the GA-based and LSA-based tests have an advantage over the FR test in that they consistently assign more patients to the best treatment and have higher ENS, and are thus preferable when taking into account the well-being of the patients. From a statistical perspective, when H0subscript𝐻0H_{0}italic_H start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT is true (i.e., Case 1), the estimated type I error of the LSA-based test is close to the nominal one (5%percent55\%5 %), while the rest tend to be conservative. Conservativeness of the GA-based test comes from its negative bias that is not negligible with a sample size of 423423423423, and that of the FR test is due to drawbacks of Bonferroni correction. In Case 2 when H0subscript𝐻0H_{0}italic_H start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT is false, both the GA-based and LSA-based tests have much higher estimated power compared to the FR test. In summary, the LSA-based test outperforms the rest of the tests significantly in all the considered dimensions including POB, ENS, type I error, and statistical power.

Table 3: Comparison of different hypothesis tests.
Case 1: (μ0,μ1,μ2,μ3)=(0.31,0.27,0.28,0.29)subscript𝜇0subscript𝜇1subscript𝜇2subscript𝜇30.310.270.280.29(\mu_{0},\mu_{1},\mu_{2},\mu_{3})=(0.31,0.27,0.28,0.29)( italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_μ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_μ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_μ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ) = ( 0.31 , 0.27 , 0.28 , 0.29 ) Case 2: (μ0,μ1,μ2,μ3)=(0.3,0.3,0.3,0.5)subscript𝜇0subscript𝜇1subscript𝜇2subscript𝜇30.30.30.30.5(\mu_{0},\mu_{1},\mu_{2},\mu_{3})=(0.3,0.3,0.3,0.5)( italic_μ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_μ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_μ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_μ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ) = ( 0.3 , 0.3 , 0.3 , 0.5 )
Test Est. Type I POB (s.e.) ENS (s.e.) Est. power POB (s.e.) ENS (s.e.)
FR 0.014 0.25(0.0001) 121.6(0.03) 0.814 0.25(0.0001) 148.1(0.03)
GA 0.011 0.34(0.0006) 123.1(0.03) 0.966 0.78(0.0006) 192.7(0.02)
LSA 0.044 0.34(0.0006) 123.1(0.03) 0.970 0.78(0.0006) 192.7(0.02)

5.5 Robust Analysis of Stochastic Simulation: A Call Center Case

An application of the proposed maximum mean estimator is on robust analysis of stochastic simulation. Robust analysis aims to quantify the impact of input uncertainty (i.e., misspecification of the input models) inherent in stochastic simulation. For this purpose, Ghosh and Lam (2019) and Blanchet and Murthy (2019) considered worst-case performance bounds, i.e., maximum (and/or minimum) of the expected simulation output taken over a set of plausible input models. While these authors consider a collection of plausible input models in the space of continuous distributions by imposing constraints on their moments, distances from a baseline model or some other summary statistics, we consider in this paper a discrete setting where a finite number of plausible input models are specified by fitting to input data.

Specifically, the expected simulation outputs corresponding to a set of K𝐾Kitalic_K plausible input models are denoted by μ1,,μKsubscript𝜇1subscript𝜇𝐾\mu_{1},\ldots,\mu_{K}italic_μ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_μ start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT, respectively. From each input model, a simulation output with noise can be generated. We are interested in estimating the following worst-case bounds,

μ=mink=1,,Kμk,andμ=maxk=1,,Kμk,formulae-sequencesuperscript𝜇subscript𝑘1𝐾subscript𝜇𝑘andsuperscript𝜇subscript𝑘1𝐾subscript𝜇𝑘\mu^{{\dagger}}=\min_{k=1,\ldots,K}\mu_{k},\quad\text{and}\quad\mu^{*}=\max_{k% =1,\ldots,K}\mu_{k},italic_μ start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT = roman_min start_POSTSUBSCRIPT italic_k = 1 , … , italic_K end_POSTSUBSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , and italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = roman_max start_POSTSUBSCRIPT italic_k = 1 , … , italic_K end_POSTSUBSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ,

which help to quantify the input uncertainty of the plausible input models.

The maximum mean estimators studied in this paper can be applied to estimate μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, as well as μsuperscript𝜇\mu^{{\dagger}}italic_μ start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT by taking a negative sign on the simulation outputs. To illustrate how it works, we consider a case of a call center from an anonymous bank with real data available over a period of 12 months from January 1999 till December 1999 (Mandelbaum 2014). Workflows of the call center are described as follows. Customer phone calls arrive and ask for a banking service on their own. The call center consists of both automated service (computer-generated voice information, VRU) and agent service (AS) provided by eight agents. After the VRU service, customers either to a queue before AS, or directly to receive AS, or leave the system directly (see Figure 5). While privoviding VRU service seven days a week, 24 hours a day, the call center is staffed during 7:00 and 0:00 during weekdays. The information of each phone is stockpile, and each call has 17 fields in the data, e.g., Call ID (5 digits), VRU entry (6 digits), VRU exit (6 digits), Queue start (6 digits), Queue exit (6 digits), Outcome (AS or Hang), AS start (6 digits), AS exit (6 digits), etc.

Refer to caption
Figure 5: Logic flow of the call center of an anonymous bank

We build up a simulation model with the logic flow described in Figure 5, to study customers’ expected waiting time during the interval 10:00-12:00 on a weekday. Key inputs to the simulation include interarrival (IA) distribution, and service time distributions for both VRU and AS services. During input modeling, we consider a base dataset and an extended dataset, including input data during the specified interval on January 1, 1999 (with 172 phone calls), and the combined data on both January 1 and 8 (with 311 phone calls in total), respectively. Comparison of simulation performances based on these two datasets helps to understand the impact of input data size on the magnitude of input uncertainty. Arena’s Input Analyzer tool is used to fit the input distributions, and Table 4 presents three candidate distributions for each of IA and AS times, all with p𝑝pitalic_p-value larger than 0.01, while empirical distribution of VRU service time is used due to unsatisfactory p𝑝pitalic_p-values for parametric distributions given by Inout Analyzer. The abandon rates for the two scenarios are 7.1% and 5.5%, respectively, according to the proportion of calls that hang out after VRU service.

Table 4: Fitted distributions with p𝑝pitalic_p-value larger than 0.01 for IA and AS times.
Dataset IA time AS time
Base dataset -0.001 + ERLA(41.9, 1) -0.001 + ERLA(225, 1)
-0.001 + EXPO(41.9) -0.001 + EXPO(225)
-0.001 + WEIB(38.4, 0.82) -0.001 + WEIB(225, 0.992)
Extended dataset -0.001 + ERLA(41.8, 1) -0.001 + ERLA(222, 1)
-0.001 + EXPO(41.8) -0.001 + EXPO(222)
-0.001 + WEIB(40.9, 0.951) -0.001 + WEIB(222, 0.996)

For each dataset, we specify the plausible input models as all possible combinations of the three candidate distributions for IA and AS times, while empirical distribution is employed for the VRU service time. In total there are K=9𝐾9K=9italic_K = 9 (=3×3absent33=3\times 3= 3 × 3) plausible models for each dataset, and the proposed GA and LSA estimators are applied to estimate μsuperscript𝜇\mu^{{\dagger}}italic_μ start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT and μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, the worst-case bounds for expected waiting time. Using real data, we also estimate the true value of the expected waiting time.

Table 5: Comparison of 95% CIs of minimum mean μsuperscript𝜇\mu^{{\dagger}}italic_μ start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT, maximum mean μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, the plausible performance range [μ^,μ^]superscript^𝜇superscript^𝜇[\widehat{\mu}^{\dagger},\widehat{\mu}^{*}][ over^ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT , over^ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ], and the gap between μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT and μsuperscript𝜇\mu^{{\dagger}}italic_μ start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT for the base and extended datasets.
LSA GA True value1
Dataset n𝑛nitalic_n 104superscript10410^{4}10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT 105superscript10510^{5}10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT 104superscript10410^{4}10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT 105superscript10510^{5}10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT
Base dataset CI of μsuperscript𝜇\mu^{\dagger}italic_μ start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT [9.99, 10.87] [10.04, 10.24] [10.40, 10.94] [10.20, 10.37] 10.13
CI of μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT [13.70, 14.90] [14.71, 14.97] [13.89, 14.60] [14.64, 14.87] 14.81
[μ^,μ^]superscript^𝜇superscript^𝜇[\widehat{\mu}^{\dagger},\widehat{\mu}^{*}][ over^ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT , over^ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ] [10.43, 14.30] [10.14, 14.84] [10.67, 14.25] [10.29, 14.75] 11.98
μ^μ^superscript^𝜇superscript^𝜇\widehat{\mu}^{*}-\widehat{\mu}^{\dagger}over^ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - over^ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT 3.87 4.70 3.58 4.46
Extended dataset CI of μsuperscript𝜇\mu^{\dagger}italic_μ start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT [8.78, 9.13] [8.76, 8.91] [9.29, 9.78] [8.76, 8.91] 8.84
CI of μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT [11.69, 12.51] [11.98, 12.21] [11.28, 11.86] [11.86, 12.06] 12.10
[μ^,μ^]superscript^𝜇superscript^𝜇[\widehat{\mu}^{\dagger},\widehat{\mu}^{*}][ over^ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT , over^ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ] [9.05, 12.10] [8.83, 12.10] [9.53, 11.57] [8.84, 11.96] 12.05
μ^μ^superscript^𝜇superscript^𝜇\widehat{\mu}^{*}-\widehat{\mu}^{\dagger}over^ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - over^ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT 3.05 3.27 2.04 3.12
  • 1

    In the last column, the first two numbers for each dataset are the (approximately) true values of μsuperscript𝜇\mu^{\dagger}italic_μ start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT and μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT estimated based a large (105superscript10510^{5}10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT) sample size for each plausible input model, and the bold numbers represent the average waiting times obtained by taking averages from the real data.

Table 5 reports the constructed 90% CIs for μsuperscript𝜇\mu^{{\dagger}}italic_μ start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT and μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, the plausible performance range [μ^,μ^]superscript^𝜇superscript^𝜇[\widehat{\mu}^{{\dagger}},\widehat{\mu}^{*}][ over^ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT , over^ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ], as well as the width μ^μ^superscript^𝜇superscript^𝜇\widehat{\mu}^{*}-\widehat{\mu}^{{\dagger}}over^ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - over^ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT obtained by the GA and LSA estimators. Based on the table, we highlight our observations as follows.

  • Both the GA and the LSA estimators produce accurate estimates of μsuperscript𝜇\mu^{{\dagger}}italic_μ start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT and μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT when sample size n𝑛nitalic_n is sufficiently large, exemplified by CIs with narrow half-widths when n=105𝑛superscript105n=10^{5}italic_n = 10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT. However, the GA estimator may have larger bias when sample size is not sufficiently large. For instance, when n=104𝑛superscript104n=10^{4}italic_n = 10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT, the GA CIs of μsuperscript𝜇\mu^{{\dagger}}italic_μ start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT and μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT lie above and below their true values (e.g. 10.13 and 14.81 for the base dataset) respectively, suggesting positive bias and negative bias for μsuperscript𝜇\mu^{{\dagger}}italic_μ start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT and μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, respectively. By contrast, the LSA CIs have better performance as they cover the true values for both the base and extended datasets.

  • The estimated plausible ranges of [μ,μ]superscript𝜇superscript𝜇[\mu^{{\dagger}},\mu^{*}][ italic_μ start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT , italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ] resulting from the LSA estimator cover the true waiting times for both datasets, while those from the GA estimator cover true waiting times for n=105𝑛superscript105n=10^{5}italic_n = 10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT but not for n=104𝑛superscript104n=10^{4}italic_n = 10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT due to its larger bias. Compared to the GA estimator, the LSA estimator produces more accurate point estimates of both μsuperscript𝜇\mu^{{\dagger}}italic_μ start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT and μsuperscript𝜇\mu^{*}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT (and thus the gaps of the worst-case bounds μμsuperscript𝜇superscript𝜇\mu^{*}-\mu^{{\dagger}}italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - italic_μ start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT) for all sample sizes and both datasets. Comparing the gaps of the worst-case bounds for these two datasets (with 172 and 311 pone calls respectively), it can be seen that increasing the size of the input dataset helps reducing significantly this gap and thus the input uncertainty, e.g., the gap is reduced from 4.70 to 3.27 using the LSA estimator with n=105𝑛superscript105n=10^{5}italic_n = 10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT.

6 Conclusions

We have studied two estimators, namely, the GA and LSA estimators, for estimating the maximum mean based on the UCB sampling policy. We have established strong consistency, asymptotic MSEs, and CLTs for both estimators, and constructed asymptotically valid CIs. Furthermore, a single hypothesis test has been proposed to solve a multiple comparison problem with application to clinical trials, and shown to significantly outperform an existing benchmarking test with multiple hypotheses, as the former exhibits lower type I error and higher statistical power.

The LSA estimator is preferable over the GA estimator, mainly due to the fact that its bias decays at a rate much faster than that of the GA estimator. The superiority of the LSA estimator over the GA estimator and several competing estimators has been demonstrated through numerical experiments.

Acknowledgements

This research is partially supported by the National Natural Science Foundation of China and Research Grants Council of Hong Kong under Joint Research Scheme project N_CityU 105/21 and GRF 11508217 and 11508620, and InnoHK initiative, The Government of the HKSAR, and Laboratory for AI-Powered Financial Technologies.

Appendix A A Appendix

A.1 Lemmas

Lemma A1.

(Hoeffding bound, Proposition 2.5 in Wainwright, 2019) Suppose that random variables ξisubscript𝜉𝑖\xi_{i}italic_ξ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, i=1,,n𝑖1𝑛i=1,...,nitalic_i = 1 , … , italic_n, are independent, and ξisubscript𝜉𝑖\xi_{i}italic_ξ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT has mean μisubscript𝜇𝑖\mu_{i}italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and sub-Gaussian parameter γisubscript𝛾𝑖\gamma_{i}italic_γ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. Then for all t0𝑡0t\geq 0italic_t ≥ 0, we have

(i=1n(ξiμi)t)exp{t22i=1nγi2}.superscriptsubscript𝑖1𝑛subscript𝜉𝑖subscript𝜇𝑖𝑡superscript𝑡22superscriptsubscript𝑖1𝑛superscriptsubscript𝛾𝑖2\displaystyle{\mathbb{P}}\left(\sum_{i=1}^{n}(\xi_{i}-\mu_{i})\geq t\right)% \leq\exp\left\{-\frac{t^{2}}{2\sum_{i=1}^{n}\gamma_{i}^{2}}\right\}.blackboard_P ( ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ( italic_ξ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ≥ italic_t ) ≤ roman_exp { - divide start_ARG italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 2 ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_γ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG } .
Lemma A2.

Suppose that Xisubscript𝑋𝑖X_{i}italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT’s are i.i.d. mean-zero random variables, and N𝑁Nitalic_N is a stopping time taking positive integer values with Nn𝑁𝑛N\leq nitalic_N ≤ italic_n w.p.1. Let St=i=1tXisubscript𝑆𝑡superscriptsubscript𝑖1𝑡subscript𝑋𝑖S_{t}=\sum_{i=1}^{t}X_{i}italic_S start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. For any positive integer p2𝑝2p\geq 2italic_p ≥ 2, if 𝔼|X1|p<𝔼superscriptsubscript𝑋1𝑝{\mathbb{E}}|X_{1}|^{p}<\inftyblackboard_E | italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT < ∞, then

𝔼|SNp|=O(np/2).𝔼superscriptsubscript𝑆𝑁𝑝𝑂superscript𝑛𝑝2{\mathbb{E}}\left|S_{N}^{p}\right|=O(n^{p/2}).blackboard_E | italic_S start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT | = italic_O ( italic_n start_POSTSUPERSCRIPT italic_p / 2 end_POSTSUPERSCRIPT ) .
Proof of Lemma A2.

First note that it suffices to show that |𝔼(SNp)|=O(np/2)𝔼superscriptsubscript𝑆𝑁𝑝𝑂superscript𝑛𝑝2\left|{\mathbb{E}}(S_{N}^{p})\right|=O(n^{p/2})| blackboard_E ( italic_S start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT ) | = italic_O ( italic_n start_POSTSUPERSCRIPT italic_p / 2 end_POSTSUPERSCRIPT ) for any p2𝑝2p\geq 2italic_p ≥ 2, because these two are equivalent when p𝑝pitalic_p is an even number, and if p𝑝pitalic_p is an odd number, 𝔼|SNp|[𝔼(SNp+1)]pp+1=O(np+12pp+1)=O(np/2)𝔼superscriptsubscript𝑆𝑁𝑝superscriptdelimited-[]𝔼superscriptsubscript𝑆𝑁𝑝1𝑝𝑝1𝑂superscript𝑛𝑝12𝑝𝑝1𝑂superscript𝑛𝑝2{\mathbb{E}}\left|S_{N}^{p}\right|\leq\left[{\mathbb{E}}\left(S_{N}^{p+1}% \right)\right]^{p\over p+1}=O(n^{{p+1\over 2}\cdot{p\over p+1}})=O(n^{p/2})blackboard_E | italic_S start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT | ≤ [ blackboard_E ( italic_S start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p + 1 end_POSTSUPERSCRIPT ) ] start_POSTSUPERSCRIPT divide start_ARG italic_p end_ARG start_ARG italic_p + 1 end_ARG end_POSTSUPERSCRIPT = italic_O ( italic_n start_POSTSUPERSCRIPT divide start_ARG italic_p + 1 end_ARG start_ARG 2 end_ARG ⋅ divide start_ARG italic_p end_ARG start_ARG italic_p + 1 end_ARG end_POSTSUPERSCRIPT ) = italic_O ( italic_n start_POSTSUPERSCRIPT italic_p / 2 end_POSTSUPERSCRIPT ), following Hölder’s inequality.

For notational ease, denote Cqk=(qk)superscriptsubscript𝐶𝑞𝑘binomial𝑞𝑘C_{q}^{k}=\binom{q}{k}italic_C start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT = ( FRACOP start_ARG italic_q end_ARG start_ARG italic_k end_ARG ) and κq=𝔼[X1q]subscript𝜅𝑞𝔼delimited-[]superscriptsubscript𝑋1𝑞\kappa_{q}={\mathbb{E}}[X_{1}^{q}]italic_κ start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT = blackboard_E [ italic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT ]. Note that κ1=0subscript𝜅10\kappa_{1}=0italic_κ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = 0. Further let tsubscript𝑡{\cal F}_{t}caligraphic_F start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT denote the filtration up to t𝑡titalic_t.

We first note the following fact. With c𝑐citalic_c being a constant, s𝑠sitalic_s a nonnegative integer, and Vtsubscript𝑉𝑡V_{t}italic_V start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT a martingale, if a process {Mt,t1}subscript𝑀𝑡𝑡1\{M_{t},t\geq 1\}{ italic_M start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_t ≥ 1 } satisfies

𝔼[Mt+1|t]=Mt+ctsVt,𝔼delimited-[]conditionalsubscript𝑀𝑡1subscript𝑡subscript𝑀𝑡𝑐superscript𝑡𝑠subscript𝑉𝑡{\mathbb{E}}\left[M_{t+1}|{\cal F}_{t}\right]=M_{t}+ct^{s}V_{t},blackboard_E [ italic_M start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ] = italic_M start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_c italic_t start_POSTSUPERSCRIPT italic_s end_POSTSUPERSCRIPT italic_V start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ,

then it can be seen that

WtMt+𝒂𝐭s+1Vtsubscript𝑊𝑡subscript𝑀𝑡superscript𝒂topsuperscript𝐭𝑠1subscript𝑉𝑡W_{t}\triangleq M_{t}+\bm{a}^{\top}{\bf t}^{s+1}V_{t}italic_W start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ≜ italic_M start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + bold_italic_a start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_t start_POSTSUPERSCRIPT italic_s + 1 end_POSTSUPERSCRIPT italic_V start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT

is a martingale, by verifying the property that 𝔼[Wt+1|t]=Wt𝔼delimited-[]conditionalsubscript𝑊𝑡1subscript𝑡subscript𝑊𝑡{\mathbb{E}}[W_{t+1}|{\cal F}_{t}]=W_{t}blackboard_E [ italic_W start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ] = italic_W start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, where the shorthand notation 𝐭s+1(t,,ts+1)superscript𝐭𝑠1superscript𝑡superscript𝑡𝑠1top{\bf t}^{s+1}\triangleq(t,\ldots,t^{s+1})^{\top}bold_t start_POSTSUPERSCRIPT italic_s + 1 end_POSTSUPERSCRIPT ≜ ( italic_t , … , italic_t start_POSTSUPERSCRIPT italic_s + 1 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT, and the constant vector 𝒂=(a1,,as+1)𝒂superscriptsubscript𝑎1subscript𝑎𝑠1top\bm{a}=(a_{1},\ldots,a_{s+1})^{\top}bold_italic_a = ( italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_a start_POSTSUBSCRIPT italic_s + 1 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT satisfies

as+1=c/Cs+11,andasi=k=2i+2asi1+kCsi1+kk/Csi1,i=0,,s1.formulae-sequencesubscript𝑎𝑠1𝑐superscriptsubscript𝐶𝑠11andformulae-sequencesubscript𝑎𝑠𝑖superscriptsubscript𝑘2𝑖2subscript𝑎𝑠𝑖1𝑘superscriptsubscript𝐶𝑠𝑖1𝑘𝑘superscriptsubscript𝐶𝑠𝑖1𝑖0𝑠1a_{s+1}=-c/C_{s+1}^{1},\quad\mbox{and}\quad a_{s-i}=-\sum_{k=2}^{i+2}a_{s-i-1+% k}C_{s-i-1+k}^{k}/C_{s-i}^{1},\quad i=0,\ldots,s-1.italic_a start_POSTSUBSCRIPT italic_s + 1 end_POSTSUBSCRIPT = - italic_c / italic_C start_POSTSUBSCRIPT italic_s + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , and italic_a start_POSTSUBSCRIPT italic_s - italic_i end_POSTSUBSCRIPT = - ∑ start_POSTSUBSCRIPT italic_k = 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i + 2 end_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_s - italic_i - 1 + italic_k end_POSTSUBSCRIPT italic_C start_POSTSUBSCRIPT italic_s - italic_i - 1 + italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT / italic_C start_POSTSUBSCRIPT italic_s - italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , italic_i = 0 , … , italic_s - 1 . (16)

Define Vt,01subscript𝑉𝑡01V_{t,0}\equiv 1italic_V start_POSTSUBSCRIPT italic_t , 0 end_POSTSUBSCRIPT ≡ 1 and Vt,1=Stsubscript𝑉𝑡1subscript𝑆𝑡V_{t,1}=S_{t}italic_V start_POSTSUBSCRIPT italic_t , 1 end_POSTSUBSCRIPT = italic_S start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT. For any nonnegative integer q0𝑞0q\geq 0italic_q ≥ 0, if 𝔼[St+1q|t]𝔼delimited-[]conditionalsuperscriptsubscript𝑆𝑡1𝑞subscript𝑡{\mathbb{E}}\left[S_{t+1}^{q}|{\cal F}_{t}\right]blackboard_E [ italic_S start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ] can be written in the form of

𝔼[St+1q|t]=Stq+j𝒮qc(j,q)ts(j,q)Vt,r(j,q),𝔼delimited-[]conditionalsuperscriptsubscript𝑆𝑡1𝑞subscript𝑡superscriptsubscript𝑆𝑡𝑞subscript𝑗subscript𝒮𝑞𝑐𝑗𝑞superscript𝑡𝑠𝑗𝑞subscript𝑉𝑡𝑟𝑗𝑞\displaystyle{\mathbb{E}}\left[S_{t+1}^{q}|{\cal F}_{t}\right]=S_{t}^{q}+\sum_% {j\in{\cal S}_{q}}c(j,q)t^{s(j,q)}V_{t,r(j,q)},blackboard_E [ italic_S start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ] = italic_S start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT + ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_S start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_c ( italic_j , italic_q ) italic_t start_POSTSUPERSCRIPT italic_s ( italic_j , italic_q ) end_POSTSUPERSCRIPT italic_V start_POSTSUBSCRIPT italic_t , italic_r ( italic_j , italic_q ) end_POSTSUBSCRIPT , (17)

where c(j,q)𝑐𝑗𝑞c(j,q)italic_c ( italic_j , italic_q )’s are constants, s(j,q)𝑠𝑗𝑞s(j,q)italic_s ( italic_j , italic_q ) and r(j,q)𝑟𝑗𝑞r(j,q)italic_r ( italic_j , italic_q ) are nonnegative integers depending on (j,q)𝑗𝑞(j,q)( italic_j , italic_q ), and 𝒮psubscript𝒮𝑝{\cal S}_{p}caligraphic_S start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT is an index set with a finite number of elements, then, we define

Vt,q=Stq+j𝒮q𝒂(j,q)𝐭s(j,q)+1Vt,r(j,q),subscript𝑉𝑡𝑞superscriptsubscript𝑆𝑡𝑞subscript𝑗subscript𝒮𝑞𝒂superscript𝑗𝑞topsuperscript𝐭𝑠𝑗𝑞1subscript𝑉𝑡𝑟𝑗𝑞V_{t,q}=S_{t}^{q}+\sum_{j\in{\cal S}_{q}}\bm{a}(j,q)^{\top}{\bf t}^{s(j,q)+1}V% _{t,r(j,q)},italic_V start_POSTSUBSCRIPT italic_t , italic_q end_POSTSUBSCRIPT = italic_S start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT + ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_S start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_italic_a ( italic_j , italic_q ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_t start_POSTSUPERSCRIPT italic_s ( italic_j , italic_q ) + 1 end_POSTSUPERSCRIPT italic_V start_POSTSUBSCRIPT italic_t , italic_r ( italic_j , italic_q ) end_POSTSUBSCRIPT ,

where 𝒂(j,q)𝒂𝑗𝑞\bm{a}(j,q)bold_italic_a ( italic_j , italic_q ) is a contant vector obtained via (16) by setting s=s(j,q)𝑠𝑠𝑗𝑞s=s(j,q)italic_s = italic_s ( italic_j , italic_q ) and c=c(j,q)𝑐𝑐𝑗𝑞c=c(j,q)italic_c = italic_c ( italic_j , italic_q ).

Then, using the aforementioned fact, the process {Vt,q,t1}subscript𝑉𝑡𝑞𝑡1\{V_{t,q},t\geq 1\}{ italic_V start_POSTSUBSCRIPT italic_t , italic_q end_POSTSUBSCRIPT , italic_t ≥ 1 } is a martingale, and therefore,

Stq=Vt,qj𝒮q𝒂(j,p)𝐭s(j,q)+1Vt,r(j,q)superscriptsubscript𝑆𝑡𝑞subscript𝑉𝑡𝑞subscript𝑗subscript𝒮𝑞𝒂superscript𝑗𝑝topsuperscript𝐭𝑠𝑗𝑞1subscript𝑉𝑡𝑟𝑗𝑞S_{t}^{q}=V_{t,q}-\sum_{j\in{\cal S}_{q}}\bm{a}(j,p)^{\top}{\bf t}^{s(j,q)+1}V% _{t,r(j,q)}italic_S start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT = italic_V start_POSTSUBSCRIPT italic_t , italic_q end_POSTSUBSCRIPT - ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_S start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_italic_a ( italic_j , italic_p ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_t start_POSTSUPERSCRIPT italic_s ( italic_j , italic_q ) + 1 end_POSTSUPERSCRIPT italic_V start_POSTSUBSCRIPT italic_t , italic_r ( italic_j , italic_q ) end_POSTSUBSCRIPT (18)

is a linear combination of martingales.

In the rest of this proof, we use an induction argument. Suppose that for any integer q𝑞qitalic_q satisfying 2qp2𝑞𝑝2\leq q\leq p2 ≤ italic_q ≤ italic_p, (a) and (b) hold, where

  1. a)

    𝔼[St+1q|t]𝔼delimited-[]conditionalsuperscriptsubscript𝑆𝑡1𝑞subscript𝑡{\mathbb{E}}\left[S_{t+1}^{q}|{\cal F}_{t}\right]blackboard_E [ italic_S start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ] is of the form in (17), satisfying V0,r(j,q)=0subscript𝑉0𝑟𝑗𝑞0V_{0,r(j,q)}=0italic_V start_POSTSUBSCRIPT 0 , italic_r ( italic_j , italic_q ) end_POSTSUBSCRIPT = 0 when r(j,q)1𝑟𝑗𝑞1r(j,q)\geq 1italic_r ( italic_j , italic_q ) ≥ 1, and r(j,q)q2𝑟𝑗𝑞𝑞2r(j,q)\leq q-2italic_r ( italic_j , italic_q ) ≤ italic_q - 2 for any j𝒮q𝑗subscript𝒮𝑞j\in{\cal S}_{q}italic_j ∈ caligraphic_S start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT; and

  2. b)

    𝔼|Ns(j,q)+1VN,r(j,q)|=O(nq/2)𝔼superscript𝑁𝑠𝑗𝑞1subscript𝑉𝑁𝑟𝑗𝑞𝑂superscript𝑛𝑞2{\mathbb{E}}\left|N^{s(j,q)+1}V_{N,r(j,q)}\right|=O(n^{q/2})blackboard_E | italic_N start_POSTSUPERSCRIPT italic_s ( italic_j , italic_q ) + 1 end_POSTSUPERSCRIPT italic_V start_POSTSUBSCRIPT italic_N , italic_r ( italic_j , italic_q ) end_POSTSUBSCRIPT | = italic_O ( italic_n start_POSTSUPERSCRIPT italic_q / 2 end_POSTSUPERSCRIPT ) for any s(j,q)𝑠𝑗𝑞s(j,q)italic_s ( italic_j , italic_q ) and r(j,q)𝑟𝑗𝑞r(j,q)italic_r ( italic_j , italic_q ).,

we want to show that both (a) and (b) hold for q=p+1𝑞𝑝1q=p+1italic_q = italic_p + 1.

It should be pointed out that combining (18) and (b) yields |𝔼(SNq)|=O(nq/2)𝔼superscriptsubscript𝑆𝑁𝑞𝑂superscript𝑛𝑞2\left|{\mathbb{E}}(S_{N}^{q})\right|=O(n^{q/2})| blackboard_E ( italic_S start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT ) | = italic_O ( italic_n start_POSTSUPERSCRIPT italic_q / 2 end_POSTSUPERSCRIPT ), because 𝔼[VN,q]=V0,q=0𝔼delimited-[]subscript𝑉𝑁𝑞subscript𝑉0𝑞0{\mathbb{E}}[V_{N,q}]=V_{0,q}=0blackboard_E [ italic_V start_POSTSUBSCRIPT italic_N , italic_q end_POSTSUBSCRIPT ] = italic_V start_POSTSUBSCRIPT 0 , italic_q end_POSTSUBSCRIPT = 0 due to the martingale stopping theorem, and

𝔼(SNq)𝔼superscriptsubscript𝑆𝑁𝑞\displaystyle{\mathbb{E}}(S_{N}^{q})blackboard_E ( italic_S start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT ) =\displaystyle== 𝔼[VN,q]j𝒮q𝒂(j,p)𝔼[𝐍s(j,q)+1VN,r(j,q)]=O(nq/2).𝔼delimited-[]subscript𝑉𝑁𝑞subscript𝑗subscript𝒮𝑞𝒂superscript𝑗𝑝top𝔼delimited-[]superscript𝐍𝑠𝑗𝑞1subscript𝑉𝑁𝑟𝑗𝑞𝑂superscript𝑛𝑞2\displaystyle{\mathbb{E}}[V_{N,q}]-\sum_{j\in{\cal S}_{q}}\bm{a}(j,p)^{\top}{% \mathbb{E}}\left[{\bf N}^{s(j,q)+1}V_{N,r(j,q)}\right]=O(n^{q/2}).blackboard_E [ italic_V start_POSTSUBSCRIPT italic_N , italic_q end_POSTSUBSCRIPT ] - ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_S start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_italic_a ( italic_j , italic_p ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT blackboard_E [ bold_N start_POSTSUPERSCRIPT italic_s ( italic_j , italic_q ) + 1 end_POSTSUPERSCRIPT italic_V start_POSTSUBSCRIPT italic_N , italic_r ( italic_j , italic_q ) end_POSTSUBSCRIPT ] = italic_O ( italic_n start_POSTSUPERSCRIPT italic_q / 2 end_POSTSUPERSCRIPT ) .

Moreover, when q=2𝑞2q=2italic_q = 2, it can be easily checked that 𝔼[St+12|t]=St2+κ2Vt,0𝔼delimited-[]conditionalsuperscriptsubscript𝑆𝑡12subscript𝑡superscriptsubscript𝑆𝑡2subscript𝜅2subscript𝑉𝑡0{\mathbb{E}}[S_{t+1}^{2}|{\cal F}_{t}]=S_{t}^{2}+\kappa_{2}V_{t,0}blackboard_E [ italic_S start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ] = italic_S start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_κ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_V start_POSTSUBSCRIPT italic_t , 0 end_POSTSUBSCRIPT and Vt,2=St2κ2tVt,0subscript𝑉𝑡2superscriptsubscript𝑆𝑡2subscript𝜅2𝑡subscript𝑉𝑡0V_{t,2}=S_{t}^{2}-\kappa_{2}tV_{t,0}italic_V start_POSTSUBSCRIPT italic_t , 2 end_POSTSUBSCRIPT = italic_S start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_κ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_t italic_V start_POSTSUBSCRIPT italic_t , 0 end_POSTSUBSCRIPT, and thus both (a) and (b) are satisfied.

Therefore, it suffices to complete the induction argument, i.e., (a) and (b) must hold for q=p+1𝑞𝑝1q=p+1italic_q = italic_p + 1 if they hold for any 2qp2𝑞𝑝2\leq q\leq p2 ≤ italic_q ≤ italic_p. To see this, note that

𝔼[St+1p+1|t]=𝔼[(St+Xt+1)p+1|t]=Stp+1+j=2p+1Cp+1jκjStp+1j.𝔼delimited-[]conditionalsuperscriptsubscript𝑆𝑡1𝑝1subscript𝑡𝔼delimited-[]conditionalsuperscriptsubscript𝑆𝑡subscript𝑋𝑡1𝑝1subscript𝑡superscriptsubscript𝑆𝑡𝑝1superscriptsubscript𝑗2𝑝1superscriptsubscript𝐶𝑝1𝑗subscript𝜅𝑗superscriptsubscript𝑆𝑡𝑝1𝑗\displaystyle{\mathbb{E}}\left[S_{t+1}^{p+1}|{\cal F}_{t}\right]={\mathbb{E}}% \left[\left(S_{t}+X_{t+1}\right)^{p+1}|{\cal F}_{t}\right]=S_{t}^{p+1}+\sum_{j% =2}^{p+1}C_{p+1}^{j}\kappa_{j}S_{t}^{p+1-j}.blackboard_E [ italic_S start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p + 1 end_POSTSUPERSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ] = blackboard_E [ ( italic_S start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_X start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_p + 1 end_POSTSUPERSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ] = italic_S start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p + 1 end_POSTSUPERSCRIPT + ∑ start_POSTSUBSCRIPT italic_j = 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p + 1 end_POSTSUPERSCRIPT italic_C start_POSTSUBSCRIPT italic_p + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT italic_κ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p + 1 - italic_j end_POSTSUPERSCRIPT . (19)

For any j=2,,p1𝑗2𝑝1j=2,\ldots,p-1italic_j = 2 , … , italic_p - 1, combining (a) and (18) suggests that Stp+1jsuperscriptsubscript𝑆𝑡𝑝1𝑗S_{t}^{p+1-j}italic_S start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p + 1 - italic_j end_POSTSUPERSCRIPT is a linear combination of Vt,p+1jsubscript𝑉𝑡𝑝1𝑗V_{t,p+1-j}italic_V start_POSTSUBSCRIPT italic_t , italic_p + 1 - italic_j end_POSTSUBSCRIPT and Vt,r(j,p+1j)subscript𝑉𝑡𝑟𝑗𝑝1𝑗V_{t,r(j,p+1-j)}italic_V start_POSTSUBSCRIPT italic_t , italic_r ( italic_j , italic_p + 1 - italic_j ) end_POSTSUBSCRIPT’s. Moreover, when j=p𝑗𝑝j=pitalic_j = italic_p and p+1𝑝1p+1italic_p + 1, Stp+1jsuperscriptsubscript𝑆𝑡𝑝1𝑗S_{t}^{p+1-j}italic_S start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p + 1 - italic_j end_POSTSUPERSCRIPT equals Vt,1subscript𝑉𝑡1V_{t,1}italic_V start_POSTSUBSCRIPT italic_t , 1 end_POSTSUBSCRIPT and Vt,0subscript𝑉𝑡0V_{t,0}italic_V start_POSTSUBSCRIPT italic_t , 0 end_POSTSUBSCRIPT, respectively. Therefore, by (19), assembling terms and rearranging the index set 𝒮p+1subscript𝒮𝑝1{\cal S}_{p+1}caligraphic_S start_POSTSUBSCRIPT italic_p + 1 end_POSTSUBSCRIPT lead to

𝔼[St+1p+1|t]=Stp+1+j𝒮p+1c(j,p+1)ts(j,p+1)Vt,r(j,p+1)𝔼delimited-[]conditionalsuperscriptsubscript𝑆𝑡1𝑝1subscript𝑡superscriptsubscript𝑆𝑡𝑝1subscript𝑗subscript𝒮𝑝1𝑐𝑗𝑝1superscript𝑡𝑠𝑗𝑝1subscript𝑉𝑡𝑟𝑗𝑝1\displaystyle{\mathbb{E}}\left[S_{t+1}^{p+1}|{\cal F}_{t}\right]=S_{t}^{p+1}+% \sum_{j\in{\cal S}_{p+1}}c(j,p+1)t^{s(j,p+1)}V_{t,r(j,p+1)}blackboard_E [ italic_S start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p + 1 end_POSTSUPERSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ] = italic_S start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p + 1 end_POSTSUPERSCRIPT + ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_S start_POSTSUBSCRIPT italic_p + 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_c ( italic_j , italic_p + 1 ) italic_t start_POSTSUPERSCRIPT italic_s ( italic_j , italic_p + 1 ) end_POSTSUPERSCRIPT italic_V start_POSTSUBSCRIPT italic_t , italic_r ( italic_j , italic_p + 1 ) end_POSTSUBSCRIPT

for some c(j,p+1)𝑐𝑗𝑝1c(j,p+1)italic_c ( italic_j , italic_p + 1 ), s(j,p+1)𝑠𝑗𝑝1s(j,p+1)italic_s ( italic_j , italic_p + 1 ) and r(j,p+1)𝑟𝑗𝑝1r(j,p+1)italic_r ( italic_j , italic_p + 1 ), where it can be verified that V0,r(j,p+1)=0subscript𝑉0𝑟𝑗𝑝10V_{0,r(j,p+1)}=0italic_V start_POSTSUBSCRIPT 0 , italic_r ( italic_j , italic_p + 1 ) end_POSTSUBSCRIPT = 0 when r(j,p+1)1𝑟𝑗𝑝11r(j,p+1)\geq 1italic_r ( italic_j , italic_p + 1 ) ≥ 1.

It can also be checked that

  1. (i)

    r(j,p+1)=r(j1,q)𝑟𝑗𝑝1𝑟subscript𝑗1𝑞r(j,p+1)=r(j_{1},q)italic_r ( italic_j , italic_p + 1 ) = italic_r ( italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_q ) and s(j,p+1)s(j2,q)+1𝑠𝑗𝑝1𝑠subscript𝑗2𝑞1s(j,p+1)\leq s(j_{2},q)+1italic_s ( italic_j , italic_p + 1 ) ≤ italic_s ( italic_j start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_q ) + 1 for some qp2𝑞𝑝2q\leq p-2italic_q ≤ italic_p - 2 and j1,j2𝒮qsubscript𝑗1subscript𝑗2subscript𝒮𝑞j_{1},j_{2}\in{\cal S}_{q}italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_j start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ∈ caligraphic_S start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT; or,

  2. (ii)

    r(j,p+1)=q𝑟𝑗𝑝1𝑞r(j,p+1)=qitalic_r ( italic_j , italic_p + 1 ) = italic_q and s(j,p+1)=0𝑠𝑗𝑝10s(j,p+1)=0italic_s ( italic_j , italic_p + 1 ) = 0 for some qp1𝑞𝑝1q\leq p-1italic_q ≤ italic_p - 1.

Because r(j1,q)q2𝑟subscript𝑗1𝑞𝑞2r(j_{1},q)\leq q-2italic_r ( italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_q ) ≤ italic_q - 2, it is clear that r(j,p+1)p1𝑟𝑗𝑝1𝑝1r(j,p+1)\leq p-1italic_r ( italic_j , italic_p + 1 ) ≤ italic_p - 1. Hence, (a) holds for q=p+1𝑞𝑝1q=p+1italic_q = italic_p + 1.

Furthermore, define

Vt,p+1=Stp+1+j𝒮p+1𝒂(j,p+1)𝐭s(j,p+1)+1Vt,r(j,p+1).subscript𝑉𝑡𝑝1superscriptsubscript𝑆𝑡𝑝1subscript𝑗subscript𝒮𝑝1𝒂superscript𝑗𝑝1topsuperscript𝐭𝑠𝑗𝑝11subscript𝑉𝑡𝑟𝑗𝑝1V_{t,p+1}=S_{t}^{p+1}+\sum_{j\in{\cal S}_{p+1}}\bm{a}(j,p+1)^{\top}{\bf t}^{s(% j,p+1)+1}V_{t,r(j,p+1)}.italic_V start_POSTSUBSCRIPT italic_t , italic_p + 1 end_POSTSUBSCRIPT = italic_S start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p + 1 end_POSTSUPERSCRIPT + ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_S start_POSTSUBSCRIPT italic_p + 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_italic_a ( italic_j , italic_p + 1 ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_t start_POSTSUPERSCRIPT italic_s ( italic_j , italic_p + 1 ) + 1 end_POSTSUPERSCRIPT italic_V start_POSTSUBSCRIPT italic_t , italic_r ( italic_j , italic_p + 1 ) end_POSTSUBSCRIPT .

Then, {Vt,p+1,t1}subscript𝑉𝑡𝑝1𝑡1\{V_{t,p+1},t\geq 1\}{ italic_V start_POSTSUBSCRIPT italic_t , italic_p + 1 end_POSTSUBSCRIPT , italic_t ≥ 1 } is a martingale. Using the martingale stopping theorem, we have 𝔼[VN,p+1]=V0,p+1=0𝔼delimited-[]subscript𝑉𝑁𝑝1subscript𝑉0𝑝10{\mathbb{E}}\left[V_{N,p+1}\right]=V_{0,p+1}=0blackboard_E [ italic_V start_POSTSUBSCRIPT italic_N , italic_p + 1 end_POSTSUBSCRIPT ] = italic_V start_POSTSUBSCRIPT 0 , italic_p + 1 end_POSTSUBSCRIPT = 0, i.e.,

𝔼[SNp+1]=j𝒮p+1𝔼[𝒂(j,p+1)𝐍s(j,p+1)+1VN,r(j,p+1)],𝔼delimited-[]superscriptsubscript𝑆𝑁𝑝1subscript𝑗subscript𝒮𝑝1𝔼delimited-[]𝒂superscript𝑗𝑝1topsuperscript𝐍𝑠𝑗𝑝11subscript𝑉𝑁𝑟𝑗𝑝1{\mathbb{E}}\left[S_{N}^{p+1}\right]=-\sum_{j\in{\cal S}_{p+1}}{\mathbb{E}}% \left[\bm{a}(j,p+1)^{\top}{\bf N}^{s(j,p+1)+1}V_{N,r(j,p+1)}\right],blackboard_E [ italic_S start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p + 1 end_POSTSUPERSCRIPT ] = - ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_S start_POSTSUBSCRIPT italic_p + 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT blackboard_E [ bold_italic_a ( italic_j , italic_p + 1 ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_N start_POSTSUPERSCRIPT italic_s ( italic_j , italic_p + 1 ) + 1 end_POSTSUPERSCRIPT italic_V start_POSTSUBSCRIPT italic_N , italic_r ( italic_j , italic_p + 1 ) end_POSTSUBSCRIPT ] , (20)

where 𝐍s(j,p+1)+1=(N,,Ns(j,p+1)+1)superscript𝐍𝑠𝑗𝑝11superscript𝑁superscript𝑁𝑠𝑗𝑝11top{\bf N}^{s(j,p+1)+1}=(N,\ldots,N^{s(j,p+1)+1})^{\top}bold_N start_POSTSUPERSCRIPT italic_s ( italic_j , italic_p + 1 ) + 1 end_POSTSUPERSCRIPT = ( italic_N , … , italic_N start_POSTSUPERSCRIPT italic_s ( italic_j , italic_p + 1 ) + 1 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT.

When (i) is true,

𝔼|Ns(j,p+1)+1VN,r(j,p+1)|𝔼|Ns(j2,q)+1VN,r(j1,q)|n𝔼|Ns(j2,q)VN,r(j1,q)|=O(nnq2)=O(np2),𝔼superscript𝑁𝑠𝑗𝑝11subscript𝑉𝑁𝑟𝑗𝑝1𝔼superscript𝑁𝑠subscript𝑗2𝑞1subscript𝑉𝑁𝑟subscript𝑗1𝑞𝑛𝔼superscript𝑁𝑠subscript𝑗2𝑞subscript𝑉𝑁𝑟subscript𝑗1𝑞𝑂𝑛superscript𝑛𝑞2𝑂superscript𝑛𝑝2\displaystyle{\mathbb{E}}\left|N^{s(j,p+1)+1}V_{N,r(j,p+1)}\right|\leq{\mathbb% {E}}\left|N^{s(j_{2},q)+1}V_{N,r(j_{1},q)}\right|\leq n{\mathbb{E}}\left|N^{s(% j_{2},q)}V_{N,r(j_{1},q)}\right|=O(n\cdot n^{q\over 2})=O(n^{p\over 2}),blackboard_E | italic_N start_POSTSUPERSCRIPT italic_s ( italic_j , italic_p + 1 ) + 1 end_POSTSUPERSCRIPT italic_V start_POSTSUBSCRIPT italic_N , italic_r ( italic_j , italic_p + 1 ) end_POSTSUBSCRIPT | ≤ blackboard_E | italic_N start_POSTSUPERSCRIPT italic_s ( italic_j start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_q ) + 1 end_POSTSUPERSCRIPT italic_V start_POSTSUBSCRIPT italic_N , italic_r ( italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_q ) end_POSTSUBSCRIPT | ≤ italic_n blackboard_E | italic_N start_POSTSUPERSCRIPT italic_s ( italic_j start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_q ) end_POSTSUPERSCRIPT italic_V start_POSTSUBSCRIPT italic_N , italic_r ( italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_q ) end_POSTSUBSCRIPT | = italic_O ( italic_n ⋅ italic_n start_POSTSUPERSCRIPT divide start_ARG italic_q end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT ) = italic_O ( italic_n start_POSTSUPERSCRIPT divide start_ARG italic_p end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT ) ,

where the last equality follows from the fact that qp2𝑞𝑝2q\leq p-2italic_q ≤ italic_p - 2.

When (ii) is true, for some qp1𝑞𝑝1q\leq p-1italic_q ≤ italic_p - 1,

𝔼|Ns(j,p+1)+1VN,r(j,p+1)|=𝔼|NVN,q|.𝔼superscript𝑁𝑠𝑗𝑝11subscript𝑉𝑁𝑟𝑗𝑝1𝔼𝑁subscript𝑉𝑁𝑞\displaystyle{\mathbb{E}}\left|N^{s(j,p+1)+1}V_{N,r(j,p+1)}\right|={\mathbb{E}% }\left|NV_{N,q}\right|.blackboard_E | italic_N start_POSTSUPERSCRIPT italic_s ( italic_j , italic_p + 1 ) + 1 end_POSTSUPERSCRIPT italic_V start_POSTSUBSCRIPT italic_N , italic_r ( italic_j , italic_p + 1 ) end_POSTSUBSCRIPT | = blackboard_E | italic_N italic_V start_POSTSUBSCRIPT italic_N , italic_q end_POSTSUBSCRIPT | .

Note that |VN,q||SNq|+j𝒮q|𝒂(j,q)𝐍s(j,q)+1VN,r(j,q)|subscript𝑉𝑁𝑞superscriptsubscript𝑆𝑁𝑞subscript𝑗subscript𝒮𝑞𝒂superscript𝑗𝑞topsuperscript𝐍𝑠𝑗𝑞1subscript𝑉𝑁𝑟𝑗𝑞\left|V_{N,q}\right|\leq|S_{N}^{q}|+\sum_{j\in{\cal S}_{q}}\left|\bm{a}(j,q)^{% \top}{\bf N}^{s(j,q)+1}V_{N,r(j,q)}\right|| italic_V start_POSTSUBSCRIPT italic_N , italic_q end_POSTSUBSCRIPT | ≤ | italic_S start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT | + ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_S start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT end_POSTSUBSCRIPT | bold_italic_a ( italic_j , italic_q ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_N start_POSTSUPERSCRIPT italic_s ( italic_j , italic_q ) + 1 end_POSTSUPERSCRIPT italic_V start_POSTSUBSCRIPT italic_N , italic_r ( italic_j , italic_q ) end_POSTSUBSCRIPT |. If q𝑞qitalic_q is an even number, then

𝔼|VN,q|𝔼(SNq)+j𝒮q𝔼|𝒂(j,q)𝐍s(j,q)+1VN,r(j,q)|=O(nq2)=O(np12),𝔼subscript𝑉𝑁𝑞𝔼superscriptsubscript𝑆𝑁𝑞subscript𝑗subscript𝒮𝑞𝔼𝒂superscript𝑗𝑞topsuperscript𝐍𝑠𝑗𝑞1subscript𝑉𝑁𝑟𝑗𝑞𝑂superscript𝑛𝑞2𝑂superscript𝑛𝑝12{\mathbb{E}}\left|V_{N,q}\right|\leq{\mathbb{E}}(S_{N}^{q})+\sum_{j\in{\cal S}% _{q}}{\mathbb{E}}\left|\bm{a}(j,q)^{\top}{\bf N}^{s(j,q)+1}V_{N,r(j,q)}\right|% =O(n^{q\over 2})=O(n^{p-1\over 2}),blackboard_E | italic_V start_POSTSUBSCRIPT italic_N , italic_q end_POSTSUBSCRIPT | ≤ blackboard_E ( italic_S start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT ) + ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_S start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT end_POSTSUBSCRIPT blackboard_E | bold_italic_a ( italic_j , italic_q ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_N start_POSTSUPERSCRIPT italic_s ( italic_j , italic_q ) + 1 end_POSTSUPERSCRIPT italic_V start_POSTSUBSCRIPT italic_N , italic_r ( italic_j , italic_q ) end_POSTSUBSCRIPT | = italic_O ( italic_n start_POSTSUPERSCRIPT divide start_ARG italic_q end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT ) = italic_O ( italic_n start_POSTSUPERSCRIPT divide start_ARG italic_p - 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT ) ,

where the first equality follows from (18) and (b). Otherwise, q𝑞qitalic_q is an odd number, and by Cauchy-Schwartz inequality,

𝔼|VN,q|𝔼subscript𝑉𝑁𝑞\displaystyle{\mathbb{E}}\left|V_{N,q}\right|blackboard_E | italic_V start_POSTSUBSCRIPT italic_N , italic_q end_POSTSUBSCRIPT | \displaystyle\leq 𝔼|SNq|+j𝒮q𝔼|𝒂(j,q)𝐍s(j,q)+1VN,r(j,q)|𝔼superscriptsubscript𝑆𝑁𝑞subscript𝑗subscript𝒮𝑞𝔼𝒂superscript𝑗𝑞topsuperscript𝐍𝑠𝑗𝑞1subscript𝑉𝑁𝑟𝑗𝑞\displaystyle{\mathbb{E}}\left|S_{N}^{q}\right|+\sum_{j\in{\cal S}_{q}}{% \mathbb{E}}\left|\bm{a}(j,q)^{\top}{\bf N}^{s(j,q)+1}V_{N,r(j,q)}\right|blackboard_E | italic_S start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT | + ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_S start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT end_POSTSUBSCRIPT blackboard_E | bold_italic_a ( italic_j , italic_q ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_N start_POSTSUPERSCRIPT italic_s ( italic_j , italic_q ) + 1 end_POSTSUPERSCRIPT italic_V start_POSTSUBSCRIPT italic_N , italic_r ( italic_j , italic_q ) end_POSTSUBSCRIPT |
\displaystyle\leq [𝔼(SNq+1)]qq+1+j𝒮q𝔼|𝒂(j,q)𝐍s(j,q)+1VN,r(j,q)|superscriptdelimited-[]𝔼superscriptsubscript𝑆𝑁𝑞1𝑞𝑞1subscript𝑗subscript𝒮𝑞𝔼𝒂superscript𝑗𝑞topsuperscript𝐍𝑠𝑗𝑞1subscript𝑉𝑁𝑟𝑗𝑞\displaystyle\left[{\mathbb{E}}\left(S_{N}^{q+1}\right)\right]^{q\over q+1}+% \sum_{j\in{\cal S}_{q}}{\mathbb{E}}\left|\bm{a}(j,q)^{\top}{\bf N}^{s(j,q)+1}V% _{N,r(j,q)}\right|[ blackboard_E ( italic_S start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_q + 1 end_POSTSUPERSCRIPT ) ] start_POSTSUPERSCRIPT divide start_ARG italic_q end_ARG start_ARG italic_q + 1 end_ARG end_POSTSUPERSCRIPT + ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_S start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT end_POSTSUBSCRIPT blackboard_E | bold_italic_a ( italic_j , italic_q ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_N start_POSTSUPERSCRIPT italic_s ( italic_j , italic_q ) + 1 end_POSTSUPERSCRIPT italic_V start_POSTSUBSCRIPT italic_N , italic_r ( italic_j , italic_q ) end_POSTSUBSCRIPT |
\displaystyle\leq O(nq+12qq+1)+O(nq2)=O(np12).𝑂superscript𝑛𝑞12𝑞𝑞1𝑂superscript𝑛𝑞2𝑂superscript𝑛𝑝12\displaystyle O(n^{{q+1\over 2}\cdot{q\over q+1}})+O(n^{q\over 2})=O(n^{p-1% \over 2}).italic_O ( italic_n start_POSTSUPERSCRIPT divide start_ARG italic_q + 1 end_ARG start_ARG 2 end_ARG ⋅ divide start_ARG italic_q end_ARG start_ARG italic_q + 1 end_ARG end_POSTSUPERSCRIPT ) + italic_O ( italic_n start_POSTSUPERSCRIPT divide start_ARG italic_q end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT ) = italic_O ( italic_n start_POSTSUPERSCRIPT divide start_ARG italic_p - 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT ) .

Therefore no matter q𝑞qitalic_q is even or odd, 𝔼|Ns(j,p+1)+1VN,r(j,p+1)|=𝔼|NVN,q|n𝔼|VN,q|=O(np+12)𝔼superscript𝑁𝑠𝑗𝑝11subscript𝑉𝑁𝑟𝑗𝑝1𝔼𝑁subscript𝑉𝑁𝑞𝑛𝔼subscript𝑉𝑁𝑞𝑂superscript𝑛𝑝12{\mathbb{E}}\left|N^{s(j,p+1)+1}V_{N,r(j,p+1)}\right|={\mathbb{E}}\left|NV_{N,% q}\right|\leq n{\mathbb{E}}\left|V_{N,q}\right|=O(n^{p+1\over 2})blackboard_E | italic_N start_POSTSUPERSCRIPT italic_s ( italic_j , italic_p + 1 ) + 1 end_POSTSUPERSCRIPT italic_V start_POSTSUBSCRIPT italic_N , italic_r ( italic_j , italic_p + 1 ) end_POSTSUBSCRIPT | = blackboard_E | italic_N italic_V start_POSTSUBSCRIPT italic_N , italic_q end_POSTSUBSCRIPT | ≤ italic_n blackboard_E | italic_V start_POSTSUBSCRIPT italic_N , italic_q end_POSTSUBSCRIPT | = italic_O ( italic_n start_POSTSUPERSCRIPT divide start_ARG italic_p + 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT ). In other words, (b) holds for q=p+1𝑞𝑝1q=p+1italic_q = italic_p + 1, which completes the induction. ∎

A.2 Proof of Proposition 1

To prove the proposition, we introduce a lemma.

Lemma A3.

Given a real λ(0,1)𝜆01\lambda\notin(0,1)italic_λ ∉ ( 0 , 1 ), for any positive integer n𝑛nitalic_n,

j=1njλ{=nifλ=0;=n(n+1)2ifλ=1;<log(n+12)+log2ifλ=1;<1λ+1[(n+12)λ+112λ+1]ifλ1.superscriptsubscript𝑗1𝑛superscript𝑗𝜆casesabsent𝑛𝑖𝑓𝜆0absent𝑛𝑛12𝑖𝑓𝜆1absent𝑛122𝑖𝑓𝜆1absent1𝜆1delimited-[]superscript𝑛12𝜆11superscript2𝜆1𝑖𝑓𝜆1\displaystyle\sum_{j=1}^{n}j^{\lambda}\begin{cases}\displaystyle=n&if\ \lambda% =0;\\ \displaystyle=\frac{n(n+1)}{2}&if\ \lambda=1;\\ \displaystyle<\log\left(n+\frac{1}{2}\right)+\log 2&if\ \lambda=-1;\\ \displaystyle<\dfrac{1}{\lambda+1}\left[\left(n+\frac{1}{2}\right)^{\lambda+1}% -\frac{1}{2^{\lambda+1}}\right]&if\ \lambda\neq-1.\end{cases}∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_j start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT { start_ROW start_CELL = italic_n end_CELL start_CELL italic_i italic_f italic_λ = 0 ; end_CELL end_ROW start_ROW start_CELL = divide start_ARG italic_n ( italic_n + 1 ) end_ARG start_ARG 2 end_ARG end_CELL start_CELL italic_i italic_f italic_λ = 1 ; end_CELL end_ROW start_ROW start_CELL < roman_log ( italic_n + divide start_ARG 1 end_ARG start_ARG 2 end_ARG ) + roman_log 2 end_CELL start_CELL italic_i italic_f italic_λ = - 1 ; end_CELL end_ROW start_ROW start_CELL < divide start_ARG 1 end_ARG start_ARG italic_λ + 1 end_ARG [ ( italic_n + divide start_ARG 1 end_ARG start_ARG 2 end_ARG ) start_POSTSUPERSCRIPT italic_λ + 1 end_POSTSUPERSCRIPT - divide start_ARG 1 end_ARG start_ARG 2 start_POSTSUPERSCRIPT italic_λ + 1 end_POSTSUPERSCRIPT end_ARG ] end_CELL start_CELL italic_i italic_f italic_λ ≠ - 1 . end_CELL end_ROW
Proof of Lemma A3.

The conclusion of the lemma for λ=0𝜆0\lambda=0italic_λ = 0 or 1 is trivial, and we thus mainly examine the case when λ[0,1]𝜆01\lambda\notin[0,1]italic_λ ∉ [ 0 , 1 ]. Because xλsuperscript𝑥𝜆x^{\lambda}italic_x start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT (x>0𝑥0x>0italic_x > 0) is convex for λ[0,1]𝜆01\lambda\notin[0,1]italic_λ ∉ [ 0 , 1 ], by Jensen’s Inequality,

n12n+12xλdx>(n12n+12xdx)λ=nλ,ifλ[0,1].formulae-sequencesuperscriptsubscript𝑛12𝑛12superscript𝑥𝜆differential-d𝑥superscriptsuperscriptsubscript𝑛12𝑛12𝑥differential-d𝑥𝜆superscript𝑛𝜆if𝜆01\displaystyle\int_{n-\frac{1}{2}}^{n+\frac{1}{2}}x^{\lambda}{\rm d}x>\left(% \int_{n-\frac{1}{2}}^{n+\frac{1}{2}}x{\rm d}x\right)^{\lambda}=n^{\lambda},\ {% \rm if}\ \lambda\notin[0,1].∫ start_POSTSUBSCRIPT italic_n - divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n + divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT roman_d italic_x > ( ∫ start_POSTSUBSCRIPT italic_n - divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n + divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT italic_x roman_d italic_x ) start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT = italic_n start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT , roman_if italic_λ ∉ [ 0 , 1 ] .

It then follows that

(n+12)λ+1(n12)λ+1=(λ+1)n12n+12xλdx{>(λ+1)nλif1<λ<0,orλ>1,<(λ+1)nλ,ifλ<1,superscript𝑛12𝜆1superscript𝑛12𝜆1𝜆1superscriptsubscript𝑛12𝑛12superscript𝑥𝜆differential-d𝑥casesabsent𝜆1superscript𝑛𝜆formulae-sequenceif1𝜆0or𝜆1absent𝜆1superscript𝑛𝜆if𝜆1\displaystyle\left(n+\frac{1}{2}\right)^{\lambda+1}-\left(n-\frac{1}{2}\right)% ^{\lambda+1}=(\lambda+1)\int_{n-\frac{1}{2}}^{n+\frac{1}{2}}x^{\lambda}{\rm d}% x\begin{cases}\displaystyle>(\lambda+1)n^{\lambda}&{\rm if}\ -1<\lambda<0,\ \ % {\rm or}\ \ \lambda>1,\\ \displaystyle<(\lambda+1)n^{\lambda},&{\rm if}\ \lambda<-1,\end{cases}( italic_n + divide start_ARG 1 end_ARG start_ARG 2 end_ARG ) start_POSTSUPERSCRIPT italic_λ + 1 end_POSTSUPERSCRIPT - ( italic_n - divide start_ARG 1 end_ARG start_ARG 2 end_ARG ) start_POSTSUPERSCRIPT italic_λ + 1 end_POSTSUPERSCRIPT = ( italic_λ + 1 ) ∫ start_POSTSUBSCRIPT italic_n - divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n + divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT roman_d italic_x { start_ROW start_CELL > ( italic_λ + 1 ) italic_n start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT end_CELL start_CELL roman_if - 1 < italic_λ < 0 , roman_or italic_λ > 1 , end_CELL end_ROW start_ROW start_CELL < ( italic_λ + 1 ) italic_n start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT , end_CELL start_CELL roman_if italic_λ < - 1 , end_CELL end_ROW

and

log(n+12)log(n12)=n12n+12x1dx>n1,ifλ=1.formulae-sequence𝑛12𝑛12superscriptsubscript𝑛12𝑛12superscript𝑥1differential-d𝑥superscript𝑛1if𝜆1\displaystyle\log\left(n+\frac{1}{2}\right)-\log\left(n-\frac{1}{2}\right)=% \int_{n-\frac{1}{2}}^{n+\frac{1}{2}}x^{-1}{\rm d}x>n^{-1},\ {\rm if}\ \lambda=% -1.roman_log ( italic_n + divide start_ARG 1 end_ARG start_ARG 2 end_ARG ) - roman_log ( italic_n - divide start_ARG 1 end_ARG start_ARG 2 end_ARG ) = ∫ start_POSTSUBSCRIPT italic_n - divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n + divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_d italic_x > italic_n start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT , roman_if italic_λ = - 1 .

Therefore, for λ[0,1]𝜆01\lambda\notin[0,1]italic_λ ∉ [ 0 , 1 ] and λ1𝜆1\lambda\neq-1italic_λ ≠ - 1,

j=1njλ<1λ+1j=1n[(j+12)λ+1(j12)λ+1]=1λ+1[(n+12)λ+1(12)λ+1],superscriptsubscript𝑗1𝑛superscript𝑗𝜆1𝜆1superscriptsubscript𝑗1𝑛delimited-[]superscript𝑗12𝜆1superscript𝑗12𝜆11𝜆1delimited-[]superscript𝑛12𝜆1superscript12𝜆1\displaystyle\sum_{j=1}^{n}j^{\lambda}<\dfrac{1}{\lambda+1}\sum_{j=1}^{n}\left% [\left(j+\frac{1}{2}\right)^{\lambda+1}-\left(j-\frac{1}{2}\right)^{\lambda+1}% \right]=\dfrac{1}{\lambda+1}\left[\left(n+\frac{1}{2}\right)^{\lambda+1}-\left% (\frac{1}{2}\right)^{\lambda+1}\right],∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_j start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT < divide start_ARG 1 end_ARG start_ARG italic_λ + 1 end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT [ ( italic_j + divide start_ARG 1 end_ARG start_ARG 2 end_ARG ) start_POSTSUPERSCRIPT italic_λ + 1 end_POSTSUPERSCRIPT - ( italic_j - divide start_ARG 1 end_ARG start_ARG 2 end_ARG ) start_POSTSUPERSCRIPT italic_λ + 1 end_POSTSUPERSCRIPT ] = divide start_ARG 1 end_ARG start_ARG italic_λ + 1 end_ARG [ ( italic_n + divide start_ARG 1 end_ARG start_ARG 2 end_ARG ) start_POSTSUPERSCRIPT italic_λ + 1 end_POSTSUPERSCRIPT - ( divide start_ARG 1 end_ARG start_ARG 2 end_ARG ) start_POSTSUPERSCRIPT italic_λ + 1 end_POSTSUPERSCRIPT ] ,

and for λ=1𝜆1\lambda=-1italic_λ = - 1,

j=1njλ<j=1n[log(j+12)log(j12)]=log(n+12)log(12).superscriptsubscript𝑗1𝑛superscript𝑗𝜆superscriptsubscript𝑗1𝑛delimited-[]𝑗12𝑗12𝑛1212\displaystyle\sum_{j=1}^{n}j^{\lambda}<\sum_{j=1}^{n}\left[\log\left(j+\frac{1% }{2}\right)-\log\left(j-\frac{1}{2}\right)\right]=\log\left(n+\frac{1}{2}% \right)-\log\left(\frac{1}{2}\right).∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_j start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT < ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT [ roman_log ( italic_j + divide start_ARG 1 end_ARG start_ARG 2 end_ARG ) - roman_log ( italic_j - divide start_ARG 1 end_ARG start_ARG 2 end_ARG ) ] = roman_log ( italic_n + divide start_ARG 1 end_ARG start_ARG 2 end_ARG ) - roman_log ( divide start_ARG 1 end_ARG start_ARG 2 end_ARG ) .

Proof of Proposition 1. We first the result with p=1𝑝1p=1italic_p = 1 and then p2𝑝2p\geq 2italic_p ≥ 2.

(i) When p=1𝑝1p=1italic_p = 1, we want to prove that for kk𝑘superscript𝑘k\neq k^{*}italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT

𝔼Tk(n)8νnΔk2+5𝔼subscript𝑇𝑘𝑛8subscript𝜈𝑛superscriptsubscriptΔ𝑘25\displaystyle{\mathbb{E}}T_{k}(n)\leq\frac{8\nu_{n}}{\Delta_{k}^{2}}+5blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ≤ divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 5 (21)

Let

j=max{j:Tk(j1)8νnΔk2}.superscript𝑗:𝑗subscript𝑇𝑘𝑗18subscript𝜈𝑛superscriptsubscriptΔ𝑘2\displaystyle j^{*}=\max\left\{j:T_{k}(j-1)\leq\frac{8\nu_{n}}{\Delta_{k}^{2}}% \right\}.italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = roman_max { italic_j : italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_j - 1 ) ≤ divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG } .

If j=nsuperscript𝑗𝑛j^{*}=nitalic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_n, then (21) is true. Therefore, we just have to consider j<nsuperscript𝑗𝑛j^{*}<nitalic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT < italic_n. Note that for j>j𝑗superscript𝑗j>j^{*}italic_j > italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, we have

Tk(j1)>8νnΔk2.subscript𝑇𝑘𝑗18subscript𝜈𝑛superscriptsubscriptΔ𝑘2\displaystyle T_{k}(j-1)>\frac{8\nu_{n}}{\Delta_{k}^{2}}.italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_j - 1 ) > divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG . (22)

For j>j𝑗superscript𝑗j>j^{*}italic_j > italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT and Ij=ksubscript𝐼𝑗𝑘I_{j}=kitalic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = italic_k, we claim that at least one of the two following inequalities holds:

X¯k[Tk(j1)]+cj,Tk(j1)subscript¯𝑋superscript𝑘delimited-[]subscript𝑇superscript𝑘𝑗1subscript𝑐𝑗subscript𝑇superscript𝑘𝑗1\displaystyle\bar{X}_{k^{*}}[T_{k^{*}}(j-1)]+c_{j,T_{k^{*}}(j-1)}over¯ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT [ italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_j - 1 ) ] + italic_c start_POSTSUBSCRIPT italic_j , italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_j - 1 ) end_POSTSUBSCRIPT \displaystyle\leq μk,subscript𝜇superscript𝑘\displaystyle\mu_{k^{*}},italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT , (23)
X¯k[Tk(j1)]cj,Tk(j1)subscript¯𝑋𝑘delimited-[]subscript𝑇𝑘𝑗1subscript𝑐𝑗subscript𝑇𝑘𝑗1\displaystyle\bar{X}_{k}[T_{k}(j-1)]-c_{j,T_{k}(j-1)}over¯ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT [ italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_j - 1 ) ] - italic_c start_POSTSUBSCRIPT italic_j , italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_j - 1 ) end_POSTSUBSCRIPT >\displaystyle>> μk,subscript𝜇𝑘\displaystyle\mu_{k},italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , (24)

where cj,s=2νj/ssubscript𝑐𝑗𝑠2subscript𝜈𝑗𝑠c_{j,s}=\sqrt{2\nu_{j}/s}italic_c start_POSTSUBSCRIPT italic_j , italic_s end_POSTSUBSCRIPT = square-root start_ARG 2 italic_ν start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT / italic_s end_ARG. In fact, by contradiction, if both the inequalities (23) and (24) are false when j>j𝑗superscript𝑗j>j^{*}italic_j > italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT and Ij=ksubscript𝐼𝑗𝑘I_{j}=kitalic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = italic_k, we have

X¯k[Tk(j1)]+cj,Tk(j1)>subscript¯𝑋superscript𝑘delimited-[]subscript𝑇superscript𝑘𝑗1subscript𝑐𝑗subscript𝑇superscript𝑘𝑗1absent\displaystyle\bar{X}_{k^{*}}[T_{k^{*}}(j-1)]+c_{j,T_{k^{*}}(j-1)}>over¯ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT [ italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_j - 1 ) ] + italic_c start_POSTSUBSCRIPT italic_j , italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_j - 1 ) end_POSTSUBSCRIPT > μk=μk+Δk>μk+22νnTk(j1)subscript𝜇superscript𝑘subscript𝜇𝑘subscriptΔ𝑘subscript𝜇𝑘22subscript𝜈𝑛subscript𝑇𝑘𝑗1\displaystyle\mu_{k^{*}}=\mu_{k}+\Delta_{k}>\mu_{k}+2\sqrt{\frac{2\nu_{n}}{T_{% k}(j-1)}}italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT > italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + 2 square-root start_ARG divide start_ARG 2 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_j - 1 ) end_ARG end_ARG
\displaystyle\geq μk+22νjTk(j1)X¯k[Tk(j1)]+cj,Tk(j1),subscript𝜇𝑘22subscript𝜈𝑗subscript𝑇𝑘𝑗1subscript¯𝑋𝑘delimited-[]subscript𝑇𝑘𝑗1subscript𝑐𝑗subscript𝑇𝑘𝑗1\displaystyle\mu_{k}+2\sqrt{\frac{2\nu_{j}}{T_{k}(j-1)}}\geq\bar{X}_{k}[T_{k}(% j-1)]+c_{j,T_{k}(j-1)},italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + 2 square-root start_ARG divide start_ARG 2 italic_ν start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_j - 1 ) end_ARG end_ARG ≥ over¯ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT [ italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_j - 1 ) ] + italic_c start_POSTSUBSCRIPT italic_j , italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_j - 1 ) end_POSTSUBSCRIPT ,

where the second inequality is due to (22), which contradicts with Ij=ksubscript𝐼𝑗𝑘I_{j}=kitalic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = italic_k.

In the following, we prove (21). Note that

Tk(n)=Tk(j)+j=j+1n𝟙{Ij=k}=Tk(j)+j=j+1n𝟙{Ij=k,(23) or (24) true}subscript𝑇𝑘𝑛subscript𝑇𝑘superscript𝑗superscriptsubscript𝑗superscript𝑗1𝑛1subscript𝐼𝑗𝑘subscript𝑇𝑘superscript𝑗superscriptsubscript𝑗superscript𝑗1𝑛1subscript𝐼𝑗𝑘(23) or (24) true\displaystyle T_{k}(n)=T_{k}(j^{*})+\sum_{j=j^{*}+1}^{n}\mathbbm{1}\{I_{j}=k\}% =T_{k}(j^{*})+\sum_{j=j^{*}+1}^{n}\mathbbm{1}\{I_{j}=k,\mbox{(\ref{eqn:Tk11}) % or (\ref{eqn:Tk12}) true}\}italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) = italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) + ∑ start_POSTSUBSCRIPT italic_j = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT blackboard_1 { italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = italic_k } = italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) + ∑ start_POSTSUBSCRIPT italic_j = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT blackboard_1 { italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = italic_k , ( ) or ( ) true }
Tk(j1)+1+j=j+1n𝟙{(23) or (24) true}8νnΔk2+1+j=j+1n𝟙{(23) or (24) true}.absentsubscript𝑇𝑘superscript𝑗11superscriptsubscript𝑗superscript𝑗1𝑛1(23) or (24) true8subscript𝜈𝑛superscriptsubscriptΔ𝑘21superscriptsubscript𝑗superscript𝑗1𝑛1(23) or (24) true\displaystyle\leq T_{k}(j^{*}-1)+1+\sum_{j=j^{*}+1}^{n}\mathbbm{1}\{\mbox{(% \ref{eqn:Tk11}) or (\ref{eqn:Tk12}) true}\}\leq\frac{8\nu_{n}}{\Delta_{k}^{2}}% +1+\sum_{j=j^{*}+1}^{n}\mathbbm{1}\{\mbox{(\ref{eqn:Tk11}) or (\ref{eqn:Tk12})% true}\}.≤ italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - 1 ) + 1 + ∑ start_POSTSUBSCRIPT italic_j = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT blackboard_1 { ( ) or ( ) true } ≤ divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 1 + ∑ start_POSTSUBSCRIPT italic_j = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT blackboard_1 { ( ) or ( ) true } . (25)

Taking expectation, it follows that

𝔼Tk(n)𝔼subscript𝑇𝑘𝑛absent\displaystyle{\mathbb{E}}T_{k}(n)\leqblackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ≤ 8νnΔk2+1+j=j+1n((23) or (24) true)8subscript𝜈𝑛superscriptsubscriptΔ𝑘21superscriptsubscript𝑗superscript𝑗1𝑛(23) or (24) true\displaystyle\frac{8\nu_{n}}{\Delta_{k}^{2}}+1+\sum_{j=j^{*}+1}^{n}{\mathbb{P}% }\left(\mbox{(\ref{eqn:Tk11}) or (\ref{eqn:Tk12}) true}\right)divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 1 + ∑ start_POSTSUBSCRIPT italic_j = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT blackboard_P ( ( ) or ( ) true )
\displaystyle\leq 8νnΔk2+1+j=j+1n[((23) true)+((24) true)].8subscript𝜈𝑛superscriptsubscriptΔ𝑘21superscriptsubscript𝑗superscript𝑗1𝑛delimited-[](23) true(24) true\displaystyle\frac{8\nu_{n}}{\Delta_{k}^{2}}+1+\sum_{j=j^{*}+1}^{n}\left[{% \mathbb{P}}\left(\mbox{(\ref{eqn:Tk11}) true}\right)+{\mathbb{P}}\left(\mbox{(% \ref{eqn:Tk12}) true}\right)\right].divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 1 + ∑ start_POSTSUBSCRIPT italic_j = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT [ blackboard_P ( ( ) true ) + blackboard_P ( ( ) true ) ] .

Furthermore, by Lemma A1,

((23) true)=(23) trueabsent\displaystyle{\mathbb{P}}\left(\mbox{(\ref{eqn:Tk11}) true}\right)=blackboard_P ( ( ) true ) = (X¯k[Tk(j1)]+cj,Tk(j1)μk)(mins=1,,j1X¯k[s]+cj,sμk)subscript¯𝑋superscript𝑘delimited-[]subscript𝑇superscript𝑘𝑗1subscript𝑐𝑗subscript𝑇superscript𝑘𝑗1subscript𝜇superscript𝑘subscript𝑠1𝑗1subscript¯𝑋superscript𝑘delimited-[]𝑠subscript𝑐𝑗𝑠subscript𝜇superscript𝑘\displaystyle{\mathbb{P}}\left(\bar{X}_{k^{*}}[T_{k^{*}}(j-1)]+c_{j,T_{k^{*}}(% j-1)}\leq\mu_{k^{*}}\right)\leq{\mathbb{P}}\left(\min_{s=1,...,j-1}\bar{X}_{k^% {*}}[s]+c_{j,s}\leq\mu_{k^{*}}\right)blackboard_P ( over¯ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT [ italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_j - 1 ) ] + italic_c start_POSTSUBSCRIPT italic_j , italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_j - 1 ) end_POSTSUBSCRIPT ≤ italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) ≤ blackboard_P ( roman_min start_POSTSUBSCRIPT italic_s = 1 , … , italic_j - 1 end_POSTSUBSCRIPT over¯ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT [ italic_s ] + italic_c start_POSTSUBSCRIPT italic_j , italic_s end_POSTSUBSCRIPT ≤ italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT )
=\displaystyle== (s=1j1X¯k[s]+cj,sμk)s=1j1(X¯k[s]+cj,sμk)superscriptsubscript𝑠1𝑗1subscript¯𝑋superscript𝑘delimited-[]𝑠subscript𝑐𝑗𝑠subscript𝜇superscript𝑘superscriptsubscript𝑠1𝑗1subscript¯𝑋superscript𝑘delimited-[]𝑠subscript𝑐𝑗𝑠subscript𝜇superscript𝑘\displaystyle{\mathbb{P}}\left(\cup_{s=1}^{j-1}\bar{X}_{k^{*}}[s]+c_{j,s}\leq% \mu_{k^{*}}\right)\leq\sum_{s=1}^{j-1}{\mathbb{P}}\left(\bar{X}_{k^{*}}[s]+c_{% j,s}\leq\mu_{k^{*}}\right)blackboard_P ( ∪ start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j - 1 end_POSTSUPERSCRIPT over¯ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT [ italic_s ] + italic_c start_POSTSUBSCRIPT italic_j , italic_s end_POSTSUBSCRIPT ≤ italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) ≤ ∑ start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j - 1 end_POSTSUPERSCRIPT blackboard_P ( over¯ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT [ italic_s ] + italic_c start_POSTSUBSCRIPT italic_j , italic_s end_POSTSUBSCRIPT ≤ italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT )
\displaystyle\leq s=1j1exp{(scj,s)22sγk2}=(j1)exp{νjγk2}<j3,superscriptsubscript𝑠1𝑗1superscript𝑠subscript𝑐𝑗𝑠22𝑠superscriptsubscript𝛾superscript𝑘2𝑗1subscript𝜈𝑗superscriptsubscript𝛾superscript𝑘2superscript𝑗3\displaystyle\sum_{s=1}^{j-1}\exp\left\{-\frac{(sc_{j,s})^{2}}{2s\gamma_{k^{*}% }^{2}}\right\}=(j-1)\exp\left\{-\frac{\nu_{j}}{\gamma_{k^{*}}^{2}}\right\}<j^{% -3},∑ start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j - 1 end_POSTSUPERSCRIPT roman_exp { - divide start_ARG ( italic_s italic_c start_POSTSUBSCRIPT italic_j , italic_s end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 2 italic_s italic_γ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG } = ( italic_j - 1 ) roman_exp { - divide start_ARG italic_ν start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG start_ARG italic_γ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG } < italic_j start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT , (26)

where the last inequality is due to the condition νn4γ¯2lognsubscript𝜈𝑛4superscript¯𝛾2𝑛\nu_{n}\geq 4\bar{\gamma}^{2}\log nitalic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ≥ 4 over¯ start_ARG italic_γ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_log italic_n. Similarly, ((24) true)(24) true{\mathbb{P}}\left(\mbox{(\ref{eqn:Tk12}) true}\right)blackboard_P ( ( ) true ) has the same bound. Therefore,

𝔼Tk(n)8νnΔk2+1+2j=j+1nj38νnΔk2+1+4(n+12)28νnΔk2+5,𝔼subscript𝑇𝑘𝑛8subscript𝜈𝑛superscriptsubscriptΔ𝑘212superscriptsubscript𝑗superscript𝑗1𝑛superscript𝑗38subscript𝜈𝑛superscriptsubscriptΔ𝑘214superscript𝑛1228subscript𝜈𝑛superscriptsubscriptΔ𝑘25\displaystyle{\mathbb{E}}T_{k}(n)\leq\frac{8\nu_{n}}{\Delta_{k}^{2}}+1+2\sum_{% j=j^{*}+1}^{n}j^{-3}\leq\frac{8\nu_{n}}{\Delta_{k}^{2}}+1+4-\left(n+\frac{1}{2% }\right)^{-2}\leq\frac{8\nu_{n}}{\Delta_{k}^{2}}+5,blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ≤ divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 1 + 2 ∑ start_POSTSUBSCRIPT italic_j = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_j start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT ≤ divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 1 + 4 - ( italic_n + divide start_ARG 1 end_ARG start_ARG 2 end_ARG ) start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT ≤ divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 5 ,

where the second inequality is by Lamma A3.

(ii) Next, we prove the result with p2𝑝2p\geq 2italic_p ≥ 2. Denote

Rj=𝟙{Ij=k},j=j+1,,n.formulae-sequencesubscript𝑅𝑗1subscript𝐼𝑗𝑘𝑗superscript𝑗1𝑛\displaystyle R_{j}=\mathbbm{1}\left\{I_{j}=k\right\},j=j^{*}+1,...,n.italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = blackboard_1 { italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = italic_k } , italic_j = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 , … , italic_n . (27)

Then, by (A.2), we have

Tk(n)8νnΔk2+1+j=j+1nRj.subscript𝑇𝑘𝑛8subscript𝜈𝑛superscriptsubscriptΔ𝑘21superscriptsubscript𝑗superscript𝑗1𝑛subscript𝑅𝑗\displaystyle T_{k}(n)\leq\frac{8\nu_{n}}{\Delta_{k}^{2}}+1+\sum_{j=j^{*}+1}^{% n}R_{j}.italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ≤ divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 1 + ∑ start_POSTSUBSCRIPT italic_j = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT . (28)

For any integer p2𝑝2p\geq 2italic_p ≥ 2, by Minkowski’s inequality, we have

𝔼Tk(n)p𝔼subscript𝑇𝑘superscript𝑛𝑝\displaystyle{\mathbb{E}}T_{k}(n)^{p}blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT \displaystyle\leq 𝔼(8νnΔk2+1+j=j+1nRj)p{8νnΔk2+1+[𝔼(j=j+1nRj)p]1p}p.𝔼superscript8subscript𝜈𝑛superscriptsubscriptΔ𝑘21superscriptsubscript𝑗superscript𝑗1𝑛subscript𝑅𝑗𝑝superscript8subscript𝜈𝑛superscriptsubscriptΔ𝑘21superscriptdelimited-[]𝔼superscriptsuperscriptsubscript𝑗superscript𝑗1𝑛subscript𝑅𝑗𝑝1𝑝𝑝\displaystyle{\mathbb{E}}\left(\frac{8\nu_{n}}{\Delta_{k}^{2}}+1+\sum_{j=j^{*}% +1}^{n}R_{j}\right)^{p}\leq\left\{\frac{8\nu_{n}}{\Delta_{k}^{2}}+1+\left[{% \mathbb{E}}\left(\sum_{j=j^{*}+1}^{n}R_{j}\right)^{p}\right]^{\frac{1}{p}}% \right\}^{p}.blackboard_E ( divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 1 + ∑ start_POSTSUBSCRIPT italic_j = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT ≤ { divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 1 + [ blackboard_E ( ∑ start_POSTSUBSCRIPT italic_j = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT ] start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG italic_p end_ARG end_POSTSUPERSCRIPT } start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT . (29)

In what follows, we analyze 𝔼(j=j+1nRj)p𝔼superscriptsuperscriptsubscript𝑗superscript𝑗1𝑛subscript𝑅𝑗𝑝{\mathbb{E}}\left(\sum_{j=j^{*}+1}^{n}R_{j}\right)^{p}blackboard_E ( ∑ start_POSTSUBSCRIPT italic_j = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT. Note that

(j=j+1nRj)p=j1,,jp{j+1,,n}Rj1Rjpsuperscriptsuperscriptsubscript𝑗superscript𝑗1𝑛subscript𝑅𝑗𝑝subscriptsubscript𝑗1subscript𝑗𝑝superscript𝑗1𝑛subscript𝑅subscript𝑗1subscript𝑅subscript𝑗𝑝\displaystyle\left(\sum_{j=j^{*}+1}^{n}R_{j}\right)^{p}=\sum_{j_{1},...,j_{p}% \in\{j^{*}+1,...,n\}}R_{j_{1}}\cdots R_{j_{p}}( ∑ start_POSTSUBSCRIPT italic_j = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT = ∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_j start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ∈ { italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 , … , italic_n } end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋯ italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_POSTSUBSCRIPT (30)
=\displaystyle== p!j1>>jpRj1Rjp+m=1p1i1++im=pi1,,im1(pi1,,im)j1>>jmRj1i1Rj2i2Rjmim𝑝subscriptsubscript𝑗1subscript𝑗𝑝subscript𝑅subscript𝑗1subscript𝑅subscript𝑗𝑝superscriptsubscript𝑚1𝑝1subscriptsubscript𝑖1subscript𝑖𝑚𝑝subscript𝑖1subscript𝑖𝑚1binomial𝑝subscript𝑖1subscript𝑖𝑚subscriptsubscript𝑗1subscript𝑗𝑚superscriptsubscript𝑅subscript𝑗1subscript𝑖1superscriptsubscript𝑅subscript𝑗2subscript𝑖2superscriptsubscript𝑅subscript𝑗𝑚subscript𝑖𝑚\displaystyle p!\sum_{j_{1}>\cdots>j_{p}}R_{j_{1}}\cdots R_{j_{p}}+\sum_{m=1}^% {p-1}\sum_{\begin{subarray}{l}i_{1}+\cdots+i_{m}=p\\ i_{1},...,i_{m}\geq 1\end{subarray}}\binom{p}{i_{1},...,i_{m}}\sum_{j_{1}>% \cdots>j_{m}}R_{j_{1}}^{i_{1}}R_{j_{2}}^{i_{2}}\cdots R_{j_{m}}^{i_{m}}italic_p ! ∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > ⋯ > italic_j start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋯ italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT italic_m = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT start_ARG start_ROW start_CELL italic_i start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + ⋯ + italic_i start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT = italic_p end_CELL end_ROW start_ROW start_CELL italic_i start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_i start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ≥ 1 end_CELL end_ROW end_ARG end_POSTSUBSCRIPT ( FRACOP start_ARG italic_p end_ARG start_ARG italic_i start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_i start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_ARG ) ∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > ⋯ > italic_j start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ⋯ italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUPERSCRIPT
=\displaystyle== p!j1>>jpRj1Rjp+m=1p1i1++im=pi1,,im1(pi1,,im)j1>>jmRj1Rj2Rjm𝑝subscriptsubscript𝑗1subscript𝑗𝑝subscript𝑅subscript𝑗1subscript𝑅subscript𝑗𝑝superscriptsubscript𝑚1𝑝1subscriptsubscript𝑖1subscript𝑖𝑚𝑝subscript𝑖1subscript𝑖𝑚1binomial𝑝subscript𝑖1subscript𝑖𝑚subscriptsubscript𝑗1subscript𝑗𝑚subscript𝑅subscript𝑗1subscript𝑅subscript𝑗2subscript𝑅subscript𝑗𝑚\displaystyle p!\sum_{j_{1}>\cdots>j_{p}}R_{j_{1}}\cdots R_{j_{p}}+\sum_{m=1}^% {p-1}\sum_{\begin{subarray}{l}i_{1}+\cdots+i_{m}=p\\ i_{1},...,i_{m}\geq 1\end{subarray}}\binom{p}{i_{1},...,i_{m}}\sum_{j_{1}>% \cdots>j_{m}}R_{j_{1}}R_{j_{2}}\cdots R_{j_{m}}italic_p ! ∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > ⋯ > italic_j start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋯ italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT italic_m = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT start_ARG start_ROW start_CELL italic_i start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + ⋯ + italic_i start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT = italic_p end_CELL end_ROW start_ROW start_CELL italic_i start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_i start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ≥ 1 end_CELL end_ROW end_ARG end_POSTSUBSCRIPT ( FRACOP start_ARG italic_p end_ARG start_ARG italic_i start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_i start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_ARG ) ∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > ⋯ > italic_j start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋯ italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUBSCRIPT

where (pi1,,im)=p!/(i1!im!)binomial𝑝subscript𝑖1subscript𝑖𝑚𝑝subscript𝑖1subscript𝑖𝑚\binom{p}{i_{1},...,i_{m}}=p!/(i_{1}!\cdots i_{m}!)( FRACOP start_ARG italic_p end_ARG start_ARG italic_i start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_i start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_ARG ) = italic_p ! / ( italic_i start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ! ⋯ italic_i start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ! ), and the last equality follows from the definition of Rjsubscript𝑅𝑗R_{j}italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT in (27).

Consider j1>>jmRj1Rjmsubscriptsubscript𝑗1subscript𝑗𝑚subscript𝑅subscript𝑗1subscript𝑅subscript𝑗𝑚\sum_{j_{1}>\cdots>j_{m}}R_{j_{1}}\cdots R_{j_{m}}∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > ⋯ > italic_j start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋯ italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUBSCRIPT when 2mp2𝑚𝑝2\leq m\leq p2 ≤ italic_m ≤ italic_p. By Lemma A3, for a positive integer λ𝜆\lambdaitalic_λ,

j=1njλ<1λ+1(n+1)λ+1.superscriptsubscript𝑗1𝑛superscript𝑗𝜆1𝜆1superscript𝑛1𝜆1\displaystyle\sum_{j=1}^{n}j^{\lambda}<\frac{1}{\lambda+1}\left(n+1\right)^{% \lambda+1}.∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_j start_POSTSUPERSCRIPT italic_λ end_POSTSUPERSCRIPT < divide start_ARG 1 end_ARG start_ARG italic_λ + 1 end_ARG ( italic_n + 1 ) start_POSTSUPERSCRIPT italic_λ + 1 end_POSTSUPERSCRIPT . (31)

Therefore, for 2mp2𝑚𝑝2\leq m\leq p2 ≤ italic_m ≤ italic_p, we have

j1>>jmRj1Rjm=j1>>jm1Rj1Rjm1jm=j+1jm11Rjmsubscriptsubscript𝑗1subscript𝑗𝑚subscript𝑅subscript𝑗1subscript𝑅subscript𝑗𝑚subscriptsubscript𝑗1subscript𝑗𝑚1subscript𝑅subscript𝑗1subscript𝑅subscript𝑗𝑚1superscriptsubscriptsubscript𝑗𝑚superscript𝑗1subscript𝑗𝑚11subscript𝑅subscript𝑗𝑚\displaystyle\sum_{j_{1}>\cdots>j_{m}}R_{j_{1}}\cdots R_{j_{m}}=\sum_{j_{1}>% \cdots>j_{m-1}}R_{j_{1}}\cdots R_{j_{m-1}}\sum_{j_{m}=j^{*}+1}^{j_{m-1}-1}R_{j% _{m}}∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > ⋯ > italic_j start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋯ italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > ⋯ > italic_j start_POSTSUBSCRIPT italic_m - 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋯ italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_m - 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j start_POSTSUBSCRIPT italic_m - 1 end_POSTSUBSCRIPT - 1 end_POSTSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUBSCRIPT
\displaystyle\leq j1>>jm1Rj1Rjm1jm1=j1>>jm2Rj1Rjm2jm1=j+1jm21Rjm1jm1subscriptsubscript𝑗1subscript𝑗𝑚1subscript𝑅subscript𝑗1subscript𝑅subscript𝑗𝑚1subscript𝑗𝑚1subscriptsubscript𝑗1subscript𝑗𝑚2subscript𝑅subscript𝑗1subscript𝑅subscript𝑗𝑚2superscriptsubscriptsubscript𝑗𝑚1superscript𝑗1subscript𝑗𝑚21subscript𝑅subscript𝑗𝑚1subscript𝑗𝑚1\displaystyle\sum_{j_{1}>\cdots>j_{m-1}}R_{j_{1}}\cdots R_{j_{m-1}}\cdot j_{m-% 1}=\sum_{j_{1}>\cdots>j_{m-2}}R_{j_{1}}\cdots R_{j_{m-2}}\sum_{j_{m-1}=j^{*}+1% }^{j_{m-2}-1}R_{j_{m-1}}\cdot j_{m-1}∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > ⋯ > italic_j start_POSTSUBSCRIPT italic_m - 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋯ italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_m - 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋅ italic_j start_POSTSUBSCRIPT italic_m - 1 end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > ⋯ > italic_j start_POSTSUBSCRIPT italic_m - 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋯ italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_m - 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_m - 1 end_POSTSUBSCRIPT = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j start_POSTSUBSCRIPT italic_m - 2 end_POSTSUBSCRIPT - 1 end_POSTSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_m - 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋅ italic_j start_POSTSUBSCRIPT italic_m - 1 end_POSTSUBSCRIPT
\displaystyle\leq j1>>jm2Rj1Rjm2jm1=j+1jm21jm112j1>>jm2Rj1Rjm2jm22,subscriptsubscript𝑗1subscript𝑗𝑚2subscript𝑅subscript𝑗1subscript𝑅subscript𝑗𝑚2superscriptsubscriptsubscript𝑗𝑚1superscript𝑗1subscript𝑗𝑚21subscript𝑗𝑚112subscriptsubscript𝑗1subscript𝑗𝑚2subscript𝑅subscript𝑗1subscript𝑅subscript𝑗𝑚2superscriptsubscript𝑗𝑚22\displaystyle\sum_{j_{1}>\cdots>j_{m-2}}R_{j_{1}}\cdots R_{j_{m-2}}\sum_{j_{m-% 1}=j^{*}+1}^{j_{m-2}-1}j_{m-1}\leq\frac{1}{2}\sum_{j_{1}>\cdots>j_{m-2}}R_{j_{% 1}}\cdots R_{j_{m-2}}\cdot j_{m-2}^{2},∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > ⋯ > italic_j start_POSTSUBSCRIPT italic_m - 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋯ italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_m - 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_m - 1 end_POSTSUBSCRIPT = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_j start_POSTSUBSCRIPT italic_m - 2 end_POSTSUBSCRIPT - 1 end_POSTSUPERSCRIPT italic_j start_POSTSUBSCRIPT italic_m - 1 end_POSTSUBSCRIPT ≤ divide start_ARG 1 end_ARG start_ARG 2 end_ARG ∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > ⋯ > italic_j start_POSTSUBSCRIPT italic_m - 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋯ italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_m - 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋅ italic_j start_POSTSUBSCRIPT italic_m - 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ,

where the first inequality follows from the fact that |Rj|1subscript𝑅𝑗1|R_{j}|\leq 1| italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | ≤ 1, and the last inequality follows from (31) by letting λ=1𝜆1\lambda=1italic_λ = 1 and n=jm21𝑛subscript𝑗𝑚21n=j_{m-2}-1italic_n = italic_j start_POSTSUBSCRIPT italic_m - 2 end_POSTSUBSCRIPT - 1.

Iteratively applying the above argument leads to

j1>>jmRj1Rjm1(m1)!j1=j+1nRj1j1m1.subscriptsubscript𝑗1subscript𝑗𝑚subscript𝑅subscript𝑗1subscript𝑅subscript𝑗𝑚1𝑚1superscriptsubscriptsubscript𝑗1superscript𝑗1𝑛subscript𝑅subscript𝑗1superscriptsubscript𝑗1𝑚1\displaystyle\sum_{j_{1}>\cdots>j_{m}}R_{j_{1}}\cdots R_{j_{m}}\leq\frac{1}{(m% -1)!}\sum_{j_{1}=j^{*}+1}^{n}R_{j_{1}}\cdot j_{1}^{m-1}.∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > ⋯ > italic_j start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋯ italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUBSCRIPT ≤ divide start_ARG 1 end_ARG start_ARG ( italic_m - 1 ) ! end_ARG ∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋅ italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m - 1 end_POSTSUPERSCRIPT . (32)

By (A.2) and (A.2), we know that

𝔼Rj1((23) true)+((24) true)2j13.𝔼subscript𝑅subscript𝑗1(23) true(24) true2superscriptsubscript𝑗13\displaystyle{\mathbb{E}}R_{j_{1}}\leq{\mathbb{P}}\left(\mbox{(\ref{eqn:Tk11})% true}\right)+{\mathbb{P}}\left(\mbox{(\ref{eqn:Tk12}) true}\right)\leq 2j_{1}% ^{-3}.blackboard_E italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ≤ blackboard_P ( ( ) true ) + blackboard_P ( ( ) true ) ≤ 2 italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT .

Hence, combining with (32), we have, for 2mp2𝑚𝑝2\leq m\leq p2 ≤ italic_m ≤ italic_p,

𝔼j1>>jmRj1Rjm1(m1)!j1=j+1nj1m1𝔼Rj12(m1)!j1=1nj1m4.𝔼subscriptsubscript𝑗1subscript𝑗𝑚subscript𝑅subscript𝑗1subscript𝑅subscript𝑗𝑚1𝑚1superscriptsubscriptsubscript𝑗1superscript𝑗1𝑛superscriptsubscript𝑗1𝑚1𝔼subscript𝑅subscript𝑗12𝑚1superscriptsubscriptsubscript𝑗11𝑛superscriptsubscript𝑗1𝑚4\displaystyle{\mathbb{E}}\sum_{j_{1}>\cdots>j_{m}}R_{j_{1}}\cdots R_{j_{m}}% \leq\frac{1}{(m-1)!}\sum_{j_{1}=j^{*}+1}^{n}j_{1}^{m-1}{\mathbb{E}}R_{j_{1}}% \leq\frac{2}{(m-1)!}\sum_{j_{1}=1}^{n}j_{1}^{m-4}.blackboard_E ∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > ⋯ > italic_j start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋯ italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUBSCRIPT ≤ divide start_ARG 1 end_ARG start_ARG ( italic_m - 1 ) ! end_ARG ∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m - 1 end_POSTSUPERSCRIPT blackboard_E italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ≤ divide start_ARG 2 end_ARG start_ARG ( italic_m - 1 ) ! end_ARG ∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m - 4 end_POSTSUPERSCRIPT .

By Lemma A3, we have

𝔼j1>>jmRj1Rjm{2(21n+12)ifm=2;log(n+12)+log2ifm=3;2(m1)!1m3(n+12)m3ifm4.𝔼subscriptsubscript𝑗1subscript𝑗𝑚subscript𝑅subscript𝑗1subscript𝑅subscript𝑗𝑚cases221𝑛12𝑖𝑓𝑚2𝑛122𝑖𝑓𝑚32𝑚11𝑚3superscript𝑛12𝑚3𝑖𝑓𝑚4\displaystyle{\mathbb{E}}\sum_{j_{1}>\cdots>j_{m}}R_{j_{1}}\cdots R_{j_{m}}% \leq\begin{cases}\displaystyle 2\left(2-\frac{1}{n+\frac{1}{2}}\right)&if\ m=2% ;\\ \displaystyle\log\left(n+\frac{1}{2}\right)+\log 2&if\ m=3;\\ \displaystyle\frac{2}{(m-1)!}\frac{1}{m-3}\left(n+\frac{1}{2}\right)^{m-3}&if% \ m\geq 4.\\ \end{cases}blackboard_E ∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > ⋯ > italic_j start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋯ italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUBSCRIPT ≤ { start_ROW start_CELL 2 ( 2 - divide start_ARG 1 end_ARG start_ARG italic_n + divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_ARG ) end_CELL start_CELL italic_i italic_f italic_m = 2 ; end_CELL end_ROW start_ROW start_CELL roman_log ( italic_n + divide start_ARG 1 end_ARG start_ARG 2 end_ARG ) + roman_log 2 end_CELL start_CELL italic_i italic_f italic_m = 3 ; end_CELL end_ROW start_ROW start_CELL divide start_ARG 2 end_ARG start_ARG ( italic_m - 1 ) ! end_ARG divide start_ARG 1 end_ARG start_ARG italic_m - 3 end_ARG ( italic_n + divide start_ARG 1 end_ARG start_ARG 2 end_ARG ) start_POSTSUPERSCRIPT italic_m - 3 end_POSTSUPERSCRIPT end_CELL start_CELL italic_i italic_f italic_m ≥ 4 . end_CELL end_ROW (33)

By (30) and (33), it follows that for p4𝑝4p\geq 4italic_p ≥ 4,

𝔼(j=j+1nRj)p𝔼superscriptsuperscriptsubscript𝑗superscript𝑗1𝑛subscript𝑅𝑗𝑝\displaystyle{\mathbb{E}}\left(\sum_{j=j^{*}+1}^{n}R_{j}\right)^{p}blackboard_E ( ∑ start_POSTSUBSCRIPT italic_j = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT =p!𝔼j1>>jpRj1Rjp+m=1p1i1++im=pi1,,im1(pi1,,im)𝔼j1>>jmRj1Rj2Rjmabsent𝑝𝔼subscriptsubscript𝑗1subscript𝑗𝑝subscript𝑅subscript𝑗1subscript𝑅subscript𝑗𝑝superscriptsubscript𝑚1𝑝1subscriptsubscript𝑖1subscript𝑖𝑚𝑝subscript𝑖1subscript𝑖𝑚1binomial𝑝subscript𝑖1subscript𝑖𝑚𝔼subscriptsubscript𝑗1subscript𝑗𝑚subscript𝑅subscript𝑗1subscript𝑅subscript𝑗2subscript𝑅subscript𝑗𝑚\displaystyle=p!\cdot{\mathbb{E}}\sum_{j_{1}>\cdots>j_{p}}R_{j_{1}}\cdots R_{j% _{p}}+\sum_{m=1}^{p-1}\sum_{\begin{subarray}{l}i_{1}+\cdots+i_{m}=p\\ i_{1},...,i_{m}\geq 1\end{subarray}}\binom{p}{i_{1},...,i_{m}}\cdot{\mathbb{E}% }\sum_{j_{1}>\cdots>j_{m}}R_{j_{1}}R_{j_{2}}\cdots R_{j_{m}}= italic_p ! ⋅ blackboard_E ∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > ⋯ > italic_j start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋯ italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT italic_m = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT start_ARG start_ROW start_CELL italic_i start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + ⋯ + italic_i start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT = italic_p end_CELL end_ROW start_ROW start_CELL italic_i start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_i start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ≥ 1 end_CELL end_ROW end_ARG end_POSTSUBSCRIPT ( FRACOP start_ARG italic_p end_ARG start_ARG italic_i start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_i start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_ARG ) ⋅ blackboard_E ∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > ⋯ > italic_j start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⋯ italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUBSCRIPT
2pp3(n+1)p3+O((n+1)p4).absent2𝑝𝑝3superscript𝑛1𝑝3𝑂superscript𝑛1𝑝4\displaystyle\leq\frac{2p}{p-3}(n+1)^{p-3}+O\left((n+1)^{p-4}\right).≤ divide start_ARG 2 italic_p end_ARG start_ARG italic_p - 3 end_ARG ( italic_n + 1 ) start_POSTSUPERSCRIPT italic_p - 3 end_POSTSUPERSCRIPT + italic_O ( ( italic_n + 1 ) start_POSTSUPERSCRIPT italic_p - 4 end_POSTSUPERSCRIPT ) .

Plugging the above into (29), we have, for p4𝑝4p\geq 4italic_p ≥ 4,

𝔼Tk(n)p{8νnΔk2+1+[2pp3(n+1)p3+O(np4)]1p}p.𝔼subscript𝑇𝑘superscript𝑛𝑝superscript8subscript𝜈𝑛superscriptsubscriptΔ𝑘21superscriptdelimited-[]2𝑝𝑝3superscript𝑛1𝑝3𝑂superscript𝑛𝑝41𝑝𝑝\displaystyle{\mathbb{E}}T_{k}(n)^{p}\leq\left\{\frac{8\nu_{n}}{\Delta_{k}^{2}% }+1+\left[\frac{2p}{p-3}(n+1)^{p-3}+O\left(n^{p-4}\right)\right]^{\frac{1}{p}}% \right\}^{p}.blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT ≤ { divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 1 + [ divide start_ARG 2 italic_p end_ARG start_ARG italic_p - 3 end_ARG ( italic_n + 1 ) start_POSTSUPERSCRIPT italic_p - 3 end_POSTSUPERSCRIPT + italic_O ( italic_n start_POSTSUPERSCRIPT italic_p - 4 end_POSTSUPERSCRIPT ) ] start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG italic_p end_ARG end_POSTSUPERSCRIPT } start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT .

Similarly, for p=3𝑝3p=3italic_p = 3, we have

𝔼(j=j+1nRj)3𝔼superscriptsuperscriptsubscript𝑗superscript𝑗1𝑛subscript𝑅𝑗3\displaystyle{\mathbb{E}}\left(\sum_{j=j^{*}+1}^{n}R_{j}\right)^{3}blackboard_E ( ∑ start_POSTSUBSCRIPT italic_j = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT =6[log(n+12)+log2]+O(1)6log(n+1)+O(1).absent6delimited-[]𝑛122𝑂16𝑛1𝑂1\displaystyle=6\left[\log\left(n+\frac{1}{2}\right)+\log 2\right]+O(1)\leq 6% \log\left(n+1\right)+O(1).= 6 [ roman_log ( italic_n + divide start_ARG 1 end_ARG start_ARG 2 end_ARG ) + roman_log 2 ] + italic_O ( 1 ) ≤ 6 roman_log ( italic_n + 1 ) + italic_O ( 1 ) .

Hence,

𝔼Tk(n)3{8νnΔk2+1+[6log(n+1)+O(1)]13}3.𝔼subscript𝑇𝑘superscript𝑛3superscript8subscript𝜈𝑛superscriptsubscriptΔ𝑘21superscriptdelimited-[]6𝑛1𝑂1133\displaystyle{\mathbb{E}}T_{k}(n)^{3}\leq\left\{\frac{8\nu_{n}}{\Delta_{k}^{2}% }+1+\left[6\log\left(n+1\right)+O(1)\right]^{\frac{1}{3}}\right\}^{3}.blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT ≤ { divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 1 + [ 6 roman_log ( italic_n + 1 ) + italic_O ( 1 ) ] start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 3 end_ARG end_POSTSUPERSCRIPT } start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT .

For p=2𝑝2p=2italic_p = 2, it follows from (31) and (33) that

𝔼(j=j+1nRj)2=j=j+1n𝔼Rj2+2𝔼j1>j2Rj1Rj2=j=j+1n𝔼Rj+2𝔼j1>j2Rj1Rj2𝔼superscriptsuperscriptsubscript𝑗superscript𝑗1𝑛subscript𝑅𝑗2superscriptsubscript𝑗superscript𝑗1𝑛𝔼superscriptsubscript𝑅𝑗22𝔼subscriptsubscript𝑗1subscript𝑗2subscript𝑅subscript𝑗1subscript𝑅subscript𝑗2superscriptsubscript𝑗superscript𝑗1𝑛𝔼subscript𝑅𝑗2𝔼subscriptsubscript𝑗1subscript𝑗2subscript𝑅subscript𝑗1subscript𝑅subscript𝑗2\displaystyle{\mathbb{E}}\left(\sum_{j=j^{*}+1}^{n}R_{j}\right)^{2}=\sum_{j=j^% {*}+1}^{n}{\mathbb{E}}R_{j}^{2}+2{\mathbb{E}}\sum_{j_{1}>j_{2}}R_{j_{1}}R_{j_{% 2}}=\sum_{j=j^{*}+1}^{n}{\mathbb{E}}R_{j}+2{\mathbb{E}}\sum_{j_{1}>j_{2}}R_{j_% {1}}R_{j_{2}}blackboard_E ( ∑ start_POSTSUBSCRIPT italic_j = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = ∑ start_POSTSUBSCRIPT italic_j = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT blackboard_E italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 2 blackboard_E ∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > italic_j start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_j = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT blackboard_E italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + 2 blackboard_E ∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > italic_j start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT
j1=j+1n2j13+4[21n+12]<4(n+12)2+4[21n+12]<12,absentsuperscriptsubscriptsubscript𝑗1superscript𝑗1𝑛2superscriptsubscript𝑗134delimited-[]21𝑛124superscript𝑛1224delimited-[]21𝑛1212\displaystyle\leq\sum_{j_{1}=j^{*}+1}^{n}2j_{1}^{-3}+4\left[2-\frac{1}{n+\frac% {1}{2}}\right]<4-\left(n+\frac{1}{2}\right)^{-2}+4\left[2-\frac{1}{n+\frac{1}{% 2}}\right]<12,≤ ∑ start_POSTSUBSCRIPT italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = italic_j start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT 2 italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT + 4 [ 2 - divide start_ARG 1 end_ARG start_ARG italic_n + divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_ARG ] < 4 - ( italic_n + divide start_ARG 1 end_ARG start_ARG 2 end_ARG ) start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT + 4 [ 2 - divide start_ARG 1 end_ARG start_ARG italic_n + divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_ARG ] < 12 ,

and thus

𝔼Tk(n)2{8νnΔk2+1+12}2<{8νnΔk2+5}2.𝔼subscript𝑇𝑘superscript𝑛2superscript8subscript𝜈𝑛superscriptsubscriptΔ𝑘21122superscript8subscript𝜈𝑛superscriptsubscriptΔ𝑘252\displaystyle{\mathbb{E}}T_{k}(n)^{2}\leq\left\{\frac{8\nu_{n}}{\Delta_{k}^{2}% }+1+\sqrt{12}\right\}^{2}<\left\{\frac{8\nu_{n}}{\Delta_{k}^{2}}+5\right\}^{2}.blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ { divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 1 + square-root start_ARG 12 end_ARG } start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT < { divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 5 } start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT .

The proof is complete.

A.3 Proof of Theorem 1

When kk𝑘superscript𝑘k\neq k^{*}italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, by Markov’s inequality and Proposition 1, for any ε>0𝜀0\varepsilon>0italic_ε > 0 and p3𝑝3p\geq 3italic_p ≥ 3,

(Tk(n)n>ε)1εp𝔼[Tk(n)n]psubscript𝑇𝑘𝑛𝑛𝜀1superscript𝜀𝑝𝔼superscriptdelimited-[]subscript𝑇𝑘𝑛𝑛𝑝\displaystyle{\mathbb{P}}\left(\frac{T_{k}(n)}{n}>\varepsilon\right)\leq\frac{% 1}{\varepsilon^{p}}{\mathbb{E}}\left[\frac{T_{k}(n)}{n}\right]^{p}blackboard_P ( divide start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG italic_n end_ARG > italic_ε ) ≤ divide start_ARG 1 end_ARG start_ARG italic_ε start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT end_ARG blackboard_E [ divide start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG italic_n end_ARG ] start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT \displaystyle\leq 1εp{8Δk2νnn+1n+[2pp31n3+O(n4)]1p}p1superscript𝜀𝑝superscript8superscriptsubscriptΔ𝑘2subscript𝜈𝑛𝑛1𝑛superscriptdelimited-[]2𝑝𝑝31superscript𝑛3𝑂superscript𝑛41𝑝𝑝\displaystyle\frac{1}{\varepsilon^{p}}\left\{\frac{8}{\Delta_{k}^{2}}\frac{\nu% _{n}}{n}+\frac{1}{n}+\left[\frac{2p}{p-3}\frac{1}{n^{3}}+O\left(n^{-4}\right)% \right]^{\frac{1}{p}}\right\}^{p}divide start_ARG 1 end_ARG start_ARG italic_ε start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT end_ARG { divide start_ARG 8 end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG divide start_ARG italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG italic_n end_ARG + divide start_ARG 1 end_ARG start_ARG italic_n end_ARG + [ divide start_ARG 2 italic_p end_ARG start_ARG italic_p - 3 end_ARG divide start_ARG 1 end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT end_ARG + italic_O ( italic_n start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT ) ] start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG italic_p end_ARG end_POSTSUPERSCRIPT } start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT
\displaystyle\leq 2p1εp[(8Δk2νnn+1n)p+2pp31n3+O(n4)].superscript2𝑝1superscript𝜀𝑝delimited-[]superscript8superscriptsubscriptΔ𝑘2subscript𝜈𝑛𝑛1𝑛𝑝2𝑝𝑝31superscript𝑛3𝑂superscript𝑛4\displaystyle\frac{2^{p-1}}{\varepsilon^{p}}\left[\left(\frac{8}{\Delta_{k}^{2% }}\frac{\nu_{n}}{n}+\frac{1}{n}\right)^{p}+\frac{2p}{p-3}\frac{1}{n^{3}}+O% \left(n^{-4}\right)\right].divide start_ARG 2 start_POSTSUPERSCRIPT italic_p - 1 end_POSTSUPERSCRIPT end_ARG start_ARG italic_ε start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT end_ARG [ ( divide start_ARG 8 end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG divide start_ARG italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG italic_n end_ARG + divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ) start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT + divide start_ARG 2 italic_p end_ARG start_ARG italic_p - 3 end_ARG divide start_ARG 1 end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT end_ARG + italic_O ( italic_n start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT ) ] .

Because νnn1δsubscript𝜈𝑛superscript𝑛1𝛿\nu_{n}\leq n^{1-\delta}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ≤ italic_n start_POSTSUPERSCRIPT 1 - italic_δ end_POSTSUPERSCRIPT, it follows that for p4𝑝4p\geq 4italic_p ≥ 4,

(Tk(n)n>ε)subscript𝑇𝑘𝑛𝑛𝜀\displaystyle{\mathbb{P}}\left(\frac{T_{k}(n)}{n}>\varepsilon\right)blackboard_P ( divide start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG italic_n end_ARG > italic_ε ) \displaystyle\leq 2p1εp[(8Δk2nδ+1n)p+2pp31n3+O(n4)]superscript2𝑝1superscript𝜀𝑝delimited-[]superscript8superscriptsubscriptΔ𝑘2superscript𝑛𝛿1𝑛𝑝2𝑝𝑝31superscript𝑛3𝑂superscript𝑛4\displaystyle\frac{2^{p-1}}{\varepsilon^{p}}\left[\left(\frac{8}{\Delta_{k}^{2% }n^{\delta}}+\frac{1}{n}\right)^{p}+\frac{2p}{p-3}\frac{1}{n^{3}}+O\left(n^{-4% }\right)\right]divide start_ARG 2 start_POSTSUPERSCRIPT italic_p - 1 end_POSTSUPERSCRIPT end_ARG start_ARG italic_ε start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT end_ARG [ ( divide start_ARG 8 end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_n start_POSTSUPERSCRIPT italic_δ end_POSTSUPERSCRIPT end_ARG + divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ) start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT + divide start_ARG 2 italic_p end_ARG start_ARG italic_p - 3 end_ARG divide start_ARG 1 end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT end_ARG + italic_O ( italic_n start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT ) ] (34)
\displaystyle\leq 2p1εp[2p1(8Δk2npδ+1np)+2pp31n3+O(n4)].superscript2𝑝1superscript𝜀𝑝delimited-[]superscript2𝑝18superscriptsubscriptΔ𝑘2superscript𝑛𝑝𝛿1superscript𝑛𝑝2𝑝𝑝31superscript𝑛3𝑂superscript𝑛4\displaystyle\frac{2^{p-1}}{\varepsilon^{p}}\left[2^{p-1}\left(\frac{8}{\Delta% _{k}^{2}n^{p\delta}}+\frac{1}{n^{p}}\right)+\frac{2p}{p-3}\frac{1}{n^{3}}+O% \left(n^{-4}\right)\right].divide start_ARG 2 start_POSTSUPERSCRIPT italic_p - 1 end_POSTSUPERSCRIPT end_ARG start_ARG italic_ε start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT end_ARG [ 2 start_POSTSUPERSCRIPT italic_p - 1 end_POSTSUPERSCRIPT ( divide start_ARG 8 end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_n start_POSTSUPERSCRIPT italic_p italic_δ end_POSTSUPERSCRIPT end_ARG + divide start_ARG 1 end_ARG start_ARG italic_n start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT end_ARG ) + divide start_ARG 2 italic_p end_ARG start_ARG italic_p - 3 end_ARG divide start_ARG 1 end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT end_ARG + italic_O ( italic_n start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT ) ] .

Let p>max{1/δ,4}𝑝1𝛿4p>\max\{1/\delta,4\}italic_p > roman_max { 1 / italic_δ , 4 }. Then, for any ϵ>0italic-ϵ0\epsilon>0italic_ϵ > 0,

n=1(Tk(n)n>ε)<.superscriptsubscript𝑛1subscript𝑇𝑘𝑛𝑛𝜀\displaystyle\sum_{n=1}^{\infty}{\mathbb{P}}\left(\frac{T_{k}(n)}{n}>% \varepsilon\right)<\infty.∑ start_POSTSUBSCRIPT italic_n = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT blackboard_P ( divide start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG italic_n end_ARG > italic_ε ) < ∞ .

By Borel-Cantelli lemma, Tk(n)/na.s.0T_{k}(n)/n\overset{a.s.}{\longrightarrow}0italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) / italic_n start_OVERACCENT italic_a . italic_s . end_OVERACCENT start_ARG ⟶ end_ARG 0 as n𝑛n\to\inftyitalic_n → ∞, for kk𝑘superscript𝑘k\neq k^{*}italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, and thus

Tk(n)n=1kkTk(n)na.s.1.\displaystyle\frac{T_{k^{*}}(n)}{n}=1-\sum_{k\neq k^{*}}\frac{T_{k}(n)}{n}% \overset{a.s.}{\longrightarrow}1.divide start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG italic_n end_ARG = 1 - ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG italic_n end_ARG start_OVERACCENT italic_a . italic_s . end_OVERACCENT start_ARG ⟶ end_ARG 1 .

A.4 Proof of Theorem 2

It follows from Theorem 1 that

(ωΩ:limnTk(n)(ω)n=0,kk)=1.\displaystyle{\mathbb{P}}\left(\omega\in\Omega:\lim_{n\to\infty}\frac{T_{k}(n)% (\omega)}{n}=0,\ \forall k\neq k^{*}\right)=1.blackboard_P ( italic_ω ∈ roman_Ω : roman_lim start_POSTSUBSCRIPT italic_n → ∞ end_POSTSUBSCRIPT divide start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ( italic_ω ) end_ARG start_ARG italic_n end_ARG = 0 , ∀ italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) = 1 .

Because k=1KTk(n)=nsuperscriptsubscript𝑘1𝐾subscript𝑇𝑘𝑛𝑛\sum_{k=1}^{K}T_{k}(n)=n∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) = italic_n, the event limnTk(n)(ω)n=0,kkformulae-sequencesubscript𝑛subscript𝑇𝑘𝑛𝜔𝑛0for-all𝑘superscript𝑘\lim_{n\to\infty}\frac{T_{k}(n)(\omega)}{n}=0,\ \forall k\neq k^{*}roman_lim start_POSTSUBSCRIPT italic_n → ∞ end_POSTSUBSCRIPT divide start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ( italic_ω ) end_ARG start_ARG italic_n end_ARG = 0 , ∀ italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT implies that limnTk(n)n=1subscript𝑛subscript𝑇superscript𝑘𝑛𝑛1\lim_{n\to\infty}\frac{T_{k^{*}}(n)}{n}=1roman_lim start_POSTSUBSCRIPT italic_n → ∞ end_POSTSUBSCRIPT divide start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG italic_n end_ARG = 1. Therefore,

(ωΩ:limnTk(n)(ω)n=0,kk,andlimnTk(n)(ω)n=1)=1.\displaystyle{\mathbb{P}}\left(\omega\in\Omega:\lim_{n\to\infty}\frac{T_{k}(n)% (\omega)}{n}=0,\ \forall k\neq k^{*},\ {\rm and}\ \lim_{n\to\infty}\frac{T_{k^% {*}}(n)(\omega)}{n}=1\right)=1.blackboard_P ( italic_ω ∈ roman_Ω : roman_lim start_POSTSUBSCRIPT italic_n → ∞ end_POSTSUBSCRIPT divide start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ( italic_ω ) end_ARG start_ARG italic_n end_ARG = 0 , ∀ italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , roman_and roman_lim start_POSTSUBSCRIPT italic_n → ∞ end_POSTSUBSCRIPT divide start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) ( italic_ω ) end_ARG start_ARG italic_n end_ARG = 1 ) = 1 .

Then, (A)=1𝐴1{\mathbb{P}}(A)=1blackboard_P ( italic_A ) = 1, where

A={ωΩ:limn[Tk(n)(ω)nTk(n)(ω)n]=1,kk}.𝐴conditional-set𝜔Ωformulae-sequencesubscript𝑛delimited-[]subscript𝑇superscript𝑘𝑛𝜔𝑛subscript𝑇𝑘𝑛𝜔𝑛1for-all𝑘superscript𝑘\displaystyle A=\left\{\omega\in\Omega:\lim_{n\to\infty}\left[\frac{T_{k^{*}}(% n)(\omega)}{n}-\frac{T_{k}(n)(\omega)}{n}\right]=1,\ \forall k\neq k^{*}\right\}.italic_A = { italic_ω ∈ roman_Ω : roman_lim start_POSTSUBSCRIPT italic_n → ∞ end_POSTSUBSCRIPT [ divide start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) ( italic_ω ) end_ARG start_ARG italic_n end_ARG - divide start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ( italic_ω ) end_ARG start_ARG italic_n end_ARG ] = 1 , ∀ italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT } .

In other words, for any ωA𝜔𝐴\omega\in Aitalic_ω ∈ italic_A and 0<ε<10𝜀10<\varepsilon<10 < italic_ε < 1, there exists N=N(ω,ε)𝑁𝑁𝜔𝜀N=N(\omega,\varepsilon)\in{\mathbb{N}}italic_N = italic_N ( italic_ω , italic_ε ) ∈ blackboard_N, such that for any n>N𝑛𝑁n>Nitalic_n > italic_N,

0<1ε<Tk(n)(ω)nTk(n)(ω)n<1+ε,01𝜀subscript𝑇superscript𝑘𝑛𝜔𝑛subscript𝑇𝑘𝑛𝜔𝑛1𝜀\displaystyle 0<1-\varepsilon<\frac{T_{k^{*}}(n)(\omega)}{n}-\frac{T_{k}(n)(% \omega)}{n}<1+\varepsilon,0 < 1 - italic_ε < divide start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) ( italic_ω ) end_ARG start_ARG italic_n end_ARG - divide start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ( italic_ω ) end_ARG start_ARG italic_n end_ARG < 1 + italic_ε ,

which implies

max{T1(n)(ω),,TK(n)(ω)}=Tk(n)(ω).subscript𝑇1𝑛𝜔subscript𝑇𝐾𝑛𝜔subscript𝑇superscript𝑘𝑛𝜔\displaystyle\max\left\{{T_{1}(n)(\omega)},...,{T_{K}(n)(\omega)}\right\}={T_{% k^{*}}(n)(\omega)}.roman_max { italic_T start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_n ) ( italic_ω ) , … , italic_T start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ( italic_n ) ( italic_ω ) } = italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) ( italic_ω ) .

By the uniqueness assumption of ksuperscript𝑘k^{*}italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, we have

(ωΩ:0<ε<1,N=N(ω,ε),s.t.n>N,In(ω)=k)=1,\displaystyle{\mathbb{P}}\left(\omega\in\Omega:\forall 0<\varepsilon<1,\exists N% =N(\omega,\varepsilon)\in{\mathbb{N}},s.t.\ \forall n>N,I^{*}_{n}(\omega)=k^{*% }\right)=1,blackboard_P ( italic_ω ∈ roman_Ω : ∀ 0 < italic_ε < 1 , ∃ italic_N = italic_N ( italic_ω , italic_ε ) ∈ blackboard_N , italic_s . italic_t . ∀ italic_n > italic_N , italic_I start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_ω ) = italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) = 1 , (35)

i.e., Ina.s.kI^{*}_{n}\overset{a.s.}{\longrightarrow}k^{*}italic_I start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_OVERACCENT italic_a . italic_s . end_OVERACCENT start_ARG ⟶ end_ARG italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT as n𝑛n\rightarrow\inftyitalic_n → ∞.

A.5 Proof of Theorem 4

We analyze the bias and variance of M~nsubscript~𝑀𝑛\widetilde{M}_{n}over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT separately.

The Bias. We assert that Tk(n)subscript𝑇𝑘𝑛T_{k}(n)italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) is a stopping time. In fact, given the filtration generated by {Xij,ik,j=1,2,}formulae-sequencesubscript𝑋𝑖𝑗𝑖𝑘𝑗12\{X_{ij},i\neq k,j=1,2,\dots\}{ italic_X start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT , italic_i ≠ italic_k , italic_j = 1 , 2 , … }, denoted by σ(Xij,ik,j=1,2,)\sigma\left(X_{ij},i\neq k,j=1,2,\dots\right)italic_σ ( italic_X start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT , italic_i ≠ italic_k , italic_j = 1 , 2 , … ), one has

{Tk(n)m}={Tk(n)m1}cσ(Xk1,,Xk,m1).subscript𝑇𝑘𝑛𝑚superscriptsubscript𝑇𝑘𝑛𝑚1𝑐𝜎subscript𝑋𝑘1subscript𝑋𝑘𝑚1\{T_{k}(n)\geq m\}=\{T_{k}(n)\leq m-1\}^{c}\in\sigma\left(X_{k1},\dots,X_{k,m-% 1}\right).{ italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ≥ italic_m } = { italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ≤ italic_m - 1 } start_POSTSUPERSCRIPT italic_c end_POSTSUPERSCRIPT ∈ italic_σ ( italic_X start_POSTSUBSCRIPT italic_k 1 end_POSTSUBSCRIPT , … , italic_X start_POSTSUBSCRIPT italic_k , italic_m - 1 end_POSTSUBSCRIPT ) .

Hence Tk(n)subscript𝑇𝑘𝑛T_{k}(n)italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) is a stopping time with respect to {Xkj,j1}subscript𝑋𝑘𝑗𝑗1\{X_{kj},j\geq 1\}{ italic_X start_POSTSUBSCRIPT italic_k italic_j end_POSTSUBSCRIPT , italic_j ≥ 1 }. Then by Wald’s equation,

𝔼[j=1Tk(n)Xkj]=μk𝔼[Tk(n)],𝔼delimited-[]superscriptsubscript𝑗1subscript𝑇𝑘𝑛subscript𝑋𝑘𝑗subscript𝜇𝑘𝔼delimited-[]subscript𝑇𝑘𝑛\displaystyle{\mathbb{E}}\left[\sum_{j=1}^{T_{k}(n)}X_{kj}\right]=\mu_{k}{% \mathbb{E}}\left[T_{k}(n)\right],blackboard_E [ ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_k italic_j end_POSTSUBSCRIPT ] = italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT blackboard_E [ italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ] ,

leading to

𝔼M~n=1nk=1Kμk𝔼[Tk(n)].𝔼subscript~𝑀𝑛1𝑛superscriptsubscript𝑘1𝐾subscript𝜇𝑘𝔼delimited-[]subscript𝑇𝑘𝑛{\mathbb{E}}\widetilde{M}_{n}={1\over n}\sum_{k=1}^{K}\mu_{k}{\mathbb{E}}\left% [T_{k}(n)\right].blackboard_E over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT blackboard_E [ italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ] .

Thus,

Bias(M~n)Biassubscript~𝑀𝑛\displaystyle{\rm Bias}\left(\widetilde{M}_{n}\right)roman_Bias ( over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) =\displaystyle== 𝔼M~nμ=(1+𝔼Tk(n)n)μk+kk𝔼Tk(n)nμk𝔼subscript~𝑀𝑛superscript𝜇1𝔼subscript𝑇superscript𝑘𝑛𝑛subscript𝜇superscript𝑘subscript𝑘superscript𝑘𝔼subscript𝑇𝑘𝑛𝑛subscript𝜇𝑘\displaystyle{\mathbb{E}}\widetilde{M}_{n}-\mu^{*}=\left(-1+\frac{{\mathbb{E}}% T_{k^{*}}(n)}{n}\right)\mu_{k^{*}}+\sum_{k\neq k^{*}}\frac{{\mathbb{E}}T_{k}(n% )}{n}\mu_{k}blackboard_E over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = ( - 1 + divide start_ARG blackboard_E italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG italic_n end_ARG ) italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG italic_n end_ARG italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT
=\displaystyle== kk𝔼Tk(n)nμk+kk𝔼Tk(n)nμk=kk𝔼Tk(n)n(μkμk).subscript𝑘superscript𝑘𝔼subscript𝑇𝑘𝑛𝑛subscript𝜇superscript𝑘subscript𝑘superscript𝑘𝔼subscript𝑇𝑘𝑛𝑛subscript𝜇𝑘subscript𝑘superscript𝑘𝔼subscript𝑇𝑘𝑛𝑛subscript𝜇𝑘subscript𝜇superscript𝑘\displaystyle-\sum_{k\neq k^{*}}\frac{{\mathbb{E}}T_{k}(n)}{n}\mu_{k^{*}}+\sum% _{k\neq k^{*}}\frac{{\mathbb{E}}T_{k}(n)}{n}\mu_{k}=\sum_{k\neq k^{*}}\frac{{% \mathbb{E}}T_{k}(n)}{n}\left(\mu_{k}-\mu_{k^{*}}\right).- ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG italic_n end_ARG italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG italic_n end_ARG italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG italic_n end_ARG ( italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) .

Because μk<μksubscript𝜇𝑘subscript𝜇superscript𝑘\mu_{k}<\mu_{k^{*}}italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT < italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT for kk𝑘superscript𝑘k\neq k^{*}italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, we have Bias(M~n)<0Biassubscript~𝑀𝑛0{\rm Bias}\left(\widetilde{M}_{n}\right)<0roman_Bias ( over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) < 0.

By Proposition 1,

|Bias(M~In)|kk1n(8νnΔk2+5)(μkμk)kk1n(8νnΔk2+5Δk)=O(νnn).Biassubscript~𝑀superscriptsubscript𝐼𝑛subscript𝑘superscript𝑘1𝑛8subscript𝜈𝑛superscriptsubscriptΔ𝑘25subscript𝜇superscript𝑘subscript𝜇𝑘subscript𝑘superscript𝑘1𝑛8subscript𝜈𝑛superscriptsubscriptΔ𝑘25subscriptΔ𝑘𝑂subscript𝜈𝑛𝑛\displaystyle\left|{\rm Bias}\left(\widetilde{M}_{I_{n}^{*}}\right)\right|\leq% \sum_{k\neq k^{*}}\frac{1}{n}\left(\frac{8\nu_{n}}{\Delta_{k}^{2}}+5\right)% \left(\mu_{k^{*}}-\mu_{k}\right)\leq\sum_{k\neq k^{*}}\frac{1}{n}\left(\frac{8% \nu_{n}}{\Delta_{k}^{2}}+5\Delta_{k}\right)=O\left(\frac{\nu_{n}}{n}\right).| roman_Bias ( over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) | ≤ ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ( divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 5 ) ( italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ≤ ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ( divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 5 roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) = italic_O ( divide start_ARG italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG italic_n end_ARG ) . (36)

The Variance. Note that

Var[M~n]=𝔼(M~n𝔼[M~n])2Vardelimited-[]subscript~𝑀𝑛𝔼superscriptsubscript~𝑀𝑛𝔼delimited-[]subscript~𝑀𝑛2\displaystyle{\rm Var}\left[\widetilde{M}_{n}\right]={\mathbb{E}}\left(% \widetilde{M}_{n}-{\mathbb{E}}\left[\widetilde{M}_{n}\right]\right)^{2}roman_Var [ over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] = blackboard_E ( over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - blackboard_E [ over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT (37)
=\displaystyle== 𝔼{(M~n1nk=1KμkTk(n))+(1nk=1KμkTk(n)1nk=1Kμk𝔼Tk(n))}2𝔼superscriptsubscript~𝑀𝑛1𝑛superscriptsubscript𝑘1𝐾subscript𝜇𝑘subscript𝑇𝑘𝑛1𝑛superscriptsubscript𝑘1𝐾subscript𝜇𝑘subscript𝑇𝑘𝑛1𝑛superscriptsubscript𝑘1𝐾subscript𝜇𝑘𝔼subscript𝑇𝑘𝑛2\displaystyle{\mathbb{E}}\left\{\left(\widetilde{M}_{n}-{1\over n}\sum_{k=1}^{% K}\mu_{k}T_{k}(n)\right)+\left({1\over n}\sum_{k=1}^{K}\mu_{k}T_{k}(n)-{1\over n% }\sum_{k=1}^{K}\mu_{k}{\mathbb{E}}T_{k}(n)\right)\right\}^{2}blackboard_E { ( over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ) + ( divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) - divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ) } start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT
\displaystyle\leq 2𝔼(M~n1nk=1KμkTk(n))2+2𝔼(1nk=1KμkTk(n)1nk=1Kμk𝔼Tk(n))2.2𝔼superscriptsubscript~𝑀𝑛1𝑛superscriptsubscript𝑘1𝐾subscript𝜇𝑘subscript𝑇𝑘𝑛22𝔼superscript1𝑛superscriptsubscript𝑘1𝐾subscript𝜇𝑘subscript𝑇𝑘𝑛1𝑛superscriptsubscript𝑘1𝐾subscript𝜇𝑘𝔼subscript𝑇𝑘𝑛2\displaystyle 2{\mathbb{E}}\left(\widetilde{M}_{n}-{1\over n}\sum_{k=1}^{K}\mu% _{k}T_{k}(n)\right)^{2}+2{\mathbb{E}}\left({1\over n}\sum_{k=1}^{K}\mu_{k}T_{k% }(n)-{1\over n}\sum_{k=1}^{K}\mu_{k}{\mathbb{E}}T_{k}(n)\right)^{2}.2 blackboard_E ( over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 2 blackboard_E ( divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) - divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT .

We analyze the two terms on the RHS of (37) separately. By the Wald’s Lemma (see Theorem 13.2.14 in Athreya and Lahiri 2006),

𝔼(j=1Tk(n)XkjμkTk(n))2=σk2𝔼Tk(n).𝔼superscriptsuperscriptsubscript𝑗1subscript𝑇𝑘𝑛subscript𝑋𝑘𝑗subscript𝜇𝑘subscript𝑇𝑘𝑛2superscriptsubscript𝜎𝑘2𝔼subscript𝑇𝑘𝑛\displaystyle{\mathbb{E}}\left(\sum_{j=1}^{T_{k}(n)}X_{kj}-\mu_{k}T_{k}(n)% \right)^{2}=\sigma_{k}^{2}{\mathbb{E}}T_{k}(n).blackboard_E ( ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_k italic_j end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = italic_σ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) . (38)

Then, by (7) and Cauchy-Schwarz inequality, it can be seen that

𝔼(M~n1nk=1KμkTk(n))2Kn2k=1K𝔼(j=1Tk(n)XkjμkTk(n))2=Kn2k=1Kσk2𝔼Tk(n)𝔼superscriptsubscript~𝑀𝑛1𝑛superscriptsubscript𝑘1𝐾subscript𝜇𝑘subscript𝑇𝑘𝑛2𝐾superscript𝑛2superscriptsubscript𝑘1𝐾𝔼superscriptsuperscriptsubscript𝑗1subscript𝑇𝑘𝑛subscript𝑋𝑘𝑗subscript𝜇𝑘subscript𝑇𝑘𝑛2𝐾superscript𝑛2superscriptsubscript𝑘1𝐾superscriptsubscript𝜎𝑘2𝔼subscript𝑇𝑘𝑛\displaystyle{\mathbb{E}}\left(\widetilde{M}_{n}-{1\over n}\sum_{k=1}^{K}\mu_{% k}T_{k}(n)\right)^{2}\leq\frac{K}{n^{2}}\sum_{k=1}^{K}{\mathbb{E}}\left(\sum_{% j=1}^{T_{k}(n)}X_{kj}-\mu_{k}T_{k}(n)\right)^{2}=\frac{K}{n^{2}}\sum_{k=1}^{K}% \sigma_{k}^{2}{\mathbb{E}}T_{k}(n)blackboard_E ( over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ divide start_ARG italic_K end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT blackboard_E ( ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_k italic_j end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = divide start_ARG italic_K end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_σ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n )
=Kn2σk2𝔼Tk(n)+Kn2kk𝔼Tk(n)Knσk2+Kn2kk𝔼Tk(n),absent𝐾superscript𝑛2superscriptsubscript𝜎superscript𝑘2𝔼subscript𝑇superscript𝑘𝑛𝐾superscript𝑛2subscript𝑘superscript𝑘𝔼subscript𝑇𝑘𝑛𝐾𝑛superscriptsubscript𝜎superscript𝑘2𝐾superscript𝑛2subscript𝑘superscript𝑘𝔼subscript𝑇𝑘𝑛\displaystyle=\frac{K}{n^{2}}\sigma_{k^{*}}^{2}{\mathbb{E}}T_{k^{*}}(n)+\frac{% K}{n^{2}}\sum_{k\neq k^{*}}{\mathbb{E}}T_{k}(n)\leq\frac{K}{n}\sigma_{k^{*}}^{% 2}+\frac{K}{n^{2}}\sum_{k\neq k^{*}}{\mathbb{E}}T_{k}(n),= divide start_ARG italic_K end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT blackboard_E italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) + divide start_ARG italic_K end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ≤ divide start_ARG italic_K end_ARG start_ARG italic_n end_ARG italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG italic_K end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) , (39)

where the last inequality follows from the fact that Tk(n)nsubscript𝑇superscript𝑘𝑛𝑛T_{k^{*}}(n)\leq nitalic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) ≤ italic_n.

We then analyze the second term on the RHS of (37). Note that, by Cauchy-Schwarz inequality,

𝔼(1nk=1KμkTk(n)1nk=1Kμk𝔼Tk(n))2=1n2𝔼(k=1KμkTk(n)k=1Kμk𝔼Tk(n))2𝔼superscript1𝑛superscriptsubscript𝑘1𝐾subscript𝜇𝑘subscript𝑇𝑘𝑛1𝑛superscriptsubscript𝑘1𝐾subscript𝜇𝑘𝔼subscript𝑇𝑘𝑛21superscript𝑛2𝔼superscriptsuperscriptsubscript𝑘1𝐾subscript𝜇𝑘subscript𝑇𝑘𝑛superscriptsubscript𝑘1𝐾subscript𝜇𝑘𝔼subscript𝑇𝑘𝑛2\displaystyle{\mathbb{E}}\left({1\over n}\sum_{k=1}^{K}\mu_{k}T_{k}(n)-{1\over n% }\sum_{k=1}^{K}\mu_{k}{\mathbb{E}}T_{k}(n)\right)^{2}={1\over n^{2}}{\mathbb{E% }}\left(\sum_{k=1}^{K}\mu_{k}T_{k}(n)-\sum_{k=1}^{K}\mu_{k}{\mathbb{E}}T_{k}(n% )\right)^{2}blackboard_E ( divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) - divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG blackboard_E ( ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) - ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT (40)
\displaystyle\leq Kn2k=1Kμk2𝔼[Tk(n)𝔼Tk(n)]2=Kn2k=1Kμk2Var[Tk(n)]𝐾superscript𝑛2superscriptsubscript𝑘1𝐾superscriptsubscript𝜇𝑘2𝔼superscriptdelimited-[]subscript𝑇𝑘𝑛𝔼subscript𝑇𝑘𝑛2𝐾superscript𝑛2superscriptsubscript𝑘1𝐾superscriptsubscript𝜇𝑘2Vardelimited-[]subscript𝑇𝑘𝑛\displaystyle{K\over n^{2}}\sum_{k=1}^{K}\mu_{k}^{2}{\mathbb{E}}\left[T_{k}(n)% -{\mathbb{E}}T_{k}(n)\right]^{2}={K\over n^{2}}\sum_{k=1}^{K}\mu_{k}^{2}{\rm Var% }\left[T_{k}(n)\right]divide start_ARG italic_K end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT blackboard_E [ italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) - blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ] start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = divide start_ARG italic_K end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_Var [ italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ]
\displaystyle\leq Kn2kkμk2Var[Tk(n)]+Kn2μk2Var[Tk(n)].𝐾superscript𝑛2subscript𝑘superscript𝑘superscriptsubscript𝜇𝑘2Vardelimited-[]subscript𝑇𝑘𝑛𝐾superscript𝑛2superscriptsubscript𝜇superscript𝑘2Vardelimited-[]subscript𝑇superscript𝑘𝑛\displaystyle{K\over n^{2}}\sum_{k\neq k^{*}}\mu_{k}^{2}{\rm Var}\left[T_{k}(n% )\right]+{K\over n^{2}}\mu_{k^{*}}^{2}{\rm Var}\left[T_{k^{*}}(n)\right].divide start_ARG italic_K end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_Var [ italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ] + divide start_ARG italic_K end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_Var [ italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) ] .

Furthermore,

Var[Tk(n)]=Var[nkkTk(n)]=Var[kkTk(n)]𝔼[kkTk(n)]2(K1)kk𝔼Tk2(n).Vardelimited-[]subscript𝑇superscript𝑘𝑛Vardelimited-[]𝑛subscript𝑘superscript𝑘subscript𝑇𝑘𝑛Vardelimited-[]subscript𝑘superscript𝑘subscript𝑇𝑘𝑛𝔼superscriptdelimited-[]subscript𝑘superscript𝑘subscript𝑇𝑘𝑛2𝐾1subscript𝑘superscript𝑘𝔼superscriptsubscript𝑇𝑘2𝑛\displaystyle{\rm Var}\left[T_{k^{*}}(n)\right]={\rm Var}\left[n-\sum_{k\neq k% ^{*}}T_{k}(n)\right]={\rm Var}\left[\sum_{k\neq k^{*}}T_{k}(n)\right]\leq{% \mathbb{E}}\left[\sum_{k\neq k^{*}}T_{k}(n)\right]^{2}\leq(K-1)\sum_{k\neq k^{% *}}{\mathbb{E}}T_{k}^{2}(n).roman_Var [ italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) ] = roman_Var [ italic_n - ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ] = roman_Var [ ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ] ≤ blackboard_E [ ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ] start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ ( italic_K - 1 ) ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_n ) .

Therefore,

RHSof(40)Kn2kkμk2𝔼Tk2(n)+K(K1)n2μk2kk𝔼Tk2(n)K2n2μ¯2kk𝔼Tk2(n),RHSof40𝐾superscript𝑛2subscript𝑘superscript𝑘superscriptsubscript𝜇𝑘2𝔼superscriptsubscript𝑇𝑘2𝑛𝐾𝐾1superscript𝑛2superscriptsubscript𝜇superscript𝑘2subscript𝑘superscript𝑘𝔼superscriptsubscript𝑇𝑘2𝑛superscript𝐾2superscript𝑛2superscript¯𝜇2subscript𝑘superscript𝑘𝔼superscriptsubscript𝑇𝑘2𝑛\displaystyle{\rm RHS\ of\ }(\ref{eqn:Temp12})\leq{K\over n^{2}}\sum_{k\neq k^% {*}}\mu_{k}^{2}{\mathbb{E}}T_{k}^{2}(n)+{K(K-1)\over n^{2}}\mu_{k^{*}}^{2}\sum% _{k\neq k^{*}}{\mathbb{E}}T_{k}^{2}(n)\leq\frac{K^{2}}{n^{2}}\bar{\mu}^{2}\sum% _{k\neq k^{*}}{\mathbb{E}}T_{k}^{2}(n),roman_RHS roman_of ( ) ≤ divide start_ARG italic_K end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_n ) + divide start_ARG italic_K ( italic_K - 1 ) end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_n ) ≤ divide start_ARG italic_K start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG over¯ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_n ) , (41)

where μ¯2=max{μ12,,μK2}superscript¯𝜇2superscriptsubscript𝜇12superscriptsubscript𝜇𝐾2\bar{\mu}^{2}=\max\{\mu_{1}^{2},...,\mu_{K}^{2}\}over¯ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = roman_max { italic_μ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , … , italic_μ start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT }.

Combining (37), (A.5)-(41) and Proposition 1 yields

Var[M~n]Vardelimited-[]subscript~𝑀𝑛absent\displaystyle{\rm Var}\left[\widetilde{M}_{n}\right]\leqroman_Var [ over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] ≤ 2Knσk2+2Kn2kk𝔼Tk(n)+2K2μ¯2n2kk𝔼Tk2(n)2𝐾𝑛superscriptsubscript𝜎superscript𝑘22𝐾superscript𝑛2subscript𝑘superscript𝑘𝔼subscript𝑇𝑘𝑛2superscript𝐾2superscript¯𝜇2superscript𝑛2subscript𝑘superscript𝑘𝔼superscriptsubscript𝑇𝑘2𝑛\displaystyle\frac{2K}{n}\sigma_{k^{*}}^{2}+\frac{2K}{n^{2}}\sum_{k\neq k^{*}}% {\mathbb{E}}T_{k}(n)+\frac{2K^{2}\bar{\mu}^{2}}{n^{2}}\sum_{k\neq k^{*}}{% \mathbb{E}}T_{k}^{2}(n)divide start_ARG 2 italic_K end_ARG start_ARG italic_n end_ARG italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG 2 italic_K end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) + divide start_ARG 2 italic_K start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT over¯ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_n )
\displaystyle\leq 2Knσk2+2Kn2kk(8νnΔk2+5)+2K2μ¯2n2kk{8νnΔk2+5}22𝐾𝑛superscriptsubscript𝜎superscript𝑘22𝐾superscript𝑛2subscript𝑘superscript𝑘8subscript𝜈𝑛superscriptsubscriptΔ𝑘252superscript𝐾2superscript¯𝜇2superscript𝑛2subscript𝑘superscript𝑘superscript8subscript𝜈𝑛superscriptsubscriptΔ𝑘252\displaystyle\frac{2K}{n}\sigma_{k^{*}}^{2}+\frac{2K}{n^{2}}\sum_{k\neq k^{*}}% \left(\frac{8\nu_{n}}{\Delta_{k}^{2}}+5\right)+\frac{2K^{2}\bar{\mu}^{2}}{n^{2% }}\sum_{k\neq k^{*}}\left\{\frac{8\nu_{n}}{\Delta_{k}^{2}}+5\right\}^{2}divide start_ARG 2 italic_K end_ARG start_ARG italic_n end_ARG italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG 2 italic_K end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 5 ) + divide start_ARG 2 italic_K start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT over¯ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT { divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 5 } start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT
\displaystyle\leq 2Knσk2+2K2cνnn2+2K3μ¯2n2c2νn22Knσk2+2K3(μ¯2+1)n2c2νn2,2𝐾𝑛superscriptsubscript𝜎superscript𝑘22superscript𝐾2𝑐subscript𝜈𝑛superscript𝑛22superscript𝐾3superscript¯𝜇2superscript𝑛2superscript𝑐2superscriptsubscript𝜈𝑛22𝐾𝑛superscriptsubscript𝜎superscript𝑘22superscript𝐾3superscript¯𝜇21superscript𝑛2superscript𝑐2superscriptsubscript𝜈𝑛2\displaystyle\frac{2K}{n}\sigma_{k^{*}}^{2}+\frac{2K^{2}c\nu_{n}}{n^{2}}+\frac% {2K^{3}\bar{\mu}^{2}}{n^{2}}c^{2}\nu_{n}^{2}\leq\frac{2K}{n}\sigma_{k^{*}}^{2}% +\frac{2K^{3}(\bar{\mu}^{2}+1)}{n^{2}}c^{2}\nu_{n}^{2},divide start_ARG 2 italic_K end_ARG start_ARG italic_n end_ARG italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG 2 italic_K start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_c italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + divide start_ARG 2 italic_K start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT over¯ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG italic_c start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ divide start_ARG 2 italic_K end_ARG start_ARG italic_n end_ARG italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG 2 italic_K start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT ( over¯ start_ARG italic_μ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 1 ) end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG italic_c start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (42)

where the third inequality is because of the condition 8νnΔk2+5cνn8subscript𝜈𝑛superscriptsubscriptΔ𝑘25𝑐subscript𝜈𝑛\frac{8\nu_{n}}{\Delta_{k}^{2}}+5\leq c\nu_{n}divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 5 ≤ italic_c italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT for kk𝑘superscript𝑘k\neq k^{*}italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT.

Incorporating the bias in (36), the variance in (A.5), and the condition νnn1/2δsubscript𝜈𝑛superscript𝑛12𝛿\nu_{n}\leq n^{1/2-\delta}italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ≤ italic_n start_POSTSUPERSCRIPT 1 / 2 - italic_δ end_POSTSUPERSCRIPT with 0<δ<1/20𝛿120<\delta<1/20 < italic_δ < 1 / 2, we have

MSE(M~n)=Var[M~n]+Bias(M~In)22Knσk2+o(1n).MSEsubscript~𝑀𝑛Vardelimited-[]subscript~𝑀𝑛Biassuperscriptsubscript~𝑀superscriptsubscript𝐼𝑛22𝐾𝑛superscriptsubscript𝜎superscript𝑘2𝑜1𝑛\displaystyle{\rm MSE}\left(\widetilde{M}_{n}\right)={\rm Var}\left[\widetilde% {M}_{n}\right]+{\rm Bias}\left(\widetilde{M}_{I_{n}^{*}}\right)^{2}\leq\frac{2% K}{n}\sigma_{k^{*}}^{2}+o\left(\frac{1}{n}\right).roman_MSE ( over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) = roman_Var [ over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ] + roman_Bias ( over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ divide start_ARG 2 italic_K end_ARG start_ARG italic_n end_ARG italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_o ( divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ) .

A.6 Proof of Theorem 5

To prove the result, we first define a martingale difference array and introduce a lemma on asymptotic normality for martingale different arrays.

Definition A1.

(Definition 16.1.1 in Athreya and Lahiri (2006)) Let {Yn}n1subscriptsubscript𝑌𝑛𝑛1\left\{Y_{n}\right\}_{n\geq 1}{ italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_n ≥ 1 end_POSTSUBSCRIPT be a collection of random variables on a probability space (Ω,,)Ω(\Omega,{\cal F},{\mathbb{P}})( roman_Ω , caligraphic_F , blackboard_P ) and let {n}n1subscriptsubscript𝑛𝑛1\left\{{\cal F}_{n}\right\}_{n\geq 1}{ caligraphic_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_n ≥ 1 end_POSTSUBSCRIPT be a filtration. Then, {Yn,n}n1subscriptsubscript𝑌𝑛subscript𝑛𝑛1\left\{Y_{n},{\cal F}_{n}\right\}_{n\geq 1}{ italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , caligraphic_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_n ≥ 1 end_POSTSUBSCRIPT is called a martingale difference array if Ynsubscript𝑌𝑛Y_{n}italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT is nsubscript𝑛{\cal F}_{n}caligraphic_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT-measurable and 𝔼[Yn|n1]=0𝔼delimited-[]conditionalsubscript𝑌𝑛subscript𝑛10{\mathbb{E}}[Y_{n}|{\cal F}_{n-1}]=0blackboard_E [ italic_Y start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_n - 1 end_POSTSUBSCRIPT ] = 0 for each n1𝑛1n\geq 1italic_n ≥ 1.

Lemma A4 (Theorem 16.1.1 of Athreya and Lahiri (2006)).

For each n1𝑛1n\geq 1italic_n ≥ 1, let {Yni,ni}i1subscriptsubscript𝑌𝑛𝑖subscript𝑛𝑖𝑖1\left\{Y_{ni},{\cal F}_{ni}\right\}_{i\geq 1}{ italic_Y start_POSTSUBSCRIPT italic_n italic_i end_POSTSUBSCRIPT , caligraphic_F start_POSTSUBSCRIPT italic_n italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i ≥ 1 end_POSTSUBSCRIPT be a martingale difference array on (Ω,,)Ω(\Omega,{\cal F},{\mathbb{P}})( roman_Ω , caligraphic_F , blackboard_P ), with 𝔼|Yni|2<𝔼superscriptsubscript𝑌𝑛𝑖2{\mathbb{E}}|Y_{ni}|^{2}<\inftyblackboard_E | italic_Y start_POSTSUBSCRIPT italic_n italic_i end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT < ∞ for all n1𝑛1n\geq 1italic_n ≥ 1 and let τnsubscript𝜏𝑛\tau_{n}italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT be a finite stopping time w.r.t. nisubscript𝑛𝑖{\cal F}_{ni}caligraphic_F start_POSTSUBSCRIPT italic_n italic_i end_POSTSUBSCRIPT. Suppose that for some constant σ2>0superscript𝜎20\sigma^{2}>0italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT > 0, as n𝑛n\rightarrow\inftyitalic_n → ∞,

i=1τn𝔼[|Yni|2|ni]σ2,inprobability,superscriptsubscript𝑖1subscript𝜏𝑛𝔼delimited-[]conditionalsuperscriptsubscript𝑌𝑛𝑖2subscript𝑛𝑖superscript𝜎2𝑖𝑛𝑝𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦\displaystyle\sum_{i=1}^{\tau_{n}}{\mathbb{E}}\left[\left.|Y_{ni}|^{2}\right|{% \cal F}_{ni}\right]\to\sigma^{2},\ in\ probability,∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUPERSCRIPT blackboard_E [ | italic_Y start_POSTSUBSCRIPT italic_n italic_i end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_n italic_i end_POSTSUBSCRIPT ] → italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , italic_i italic_n italic_p italic_r italic_o italic_b italic_a italic_b italic_i italic_l italic_i italic_t italic_y ,

and that for each ε>0𝜀0\varepsilon>0italic_ε > 0,

i=1τn𝔼[|Yni|2𝟙{|Yni|>ε}|ni]0,inprobability,superscriptsubscript𝑖1subscript𝜏𝑛𝔼delimited-[]conditionalsuperscriptsubscript𝑌𝑛𝑖2subscript1subscript𝑌𝑛𝑖𝜀subscript𝑛𝑖0𝑖𝑛𝑝𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦\displaystyle\sum_{i=1}^{\tau_{n}}{\mathbb{E}}\left[\left.|Y_{ni}|^{2}\mathds{% 1}_{\left\{\left|Y_{ni}\right|>\varepsilon\right\}}\right|{\cal F}_{ni}\right]% \to 0,\ in\ probability,∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUPERSCRIPT blackboard_E [ | italic_Y start_POSTSUBSCRIPT italic_n italic_i end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT blackboard_1 start_POSTSUBSCRIPT { | italic_Y start_POSTSUBSCRIPT italic_n italic_i end_POSTSUBSCRIPT | > italic_ε } end_POSTSUBSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_n italic_i end_POSTSUBSCRIPT ] → 0 , italic_i italic_n italic_p italic_r italic_o italic_b italic_a italic_b italic_i italic_l italic_i italic_t italic_y ,

Then,

i=1τnYniN(0,σ2),asn.formulae-sequencesuperscriptsubscript𝑖1subscript𝜏𝑛subscript𝑌𝑛𝑖𝑁0superscript𝜎2𝑎𝑠𝑛\displaystyle\sum_{i=1}^{\tau_{n}}Y_{ni}\Rightarrow N(0,\sigma^{2}),\ as\ n% \rightarrow\infty.∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUPERSCRIPT italic_Y start_POSTSUBSCRIPT italic_n italic_i end_POSTSUBSCRIPT ⇒ italic_N ( 0 , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) , italic_a italic_s italic_n → ∞ .

Proof of Theorem 5. By (7),

n(M~nμ)=Znn+n(1nj=1nμIjμ).𝑛subscript~𝑀𝑛superscript𝜇subscript𝑍𝑛𝑛𝑛1𝑛superscriptsubscript𝑗1𝑛subscript𝜇subscript𝐼𝑗superscript𝜇\displaystyle\sqrt{n}\left(\widetilde{M}_{n}-\mu^{*}\right)=\frac{Z_{n}}{\sqrt% {n}}+\sqrt{n}\left(\frac{1}{n}\sum_{j=1}^{n}\mu_{I_{j}}-\mu^{*}\right).square-root start_ARG italic_n end_ARG ( over~ start_ARG italic_M end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) = divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG square-root start_ARG italic_n end_ARG end_ARG + square-root start_ARG italic_n end_ARG ( divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) . (43)

We analyze the two terms on the RHS of (43) separately. We have shown that Znsubscript𝑍𝑛Z_{n}italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT is a martingale. So ZnZn1=XIn,nμInsubscript𝑍𝑛subscript𝑍𝑛1subscript𝑋subscript𝐼𝑛𝑛subscript𝜇subscript𝐼𝑛Z_{n}-Z_{n-1}=X_{I_{n},n}-\mu_{I_{n}}italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_Z start_POSTSUBSCRIPT italic_n - 1 end_POSTSUBSCRIPT = italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_n end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT is a martingale difference array (see Definition A1). To show the asymptotic normality of Zn/nsubscript𝑍𝑛𝑛Z_{n}/\sqrt{n}italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT / square-root start_ARG italic_n end_ARG, it suffices to prove that the two conditions of Lemma A4 are satisfied for Ynj=(XIj,jμIj)/nsubscript𝑌𝑛𝑗subscript𝑋subscript𝐼𝑗𝑗subscript𝜇subscript𝐼𝑗𝑛Y_{nj}=\left(X_{I_{j},j}-\mu_{I_{j}}\right)/\sqrt{n}italic_Y start_POSTSUBSCRIPT italic_n italic_j end_POSTSUBSCRIPT = ( italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) / square-root start_ARG italic_n end_ARG, τn=nsubscript𝜏𝑛𝑛\tau_{n}=nitalic_τ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT = italic_n and nj=jsubscript𝑛𝑗subscript𝑗{\cal F}_{nj}={\cal F}_{j}caligraphic_F start_POSTSUBSCRIPT italic_n italic_j end_POSTSUBSCRIPT = caligraphic_F start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT.

First, note that

j=1n𝔼[|(XIj,jμIj)/n|2|j1]=1nj=1n𝔼[XIj,j22μIjXIj,j+μIj2|j1]superscriptsubscript𝑗1𝑛𝔼delimited-[]conditionalsuperscriptsubscript𝑋subscript𝐼𝑗𝑗subscript𝜇subscript𝐼𝑗𝑛2subscript𝑗11𝑛superscriptsubscript𝑗1𝑛𝔼delimited-[]superscriptsubscript𝑋subscript𝐼𝑗𝑗22subscript𝜇subscript𝐼𝑗subscript𝑋subscript𝐼𝑗𝑗conditionalsuperscriptsubscript𝜇subscript𝐼𝑗2subscript𝑗1\displaystyle\sum_{j=1}^{n}{\mathbb{E}}\left[\left.\left|\left(X_{I_{j},j}-\mu% _{I_{j}}\right)/\sqrt{n}\right|^{2}\right|{\cal F}_{j-1}\right]=\frac{1}{n}% \sum_{j=1}^{n}{\mathbb{E}}\left[\left.X_{I_{j},j}^{2}-2\mu_{I_{j}}X_{I_{j},j}+% \mu_{I_{j}}^{2}\right|{\cal F}_{j-1}\right]∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT blackboard_E [ | ( italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) / square-root start_ARG italic_n end_ARG | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_j - 1 end_POSTSUBSCRIPT ] = divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT blackboard_E [ italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 2 italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j end_POSTSUBSCRIPT + italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_j - 1 end_POSTSUBSCRIPT ]
=\displaystyle== 1nj=1n{𝔼[XIj,j2|j1]2μIj𝔼[XIj,j|j1]+μIj2}=1nj=1n{𝔼[XIj,j2|j1]μIj2}1𝑛superscriptsubscript𝑗1𝑛𝔼delimited-[]conditionalsuperscriptsubscript𝑋subscript𝐼𝑗𝑗2subscript𝑗12subscript𝜇subscript𝐼𝑗𝔼delimited-[]conditionalsubscript𝑋subscript𝐼𝑗𝑗subscript𝑗1superscriptsubscript𝜇subscript𝐼𝑗21𝑛superscriptsubscript𝑗1𝑛𝔼delimited-[]conditionalsuperscriptsubscript𝑋subscript𝐼𝑗𝑗2subscript𝑗1superscriptsubscript𝜇subscript𝐼𝑗2\displaystyle\frac{1}{n}\sum_{j=1}^{n}\left\{{\mathbb{E}}\left[\left.X_{I_{j},% j}^{2}\right|{\cal F}_{j-1}\right]-2\mu_{I_{j}}{\mathbb{E}}\left[\left.X_{I_{j% },j}\right|{\cal F}_{j-1}\right]+\mu_{I_{j}}^{2}\right\}=\frac{1}{n}\sum_{j=1}% ^{n}\left\{{\mathbb{E}}\left[\left.X_{I_{j},j}^{2}\right|{\cal F}_{j-1}\right]% -\mu_{I_{j}}^{2}\right\}divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT { blackboard_E [ italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_j - 1 end_POSTSUBSCRIPT ] - 2 italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT blackboard_E [ italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j end_POSTSUBSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_j - 1 end_POSTSUBSCRIPT ] + italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT } = divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT { blackboard_E [ italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_j - 1 end_POSTSUBSCRIPT ] - italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT }
=\displaystyle== 1nj=1nσIj2=1nk=1KTk(n)σk2a.s.σk2,asn,\displaystyle\frac{1}{n}\sum_{j=1}^{n}\sigma_{I_{j}}^{2}=\frac{1}{n}\sum_{k=1}% ^{K}T_{k}(n)\sigma_{k}^{2}\overset{a.s.}{\longrightarrow}\sigma_{k^{*}}^{2},\ % {\rm as}\ n\to\infty,divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_σ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) italic_σ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_OVERACCENT italic_a . italic_s . end_OVERACCENT start_ARG ⟶ end_ARG italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , roman_as italic_n → ∞ ,

where the convergence follows from Theorem 1.

Second, by Cauchy-Schwarz inequality,

j=1n𝔼[|(XIj,jμIj)/n|2𝟙{|(XIj,jμIj)/n|>ε}|j1]superscriptsubscript𝑗1𝑛𝔼delimited-[]conditionalsuperscriptsubscript𝑋subscript𝐼𝑗𝑗subscript𝜇subscript𝐼𝑗𝑛21subscript𝑋subscript𝐼𝑗𝑗subscript𝜇subscript𝐼𝑗𝑛𝜀subscript𝑗1\displaystyle\sum_{j=1}^{n}{\mathbb{E}}\left[\left.\left|\left(X_{I_{j},j}-\mu% _{I_{j}}\right)/\sqrt{n}\right|^{2}\mathbbm{1}\left\{\left|\left(X_{I_{j},j}-% \mu_{I_{j}}\right)/\sqrt{n}\right|>\varepsilon\right\}\right|{\cal F}_{j-1}\right]∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT blackboard_E [ | ( italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) / square-root start_ARG italic_n end_ARG | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT blackboard_1 { | ( italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) / square-root start_ARG italic_n end_ARG | > italic_ε } | caligraphic_F start_POSTSUBSCRIPT italic_j - 1 end_POSTSUBSCRIPT ]
\displaystyle\leq j=1n(𝔼[|(XIj,jμIj)/n|4|j1])12(𝔼[𝟙{|(XIj,jμIj)/n|>ε}|j1])12superscriptsubscript𝑗1𝑛superscript𝔼delimited-[]conditionalsuperscriptsubscript𝑋subscript𝐼𝑗𝑗subscript𝜇subscript𝐼𝑗𝑛4subscript𝑗112superscript𝔼delimited-[]conditional1subscript𝑋subscript𝐼𝑗𝑗subscript𝜇subscript𝐼𝑗𝑛𝜀subscript𝑗112\displaystyle\sum_{j=1}^{n}\left({\mathbb{E}}\left[\left.\left|\left(X_{I_{j},% j}-\mu_{I_{j}}\right)/\sqrt{n}\right|^{4}\right|{\cal F}_{j-1}\right]\right)^{% \frac{1}{2}}\left({\mathbb{E}}\left[\left.\mathbbm{1}\left\{\left|\left(X_{I_{% j},j}-\mu_{I_{j}}\right)/\sqrt{n}\right|>\varepsilon\right\}\right|{\cal F}_{j% -1}\right]\right)^{\frac{1}{2}}∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ( blackboard_E [ | ( italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) / square-root start_ARG italic_n end_ARG | start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_j - 1 end_POSTSUBSCRIPT ] ) start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT ( blackboard_E [ blackboard_1 { | ( italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) / square-root start_ARG italic_n end_ARG | > italic_ε } | caligraphic_F start_POSTSUBSCRIPT italic_j - 1 end_POSTSUBSCRIPT ] ) start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT
=\displaystyle== 1nj=1n(𝔼[|XIj,jμIj|4|j1])12(|(XIj,jμIj)|>εn|j1)121𝑛superscriptsubscript𝑗1𝑛superscript𝔼delimited-[]conditionalsuperscriptsubscript𝑋subscript𝐼𝑗𝑗subscript𝜇subscript𝐼𝑗4subscript𝑗112superscriptsubscript𝑋subscript𝐼𝑗𝑗subscript𝜇subscript𝐼𝑗conditional𝜀𝑛subscript𝑗112\displaystyle\frac{1}{n}\sum_{j=1}^{n}\left({\mathbb{E}}\left[\left.\left|X_{I% _{j},j}-\mu_{I_{j}}\right|^{4}\right|{\cal F}_{j-1}\right]\right)^{\frac{1}{2}% }{\mathbb{P}}\left(\left.\left|\left(X_{I_{j},j}-\mu_{I_{j}}\right)\right|>% \varepsilon\sqrt{n}\right|{\cal F}_{j-1}\right)^{\frac{1}{2}}divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ( blackboard_E [ | italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_j - 1 end_POSTSUBSCRIPT ] ) start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT blackboard_P ( | ( italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) | > italic_ε square-root start_ARG italic_n end_ARG | caligraphic_F start_POSTSUBSCRIPT italic_j - 1 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT
=\displaystyle== 1nj=1nϑIj2(|(XIj,jμIj)|>εn|j1)121nj=1nϑIj2((ε2n)1𝔼[|XIj,jμIj|2|j1])121𝑛superscriptsubscript𝑗1𝑛superscriptsubscriptitalic-ϑsubscript𝐼𝑗2superscriptsubscript𝑋subscript𝐼𝑗𝑗subscript𝜇subscript𝐼𝑗conditional𝜀𝑛subscript𝑗1121𝑛superscriptsubscript𝑗1𝑛superscriptsubscriptitalic-ϑsubscript𝐼𝑗2superscriptsuperscriptsuperscript𝜀2𝑛1𝔼delimited-[]conditionalsuperscriptsubscript𝑋subscript𝐼𝑗𝑗subscript𝜇subscript𝐼𝑗2subscript𝑗112\displaystyle\frac{1}{n}\sum_{j=1}^{n}\vartheta_{I_{j}}^{2}{\mathbb{P}}\left(% \left.\left|\left(X_{I_{j},j}-\mu_{I_{j}}\right)\right|>\varepsilon\sqrt{n}% \right|{\cal F}_{j-1}\right)^{\frac{1}{2}}\leq\frac{1}{n}\sum_{j=1}^{n}% \vartheta_{I_{j}}^{2}\left((\varepsilon^{2}n)^{-1}{\mathbb{E}}\left[\left.% \left|X_{I_{j},j}-\mu_{I_{j}}\right|^{2}\right|{\cal F}_{j-1}\right]\right)^{% \frac{1}{2}}divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_ϑ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT blackboard_P ( | ( italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) | > italic_ε square-root start_ARG italic_n end_ARG | caligraphic_F start_POSTSUBSCRIPT italic_j - 1 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT ≤ divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_ϑ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( ( italic_ε start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_n ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT blackboard_E [ | italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_j end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT | caligraphic_F start_POSTSUBSCRIPT italic_j - 1 end_POSTSUBSCRIPT ] ) start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT
=\displaystyle== 1nj=1nϑIj2((ε2n)1σIj2)12ϑ¯2σ¯εn0, as n,formulae-sequence1𝑛superscriptsubscript𝑗1𝑛superscriptsubscriptitalic-ϑsubscript𝐼𝑗2superscriptsuperscriptsuperscript𝜀2𝑛1superscriptsubscript𝜎subscript𝐼𝑗212superscript¯italic-ϑ2¯𝜎𝜀𝑛0 as 𝑛\displaystyle\frac{1}{n}\sum_{j=1}^{n}\vartheta_{I_{j}}^{2}\left((\varepsilon^% {2}n)^{-1}\sigma_{I_{j}}^{2}\right)^{\frac{1}{2}}\leq\frac{\bar{\vartheta}^{2}% \bar{\sigma}}{\varepsilon\sqrt{n}}\to 0,\mbox{ as }n\to\infty,divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_ϑ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( ( italic_ε start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_n ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_σ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT ≤ divide start_ARG over¯ start_ARG italic_ϑ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT over¯ start_ARG italic_σ end_ARG end_ARG start_ARG italic_ε square-root start_ARG italic_n end_ARG end_ARG → 0 , as italic_n → ∞ ,

where ϑk4=𝔼[|Xkμk|4]superscriptsubscriptitalic-ϑ𝑘4𝔼delimited-[]superscriptsubscript𝑋𝑘subscript𝜇𝑘4\vartheta_{k}^{4}={\mathbb{E}}[|X_{k}-\mu_{k}|^{4}]italic_ϑ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT = blackboard_E [ | italic_X start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT ], ϑ¯4=max{ϑ14,,ϑK4}superscript¯italic-ϑ4superscriptsubscriptitalic-ϑ14superscriptsubscriptitalic-ϑ𝐾4\bar{\vartheta}^{4}=\max\{\vartheta_{1}^{4},...,\vartheta_{K}^{4}\}over¯ start_ARG italic_ϑ end_ARG start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT = roman_max { italic_ϑ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT , … , italic_ϑ start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT } and σ¯2=max{σ12,,σK2}superscript¯𝜎2superscriptsubscript𝜎12superscriptsubscript𝜎𝐾2\bar{\sigma}^{2}=\max\{\sigma_{1}^{2},...,\sigma_{K}^{2}\}over¯ start_ARG italic_σ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = roman_max { italic_σ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , … , italic_σ start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT }.

Therefore, by Lemma A4, we have

Zn/nN(0,σk2),asn.formulae-sequencesubscript𝑍𝑛𝑛𝑁0superscriptsubscript𝜎superscript𝑘2𝑎𝑠𝑛\displaystyle Z_{n}/\sqrt{n}\Rightarrow N(0,\sigma_{k^{*}}^{2}),\ \ as\ \ n% \rightarrow\infty.italic_Z start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT / square-root start_ARG italic_n end_ARG ⇒ italic_N ( 0 , italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) , italic_a italic_s italic_n → ∞ .

It remains to prove that the second term on the RHS of (43) converges to 0 in probability. Note that

1nj=1nμIjμk1𝑛superscriptsubscript𝑗1𝑛subscript𝜇subscript𝐼𝑗subscript𝜇superscript𝑘\displaystyle\frac{1}{n}\sum_{j=1}^{n}\mu_{I_{j}}-\mu_{k^{*}}divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT =\displaystyle== 1nk=1KTk(n)μkμk=kkTk(n)nμk+(Tk(n)n1)μk1𝑛superscriptsubscript𝑘1𝐾subscript𝑇𝑘𝑛subscript𝜇𝑘subscript𝜇superscript𝑘subscript𝑘superscript𝑘subscript𝑇𝑘𝑛𝑛subscript𝜇𝑘subscript𝑇superscript𝑘𝑛𝑛1subscript𝜇superscript𝑘\displaystyle\frac{1}{n}\sum_{k=1}^{K}T_{k}(n)\mu_{k}-\mu_{k^{*}}=\sum_{k\neq k% ^{*}}\frac{T_{k}(n)}{n}\mu_{k}+\left(\frac{T_{k^{*}}(n)}{n}-1\right)\mu_{k^{*}}divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG italic_n end_ARG italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + ( divide start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG italic_n end_ARG - 1 ) italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT
=\displaystyle== kkTk(n)nμkkkTk(n)nμk=kkTk(n)n(μkμk).subscript𝑘superscript𝑘subscript𝑇𝑘𝑛𝑛subscript𝜇𝑘subscript𝑘superscript𝑘subscript𝑇𝑘𝑛𝑛subscript𝜇superscript𝑘subscript𝑘superscript𝑘subscript𝑇𝑘𝑛𝑛subscript𝜇𝑘subscript𝜇superscript𝑘\displaystyle\sum_{k\neq k^{*}}\frac{T_{k}(n)}{n}\mu_{k}-\sum_{k\neq k^{*}}% \frac{T_{k}(n)}{n}\mu_{k^{*}}=\sum_{k\neq k^{*}}\frac{T_{k}(n)}{n}\left(\mu_{k% }-\mu_{k^{*}}\right).∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG italic_n end_ARG italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG italic_n end_ARG italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG italic_n end_ARG ( italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) .

Therefore, the first moment of the second term on the RHS of (43) satisfies

𝔼|n(1nj=1nμIjμk)|=𝔼|kkTk(n)n(μkμk)|kk𝔼Tk(n)nΔk0.𝔼𝑛1𝑛superscriptsubscript𝑗1𝑛subscript𝜇subscript𝐼𝑗subscript𝜇superscript𝑘𝔼subscript𝑘superscript𝑘subscript𝑇𝑘𝑛𝑛subscript𝜇𝑘subscript𝜇superscript𝑘subscript𝑘superscript𝑘𝔼subscript𝑇𝑘𝑛𝑛subscriptΔ𝑘0\displaystyle{\mathbb{E}}\left|\sqrt{n}\left(\frac{1}{n}\sum_{j=1}^{n}\mu_{I_{% j}}-\mu_{k^{*}}\right)\right|={\mathbb{E}}\left|\sum_{k\neq k^{*}}\frac{T_{k}(% n)}{\sqrt{n}}\left(\mu_{k}-\mu_{k^{*}}\right)\right|\leq\sum_{k\neq k^{*}}% \frac{{\mathbb{E}}T_{k}(n)}{\sqrt{n}}\Delta_{k}\to 0.blackboard_E | square-root start_ARG italic_n end_ARG ( divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_μ start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) | = blackboard_E | ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG square-root start_ARG italic_n end_ARG end_ARG ( italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) | ≤ ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT divide start_ARG blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG square-root start_ARG italic_n end_ARG end_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT → 0 .

where the convergence follows from Proposition 1 with p=1𝑝1p=1italic_p = 1 and the condition νn[αlogn,n12δ]subscript𝜈𝑛𝛼𝑛superscript𝑛12𝛿\nu_{n}\in\left[\alpha\log n,n^{\frac{1}{2}-\delta}\right]italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∈ [ italic_α roman_log italic_n , italic_n start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG - italic_δ end_POSTSUPERSCRIPT ] (0<δ<1,α4γ¯2)formulae-sequence0𝛿1𝛼4superscript¯𝛾2(0<\delta<1,\alpha\geq 4\bar{\gamma}^{2})( 0 < italic_δ < 1 , italic_α ≥ 4 over¯ start_ARG italic_γ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ).

Applying Slutsky’s Theorem to the RHS of (43) leads to the conclusion.

A.7 Proof of Proposition 2

Note that

σk2=𝔼Xk2(𝔼Xk)2.superscriptsubscript𝜎𝑘2𝔼superscriptsubscript𝑋𝑘2superscript𝔼subscript𝑋𝑘2\displaystyle\sigma_{k}^{2}={\mathbb{E}}X_{k}^{2}-\left({\mathbb{E}}X_{k}% \right)^{2}.italic_σ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = blackboard_E italic_X start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - ( blackboard_E italic_X start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . (44)

We first prove that σ^k2superscriptsubscript^𝜎𝑘2\hat{\sigma}_{k}^{2}over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT converges to σk2superscriptsubscript𝜎𝑘2\sigma_{k}^{2}italic_σ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT in probability. From Theorem 2 of Lai and Robbins (1985), it follows that for kk𝑘superscript𝑘k\neq k^{*}italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT, Tk(n)subscript𝑇𝑘𝑛T_{k}(n)\to\inftyitalic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) → ∞ in probability; and from Theorem 1, it follows that Tk(n)subscript𝑇superscript𝑘𝑛T_{k^{*}}(n)\to\inftyitalic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) → ∞ a.s. By Theorem 2 in Richter (1965), we have that for k=1,,K𝑘1𝐾k=1,...,Kitalic_k = 1 , … , italic_K,

1Tk(n)j=1Tk(n)Xk,j2𝔼Xk2,and1Tk(n)j=1Tk(n)Xk,j𝔼Xk.1subscript𝑇𝑘𝑛superscriptsubscript𝑗1subscript𝑇𝑘𝑛superscriptsubscript𝑋𝑘𝑗2𝔼superscriptsubscript𝑋𝑘2and1subscript𝑇𝑘𝑛superscriptsubscript𝑗1subscript𝑇𝑘𝑛subscript𝑋𝑘𝑗𝔼subscript𝑋𝑘\displaystyle\frac{1}{T_{k}(n)}\sum_{j=1}^{T_{k}(n)}X_{k,j}^{2}\overset{{% \mathbb{P}}}{\longrightarrow}{\mathbb{E}}X_{k}^{2},\ \ {\rm and}\ \ \frac{1}{T% _{k}(n)}\sum_{j=1}^{T_{k}(n)}X_{k,j}\overset{{\mathbb{P}}}{\longrightarrow}{% \mathbb{E}}X_{k}.divide start_ARG 1 end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_k , italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT overblackboard_P start_ARG ⟶ end_ARG blackboard_E italic_X start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , roman_and divide start_ARG 1 end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_k , italic_j end_POSTSUBSCRIPT overblackboard_P start_ARG ⟶ end_ARG blackboard_E italic_X start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT .

By the continuous mapping theorem, it follows that σ^k2σk2superscriptsubscript^𝜎𝑘2superscriptsubscript𝜎𝑘2\hat{\sigma}_{k}^{2}\to\sigma_{k}^{2}over^ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT → italic_σ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT in probability.

Then by Theorem 1 and the continuous mapping theorem, σ~n2superscriptsubscript~𝜎𝑛2\widetilde{\sigma}_{n}^{2}over~ start_ARG italic_σ end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT converges to σk2superscriptsubscript𝜎superscript𝑘2\sigma_{k^{*}}^{2}italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT in probability.

A.8 Proof of Theorem 6

By Theorem 1, we have

Tk(n)a.s..\displaystyle T_{k^{*}}(n)\overset{a.s.}{\longrightarrow}\infty.italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) start_OVERACCENT italic_a . italic_s . end_OVERACCENT start_ARG ⟶ end_ARG ∞ . (45)

By the strong law of large numbers (SLLN) and Theorem 1 in Richter (1965) or Theorem 8.2 in Chapter 6 of Gut (2013), we have

1Tk(n)j=1Tk(n)Xk,ja.s.μ,asn.\displaystyle\frac{1}{T_{k^{*}}(n)}\sum_{j=1}^{T_{k^{*}}(n)}X_{k^{*},j}% \overset{a.s.}{\longrightarrow}\mu^{*},\ \ {\rm as}\ \ n\to\infty.divide start_ARG 1 end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , italic_j end_POSTSUBSCRIPT start_OVERACCENT italic_a . italic_s . end_OVERACCENT start_ARG ⟶ end_ARG italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , roman_as italic_n → ∞ .

Denote Mk=1Tk(n)j=1Tk(n)Xk,jsubscript𝑀𝑘1subscript𝑇𝑘𝑛superscriptsubscript𝑗1subscript𝑇𝑘𝑛subscript𝑋𝑘𝑗M_{k}=\frac{1}{T_{k}(n)}\sum_{j=1}^{T_{k}(n)}X_{k,j}italic_M start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_k , italic_j end_POSTSUBSCRIPT. The definition of almost sure convergence says that

(ωΩ:0<ε<1,N=N(ω,ε),s.t.n>N,|Mkμ|<ε)=1.\displaystyle{\mathbb{P}}\left(\omega\in\Omega:\forall 0<\varepsilon<1,\exists N% ^{\prime}=N^{\prime}(\omega,\varepsilon)\in{\mathbb{N}},s.t.\ \forall n>N^{% \prime},\left|M_{k^{*}}-\mu^{*}\right|<\varepsilon\right)=1.blackboard_P ( italic_ω ∈ roman_Ω : ∀ 0 < italic_ε < 1 , ∃ italic_N start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = italic_N start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( italic_ω , italic_ε ) ∈ blackboard_N , italic_s . italic_t . ∀ italic_n > italic_N start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , | italic_M start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT - italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT | < italic_ε ) = 1 . (46)

From (35), it follows that

(ωΩ:0<ε<1,N=N(ω,ε),s.t.n>N,MIn=Mk)=1.\displaystyle{\mathbb{P}}\left(\omega\in\Omega:\forall 0<\varepsilon<1,\exists N% =N(\omega,\varepsilon)\in{\mathbb{N}},s.t.\ \forall n>N,M_{I^{*}_{n}}=M_{k^{*}% }\right)=1.blackboard_P ( italic_ω ∈ roman_Ω : ∀ 0 < italic_ε < 1 , ∃ italic_N = italic_N ( italic_ω , italic_ε ) ∈ blackboard_N , italic_s . italic_t . ∀ italic_n > italic_N , italic_M start_POSTSUBSCRIPT italic_I start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT = italic_M start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) = 1 . (47)

Combining (46) and (47) leads to

(ωΩ:0<ε<1,N¯=max{N,N},s.t.n>N¯,|MInμ|<ε)=1,\displaystyle{\mathbb{P}}\left(\omega\in\Omega:\forall 0<\varepsilon<1,\exists% \bar{N}=\max\{N,N^{\prime}\}\in{\mathbb{N}},s.t.\ \forall n>\bar{N},\left|M_{I% ^{*}_{n}}-\mu^{*}\right|<\varepsilon\right)=1,blackboard_P ( italic_ω ∈ roman_Ω : ∀ 0 < italic_ε < 1 , ∃ over¯ start_ARG italic_N end_ARG = roman_max { italic_N , italic_N start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT } ∈ blackboard_N , italic_s . italic_t . ∀ italic_n > over¯ start_ARG italic_N end_ARG , | italic_M start_POSTSUBSCRIPT italic_I start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT | < italic_ε ) = 1 ,

implying that

MIna.s.μ,asn.\displaystyle M_{I^{*}_{n}}\overset{a.s.}{\rightarrow}\mu^{*},\ \ {\rm as}\ \ % n\to\infty.italic_M start_POSTSUBSCRIPT italic_I start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_OVERACCENT italic_a . italic_s . end_OVERACCENT start_ARG → end_ARG italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , roman_as italic_n → ∞ .

A.9 Proof of Theorem 7

For notational ease, denote

Zn,k=j=1Tk(n)(Xk,jμk),k=1,,K.formulae-sequencesubscript𝑍𝑛𝑘superscriptsubscript𝑗1subscript𝑇𝑘𝑛subscript𝑋𝑘𝑗subscript𝜇𝑘𝑘1𝐾Z_{n,k}=\sum_{j=1}^{T_{k}(n)}\left(X_{k,j}-\mu_{k}\right),\quad k=1,\ldots,K.italic_Z start_POSTSUBSCRIPT italic_n , italic_k end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_POSTSUPERSCRIPT ( italic_X start_POSTSUBSCRIPT italic_k , italic_j end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) , italic_k = 1 , … , italic_K .

Recall that sub-Gaussian distributions have finite moments up to any order. By Assumption 1 and the fact that Tk(n)nsubscript𝑇𝑘𝑛𝑛T_{k}(n)\leq nitalic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ≤ italic_n, it can be seen that {Xk,j,j1}subscript𝑋𝑘𝑗𝑗1\{X_{k,j},j\geq 1\}{ italic_X start_POSTSUBSCRIPT italic_k , italic_j end_POSTSUBSCRIPT , italic_j ≥ 1 } satisfies the conditions in Lemma A2 for any k=1,,K𝑘1𝐾k=1,\ldots,Kitalic_k = 1 , … , italic_K. We first note the following two facts that are useful in the proof. For any positive integer p𝑝pitalic_p, by Proposition 1,

Pr(Ink)kkPr(In=k)kkPr(Tk(n)n/K)Kp+1𝔼[Tk(n)p]/np=O(νnpnp+1n3),Prsuperscriptsubscript𝐼𝑛superscript𝑘subscript𝑘superscript𝑘Prsuperscriptsubscript𝐼𝑛𝑘subscript𝑘superscript𝑘Prsubscript𝑇𝑘𝑛𝑛𝐾superscript𝐾𝑝1𝔼delimited-[]subscript𝑇𝑘superscript𝑛𝑝superscript𝑛𝑝𝑂superscriptsubscript𝜈𝑛𝑝superscript𝑛𝑝1superscript𝑛3\displaystyle\Pr(I_{n}^{*}\neq k^{*})\leq\sum_{k\neq k^{*}}\Pr(I_{n}^{*}=k)% \leq\sum_{k\neq k^{*}}\Pr(T_{k}(n)\geq n/K)\leq K^{p+1}{\mathbb{E}}\left[T_{k}% (n)^{p}\right]/n^{p}=O\left({\nu_{n}^{p}\over n^{p}}+{1\over n^{3}}\right),roman_Pr ( italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ≤ ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT roman_Pr ( italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_k ) ≤ ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT roman_Pr ( italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ≥ italic_n / italic_K ) ≤ italic_K start_POSTSUPERSCRIPT italic_p + 1 end_POSTSUPERSCRIPT blackboard_E [ italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT ] / italic_n start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT = italic_O ( divide start_ARG italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT end_ARG start_ARG italic_n start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT end_ARG + divide start_ARG 1 end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT end_ARG ) , (48)

and by Lemma A2,

𝔼[|Zn,k|p]=O(np/2),k=1,,K.formulae-sequence𝔼delimited-[]superscriptsubscript𝑍𝑛𝑘𝑝𝑂superscript𝑛𝑝2𝑘1𝐾\displaystyle{\mathbb{E}}\left[\left|Z_{n,k}\right|^{p}\right]=O(n^{p/2}),% \quad k=1,\ldots,K.blackboard_E [ | italic_Z start_POSTSUBSCRIPT italic_n , italic_k end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT ] = italic_O ( italic_n start_POSTSUPERSCRIPT italic_p / 2 end_POSTSUPERSCRIPT ) , italic_k = 1 , … , italic_K . (49)

It suffices to analyze the bias and variance of MInsubscript𝑀superscriptsubscript𝐼𝑛M_{I_{n}^{*}}italic_M start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT, and then the MSE follows straightforwardly. Consider the bias first. Note that

Bias(MIn)=𝔼[MInμk]=𝔼[1TIn(n)j=1TIn(n)(XIn,jμk)]Biassubscript𝑀superscriptsubscript𝐼𝑛𝔼delimited-[]subscript𝑀superscriptsubscript𝐼𝑛subscript𝜇superscript𝑘𝔼delimited-[]1subscript𝑇superscriptsubscript𝐼𝑛𝑛superscriptsubscript𝑗1subscript𝑇superscriptsubscript𝐼𝑛𝑛subscript𝑋superscriptsubscript𝐼𝑛𝑗subscript𝜇superscript𝑘\displaystyle{\rm Bias}(M_{I_{n}^{*}})={\mathbb{E}}\left[M_{I_{n}^{*}}-\mu_{k^% {*}}\right]={\mathbb{E}}\left[\frac{1}{T_{I_{n}^{*}}(n)}\sum_{j=1}^{T_{I_{n}^{% *}}(n)}(X_{I_{n}^{*},j}-\mu_{k^{*}})\right]roman_Bias ( italic_M start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) = blackboard_E [ italic_M start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ] = blackboard_E [ divide start_ARG 1 end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_POSTSUPERSCRIPT ( italic_X start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , italic_j end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) ]
=\displaystyle== 𝔼[Zn,kTk(n)𝟙{In=k}]+kk𝔼[Zn,kTk(n)𝟙{In=k}]𝔼delimited-[]subscript𝑍𝑛superscript𝑘subscript𝑇superscript𝑘𝑛1superscriptsubscript𝐼𝑛superscript𝑘subscript𝑘superscript𝑘𝔼delimited-[]subscript𝑍𝑛superscript𝑘subscript𝑇𝑘𝑛1superscriptsubscript𝐼𝑛𝑘\displaystyle{\mathbb{E}}\left[\frac{Z_{n,k^{*}}}{T_{k^{*}}(n)}\mathbbm{1}\{I_% {n}^{*}=k^{*}\}\right]+\sum_{k\neq k^{*}}{\mathbb{E}}\left[\frac{Z_{n,k^{*}}}{% T_{k}(n)}\mathbbm{1}\{I_{n}^{*}=k\}\right]blackboard_E [ divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT } ] + ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT blackboard_E [ divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_k } ]
=\displaystyle== 𝔼[Zn,kTk(n)]kk𝔼[Zn,kTk(n)𝟙{In=k}]+kk𝔼[Zn,kTk(n)𝟙{In=k}]𝔼delimited-[]subscript𝑍𝑛superscript𝑘subscript𝑇superscript𝑘𝑛subscript𝑘superscript𝑘𝔼delimited-[]subscript𝑍𝑛superscript𝑘subscript𝑇superscript𝑘𝑛1superscriptsubscript𝐼𝑛𝑘subscript𝑘superscript𝑘𝔼delimited-[]subscript𝑍𝑛superscript𝑘subscript𝑇𝑘𝑛1superscriptsubscript𝐼𝑛𝑘\displaystyle{\mathbb{E}}\left[\frac{Z_{n,k^{*}}}{T_{k^{*}}(n)}\right]-\sum_{k% \neq k^{*}}{\mathbb{E}}\left[\frac{Z_{n,k^{*}}}{T_{k^{*}}(n)}\mathbbm{1}\{I_{n% }^{*}=k\}\right]+\sum_{k\neq k^{*}}{\mathbb{E}}\left[\frac{Z_{n,k^{*}}}{T_{k}(% n)}\mathbbm{1}\{I_{n}^{*}=k\}\right]blackboard_E [ divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG ] - ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT blackboard_E [ divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_k } ] + ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT blackboard_E [ divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_k } ]
=\displaystyle== Q1Q2+Q3,subscript𝑄1subscript𝑄2subscript𝑄3\displaystyle Q_{1}-Q_{2}+Q_{3},italic_Q start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - italic_Q start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT + italic_Q start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ,

where Q1,Q2subscript𝑄1subscript𝑄2Q_{1},Q_{2}italic_Q start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_Q start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and Q3subscript𝑄3Q_{3}italic_Q start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT denote the three terms on the RHS of (A.9), respectively.

Recall that 𝔼[Zn,k]=0𝔼delimited-[]subscript𝑍𝑛superscript𝑘0{\mathbb{E}}[Z_{n,k^{*}}]=0blackboard_E [ italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ] = 0 due to Wald’s equation. It follows that

Q1=subscript𝑄1absent\displaystyle Q_{1}=italic_Q start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = 𝔼[Zn,kTk(n)Zn,k𝔼Tk(n)]=𝔼[Zn,kTk(n)(1Tk(n)𝔼Tk(n))]=Cov[Zn,kTk(n),1Tk(n)𝔼Tk(n)]𝔼delimited-[]subscript𝑍𝑛superscript𝑘subscript𝑇superscript𝑘𝑛subscript𝑍𝑛superscript𝑘𝔼subscript𝑇superscript𝑘𝑛𝔼delimited-[]subscript𝑍𝑛superscript𝑘subscript𝑇superscript𝑘𝑛1subscript𝑇superscript𝑘𝑛𝔼subscript𝑇superscript𝑘𝑛Covsubscript𝑍𝑛superscript𝑘subscript𝑇superscript𝑘𝑛1subscript𝑇superscript𝑘𝑛𝔼subscript𝑇superscript𝑘𝑛\displaystyle{\mathbb{E}}\left[\frac{Z_{n,k^{*}}}{T_{k^{*}}(n)}-\frac{Z_{n,k^{% *}}}{{\mathbb{E}}T_{k^{*}}(n)}\right]={\mathbb{E}}\left[\frac{Z_{n,k^{*}}}{T_{% k^{*}}(n)}\left(1-\frac{T_{k^{*}}(n)}{{\mathbb{E}}T_{k^{*}}(n)}\right)\right]=% {\rm Cov}\left[\frac{Z_{n,k^{*}}}{T_{k^{*}}(n)},1-\frac{T_{k^{*}}(n)}{{\mathbb% {E}}T_{k^{*}}(n)}\right]blackboard_E [ divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG - divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_ARG start_ARG blackboard_E italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG ] = blackboard_E [ divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG ( 1 - divide start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG blackboard_E italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG ) ] = roman_Cov [ divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG , 1 - divide start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG blackboard_E italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG ]
=\displaystyle== Cov[Zn,kTk(n),Tk(n)𝔼Tk(n)]=ρVar[Zn,k/Tk(n)]Var[Tk(n)]𝔼Tk(n),Covsubscript𝑍𝑛superscript𝑘subscript𝑇superscript𝑘𝑛subscript𝑇superscript𝑘𝑛𝔼subscript𝑇superscript𝑘𝑛𝜌Vardelimited-[]subscript𝑍𝑛superscript𝑘subscript𝑇superscript𝑘𝑛Vardelimited-[]subscript𝑇superscript𝑘𝑛𝔼subscript𝑇superscript𝑘𝑛\displaystyle-{\rm Cov}\left[\frac{Z_{n,k^{*}}}{T_{k^{*}}(n)},\frac{T_{k^{*}}(% n)}{{\mathbb{E}}T_{k^{*}}(n)}\right]=-\rho\cdot\frac{\sqrt{{\rm Var}[Z_{n,k^{*% }}/T_{k^{*}}(n)]{\rm Var}[T_{k^{*}}(n)]}}{{\mathbb{E}}T_{k^{*}}(n)},- roman_Cov [ divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG , divide start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG blackboard_E italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG ] = - italic_ρ ⋅ divide start_ARG square-root start_ARG roman_Var [ italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT / italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) ] roman_Var [ italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) ] end_ARG end_ARG start_ARG blackboard_E italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG , (51)

where ρ𝜌\rhoitalic_ρ denote the correlation coefficient between Zn,k/Tk(n)subscript𝑍𝑛superscript𝑘subscript𝑇superscript𝑘𝑛Z_{n,k^{*}}/T_{k^{*}}(n)italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT / italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) and Tk(n)subscript𝑇superscript𝑘𝑛T_{k^{*}}(n)italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ), i.e.

ρ=Cov[Zn,k/Tk(n),Tk(n)]Var[Zn,k/Tk(n)]Var[Tk(n)].𝜌Covsubscript𝑍𝑛superscript𝑘subscript𝑇superscript𝑘𝑛subscript𝑇superscript𝑘𝑛Vardelimited-[]subscript𝑍𝑛superscript𝑘subscript𝑇superscript𝑘𝑛Vardelimited-[]subscript𝑇superscript𝑘𝑛\displaystyle\rho=\frac{{\rm Cov}\left[Z_{n,k^{*}}/T_{k^{*}}(n),T_{k^{*}}(n)% \right]}{\sqrt{{\rm Var}[Z_{n,k^{*}}/T_{k^{*}}(n)]{\rm Var}[T_{k^{*}}(n)]}}.italic_ρ = divide start_ARG roman_Cov [ italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT / italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) , italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) ] end_ARG start_ARG square-root start_ARG roman_Var [ italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT / italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) ] roman_Var [ italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) ] end_ARG end_ARG .

It should also be noted that ρ>0𝜌0\rho>0italic_ρ > 0 (and thus Q1<0subscript𝑄10Q_{1}<0italic_Q start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT < 0), because Zn,k/Tk(n)subscript𝑍𝑛superscript𝑘subscript𝑇superscript𝑘𝑛Z_{n,k^{*}}/T_{k^{*}}(n)italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT / italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) and Tk(n)subscript𝑇superscript𝑘𝑛T_{k^{*}}(n)italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) are positive correlated. To see this, note that

Cov[Zn,kTk(n),Tk(n)]=Cov[X¯k[Tk(n)]Tk(n)Tk(n)μk,Tk(n)]=Cov[X¯k[Tk(n)],Tk(n)]>0,Covsubscript𝑍𝑛superscript𝑘subscript𝑇superscript𝑘𝑛subscript𝑇superscript𝑘𝑛Covsubscript¯𝑋superscript𝑘delimited-[]subscript𝑇superscript𝑘𝑛subscript𝑇superscript𝑘𝑛subscript𝑇superscript𝑘𝑛subscript𝜇superscript𝑘subscript𝑇superscript𝑘𝑛Covsubscript¯𝑋superscript𝑘delimited-[]subscript𝑇superscript𝑘𝑛subscript𝑇superscript𝑘𝑛0\displaystyle{\rm Cov}\left[\frac{Z_{n,k^{*}}}{T_{k^{*}}(n)},T_{k^{*}}(n)% \right]={\rm Cov}\left[\frac{\bar{X}_{k^{*}}[T_{k^{*}}(n)]T_{k^{*}}(n)}{T_{k^{% *}}(n)}-\mu_{k^{*}},T_{k^{*}}(n)\right]={\rm Cov}\left[\bar{X}_{k^{*}}[T_{k^{*% }}(n)],T_{k^{*}}(n)\right]>0,roman_Cov [ divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG , italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) ] = roman_Cov [ divide start_ARG over¯ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT [ italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) ] italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG - italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT , italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) ] = roman_Cov [ over¯ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT [ italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) ] , italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) ] > 0 ,

where the inequality follows naturally from the way that GUCB is defined, i.e., larger sample mean leads to higher chance of sampling from arm ksuperscript𝑘k^{*}italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT.

By (51), to understand the order Q1subscript𝑄1Q_{1}italic_Q start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, we only need to analyze 𝔼Tk(n)𝔼subscript𝑇superscript𝑘𝑛{\mathbb{E}}T_{k^{*}}(n)blackboard_E italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ), Var[Tk(n)]Vardelimited-[]subscript𝑇superscript𝑘𝑛{\rm Var}[T_{k^{*}}(n)]roman_Var [ italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) ] and Var[Zn,k/Tk(n)]Vardelimited-[]subscript𝑍𝑛superscript𝑘subscript𝑇superscript𝑘𝑛{\rm Var}[Z_{n,k^{*}}/T_{k^{*}}(n)]roman_Var [ italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT / italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) ]. Note that by Proposition 1,

n>𝔼Tk(n)=nkk𝔼Tk(n)nkk(8νnΔk2+5)nKc1νn,𝑛𝔼subscript𝑇superscript𝑘𝑛𝑛subscript𝑘superscript𝑘𝔼subscript𝑇𝑘𝑛𝑛subscript𝑘superscript𝑘8subscript𝜈𝑛superscriptsubscriptΔ𝑘25𝑛𝐾subscript𝑐1subscript𝜈𝑛\displaystyle n>{\mathbb{E}}T_{k^{*}}(n)=n-\sum_{k\neq k^{*}}{\mathbb{E}}T_{k}% (n)\geq n-\sum_{k\neq k^{*}}\left(\frac{8\nu_{n}}{\Delta_{k}^{2}}+5\right)\geq n% -Kc_{1}\nu_{n},italic_n > blackboard_E italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) = italic_n - ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT blackboard_E italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ≥ italic_n - ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 5 ) ≥ italic_n - italic_K italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , (52)

and

Var[Tk(n)]=Vardelimited-[]subscript𝑇superscript𝑘𝑛absent\displaystyle{\rm Var}[T_{k^{*}}(n)]=roman_Var [ italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) ] = Var[nkkTk(n)]=Var[kkTk(n)]𝔼[(kkTk(n))2]Vardelimited-[]𝑛subscript𝑘superscript𝑘subscript𝑇𝑘𝑛Vardelimited-[]subscript𝑘superscript𝑘subscript𝑇𝑘𝑛𝔼delimited-[]superscriptsubscript𝑘superscript𝑘subscript𝑇𝑘𝑛2\displaystyle{\rm Var}\left[n-\sum_{k\neq k^{*}}T_{k}(n)\right]={\rm Var}\left% [\sum_{k\neq k^{*}}T_{k}(n)\right]\leq{\mathbb{E}}\left[\left(\sum_{k\neq k^{*% }}T_{k}(n)\right)^{2}\right]roman_Var [ italic_n - ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ] = roman_Var [ ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ] ≤ blackboard_E [ ( ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ]
\displaystyle\leq (K1)𝔼[kkTk(n)2]Kkk(8νnΔk2+5)2K2c22νn2,𝐾1𝔼delimited-[]subscript𝑘superscript𝑘subscript𝑇𝑘superscript𝑛2𝐾subscript𝑘superscript𝑘superscript8subscript𝜈𝑛superscriptsubscriptΔ𝑘252superscript𝐾2superscriptsubscript𝑐22superscriptsubscript𝜈𝑛2\displaystyle(K-1){\mathbb{E}}\left[\sum_{k\neq k^{*}}T_{k}(n)^{2}\right]\leq K% \sum_{k\neq k^{*}}\left(\frac{8\nu_{n}}{\Delta_{k}^{2}}+5\right)^{2}\leq K^{2}% c_{2}^{2}\nu_{n}^{2},( italic_K - 1 ) blackboard_E [ ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] ≤ italic_K ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( divide start_ARG 8 italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + 5 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ italic_K start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (53)

where the third inequality follows from Proposition 1 with p=2𝑝2p=2italic_p = 2.

Moreover,

Var[Zn,kTk(n)]𝔼[Zn,k2Tk2(n)]=𝔼[Zn,k2TIn2(n)𝟙{In=k}]+𝔼[Zn,k2Tk(n)2𝟙{Ink}],Vardelimited-[]subscript𝑍𝑛superscript𝑘subscript𝑇superscript𝑘𝑛𝔼delimited-[]superscriptsubscript𝑍𝑛superscript𝑘2superscriptsubscript𝑇superscript𝑘2𝑛𝔼delimited-[]superscriptsubscript𝑍𝑛superscript𝑘2superscriptsubscript𝑇superscriptsubscript𝐼𝑛2𝑛1superscriptsubscript𝐼𝑛superscript𝑘𝔼delimited-[]superscriptsubscript𝑍𝑛superscript𝑘2subscript𝑇superscript𝑘superscript𝑛21superscriptsubscript𝐼𝑛superscript𝑘\displaystyle{\rm Var}\left[\frac{Z_{n,k^{*}}}{T_{k^{*}}(n)}\right]\leq{% \mathbb{E}}\left[\frac{Z_{n,k^{*}}^{2}}{T_{k^{*}}^{2}(n)}\right]={\mathbb{E}}% \left[\frac{Z_{n,k^{*}}^{2}}{T_{I_{n}^{*}}^{2}(n)}\mathbbm{1}\{I_{n}^{*}=k^{*}% \}\right]+{\mathbb{E}}\left[\frac{Z_{n,k^{*}}^{2}}{T_{k^{*}}(n)^{2}}\mathbbm{1% }\{I_{n}^{*}\neq k^{*}\}\right],roman_Var [ divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) end_ARG ] ≤ blackboard_E [ divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_n ) end_ARG ] = blackboard_E [ divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_n ) end_ARG blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT } ] + blackboard_E [ divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT } ] , (54)

We analyze the RHS of (54). Because TInn/Ksubscript𝑇superscriptsubscript𝐼𝑛𝑛𝐾T_{I_{n}^{*}}\geq n/Kitalic_T start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ≥ italic_n / italic_K,

𝔼[Zn,k2TIn2(n)𝟙{In=k}]K2n2𝔼[Zn,k2]=σk2K2n2𝔼[Tk(n)]σk2K2n,𝔼delimited-[]superscriptsubscript𝑍𝑛superscript𝑘2superscriptsubscript𝑇superscriptsubscript𝐼𝑛2𝑛1superscriptsubscript𝐼𝑛superscript𝑘superscript𝐾2superscript𝑛2𝔼delimited-[]superscriptsubscript𝑍𝑛superscript𝑘2superscriptsubscript𝜎superscript𝑘2superscript𝐾2superscript𝑛2𝔼delimited-[]subscript𝑇superscript𝑘𝑛superscriptsubscript𝜎superscript𝑘2superscript𝐾2𝑛\displaystyle{\mathbb{E}}\left[\frac{Z_{n,k^{*}}^{2}}{T_{I_{n}^{*}}^{2}(n)}% \mathbbm{1}\{I_{n}^{*}=k^{*}\}\right]\leq{K^{2}\over n^{2}}{\mathbb{E}}\left[Z% _{n,k^{*}}^{2}\right]={\sigma_{k^{*}}^{2}K^{2}\over n^{2}}{\mathbb{E}}\left[T_% {k^{*}}(n)\right]\leq{\sigma_{k^{*}}^{2}K^{2}\over n},blackboard_E [ divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_n ) end_ARG blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT } ] ≤ divide start_ARG italic_K start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG blackboard_E [ italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] = divide start_ARG italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_K start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG blackboard_E [ italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) ] ≤ divide start_ARG italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_K start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_n end_ARG , (55)

where the first equality follows from Wald’s lemma (Theorem 13.2.14 in Athreya and Lahiri 2006).

We also note that

𝔼[Zn,k2Tk(n)2𝟙{Ink}]𝔼[Zn,k2𝟙{Ink}]𝔼delimited-[]superscriptsubscript𝑍𝑛superscript𝑘2subscript𝑇superscript𝑘superscript𝑛21superscriptsubscript𝐼𝑛superscript𝑘𝔼delimited-[]superscriptsubscript𝑍𝑛superscript𝑘21superscriptsubscript𝐼𝑛superscript𝑘\displaystyle{\mathbb{E}}\left[\frac{Z_{n,k^{*}}^{2}}{T_{k^{*}}(n)^{2}}% \mathbbm{1}\{I_{n}^{*}\neq k^{*}\}\right]\leq{\mathbb{E}}\left[Z_{n,k^{*}}^{2}% \mathbbm{1}\{I_{n}^{*}\neq k^{*}\}\right]blackboard_E [ divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_n ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT } ] ≤ blackboard_E [ italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT } ] (56)
\displaystyle\leq 𝔼p11p1(𝟙{Ink})𝔼1p1(Zn,k2p1)superscript𝔼subscript𝑝11subscript𝑝11superscriptsubscript𝐼𝑛superscript𝑘superscript𝔼1subscript𝑝1superscriptsubscript𝑍𝑛superscript𝑘2subscript𝑝1\displaystyle{\mathbb{E}}^{p_{1}-1\over p_{1}}\left(\mathbbm{1}\{I_{n}^{*}\neq k% ^{*}\}\right){\mathbb{E}}^{1\over p_{1}}\left(Z_{n,k^{*}}^{2p_{1}}\right)blackboard_E start_POSTSUPERSCRIPT divide start_ARG italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - 1 end_ARG start_ARG italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG end_POSTSUPERSCRIPT ( blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT } ) blackboard_E start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG end_POSTSUPERSCRIPT ( italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT )
=\displaystyle== O((νnp2np2+1n3)p11p1n2p121p1)=O((n(1δ)p2(p11)/p1np2(p11)/p1+1n3(p11)/p1)n)=o(1n),𝑂superscriptsuperscriptsubscript𝜈𝑛subscript𝑝2superscript𝑛subscript𝑝21superscript𝑛3subscript𝑝11subscript𝑝1superscript𝑛2subscript𝑝121subscript𝑝1𝑂superscript𝑛1𝛿subscript𝑝2subscript𝑝11subscript𝑝1superscript𝑛subscript𝑝2subscript𝑝11subscript𝑝11superscript𝑛3subscript𝑝11subscript𝑝1𝑛𝑜1𝑛\displaystyle O\left(\left({\nu_{n}^{p_{2}}\over n^{p_{2}}}+{1\over n^{3}}% \right)^{p_{1}-1\over p_{1}}\cdot n^{{2p_{1}\over 2}\cdot{1\over p_{1}}}\right% )=O\left(\left({n^{(1-\delta)p_{2}(p_{1}-1)/p_{1}}\over n^{p_{2}(p_{1}-1)/p_{1% }}}+{1\over n^{3(p_{1}-1)/p_{1}}}\right)\cdot n\right)=o\left({1\over n}\right),italic_O ( ( divide start_ARG italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG start_ARG italic_n start_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG + divide start_ARG 1 end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT end_ARG ) start_POSTSUPERSCRIPT divide start_ARG italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - 1 end_ARG start_ARG italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG end_POSTSUPERSCRIPT ⋅ italic_n start_POSTSUPERSCRIPT divide start_ARG 2 italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG start_ARG 2 end_ARG ⋅ divide start_ARG 1 end_ARG start_ARG italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG end_POSTSUPERSCRIPT ) = italic_O ( ( divide start_ARG italic_n start_POSTSUPERSCRIPT ( 1 - italic_δ ) italic_p start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - 1 ) / italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG start_ARG italic_n start_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - 1 ) / italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG + divide start_ARG 1 end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 3 ( italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - 1 ) / italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG ) ⋅ italic_n ) = italic_o ( divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ) ,

where the second inequality follows from Hölder’s inequality, and the third-to-last equality is due to the facts in (48) and (49) with p1subscript𝑝1p_{1}italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and p2subscript𝑝2p_{2}italic_p start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT satisfying p1=3/(1ϵ)subscript𝑝131italic-ϵp_{1}=3/(1-\epsilon)italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = 3 / ( 1 - italic_ϵ ) and p2>3/δsubscript𝑝23𝛿p_{2}>3/\deltaitalic_p start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT > 3 / italic_δ for some ϵ(0,1)italic-ϵ01\epsilon\in(0,1)italic_ϵ ∈ ( 0 , 1 ).

Combining Equations (51)-(56) yields

|Q1|K2σk2c2νn+o(1)n(nKc1νn)=O(νnn3/2).subscript𝑄1superscript𝐾2superscriptsubscript𝜎superscript𝑘2subscript𝑐2subscript𝜈𝑛𝑜1𝑛𝑛𝐾subscript𝑐1subscript𝜈𝑛𝑂subscript𝜈𝑛superscript𝑛32\displaystyle|Q_{1}|\leq{K^{2}\sigma_{k^{*}}^{2}c_{2}\nu_{n}+o(1)\over\sqrt{n}% \left(n-Kc_{1}\nu_{n}\right)}=O\left({\nu_{n}\over n^{3/2}}\right).| italic_Q start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT | ≤ divide start_ARG italic_K start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT + italic_o ( 1 ) end_ARG start_ARG square-root start_ARG italic_n end_ARG ( italic_n - italic_K italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) end_ARG = italic_O ( divide start_ARG italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 3 / 2 end_POSTSUPERSCRIPT end_ARG ) .

Using the same argument in proving (56), we have

|Q2|kk𝔼[|Zn,k|𝟙{In=k}]subscript𝑄2subscript𝑘superscript𝑘𝔼delimited-[]subscript𝑍𝑛superscript𝑘1superscriptsubscript𝐼𝑛𝑘\displaystyle|Q_{2}|\leq\sum_{k\neq k^{*}}{\mathbb{E}}\left[\left|Z_{n,k^{*}}% \right|\mathbbm{1}\{I_{n}^{*}=k\}\right]| italic_Q start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT | ≤ ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT blackboard_E [ | italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_k } ]
\displaystyle\leq (K1)𝔼p11p1(𝟙{Ink})𝔼1p1|Zn,kp1|𝐾1superscript𝔼subscript𝑝11subscript𝑝11superscriptsubscript𝐼𝑛superscript𝑘superscript𝔼1subscript𝑝1superscriptsubscript𝑍𝑛superscript𝑘subscript𝑝1\displaystyle(K-1){\mathbb{E}}^{p_{1}-1\over p_{1}}\left(\mathbbm{1}\{I_{n}^{*% }\neq k^{*}\}\right){\mathbb{E}}^{1\over p_{1}}\left|Z_{n,k^{*}}^{p_{1}}\right|( italic_K - 1 ) blackboard_E start_POSTSUPERSCRIPT divide start_ARG italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - 1 end_ARG start_ARG italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG end_POSTSUPERSCRIPT ( blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT } ) blackboard_E start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG end_POSTSUPERSCRIPT | italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT |
=\displaystyle== O((n(1δ)p2(p11)/p1np2(p11)/p1+1n3(p11)/p1)n12)=o(νnn3/2)𝑂superscript𝑛1𝛿subscript𝑝2subscript𝑝11subscript𝑝1superscript𝑛subscript𝑝2subscript𝑝11subscript𝑝11superscript𝑛3subscript𝑝11subscript𝑝1superscript𝑛12𝑜subscript𝜈𝑛superscript𝑛32\displaystyle O\left(\left({n^{(1-\delta)p_{2}(p_{1}-1)/p_{1}}\over n^{p_{2}(p% _{1}-1)/p_{1}}}+{1\over n^{3(p_{1}-1)/p_{1}}}\right)\cdot n^{1\over 2}\right)=% o\left({\nu_{n}\over n^{3/2}}\right)italic_O ( ( divide start_ARG italic_n start_POSTSUPERSCRIPT ( 1 - italic_δ ) italic_p start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - 1 ) / italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG start_ARG italic_n start_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - 1 ) / italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG + divide start_ARG 1 end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 3 ( italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - 1 ) / italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_ARG ) ⋅ italic_n start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT ) = italic_o ( divide start_ARG italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 3 / 2 end_POSTSUPERSCRIPT end_ARG )

with p1subscript𝑝1p_{1}italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and p2subscript𝑝2p_{2}italic_p start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT satisfying p1=3/(1ϵ)subscript𝑝131italic-ϵp_{1}=3/(1-\epsilon)italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = 3 / ( 1 - italic_ϵ ) and p2>3/(2δ)subscript𝑝232𝛿p_{2}>3/(2\delta)italic_p start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT > 3 / ( 2 italic_δ ) for some ϵ(0,1)italic-ϵ01\epsilon\in(0,1)italic_ϵ ∈ ( 0 , 1 ).

In a similar manner, it can be shown that |Q3|=o(νnn3/2)subscript𝑄3𝑜subscript𝜈𝑛superscript𝑛32|Q_{3}|=o\left({\nu_{n}\over n^{3/2}}\right)| italic_Q start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT | = italic_o ( divide start_ARG italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 3 / 2 end_POSTSUPERSCRIPT end_ARG ), and the details are omitted. Therefore, by (A.9),

|Bias(MIn)|K2σk2c2νn+c3n(nKc1νn)+o(νnn3/2)=O(νnn3/2).Biassubscript𝑀superscriptsubscript𝐼𝑛superscript𝐾2superscriptsubscript𝜎superscript𝑘2subscript𝑐2subscript𝜈𝑛subscript𝑐3𝑛𝑛𝐾subscript𝑐1subscript𝜈𝑛𝑜subscript𝜈𝑛superscript𝑛32𝑂subscript𝜈𝑛superscript𝑛32\left|{\rm Bias}\left(M_{I_{n}^{*}}\right)\right|\leq{K^{2}\sigma_{k^{*}}^{2}c% _{2}\nu_{n}+c_{3}\over\sqrt{n}\left(n-Kc_{1}\nu_{n}\right)}+o\left({\nu_{n}% \over n^{3/2}}\right)=O\left({\nu_{n}\over n^{3/2}}\right).| roman_Bias ( italic_M start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) | ≤ divide start_ARG italic_K start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT + italic_c start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_ARG start_ARG square-root start_ARG italic_n end_ARG ( italic_n - italic_K italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) end_ARG + italic_o ( divide start_ARG italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 3 / 2 end_POSTSUPERSCRIPT end_ARG ) = italic_O ( divide start_ARG italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 3 / 2 end_POSTSUPERSCRIPT end_ARG ) .

Note further that the order of |Q1|subscript𝑄1|Q_{1}|| italic_Q start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT | dominates those of |Q2|subscript𝑄2|Q_{2}|| italic_Q start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT | and |Q3|subscript𝑄3|Q_{3}|| italic_Q start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT |. Because Q1<0subscript𝑄10Q_{1}<0italic_Q start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT < 0, we know that Bias(MIn)<0Biassubscript𝑀superscriptsubscript𝐼𝑛0{\rm Bias}\left(M_{I_{n}^{*}}\right)<0roman_Bias ( italic_M start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) < 0 for sufficiently large n𝑛nitalic_n.

We now consider the variance. Note that

Var[MIn]Vardelimited-[]subscript𝑀superscriptsubscript𝐼𝑛absent\displaystyle{\rm Var}\left[M_{I_{n}^{*}}\right]\leqroman_Var [ italic_M start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ] ≤ 𝔼[|MInμk|2]𝔼delimited-[]superscriptsubscript𝑀superscriptsubscript𝐼𝑛subscript𝜇superscript𝑘2\displaystyle{\mathbb{E}}\left[\left|M_{I_{n}^{*}}-\mu_{k^{*}}\right|^{2}\right]blackboard_E [ | italic_M start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ]
=\displaystyle== 𝔼[|MInμk|2𝟙{In=k}]+kk𝔼[|MInμk|2𝟙{In=k}]𝔼delimited-[]superscriptsubscript𝑀superscriptsubscript𝐼𝑛subscript𝜇superscript𝑘21superscriptsubscript𝐼𝑛superscript𝑘subscript𝑘superscript𝑘𝔼delimited-[]superscriptsubscript𝑀superscriptsubscript𝐼𝑛subscript𝜇superscript𝑘21superscriptsubscript𝐼𝑛𝑘\displaystyle{\mathbb{E}}\left[\left|M_{I_{n}^{*}}-\mu_{k^{*}}\right|^{2}% \mathbbm{1}\{I_{n}^{*}=k^{*}\}\right]+\sum_{k\neq k^{*}}{\mathbb{E}}\left[% \left|M_{I_{n}^{*}}-\mu_{k^{*}}\right|^{2}\mathbbm{1}\{I_{n}^{*}=k\}\right]blackboard_E [ | italic_M start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT } ] + ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT blackboard_E [ | italic_M start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT - italic_μ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_k } ]
=\displaystyle== 𝔼[Zn,k2Tk2(n)𝟙{In=k}]+kk𝔼[(Zn,kΔk)2Tk2(n)𝟙{In=k}]𝔼delimited-[]superscriptsubscript𝑍𝑛superscript𝑘2superscriptsubscript𝑇superscript𝑘2𝑛1superscriptsubscript𝐼𝑛superscript𝑘subscript𝑘superscript𝑘𝔼delimited-[]superscriptsubscript𝑍𝑛𝑘subscriptΔ𝑘2superscriptsubscript𝑇𝑘2𝑛1superscriptsubscript𝐼𝑛𝑘\displaystyle{\mathbb{E}}\left[\frac{Z_{n,k^{*}}^{2}}{T_{k^{*}}^{2}(n)}% \mathbbm{1}\{I_{n}^{*}=k^{*}\}\right]+\sum_{k\neq k^{*}}{\mathbb{E}}\left[{(Z_% {n,k}-\Delta_{k})^{2}\over T_{k}^{2}(n)}\mathbbm{1}\{I_{n}^{*}=k\}\right]blackboard_E [ divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_n ) end_ARG blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT } ] + ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT blackboard_E [ divide start_ARG ( italic_Z start_POSTSUBSCRIPT italic_n , italic_k end_POSTSUBSCRIPT - roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_n ) end_ARG blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_k } ]
\displaystyle\leq 𝔼[Zn,k2Tk2(n)𝟙{In=k}]+2kk𝔼[Zn,k2Tk2(n)𝟙{In=k}]+2kkΔk2𝔼[1Tk2(n)𝟙{In=k}].𝔼delimited-[]superscriptsubscript𝑍𝑛superscript𝑘2superscriptsubscript𝑇superscript𝑘2𝑛1superscriptsubscript𝐼𝑛superscript𝑘2subscript𝑘superscript𝑘𝔼delimited-[]superscriptsubscript𝑍𝑛𝑘2superscriptsubscript𝑇𝑘2𝑛1superscriptsubscript𝐼𝑛𝑘2subscript𝑘superscript𝑘superscriptsubscriptΔ𝑘2𝔼delimited-[]1superscriptsubscript𝑇𝑘2𝑛1superscriptsubscript𝐼𝑛𝑘\displaystyle{\mathbb{E}}\left[\frac{Z_{n,k^{*}}^{2}}{T_{k^{*}}^{2}(n)}% \mathbbm{1}\{I_{n}^{*}=k^{*}\}\right]+2\sum_{k\neq k^{*}}{\mathbb{E}}\left[{Z_% {n,k}^{2}\over T_{k}^{2}(n)}\mathbbm{1}\{I_{n}^{*}=k\}\right]+2\sum_{k\neq k^{% *}}\Delta_{k}^{2}{\mathbb{E}}\left[{1\over T_{k}^{2}(n)}\mathbbm{1}\{I_{n}^{*}% =k\}\right].blackboard_E [ divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_n ) end_ARG blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT } ] + 2 ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT blackboard_E [ divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n , italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_n ) end_ARG blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_k } ] + 2 ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT blackboard_E [ divide start_ARG 1 end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_n ) end_ARG blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_k } ] . (57)

We note that by (55),

𝔼[Zn,k2Tk2(n)𝟙{In=k}]σk2K2n.𝔼delimited-[]superscriptsubscript𝑍𝑛superscript𝑘2superscriptsubscript𝑇superscript𝑘2𝑛1superscriptsubscript𝐼𝑛superscript𝑘superscriptsubscript𝜎superscript𝑘2superscript𝐾2𝑛{\mathbb{E}}\left[\frac{Z_{n,k^{*}}^{2}}{T_{k^{*}}^{2}(n)}\mathbbm{1}\{I_{n}^{% *}=k^{*}\}\right]\leq{\sigma_{k^{*}}^{2}K^{2}\over n}.blackboard_E [ divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n , italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_n ) end_ARG blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT } ] ≤ divide start_ARG italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_K start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_n end_ARG .

Similar to the proof of (56), we have

2kk𝔼[Zn,k2Tk2(n)𝟙{In=k}]=o(1n).2subscript𝑘superscript𝑘𝔼delimited-[]superscriptsubscript𝑍𝑛𝑘2superscriptsubscript𝑇𝑘2𝑛1superscriptsubscript𝐼𝑛𝑘𝑜1𝑛2\sum_{k\neq k^{*}}{\mathbb{E}}\left[{Z_{n,k}^{2}\over T_{k}^{2}(n)}\mathbbm{1% }\{I_{n}^{*}=k\}\right]=o\left({1\over n}\right).2 ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT blackboard_E [ divide start_ARG italic_Z start_POSTSUBSCRIPT italic_n , italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_n ) end_ARG blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_k } ] = italic_o ( divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ) .

Moreover, using the fact in (48),

2kkΔk2𝔼[1Tk2(n)𝟙{In=k}]2kkΔk2𝔼[𝟙{In=k}]=O(νnpnp+1n3)=O(n(1δ)pnp+1n3)=o(1n),2subscript𝑘superscript𝑘superscriptsubscriptΔ𝑘2𝔼delimited-[]1superscriptsubscript𝑇𝑘2𝑛1superscriptsubscript𝐼𝑛𝑘2subscript𝑘superscript𝑘superscriptsubscriptΔ𝑘2𝔼delimited-[]1superscriptsubscript𝐼𝑛𝑘𝑂superscriptsubscript𝜈𝑛𝑝superscript𝑛𝑝1superscript𝑛3𝑂superscript𝑛1𝛿𝑝superscript𝑛𝑝1superscript𝑛3𝑜1𝑛2\sum_{k\neq k^{*}}\Delta_{k}^{2}{\mathbb{E}}\left[{1\over T_{k}^{2}(n)}% \mathbbm{1}\{I_{n}^{*}=k\}\right]\leq 2\sum_{k\neq k^{*}}\Delta_{k}^{2}{% \mathbb{E}}\left[\mathbbm{1}\{I_{n}^{*}=k\}\right]=O\left({\nu_{n}^{p}\over n^% {p}}+{1\over n^{3}}\right)=O\left({n^{(1-\delta)p}\over n^{p}}+{1\over n^{3}}% \right)=o\left({1\over n}\right),2 ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT blackboard_E [ divide start_ARG 1 end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_n ) end_ARG blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_k } ] ≤ 2 ∑ start_POSTSUBSCRIPT italic_k ≠ italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT roman_Δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT blackboard_E [ blackboard_1 { italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = italic_k } ] = italic_O ( divide start_ARG italic_ν start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT end_ARG start_ARG italic_n start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT end_ARG + divide start_ARG 1 end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT end_ARG ) = italic_O ( divide start_ARG italic_n start_POSTSUPERSCRIPT ( 1 - italic_δ ) italic_p end_POSTSUPERSCRIPT end_ARG start_ARG italic_n start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT end_ARG + divide start_ARG 1 end_ARG start_ARG italic_n start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT end_ARG ) = italic_o ( divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ) ,

with p>1/δ𝑝1𝛿p>1/\deltaitalic_p > 1 / italic_δ.

Therefore,

Var[MIn]σk2K2n+o(1n),Vardelimited-[]subscript𝑀superscriptsubscript𝐼𝑛superscriptsubscript𝜎superscript𝑘2superscript𝐾2𝑛𝑜1𝑛{\rm Var}\left[M_{I_{n}^{*}}\right]\leq{\sigma_{k^{*}}^{2}K^{2}\over n}+o\left% ({1\over n}\right),roman_Var [ italic_M start_POSTSUBSCRIPT italic_I start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ] ≤ divide start_ARG italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_K start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_n end_ARG + italic_o ( divide start_ARG 1 end_ARG start_ARG italic_n end_ARG ) ,

which completes the proof.

A.10 Proof of Theorem 8

Let Mk=1Tk(n)j=1Tk(n)Xkjsubscript𝑀𝑘1subscript𝑇𝑘𝑛superscriptsubscript𝑗1subscript𝑇𝑘𝑛subscript𝑋𝑘𝑗M_{k}=\frac{1}{T_{k}(n)}\sum_{j=1}^{T_{k}(n)}X_{kj}italic_M start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_n ) end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_k italic_j end_POSTSUBSCRIPT. By (47),

(ωΩ:0<ε<1,N=N(ω,ε),s.t.n>N,nMIn=nMk)=1,\displaystyle{\mathbb{P}}\left(\omega\in\Omega:\forall 0<\varepsilon<1,\exists N% =N(\omega,\varepsilon)\in{\mathbb{N}},s.t.\ \forall n>N,\sqrt{n}M_{I^{*}_{n}}=% \sqrt{n}M_{k^{*}}\right)=1,blackboard_P ( italic_ω ∈ roman_Ω : ∀ 0 < italic_ε < 1 , ∃ italic_N = italic_N ( italic_ω , italic_ε ) ∈ blackboard_N , italic_s . italic_t . ∀ italic_n > italic_N , square-root start_ARG italic_n end_ARG italic_M start_POSTSUBSCRIPT italic_I start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT = square-root start_ARG italic_n end_ARG italic_M start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) = 1 ,

i.e.,

nMInnMka.s.0,asn.\displaystyle\sqrt{n}M_{I^{*}_{n}}-\sqrt{n}M_{k^{*}}\overset{a.s.}{% \longrightarrow}0,\ \ {\rm as}\ \ n\to\infty.square-root start_ARG italic_n end_ARG italic_M start_POSTSUBSCRIPT italic_I start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT - square-root start_ARG italic_n end_ARG italic_M start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_OVERACCENT italic_a . italic_s . end_OVERACCENT start_ARG ⟶ end_ARG 0 , roman_as italic_n → ∞ .

Note that

n(MInμ)=nMInnMk+n(Mkμ).𝑛subscript𝑀subscriptsuperscript𝐼𝑛superscript𝜇𝑛subscript𝑀subscriptsuperscript𝐼𝑛𝑛subscript𝑀superscript𝑘𝑛subscript𝑀superscript𝑘superscript𝜇\displaystyle\sqrt{n}\cdot\left(M_{I^{*}_{n}}-\mu^{*}\right)=\sqrt{n}M_{I^{*}_% {n}}-\sqrt{n}M_{k^{*}}+\sqrt{n}\cdot\left(M_{k^{*}}-\mu^{*}\right).square-root start_ARG italic_n end_ARG ⋅ ( italic_M start_POSTSUBSCRIPT italic_I start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) = square-root start_ARG italic_n end_ARG italic_M start_POSTSUBSCRIPT italic_I start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT - square-root start_ARG italic_n end_ARG italic_M start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT + square-root start_ARG italic_n end_ARG ⋅ ( italic_M start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT - italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) .

By Slutsky Theorem, to prove the theorem, it suffices to show

n(Mkμ)N(0,σk2),asn,formulae-sequence𝑛subscript𝑀superscript𝑘superscript𝜇𝑁0superscriptsubscript𝜎superscript𝑘2as𝑛\displaystyle\sqrt{n}\cdot\left(M_{k^{*}}-\mu^{*}\right)\Rightarrow N(0,\sigma% _{k^{*}}^{2}),\ \ {\rm as}\ \ n\to\infty,square-root start_ARG italic_n end_ARG ⋅ ( italic_M start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT - italic_μ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ⇒ italic_N ( 0 , italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) , roman_as italic_n → ∞ , (58)

which follows immediately from (45) and Theorem 1 of Rényi (1957) or Theorem 3.2 in Chapter 7 of Gut (2013), and the proof is completed.

A.11 Proof of Proposition 3

Note that σk2=𝔼Xk2𝔼2Xksuperscriptsubscript𝜎superscript𝑘2𝔼superscriptsubscript𝑋superscript𝑘2superscript𝔼2subscript𝑋superscript𝑘\sigma_{k^{*}}^{2}={\mathbb{E}}X_{k^{*}}^{2}-{\mathbb{E}}^{2}X_{k^{*}}italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = blackboard_E italic_X start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - blackboard_E start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT. It suffices to prove the two terms on the RHS of (12) converge to two terms of σk2superscriptsubscript𝜎superscript𝑘2\sigma_{k^{*}}^{2}italic_σ start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT, respectively.

By Theorem 6 and the continuous mapping theorem, the second term on the RHS of (12) converges almost surely to 𝔼2Xksuperscript𝔼2subscript𝑋superscript𝑘{\mathbb{E}}^{2}X_{k^{*}}blackboard_E start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_X start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT. Similar to the proof of Theorem 6, we can show that the first term on the RHS of (12) converge to 𝔼Xk2𝔼superscriptsubscript𝑋superscript𝑘2{\mathbb{E}}X_{k^{*}}^{2}blackboard_E italic_X start_POSTSUBSCRIPT italic_k start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT almost surely, thus completing the proof.

References

  • Agrawal, R. 1995. Sample mean based index policies by o(logn)𝑜𝑛o(\log n)italic_o ( roman_log italic_n ) regret for the multi-armed bandit problem. Advances in Applied Probability, 27(4), 1054-1078.

  • Athreya, K. B., S. N. Lahiri. 2006. Measure Theory and Probability Theory. Springer Science & Business Media.

  • Auer, P., N. Cesa-Bianchi, P. Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3), 235-256.

  • Chang, H. S., M. C. Fu, J. Hu, S. I. Marcus. 2005. An adaptive sampling algorithm for solving Markov decision processes. Operations Research, 53(1), 126-139.

  • Chen, H. J., E. J. Dudewicz. 1976. Procedures for fixed-width interval estimation of the largest normal mean. Journal of the American Statistical Association, 71(355), 752-756.

  • D’Eramo, C., A. Nuara, M. Restelli. 2016. Estimating the maximum expected value through Gaussian approximation. International Conference on Machine Learning, 1032-1040.

  • Dmitrienko, A., J. C. Hsu. 2006. Multiple testing in clinical trials. S. Kotz, C. B. Read, N. Balakrishnan, B. Z. Vidakovic (Eds.), Encyclopedia of Statistical Sciences. Wiley, New Jersey.

  • Fan, W., L. J. Hong, B. L. Nelson. 2016. Indifference-zone-free selection of the best. Operations Research, 64(6), 1499-1514.

  • FDA (Food and Drug Administration). 2019. Adaptive designs for clinical trials of drugs and biologics: Guidance for industry. Available at https://www.fda.gov/media/78495/download.

  • Fu, M. C. 2017. Markov decision processes, AlphaGo, and Monte Carlo tree search: Back to the future. INFORMS TutORials in Operations Research, 68-88.

  • Gut, A. 2013. Probability: A Graduate Course. Springer Science & Business Media.

  • Hao, B., Y. Abbasi-Yadkori, Z. Wen, G. Cheng. 2019. Bootstrapping upper confidence bound. Advances in Neural Information Processing Systems, 12123-12133.

  • Kim, S.-H., B. L. Nelson. 2006. Selecting the best system. In Handbook in OR & MS: Simulation, S. G. Henderson and B. L. Nelson (Eds.), 501-534, Elsevier Science.

  • Kocsis, L., C. Szepesvári. 2006. Bandit based Monte-Carlo planning. Machine Learning: ECML, 282-293.

  • Lai, T. L., H. Robbins. 1985. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1), 4-22.

  • Lesnevski, V., B. L. Nelson, J. Staum. 2007. Simulation of coherent risk measures based on generalized scenarios. Management Science, 53(11), 1756-1769.

  • Mozgunov, P., T. Jaki. 2020. An information theoretic approach for selecting arms in clinical trials. Journal of the Royal Statistical Society Series B, 82(5), 1223-1247.

  • Qin, C., D. Klabjan, D. Russo. 2017. Improving the expected improvement algorithm. Proceedings of the 31st International Conference on Neural Information Processing Systems, 5387-5397.

  • Rényi, A. 1957. On the asymptotic distribution of the sum of a random number of independent random variables. Acta Mathematica Academiae Scientiarum Hungarica, 8(1-2), 193-199.

  • Richter, W. 1965. Limit theorems for sequences of random variables with sequences of random indeces. Theory of Probability & Its Applications, 10(1), 74-84.

  • Robertson, D. S., J. M. S. Wason. 2019. Familywise error control in multi-armed response-adaptive trials. Biometrics, 75, 885-894.

  • Russo, D. 2020. Simple bayesian algorithms for best arm identification. Operations Research, 68(6), 1625-1647.

  • Shamir, O. 2011. A variant of Azuma’s inequality for martingales with subgaussian tails. arXiv preprint arXiv:1110.2392.

  • Smith, J. E., R. L. Windler. 2006. The optimizer’s curse: Skepticism and postdecision surprise in decision analysis. Management Science, 52(3), 311-322.

  • van Hasselt, H. 2010. Double q𝑞qitalic_q-learning. Advances in Neural Information Processing Systems (NIPS), 2613-2621.

  • van Hasselt, H. 2013. Estimating the maximum expected value: an analysis of (nested) cross validation and the maximum sample average. Available at arXiv preprint arXiv:1302.7175.

  • van Hasselt, H.,, A. Guez, D. Silver. 2016. Deep reinforcement learning with double Q-learning. Proceedings of the AAAI Conference on Artificial Intelligence, 30, 2094-2100.

  • Villar, S. S., J. Bowden, J. Wason. 2015. Multi-armed bandit models for the optimal design of clinical trials: Benefits and challenges. Statistical Science, 30(2), 199.

  • Wainwright, M. J. 2019. High-dimensional Statistics: A Non-asymptotic Viewpoint (Vol. 48). Cambridge University Press.