
Big and Complex Data Analysis: Methodologies and Applications


Preface

This book comprises a collection of research contributions toward high-dimensional
data analysis. In this data-centric world, we are often challenged with data sets
containing many predictors in the model at hand. In a host of situations, the number
of predictors may very well exceed the sample size. Truly, many modern scientific
investigations require the analysis of such data. There are a host of buzzwords
in today’s data-centric world, especially in digital and print media. We encounter
data in every walk of life, and for analytically and objectively minded people,
data is everything. However, making sense of the data and extracting meaningful
information from it may not be an easy task. Sometimes, we come across buzzwords
such as big data, high-dimensional data, data visualization, data science, and open
data without a proper definition of such words. The rapid growth in the size and
scope of data sets in a host of disciplines has created a need for innovative statistical
and computational strategies for analyzing such data. A variety of statistical and
computational tools are needed to deal with such data and to reveal the data
story.
This book focuses on variable selection, parameter estimation, and prediction
based on high-dimensional data (HDD). In the classical regression context, HDD
refers to data where the number of predictors (d) is larger than the sample size (n).
There are situations when the number of predictors is in the millions and the sample
size may be in the hundreds. The modeling of HDD, where the sample size is much smaller than
the size of the data element associated with each observation, is an important
feature in a host of research fields such as social media, bioinformatics, medical,
environmental, engineering, and financial studies, among others. A number of the
classical techniques are available when d < n to tell the data story. However, the
existing classical strategies are not capable of yielding solutions for HDD. On the
other hand, the term “big data” is not very well defined, but its problems are real
and statisticians need to play a vital role in this data world. Generally speaking,
the term applies when the data are very large and may not even be stored in one
place. However, the relationship between n and d may not be as crucial as it is for
HDD. Further, in some cases, users cannot distinguish between population and
sampled data when dealing with big data. In any event, big data
or data science is an emerging field stemming equally from research enterprise
and public and private sectors. Undoubtedly, big data is the future of research in
a host of research fields, and transdisciplinary programs are required to develop the
skills for data scientists. For example, many private and public agencies are using
sophisticated number-crunching, data mining, or big data analytics to reveal patterns
based on collected information. Clearly, there is an increasing demand for efficient
prediction strategies for analyzing such data. Some examples of big data that have
prompted demand are gene expression arrays; social network modeling; clinical,
genetics, and phenotypic spatiotemporal data; and many others.
In the context of regression models, due to the trade-off between model prediction
and model complexity, model selection is an extremely important and
challenging problem in the big data arena. Over the past two decades, many penal-
ized regularization approaches have been developed to perform variable selection
and estimation simultaneously. This book makes a seminal contribution in the arena
of big data analysis, including HDD. For smooth reading and understanding of the
contributions made in this book, it is divided into three parts as follows:
General High-dimensional theory and methods (chapters “Regularization
After Marginal Learning for Ultra-High Dimensional Regression Models”–
“Bias-Reduced Moment Estimators of Population Spectral Distribution and Their
Applications”)
Network analysis and big data (chapters “Statistical Process Control Charts
as a Tool for Analyzing Big Data”–“Nonparametric Testing for Heterogeneous
Correlation”)
Statistics learning and applications (chapters “Optimal Shrinkage Estimation
in Heteroscedastic Hierarchical Linear Models”–“A Mixture of Variance-Gamma
Factor Analyzers”)
We anticipate that the chapters published in this book will represent a meaningful
contribution to the development of new ideas in big data analysis and will
showcase interesting applications. In a sense, each chapter is self-contained. A brief
description of the contents of each of the eighteen chapters in this book is provided.
Chapter “Regularization After Marginal Learning for Ultra-High Dimensional
Regression Models” (Feng) introduces a general framework for variable selection
in ultrahigh-dimensional regression models. By combining the idea of marginal
screening and retention, the framework can achieve sign consistency and is
extremely fast to implement.
In chapter “Empirical Likelihood Test for High Dimensional Generalized Lin-
ear Models” (Zang et al.), the estimation and model selection aspects of high-
dimensional data analysis are considered. It focuses on the inference aspect, which
can provide complementary insights to the estimation studies, and has at least two
notable contributions. The first is the investigation of both full and partial tests,
and the second is the utilization of the empirical likelihood technique under high-
dimensional settings.
Random projections are frequently used for dimension reduction in
many areas of machine learning, as they enable us to do computations on a
more succinct representation of the data. Random projections can be applied row-
and column-wise to the data, compressing samples and compressing features,
respectively. Chapter “Random Projections For Large-Scale Regression” (Thanei
et al.) discusses the properties of the latter column-wise compression, which turn
out to be very similar to the properties of ridge regression. It is pointed out that
further improvements in accuracy can be achieved by averaging over least squares
estimates generated by independent random projections.
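To make the column-wise compression concrete, here is a minimal numerical sketch (our own toy example; all names, dimensions, and data are illustrative assumptions, not material from the chapter):

```python
import numpy as np

def compressed_lstsq(X, y, q, rng):
    """Fit least squares after compressing the p features of X down to q
    random linear combinations (column-wise compression), then map the
    coefficients back to the original p-dimensional space."""
    P = rng.standard_normal((X.shape[1], q)) / np.sqrt(q)  # random projection
    g, *_ = np.linalg.lstsq(X @ P, y, rcond=None)          # fit in R^q
    return P @ g                                           # coefficients in R^p

# Toy data: n = 100 samples, p = 50 features, 3 of them true signals.
rng = np.random.default_rng(1)
n, p, q = 100, 50, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.0, 1.5]
y = X @ beta + 0.1 * rng.standard_normal(n)

# Averaging the mapped-back estimates over independent projections reduces
# the variance of the estimator, as noted above.
beta_avg = np.mean([compressed_lstsq(X, y, q, rng) for _ in range(50)], axis=0)
```

Each single compressed fit behaves like a roughly ridge-type estimator, and the averaged estimator sketches the accuracy improvement from combining independent projections.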
Testing a hypothesis subsequent to model selection leads to test problems
in which nuisance parameters are present. Chapter “Testing in the Presence of
Nuisance Parameters: Some Comments on Tests Post-Model-Selection and Random
Critical Values” (Leeb and Pötscher) reviews and critically evaluates proposals that
have been suggested in the literature to deal with such problems. In particular,
the chapter reviews a procedure based on the worst-case critical value, a more
sophisticated proposal based on earlier work, and recent proposals from the econo-
metrics literature. It is furthermore discussed why intuitively appealing proposals,
for example, a parametric bootstrap procedure, as well as another recently suggested
procedure, do not lead to valid tests, not even asymptotically.
In contrast to the extensive research on covariate measurement error, error in
the response has received much less attention. In particular, systematic studies on
general clustered/longitudinal data with response error do not seem to be available.
Chapter “Analysis of Correlated Data with Error-Prone Response Under Gener-
alized Linear Mixed Models” (Yi et al.) considers this important problem and
investigates the asymptotic bias induced by the error in response. Valid inference
procedures are developed to account for response error effects under different
situations, and asymptotic results are appropriately established.
Statistical inference on large covariance matrices has become a fast growing
research area due to the wide availability of high-dimensional data, and spec-
tral distributions of large covariance matrices play an important role. Chapter
“Bias-Reduced Moment Estimators of Population Spectral Distribution and Their
Applications” (Qin and Li) derives bias-reduced moment estimators for the popula-
tion spectral distribution of large covariance matrices and presents consistency and
asymptotic normality of these estimators.
Big data often take the form of data streams with observations of a related
process being collected sequentially over time. Statistical process control (SPC)
charts provide a major statistical tool for monitoring the longitudinal performance of
the process through online detection of any distributional changes in the sequential
process observations. As such, SPC charts can be a useful tool for analyzing big data.
Chapter “Statistical Process Control Charts as a Tool for Analyzing Big Data” (Qiu)
introduces some basic SPC concepts and methods and demonstrates the use of SPC
charts for analyzing certain real big data sets. This chapter also describes some
recent SPC methodologies that have a great potential for handling different big data
applications. These methods include the disease dynamic screening system and some
recent profile monitoring methods for the online monitoring of profile/image data,
which is commonly used in modern manufacturing industries.
Chapter “Fast Community Detection in Complex Networks with a K-Depths
Classifier” (Tian and Gel) introduces a notion of data depth for recovery of
community structures in large complex networks. The authors propose a new
data-driven algorithm, K-depths, for community detection using the L1 depth
in an unsupervised setting. Further, they evaluate finite sample properties of
the K-depths method using synthetic networks and illustrate its performance for
tracking communities in online social media platform Flickr. The new method
significantly outperforms the classical K-means and yields comparable results to
the regularized K-means. Being robust to low-degree vertices, the new K-depths
method is computationally efficient, requiring up to 400 times less CPU time than
the currently adopted regularization procedures based on optimizing the Davis-
Kahan bound.
Chapter “How Different are Estimated Genetic Networks of Cancer Subtypes?”
(Shojaie and Sedaghat) presents a comprehensive comparison of estimated networks
of cancer subtypes. Specifically, the networks estimated using six estimation
methods were compared based on various network descriptors characterizing both
local network structures, that is, edges, and global properties, such as energy
and symmetry. This investigation revealed two particularly interesting properties
of estimated gene networks across different cancer subtypes. First, the estimates
from the six network reconstruction methods can be grouped into two seemingly
unrelated clusters, with clusters that include methods based on linear and nonlinear
associations, as well as methods based on marginal and conditional associations.
Further, while the local structures of estimated networks are significantly different
across cancer subtypes, global properties of estimated networks are less distinct.
These findings can guide future research in computational and statistical methods
for differential network analysis.
Statistical analysis of big clustered time-to-event data presents daunting sta-
tistical challenges as well as exciting opportunities. One of the challenges in
working with big biomedical data is detecting the associations between disease
outcomes and risk factors that involve complex functional forms. Many existing
statistical methods fail in large-scale settings because of lack of computational
power, as, for example, the computation and inversion of the Hessian matrix of
the log-partial likelihood is very expensive and may exceed computation memory.
Chapter “A Computationally Efficient Approach for Modeling Complex and Big
Survival Data” (He et al.) handles problems with a large number of parameters
and proposes a novel algorithm, which combines the strengths of quasi-Newton,
the MM algorithm, and coordinate descent. The proposed algorithm improves upon
the traditional semiparametric frailty models in several aspects. For instance, the
proposed algorithm avoids calculation of high-dimensional second derivatives of the
log-partial likelihood and, hence, is competitive in terms of computation speed and
memory usage. Simplicity is obtained by separating the variables of the optimization
problem. The proposed methods also provide a useful tool for modeling complex
data structures such as time-varying effects.
Asymptotic inference for the concentration of directional data has attracted
much attention in the past decades. Most of the asymptotic results related to
concentration parameters have been obtained in the traditional large sample size
and fixed dimension case. Chapter “Tests of Concentration for Low-Dimensional
and High-Dimensional Directional Data” (Cutting et al.) considers the extension of
existing testing procedures for concentration to the large n and large d case. In this
high-dimensional setup, the authors provide tests that remain valid in the sense that
they reach the correct asymptotic level within the class of rotationally symmetric
distributions.
“Nonparametric Testing for Heterogeneous Correlation” covers the big data
problem of determining whether a weak overall monotone association between two
variables persists throughout the population or is driven by a strong association
that is limited to a subpopulation. The idea of homogeneous association rests
on the underlying copula of the distribution. In chapter “Nonparametric Testing
for Heterogeneous Correlation” (Bamattre et al.), two copulas are considered,
the Gaussian and the Frank, under which components of two respective ranking
measures, Spearman’s footrule and Kendall’s tau, are shown to have tractable
distributions that lead to practical tests.
Shrinkage estimators have profound impacts in statistics and in scientific and
engineering applications. Chapter “Optimal Shrinkage Estimation in Heteroscedas-
tic Hierarchical Linear Models” (Kou and Yang) considers shrinkage estimation
in the presence of linear predictors. Two heteroscedastic hierarchical regression
models are formulated, and the study of optimal shrinkage estimators in each
model is thoroughly presented. A class of shrinkage estimators, both parametric
and semiparametric, based on unbiased risk estimate is proposed and is shown
to be (asymptotically) optimal under mean squared error loss in each model. A
simulation study is conducted to compare the performance of the proposed methods
with existing shrinkage estimators. The authors also apply the method to real data
and obtain encouraging and interesting results.
Chapter “High Dimensional Data Analysis: Integrating Submodels” (Ahmed
and Yuzbasi) considers efficient prediction strategies in sparse high-dimensional
model. In high-dimensional data settings, many penalized regularization strategies
are suggested for simultaneous variable selection and estimation. However, different
strategies yield a different submodel with different predictors and number of
predictors. Some procedures may select a submodel with a relatively larger number
of predictors than others. Due to the trade-off between model complexity and
model prediction accuracy, the statistical inference of model selection is extremely
important and challenging problem in high-dimensional data analysis. For this
reason, the authors suggest shrinkage and pretest post-estimation strategies to
improve the prediction performance of two selected submodels. Such a pretest and
shrinkage strategy is constructed by shrinking an overfitted model estimator in the
direction of an underfitted model estimator. The numerical studies indicate that
post-selection pretest and shrinkage strategies improve the prediction performance of selected
submodels. This chapter reveals many interesting results and opens doors for further
research in a host of research investigations.
Chapter “High-Dimensional Classification for Brain Decoding” (Croteau et al.)
discusses high-dimensional classification within the context of brain decoding
where spatiotemporal neuroimaging data are used to decode latent cognitive states.
The authors discuss several approaches for feature selection including persistent
homology, robust functional principal components analysis, and mutual information
networks. These features are incorporated into a multinomial logistic classifier, and
model estimation is based on penalized likelihood using the elastic net penalty.
The approaches are illustrated in an application where the task is to infer, from
brain activity measured with magnetoencephalography (MEG), the type of video
stimulus shown to a subject.
Principal components analysis is a widely used technique for dimension reduc-
tion and characterization of variability in multivariate populations. In chapter
“Unsupervised Bump Hunting Using Principal Components” (Díaz-Pachón et
al.), the authors’ interest lies in studying when and why the rotation to principal
components can be used effectively within a response-predictor set relationship in
the context of mode hunting. Specifically focusing on the Patient Rule Induction
Method (PRIM), the authors first develop a fast version of this algorithm (fastPRIM)
under normality which facilitates the theoretical studies to follow. Using basic geo-
metrical arguments, they then demonstrate how the principal components rotation of
the predictor space alone can in fact generate improved mode estimators. Simulation
results are used to illustrate findings.
The analysis of high-dimensional data is challenging in multiple aspects. One
aspect is interaction analysis, which is critical in biomedical and other studies.
Chapter “Identifying Gene-Environment Interactions Associated with Prognosis
Using Penalized Quantile Regression” (Wang et al.) studies high-dimensional
interactions using a robust approach. The effectiveness demonstrated in this study
opens doors for other robust methods under high-dimensional settings. This study
will also be practically useful by introducing a new way of analyzing genetic data.
In chapter “A Mixture of Variance-Gamma Factor Analyzers” (McNicholas et
al.), a mixture modeling approach for clustering high-dimensional data is developed.
This approach is based on a mixture of variance-gamma distributions, which
is interesting because the variance-gamma distribution has been underutilized in
multivariate statistics—certainly, it has received far less attention than the skew-t
distribution, which also parameterizes location, scale, concentration, and skewness.
Clustering is carried out using a mixture of variance-gamma factor analyzers
(MVGFA) model, which is an extension of the well-known mixture of factor
analyzers model that can accommodate clusters that are asymmetric and/or heavy
tailed. The formulation of the variance-gamma distribution used can be represented
as a normal mean variance mixture, a fact that is exploited in the development of
the associated factor analyzers.
In summary, several directions for innovative research in big data analysis were
highlighted in this book. I remain confident that this book conveys some of the
surprises, puzzles, and success stories in the arena of big data analysis. Research
in this arena will continue for the foreseeable future.
As an ending thought, I would like to thank all the authors who submitted their
papers for possible publication in this book as well as all the reviewers for their
valuable input and constructive comments on all submitted manuscripts. I would like
to express my special thanks to Veronika Rosteck at Springer for the encouragement
and generous support on this project and helping me to arrive at the finishing line.
My special thanks go to Ulrike Stricker-Komba at Springer for outstanding technical
support for the production of this book. Last but not least, I am thankful to my family
for their support for the completion of this book.

Niagara-On-The-Lake, Ontario, Canada S. Ejaz Ahmed


August 2016
Contents

Part I General High-Dimensional Theory and Methods

Regularization After Marginal Learning for Ultra-High
Dimensional Regression Models ..... 3
Yang Feng and Mengjia Yu

Empirical Likelihood Test for High Dimensional Generalized
Linear Models ..... 29
Yangguang Zang, Qingzhao Zhang, Sanguo Zhang, Qizhai Li,
and Shuangge Ma

Random Projections for Large-Scale Regression ..... 51
Gian-Andrea Thanei, Christina Heinze, and Nicolai Meinshausen

Testing in the Presence of Nuisance Parameters: Some
Comments on Tests Post-Model-Selection and Random Critical Values ..... 69
Hannes Leeb and Benedikt M. Pötscher

Analysis of Correlated Data with Error-Prone Response Under
Generalized Linear Mixed Models ..... 83
Grace Y. Yi, Zhijian Chen, and Changbao Wu

Bias-Reduced Moment Estimators of Population Spectral
Distribution and Their Applications ..... 103
Yingli Qin and Weiming Li

Part II Network Analysis and Big Data

Statistical Process Control Charts as a Tool for Analyzing Big Data ..... 123
Peihua Qiu

Fast Community Detection in Complex Networks
with a K-Depths Classifier ..... 139
Yahui Tian and Yulia R. Gel

How Different Are Estimated Genetic Networks of Cancer Subtypes? ..... 159
Ali Shojaie and Nafiseh Sedaghat

A Computationally Efficient Approach for Modeling Complex
and Big Survival Data ..... 193
Kevin He, Yanming Li, Qingyi Wei, and Yi Li

Tests of Concentration for Low-Dimensional
and High-Dimensional Directional Data ..... 209
Christine Cutting, Davy Paindaveine, and Thomas Verdebout

Nonparametric Testing for Heterogeneous Correlation ..... 229
Stephen Bamattre, Rex Hu, and Joseph S. Verducci

Part III Statistics Learning and Applications

Optimal Shrinkage Estimation in Heteroscedastic Hierarchical
Linear Models ..... 249
S.C. Kou and Justin J. Yang

High Dimensional Data Analysis: Integrating Submodels ..... 285
Syed Ejaz Ahmed and Bahadır Yüzbaşı

High-Dimensional Classification for Brain Decoding ..... 305
Nicole Croteau, Farouk S. Nathoo, Jiguo Cao, and Ryan Budney

Unsupervised Bump Hunting Using Principal Components ..... 325
Daniel A. Díaz-Pachón, Jean-Eudes Dazard, and J. Sunil Rao

Identifying Gene–Environment Interactions Associated with
Prognosis Using Penalized Quantile Regression ..... 347
Guohua Wang, Yinjun Zhao, Qingzhao Zhang, Yangguang Zang,
Sanguo Zhang, and Shuangge Ma

A Mixture of Variance-Gamma Factor Analyzers ..... 369
Sharon M. McNicholas, Paul D. McNicholas, and Ryan P. Browne
Part I
General High-Dimensional Theory and Methods
Regularization After Marginal Learning
for Ultra-High Dimensional Regression Models

Yang Feng and Mengjia Yu

Abstract Regularization is a popular variable selection technique for
high-dimensional regression models. However, under the ultra-high dimensional setting, a
direct application of the regularization methods tends to fail in terms of model
selection consistency due to the possible spurious correlations among predictors.
Motivated by the ideas of screening (Fan and Lv, J R Stat Soc Ser B Stat Methodol
70:849–911, 2008) and retention (Weng et al, Manuscript, 2013), we propose a
new two-step framework for variable selection, where in the first step, marginal
learning techniques are utilized to partition variables into different categories, and
the regularization methods can be applied afterwards. The technical conditions of
model selection consistency for this broad framework relax those for the one-step
regularization methods. Extensive simulations show the competitive performance of
the new method.

Keywords Independence screening • Lasso • Marginal learning • Retention •
Selection • Sign consistency

1 Introduction

With the booming of information and the vast improvement in computation speed,
we are able to collect large amounts of data in the form of large collections of n
observations and p predictors, where p ≫ n. Recently, model selection has gained
increasing attention, especially for ultra-high dimensional regression problems.
Theoretically, the accuracy and interpretability of the selected model are crucial in
variable selection. Practically, algorithm feasibility and efficiency are vital in
applications.
A great variety of penalized methods have been proposed in recent years. The
regularization techniques for simultaneous variable selection and estimation are
particularly useful for obtaining sparse models, compared to simply applying traditional
criteria such as Akaike’s information criterion [1] and the Bayesian information

Y. Feng • M. Yu
Department of Statistics, Columbia University, New York, NY 10027, USA
e-mail: [email protected]

© Springer International Publishing AG 2017
S.E. Ahmed (ed.), Big and Complex Data Analysis, Contributions to Statistics,
DOI 10.1007/978-3-319-41573-4_1

criterion [18]. The least absolute shrinkage and selection operator (Lasso) [19] has
been widely used, as the l1 penalty shrinks most coefficients to 0 and fulfills the
task of variable selection. Many other regularization methods have been developed;
including bridge regression [13], the smoothly clipped absolute deviation method
[5], the elastic net [26], adaptive Lasso [25], LAMP [11], among others. Asymptotic
analysis of sign consistency in model selection [20, 24] has been developed
to provide theoretical support for various methods. Other results, such as
parameter estimation [17], prediction [15], and oracle properties [5], have been
established under different model contexts.
However, in the ultra-high dimensional setting where the dimension p = exp(n^a)
(where a > 0), the conditions for sign consistency are easily violated as a
consequence of large correlations among variables. To deal with such challenges, Fan
and Lv [6] proposed the sure independence screening (SIS) method which is based
on correlation learning to screen out irrelevant variables efficiently. Further analysis
and generalization can be found in Fan and Song [7] and Fan et al. [8]. From the
idea of retaining important variables rather than screening out irrelevant variables,
Weng et al. [21] proposed the regularization after retention (RAR) method. The
major differences between SIS and RAR can be summarized as follows. SIS makes
use of marginal correlations between variables and response to screen noises out,
while RAR tries to retain signals after acquiring these coefficients. Both of them
relax the irrepresentable-type conditions [20] and achieve sign consistency.
In this paper, we introduce a general multi-step estimation framework that
integrates the ideas of screening and retention: marginal information is used to
learn the importance of the features in the first step, and regularization is then
imposed with corresponding weights. The main contribution
of the paper is two-fold. First, the new framework is able to utilize the marginal
information adaptively in two different directions, which will relax the conditions
for sign consistency. Second, the idea of the framework is very general and covers
the one-step regularization methods, the regularization after screening method, and
the regularization after retention method as special cases.
The rest of this paper is organized as follows. In Sect. 2, we introduce the model
setup and the relevant techniques. The new variable selection framework is elabo-
rated in Sect. 3 with connections to existing methods explained. Section 4 develops
the sign consistency result for the proposed estimators. Extensive simulations are
conducted in Sect. 5 to compare the performance of the new method with the
existing approaches. We conclude with a short discussion in Sect. 6. All the technical
proofs are relegated to the appendix.
Regularization After Marginal Learning for Ultra-High Dimensional Regression Models 5

2 Model Setup and Several Methods in Variable Selection

2.1 Model Setup and Notations

Let $(X_i, Y_i)$ be i.i.d. random pairs following the linear regression model:

$$Y_i = X_i^T \beta + \varepsilon_i, \qquad i = 1, \ldots, n,$$

where $X_i = (X_{i1}, \ldots, X_{ip})^T$ is a $p_n$-dimensional vector distributed as $N(0, \Sigma)$,
$\beta = (\beta_1, \ldots, \beta_p)^T$ is the true coefficient vector, $\varepsilon_1, \ldots, \varepsilon_n \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$, and
$\{X_i\}_{i=1}^n$ are independent of $\{\varepsilon_i\}_{i=1}^n$. Note that we sometimes write $p_n$ to emphasize
that the dimension $p$ diverges with the sample size $n$. Denote the support index set of
$\beta$ by $S = \{\, j : \beta_j \neq 0 \,\}$, the cardinality of $S$ by $s_n$, and
$\Sigma_{S^c \mid S} = \Sigma_{S^c S^c} - \Sigma_{S^c S} (\Sigma_{SS})^{-1} \Sigma_{S S^c}$. Both $p_n$ and $s_n$ are allowed to increase
as $n$ increases. For conciseness, we sometimes use signals and noises to refer to the
relevant predictors $S$ and the irrelevant predictors $S^c$ (or their corresponding
coefficients), respectively.

For any set $A$, let $A^c$ be its complement. For any $k$-dimensional vector $w$
and any subset $K \subseteq \{1, \ldots, k\}$, $w_K$ denotes the subvector of $w$ indexed by $K$, and let
$\|w\|_1 = \sum_{i=1}^k |w_i|$, $\|w\|_2 = (\sum_{i=1}^k w_i^2)^{1/2}$, and $\|w\|_\infty = \max_{i=1,\ldots,k} |w_i|$. For any
$k_1 \times k_2$ matrix $M$ and any subsets $K_1 \subseteq \{1, \ldots, k_1\}$, $K_2 \subseteq \{1, \ldots, k_2\}$, $M_{K_1 K_2}$ represents
the submatrix of $M$ consisting of entries indexed by the Cartesian product $K_1 \times K_2$.
Let $M_{K_2}$ be the columns of $M$ indexed by $K_2$ and $M^j$ be the $j$-th column of $M$.
Denote $\|M\|_2 = \{\Lambda_{\max}(M^T M)\}^{1/2}$ and $\|M\|_\infty = \max_{i=1,\ldots,k_1} \sum_{j=1}^{k_2} |M_{ij}|$. When
$k_1 = k_2 = k$, let $\rho(M) = \max_{i=1,\ldots,k} M_{ii}$, and let $\Lambda_{\min}(M)$ and $\Lambda_{\max}(M)$ be the minimum
and maximum eigenvalues of $M$, respectively.
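The norm and eigenvalue notation above can be checked numerically; the toy values below are ours, not the chapter's:

```python
import numpy as np

# Toy check of the vector and matrix notation defined above.
w = np.array([3.0, -4.0, 1.0, 0.0])
K = [0, 1]                               # an index set K (0-based here)
w_K = w[K]                               # subvector w_K

norm1 = np.abs(w_K).sum()                # ||w_K||_1 = 3 + 4 = 7
norm2 = np.sqrt((w_K ** 2).sum())        # ||w_K||_2 = sqrt(9 + 16) = 5
norm_inf = np.abs(w_K).max()             # ||w_K||_inf = 4

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # symmetric 2 x 2 matrix
op_norm = np.sqrt(np.linalg.eigvalsh(M.T @ M).max())  # ||M||_2 = 3
row_sum_norm = np.abs(M).sum(axis=1).max()            # ||M||_inf = 3
lam_min = np.linalg.eigvalsh(M).min()                 # Lambda_min(M) = 1
lam_max = np.linalg.eigvalsh(M).max()                 # Lambda_max(M) = 3
```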

2.2 Regularization Techniques

The Lasso [19], defined as

$$\hat{\beta} = \arg\min_{\beta} \left\{ (2n)^{-1} \sum_{i=1}^n (Y_i - X_i^T \beta)^2 + \lambda_n \sum_{j=1}^{p_n} |\beta_j| \right\}, \qquad \lambda_n \geq 0, \tag{1}$$

is a popular variable selection method. Thanks to the invention of efficient
algorithms, including LARS [4] and the coordinate descent algorithm [14], the Lasso and its
variants have been applied to a wide range of scenarios in this big data era. There
is a large body of research on the theoretical properties of the Lasso. Zhao and
Yu [24] gave almost necessary and sufficient conditions for the sign consistency of
the Lasso in selecting the true model in the large $p_n$ setting as $n$ increases. Considering the
sensitivity of the tuning parameter $\lambda_n$ and consistency for model selection, Wainwright

[20] identified precise conditions for achieving sparsity recovery with a family of regularization parameters $\lambda_n$ under deterministic design.
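As an illustration (not part of the original text), the Lasso objective in (1) can be minimized with the coordinate descent algorithm of [14]. The sketch below is a bare-bones numpy version with a fixed iteration budget in place of a proper convergence check.

```python
import numpy as np

def soft_threshold(z, t):
    # S(z, t) = sign(z) * max(|z| - t, 0), the proximal map of t * |.|
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, Y, lam, n_iter=200):
    """Coordinate descent for (2n)^{-1} ||Y - X beta||_2^2 + lam * ||beta||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # x_j^T x_j / n
    resid = Y - X @ beta                   # current residual
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]     # drop the j-th contribution
            rho = X[:, j] @ resid / n      # marginal fit of the partial residual
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]     # restore the updated contribution
    return beta
```

Each coordinate update has a closed form because the loss is quadratic in $\beta_j$ with the other coordinates held fixed, which is what makes coordinate descent so effective for (1).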
Another effective approach to the penalization problem is the adaptive Lasso (AdaLasso) [25], which uses an adaptively weighted $l_1$-penalty term, defined as

$$\hat{\beta} = \arg\min_{\beta} \left\{ (2n)^{-1} \sum_{i=1}^n (Y_i - X_i^T \beta)^2 + \lambda_n \sum_{j=1}^{p_n} \omega_j |\beta_j| \right\}, \quad \lambda_n \geq 0, \qquad (2)$$

where $\omega_j = 1/|\hat{\beta}_j^{init}|^{\gamma}$ for some $\gamma \geq 0$, in which $\hat{\beta}^{init}$ is some initial estimator. When signals are weakly correlated with noises, Huang et al. [16] proved that AdaLasso is sign consistent with $\omega_j = 1/|\hat{\beta}_j^M| = 1/|(\tilde{X}^j)^T \tilde{Y}|$, where $\tilde{X}$ is the centered and scaled data matrix. One potential issue with this weighting choice is that when the correlation between a signal and the response is too small, that signal is severely penalized and may be estimated as noise. We will use numerical examples to demonstrate this point in the simulation section.
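To make the weighting concrete (an illustrative sketch, not the authors' code), the weighted penalty in (2) reduces to a plain Lasso after rescaling column $j$ by $1/\omega_j$. Here the inner Lasso is solved by proximal gradient descent (ISTA), and the weights follow the marginal-coefficient choice of Huang et al. [16].

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, Y, lam, n_iter=2000):
    """Proximal gradient (ISTA) for (2n)^{-1} ||Y - X b||_2^2 + lam * ||b||_1."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n            # Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - Y) / n
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta

def adaptive_lasso(X, Y, lam):
    """AdaLasso with w_j = 1 / |marginal coefficient|, via the rescaling trick."""
    n, p = X.shape
    Xs = (X - X.mean(0)) / X.std(0)              # centered and scaled columns
    Ys = (Y - Y.mean()) / Y.std()
    w = 1.0 / np.maximum(np.abs(Xs.T @ Ys / n), 1e-8)
    theta = lasso_ista(X / w, Y, lam)            # plain Lasso on rescaled design
    return theta / w                             # map back: beta_j = theta_j / w_j
```

Note how a signal that is weakly marginally correlated with the response receives a huge $\omega_j$ and is crushed toward zero, which is exactly the failure mode discussed above.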

2.3 Sure Independence Screening

To reduce the dimension from ultra-high to a moderate level, Fan and Lv [6] proposed the sure independence screening (SIS) method, which uses marginal correlations as a measure of importance in the first step and then applies other operators, such as the Lasso, to carry out variable selection. In particular, we first calculate the componentwise regression coefficient for each variable, i.e., $\hat{\beta}_j^M = (\tilde{X}^j)^T \tilde{Y}$, $j = 1, \ldots, p_n$, where $\tilde{X}^j$ is the standardized $j$-th column of the data matrix $X$ and $\tilde{Y}$ is the standardized response. Second, we define a sub-model with respect to the largest coefficients:

$$M_{\gamma} = \{1 \leq j \leq p_n : |\hat{\beta}_j^M| \text{ is among the first } \lfloor \gamma n \rfloor \text{ of all}\}.$$

Predictors that are not in $M_{\gamma}$ are regarded as noise and therefore discarded from further analysis. SIS reduces the number of candidate covariates to a moderate level for the subsequent analysis. Combining SIS and the Lasso, Fan and Lv [6] introduced the SIS-Lasso estimator

$$\hat{\beta} = \arg\min_{\beta_{M_{\gamma}^c} = 0} \left\{ (2n)^{-1} \sum_{i=1}^n (Y_i - X_i^T \beta)^2 + \lambda_n \sum_{j \in M_{\gamma}} |\beta_j| \right\}$$
$$= \arg\min_{\beta} \left\{ (2n)^{-1} \sum_{i=1}^n (Y_i - X_i^T \beta)^2 + \lambda_n \sum_{j \in M_{\gamma}} |\beta_j| + \infty \sum_{j \in M_{\gamma}^c} |\beta_j| \right\}. \qquad (3)$$

Clearly, $\gamma$ should be chosen carefully to avoid screening out signals. To deal with the issue that signals may be marginally uncorrelated with the response in some cases, iterative SIS was introduced [6] as a practical procedure, but without rigorous theoretical support for sign consistency. As a result, relying solely on marginal information is sometimes too risky, or greedy, for model selection purposes.
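A minimal sketch of the screening step (illustrative only; the `gamma` argument plays the role of the fraction $\gamma$ above):

```python
import numpy as np

def sis_screen(X, Y, gamma):
    """Keep the floor(gamma * n) predictors with the largest |marginal coefficient|."""
    n, p = X.shape
    Xs = (X - X.mean(0)) / X.std(0)          # standardized columns, as in beta_j^M
    Ys = (Y - Y.mean()) / Y.std()
    score = np.abs(Xs.T @ Ys)                # |beta_j^M| up to a common factor
    d = min(int(gamma * n), p)
    keep = np.argsort(score)[::-1][:d]       # indices of the d largest scores
    return np.sort(keep)
```

A Lasso restricted to the returned index set then yields the SIS-Lasso estimator in (3).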

3 Regularization After Marginal Learning

3.1 Algorithm

From Sect. 2, one potential drawback shared by AdaLasso and SIS-Lasso is that they may miss important covariates that are only weakly marginally correlated with the response.
Now, we introduce a new algorithm, regularization after marginal (RAM) learning, to address this issue. It uses marginal correlations to divide all variables into three candidate sets: a retention set, a noise set, and an undetermined set. Regularization is then applied to find signals in the undetermined set, as well as to identify falsely retained signals and falsely screened noises.
A detailed description of the algorithm is as follows:
Step 0 (Marginal Learning) Calculate the marginal regression coefficients after standardizing each predictor, i.e.,

$$\hat{\beta}_j^M = \sum_{i=1}^n \frac{X_i^j - \bar{X}^j}{\hat{\sigma}_j} Y_i, \quad 1 \leq j \leq p_n, \qquad (4)$$

where $\bar{X}^j = n^{-1} \sum_{i=1}^n X_i^j$ and $\hat{\sigma}_j = \sqrt{\frac{\sum_{i=1}^n (X_i^j - \bar{X}^j)^2}{n-1}}$.
Define a retention set by $\hat{R} = \{1 \leq j \leq p : |\hat{\beta}_j^M| \geq \gamma_n\}$, for a positive constant $\gamma_n$; a noise set by $\hat{N} = \{1 \leq j \leq p : |\hat{\beta}_j^M| \leq \tilde{\gamma}_n\}$, for a positive constant $\tilde{\gamma}_n < \gamma_n$; and an undetermined set by $\hat{U} = (\hat{R} \cup \hat{N})^c$.

Step 1 (Regularization After Screening Noises Out) Search for signals in $\hat{U}$ by solving

$$\hat{\beta}_{\hat{R}, \hat{U}_1} = \arg\min_{\beta_{\hat{N}} = 0} \left\{ (2n)^{-1} \sum_{i=1}^n \Big( Y_i - \sum_{j \in \hat{U}} X_{ij} \beta_j - \sum_{k \in \hat{R}} X_{ik} \beta_k \Big)^2 + \lambda_n \sum_{j \in \hat{U}} |\beta_j| \right\}, \qquad (5)$$

where the index set $\hat{U}_1$ denotes the variables in $\hat{U}$ that are estimated as signals, namely $\hat{U}_1 = \{j \in \hat{U} : (\hat{\beta}_{\hat{R}, \hat{U}_1})_j \neq 0\}$. After Step 1, the selected variable set is $\hat{R} \cup \hat{U}_1$.
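The two steps above can be sketched as follows (an illustrative reading of the algorithm, not the authors' implementation; the thresholds `gamma_n`, `gamma_tilde_n` and the tuning parameter `lam` are taken as given):

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ram_step0(X, Y, gamma_n, gamma_tilde_n):
    """Step 0: split predictors into retention (R), noise (N), undetermined (U)."""
    xbar, sd = X.mean(0), X.std(0, ddof=1)
    beta_marg = ((X - xbar) / sd).T @ Y      # marginal coefficients, Eq. (4)
    m = np.abs(beta_marg)
    R = np.where(m >= gamma_n)[0]
    N = np.where(m <= gamma_tilde_n)[0]
    U = np.setdiff1d(np.arange(X.shape[1]), np.union1d(R, N))
    return R, N, U

def ram_step1(X, Y, R, U, lam, n_iter=200):
    """Step 1: coordinate descent for Eq. (5); beta_N is fixed at zero,
    variables in R are unpenalized, variables in U carry the l1 penalty."""
    n, p = X.shape
    active = np.concatenate([R, U]).astype(int)
    pen = np.concatenate([np.zeros(len(R)), np.full(len(U), lam)])
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = Y.copy()                          # residual when beta = 0
    for _ in range(n_iter):
        for j, t in zip(active, pen):
            resid += X[:, j] * beta[j]
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, t) / col_sq[j]
            resid -= X[:, j] * beta[j]
    return beta
```

After Step 1, the selected set is `R` together with the nonzero coordinates of `beta` on `U`.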
