Analysis of Clinical Flow Cytometric

The document discusses analyzing clinical flow cytometry data by treating it as high-dimensional objects rather than sequential two-dimensional analyses. It applied methods involving statistical manifolds and the Kullback-Leibler divergence to cluster cases of acute lymphoblastic leukemia and expansion of physiologic B-cell precursors using flow cytometry data from 54 patient samples.
Cytometry Part B (Clinical Cytometry) 76B:1–7 (2009)

Original Articles
Analysis of Clinical Flow Cytometric
Immunophenotyping Data by Clustering on
Statistical Manifolds: Treating Flow Cytometry
Data as High-Dimensional Objects
William G. Finn,1* Kevin M. Carter,2 Raviv Raich,3 Lloyd M. Stoolman,1 and Alfred O. Hero2
1 Department of Pathology, University of Michigan, Ann Arbor, Michigan 48109
2 Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109
3 School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97331

Background: Clinical flow cytometry typically involves the sequential interpretation of two-dimensional histograms, usually culled from six or more cellular characteristics, following initial selection (gating) of cell populations based on a different subset of these characteristics. We examined the feasibility of instead treating gated n-parameter clinical flow cytometry data as objects embedded in n-dimensional space using principles of information geometry via a recently described method known as Fisher Information Non-parametric Embedding (FINE).
Methods: After initial selection of relevant cell populations through an iterative gating strategy, we converted four-color (six-parameter) clinical flow cytometry datasets into six-dimensional probability density functions, and calculated differences among these distributions using the Kullback-Leibler divergence (a measurement of relative distributional entropy shown to be an appropriate approximation of Fisher information distance in certain types of statistical manifolds). Neighborhood maps based on Kullback-Leibler divergences were projected onto two-dimensional displays for comparison.
Results: These methods resulted in the effective unsupervised clustering of cases of acute lymphoblastic leukemia from cases of expansion of physiologic B-cell precursors (hematogones) within a set of 54 patient samples.
Conclusions: The treatment of flow cytometry datasets as objects embedded in high-dimensional space (as opposed to sequential two-dimensional analyses) harbors the potential for use as a decision-support tool in clinical practice or as a means for context-based archiving and searching of clinical flow cytometry data based on high-dimensional distribution patterns contained within stored list mode data. Additional studies will be needed to further test the effectiveness of this approach in clinical practice. © 2008 Clinical Cytometry Society

Key terms: flow cytometry; statistical manifold; information geometry; immunophenotyping; immunophenotype clustering

How to cite this article: Finn WG, Carter KM, Raich R, Stoolman LM, Hero AO. Analysis of clinical flow cytometric immunophenotyping data by clustering on statistical manifolds: Treating flow cytometry data as high-dimensional objects. Cytometry Part B 2009; 76B: 1–7.

Grant sponsor: National Science Foundation; Grant number: CCR-0325571.
*Correspondence to: William G. Finn, MD, Department of Pathology, University of Michigan, 1301 Catherine Road, Room M5242, Ann Arbor, MI 48109-0602. E-mail: wgfi[email protected]
Received 13 February 2008; Accepted 27 May 2008
Published online 18 July 2008 in Wiley InterScience (www.interscience.wiley.com).
DOI: 10.1002/cyto.b.20435

Clinical flow cytometric analysis usually involves the interpretation of individual two-dimensional scatter plots culled from sets of simultaneous analysis of up to eight measurements (two light scatter measurements and up to six fluorescence channels or “colors”) for routine clinical grade analyzers. However, the multidimensional power of flow cytometry may be instead more effectively realized by systems that treat single multicolor analyses as individual high-dimensional datasets (1–8).

The analysis of high-dimensional datasets has become more common in the age of applied genomics and proteomics. However, the fact that all measured characteristics of a given analysis can be traced to each individual cell gives the dimensionality of flow cytometry a uniquely spatial characteristic not shared by other proteomic platforms (7,9). Each individual tube analyzed in a routine n-parameter flow cytometry study can be represented conceptually as a single object embedded in n-dimensional space and formed in aggregate by thousands of analyzed cells, each of which displays a unique n-dimensional signature. Just as an ordinary object is better described by its shape and overall appearance than by the measuring of its individual dimensions, one could consider the possibility that flow cytometry data could be better represented by the general shape of a cell population over all of the dimensions analyzed (5). Since we live in three-dimensional space, direct visualization of a four-color (six-dimensional) flow cytometry dataset as a six-dimensional object is not feasible. However, rather than utilizing the interpretation of sequential two-dimensional projections of this six-dimensional object (as is the current norm), analytical methods can be devised for the comparison of separate datasets embedded as unique objects in six-dimensional space.

The analysis of high-dimensional datasets often involves characterizing the manifold within which the data are assumed to be embedded. In layman’s terms, the mathematical concept of a manifold could be defined as a smooth space or surface (of any dimensionality) that is nearly “flat” on small scales, and within which geometrical objects may be embedded. Examples could include a sphere, a torus, Euclidean space in general, and indeed our three-dimensional universe. The field of manifold learning involves the discovery of lower dimensional manifolds for objects embedded in higher dimensional space and is often applied to dimensionality reduction of high-dimensional datasets (10).

It is often assumed that high-dimensional datasets can be appropriately represented on Euclidean manifolds (manifolds comprised of points or coordinates embedded within Euclidean space). However, there are many problems in which the data cannot be appropriately represented by a Euclidean manifold, and the model parameters are unspecified and must be learned through the data. In such cases, it may be helpful to assume that the data lie in a manifold composed not of individual spatial coordinates, but of probability density functions. The term statistical manifold has been used to describe such manifolds composed of probability density functions rather than spatial coordinates (11,12).

The emerging field of information geometry involves the analysis of probability distributions as geometric structures within non-Euclidean space and can be applied to the study of statistical manifolds (13). The distance between points or objects on a statistical manifold can be measured by a distance function known as the Fisher information metric (12). However, calculating the Fisher information metric requires knowledge of the underlying parameterization of the assumed manifold, knowledge that is generally not available or feasible in the analysis of flow cytometry datasets.

Recently, Carter et al. described a nonparametric approach to clustering and classification on statistical manifolds using a similarity measurement known as the Kullback-Leibler divergence (commonly referred to as the relative entropy of a probability distribution) as an estimate of the Fisher information distance for statistical manifolds for which parameterization is unknown, and for which individual data points lie in reasonably close proximity (as would generally apply to immunophenotypic analysis of distinct cell populations by multiparameter flow cytometry) (12,14). As a given manifold is more densely sampled, the Kullback-Leibler divergence converges to the Fisher information distance. This approach has been termed Fisher Information Non-parametric Embedding (FINE) (12).

In this study, we attempted to apply these principles to the interpretation of flow cytometry datasets as high-dimensional objects generated by probability density functions embedded on a statistical manifold (as opposed to sequential groups of individual light scatter characteristics or surface antigens). As an initial test of this approach, we chose to compare the immunophenotypic patterns of leukemic B-precursor lymphoblasts against the immunophenotypic patterns of physiologic B-cell precursors (hematogones), since distinction between these often similar cell types is an important and sometimes challenging task that often confronts practicing hematopathologists on the day-to-day diagnostic service (15).

MATERIALS AND METHODS

Case Selection

The use of previously analyzed clinical flow cytometry data for cluster analysis was approved by our Institutional Review Board. The files of the clinical flow cytometry laboratory at the University of Michigan were searched for cases coded as B-precursor acute lymphoblastic leukemia (ALL) based on complete diagnostic assessment including morphologic assessment of marrow, flow cytometric immunophenotyping, and cytogenetic analysis where indicated per World Health Organization diagnostic criteria (16). From this list, cases were selected that had sufficient available list mode data and sufficient cells for analysis, searching back from the most recent cases available. Thirty-one cases of ALL were retrieved for analysis, spanning an approximately 18-month period. For comparison, the flow cytometry database was manually screened for the presence of cases with hematogone hyperplasia, and from this screen 23 cases were retrieved showing prominent hematogone populations, again based on a combination of morphologic assessment, clinical correlation, and flow cytometric immunophenotyping based on previously published descriptions of hematogone immunophenotypes (15).

Data Retrieval

Raw flow cytometry data for this study were generated by analysis on a Beckman-Coulter FC-500 flow cytometer using Beckman-Coulter CXP acquisition software (Beckman-Coulter, Hialeah, FL) and stored as list mode data in standard fcs format. Within our routine acute leukemia flow cytometry panel, we include a single four-color (six-parameter) tube including CD45 (ECD conjugate), CD10 (phycoerythrin-cyanin 5 conjugate), CD19 (phycoerythrin conjugate), and CD38 (fluorescein isothiocyanate conjugate) (all antibody reagents obtained from Beckman-Coulter/Immunotech, Hialeah, FL), designed for the isolation of hematogones and aberrant lymphoblast populations, based on known differential patterns of these markers in these cell types. Although it may take additional markers to render fine distinctions in practice, this tube was selected for analysis since the methods being tested in this study require single high-dimensional datasets acquired in a single analysis, and since this marker combination is highly useful in distinguishing these cell subsets in most cases.

List mode data were prepared for analysis as follows. First, the cell population of interest (either hematogones or lymphoblasts, depending on the case) was selected by manually examining the datasets using an iterative gating strategy to evaluate for the presence of distinct cell clusters based on the most effective discriminator for that particular case. In most cases, the initial evaluation was of a CD10 versus CD19 histogram or of a CD10 versus CD38 histogram (depending on the separation of cell clusters), due to the tendency for lymphoblasts and hematogones to coexpress these markers. From here, data were projected onto CD45 versus side angle light scatter histograms to exclude higher side scatter events that could potentially represent nonspecific binding of antibodies to nonlymphoid cells. Data were then reprojected onto CD10 versus CD19 and CD19 versus CD38 histograms to assure the appearance of well-distributed clustered data without evidence of artificial “shelves” or cut-off thresholds for any given marker, and without evidence of inclusion of extraneous cell clusters that would represent nonlymphoblast (or nonhematogone) cell populations. Care was taken during this approach to target the selection of the cell subpopulation of interest (either hematogones or leukemic lymphoblasts) based on differential cell clusters on histograms, without being artificially restrictive as to the exclusion of cells beyond a prescribed level of light scatter or marker expression. Once the gated data for the cells of interest were isolated via this iterative approach, the data were converted from standard flow cytometry list mode format to tab-delimited text using WinMDI software, version 2.8 (Scripps Research Institute, La Jolla, CA).

Data Analysis

The gated data files were analyzed using the three-step FINE process, described in more detail by Carter et al. (12,14).

FIG. 1. Illustration of flow cytometry list mode data after conversion by a kernel density estimate. This smoothed the data by converting individual data points into Gaussian distributions, which were then summed and normalized to form an overall distribution of the same shape and density variation of the initial dataset. The conversion was performed over all six dimensions for each dataset, but the figure depicts a two-dimensional projection of the kernel density estimate (in this case CD19 vs. CD10).

Briefly, in the first step the gated tab-delimited list mode files were smoothed by converting from sets of individual points into probability density functions using kernel density estimation. Kernel methods are nonparametric techniques used for estimating probability densities of data sets and involve the conversion of discrete data points into the normalized sum of identical densities centered about each data point. In essence, each data point is converted into a probability distribution and the aggregate of these distributions is summed and normalized to form a single smooth distribution. Kernel methods have been used in previous work on the analysis of flow cytometry data (3). For our data, we chose a Gaussian kernel, essentially converting each discrete data point into a Gaussian probability function, the total of which were summed and normalized to form a non-Gaussian distribution corresponding to the overall “shape” of the cloud of individual cellular events measured in each six-dimensional analysis (Fig. 1). The derived distribution for each six-dimensional flow cytometry analysis would be represented as follows:

$$f_i(x) = \frac{1}{N_i \times h} \sum_{j=1}^{N_i} K\left(\frac{x - x_j}{h}\right)$$

where $K(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ is the zero mean unit variance Gaussian kernel, $h$ is the bandwidth or smoothing parameter around each data point, and $f_i(x)$ is the resulting probability density function for the $i$th patient sample based on the normalized sum of distributions centered on the $N_i$ cells $x_j$ in the sample. The bandwidth parameter is very important to the overall density estimate. Choosing a bandwidth parameter too small will yield a peak-filled density, whereas a bandwidth that is too large will generate a density estimate that is too smooth and loses


most of the features of the distribution. For this analysis, the parameter $h$ was chosen separately for each analyzed patient sample using the maximal smoothing principle (17) under the assumption that each dimension was independent (and therefore may have different values in the kernel width vector). The result of the kernel density estimation step was conversion of discrete dot-plots into six-dimensional probability density functions.

In the second step, we calculated the relative differences among individual six-dimensional datasets for each case using the Kullback-Leibler divergence

$$D_{KL}(f_i \| f_j) = \int \log\left(\frac{f_i(x)}{f_j(x)}\right) f_i(x)\,dx$$

to form the following similarity matrix between any given patient samples $i$ and $j$:

$$D_{ij} = D_{KL}(f_i \| f_j) + D_{KL}(f_j \| f_i).$$

The similarity matrix was constructed to assure symmetry, since the Kullback-Leibler divergence is not symmetric. The result was a high-dimensional neighborhood map depicting the relative difference in information (i.e., similarities) among the 54 total samples analyzed based on distributions defined in six dimensions.

Since the similarity matrix represents a high-dimensional neighborhood map, an additional step of dimensionality reduction is included as the third step in the procedure so that the similarities between cases in the high-dimensional neighborhood map may be visualized on a two-dimensional plot. This dimensionality reduction step was carried out using classical multidimensional scaling (12). Multidimensional scaling is the term used for a group of methods by which high-dimensional distance matrices may be embedded in lower dimensional space. Classical multidimensional scaling (cMDS) is a particular type of multidimensional scaling in which each point on a matrix of dissimilarities is embedded in Euclidean space, by first centering the dissimilarities about the origin, then calculating the eigenvalue decomposition of the centered matrix. This method allows for the low-dimensional graphic representation of data points while revealing any natural separation or clustering of the data (12).

RESULTS

Of patients from whom ALL samples were obtained, 18 were male and 13 were female, with an average age of 25 years at the time of bone marrow biopsy (range 2–74 years). Of patients from whom hematogone samples were obtained, 14 were male and 9 were female, with an average age of 41 years at the time of bone marrow biopsy (range 9 months to 66 years).

Two-dimensional maps (generated via multidimensional scaling) depicting projections of the relative Kullback-Leibler divergences of six-dimensional data among cases studied are shown in Figure 2. To illustrate the differences depicted in Figure 2, traditional plots of CD10 versus CD38 (two of the 6 measured dimensions in each analysis) are shown in Figure 3 for paired cases from each cluster (ALL or hematogones) that are relatively similar, relatively dissimilar, and near the center of each cluster.

FIG. 2. Two-dimensional embedding of neighborhood map data generated by the comparison of six-dimensional flow cytometry datasets by Fisher information nonparametric embedding (FINE) using the Kullback-Leibler divergence as a distance measurement. Cases of B-precursor acute lymphoblastic leukemia (ALL) were effectively separated from benign hematogone hyperplasia (HP) by this method. The circled points correspond to the density plots illustrated in Figure 3, numbered respectively.

FIG. 3. Contour plots of CD38 versus CD10 expression for several data sets. The top row corresponds to hematogone hyperplasia (HP) cases, and the bottom row represents acute lymphoblastic leukemia (ALL) cases. The selected patients are those most similar between disease classes, the centroids of each disease class, and those with little similarity between disease classes, as highlighted in Figure 2.

In general, the algorithm used in this study was effective in the discrimination and clustering of cases of ALL from cases showing hematogone expansions. The hematogone cases were more tightly clustered, likely reflecting the greater immunophenotypic variability of leukemic lymphoblasts relative to the more consistent and uniform immunophenotype typical of hematogones (15).

DISCUSSION

The method outlined in this study represents a novel approach to the analysis of clinical flow cytometry data in which multicolor flow cytometry datasets are treated as virtual objects embedded in high-dimensional space and compared with one another by approximating information distances on statistical manifolds. In the current demonstration of concept, this system was generally effective in the unsupervised distinction of patient samples containing leukemic lymphoblasts from patient samples containing normal B cell precursors (hematogones). Formal data on sensitivity and specificity cannot be derived in this proof-of-principle study since we did not randomly select cases from our normal workflow, and therefore the pretest prevalence of each diagnostic condition, which is required for derivation of such statistics, would not be represented.

In contrast to other proteomic or immunophenotyping methods, flow cytometry allows simultaneous analysis of numerous surface markers traceable in any combination to a specific individual cell. In day-to-day practice, attempts to harness this dimensional power of flow cytometry are usually in the form of sequential two-dimensional analyses linked to additional dimensions of data via previous analytical iterations. Although a useful practical method for the study of flow cytometry datasets, this approach has limited value in unsupervised discovery or in proteomic style analysis.

Analogy may be made to the common endeavor of face recognition. Individuals recognize other individuals by interpreting the overall appearance or shape of one’s face, not by evaluating individual facial measurements in a step-by-step selection and analysis process. Similarly, in this study we set out to devise a method whereby the single high-dimensional object formed by the flow cytometric analysis of a given cell population could be evaluated as a whole rather than as sequential parts.

There are numerous potential applications for the type of analysis outlined here. The ability of this statistical manifold learning method to adapt as additional cases are added to the database augments its potential usefulness as a clinical decision support tool. Analysis of any given case could be queried against the known neighborhood maps constructed by previous analyses, and a list of most probable diagnoses could then be generated to assist the hematopathologist or flow cytometrist in rendering a final diagnostic impression. One could envision a role for this type of approach in borderline classification issues (such as lymphoma subtyping based on immunophenotype), issues of minimal residual disease detection, etc.

Aside from its potential diagnostic utility, a system of clustering flow cytometry data within statistical manifolds could potentially be used as a context-based searching and databasing method for case retrieval or research. For example, in our laboratory, we currently list cases in our database according to the final diagnosis assigned to that case following our interpretation of the flow cytometry data. If we wish to search for cases of, for example, ALL, we enter the appropriate text code into the search engine, and it finds cases that we diagnosed as ALL, irrespective of the actual immunophenotypic pattern contained within the list mode files. The approach outlined in this study would potentially allow us to store raw list mode files of selected cell populations and, subjecting them to the manifold learning process, search the database not for cases by diagnostic label, but by the actual similarity of the flow cytometry dataset over the entire group of markers contained within an individual analysis tube. The system would adapt, with information distances across the overall neighborhood map adjusting with each added case. Such context-based searches have been proposed for histologic images (18) and would also be of use in retrieval of flow cytometry data from archives.

Our approach could also have potential value in clustering and classifying disease processes through unsupervised discovery, analogous to approaches used in functional genomic and proteomic applications. Indeed, flow cytometric immunophenotyping is at its essence a proteomic method, albeit on a relatively small scale (19–21). Our approach allows a proteomic-style analysis of the


entire distribution formed by multicolor flow cytometric analysis of cell suspensions. Our study was performed using archived clinical four-color datasets. The power of this approach could be magnified considerably if applied to higher dimensional datasets (10 color and beyond) currently deployed in research settings (22).

To our knowledge, our study is the first to employ the principles of information geometry and statistical manifold embedding in the comparison of flow cytometry results between different patient samples. However, previous studies have described methods that treat flow cytometry output as single high-dimensional datasets rather than as collections of two-dimensional projections. Roederer et al. described systems based on probability binning of n-dimensional data, including the use of an algorithm that identified geographic regions in n-dimensional space that contain significantly more or fewer events than other areas (7,23). They termed this statistical comparison of event numbers in high-dimensional space “frequency difference gating.” Zeng et al. and Zamir et al. described approaches with some conceptual similarity to ours but with different methods (2,6). Zamir et al. evaluated single four-color (six-dimensional) flow cytometry assays by converting each of them into a single matrix with the number of rows equal to the number of cells analyzed, and the number of columns equal to the number of measured flow cytometry characteristics (in this case, 6), each normalized to a mean of zero and standard deviation of 1. The matrices were then subjected to statistical clustering methods for the classification of different cell populations within the sample. Although this method was based on the analysis of a six-dimensional dataset as a single entity, it maintained the identity of each cell as a discrete point in the matrix, without conversion to probability density functions as in our study. Zeng et al. used a kernel density estimation method similar to ours to convert high-dimensional flow cytometry datasets into probability density functions, but then used histogram features extracted from each dimension of the probability density function to guide k-means clustering as a means to identify discrete cell populations within a given dataset. Pedreira et al. described a multidimensional classification approach for automated flow cytometry analysis that, like our method, treated flow cytometry datasets as objects embedded in n-dimensional space and did not require the application of an assumed distribution onto the flow cytometry dataset, but did not use the specific principles of information geometry outlined in the current study (8).

There are limitations to the treatment of entire flow cytometry datasets as single high-dimensional distributions. For example, patients with immunophenotypically identical abnormal cell populations would likely be clustered separately depending on the nature of the non-neoplastic background cells or on the sheer percentage of abnormal cells in the sample. For this reason, we chose in this study to purify the cells of interest through an iterative list-mode selection process before application of the manifold learning algorithm. One could argue, however, that the analysis of entire datasets (including both normal and abnormal cell types) would be of potential value, since the nature of the host response may be distinct in a given disease process and may be represented by the immunophenotypic pattern of non-neoplastic cells in the sample. Furthermore, the nature of flow cytometry data allows for the virtual selection of numerous different cell types without preanalytical sorting or isolation, and subsequent analysis of these subsets via manifold learning. A caveat, of course, is that any given process of selection for cell populations of interest could influence the subsequent clustering algorithm, and minor differences in cell selection strategies could harbor the potential to inordinately affect the clustering due to potential inconsistencies in initial data selection. The influence of various preanalytical factors (number of colors in the analysis, presence of normal cell populations, cell selection strategies, etc.) on the performance of this statistical manifold clustering approach will have to be evaluated in expanded prospective studies.

In summary, this study was an attempted demonstration of principle for the analysis of clinical flow cytometry data as individual high-dimensional datasets using the principles of information geometry and statistical manifolds. Such an approach may harbor potential for the development of decision support tools and context-based search capability in clinical flow cytometry laboratories, and for the analysis of flow cytometry data as a proteomic discovery tool. Additional studies will be required to formally assess the potential utility of this approach for such specific applications.

LITERATURE CITED
1. Valet GK, Hoffkes HG. Automated classification of patients with chronic lymphocytic leukemia and immunocytoma from flow cytometric three-color immunophenotypes. Cytometry 1997;30:275–288.
2. Zamir E, Geiger B, Cohen N, Kam Z, Katz BZ. Resolving and classifying haematopoietic bone-marrow cell populations by multi-dimensional analysis of flow-cytometry data. Br J Haematol 2005;129:420–431.
3. Collins GS, Krzanowski WJ. Nonparametric discriminant analysis of phytoplankton species using data from analytical flow cytometry. Cytometry 2002;48:26–33.
4. Boddy L, Wilkins MF, Morris CW. Pattern recognition in flow cytometry. Cytometry 2001;44:195–209.
5. Toedling J, Rhein P, Ratei R, Karawajew L, Spang R. Automated in-silico detection of cell populations in flow cytometry readouts and its application to leukemia disease monitoring. BMC Bioinformatics 2006;7:282.
6. Zeng QT, Pratt JP, Pak J, Ravnic D, Huss H, Mentzer SJ. Feature-guided clustering of multi-dimensional flow cytometry datasets. J Biomed Inform 2007;40:325–331.
7. Roederer M, Hardy RR. Frequency difference gating: A multivariate method for identifying subsets that differ between samples. Cytometry 2001;45:56–64.
8. Pedreira CE, Costa ES, Arroyo ME, Almeida J, Orfao A. A multidimensional classification approach for the automated analysis of flow cytometry data. IEEE Trans Biomed Eng 2008;55:1155–1162.
9. Perez OD, Nolan GP. Phospho-proteomic immune analysis by flow cytometry: From mechanism to translational medicine at the single-cell level. Immunol Rev 2006;210:208–228.
10. Law M. Manifold Learning (Web Page). 2008. Available at: http://www.cse.msu.edu/lawhiu/manifold/. Accessed January 25, 2008.
11. Lee S, Abbott AL, Clark N, Araman P. Active contours on statistical manifolds and texture segmentation. In the IEEE International Conference on Image Processing, IEEE; Genoa, Italy: 2005. pp 828–831.

12. Carter KM, Raich R, Hero AO. FINE: Information embedding for document classification. In the Proceedings of the 2008 IEEE International Conference on Acoustics, Speech, and Signal Processing, Las Vegas, NV, IEEE; 2008. pp 1861–1864.
13. Amari S, Nagaoka H. Differential-Geometrical Methods in Statistics. New York: Springer; 1990.
14. Carter KM, Raich R, Hero AO. Learning on statistical manifolds for clustering and visualization. In 45th Allerton Conference on Communication, Control, and Computing, Monticello, Illinois; 2007.
15. McKenna RW, Washington LT, Aquino DB, Picker LJ, Kroft SH. Immunophenotypic analysis of hematogones (B-lymphocyte precursors) in 662 consecutive bone marrow specimens by 4-color flow cytometry. Blood 2001;98:2498–2507.
16. Brunning RD, Borowitz M, Matutes E, Head D, Flandrin G, Swerdlow SH, Bennett JM. Precursor B lymphoblastic leukaemia/lymphoblastic lymphoma. In: Jaffe ES, Harris NL, Stein H, Vardiman JW, editors. World Health Organization Classification of Tumours: Pathology & Genetics: Tumours of the Haematopoietic and Lymphoid Tissues. Lyon: IARC Press; 2001. pp 111–114.
17. Terrell GR. The maximal smoothing principle in density estimation. J Am Stat Assoc 1990;85:470–477.
18. Balis UJ. Implementation of a region of interest-based query using vector quantization, generalized affine class-based vocabularies, and multimodal Chebyshev polynomial normalization to retrieve context-matched imagery from existing digital image repositories (abstract). Arch Pathol Lab Med 2005;129:811.
19. Habib LK, Finn WG. Unsupervised immunophenotypic profiling of chronic lymphocytic leukemia. Cytometry B Clin Cytom 2006;70B:124–135.
20. De Zen L, Bicciato S, te Kronnie G, Basso G. Computational analysis of flow-cytometry antigen expression profiles in childhood acute lymphoblastic leukemia: An MLL/AF4 identification. Leukemia 2003;17:1557–1565.
21. Maynadie M, Picard F, Husson B, Chatelain B, Cornet Y, Le Roux G, Campos L, Dromelet A, Lepelley P, Jouault H, Imbert M, Rosenwadj M, Verge V, Bissieres P, Raphael M, Bene MC, Feuillard J. Immunophenotypic clustering of myelodysplastic syndromes. Blood 2002;100:2349–2356.
22. De Rosa SC, Brenchley JM, Roederer M. Beyond six colors: A new era in flow cytometry. Nat Med 2003;9:112–117.
23. Roederer M, Moore W, Treister A, Hardy RR, Herzenberg LA. Probability binning comparison: A metric for quantitating multivariate distribution differences. Cytometry 2001;45:47–55.
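
The three-step analysis described under Data Analysis (kernel density estimation, symmetrized Kullback-Leibler divergence, classical multidimensional scaling) might be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the implementation used in the study. All function names are hypothetical. Synthetic six-dimensional Gaussian samples stand in for gated list mode data. Scott's rule of thumb replaces the maximal smoothing principle (17) for bandwidth selection. The divergence integral is approximated by a leave-one-out average over each sample's own events. The dissimilarities are squared before double centering, as in textbook cMDS, which the text does not specify.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.special import logsumexp


def bandwidths(data):
    """Per-dimension Gaussian bandwidths from Scott's rule of thumb.
    Assumption: a stand-in for the maximal smoothing principle (17)."""
    n, d = data.shape
    return data.std(axis=0, ddof=1) * n ** (-1.0 / (d + 4))


def log_kde(points, data, h, leave_one_out=False):
    """Log density at `points` of a product-Gaussian KDE built on `data`."""
    n, d = data.shape
    log_k = -0.5 * cdist(points / h, data / h, "sqeuclidean")  # shape (m, n)
    if leave_one_out:  # points is data: drop each event's own kernel
        np.fill_diagonal(log_k, -np.inf)
        n -= 1
    log_norm = np.log(n) + np.log(h).sum() + 0.5 * d * np.log(2 * np.pi)
    return logsumexp(log_k, axis=1) - log_norm


def kl_divergence(a, b):
    """Sample-average estimate of D_KL(f_a || f_b) over the events of a."""
    log_fa = log_kde(a, a, bandwidths(a), leave_one_out=True)
    log_fb = log_kde(a, b, bandwidths(b))
    return float(np.mean(log_fa - log_fb))


def dissimilarity_matrix(samples):
    """D_ij = D_KL(f_i || f_j) + D_KL(f_j || f_i), clipped at zero because
    the finite-sample estimate can dip slightly below it."""
    n = len(samples)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d_ij = (kl_divergence(samples[i], samples[j])
                    + kl_divergence(samples[j], samples[i]))
            D[i, j] = D[j, i] = max(d_ij, 0.0)
    return D


def classical_mds(D, k=2):
    """Double-center the squared dissimilarities, then keep the top-k
    eigenvectors scaled by the square roots of their eigenvalues."""
    n = D.shape[0]
    J = np.eye(n) - np.full((n, n), 1.0 / n)
    B = -0.5 * J @ (D ** 2) @ J
    w, v = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:k]
    return v[:, top] * np.sqrt(np.clip(w[top], 0.0, None))


# Synthetic stand-in data: two groups of three "patient samples" each,
# 200 events in six dimensions, differing only in their mean.
rng = np.random.default_rng(0)
samples = [rng.normal(0.0, 1.0, (200, 6)) for _ in range(3)]
samples += [rng.normal(1.5, 1.0, (200, 6)) for _ in range(3)]
D = dissimilarity_matrix(samples)
embedding = classical_mds(D)  # one two-dimensional point per sample
```

With real data, each element of `samples` would be the gated events-by-parameters array for one tube. The pairwise kernel evaluation scales with the product of the two sample sizes, so samples with many thousands of events may need subsampling or batching.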
