Paper 9
To cite this article: Jungang Xu, Hui Li & Shilong Zhou (2015) An Overview of Deep Generative
Models, IETE Technical Review, 32:2, 131-139, DOI: 10.1080/02564602.2014.987328
To link to this article: https://doi.org/10.1080/02564602.2014.987328
ABSTRACT
As an important category of deep models, deep generative models have attracted more and more attention since the proposal of Deep Belief Networks (DBNs) and the fast greedy training algorithm based on restricted Boltzmann machines (RBMs). In the past few years, many different deep generative models have been proposed and used in the area of Artificial Intelligence. In this paper, three important deep generative models, namely DBNs, the deep autoencoder, and the deep Boltzmann machine, are reviewed. In addition, some successful applications of deep generative models in image processing, speech recognition, and information retrieval are also introduced and analysed.
Keywords:
Deep autoencoder, Deep belief networks, Deep Boltzmann machine, Deep generative model, Restricted Boltzmann machine
corresponding learning algorithm created a new situation in the field of AI and moved a step closer to the final objective of AI.

In this paper, we review deep generative models, including their history, architectures, and applications. The rest of the paper is organized as follows. Section 2 introduces the historical context of deep generative models. Section 3 describes the architecture of three typical deep generative models: the DBN, the deep autoencoder, and the deep Boltzmann machine (DBM). Section 4 introduces and analyses some typical applications of deep generative models in Artificial Intelligence. Section 5 presents the discussion and perspectives.

2. THE HISTORICAL CONTEXT OF DEEP MODELS

According to the theory of probability and statistics, there are two types of probabilistic model: the generative model and the discriminative model. The generative model models the joint distribution, while the discriminative model focuses on the conditional distribution. If we need to predict the class y of a given observation x, for example, we can use a generative model to calculate p(x, y) or a discriminative model to calculate p(y | x) [13,14]. Deep models are based on probabilistic models, so they can be categorized into three classes: deep generative models, deep discriminative models, and hybrid deep architectures. The history of deep models is shown in Figure 1. Generally, the theory of deep discriminative models is simpler than that of deep generative models, which are usually described as graphs. However, deep discriminative models are usually trained in a supervised way, which is very difficult, whereas deep generative models can be trained in unsupervised ways, which gives them more potential. Hybrid deep architectures combine generative models and discriminative models and have some potential real applications.
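To make the relation between the two routes concrete, they are connected by Bayes' rule (a standard identity, stated here for clarity rather than taken from the paper):

    p(y | x) = p(x, y) / p(x) = p(x, y) / Σ_y' p(x, y'),

so a model of the joint distribution also yields the conditional distribution once it is normalized over the classes.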
2.1 Deep Generative Model

A deep generative model is usually represented as a graphical model [15]. The sigmoid belief network is a kind of deep generative model that was proposed and studied before 2006 and trained using variational approximations [16-19]. However, calculating the multi-layer joint distribution with this model is barely feasible [5]. At the beginning of the twenty-first century, Hinton et al. proposed a kind of deep generative model called the deep belief network, built on sigmoid belief networks [12]. Different from sigmoid belief networks, the top two layers of a DBN form a restricted Boltzmann machine (RBM) [20-23], which enables a fast training method. The fast unsupervised learning algorithm for DBNs trains one layer at a time greedily and finally obtains a multi-layer probabilistic model. More deep generative models similar to the DBN have been proposed and trained by greedily stacking shallow structures, such as deep neural networks (DNNs), the deep autoencoder, the DBM, and the recurrent neural network (RNN) [24-26]. In recent years, deep generative models have drawn more and more attention and have been used to solve AI problems, since the corresponding training algorithms are fairly fast and need little labeled data.
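A minimal sketch of the greedy layer-wise idea described above, using scikit-learn's BernoulliRBM as a stand-in for the RBM training procedure; the layer sizes, learning rate, and random data are illustrative assumptions, not settings from the paper:

    # Greedy layer-wise pre-training: each RBM is fit on the hidden
    # activities produced by the previous one, as in DBN pre-training.
    import numpy as np
    from sklearn.neural_network import BernoulliRBM

    def pretrain_dbn(X, layer_sizes=(256, 64), n_iter=10, seed=0):
        """Train a stack of RBMs greedily and return them."""
        rbms, data = [], X
        for n_hidden in layer_sizes:
            rbm = BernoulliRBM(n_components=n_hidden, learning_rate=0.05,
                               n_iter=n_iter, random_state=seed)
            rbm.fit(data)               # unsupervised: no labels are needed
            data = rbm.transform(data)  # hidden activations feed the next layer
            rbms.append(rbm)
        return rbms

    # Toy binary data standing in for, e.g., binarized image pixels.
    X = (np.random.rand(500, 784) > 0.5).astype(float)
    dbn_layers = pretrain_dbn(X)

The stack of trained RBMs is what the greedy algorithm turns into a multi-layer probabilistic model; a supervised fine-tuning stage can be added on top when labeled data are available.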
2.2 Deep Discriminative Model

As described in Section 1, training algorithms like BP have some limitations, including getting stuck in local optima easily and being time-consuming. Inspired by the structure of the visual system, LeCun et al. designed a deep discriminative model called the convolutional neural network (CNN) together with a training method for it [27,28]. Although training deep models directly in a supervised way is very difficult, the CNN is an exception and is capable of finding the optimum in a nonlinear space. However, both traditional training methods like BP and the training algorithms for CNNs need large amounts of labeled data, which restricts their application in fields that lack labeled data, such as information retrieval.
4.1 Selected Applications in Image Processing

Traditional image recognition technologies include wavelet transformation, Gabor filters, Bayes network decision, etc. For example, a novel approach to recognizing facial expressions was proposed in reference [51], where the facial features are represented by a hybrid of the Gabor wavelet transform of an image and a local transitional pattern code. However, the effectiveness and efficiency of traditional image recognition technologies are still not very satisfying. The DBN was proposed and tested on a simple image recognition task on the MNIST data-set of handwritten digits, which is a common data-set for machine learning and pattern recognition experiments [5,52-54]. The DBN showed promising results and outperformed most of the existing models. At the same time, the deep autoencoder was developed and demonstrated with success on dimensionality reduction tasks [27]. The parameters of the deep autoencoder are initialized by stacking multiple RBMs and training each RBM greedily, which allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.
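As a rough illustration of that initialization, the sketch below fits one RBM and "unrolls" it into a tied-weight encoder-decoder; a single layer is only a stand-in for the stacked-RBM construction, and the data and model sizes are illustrative assumptions:

    import numpy as np
    from sklearn.neural_network import BernoulliRBM

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Placeholder binary data standing in for image pixels.
    X = (np.random.rand(1000, 784) > 0.5).astype(float)
    rbm = BernoulliRBM(n_components=30, n_iter=10, random_state=0).fit(X)

    # Unroll the RBM into an autoencoder: encode with W, decode with W transposed.
    W, b_h, b_v = rbm.components_, rbm.intercept_hidden_, rbm.intercept_visible_
    codes = sigmoid(X @ W.T + b_h)             # 30-dimensional nonlinear codes
    reconstruction = sigmoid(codes @ W + b_v)  # decoder reuses the transposed weights
    # Fine-tuning with BP would further adjust these parameters to reduce
    # the reconstruction error, as in the deep autoencoder of [27].

Comparing such codes with a 30-component PCA projection of the same data is the kind of comparison the dimensionality-reduction result above refers to.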
A modified DBN was developed in which the top-layer model uses a third-order Boltzmann machine [55]. This type of DBN was applied to the NORB database, a three-dimensional object recognition task. Later, two strategies to improve the robustness of the DBN were developed [56]. First, sparse connections in the first layer of the DBN are used as a way to regularize the model. Second, a probabilistic de-noising algorithm was developed. Both techniques are shown to be effective in improving robustness against occlusion and random noise in a noisy image recognition task. The DBN has also been successfully applied to create compact but meaningful representations of images for ...

4.2 Selected Applications in Speech Recognition

State-of-the-art hidden Markov model (HMM) systems with observation probabilities approximated by Gaussian mixture models (GMMs) were used in speech recognition for a long time, while traditional neural networks were barely used because of their low performance.

A few years ago, a five-layer DBN was used to replace the Gaussian mixture component of the GMM-HMM, with the monophone state as the modeling unit [60]. Although monophones are generally accepted as a weaker phonetic representation than triphones, the DBN-HMM approach with monophones was shown to achieve higher phone recognition accuracy than the state-of-the-art triphone GMM-HMM systems [61]. In more recent work, a popular type of sequence classification criterion, maximum mutual information, was successfully applied to learn DBN weights for the Texas Instruments and Massachusetts Institute of Technology (TIMIT) phone recognition task [62-64].

The DBN-HMM was then extended from the monophone phonetic representation to the triphone, or context-dependent, counterpart and from phone recognition to large vocabulary speech recognition [65-71]. Experiments on the Bing mobile voice search data-set, collected under real usage scenarios, demonstrate that the triphone DBN-HMM significantly outperforms the state-of-the-art HMM system [60]. Three factors in addition to the DBN contribute to this success: the use of triphones as the DBN modeling units, the use of the best available triphone GMM-HMM to generate the alignment with each state in the triphones, and the tuning of the transition probabilities. The experiments also indicated that the decoding time of a
five-layer DBN-HMM is almost the same as that of the state-of-the-art triphone GMM-HMM [68,69].
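In hybrid DBN/DNN-HMM systems of this kind, the network typically outputs per-frame state posteriors, which are converted into the scaled likelihoods the HMM decoder expects by dividing by the state priors. The sketch below shows that conversion with purely illustrative numbers; it is a common recipe for such systems, not a detail reported in this paper:

    import numpy as np

    def posteriors_to_scaled_log_likelihoods(posteriors, priors, eps=1e-10):
        """Convert p(state | frame) into log p(frame | state) up to a constant,
        by dividing out the state priors (Bayes' rule with p(frame) dropped)."""
        return np.log(posteriors + eps) - np.log(priors + eps)

    # Toy example: 3 frames scored against 4 tied triphone states.
    posteriors = np.array([[0.7, 0.1, 0.1, 0.1],
                           [0.2, 0.5, 0.2, 0.1],
                           [0.1, 0.1, 0.2, 0.6]])
    priors = np.array([0.4, 0.3, 0.2, 0.1])  # e.g. estimated from the forced alignment
    log_likes = posteriors_to_scaled_log_likelihoods(posteriors, priors)
    # The HMM decoder then uses log_likes in place of GMM log-likelihoods.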
4.3 Selected Applications in Information Retrieval

Semantic hashing was the first method used to model documents as high-level features with deep generative models [72,73]. Based on word-count features, the hidden variables in the final layer of a DBN give a much better representation of each document than the widely used latent semantic analysis and the traditional term frequency-inverse document frequency (TF-IDF) approach for information retrieval. Documents are mapped to a space of memory addresses in which semantically similar text documents are located at nearby addresses, which facilitates rapid document retrieval.

During pre-training, a constrained conditional Poisson model is used to model the word-count vectors, and then normal RBMs are stacked layer by layer up to the top layer. The deep model is then unrolled into a deep autoencoder and fine-tuned with the BP algorithm. After the deep model is trained, retrieval starts by mapping each query document into a binary code through a forward pass with thresholding. The Hamming distances between the query binary code and all other documents' binary codes are then computed efficiently.
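A compact sketch of that retrieval step: thresholding the top-layer activations into binary codes and ranking documents by Hamming distance. The code length and random activations are illustrative assumptions:

    import numpy as np

    def to_binary_code(activations, threshold=0.5):
        """Threshold the top-layer activations of the unrolled autoencoder."""
        return (activations > threshold).astype(np.uint8)

    def hamming_distances(query_code, doc_codes):
        """Count differing bits between the query code and every document code."""
        return np.count_nonzero(doc_codes != query_code, axis=1)

    # Toy example: 32-bit codes for five documents and one query.
    rng = np.random.default_rng(0)
    doc_codes = to_binary_code(rng.random((5, 32)))
    query_code = to_binary_code(rng.random(32))
    ranking = np.argsort(hamming_distances(query_code, doc_codes))  # closest first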
Recently, a type of DBM was proposed to extract distributed semantic representations from a large unstructured collection of documents, which overcomes the apparent difficulty of training a DBM through judicious parameter tying [74]. This enables an efficient pre-training algorithm and a state initialization scheme for fast inference. The model can be trained just as efficiently as a standard RBM. The experiments showed that the model assigns better log probability to unseen data than the Replicated Softmax model. Features extracted from the model outperform latent Dirichlet allocation (LDA), Replicated Softmax, and document neural autoregressive distribution estimator (DocNADE) models on document retrieval and document classification tasks.

5. DISCUSSIONS AND PERSPECTIVES

Deep learning has recently emerged as a promising research field and is widely used as an effective tool in many applications. The deep generative model is a category of deep models characterized by fast training and freedom from labeled data. We introduced the architectures and training methods of three popular deep generative models, including the DBN, the deep autoencoder, and the DBM, and described some typical applications of deep generative models in image processing, speech recognition, and information retrieval. Although various models of deep learning and their applications have been proposed, there is a lot of work to do in the future. First, improved deep generative models are needed, with architectures closer to the human brain and simpler training theories. Second, after DistBelief was proposed by Google as a distributed large-scale deep network, distributed and parallel training algorithms for deep generative models have become a hot research area, and in these algorithms the map/reduce programming model will be used [75]. Such large-scale deep networks are promising for processing big data. Third, the application of deep generative models in information retrieval is worth developing further: existing deep models are suited to sensory data with multiple layers of structure, like image data and speech data, but they are too complex to deal with plain data like text data.

Funding

This work was supported by the National Natural Science Foundation of China [grant number 61372171]; the National Key Technology R&D Program of China [grant number 2012BAH23B03].

REFERENCES

1. T. S. Lee, and D. Mumford, "Hierarchical Bayesian inference in the visual cortex," The Journal of the Optical Society of America A, Vol. 20, no. 7, pp. 1434-48, Jul. 2003.
2. T. Serre, L. Wolf, and S. Bileschi, "Robust object recognition with cortex-like mechanisms," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 29, no. 3, pp. 411-26, Mar. 2007.
3. T. S. Lee, D. Mumford, and R. Romero, "The role of the primary visual cortex in higher level vision," Vision Research, Vol. 38, no. 15, pp. 2429-54, Aug. 1998.
4. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, Vol. 323, no. 7, pp. 533-6, Oct. 1986.
5. Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy layer-wise training of deep networks," in Advances in Neural Information Processing Systems, Vol. 19, B. Schölkopf, J. C. Platt and T. Hoffman, Eds. Cambridge, MA: MIT Press, 2006, pp. 153-60.
6. H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin, "Exploring strategies for training deep neural networks," Journal of Machine Learning Research, Vol. 1, pp. 1-40, Jan. 2009.
7. P. J. Werbos, Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Boston: Harvard University, 1974.
8. R. Hecht-Nielsen, "Replicator neural networks for universal optimal source coding," Science, Vol. 269, pp. 1860-3, Sept. 1995.
9. G. Tesauro, "Practical issues in temporal difference learning," Machine Learning, Vol. 8, no. 3-4, pp. 257-77, May 1992.
10. Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, Vol. 2, no. 1, pp. 1-127, Jan. 2009.
11. Y. Bengio, and Y. LeCun, "Scaling learning algorithms towards AI," Large-Scale Kernel Machines, Vol. 34, pp. 1-41, Sept. 2007.
12. G. E. Hinton, S. Osindero, and Y. W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, Vol. 18, no. 7, pp. 1527-54, Jul. 2006.
13. B. Taskar, P. Abbeel, and D. Koller, "Discriminative probabilistic models for relational data," in Proceedings of Conference on Uncertainty in Artificial Intelligence, Alberta, 2002, pp. 485-92.
14. J. A. Lasserre, C. M. Bishop, and T. P. Minka, "Principled hybrids of generative and discriminative models," in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, 2006, pp. 87-94.
15. M. I. Jordan, Learning in Graphical Models. Dordrecht: Kluwer, 1998.
16. P. Dayan, G. E. Hinton, R. Neal, and R. Zemel, "The Helmholtz machine," Neural Computation, Vol. 7, no. 5, pp. 889-904, Sept. 1995.
17. G. E. Hinton, P. Dayan, B. J. Frey, and R. M. Neal, "The "wake-sleep" algorithm for unsupervised neural networks," Science, Vol. 268, no. 5214, pp. 1158-61, May 1995.
18. L. K. Saul, T. Jaakkola, and M. I. Jordan, "Mean field theory for sigmoid belief networks," Journal of Artificial Intelligence Research, Vol. 4, no. 1, pp. 61-76, Jan. 1996.
19. I. Titov, and J. Henderson, "Constituent parsing with incremental sigmoid belief networks," in Proceedings of Meeting of Association for Computational Linguistics, Prague, 2007, pp. 632-9.
20. P. Smolensky, "Information processing in dynamical systems: foundations of harmony theory," Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, pp. 194-281, Feb. 1986.
21. Y. Freund, and D. Haussler, "Unsupervised learning of distributions on binary vectors using two layer networks," in Advances in Neural Information Processing Systems, Vol. 4, J. E. Moody, S. J. Hanson, and R. P. Lippmann, Eds. Denver, CO: Morgan Kaufmann, 1991, pp. 912-9.
22. G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Computation, Vol. 14, no. 8, pp. 1771-800, Aug. 2002.
23. M. Welling, M. Rosen-Zvi, and G. E. Hinton, "Exponential family harmoniums with an application to information retrieval," in Advances in Neural Information Processing Systems, Vol. 17, L. K. Saul, Y. Weiss and L. Bottou, Eds. Cambridge, MA: MIT Press, 2004, pp. 1481-8.
24. R. Salakhutdinov, and G. E. Hinton, "Deep Boltzmann machines," in Proceedings of International Conference on Artificial Intelligence and Statistics, Florida, 2009, pp. 448-55.
25. G. E. Hinton, and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, Vol. 313, no. 5786, pp. 504-7, May 2006.
26. R. Collobert, and J. Weston, "A unified architecture for natural language processing: Deep neural networks with multitask learning," in Proceedings of International Conference on Machine Learning, Helsinki, 2008, pp. 160-7.
27. M. Ranzato, C. Poultney, S. Chopra, and Y. LeCun, "Efficient learning of sparse representations with an energy-based model," in Advances in Neural Information Processing Systems, Vol. 19, B. Schölkopf, J. C. Platt and T. Hoffman, Eds. Cambridge, MA: MIT Press, 2006, pp. 1137-44.
28. P. Y. Simard, D. Steinkraus, and J. C. Platt, "Best practices for convolutional neural networks applied to visual document analysis," in Proceedings of the 7th International Conference on Document Analysis and Recognition, Washington DC, 2003, pp. 958-63.
29. L. Deng, and D. Yu, "Deep learning for signal and information processing," Microsoft Research Report, Redmond, 2013.
30. K. H. Cho, T. Raiko, and A. Ilin, "Parallel tempering is efficient for learning restricted Boltzmann machines," in Proceedings of the 2010 International Joint Conference on Neural Networks, Thessaloniki, 2010, pp. 1-8.
31. N. Le Roux, and Y. Bengio, "Representational power of restricted Boltzmann machines and deep belief networks," Neural Computation, Vol. 20, no. 6, pp. 1631-49, Jun. 2008.
32. A. Fischer, and C. Igel, "Empirical analysis of the divergence of Gibbs sampling based learning algorithms for restricted Boltzmann machines," in Proceedings of the 20th International Conference on Artificial Neural Networks, Thessaloniki, 2010, pp. 208-17.
33. G. E. Hinton, "Products of experts," in Proceedings of the 9th International Conference on Artificial Neural Networks, London, 1999, pp. 1-6.
34. G. E. Hinton, "Learning multiple layers of representation," Trends in Cognitive Sciences, Vol. 11, no. 10, pp. 428-34, Oct. 2007.
35. T. Tieleman, "Training restricted Boltzmann machines using approximations to the likelihood gradient," in Proceedings of the 25th International Conference on Machine Learning, New York, 2008, pp. 1064-71.
36. T. Tieleman, and G. Hinton, "Using fast weights to improve persistent contrastive divergence," in Proceedings of the 26th Annual International Conference on Machine Learning, New York, 2009, pp. 1033-40.
37. Y. Bengio, and O. Delalleau, "Justifying and generalizing contrastive divergence," Neural Computation, Vol. 21, no. 6, pp. 1601-21, Jun. 2009.
38. A. Fischer, and C. Igel, "Empirical analysis of the divergence of Gibbs sampling based learning algorithms for restricted Boltzmann machines," in Proceedings of the 20th International Conference on Artificial Neural Networks, Thessaloniki, 2010, pp. 208-17.
39. D. J. Earl, and M. W. Deem, "Parallel tempering: theory, applications, and new perspectives," Physical Chemistry Chemical Physics, Vol. 7, pp. 3910-6, Aug. 2005.
40. G. Desjardins, A. Courville, and Y. Bengio, "Parallel tempering for training of restricted Boltzmann machines," in Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, New York, 2010, pp. 145-52.
41. R. M. Neal, "Sampling from multimodal distributions using tempered transitions," Statistics and Computing, Vol. 6, no. 4, pp. 353-66, Dec. 1996.
42. Y. Iba, "Extended ensemble Monte Carlo," International Journal of Modern Physics, Vol. 12, no. 5, pp. 623-56, Jun. 2001.
43. J. Xu, H. Li, and S. Zhou, "Improving mixing rate with tempered transition for learning restricted Boltzmann machines," Neurocomputing, Vol. 139, pp. 328-35, Sept. 2014.
44. D. C. Plaut, and G. E. Hinton, "Learning sets of filters using back-propagation," Computer, Speech and Language, Vol. 2, no. 1, pp. 35-61, Mar. 1987.
45. D. DeMers, and G. Cottrell, "Non-linear dimension reduction," in Advances in Neural Information Processing Systems, Vol. 5, S. J. Hanson, J. D. Cowan and C. L. Giles, Eds. San Mateo, CA: Morgan Kaufmann, 1992, pp. 580-7.
46. R. Hecht-Nielsen, "Replicator neural networks for universal optimal source coding," Science, Vol. 269, no. 5232, pp. 1860-3, Sept. 1995.
47. N. Kambhatla, and T. K. Leen, "Dimension reduction by local principal component analysis," Neural Computation, Vol. 9, no. 7, pp. 1493-516, Oct. 1997.
48. R. Salakhutdinov, and G. E. Hinton, "Deep Boltzmann machines," in Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, Clearwater Beach, 2009, pp. 448-55.
49. R. Salakhutdinov, and G. Hinton, "A better way to pretrain deep Boltzmann machines," in Advances in Neural Information Processing Systems, Vol. 25, F. Pereira, C. J. C. Burges, L. Bottou and K. Q. Weinberger, Eds. Cambridge, MA: MIT Press, 2012, pp. 1-9.
50. R. Salakhutdinov, "Learning deep generative models," Ph.D. Dissertation, Graduate Department of Computer Science, Univ. Toronto, Toronto, 2009.
51. A. Tanveer, J. Taskeed, and C. Ui-Pil, "Facial expression recognition using local transitional pattern on Gabor filtered facial images," IETE Technical Review, Vol. 30, no. 1, pp. 47-52, Jan. 2013.
52. G. E. Hinton, S. Osindero, and Y. W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, Vol. 18, no. 7, pp. 1527-54, Jul. 2006.
53. J. Luo, and A. Brodsky, "An EM-based multi-step piecewise surface regression learning algorithm," in Proceedings of the 7th International Conference on Data Mining, Las Vegas, 2011, pp. 286-92.
54. J. Luo, A. Brodsky, and Y. Li, "An EM-based ensemble learning algorithm on piecewise surface regression problem," International Journal of Applied Mathematics and Statistics, Vol. 28, no. 4, pp. 59-74, Aug. 2012.
55. V. Nair, and G. Hinton, "3-d object recognition with deep belief nets," in Advances in Neural Information Processing Systems, Vol. 22, Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams and A. Culotta, Eds. Cambridge, MA: MIT Press, 2009, pp. 1339-47.
56. Y. Tang, and C. Eliasmith, "Deep networks for robust visual recognition," in Proceedings of the 27th International Conference on Machine Learning, Haifa, 2010, pp. 1055-62.
57. A. Torralba, R. Fergus, and Y. Weiss, "Small codes and large image databases for recognition," in Proceedings of Computer Vision and Pattern Recognition, Anchorage, 2008, pp. 1-8.
58. J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Ng, "Multimodal deep learning," in Proceedings of the 28th International Conference on Machine Learning, Bellevue, 2011, pp. 689-96.
59. N. Srivastava, and R. Salakhutdinov, "Multimodal learning with deep Boltzmann machines," in Advances in Neural Information Processing Systems, Vol. 25, F. Pereira, C. J. C. Burges, L. Bottou and K. Q. Weinberger, Eds. Montreal, Canada: NIPS, 2012, pp. 2222-30.
60. A. Mohamed, G. Dahl, and G. Hinton, "Deep belief networks for phone recognition," in Proceedings of Neural Information Processing Systems 2009 Workshop on Deep Learning for Speech Recognition and Related Applications, Vancouver, 2009.
61. G. Sivaram, and H. Hermansky, "Sparse multilayer perceptron for phoneme recognition," IEEE Trans. Audio, Speech, & Language Processing, Vol. 20, no. 1, pp. 23-9, Jan. 2012.
62. A. Mohamed, G. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks," IEEE Trans. Audio, Speech, & Language Processing, Vol. 20, no. 1, pp. 14-22, Jan. 2012.
63. A. Mohamed, G. Hinton, and G. Penn, "Understanding how deep belief networks perform acoustic modelling," in Proceedings of the 37th International Conference on Acoustics, Speech, and Signal Processing, Kyoto, 2012, pp. 4273-76.
64. A. Mohamed, D. Yu, and L. Deng, "Investigation of full-sequence training of deep belief networks for speech recognition," in Proceedings of the 11th Annual Conference of the International Speech Communication Association, Makuhari, 2010, pp. 2846-9.
65. D. Yu, F. Seide, G. Li, and L. Deng, "Exploiting sparseness in deep neural networks for large vocabulary speech recognition," in Proceedings of the 37th International Conference on Acoustics, Speech, and Signal Processing, Kyoto, 2012, pp. 4409-12.
66. D. Yu, S. Wang, Z. Karam, and L. Deng, "Language recognition using deep-structured conditional random fields," in Proceedings of the 35th International Conference on Acoustics, Speech and Signal Processing, 2010, pp. 5030-3.
67. F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, Hawaii, 2011, pp. 24-9.
68. G. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent DBN-HMMs in large vocabulary continuous speech recognition," in Proceedings of the 36th International Conference on Acoustics, Speech, and Signal Processing, Prague, 2011, pp. 4688-91.
69. G. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent, pre-trained deep neural networks for large vocabulary speech recognition," IEEE Trans. Audio, Speech, & Language Proc., Vol. 20, no. 1, pp. 30-42, Jan. 2012.
70. Y. Kubo, T. Hori, and A. Nakamura, "Integrating deep neural networks into structural classification approach based on weighted finite-state transducers," in Proceedings of the 13th Annual Conference of the International Speech Communication Association, Portland, 2012.
71. L. Deng, J. Li, K. Huang, D. Yao, F. Yu, M. Seide, G. Seltzer, X. Zweig, J. He, Y. Williams, and A. Acero, "Recent advances in deep learning for speech research at Microsoft," in Proceedings of International Conference on Acoustics, Speech and Signal Processing, Vancouver, 2013, pp. 8604-8.
72. G. Hinton, and R. Salakhutdinov, "Discovering binary codes for documents by learning deep generative models," Topics in Cognitive Science, Vol. 3, no. 1, pp. 74-91, Jan. 2011.
73. R. Salakhutdinov, and G. Hinton, "Semantic hashing," International Journal of Approximate Reasoning, Vol. 50, no. 7, pp. 969-78, Jul. 2009.
74. N. Srivastava, R. Salakhutdinov, and G. E. Hinton, "Modeling documents with deep Boltzmann machines," in Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence, Bellevue, 2013, pp. 616-24.
75. W. Fang, W. Pan, and Z. Cui, "View of MapReduce: Programming model, methods, and its applications," IETE Technical Review, Vol. 29, no. 5, pp. 380-7, Sept. 2012.
Authors

Jungang Xu is an associate professor at the School of Computer and Control Engineering, University of Chinese Academy of Sciences. He received the PhD degree in computer applied technology from the Graduate University of Chinese Academy of Sciences in 2003. During 2003-2005, he was a postdoctoral researcher at Tsinghua University. His current research interests include deep learning, parallel computing, big data management, etc.

Email: [email protected].

Shilong Zhou is an MS student at the School of Computer and Control Engineering, University of Chinese Academy of Sciences. He received the BS degree in software engineering from Northeast University in 2012. His current research interests include deep learning and information retrieval.

Email: [email protected].