Deep Learning ACL 2012 Tutorial
Richard Socher,* Yoshua Bengio,† and Christopher Manning*
*Department of Computer Science, Stanford University
†Department of Computer Science and Operations Research, Université de Montréal
July 8, 2012
ACL 2012 Tutorial
References
Ando, Rie Kubota and Tong Zhang. 2005. A framework for learning predictive
structures from multiple tasks and unlabeled data. J. Machine Learning Re-
search 6:1817–1853.
Bengio, Y. 2009. Learning deep architectures for AI. Foundations & Trends in
Mach. Learn. 2(1):1–127.
Bengio, Yoshua, Réjean Ducharme, and Pascal Vincent. 2001. A neural proba-
bilistic language model. In T. K. Leen, T. G. Dietterich, and V. Tresp, eds.,
Advances in NIPS 13, pages 932–938. MIT Press.
Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. 2003.
A neural probabilistic language model. J. Machine Learning Research 3:1137–
1155.
Bengio, Y., P. Simard, and P. Frasconi. 1994. Learning long-term dependencies
with gradient descent is difficult. IEEE Tr. Neural Networks 5(2):157–166.
Blitzer, John, Kilian Weinberger, Lawrence Saul, and Fernando Pereira. 2005.
Hierarchical distributed representations for statistical language modeling. In
NIPS’2004. Cambridge, MA: MIT Press.
Bordes, Antoine, Xavier Glorot, Jason Weston, and Yoshua Bengio. 2012. Joint
learning of words and meaning representations for open-text semantic parsing.
In AISTATS'2012.
Bordes, Antoine, Jason Weston, Ronan Collobert, and Yoshua Bengio. 2011.
Learning structured embeddings of knowledge bases. In AAAI 2011.
Bottou, L. 2011. From machine learning to machine reasoning. CoRR
abs/1102.1808.
Brown, Peter F., Vincent J. Della Pietra, Peter V. deSouza, Jenifer C. Lai, and
Robert L. Mercer. 1992. Class-based n-gram models of natural language. Com-
putational Linguistics 18(4):467–479.
Clark, Alexander. 2003. Combining distributional and morphological information
for part of speech induction. In EACL 2003, pages 59–66.
Collobert, R. and J. Weston. 2008. A unified architecture for natural language
processing: Deep neural networks with multitask learning. In ICML’2008.
Collobert, Ronan, Jason Weston, Léon Bottou, Michael Karlen, Koray
Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost)
from scratch. J. Machine Learning Research 12:2493–2537.
Costa, F., P. Frasconi, V. Lombardo, and G. Soda. 2003. Towards incremental
parsing of natural language using recursive neural networks. Applied Intelli-
gence 19.
Dahl, George E., Dong Yu, Li Deng, and Alex Acero. 2012. Context-dependent
pre-trained deep neural networks for large vocabulary speech recognition. IEEE
Transactions on Audio, Speech, and Language Processing 20(1):33–42.
Dauphin, Y., X. Glorot, and Y. Bengio. 2011. Large-scale learning of embed-
dings with reconstruction sampling. In Proceedings of the 28th International
Conference on Machine Learning, ICML '11.
Erhan, Dumitru, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pas-
cal Vincent, and Samy Bengio. 2010. Why does unsupervised pre-training help
deep learning? J. Machine Learning Research 11:625–660.
Glorot, Xavier and Yoshua Bengio. 2010. Understanding the difficulty of training
deep feedforward neural networks. In AISTATS’2010, pages 249–256.
Glorot, Xavier, Antoine Bordes, and Yoshua Bengio. 2011. Deep sparse rectifier
neural networks. In AISTATS’2011.
Goodfellow, Ian, Quoc Le, Andrew Saxe, and Andrew Ng. 2009. Measuring
invariances in deep networks. In NIPS 22, pages 646–654.
Gould, S., R. Fulton, and D. Koller. 2009. Decomposing a scene into geometric
and semantically consistent regions. In ICCV.
Huang, Eric H., Richard Socher, Christopher D. Manning, and Andrew Y. Ng.
2012. Improving word representations via global context and multiple word
prototypes. In ACL 2012.
Koo, Terry, Xavier Carreras, and Michael Collins. 2008. Simple semi-supervised
dependency parsing. In Proceedings of ACL, pages 595–603.
Le, Quoc, Jiquan Ngiam, Adam Coates, Abhik Lahiri, Bobby Prochnow, and
Andrew Ng. 2011. On optimization methods for deep learning. In Proc.
ICML’2011. ACM.
Le, Quoc, Marc’Aurelio Ranzato, Rajat Monga, Matthieu Devin, Greg Corrado,
Kai Chen, Jeff Dean, and Andrew Ng. 2012. Building high-level features using
large scale unsupervised learning. In ICML’2012.
Lee, Honglak, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng. 2009a. Con-
volutional deep belief networks for scalable unsupervised learning of hierarchi-
cal representations. In ICML’2009.
Lee, Honglak, Peter Pham, Yan Largman, and Andrew Ng. 2009b. Unsuper-
vised feature learning for audio classification using convolutional deep belief
networks. In NIPS’2009.
Martin, Sven, Jörg Liermann, and Hermann Ney. 1998. Algorithms for bigram
and trigram word clustering. Speech Communication 24:19–37.
Menchetti, S., F. Costa, P. Frasconi, and M. Pontil. 2005. Wide coverage natural
language processing using kernel methods and neural networks for structured
data. Pattern Recognition Letters 26(12).
Mikolov, Tomas, Anoop Deoras, Stefan Kombrink, Lukas Burget, and Jan Cer-
nocky. 2011. Empirical evaluation and combination of advanced language mod-
eling techniques. In Proc. 12th Annual Conference of the International Speech
Communication Association (INTERSPEECH 2011).
Mnih, Andriy and Geoffrey E. Hinton. 2007. Three new graphical models for
statistical language modelling. In ICML’2007, pages 641–648.
Morin, Frédéric and Yoshua Bengio. 2005. Hierarchical probabilistic neural net-
work language model. In AISTATS’2005, pages 246–252.
Quattoni, Ariadna, Michael Collins, and Trevor Darrell. 2005. Conditional ran-
dom fields for object recognition. In NIPS’2004, pages 1097–1104. MIT Press.
Mohamed, Abdel-rahman, George Dahl, and Geoffrey Hinton. 2012. Acoustic
modeling using deep belief networks. IEEE Trans. on Audio, Speech and Lan-
guage Processing 20(1):14–22.
Rifai, Salah, Yann Dauphin, Pascal Vincent, and Yoshua Bengio. 2012. A gener-
ative process for contractive auto-encoders. In ICML’2012.
Rifai, Salah, Pascal Vincent, Xavier Muller, Xavier Glorot, and Yoshua Bengio.
2011. Contractive auto-encoders: Explicit invariance during feature extraction.
In ICML’2011.
Schwenk, H. and J-L. Gauvain. 2002. Connectionist language modeling for large
vocabulary continuous speech recognition. In ICASSP, pages 765–768. Or-
lando, Florida.
Schwenk, Holger, Anthony Rousseau, and Mohammed Attik. 2012. Large, pruned
or continuous space language models on a GPU for statistical machine transla-
tion. In Workshop on the Future of Language Modeling for HLT.
Seide, Frank, Gang Li, and Dong Yu. 2011. Conversational speech transcrip-
tion using context-dependent deep neural networks. In Interspeech 2011, pages
437–440.
Sha, Fei and Fernando C. N. Pereira. 2003. Shallow parsing with conditional
random fields. In HLT-NAACL.
Smith, Noah A. and Jason Eisner. 2005. Contrastive estimation: Training log-
linear models on unlabeled data. In Proceedings of the 43rd Annual Meeting of
the Association for Computational Linguistics (ACL’05), pages 354–362. Ann
Arbor, Michigan: Association for Computational Linguistics.
Socher, Richard, Eric H. Huang, Jeffrey Pennington, Andrew Y. Ng, and Christo-
pher D. Manning. 2011a. Dynamic pooling and unfolding recursive autoen-
coders for paraphrase detection. In Advances in Neural Information Processing
Systems 24.
Socher, Richard, Cliff C. Lin, Andrew Y. Ng, and Christopher D. Manning. 2011b.
Parsing natural scenes and natural language with recursive neural networks.
In Proceedings of the 28th International Conference on Machine Learning
(ICML).
Socher, R., C. D. Manning, and A. Y. Ng. 2010. Learning continuous phrase
representations and syntactic parsing with recursive neural networks. In Pro-
ceedings of the NIPS-2010 Deep Learning and Unsupervised Feature Learning
Workshop.
Socher, Richard, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christo-
pher D. Manning. 2011c. Semi-supervised recursive autoencoders for predict-
ing sentiment distributions. In Proceedings of the 2011 Conference on Empiri-
cal Methods in Natural Language Processing (EMNLP).
Toutanova, Kristina, Dan Klein, Christopher D. Manning, and Yoram Singer.
2003. Feature-rich part-of-speech tagging with a cyclic dependency network.
In Human Language Technology Conference of the North American Chapter
of the Association for Computational Linguistics (HLT-NAACL 2003), pages
252–259.
Turian, Joseph, Lev Ratinov, and Yoshua Bengio. 2010. Word representations: A
simple and general method for semi-supervised learning. In Proc. ACL’2010,
pages 384–394. Association for Computational Linguistics.
Vincent, Pascal, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol.
2008. Extracting and composing robust features with denoising autoencoders.
In ICML 2008, pages 1096–1103.