Publications of Deep Learning in NLP

This page collects publications and related resources on Deep Learning in NLP. Last updated on 16/07/2015.

It is inspired by the great resource list for CSE 599 - Advanced in NLP.

Feel free to open a pull request on GitHub.

Survey

Bengio et al.’s survey on representation learning

  • Yoshua Bengio, Aaron Courville and Pascal Vincent. “Representation Learning: A Review and New Perspectives.” pdf TPAMI 35(8):1798–1828, 2013

LeCun, Bengio and Hinton’s survey in Nature

  • Yann LeCun, Yoshua Bengio and Geoffrey Hinton. “Deep Learning.” pdf Nature 521, 436–444 (2015)
  • [survey, CNN, RNN, ReNN] Yoav Goldberg. “A Primer on Neural Network Models for Natural Language Processing”. pdf 2015

Embeddings & Language Models

Skip-gram embeddings

  • Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. “Efficient Estimation of Word Representations in Vector Space.” pdf ICLR, 2013.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. “Distributed Representations of Words and Phrases and their Compositionality.” pdf NIPS, 2013.
  • [king-man+woman=queen] Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. “Linguistic Regularities in Continuous Space Word Representations.” pdf NAACL, 2013. (See the vector-arithmetic sketch after this list.)
  • [technical note] Yoav Goldberg and Omer Levy. “word2vec explained: deriving Mikolov et al.’s negative-sampling word-embedding method” pdf Tech-report 2013
  • [buzz-busting] Omer Levy and Yoav Goldberg. “Linguistic Regularities in Sparse and Explicit Word Representations” pdf CoNLL 2014, Best Paper Award
  • [lessons learned] Omer Levy, Yoav Goldberg, Ido Dagan. “Improving Distributional Similarity with Lessons Learned from Word Embeddings” pdf, TACL 2015
  • [syntax-word order] Wang Ling, Chris Dyer, Alan Black, Isabel Trancoso. “Two/Too Simple Adaptations of Word2Vec for Syntax Problems” pdf NAACL 2015 (Short)
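As a concrete illustration of the analogy evaluation in the king-man+woman=queen paper above, here is a minimal numpy sketch (ours, not the authors’ code); `vecs`, a dict from words to pre-trained embedding vectors, is a hypothetical placeholder:

```python
# Hypothetical setup: `vecs` maps words to pre-trained numpy vectors.
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(a, b, c, vecs, topn=1):
    """Answer "a is to b as c is to ?" by ranking cos(x, b - a + c)."""
    target = vecs[b] - vecs[a] + vecs[c]
    scored = [(w, cosine(vec, target)) for w, vec in vecs.items()
              if w not in (a, b, c)]        # exclude the query words
    return sorted(scored, key=lambda t: -t[1])[:topn]

# analogy("man", "king", "woman", vecs)  ->  ideally [("queen", ...)]
```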

Embedding enhancement: Syntax, Retrofitting, etc.

  • [dependency embeddings] Omer Levy and Yoav Goldberg “Dependency Based Word Embeddings” pdf ACL 2014 (Short)
  • [dependency embeddings] Mohit Bansal, Kevin Gimpel and Karen Livescu. “Tailoring Continuous Word Representations for Dependency Parsing” pdf ACL 2014 (Short)
  • [retrofitting with lexical knowledge] Manaal Faruqui, Jesse Dodge, Sujay Kumar Jauhar, Chris Dyer, Eduard Hovy and Noah A. Smith. “Retrofitting Word Vectors to Semantic Lexicons” pdf, NAACL 2015
  • [contrastive estimation] Mnih and Kavukcuoglu, “Learning Word Embeddings Efficiently with Noise-Contrastive Estimation.” pdf NIPS 2013
  • [embedding documents] Quoc V Le, Tomas Mikolov. “Distributed representations of sentences and documents” pdf ICML 2014
  • [synonymy relations] Mo Yu, Mark Dredze. “Improving Lexical Embeddings with Semantic Knowledge” pdf ACL 2014 (Short)
  • [embedding relations] Asli Celikyilmaz, Dilek Hakkani-Tur, Panupong Pasupat, Ruhi Sarikaya. “Enriching Word Embeddings Using Knowledge Graph for Semantic Tagging in Conversational Dialog Systems” pdf AAAI 2015 (Short)
  • [multimodal] Angeliki Lazaridou, Nghia The Pham and Marco Baroni. “Combining Language and Vision with a Multimodal Skip-gram Model” pdf NAACL 2015
  • [syntax-word order] Wang Ling, Chris Dyer, Alan Black, Isabel Trancoso. “Two/Too Simple Adaptations of Word2Vec for Syntax Problems” pdf NAACL 2015 (Short)
  • [autoencoder, lexeme, lexical resource, synset] Sascha Rothe and Hinrich Schutze, “AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes” pdf ACL 2015 Best Paper
  • [lexical resource, babelnet] Ignacio Iacobacci, Mohammad Taher Pilehvar and Roberto Navigli, “SensEmbed: Learning Sense Embeddings for Word and Relational Similarity” pdf ACL 2015
  • [specific linguistic relation] Zhigang Chen, Wei Lin, Qian Chen, Xiaoping Chen, Si Wei, Hui Jiang and Xiaodan Zhu, “Revisiting Word Embedding for Contrasting Meaning” pdf ACL 2015
  • [syntax] Jianpeng Cheng and Dimitri Kartsaklis. “Syntax-Aware Multi-Sense Word Embeddings for Deep Compositional Models of Meaning”. pdf EMNLP 2015, Lisbon, Portugal, September 2015.

Embedding enhancement: Word order, Morphology, etc.

  • [syntax-word order] Wang Ling, Chris Dyer, Alan Black, Isabel Trancoso. “Two/Too Simple Adaptations of Word2Vec for Syntax Problems” pdf NAACL 2015 (Short)
  • [word order] Rie Johnson and Tong Zhang. “Effective Use of Word Order for Text Categorization with Convolutional Neural Networks” pdf NAACL 2015
  • [morphology] Radu Soricut and Franz Och. “Unsupervised Morphology Induction Using Word Embeddings” pdf NAACL 2015 Best Paper Award
  • [morphology] Minh-Thang Luong, Richard Socher, Christopher D. Manning. “Better Word Representations with Recursive Neural Networks for Morphology” pdf CoNLL 2013
  • [morpheme] Siyu Qiu, Qing Cui, Jiang Bian, Bin Gao, Tie-Yan Liu. “Co-learning of Word Representations and Morpheme Representations” pdf COLING 2014
  • [morphological] Ryan Cotterell and Hinrich Schütze. “Morphological Word-Embeddings” pdf NAACL 2015 (Short)
  • [regularization] Dani Yogatama, Manaal Faruqui, Chris Dyer, Noah Smith. “Learning Word Representations with Hierarchical Sparse Coding” pdf ICML 2015
  • [character, word order, based on word2vec] Andrew Trask, David Gilmore, Matthew Russell, “Modeling Order in Neural Word Embeddings at Scale” pdf ICML 2015

Embeddings as matrix factorization

  • [approximate interpretation] Levy and Goldberg, “Neural Word Embedding as Implicit Matrix Factorization.” pdf NIPS 2014 (see the sketch after this list)
  • Omer Levy, Steffen Remus, Chris Biemann, and Ido Dagan. “Do Supervised Distributional Methods Really Learn Lexical Inference Relations?” pdf NAACL 2015 (Short)
  • Tim Rocktaschel, Sameer Singh and Sebastian Riedel. “Injecting Logical Background Knowledge into Embeddings for Relation Extraction” pdf NAACL 2015
  • [exact interpretation] Yitan Li, Linli Xu, Fei Tian, Liang Jiang, Xiaowei Zhong and Enhong Chen. “Word Embedding Revisited: A New Representation Learning and Explicit Matrix Factorization Perspective” pdf IJCAI 2015
  • [SVD, framework, scaling] Karl Stratos, Michael Collins, and Daniel Hsu. “Model-based Word Embeddings from Decompositions of Count Matrices”. pdf ACL 2015.
  • [MF, SVD] Omer Levy, Yoav Goldberg, and Ido Dagan. “Improving Distributional Similarity with Lessons Learned from Word Embeddings”. pdf TACL 2015.
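As a rough, self-contained illustration of the Levy & Goldberg observation above (skip-gram with k negative samples implicitly factorizes a PMI matrix shifted by log k), the sketch below builds the shifted positive PMI matrix explicitly and factorizes it with SVD. The toy corpus handling and hyperparameters are our assumptions, not the paper’s experimental setup:

```python
# A toy reconstruction of the implicit factorization: build the shifted
# positive PMI (SPPMI) matrix from a token list and factorize it with SVD.
# `tokens`, the window size, and k are illustrative assumptions.
import numpy as np
from collections import Counter

def sppmi_svd_embeddings(tokens, window=2, k=5, dim=50):
    vocab = sorted(set(tokens))
    idx = {w: i for i, w in enumerate(vocab)}
    pair_counts = Counter()
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pair_counts[(idx[w], idx[tokens[j]])] += 1
    M = np.zeros((len(vocab), len(vocab)))
    for (wi, ci), n in pair_counts.items():
        M[wi, ci] = n
    total = M.sum()
    pw = M.sum(axis=1, keepdims=True) / total      # word marginals
    pc = M.sum(axis=0, keepdims=True) / total      # context marginals
    with np.errstate(divide="ignore"):
        pmi = np.log((M / total) / (pw * pc))      # -inf where M == 0
    sppmi = np.maximum(pmi - np.log(k), 0.0)       # shift by log k, clip at 0
    U, S, _ = np.linalg.svd(sppmi)
    d = min(dim, len(vocab))
    return vocab, U[:, :d] * np.sqrt(S[:d])        # symmetric SVD embedding
```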

Embeddings obtained from other methods

  • [noise-contrastive estimation] Andriy Mnih and Koray Kavukcuoglu, “Learning word embeddings efficiently with noise-contrastive estimation” pdf NIPS 2013
  • [logarithm of word-word co-occurrences] Jeffrey Pennington, Richard Socher, and Christopher D. Manning, “GloVe: Global Vectors for Word Representation” pdf EMNLP 2014 (see the sketch after this list)
  • [explicitly encode co-occurrences] Omer Levy and Yoav Goldberg, “Linguistic regularities in sparse and explicit word representations.” pdf CoNLL 2014.
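For reference, a compact sketch of the GloVe objective cited above: a weighted least-squares fit of word and context vectors (plus biases) to the log co-occurrence count. Real GloVe iterates only over nonzero entries with AdaGrad; this dense, full-batch version is purely illustrative, and all parameter shapes are assumptions:

```python
# Illustrative (hypothetical shapes): W, C are (V, d) word/context vectors,
# b, c are (V,) biases, X is a (V, V) co-occurrence count matrix.
import numpy as np

def glove_loss_and_grads(W, C, b, c, X, x_max=100.0, alpha=0.75):
    f = np.minimum((X / x_max) ** alpha, 1.0)        # weighting; 0 where X == 0
    logX = np.log(np.maximum(X, 1e-12))              # safe log; masked by f anyway
    err = W @ C.T + b[:, None] + c[None, :] - logX   # model vs. log count
    loss = np.sum(f * err ** 2)
    fe = f * err
    return loss, (2 * fe @ C,          # grad wrt W
                  2 * fe.T @ W,        # grad wrt C
                  2 * fe.sum(axis=1),  # grad wrt b
                  2 * fe.sum(axis=0))  # grad wrt c
```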

Why and when embeddings are better

  • [comparison between pretrained embeddings] Yanqing Chen, Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. “The expressive power of word embeddings” pdf ICML 2013
  • [how embeddings are trained matters] Felix Hill, KyungHyun Cho, Sebastien Jean, et al., “Not all neural embeddings are born equal” pdf NIPS Workshop 2014
  • [multichannel as multi-embeddings input] Wenpeng Yin, Hinrich Schütze. “MultiGranCNN: An Architecture for General Matching of Text Chunks on Multiple Levels of Granularity” ACL 2015
  • [dimension, corpus, compare] Siwei Lai, Kang Liu, Liheng Xu, Jun Zhao, “How to Generate a Good Word Embedding?” pdf arXiv pre-print

Word Representations via Distribution Embedding

  • Katrin Erk, “Representing Words As Regions in Vector Space”. pdf In Proceedings of the Thirteenth Conference on Computational Natural Language Learning, Boulder, Colorado, 2009.
  • [SVD, framework, scaling] Karl Stratos, Michael Collins, and Daniel Hsu. “Model-based Word Embeddings from Decompositions of Count Matrices”. pdf ACL 2015.
  • [MF, SVD] Omer Levy, Yoav Goldberg, and Ido Dagan. “Improving Distributional Similarity with Lessons Learned from Word Embeddings”. pdf TACL 2015.
  • [random walks, generative model] Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski. “Random walks on discourse spaces: a new generative language model with applications to semantic word embeddings” pdf. In CoRR, 2015.
  • [breadth, asymmetric] Luke Vilnis, Andrew McCallum. “Word Representations via Gaussian Embedding”. pdf. In ICLR, 2015. (See the KL sketch after this list.)
  • [markov, generative, MF] Tatsunori B. Hashimoto, David Alvarez-Melis, Tommi S. Jaakkola. “Word, graph, and manifold embedding from Markov processes”. pdf. arXiv preprint 2015.
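To make the “breadth, asymmetric” idea in the Vilnis & McCallum entry above concrete, here is a small sketch of the KL divergence between two diagonal Gaussians, the asymmetric score their model uses alongside an expected-likelihood kernel; the numpy arrays are hypothetical mean/variance vectors:

```python
# mu0, var0, mu1, var1 are hypothetical (d,) mean and variance vectors.
import numpy as np

def kl_diag_gaussians(mu0, var0, mu1, var1):
    """KL( N(mu0, diag(var0)) || N(mu1, diag(var1)) ); note the asymmetry."""
    d = mu0.shape[0]
    return 0.5 * (np.sum(var0 / var1)
                  + np.sum((mu1 - mu0) ** 2 / var1)
                  - d
                  + np.sum(np.log(var1) - np.log(var0)))
```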

Classic(!)

  • Brown et al., “Class-Based n-Gram Models of Natural Language.” [pdf] Computational Linguistics 1992

Phrase, Sentence and Document Modeling

Phrase Modeling

  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeff Dean, “Distributed Representations of Words and Phrases and their Compositionality,” pdf NIPS 2013
  • [socher’s]
  • [cutting RNN trees] Christian Scheible, Hinrich Schutze. “Cutting Recursive Autoencoder Trees” pdf CoRR abs/1301.2811 (2013)
  • [composition operators, human similarity judgments] Gershman, S. J., & Tenenbaum, J. B. “Phrase similarity in humans and machines”. pdf Proceedings of the 37th Annual Conference of the Cognitive Science Society, 2015. (See the composition-operator sketch after this list.)
  • [compositional, feature] Mo Yu, Mark Dredze. “Learning Composition Models for Phrase Embeddings”. pdf TACL 3: 227-242 (2015).
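As background for the composition-operator comparisons above (e.g. Gershman & Tenenbaum), here are the two simplest baseline operators such papers evaluate, element-wise addition and multiplication; the learned composition models in Yu & Dredze go well beyond these. A toy sketch:

```python
# `word_vecs` is a hypothetical list of numpy word vectors for one phrase.
import numpy as np

def compose_add(word_vecs):
    return np.sum(word_vecs, axis=0)      # additive composition

def compose_mult(word_vecs):
    return np.prod(word_vecs, axis=0)     # multiplicative composition

# e.g. for "red car": compose_add([vecs["red"], vecs["car"]])
```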

Sentence Modeling

CNNs: convolutional neural networks for sentence modeling

  • [convnet for sentences, dynamic, k-max pooling, stacked] Nal Kalchbrenner, Edward Grefenstette and Phil Blunsom. “A Convolutional Neural Network for Modelling Sentences” pdf ACL 2014.
  • [2D convolutional] Misha Denil, Alban Demiraj, Nal Kalchbrenner, Phil Blunsom, Nando de Freitas. “Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network” pdf in CoRR 2014.
  • [unsupervised pretraining for CNN] Wenpeng Yin and Hinrich Schutze. “Convolutional Neural Network for Paraphrase Identification.” pdf NAACL 2015
  • [convolute better with word order, parallel-CNN, different region] Rie Johnson and Tong Zhang. “Effective Use of Word Order for Text Categorization with Convolutional Neural Networks” pdf NAACL 2015
  • Karl Moritz Hermann and Phil Blunsom. “Multilingual Models for Compositional Distributed Semantics.” pdf ACL 2014
  • Karl Moritz Hermann and Phil Blunsom. “Multilingual Distributed Representations without Word Alignment.” ICLR 2014
  • Yoon Kim. “Convolutional Neural Networks for Sentence Classification.” EMNLP 2014 (see the convolution + pooling sketch after this list)
  • Quoc V. Le and Tomas Mikolov. “Distributed Representations of Sentences and Documents.” ICML 2014
  • [ARC-I, ARC-II, 2D convolutional, order preserving] Baotian Hu, Zhengdong Lu, Hang Li, et al. “Convolutional Neural Network Architectures for Matching Natural Language Sentences.” pdf NIPS 2014
  • [tree CNN + recursive, structure] Phong Le and Willem Zuidema. “The Forest Convolutional Network: Compositional Distributional Semantics with a Neural Chart and without Binarization”. pdf. EMNLP 2015.
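Most of the sentence CNNs above share one core step: convolve a bank of filters over the word-embedding matrix and max-pool over time (e.g. Kim 2014). A bare-bones numpy sketch of that step, with all shapes and the filter bank as our assumptions:

```python
# Hypothetical shapes: E is (sent_len, emb_dim) stacked word vectors,
# filters is (n_filt, width, emb_dim), b is (n_filt,) biases.
import numpy as np

def conv_max_pool(E, filters, b):
    n_filt, width, _ = filters.shape
    n_pos = E.shape[0] - width + 1                 # filter positions
    feats = np.empty((n_filt, n_pos))
    for i in range(n_pos):
        window = E[i:i + width]                    # one n-gram window
        feats[:, i] = np.tensordot(filters, window,
                                   axes=([1, 2], [0, 1])) + b
    return np.tanh(feats).max(axis=1)              # max-over-time -> (n_filt,)
```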

RNNs and their variants

  • [RNN with GRUs] Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler, “Skip-Thought Vectors” pdf NIPS 2015
  • [tree CNN + Recursive, structure] Phong Le and Willem Zuidema. “The Forest Convolutional Network: Compositional Distributional Semantics with a Neural Chart and without Binarization”. pdf. EMNLP 2015.

Other NN architectures

  • [DAN, average, simple but effective] Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Hal Daumé III, “Deep Unordered Composition Rivals Syntactic Methods for Text Classification” pdf ACL 2015
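A rough sketch of the DAN idea from the entry above, under our own simplifying assumptions (untrained placeholder weights, tanh activations): average the word embeddings, then stack a few feed-forward layers before a classifier.

```python
# Hypothetical parameters: `layers` is a list of (W, b) with W shaped
# (hidden, prev_dim); in practice these weights would be learned.
import numpy as np

def dan_forward(E, layers):
    h = E.mean(axis=0)             # unordered composition: plain averaging
    for W, b in layers:
        h = np.tanh(W @ h + b)     # the "deep" part: stacked nonlinearities
    return h                       # feed into a softmax classifier downstream
```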

Document Modeling

  • [2D convolutional] Misha Denil, Alban Demiraj, Nal Kalchbrenner, Phil Blunsom, Nando de Freitas. “Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network” pdf in CoRR 2014.
  • Karl Moritz Hermann and Phil Blunsom. “Multilingual Models for Compositional Distributed Semantics.” pdf ACL 2014
  • [deep RBM] Nitish Srivastava, Ruslan R Salakhutdinov, Geoffrey E. Hinton. “Modeling documents with a deep boltzmann machine.” pdf in Uncertainty in Artificial Intelligence, 2013
  • Chaochao Huang, Xipeng Qiu, Xuanjing Huang, “Text Classification with Document Embeddings” pdf Springer 2014
  • Quoc V. Le and Tomas Mikolov. “Distributed Representations of Sentences and Documents.” ICML 2014
  • [document-level language model] Rui Lin, Shujie Liu, Muyun Yang, et al. “Hierarchical Recurrent Neural Network for Document Modeling”. 2015. In Proceedings of EMNLP. [pdf]
  • [gated RNN, sentiment] Duyu Tang, Bing Qin, and Ting Liu. “Document Modeling with Gated Recurrent Neural Network for Sentiment Classification”. 2015. In Proceedings of EMNLP. [pdf]
  • [information flow, attention] Yangfeng Ji, Trevor Cohn, Lingpeng Kong, et al. “Document Context Language Models”. 2015. arXiv preprint. In submission to ICLR 2016. [pdf]

Neural Language Models

Neural language models

  • [neural LM] Bengio et al., “A Neural Probabilistic Language Model.” pdf Journal of Machine Learning Research 2003
  • [log-bilinear LM]
  • [discriminative LM] Brian Roark, Murat Saraclar, and Michael Collins. “Discriminative n-gram language modeling.” pdf Computer Speech and Language, 21(2):373-392. 2007
  • [survey, CNN, RNN, ReNN] Yoav Goldberg. “A Primer on Neural Network Models for Natural Language Processing”. pdf 2015

Long short-term memory (LSTMs)

  • [parsing] Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey Hinton, “Grammar as a Foreign Language” pdf arXiv 2014
  • [program] Wojciech Zaremba, Ilya Sutskever, “Learning to Execute” pdf arXiv 2014
  • [translation] Ilya Sutskever, Oriol Vinyals, Quoc Le, “Sequence to Sequence Learning with Neural Networks” pdf NIPS 2014
  • [attention-based LSTM, summarization] Alexander M. Rush, Sumit Chopra and Jason Weston, “A Neural Attention Model for Abstractive Sentence Summarization” pdf EMNLP 2015
  • [bi-LSTM, character] Wang Ling, Tiago Luis, Luis Marujo, Ramon Fernandez Astudillo, Silvio Amir, Chris Dyer, Alan W Black, Isabel Trancoso, “Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation” pdf EMNLP 2015
  • [reading gate, dialogue cell] Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Pei-Hao Su, David Vandyke, Steve Young, “Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems” pdf EMNLP 2015 Best Paper
  • [attention, stochastic, layer] Lei Jimmy Ba, Roger Grosse, Ruslan Salakhutdinov, Brendan Frey. “Learning Wake-Sleep Recurrent Attention Models”. pdf To appear in NIPS 2015.
  • [sentence vector] Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, et al. “Skip-Thought Vectors”. pdf To appear in NIPS 2015.
  • [state embedding, character] Miguel Ballesteros, Chris Dyer and Noah A. Smith, “Improved Transition-Based Parsing by Modeling Characters instead of Words with LSTMs” pdf EMNLP 2015
  • [no stacked, highway networks, character, CNN with LSTM] Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush “Character-Aware Neural Language Models” pdf arXiv pre-print 2015
  • [document-level language model] Rui Lin, Shujie Liu, Muyun Yang, et al. “Hierarchical Recurrent Neural Network for Document Modeling”. 2015. In Proceedings of EMNLP. [pdf]
  • [information flow, attention] Yangfeng Ji, Trevor Cohn, Lingpeng Kong, et al. “Document Context Language Models”. 2015. arXiv preprint. In submission to ICLR 2016. [pdf]
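For readers new to LSTMs, here is a single cell step in numpy, following the standard formulation (input/forget/output gates plus a candidate cell update) that the sequence models above build on; the stacked-parameter layout is our convention, not any one paper’s:

```python
# Hypothetical shapes: x is (D,), h_prev and c_prev are (H,),
# W is (4H, D), U is (4H, H), b is (4H,), gates stacked as [i, f, o, g].
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i, f, o = (sigmoid(z[k * H:(k + 1) * H]) for k in range(3))
    g = np.tanh(z[3 * H:])         # candidate cell update
    c = f * c_prev + i * g         # forget old memory, write new
    h = o * np.tanh(c)             # gated hidden state
    return h, c
```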

CNNs: convolutional neural networks for language

  • [convoluting from character-level to doc-level] Xiang Zhang, Yann LeCun. “Text Understanding from Scratch” pdf arXiv pre-print 2015
  • [character LM for doc-level] Peng, F., Schuurmans, D., Keselj, V. and Wang, S. “Language independent authorship attribution using character level language models.” pdf EACL 2004.
  • [convnet for sentences, dynamic, k-max pooling, stacked] Nal Kalchbrenner, Edward Grefenstette and Phil Blunsom. “A Convolutional Neural Network for Modelling Sentences” pdf ACL 2014.
  • [unsupervised pretraining for CNN] Wenpeng Yin and Hinrich Schutze. “Convolutional Neural Network for Paraphrase Identification.” pdf NAACL 2015
  • [convolute better with word order, parallel-CNN, different region] Rie Johnson and Tong Zhang. “Effective Use of Word Order for Text Categorization with Convolutional Neural Networks” pdf NAACL 2015
  • [character, ConvNet, data augmentation] Xiang Zhang, Junbo Zhao, Yann LeCun, “Character-level Convolutional Networks for Text Classification” pdf NIPS 2015
  • [no stacked, highway networks, character, CNN with LSTM] Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush “Character-Aware Neural Language Models” pdf arXiv pre-print
  • [tree CNN + recursive, structure] Phong Le and Willem Zuidema. “The Forest Convolutional Network: Compositional Distributional Semantics with a Neural Chart and without Binarization”. pdf. EMNLP 2015.

QA with commonsense reasoning

  • [nlp for AI] Jason Weston, Antoine Bordes, Sumit Chopra, Tomas Mikolov. “Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks” pdf 2015
  • [memory networks] Jason Weston, Sumit Chopra, Antoine Bordes “Memory Networks” pdf ICLR 2015
  • [winograd schema] Hector J. Levesque. “The Winograd Schema Challenge” pdf AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning 2011
  • [textual entailment] Ion Androutsopoulos, Prodromos Malakasiotis “A Survey of Paraphrasing and Textual Entailment Methods” pdf Journal of Artificial Intelligence Research 38 (2010) 135-187
  • [hypothesis entailment] Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, et al. “Reasoning about Entailment with Neural Attention” pdf arXiv preprint 2015
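To give a flavor of the memory-networks entry above, here is a toy sketch of the soft attention read step used in the end-to-end variant of that architecture: score each stored memory against the question, softmax, and return the weighted sum. This is only the differentiable lookup, not the full model, and all shapes are assumptions:

```python
# Hypothetical shapes: question is (d,), memories is (n_mem, d).
import numpy as np

def attend(question, memories):
    scores = memories @ question              # dot-product match scores
    w = np.exp(scores - scores.max())
    w /= w.sum()                              # softmax over memories
    return w @ memories                       # weighted read vector, (d,)
```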

Compositional

  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeff Dean, “Distributed Representations of Words and Phrases and their Compositionality,” pdf NIPS 2013
  • [socher’s]
  • [cutting RNN trees] Christian Scheible, Hinrich Schutze. “Cutting Recursive Autoencoder Trees” pdf CoRR abs/1301.2811 (2013)
  • [dimension, interpretable] Alona Fyshe, Leila Wehbe, Partha Talukdar, et al. “A Compositional and Interpretable Semantic Space”. pdf NAACL 2015.
  • [tree CNN + recursive, structure] Phong Le and Willem Zuidema. “The Forest Convolutional Network: Compositional Distributional Semantics with a Neural Chart and without Binarization”. pdf. EMNLP 2015.
  • [syntax] Jianpeng Cheng and Dimitri Kartsaklis. “Syntax-Aware Multi-Sense Word Embeddings for Deep Compositional Models of Meaning”. pdf EMNLP 2015, Lisbon, Portugal, September 2015.
  • [noncompositional, detection] Majid Yazdani, Meghdad Farahmand and James Henderson. “Learning Semantic Composition to Detect Non-compositionality of Multiword Expressions”. pdf. EMNLP 2015.

Multi-modal Deep Learning

Image Captioning

  • [granularity] Peter Young, Alice Lai, Micah Hodosh, and Julia Hockenmaier. “From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions”. TACL 2014. [pdf]
  • [visual-semantic hierarchy] Ivan Vendrov, Ryan Kiros, Sanja Fidler, and Raquel Urtasun. “Order-Embeddings Of Images And Language”. In submission to ICLR 2016. [pdf]
  • [video clip, description] Yukun Zhu, Ryan Kiros, Richard Zemel, et al. “Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books”. ICCV 2015. [pdf] [project]
  • [intermediate multimodal layer, generalization] Lisa Anne Hendricks, Subhashini Venugopalan, Marcus Rohrbach, et al. “Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data”. arXiv preprint 2015. [pdf]
  • Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, et al. “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”. ICML 2015. [pdf] [project]

Image Generation

  • [extended VAE] Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, and Ruslan Salakhutdinov. “Generating Images From Captions With Attention”. In submission to ICLR 2016. [pdf]

Image Questioning

  • [color, object, one-word answer] Mengye Ren, Ryan Kiros, and Richard Zemel. “Exploring Models and Data for Image Question Answering”. NIPS 2015. [pdf]

Video, Frame, Action

  • [video clip, description] Yukun Zhu, Ryan Kiros, Richard Zemel, et al. “Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books”. ICCV 2015. [pdf] [project]