Publications of Deep Learning in NLP

This page collects publications and related resources on Deep Learning in NLP. Last updated on 16/07/2015.

It is inspired by the great resource list for CSE 599 - Advanced in NLP.

Feel free to open a pull request on GitHub.

Survey

Bengio et al.’s survey on representation learning

  • Yoshua Bengio, Aaron Courville and Pascal Vincent. “Representation Learning: A Review and New Perspectives.” pdf TPAMI 35(8):1798–1828, 2013

LeCun, Bengio and Hinton’s survey in Nature

  • Yann LeCun, Yoshua Bengio and Geoffrey Hinton. “Deep Learning.” pdf Nature 521, 436–444 (2015)
  • [survey, CNN, RNN, ReNN] Yoav Goldberg. “A Primer on Neural Network Models for Natural Language Processing”. pdf 2015

Embeddings & Language Models

Skip-gram embeddings

  • Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. “Efficient Estimation of Word Representations in Vector Space.” pdf ICLR, 2013.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. “Distributed Representations of Words and Phrases and their Compositionality.” pdf NIPS, 2013.
  • [king-man+woman=queen] Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. “Linguistic Regularities in Continuous Space Word Representations.” pdf NAACL, 2013. (See the vector-arithmetic sketch after this list.)
  • [technical note] Yoav Goldberg and Omer Levy. “word2vec explained: deriving Mikolov et al.’s negative-sampling word-embedding method” pdf Tech-report 2013
  • [buzz-busting] Omer Levy and Yoav Goldberg. “Linguistic Regularities in Sparse and Explicit Word Representations” pdf CoNLL 2014, Best Paper Award
  • [lessons learned] Omer Levy, Yoav Goldberg, Ido Dagan. “Improving Distributional Similarity with Lessons Learned from Word Embeddings” pdf, TACL 2015
  • [syntax-word order] Wang Ling, Chris Dyer, Alan Black, Isabel Trancoso. “Two/Too Simple Adaptations of Word2Vec for Syntax Problems” pdf NAACL 2015 (Short)
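As a concrete illustration of the analogy evaluation in the king-man+woman=queen paper above, here is a minimal numpy sketch (ours, not the authors’ code); `vecs`, a dict from words to pre-trained embedding vectors, is a hypothetical placeholder:

```python
# Hypothetical setup: `vecs` maps words to pre-trained numpy vectors.
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(a, b, c, vecs, topn=1):
    """Answer "a is to b as c is to ?" by ranking cos(x, b - a + c)."""
    target = vecs[b] - vecs[a] + vecs[c]
    scored = [(w, cosine(vec, target)) for w, vec in vecs.items()
              if w not in (a, b, c)]        # exclude the query words
    return sorted(scored, key=lambda t: -t[1])[:topn]

# analogy("man", "king", "woman", vecs)  ->  ideally [("queen", ...)]
```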

Embedding enhancement: Syntax, Retrofitting, etc.

  • [dependency embeddings] Omer Levy and Yoav Goldberg “Dependency Based Word Embeddings” pdf ACL 2014 (Short)
  • [dependency embeddings] Mohit Bansal, Kevin Gimpel and Karen Livescu. “Tailoring Continuous Word Representations for Dependency Parsing” pdf ACL 2014 (Short)
  • [retrofitting with lexical knowledge] Manaal Faruqui, Jesse Dodge, Sujay Kumar Jauhar, Chris Dyer, Eduard Hovy and Noah A. Smith. “Retrofitting Word Vectors to Semantic Lexicons” pdf, NAACL 2015
  • [contrastive estimation] Mnih and Kavukcuoglu, “Learning Word Embeddings Efficiently with Noise-Contrastive Estimation.” pdf NIPS 2013
  • [embedding documents] Quoc V Le, Tomas Mikolov. “Distributed representations of sentences and documents” pdf ICML 2014
  • [synonymy relations] Mo Yu, Mark Dredze. “Improving Lexical Embeddings with Semantic Knowledge” pdf ACL 2014 (Short)
  • [embedding relations] Asli Celikyilmaz, Dilek Hakkani-Tur, Panupong Pasupat, Ruhi Sarikaya. “Enriching Word Embeddings Using Knowledge Graph for Semantic Tagging in Conversational Dialog Systems” pdf AAAI 2015 (Short)
  • [multimodal] Angeliki Lazaridou, Nghia The Pham and Marco Baroni. “Combining Language and Vision with a Multimodal Skip-gram Model” pdf NAACL 2015
  • [syntax-word order] Wang Ling, Chris Dyer, Alan Black, Isabel Trancoso. “Two/Too Simple Adaptations of Word2Vec for Syntax Problems” pdf NAACL 2015 (Short)
  • [autoencoder, lexeme, lexical resource, synset] Sascha Rothe and Hinrich Schutze, “AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes” pdf ACL 2015 Best Paper
  • [lexical resource, babelnet] Ignacio Iacobacci, Mohammad Taher Pilehvar and Roberto Navigli, “SensEmbed: Learning Sense Embeddings for Word and Relational Similarity” pdf ACL 2015
  • [specific linguistic relation] Zhigang Chen, Wei Lin, Qian Chen, Xiaoping Chen, Si Wei, Hui Jiang and Xiaodan Zhu, “Revisiting Word Embedding for Contrasting Meaning” pdf ACL 2015
  • [syntax] Jianpeng Cheng and Dimitri Kartsaklis. “Syntax-Aware Multi-Sense Word Embeddings for Deep Compositional Models of Meaning”. pdf EMNLP 2015, Lisbon, Portugal, September 2015.

Embedding enhancement: Word order, Morphology, etc.

  • [syntax-word order] Wang Ling, Chris Dyer, Alan Black, Isabel Trancoso. “Two/Too Simple Adaptations of Word2Vec for Syntax Problems” pdf NAACL 2015 (Short)
  • [word order] Rie Johnson and Tong Zhang. “Effective Use of Word Order for Text Categorization with Convolutional Neural Networks” pdf NAACL 2015
  • [morphology] Radu Soricut and Franz Och. “Unsupervised Morphology Induction Using Word Embeddings” pdf NAACL 2015 Best Paper Award
  • [morphology] Minh-Thang Luong, Richard Socher, Christopher D. Manning. “Better Word Representations with Recursive Neural Networks for Morphology” pdf CoNLL 2013
  • [morpheme] Siyu Qiu, Qing Cui, Jiang Bian, Bin Gao, Tie-Yan Liu. “Co-learning of Word Representations and Morpheme Representations” pdf COLING 2014
  • [morphological] Ryan Cotterell and Hinrich Schütze. “Morphological Word-Embeddings” pdf NAACL 2015 (Short)
  • [regularization] Dani Yogatama, Manaal Faruqui, Chris Dyer, Noah Smith. “Learning Word Representations with Hierarchical Sparse Coding” pdf ICML 2015
  • [character, word order, based on word2vec] Andrew Trask, David Gilmore, Matthew Russell, “Modeling Order in Neural Word Embeddings at Scale” pdf ICML 2015

Embeddings as matrix factorization

  • [approximate interpretation] Levy and Goldberg, “Neural Word Embedding as Implicit Matrix Factorization.” pdf NIPS 2014 (see the sketch after this list)
  • Omer Levy, Steffen Remus, Chris Biemann, and Ido Dagan. “Do Supervised Distributional Methods Really Learn Lexical Inference Relations?” pdf NAACL 2015 (Short)
  • Tim Rocktaschel, Sameer Singh and Sebastian Riedel. “Injecting Logical Background Knowledge into Embeddings for Relation Extraction” pdf NAACL 2015
  • [exact interpretation] Yitan Li, Linli Xu, Fei Tian, Liang Jiang, Xiaowei Zhong and Enhong Chen. “Word Embedding Revisited: A New Representation Learning and Explicit Matrix Factorization Perspective” pdf IJCAI 2015
  • [SVD, framework, scaling] Karl Stratos, Michael Collins, and Daniel Hsu. “Model-based Word Embeddings from Decompositions of Count Matrices”. pdf ACL 2015.
  • [MF, SVD] Omer Levy, Yoav Goldberg, and Ido Dagan. “Improving Distributional Similarity with Lessons Learned from Word Embeddings”. pdf TACL 2015.
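As a rough, self-contained illustration of the Levy & Goldberg observation above (skip-gram with k negative samples implicitly factorizes a PMI matrix shifted by log k), the sketch below builds the shifted positive PMI matrix explicitly and factorizes it with SVD. The toy corpus handling and hyperparameters are our assumptions, not the paper’s experimental setup:

```python
# A toy reconstruction of the implicit factorization: build the shifted
# positive PMI (SPPMI) matrix from a token list and factorize it with SVD.
# `tokens`, the window size, and k are illustrative assumptions.
import numpy as np
from collections import Counter

def sppmi_svd_embeddings(tokens, window=2, k=5, dim=50):
    vocab = sorted(set(tokens))
    idx = {w: i for i, w in enumerate(vocab)}
    pair_counts = Counter()
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pair_counts[(idx[w], idx[tokens[j]])] += 1
    M = np.zeros((len(vocab), len(vocab)))
    for (wi, ci), n in pair_counts.items():
        M[wi, ci] = n
    total = M.sum()
    pw = M.sum(axis=1, keepdims=True) / total      # word marginals
    pc = M.sum(axis=0, keepdims=True) / total      # context marginals
    with np.errstate(divide="ignore"):
        pmi = np.log((M / total) / (pw * pc))      # -inf where M == 0
    sppmi = np.maximum(pmi - np.log(k), 0.0)       # shift by log k, clip at 0
    U, S, _ = np.linalg.svd(sppmi)
    d = min(dim, len(vocab))
    return vocab, U[:, :d] * np.sqrt(S[:d])        # symmetric SVD embedding
```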

Embeddings obtained from other methods

  • [noise-contrastive estimation] Andriy Mnih and Koray Kavukcuoglu, “Learning word embeddings efficiently with noise-contrastive estimation” pdf NIPS 2013
  • [logarithm of word-word co-occurrences] Jeffrey Pennington, Richard Socher, and Christopher D. Manning, “GloVe: Global Vectors for Word Representation” pdf EMNLP 2014 (see the sketch after this list)
  • [explicitly encode co-occurrences] Omer Levy and Yoav Goldberg, “Linguistic regularities in sparse and explicit word representations.” pdf CoNLL 2014.
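For reference, a compact sketch of the GloVe objective cited above: a weighted least-squares fit of word and context vectors (plus biases) to the log co-occurrence count. Real GloVe iterates only over nonzero entries with AdaGrad; this dense, full-batch version is purely illustrative, and all parameter shapes are assumptions:

```python
# Illustrative (hypothetical shapes): W, C are (V, d) word/context vectors,
# b, c are (V,) biases, X is a (V, V) co-occurrence count matrix.
import numpy as np

def glove_loss_and_grads(W, C, b, c, X, x_max=100.0, alpha=0.75):
    f = np.minimum((X / x_max) ** alpha, 1.0)        # weighting; 0 where X == 0
    logX = np.log(np.maximum(X, 1e-12))              # safe log; masked by f anyway
    err = W @ C.T + b[:, None] + c[None, :] - logX   # model vs. log count
    loss = np.sum(f * err ** 2)
    fe = f * err
    return loss, (2 * fe @ C,          # grad wrt W
                  2 * fe.T @ W,        # grad wrt C
                  2 * fe.sum(axis=1),  # grad wrt b
                  2 * fe.sum(axis=0))  # grad wrt c
```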

Why and when embeddings are better

  • [comparison between pretrained embeddings] Yanqing Chen, Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. “The expressive power of word embeddings” pdf ICML 2013
  • [how embeddings are trained matters] Felix Hill, KyungHyun Cho, Sebastien Jean, et al., “Not all neural embeddings are born equal” pdf NIPS Workshop 2014
  • [multichannel as multi-embeddings input] Wenpeng Yin, Hinrich Schütze. “MultiGranCNN: An Architecture for General Matching of Text Chunks on Multiple Levels of Granularity” ACL 2015
  • [dimension, corpus, compare] Siwei Lai, Kang Liu, Liheng Xu, Jun Zhao, “How to Generate a Good Word Embedding?” pdf arXiv pre-print

Word Representations via Distribution Embedding

  • Katrin Erk, “Representing Words As Regions in Vector Space”. pdf In Proceedings of the Thirteenth Conference on Computational Natural Language Learning, Boulder, Colorado, 2009.
  • [SVD, framework, scaling] Karl Stratos, Michael Collins, and Daniel Hsu. “Model-based Word Embeddings from Decompositions of Count Matrices”. pdf ACL 2015.
  • [MF, SVD] Omer Levy, Yoav Goldberg, and Ido Dagan. “Improving Distributional Similarity with Lessons Learned from Word Embeddings”. pdf TACL 2015.
  • [random walks, generative model] Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski. “Random walks on discourse spaces: a new generative language model with applications to semantic word embeddings” pdf. In CoRR, 2015.
  • [breadth, asymmetric] Luke Vilnis, Andrew McCallum. “Word Representations via Gaussian Embedding”. pdf. In ICLR, 2015. (See the KL sketch after this list.)
  • [markov, generative, MF] Tatsunori B. Hashimoto, David Alvarez-Melis, Tommi S. Jaakkola. “Word, graph, and manifold embedding from Markov processes”. pdf. arXiv preprint 2015.
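To make the “breadth, asymmetric” idea in the Vilnis & McCallum entry above concrete, here is a small sketch of the KL divergence between two diagonal Gaussians, the asymmetric score their model uses alongside an expected-likelihood kernel; the numpy arrays are hypothetical mean/variance vectors:

```python
# mu0, var0, mu1, var1 are hypothetical (d,) mean and variance vectors.
import numpy as np

def kl_diag_gaussians(mu0, var0, mu1, var1):
    """KL( N(mu0, diag(var0)) || N(mu1, diag(var1)) ); note the asymmetry."""
    d = mu0.shape[0]
    return 0.5 * (np.sum(var0 / var1)
                  + np.sum((mu1 - mu0) ** 2 / var1)
                  - d
                  + np.sum(np.log(var1) - np.log(var0)))
```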

Classic(!)

  • Brown et al., “Class-Based n-Gram Models of Natural Language.” [pdf] Computational Linguistics 1992

Phrase, Sentence and Document Modeling

Phrase Modeling

  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeff Dean, “Distributed Representations of Words and Phrases and their Compositionality,” pdf NIPS 2013
  • [socher’s]
  • [cutting RNN trees] Christian Scheible, Hinrich Schutze. “Cutting Recursive Autoencoder Trees” pdf CoRR abs/1301.2811 (2013)
  • [composition operators, human similarity judgments] Gershman, S. J., & Tenenbaum, J. B. “Phrase similarity in humans and machines”. pdf Proceedings of the 37th Annual Conference of the Cognitive Science Society, 2015. (See the composition-operator sketch after this list.)
  • [compositional, feature] Mo Yu, Mark Dredze. “Learning Composition Models for Phrase Embeddings”. pdf TACL 3: 227-242 (2015).
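As background for the composition-operator comparisons above (e.g. Gershman & Tenenbaum), here are the two simplest baseline operators such papers evaluate, element-wise addition and multiplication; the learned composition models in Yu & Dredze go well beyond these. A toy sketch:

```python
# `word_vecs` is a hypothetical list of numpy word vectors for one phrase.
import numpy as np

def compose_add(word_vecs):
    return np.sum(word_vecs, axis=0)      # additive composition

def compose_mult(word_vecs):
    return np.prod(word_vecs, axis=0)     # multiplicative composition

# e.g. for "red car": compose_add([vecs["red"], vecs["car"]])
```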

Sentence Modeling

CNNs: convolutional neural networks for sentence modeling

  • [convnet for sentences, dynamic, k-max pooling, stacked] Nal Kalchbrenner, Edward Grefenstette and Phil Blunsom. “A Convolutional Neural Network for Modelling Sentences” pdf ACL 2014.
  • [2D convolutional] Misha Denil, Alban Demiraj, Nal Kalchbrenner, Phil Blunsom, Nando de Freitas. “Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network” pdf in CoRR 2014.
  • [unsupervised pretraining for CNN] Wenpeng Yin and Hinrich Schutze. “Convolutional Neural Network for Paraphrase Identification.” pdf NAACL 2015
  • [convolute better with word order, parallel-CNN, different region] Rie Johnson and Tong Zhang. “Effective Use of Word Order for Text Categorization with Convolutional Neural Networks” pdf NAACL 2015
  • Karl Moritz Hermann and Phil Blunsom. “Multilingual Models for Compositional Distributed Semantics.” pdf ACL 2014
  • Karl Moritz Hermann and Phil Blunsom. “Multilingual Distributed Representations without Word Alignment.” ICLR 2014
  • Yoon Kim. “Convolutional Neural Networks for Sentence Classification.” EMNLP 2014 (see the convolution + pooling sketch after this list)
  • Quoc V. Le and Tomas Mikolov. “Distributed Representations of Sentences and Documents.” ICML 2014
  • [ARC-I, ARC-II, 2D convolutional, order preserving] Baotian Hu, Zhengdong Lu, Hang Li, et al. “Convolutional Neural Network Architectures for Matching Natural Language Sentences.” pdf NIPS 2014
  • [tree CNN + recursive, structure] Phong Le and Willem Zuidema. “The Forest Convolutional Network: Compositional Distributional Semantics with a Neural Chart and without Binarization”. pdf. EMNLP 2015.
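Most of the sentence CNNs above share one core step: convolve a bank of filters over the word-embedding matrix and max-pool over time (e.g. Kim 2014). A bare-bones numpy sketch of that step, with all shapes and the filter bank as our assumptions:

```python
# Hypothetical shapes: E is (sent_len, emb_dim) stacked word vectors,
# filters is (n_filt, width, emb_dim), b is (n_filt,) biases.
import numpy as np

def conv_max_pool(E, filters, b):
    n_filt, width, _ = filters.shape
    n_pos = E.shape[0] - width + 1                 # filter positions
    feats = np.empty((n_filt, n_pos))
    for i in range(n_pos):
        window = E[i:i + width]                    # one n-gram window
        feats[:, i] = np.tensordot(filters, window,
                                   axes=([1, 2], [0, 1])) + b
    return np.tanh(feats).max(axis=1)              # max-over-time -> (n_filt,)
```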

RNNs and their variants

  • [RNN with GRUs] Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler, “Skip-Thought Vectors” pdf NIPS 2015
  • [tree CNN + Recursive, structure] Phong Le and Willem Zuidema. “The Forest Convolutional Network: Compositional Distributional Semantics with a Neural Chart and without Binarization”. pdf. EMNLP 2015.

Other NN architectures

  • [DAN, average, simple but effective] Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Hal Daumé III, “Deep Unordered Composition Rivals Syntactic Methods for Text Classification” pdf ACL 2015
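A rough sketch of the DAN idea from the entry above, under our own simplifying assumptions (untrained placeholder weights, tanh activations): average the word embeddings, then stack a few feed-forward layers before a classifier.

```python
# Hypothetical parameters: `layers` is a list of (W, b) with W shaped
# (hidden, prev_dim); in practice these weights would be learned.
import numpy as np

def dan_forward(E, layers):
    h = E.mean(axis=0)             # unordered composition: plain averaging
    for W, b in layers:
        h = np.tanh(W @ h + b)     # the "deep" part: stacked nonlinearities
    return h                       # feed into a softmax classifier downstream
```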

Document Modeling

  • [2D convolutional] Misha Denil, Alban Demiraj, Nal Kalchbrenner, Phil Blunsom, Nando de Freitas. “Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network” pdf in CoRR 2014.
  • Karl Moritz Hermann and Phil Blunsom. “Multilingual Models for Compositional Distributed Semantics.” pdf ACL 2014
  • [deep RBM] Nitish Srivastava, Ruslan R Salakhutdinov, Geoffrey E. Hinton. “Modeling documents with a deep boltzmann machine.” pdf in Uncertainty in Artificial Intelligence, 2013
  • Chaochao Huang, Xipeng Qiu, Xuanjing Huang, “Text Classification with Document Embeddings” pdf Springer 2014
  • Quoc V. Le and Tomas Mikolov. “Distributed Representations of Sentences and Documents.” ICML 2014
  • [document-level language model] Rui Lin, Shujie Liu, Muyun Yang, et al. “Hierarchical Recurrent Neural Network for Document Modeling”. 2015. In Proceedings of EMNLP. [pdf]
  • [gated RNN, sentiment] Duyu Tang, Bing Qin, and Ting Liu. “Document Modeling with Gated Recurrent Neural Network for Sentiment Classification”. 2015. In Proceedings of EMNLP. [pdf]
  • [information flow, attention] Yangfeng Ji, Trevor Cohn, Lingpeng Kong, et al. “Document Context Language Models”. 2015. arXiv preprint. In submission to ICLR 2016. [pdf]

Neural Language Models

Neural language models

  • [neural LM] Bengio et al., “A Neural Probabilistic Language Model.” pdf Journal of Machine Learning Research 2003
  • [log-bilinear LM]
  • [discriminative LM] Brian Roark, Murat Saraclar, and Michael Collins. “Discriminative n-gram language modeling.” pdf Computer Speech and Language, 21(2):373-392. 2007
  • [survey, CNN, RNN, ReNN] Yoav Goldberg. “A Primer on Neural Network Models for Natural Language Processing”. pdf 2015

Long short-term memory (LSTMs)

  • [parsing] Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey Hinton, “Grammar as a Foreign Language” pdf arXiv 2014
  • [program] Wojciech Zaremba, Ilya Sutskever, “Learning to Execute” pdf arXiv 2014
  • [translation] Ilya Sutskever, Oriol Vinyals, Quoc Le, “Sequence to Sequence Learning with Neural Networks” pdf NIPS 2014
  • [attention-based LSTM, summarization] Alexander M. Rush, Sumit Chopra and Jason Weston, “A Neural Attention Model for Abstractive Sentence Summarization” pdf EMNLP 2015
  • [bi-LSTM, character] Wang Ling, Tiago Luis, Luis Marujo, Ramon Fernandez Astudillo, Silvio Amir, Chris Dyer, Alan W Black, Isabel Trancoso, “Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation” pdf EMNLP 2015
  • [reading gate, dialogue cell] Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Pei-Hao Su, David Vandyke, Steve Young, “Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems” pdf EMNLP 2015 Best Paper
  • [attention, stochastic, layer] Lei Jimmy Ba, Roger Grosse, Ruslan Salakhutdinov, Brendan Frey. “Learning Wake-Sleep Recurrent Attention Models”. pdf To appear in NIPS 2015.
  • [sentence vector] Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, et al. “Skip-Thought Vectors”. pdf To appear in NIPS 2015.
  • [state embedding, character] Miguel Ballesteros, Chris Dyer and Noah A. Smith, “Improved Transition-Based Parsing by Modeling Characters instead of Words with LSTMs” pdf EMNLP 2015
  • [no stacked, highway networks, character, CNN with LSTM] Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush “Character-Aware Neural Language Models” pdf arXiv pre-print 2015
  • [document-level language model] Rui Lin, Shujie Liu, Muyun Yang, et al. “Hierarchical Recurrent Neural Network for Document Modeling”. 2015. In Proceedings of EMNLP. [pdf]
  • [information flow, attention] Yangfeng Ji, Trevor Cohn, Lingpeng Kong, et al. “Document Context Language Models”. 2015. arXiv preprint. In submission to ICLR 2016. [pdf]
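For readers new to LSTMs, here is a single cell step in numpy, following the standard formulation (input/forget/output gates plus a candidate cell update) that the sequence models above build on; the stacked-parameter layout is our convention, not any one paper’s:

```python
# Hypothetical shapes: x is (D,), h_prev and c_prev are (H,),
# W is (4H, D), U is (4H, H), b is (4H,), gates stacked as [i, f, o, g].
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i, f, o = (sigmoid(z[k * H:(k + 1) * H]) for k in range(3))
    g = np.tanh(z[3 * H:])         # candidate cell update
    c = f * c_prev + i * g         # forget old memory, write new
    h = o * np.tanh(c)             # gated hidden state
    return h, c
```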

CNNs: convolutional neural networks for language

  • [convoluting from character-level to doc-level] Xiang Zhang, Yann LeCun. “Text Understanding from Scratch” pdf arXiv pre-print 2015
  • [character LM for doc-level] Peng, F., Schuurmans, D., Keselj, V. and Wang, S. “Language independent authorship attribution using character level language models.” pdf EACL 2004.
  • [convnet for sentences, dynamic, k-max pooling, stacked] Nal Kalchbrenner, Edward Grefenstette and Phil Blunsom. “A Convolutional Neural Network for Modelling Sentences” pdf ACL 2014.
  • [unsupervised pretraining for CNN] Wenpeng Yin and Hinrich Schutze. “Convolutional Neural Network for Paraphrase Identification.” pdf NAACL 2015
  • [convolute better with word order, parallel-CNN, different region] Rie Johnson and Tong Zhang. “Effective Use of Word Order for Text Categorization with Convolutional Neural Networks” pdf NAACL 2015
  • [character, ConvNet, data augmentation] Xiang Zhang, Junbo Zhao, Yann LeCun, “Character-level Convolutional Networks for Text Classification” pdf NIPS 2015
  • [no stacked, highway networks, character, CNN with LSTM] Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush “Character-Aware Neural Language Models” pdf arXiv pre-print
  • [tree CNN + recursive, structure] Phong Le and Willem Zuidema. “The Forest Convolutional Network: Compositional Distributional Semantics with a Neural Chart and without Binarization”. pdf. EMNLP 2015.

QA with commonsense reasoning

  • [nlp for AI] Jason Weston, Antoine Bordes, Sumit Chopra, Tomas Mikolov. “Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks” pdf 2015
  • [memory networks] Jason Weston, Sumit Chopra, Antoine Bordes “Memory Networks” pdf ICLR 2015
  • [winograd schema] Hector J. Levesque. “The Winograd Schema Challenge” pdf AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning 2011
  • [textual entailment] Ion Androutsopoulos, Prodromos Malakasiotis “A Survey of Paraphrasing and Textual Entailment Methods” pdf Journal of Artificial Intelligence Research 38 (2010) 135-187
  • [hypothesis entailment] Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, et al. “Reasoning about Entailment with Neural Attention” pdf arXiv preprint 2015
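To give a flavor of the memory-networks entry above, here is a toy sketch of the soft attention read step used in the end-to-end variant of that architecture: score each stored memory against the question, softmax, and return the weighted sum. This is only the differentiable lookup, not the full model, and all shapes are assumptions:

```python
# Hypothetical shapes: question is (d,), memories is (n_mem, d).
import numpy as np

def attend(question, memories):
    scores = memories @ question              # dot-product match scores
    w = np.exp(scores - scores.max())
    w /= w.sum()                              # softmax over memories
    return w @ memories                       # weighted read vector, (d,)
```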

Compositional

  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeff Dean, “Distributed Representations of Words and Phrases and their Compositionality,” pdf NIPS 2013
  • [socher’s]
  • [cutting RNN trees] Christian Scheible, Hinrich Schutze. “Cutting Recursive Autoencoder Trees” pdf CoRR abs/1301.2811 (2013)
  • [dimension, interpretable] Alona Fyshe, Leila Wehbe, Partha Talukdar, et al. “A Compositional and Interpretable Semantic Space”. pdf NAACL 2015.
  • [tree CNN + recursive, structure] Phong Le and Willem Zuidema. “The Forest Convolutional Network: Compositional Distributional Semantics with a Neural Chart and without Binarization”. pdf. EMNLP 2015.
  • [syntax] Jianpeng Cheng and Dimitri Kartsaklis. “Syntax-Aware Multi-Sense Word Embeddings for Deep Compositional Models of Meaning”. pdf EMNLP 2015, Lisbon, Portugal, September 2015.
  • [noncompositional, detection] Majid Yazdani, Meghdad Farahmand and James Henderson. “Learning Semantic Composition to Detect Non-compositionality of Multiword Expressions”. pdf. EMNLP 2015.

Multi-modal Deep Learning

Image Captioning

  • [granularity] Peter Young, Alice Lai, Micah Hodosh, and Julia Hockenmaier. “From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions”. TACL 2014. [pdf]
  • [visual-semantic hierarchy] Ivan Vendrov, Ryan Kiros, Sanja Fidler, and Raquel Urtasun. “Order-Embeddings Of Images And Language”. In submission to ICLR 2016. [pdf]
  • [video clip, description] Yukun Zhu, Ryan Kiros, Richard Zemel, et al. “Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books”. ICCV 2015. [pdf] [project]
  • [intermediate multimodal layer, generalization] Lisa Anne Hendricks, Subhashini Venugopalan, Marcus Rohrbach, et al. “Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data”. arXiv preprint 2015. [pdf]
  • Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, et al. “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”. ICML 2015. [pdf] [project]

Image Generation

  • [extended VAE] Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, and Ruslan Salakhutdinov. “Generating Images From Captions With Attention”. In submission to ICLR 2016. [pdf]

Image Questioning

  • [color, object, one-word answer] Mengye Ren, Ryan Kiros, and Richard Zemel. “Exploring Models and Data for Image Question Answering”. NIPS 2015. [pdf]

Video, Frame, Action

  • [video clip, description] Yukun Zhu, Ryan Kiros, Richard Zemel, et al. “Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books”. ICCV 2015. [pdf] [project]