Recently, a piece of breakthrough news spread across social networks.

Part II of my notes and thoughts on the NIPS 2015 Deep Learning Symposium. The following papers are covered in this post:

This post introduces my notes and thoughts on the NIPS 2015 Deep Learning Symposium. Due to its length, it is split into two posts; click here for the second part. The following papers are covered in this post:

This post introduces my highlights from the ICLR 2016 submissions. They are listed below:

The LSTM architecture, with its gate mechanism, was originally designed to tackle the “gradient vanishing” problem, a major weakness of standard RNNs: error gradients vanish exponentially quickly with the size of the time lag between important events, as first analyzed in 1991 [1][2]. With LSTM forget gates, however, when error values are back-propagated from the output, the gradient through the cell does not vanish as long as the forget gate is open, i.e. its activation is close to 1.0. Since the forget gate activation is never larger than 1.0, the gradient cannot explode either. Thus, the LSTM avoids the “gradient vanishing” problem by keeping the contents of the cell unchanged over many time steps (a brief sketch of this argument follows the references below). Nevertheless, similar problems with information flow remain in the LSTM, and in this post I will introduce some work that addresses them.

  1. S. Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Institut f. Informatik, Technische Univ. Munich, 1991. 

  2. S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In S. C. Kremer and J. F. Kolen, editors, A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press, 2001. 
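For concreteness, here is a minimal sketch of that argument, considering only the direct path through the cell state and using illustrative notation (not taken from any particular paper):

```latex
% Standard LSTM cell update (illustrative notation):
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
% Along the direct path (treating the gates as constants w.r.t. c_{t-1}):
\frac{\partial c_t}{\partial c_{t-1}} = \mathrm{diag}(f_t)
% Back-propagating over k steps therefore scales the error by
\prod_{\tau = t-k+1}^{t} \mathrm{diag}(f_\tau)
% which stays near the identity when f_\tau \approx 1 (no vanishing)
% and never exceeds 1 elementwise, since 0 < f_\tau < 1 (no explosion).
```

This is the “constant error carousel” view: at each step the forget gate directly scales how much of the past cell state, and hence of the gradient, is carried forward.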

At the recently held ACML 2015, Professor Ruslan Salakhutdinov was invited to give a talk on “Multi-modal Deep Learning”. In this post, I will follow some work by Professor Salakhutdinov and his students to take a first look at multi-modal deep learning.

In the previous post, I briefly introduced a list of papers applying attention-based models to natural language processing. Though slightly different from one another, they are all soft alignment models. However, there actually exist two classes of alignment models: soft ones and hard ones. In fact, soft and hard alignment models appeared together in computer vision in late 2014 [1]. In that work, the authors explore and compare two variants of their model: a deterministic version trainable with standard backpropagation (the soft alignment model) and a stochastic variant trainable by maximizing a variational lower bound (the hard one); a rough sketch of the two variants follows the reference below. Due to the differences between CV and NLP (more precisely, images vs. language), hard alignment models are more difficult to transfer to NLP. In this post, I aim to introduce some advanced attention-based models, especially hard ones, which are not yet popular but will be.

  1. Kelvin Xu, Jimmy Ba, Ryan Kiros, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. 2015. In Proceedings of ICML. 
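As a rough sketch, following the formulation in [1] but with simplified notation of my own, the two variants can be written as:

```latex
% a_1, ..., a_L: annotation vectors for L image locations
% \alpha_{t,i}: attention weights at decoding step t, with \sum_i \alpha_{t,i} = 1

% Soft (deterministic) attention: the context is the expected annotation,
% so the whole model is differentiable and trainable by backpropagation.
\hat{z}_t = \sum_{i=1}^{L} \alpha_{t,i} \, a_i

% Hard (stochastic) attention: sample one location per step,
s_t \sim \mathrm{Multinoulli}(\{\alpha_{t,i}\}), \qquad \hat{z}_t = a_{s_t}
% and train by maximizing a variational lower bound on the log-likelihood:
\log p(y \mid a) \;\ge\; \sum_{s} p(s \mid a)\, \log p(y \mid s, a)
```

In practice the gradient of this lower bound is estimated by sampling (a REINFORCE-style estimator), which is one reason the hard variant is harder to train and to transfer to other domains.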

Attention-based models were first proposed in the field of computer vision around mid-2014 [1] (thanks to @archychu for the reminder), and they then spread into natural language processing. In this post, I will mainly focus on a list of attention-based models applied in natural language processing.

  1. Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu. Recurrent Models of Visual Attention. 2014. In Advances in Neural Information Processing Systems. 

With the close of ACL 2015, the opening of SIGIR/KDD 2015, and the announcement of the EMNLP 2015 accepted papers, a “post word embedding” era is beginning to take shape. Personally, I think there may be four sub-directions in the future: interpretable relations, lexical resources, beyond words, and beyond English. Before expanding on these four sub-directions, another question arises: has the improvement of word embedding models themselves already come to an end?

As a tool that has already spread widely, word2vec has advantages that need no further elaboration. So what are its shortcomings? Its author, Mikolov, is a very typical engineering-minded pragmatist who uses whatever is simple, convenient, and effective; as a result, word2vec, being a simple model, ignores a lot of other information in the text. So what does this other information include?