# Where Will the "Post Word Embedding" Hot Spots Be?

## Interpretable Relations

However, word embeddings inherit two important limitations from their antecedent corpus-based distributional models: (1) they are unable to model distinct meanings of a word as they conflate the contextual evidence of different meanings of a word into a single vector; and (2) they base their representations solely on the distributional statistics obtained from corpora, ignoring the wealth of information provided by existing semantic resources.

Unlike what was suggested in previous work, where relatedness statistics learned from corpora are often claimed to yield extra gains over lexicon-based models, we obtained this new state-of-the-art result relying solely on lexical resources (Roget's and WordNet), and corpus statistics do not seem to bring further improvement. To provide a comprehensive understanding, we conducted our study in a framework that examines a number of basic concerns in modeling contrasting meaning. We hope our efforts help shed some light on future directions for this basic semantic modeling problem.

The general aim of the models is to enforce that, in the embedding space, word pairs with higher degrees of contrast are placed farther from each other than those with less contrast.
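That objective can be sketched as a pairwise margin loss; the function below is a hypothetical illustration of the idea, not the exact formulation of any of the models discussed:

```python
import numpy as np

def contrast_margin_loss(emb, pair_hi, pair_lo, margin=1.0):
    """Hinge loss encouraging the higher-contrast pair (pair_hi) to sit
    at least `margin` farther apart than the lower-contrast pair."""
    d_hi = np.linalg.norm(emb[pair_hi[0]] - emb[pair_hi[1]])
    d_lo = np.linalg.norm(emb[pair_lo[0]] - emb[pair_lo[1]])
    # Zero loss once the high-contrast pair is far enough apart.
    return max(0.0, margin - (d_hi - d_lo))

# Toy embeddings: "hot"/"cold" contrast strongly, "hot"/"warm" weakly.
emb = {
    "hot": np.array([0.0, 0.0]),
    "cold": np.array([3.0, 0.0]),
    "warm": np.array([0.5, 0.0]),
}
loss = contrast_margin_loss(emb, ("hot", "cold"), ("hot", "warm"))
```

Minimizing such a loss pushes strongly contrasting pairs apart relative to weakly contrasting ones, which is exactly the ordering the models try to enforce.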

## Lexical Resources

**Motivation:** 1) Eq. (1): a word is composed of multiple lexemes; 2) Eq. (2): a synset is likewise composed of multiple lexemes; 3) a word embedding can also be viewed as the sum of the embeddings of its different senses.

**Constraints:** 1) Eq. (1): a word is composed of multiple lexemes; 2) Eq. (2): a synset is likewise composed of multiple lexemes; 3) Eq. (25): related synsets should have similar embeddings; this constraint handles words that have no synset or only a single synset.

**Model:** Based on these constraints, one can design an encoding-decoding autoencoder network from input to output that treats synsets as the encoding of words. Eqs. (10) through (17) spell out each step of the autoencoder, and the difference between the encoded and decoded representations gives the objective function, i.e. Eq. (17).

**Implementation:** Besides the elegant relation constraints, the other selling point of the paper is that it fully exploits sparseness to speed up training. With the lexemes in the encoding and decoding parts of the autoencoder serving as intermediaries between words and synsets, they build two rank-4 tensors, E and D, which are also parameters to be learned. They assume there is no interaction between the dimensions of E and D, and since many lexemes do not exist, E and D become sparser still; as a result, the effective dimensionality of E and D drops sharply and computation speeds up greatly. I am not yet convinced this assumption is reasonable. Another implementation detail is given in Section 2.6.

**Problems:** 1) the no-interaction assumption mentioned above; 2) a less elegant point is that the three kinds of constraints are combined by linear weighting (Section 2.5). Within their framework this is more justified than similar weightings elsewhere, because words, synsets, and lexemes are assumed to live in the same embedding space, and the experiments also discuss the choice of weights, which is acceptable.
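The diagonal (no-interaction) assumption can be illustrated with a toy numpy sketch. This is not the paper's actual formulation: the real E and D are sparse rank-4 tensors over words, lexemes, synsets, and dimensions, while here one word has two lexemes, each lexeme belongs to its own synset, and every weight is a single scalar per dimension:

```python
import numpy as np

# Toy sketch of the "no interaction" (diagonal) assumption: dimensions
# never mix, so encoding and decoding reduce to elementwise products.
w = np.array([1.0, 2.0, 3.0])      # input word embedding

# Diagonal encoding weights: per-dimension shares of the word vector
# given to each lexeme (columns sum to 1, so the word is fully split).
E = np.array([[0.6, 0.5, 0.7],     # lexeme 0 (-> synset 0)
              [0.4, 0.5, 0.3]])    # lexeme 1 (-> synset 1)

# Encode: lexeme embeddings are elementwise shares of the word vector;
# a synset embedding is the sum of its lexemes (one each here).
lexemes = E * w                    # shape (2, 3), no matrix products
synsets = lexemes

# Diagonal decoding weights mapping synsets back to lexemes.
D = np.ones_like(E)

# Decode: the reconstructed word is the sum of the decoded lexemes.
w_rec = (D * synsets).sum(axis=0)

# Autoencoder objective: squared reconstruction error (cf. Eq. (17)).
loss = float(np.sum((w - w_rec) ** 2))
```

Because dimensions do not interact and absent lexemes contribute nothing, only the nonzero per-dimension weights ever need to be stored or updated, which is where the claimed speedup comes from.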

## Beyond Words

In our experiments we consider 8 tasks: semantic-relatedness, paraphrase detection, image-sentence ranking and 5 standard classification benchmarks. In these experiments, we extract skip-thought vectors and train linear models to evaluate the representations directly, without any additional fine-tuning. As it turns out, skip-thoughts yield generic representations that perform robustly across all tasks considered.
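The evaluation protocol (frozen sentence vectors plus a linear model, no fine-tuning) is easy to sketch. The features below are random stand-ins for extracted skip-thought vectors, and the probe is a plain logistic-regression classifier trained by gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for pre-extracted sentence vectors: the encoder is frozen,
# so each sentence is just a fixed feature vector.
n, dim = 200, 16
X = rng.normal(size=(n, dim))
y = (X @ rng.normal(size=dim) > 0).astype(float)   # synthetic labels

# Linear probe: logistic regression trained by full-batch gradient
# descent; only w and b are learned, the features never change.
w, b = np.zeros(dim), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
    w -= 0.5 * (X.T @ (p - y)) / n
    b -= 0.5 * (p - y).mean()

accuracy = float((((X @ w + b) > 0) == (y == 1)).mean())
```

Since the representation is held fixed, any accuracy the linear model reaches is attributable to the quality of the sentence vectors themselves, which is the point of the protocol.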

## Beyond English

### References

1. Samaneh Moghaddam and Martin Ester. 2013. On the design of LDA models for aspect-based opinion mining. CIKM.

2. Samaneh Moghaddam and Martin Ester. 2013. The FLDA model for aspect-based opinion mining: addressing the cold start problem. WWW.

3. Siwei Lai, Kang Liu, Liheng Xu, Jun Zhao. 2015. How to Generate a Good Word Embedding? arXiv pre-print.

4. T. K. Landauer. 2002. On the computational basis of learning and cognition: Arguments from LSA. Psychology of Learning and Motivation, 41:43–84.

5. Radu Soricut and Franz Och. 2015. Unsupervised Morphology Induction Using Word Embeddings. NAACL.

6. Ignacio Iacobacci, Mohammad Taher Pilehvar and Roberto Navigli. 2015. SensEmbed: Learning Sense Embeddings for Word and Relational Similarity. ACL.

7. Zhigang Chen, Wei Lin, Qian Chen, et al. 2015. Revisiting Word Embedding for Contrasting Meaning. ACL.

8. Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, et al. 2015. Deep Unordered Composition Rivals Syntactic Methods for Text Classification. ACL.

9. Angeliki Lazaridou, Georgiana Dinu and Marco Baroni. 2015. Hubness and Pollution: Delving into Cross-Space Mapping for Zero-Shot Learning. ACL.

10. Manaal Faruqui, Jesse Dodge, Sujay K. Jauhar, et al. 2015. Retrofitting Word Vectors to Semantic Lexicons. NAACL.

11. Sascha Rothe and Hinrich Schütze. 2015. AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes. ACL.

12. Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, et al. 2015. Skip-Thought Vectors. arXiv pre-print.

13. Antoine Bride, Tim Van de Cruys and Nicholas Asher. 2015. A Generalisation of Lexical Functions for Composition in Distributional Semantics. ACL.

14. Yanran Li, Wenjie Li, Fei Sun, et al. 2015. Component-Enhanced Chinese Character Embeddings. EMNLP.
