Ultradense Word Embeddings by Orthogonal Transformation

Introduction

We introduce a method that learns an orthogonal transformation of the embedding space, concentrating the information relevant to a given task in an ultradense subspace whose dimensionality is smaller than that of the original space by a factor of 100.
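
As a rough illustration (not the authors' implementation), the sketch below applies an already-learned orthogonal matrix Q to an embedding matrix X and keeps only the first k coordinates as the ultradense representation. The names, shapes, and the randomly generated Q are stand-ins for whatever the actual training procedure produces:

# Minimal sketch: given embeddings X (one row per word) and an orthogonal
# matrix Q learned for a task, the ultradense representation is simply the
# first k coordinates of the rotated vectors.
import numpy as np

def ultradense(X, Q, k=1):
    """Rotate embeddings X (n x d) with orthogonal Q (d x d) and keep the
    first k dimensions, which concentrate the task-relevant information."""
    assert np.allclose(Q @ Q.T, np.eye(Q.shape[0]), atol=1e-6), "Q must be orthogonal"
    return (X @ Q.T)[:, :k]

# Toy example: 10,000 words, 300-dimensional embeddings, reduced to a
# 3-dimensional ultradense subspace (a factor of 100 smaller).
rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 300))
Q, _ = np.linalg.qr(rng.normal(size=(300, 300)))  # a random orthogonal matrix
scores = ultradense(X, Q, k=3)
print(scores.shape)  # (10000, 3)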

To learn more about Ultradense Embeddings, read the following paper:
Ultradense Word Embeddings by Orthogonal Transformation

Pre-trained embeddings

You can download pre-trained embeddings for various languages and domains. All embeddings have the sentiment information in dimension 1, the concreteness information in dimension 11 (English only), and the frequency information in dimension 21 (a short reading sketch follows the table).
File            Language   Domain    Based on
File (3.0 GB)   EN         News      GoogleNews-vectors-negative300
File (4.3 GB)   EN         Twitter
File (1.8 GB)   DE         Web
File (3.2 GB)   CZ         Web
File (518 MB)   ES         Web
File (210 MB)   FR         Web       frWac_no_postag_no_phrase_500_skip_cut100
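
A minimal sketch of reading these task scores from a downloaded embedding file, assuming the file is in word2vec format and readable with gensim; the file name and the binary flag are placeholders, not part of the release:

# Read sentiment, concreteness, and frequency scores straight out of the
# pre-trained vectors (dimensions 1, 11, and 21 are 1-based in the text above,
# so indices 0, 10, and 20 here).
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("densified_embeddings.bin", binary=True)

def scores(word):
    v = vectors[word]
    return {
        "sentiment": float(v[0]),      # dimension 1
        "concreteness": float(v[10]),  # dimension 11 (English only)
        "frequency": float(v[20]),     # dimension 21
    }

print(scores("excellent"))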

Lexica

If you are only interested in the lexica created in the paper, please download this file (186 MB). The lexica are not normalized, i.e., zero does not mean neutral. Please rescale the lexica as needed for your application (a rescaling sketch follows).
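
A minimal rescaling sketch, assuming a tab-separated word<TAB>score lexicon file and a target range of [-1, 1]; both the file format and the range are assumptions, so adjust them to the actual files and your task:

# Load a lexicon and rescale its scores linearly to [lo, hi].
def load_and_rescale(path, lo=-1.0, hi=1.0):
    lexicon = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            word, score = line.rsplit("\t", 1)
            lexicon[word] = float(score)
    smin, smax = min(lexicon.values()), max(lexicon.values())
    scale = (hi - lo) / (smax - smin)
    return {w: lo + (s - smin) * scale for w, s in lexicon.items()}

lexicon = load_and_rescale("sentiment_lexicon.txt")  # placeholder file name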

Cite

If you use Ultradense Embeddings, please cite the following paper:

@article{rothe2016ultradense,
  title   = {Ultradense Word Embeddings by Orthogonal Transformation},
  author  = {Rothe, Sascha and Ebert, Sebastian and Sch\"utze, Hinrich},
  journal = {arXiv preprint arXiv:1602.07572},
  year    = {2016}
}

Contact: Sascha Rothe (CIS page)