David Kaumanns, 5.11.2013
“The meaning of a word is its use in the language” (Wittgenstein)
\(\Rightarrow\) word embeddings
Children learn meaning by semantic bootstrapping.
Abstract concepts are built upon more concrete concepts.
… beginning with (and supported by) concrete physical properties.
Linguistic word representations can be improved by experiential data.
This has been shown for
But what about
Raw image representations
From a set of images
Extract feature descriptor vectors for iconic regions.
Quantize the feature vectors into clusters (k-means).
For each image
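The steps above resemble a bag-of-visual-words pipeline, which can be sketched roughly as follows. This is a minimal illustration, not the method used in the talk. It assumes the per-image step is the usual one: assign each descriptor to its nearest cluster and count the assignments. The 2-D toy descriptors stand in for real image features, and the names `kmeans`, `nearest`, and `bag_of_visual_words` are hypothetical.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, vec):
    """Index of the centroid closest to vec (first one wins on ties)."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(centroids[i], vec))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids (the 'visual words')."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(centroids, v)].append(v)
        for i, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids


def bag_of_visual_words(descriptors, centroids):
    """Histogram of visual-word counts for one image (assumed step)."""
    counts = Counter(nearest(centroids, d) for d in descriptors)
    return [counts.get(i, 0) for i in range(len(centroids))]


# Toy data: two "images", each a list of made-up 2-D descriptors.
images = {
    "img_a": [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1)],
    "img_b": [(5.1, 4.9), (4.9, 5.0), (5.0, 5.0)],
}

# Build the codebook from all descriptors, then describe each image.
all_descriptors = [d for descs in images.values() for d in descs]
codebook = kmeans(all_descriptors, k=2)
histograms = {name: bag_of_visual_words(descs, codebook)
              for name, descs in images.items()}
```

Each histogram has one entry per cluster and sums to the number of descriptors in that image, so images of different sizes end up in the same fixed-length vector space.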
Women standing in line to vote in Bangladesh.
1 Bangladesh, 1 line, 1 standing, 1 vote, 1 Women
1 robot, 1 pinball, 1 people, 1 men, 1 man, 1 machine, 1 light, 1 game, 1 color, 1 car
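The word counts for the Bangladesh caption can be reproduced with a small sketch like the one below. It assumes a tokenizer that keeps alphabetic tokens and a tiny stopword list that drops "in" and "to". Both are guesses that happen to match the example, not the preprocessing used in the talk, and `caption_to_bag` is a hypothetical name.

```python
import re
from collections import Counter

# Assumed stopword list: just enough to drop "in" and "to" from the example.
STOPWORDS = {"in", "to"}


def caption_to_bag(caption):
    """Count the non-stopword alphabetic tokens of a caption."""
    tokens = re.findall(r"[A-Za-z]+", caption)
    return Counter(t for t in tokens if t.lower() not in STOPWORDS)


bag = caption_to_bag("Women standing in line to vote in Bangladesh.")

# Print "count word" pairs in case-insensitive alphabetical order.
for word in sorted(bag, key=str.lower):
    print(bag[word], word)
```

A tag list such as the robot and pinball example needs no tokenization: each tag would simply get a count of 1.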
Are the image-text corpora adequate?
Are visual embeddings for image tags useful by themselves?
How to fine-tune the parameters?
How to finally integrate visual semantics into linguistic semantics?