word2vec & GloVe Review Assignment
1. What differences would you expect between word embeddings trained with a short context window and those trained with a large context window?
2. What are some problems with count-based methods for representing words?
3. Explain the intuition behind word2vec.
4. Consider the task of learning skip-gram embeddings. Provide 4 positive and 8 negative training examples for the target word 'shovel' in the following excerpt: "... I purchased a shovel to rake the leaves in my lawn ..." (A construction sketch follows the question list.)
5. Consider the weighted unigram frequency formula used for negative sampling in word2vec: \(P_\alpha(w) = \frac{count(w)^\alpha}{\sum_{w'} count(w')^\alpha}\). Why is \(\alpha = \frac{3}{4}\) preferred over \(\alpha = 1\)? (A numeric sketch follows the question list.)
6. word2vec uses the logistic (sigmoid) function to predict whether a candidate context word \(c\) is a real context word for a target word \(t\). How can we compute \(P(+ \mid t, c)\)? (A numeric sketch follows the question list.)
7. Compare and contrast CBOW and skip-gram. What are the advantages of each?
8. What's the intuition behind GloVe?
9. How does GloVe handle pairs of words that never co-occur in the training corpus?
10. What are the advantages and disadvantages of GloVe compared to word2vec?
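
For question 4, here is a minimal sketch of how skip-gram (target, context) training pairs are typically constructed. The window size of ±2, the whitespace tokenization, and the uniform sampling of negatives are simplifying assumptions made only for illustration; word2vec itself draws negatives from the weighted unigram distribution of question 5.

```python
import random

# Tokenized excerpt from question 4 (lowercased and whitespace-split for simplicity).
tokens = "i purchased a shovel to rake the leaves in my lawn".split()
target = "shovel"
window = 2  # assumed context window of +/-2 words

t_idx = tokens.index(target)

# Positive examples: (target, context) pairs for every word inside the window.
positives = [(target, tokens[i])
             for i in range(max(0, t_idx - window), min(len(tokens), t_idx + window + 1))
             if i != t_idx]

# Negative examples: pair the target with sampled words that are not true
# context words (uniform sampling here, purely for illustration).
context_words = {c for _, c in positives}
noise_vocab = [w for w in set(tokens) if w != target and w not in context_words]
negatives = [(target, random.choice(noise_vocab)) for _ in range(2 * len(positives))]

print(positives)   # 4 positive (target, context) pairs
print(negatives)   # 8 negative (target, noise) pairs
```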
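For question 5, a small numeric sketch of the weighted unigram distribution \(P_\alpha(w)\). The toy counts are hypothetical and chosen only to show how \(\alpha = \frac{3}{4}\) reshapes the distribution relative to \(\alpha = 1\).

```python
# Hypothetical word counts, chosen only to illustrate the effect of alpha.
counts = {"the": 1000, "leaves": 50, "shovel": 2}

def weighted_unigram(counts, alpha):
    """P_alpha(w) = count(w)^alpha / sum_w' count(w')^alpha"""
    weighted = {w: c ** alpha for w, c in counts.items()}
    total = sum(weighted.values())
    return {w: v / total for w, v in weighted.items()}

print(weighted_unigram(counts, alpha=1.0))   # raw unigram probabilities
print(weighted_unigram(counts, alpha=0.75))  # rare words receive a larger share
```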
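For question 6, a minimal numeric sketch of the standard skip-gram-with-negative-sampling formulation, in which \(P(+ \mid t, c)\) is the sigmoid of the dot product of the two embedding vectors. The 4-dimensional vectors below are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical 4-dimensional embeddings for target t and candidate context c.
t = np.array([0.2, -0.1, 0.7, 0.3])
c = np.array([0.4,  0.0, 0.5, -0.2])

# P(+ | t, c) = sigma(t . c): dot-product similarity squashed into a probability.
p_positive = sigmoid(np.dot(t, c))
p_negative = 1.0 - p_positive  # P(- | t, c)
print(p_positive, p_negative)
```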