word2vec & GloVe Review Assignment
Instructor solution: Vector Semantics Slide Deck (2), page 29
1. What differences would you expect for word embeddings that use a short context window vs. a long context window?
2. What are some problems with count-based methods for representing words?
3. Explain the intuition behind word2vec.
4. Consider the task of learning skip-gram embeddings. Provide 4 positive and 8 negative examples for the target word 'shovel' in the following excerpt: "... I purchased a shovel to rake the leaves in my lawn ..." (A sketch of generating such pairs programmatically follows the question list.)
5. Consider the weighted unigram frequency formula for negative sampling in word2vec: \(P_\alpha(w) = \frac{count(w)^\alpha}{\sum_{w'} count(w')^\alpha}\). Why is \(\alpha = \frac{3}{4}\) preferred over \(\alpha = 1\)? (A numeric sketch comparing the two settings follows the question list.)
6. word2vec uses the logistic or sigmoid function to predict whether a context word \(c\) is a real context word for a target word \(t\). How can we compute \(P(+ \mid t, c)\)? (A short sketch follows the question list.)
7. Compare and contrast CBOW and skip-gram. What are the advantages of each?
8. What's the intuition behind GloVe?
9. How does GloVe handle words that never co-occur together in a training corpus?
10. What are the advantages and disadvantages of GloVe compared to word2vec?
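For question 4, here is a minimal Python sketch of how skip-gram training pairs can be generated. The ±2 context window is an assumption for illustration, and for simplicity the negatives are drawn uniformly from non-context words; full word2vec instead samples negatives from the weighted unigram distribution in question 5.

```python
import random

sentence = "I purchased a shovel to rake the leaves in my lawn".lower().split()
target = "shovel"
window = 2  # assumed context window of +/-2 words

# Positive examples: the target paired with each word inside its context window.
i = sentence.index(target)
positives = [
    (target, sentence[j])
    for j in range(max(0, i - window), min(len(sentence), i + window + 1))
    if j != i
]
print(positives)
# [('shovel', 'purchased'), ('shovel', 'a'), ('shovel', 'to'), ('shovel', 'rake')]

# Negative examples: the target paired with sampled words outside its context
# (uniform sampling here is a simplification of word2vec's weighted sampling).
context = {c for _, c in positives}
noise_vocab = [w for w in set(sentence) if w != target and w not in context]
negatives = [(target, random.choice(noise_vocab)) for _ in range(8)]
print(negatives)  # 8 (target, noise-word) pairs, i.e. two per positive example
```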
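For question 5, a toy numeric check with hypothetical corpus counts shows how \(\alpha = \frac{3}{4}\) reshapes the sampling distribution relative to \(\alpha = 1\): raising counts to the 3/4 power shrinks the gap between frequent and rare words, so rare words are chosen as negatives more often than their raw frequency would allow.

```python
counts = {"the": 1000, "shovel": 10}  # hypothetical corpus counts

def p_alpha(counts, alpha):
    """Weighted unigram distribution P_alpha(w) = count(w)^alpha / sum count(w')^alpha."""
    weighted = {w: c ** alpha for w, c in counts.items()}
    total = sum(weighted.values())
    return {w: v / total for w, v in weighted.items()}

print(p_alpha(counts, alpha=1.0))   # 'shovel' gets ~0.0099 of the mass
print(p_alpha(counts, alpha=0.75))  # 'shovel' rises to ~0.031
```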
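For question 6, a minimal sketch of the probability computation, assuming \(t\) and \(c\) are the learned target and context embedding vectors (the 3-dimensional toy values below are made up): word2vec scores the pair by the dot product and squashes it with the sigmoid, \(P(+ \mid t, c) = \sigma(c \cdot t) = \frac{1}{1 + e^{-c \cdot t}}\).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

t = [0.2, -0.4, 0.7]  # hypothetical target embedding for 'shovel'
c = [0.1, -0.3, 0.6]  # hypothetical context embedding for 'rake'

# Similar vectors have a large dot product, pushing the probability toward 1.
dot = sum(ti * ci for ti, ci in zip(t, c))
print(sigmoid(dot))  # ~0.64: moderately likely to be a real (target, context) pair
```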
You may exit this review and return later without penalty.