
Instructor solution
Unit 4.3: Word embeddings
1. What's the intuition behind GloVe?
2. How does GloVe handle pairs of words that never co-occur in the training corpus?
3. What are the advantages and disadvantages of GloVe compared to word2vec?
4. Explain the intuition behind word2vec.
5. Consider the weighted unigram frequency formula for negative sampling in word2vec: \(P_\alpha(w) = {{count(w)^\alpha}\over{\sum_{w'} count(w')^\alpha}}\). Why is \(\alpha={3\over4}\) preferred over \(\alpha=1\)?
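The dampening effect of \(\alpha = {3\over4}\) can be checked numerically. The sketch below uses made-up corpus counts (hypothetical values, not from the question):

```python
# Sketch of the weighted unigram distribution used for negative sampling.
# Counts are hypothetical: 'the' is frequent, 'shovel' is rare.
counts = {"the": 1000, "cat": 100, "shovel": 10}

def p_alpha(counts, alpha):
    """P_alpha(w) = count(w)^alpha / sum_w' count(w')^alpha."""
    weighted = {w: c ** alpha for w, c in counts.items()}
    total = sum(weighted.values())
    return {w: v / total for w, v in weighted.items()}

p_raw = p_alpha(counts, 1.0)    # alpha = 1: raw unigram frequencies
p_damp = p_alpha(counts, 0.75)  # alpha = 3/4: flattens the distribution

# The rare word gains probability mass, so it is chosen as a negative
# example more often than its raw frequency alone would allow.
print(p_raw["shovel"], p_damp["shovel"])
```

Raising counts to the 3/4 power shrinks the gap between frequent and rare words, which is exactly the answer the question is probing for.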
6. word2vec uses the logistic (sigmoid) function to predict whether a context word \(c\) is a real context word for a target word \(t\). How can we compute \(P(+ \mid t,c)\)?
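The computation is just a sigmoid applied to the dot product of the two embeddings. A minimal sketch, with toy 3-dimensional vectors (the values are assumptions for illustration):

```python
import numpy as np

def p_positive(t_vec, c_vec):
    """P(+ | t, c) = sigma(t . c): sigmoid of the dot product of the embeddings."""
    return 1.0 / (1.0 + np.exp(-np.dot(t_vec, c_vec)))

# Hypothetical embeddings for a target word t and two candidate context words.
t = np.array([0.5, -0.2, 0.1])
c_near = np.array([0.4, -0.3, 0.2])  # points roughly the same way as t
c_far = -c_near                      # points the opposite way

print(p_positive(t, c_near))  # > 0.5: likely a real context word
print(p_positive(t, c_far))   # < 0.5: likely noise
```

A positive dot product (similar vectors) pushes the probability above 0.5; a negative one pushes it below.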
7. Compare and contrast CBOW and skip-gram. What are the advantages of each?
8. What are some problems with count-based methods for representing words?
9. Consider the task of learning skip-gram embeddings. Provide 4 positive (word, context) and 8 negative (word, \(\neg\) context) examples for the target word 'shovel' in the following excerpt: "... I purchased a shovel to rake the leaves in my lawn ..."
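The pairs can be generated mechanically. The sketch below assumes a context window of ±2 around the target and samples negatives uniformly from the remaining words; both choices are assumptions, not part of the question:

```python
import random

sentence = "I purchased a shovel to rake the leaves in my lawn".split()
target = "shovel"
window = 2  # assumed window size: 2 words on each side -> 4 positive pairs

i = sentence.index(target)
context = sentence[max(0, i - window):i] + sentence[i + 1:i + 1 + window]
positives = [(target, c) for c in context]

# Negatives: pair the target with words drawn from outside its context window.
non_context = [w for w in sentence if w != target and w not in context]
random.seed(0)
negatives = [(target, random.choice(non_context)) for _ in range(2 * len(positives))]

print(positives)  # 4 (word, context) pairs
print(negatives)  # 8 (word, not-context) pairs
```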
10. The naive approach for learning embeddings using the skip-gram and CBOW algorithms uses softmax for predictions. Describe some problems with this approach.
11. Why are negative examples necessary when training a binary classifier?
12. Explain the purpose of the following formula:
$$\text{score}(w_{i}, w_{j}) = \frac{\text{count}(w_{i}, w_{j}) - \delta }{\text{count}(w_{i}) \times \text{count}(w_{j}) }$$
- How is the score used?
- What is \(\delta\)?
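A direct transcription of the score, with hypothetical counts chosen to show how the discount \(\delta\) behaves:

```python
def phrase_score(count_ij, count_i, count_j, delta):
    """score(w_i, w_j) = (count(w_i, w_j) - delta) / (count(w_i) * count(w_j))."""
    return (count_ij - delta) / (count_i * count_j)

# Hypothetical counts: a pair that co-occurs far more often than chance
# scores high, so the two words would be merged into a single token.
print(phrase_score(500, 1000, 800, 5))

# The discount delta pushes pairs seen only a handful of times below zero,
# so rare, accidental co-occurrences are never merged.
print(phrase_score(3, 1000, 800, 5))
```

The second call illustrates the role of \(\delta\): any pair observed fewer than \(\delta\) times gets a negative score regardless of the unigram counts.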
13. Consider the following formulation of the loss for the skip-gram classifier:
$$ L(w_{target}, w_{context}) = -\left( \log\sigma({\vec{w}_{target}}^{\top} \vec{w}_{context}) + \sum\limits_{i=1}^{k} \log\left(1 - \sigma({\vec{w}_{target}}^{\top} \vec{w}_{\neg context_{i}})\right) \right) $$
Ignoring the log, what value do we expect \( \sum\limits_{i=1}^{k} \sigma({\vec{w}_{target}}^{\top} \vec{w}_{\neg context_{i}}) \) to approach at the end of training? Why?
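The loss can be transcribed directly from the formula. The sketch below uses random toy vectors; the embedding dimension and the number of negative samples \(k\) are assumed values:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def skipgram_loss(w_target, w_context, w_negatives):
    """-(log sigma(t . c) + sum_i log(1 - sigma(t . neg_i)))."""
    positive = np.log(sigmoid(w_target @ w_context))
    negative = sum(np.log(1.0 - sigmoid(w_target @ n)) for n in w_negatives)
    return -(positive + negative)

rng = np.random.default_rng(0)
d, k = 50, 4  # assumed embedding dimension and number of negative samples
t = rng.normal(size=d)
loss = skipgram_loss(t, rng.normal(size=d), [rng.normal(size=d) for _ in range(k)])
print(loss)  # a non-negative scalar; training drives it toward 0
```

Minimizing this loss pushes \(\sigma(\vec{w}_{target}^{\top} \vec{w}_{context})\) toward 1 and each \(\sigma(\vec{w}_{target}^{\top} \vec{w}_{\neg context_i})\) toward 0.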