Speech and Language Processing: Vector Semantics and Embeddings (pages 12-14)
Think you've got it?
Word Embeddings Review Assignment
1. What's the distributional hypothesis?
2. What shortcoming of bag-of-words representations do word embeddings address?
3. What are some properties an ideal word representation should capture?
4. Create word vectors for 'tall', 'pine', and 'chopped' based on their co-occurrences with 'tree' and 'person'.
5. Create a term-term matrix for the sentences in Figure A (see the first sketch after this list).
6. Provide the context windows of size k = 1, 2, 3, and 4 around 'Shakespeare' in the sentence: "the greatest Shakespeare play is Macbeth" (see the second sketch below).
7. What's the dot product [3, 9, 1] • [0, 3, -4]?
8. What's the cosine similarity between [-3, 3, 4] and [2, -5, -1]? (A sketch below checks both this and the dot product in question 7.)
9. Explain the intuition behind tf-idf.
10. Calculate the tf-idf weights for the documents in Figure A (see the tf-idf sketch below).
11. What are some use cases of word vectors?
12. What's the difference between PMI and PPMI, and why is PPMI preferred for NLP problems? (See the final sketch below.)
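For questions 4-5, here is a minimal Python sketch that builds a term-term co-occurrence matrix. Since Figure A is not reproduced here, the two sentences are hypothetical stand-ins; substitute the Figure A sentences to get the actual matrix.

```python
# Build a term-term co-occurrence matrix (questions 4-5).
from collections import defaultdict

sentences = [
    "the tall pine tree",           # hypothetical stand-ins,
    "the person chopped the tree",  # since Figure A is not shown here
]

window = 2  # count co-occurrences within +/- 2 words
counts = defaultdict(lambda: defaultdict(int))

for sentence in sentences:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[word][tokens[j]] += 1

# Each row is a count-based word vector; restricting the columns to
# 'tree' and 'person' gives the 2-dimensional vectors asked for in Q4.
for w in ("tall", "pine", "chopped"):
    print(w, dict(counts[w]))
```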
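For question 6, a small sketch that extracts the size-k context window (the k words on each side of the target):

```python
# Extract context windows of size k around a target word (question 6).
sentence = "the greatest Shakespeare play is Macbeth".split()
target = "Shakespeare"
idx = sentence.index(target)

for k in (1, 2, 3, 4):
    lo, hi = max(0, idx - k), min(len(sentence), idx + k + 1)
    print(f"k={k}:", " ".join(sentence[lo:hi]))
```

Note that once k reaches 3, the window is clipped at the sentence boundary, so the k = 3 and k = 4 windows are identical here.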
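Questions 7-8 can be checked with plain Python:

```python
# Dot product and cosine similarity (questions 7-8).
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # cosine similarity = (a . b) / (|a| * |b|)
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

print(dot([3, 9, 1], [0, 3, -4]))       # 0 + 27 - 4 = 23
print(cosine([-3, 3, 4], [2, -5, -1]))  # -25 / (sqrt(34)*sqrt(30)) ~ -0.78
```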
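For question 10, a sketch assuming the log-weighted scheme used in the chapter, tf = log10(count + 1) and idf = log10(N / df); other weightings exist. The documents below are hypothetical placeholders for the Figure A texts.

```python
# tf-idf with log-weighted tf and idf (question 10).
import math
from collections import Counter

docs = [
    "the pine tree is tall".split(),        # hypothetical doc 1
    "the person chopped the tree".split(),  # hypothetical doc 2
]
N = len(docs)
counts = [Counter(d) for d in docs]
vocab = set().union(*docs)

# df[t]: number of documents containing term t
df = {t: sum(t in c for c in counts) for t in vocab}

for i, c in enumerate(counts):
    tfidf = {t: math.log10(c[t] + 1) * math.log10(N / df[t]) for t in c}
    print(f"doc {i}:", {t: round(v, 3) for t, v in tfidf.items()})
```

A term like 'tree' that appears in every document gets idf = log10(2/2) = 0, so its tf-idf weight is zero, which is exactly the intuition question 9 asks about.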
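Finally, for question 12: PMI can be any real number (negative when two words co-occur less often than chance), while PPMI clips negative values to zero. The sketch below uses hypothetical co-occurrence counts to show the clipping.

```python
# PMI vs. PPMI over a tiny hypothetical co-occurrence table (question 12).
import math

counts = {
    "pine": {"tree": 8, "person": 1},  # hypothetical counts
    "tall": {"tree": 3, "person": 5},
}
total = sum(v for row in counts.values() for v in row.values())
p_w = {w: sum(row.values()) / total for w, row in counts.items()}
p_c = {}
for row in counts.values():
    for c, v in row.items():
        p_c[c] = p_c.get(c, 0) + v / total

for w, row in counts.items():
    for c, v in row.items():
        pmi = math.log2((v / total) / (p_w[w] * p_c[c]))
        ppmi = max(pmi, 0.0)  # PPMI clips negatives to zero
        print(f"PMI({w},{c}) = {pmi:+.3f}   PPMI = {ppmi:.3f}")
```

The clipping matters in practice because negative PMI values require enormous corpora to estimate reliably, so PPMI keeps only the more trustworthy positive associations.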