
Instructor solution
You may exit out of this review and return later without penalty.
Unit 4.2: Count-based vectors
1.What is meant if the cosine of two words is 1?
2.What's the cosine similarity between [-3, 3, 4] and [2, -5, -1]?
3.Cosine similarity
Give the formula for cosine similarity. Describe what it measures.
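As a minimal sketch, assuming plain Python lists and no NumPy, cosine similarity can be computed as the dot product divided by the product of the two 2-norms. The function name `cosine_similarity` and the zero-vector check are illustrative choices, not part of the original exercise.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product: sum of element-wise products.
    dot = sum(x * y for x, y in zip(a, b))
    # 2-norm (Euclidean length) of each vector.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat the zero vector as an error rather than returning 0.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([-3, 3, 4], [2, -5, -1]))
```

The result depends only on the angle between the vectors, not on their magnitudes, so scaling either vector by a positive constant leaves it unchanged.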
4.What's the cosine similarity between [-68, -92, 21, -19, 54] and [28, 32, 51, -78, -10]?
5.What's the cosine similarity between [86, -36, -34, -51] and [51, 67, 27, 52]?
6.What's the cosine similarity between [-53, 81, 96, 3] and [3, 49, 93, -1]?
7.What's the cosine similarity between [-76.81, 5.51] and [44.62, 36.53]?
8.When does \(\text{cos}(\mathbf{a}, \mathbf{b}) = \mathbf{a} \cdot \mathbf{b}\)?
9.When does \(\text{cos}(\mathbf{a}, \mathbf{b}) = \mathbf{a} \cdot \mathbf{b}\)?
10.What's the difference between PMI and PPMI, and why is PPMI preferred for NLP problems?
11.Explain the intuition behind TF-IDF.
12.Calculate TF-IDF for the documents in Figure A.
13.Complete the code to calculate the Euclidean distance. Do not use NumPy.
14.Complete the code to calculate cosine similarity. Do not use NumPy.
15.Euclidean distance
Implement Euclidean distance using numpy:
$$
d_{2}(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{i=0}^{\vert \mathbf{a} \vert} (a_{i} - b_{i})^{2}}
$$
where \(d_{2}\) represents the Euclidean distance of vectors \(\mathbf{a}\) and \(\mathbf{b}\).
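One possible sketch of the formula above, assuming both inputs are one-dimensional arrays of the same length. The function name `euclidean_distance` is an illustrative choice.

```python
import numpy as np


def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Square root of the summed squared element-wise differences."""
    # Cast to float so integer inputs do not overflow when squared.
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))


print(euclidean_distance(np.array([0, 0]), np.array([3, 4])))
```

The same value can also be obtained with `np.linalg.norm(a - b)`, which computes the 2-norm of the difference vector.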
16.Dot product
Implement the function dot_product using numpy. Your function should take two np.ndarrays representing \(\mathbf{x}\) and \(\mathbf{y}\) and return a single number.
17.What is \([3, 9, 1] \cdot [0, 3, -4]\)?
18.Dot product
Implement the function dot_product using numpy. Your function should take two np.ndarrays representing \(\mathbf{x}\) and \(\mathbf{y}\) and return a single number.
19.Given \(\mathbf{a} = [7, 2]\), what vector has the same direction, but 3 times the magnitude of \(\mathbf{a}\)?
20.Find the centroid
Implement the function find_centroid using numpy. Your function should take one np.ndarray representing a multirow matrix X and return a vector representing the centroid.
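A hedged sketch of `find_centroid`, assuming each row of X is one vector and the centroid is the column-wise mean of those rows. That row-per-vector layout is an assumption, since the exercise does not state it.

```python
import numpy as np


def find_centroid(X: np.ndarray) -> np.ndarray:
    """Return the mean of the rows of X as a single vector."""
    X = np.asarray(X, dtype=float)
    # axis=0 averages down each column, giving one value per dimension.
    return X.mean(axis=0)


X = np.array([[1, 2], [3, 4], [5, 6]])
print(find_centroid(X))
```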
21.Create a term-term matrix for the sentences in Figure A.
22.Create a term-term matrix for the sentences in Figure A.
23.Complete the code to calculate the 2-norm (aka Euclidean norm) of a vector. Do not use NumPy.
24.Complete the code to normalize a vector to unit length using the 2-norm (aka Euclidean norm). Do not use NumPy for this.
25.What does a vector's 2-norm represent?
26.What does a vector's 2-norm represent?
27.If \(\sum_{i=0}^{ \vert \mathbf{x} \vert } x_{i}^{2} = 1\), what can you say about \(\mathbf{x}\)?
28.If \(\sum_{i=0}^{ \vert \mathbf{x} \vert } x_{i}^{2} = 1\), what can you say about \(\mathbf{x}\)?
29.Complete the code to calculate the frequency of words that co-occur within a symmetric window (\(k =1 \)) for the given word.
30.When \(k = 2\), what will be the target word and the context words in the \(k\)-window for the sentence "The Spanish galleon disappeared into the night"?
31.Provide symmetric \(k\)-windows of size \(k=1\), \(k=2\), \(k=3\), and \(k=4\) for Shakespeare in the following sentence:
>the greatest Shakespeare play is Macbeth
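A small sketch of how a symmetric \(k\)-window can be extracted, assuming whitespace tokenization and assuming the window is simply cut short at the sentence boundary instead of being padded. The helper name `symmetric_window` is hypothetical.

```python
def symmetric_window(tokens, index, k):
    """Return up to k tokens on each side of tokens[index]."""
    # Clip the left edge at 0 so the slice does not wrap around.
    left = tokens[max(0, index - k):index]
    # Slicing past the end of a list is safe and just returns fewer tokens.
    right = tokens[index + 1:index + 1 + k]
    return left, right


tokens = "the greatest Shakespeare play is Macbeth".split()
target = tokens.index("Shakespeare")
for k in (1, 2, 3, 4):
    left, right = symmetric_window(tokens, target, k)
    print(k, left, right)
```

Under these assumptions the left context stops growing once it reaches the start of the sentence, and the right context stops growing once it reaches the end.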