###### Instructor solution

You may exit out of this review and return later without penalty.

Unit 4.2: Count-based vectors

*1.* What's the difference between PMI and PPMI, and why is PPMI preferred for NLP problems?

*2.* Complete the code to calculate the frequency of words that co-occur within a symmetric window (\(k = 1\)) for the given word.

*3.* When \(k = 2\), what will be the target word and context words in the \(k\)-window for the sentence "The Spanish galleon disappeared into the night"?

*4.* Explain the intuition behind TF-IDF.

*5.* Calculate TF-IDF for the documents in Figure A.

*6.* Complete the code to calculate the Euclidean distance. Do not use NumPy.

*7.* Complete the code to calculate cosine similarity. Do not use NumPy.

*8.* Complete the code to calculate the 2-norm (aka Euclidean norm) of a vector. Do not use NumPy.

*9.* Complete the code to normalize a vector to unit length using the 2-norm (aka Euclidean norm). Do not use NumPy.

*10.* What does it mean if the cosine similarity of two word vectors is 1?

*11.* What's the cosine similarity between [-3, 3, 4] and [2, -5, -1]?

*12.* Cosine similarity

Give the formula for cosine similarity. Describe what it measures.

*13.* What's the cosine similarity between [-68, -92, 21, -19, 54] and [28, 32, 51, -78, -10]?

*14.* What's the cosine similarity between [86, -36, -34, -51] and [51, 67, 27, 52]?

*15.* What's the cosine similarity between [-53, 81, 96, 3] and [3, 49, 93, -1]?

*16.* What's the cosine similarity between [-76.81, 5.51] and [44.62, 36.53]?

*17.* Euclidean distance

Implement Euclidean distance using NumPy:

$$
d_{2}(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{i=0}^{\vert \mathbf{a} \vert} (a_{i} - b_{i})^{2}}
$$

where \(d_{2}\) represents the Euclidean distance between vectors \(\mathbf{a}\) and \(\mathbf{b}\).

*18.* Dot product

Implement the function dot_product using NumPy. Your function should take two np.ndarrays representing \(\mathbf{x}\) and \(\mathbf{y}\) and return a single number.

*19.* What is \([3, 9, 1] \cdot [0, 3, -4]\)?

*20.* Provide symmetric \(k\)-windows of size \(k=1\), \(k=2\), \(k=3\), and \(k=4\) for Shakespeare in the following sentence:

> the greatest Shakespeare play is Macbeth

*21.* What does a vector's 2-norm represent?

*22.* If \(\sum_{i=0}^{ \vert \mathbf{x} \vert } x_{i}^{2} = 1\), what can you say about \(\mathbf{x}\)?

*23.* When does \(\text{cos}(\mathbf{a}, \mathbf{b}) = \mathbf{a} \cdot \mathbf{b}\)?

*24.* Given \(\mathbf{a} = [7, 2]\), what vector has the same direction, but 3 times the magnitude of \(\mathbf{a}\)?

*25.* Find the centroid

Implement the function find_centroid using NumPy. Your function should take one np.ndarray representing a multirow matrix X and return a vector representing the centroid.

*26.* Create a term-term matrix for the sentences in Figure A.

*27.* Create a term-term matrix for the sentences in Figure A.
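The vector questions above (dot product, 2-norm, unit-length normalization, cosine similarity, and Euclidean distance) can be checked with a short pure-Python sketch that avoids NumPy. Apart from dot_product, which is named in question 18, the function names here are illustrative assumptions; this is one possible way to write them, not the course's reference solution.

```python
import math


def dot_product(x, y):
    """Sum of the elementwise products of two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))


def norm2(x):
    """2-norm (Euclidean norm): the square root of the vector dotted with itself."""
    return math.sqrt(dot_product(x, x))


def normalize(x):
    """Scale a vector to unit length by dividing each component by its 2-norm."""
    n = norm2(x)
    if n == 0:
        raise ValueError("cannot normalize the zero vector")
    return [a / n for a in x]


def cosine_similarity(x, y):
    """Dot product divided by the product of the two 2-norms."""
    return dot_product(x, y) / (norm2(x) * norm2(y))


def euclidean_distance(x, y):
    """Square root of the summed squared componentwise differences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Vectors taken from questions 19 and 11.
print(dot_product([3, 9, 1], [0, 3, -4]))                    # 23
print(round(cosine_similarity([-3, 3, 4], [2, -5, -1]), 4))  # -0.7828
```

When both vectors are first passed through normalize, cosine_similarity and dot_product return the same value up to floating-point error, since each 2-norm in the denominator is then 1.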
