###### Instructor solution

###### Speech and Language Processing: Vector Semantics and Embeddings (pages 12-14)

You may exit out of this review and return later without penalty.


Unit 4.2: Count-based vectors

*1.* What does it mean if the cosine similarity of two word vectors is 1?

*2.* What's the cosine similarity between [-3, 3, 4] and [2, -5, -1]?

*3.* Cosine similarity

Give the formula for cosine similarity and describe what it measures.

*4.* What's the cosine similarity between [-68, -92, 21, -19, 54] and [28, 32, 51, -78, -10]?

*5.* What's the cosine similarity between [86, -36, -34, -51] and [51, 67, 27, 52]?

*6.* What's the cosine similarity between [-53, 81, 96, 3] and [3, 49, 93, -1]?

*7.* What's the cosine similarity between [-76.81, 5.51] and [44.62, 36.53]?

*8.* When does \(\cos(\mathbf{a}, \mathbf{b}) = \mathbf{a} \cdot \mathbf{b}\)?

*9.* When does \(\cos(\mathbf{a}, \mathbf{b}) = \mathbf{a} \cdot \mathbf{b}\)?

*10.* What's the difference between PMI and PPMI, and why is PPMI generally preferred in NLP?

*11.* Explain the intuition behind TF-IDF.

*12.* Calculate TF-IDF for the documents in Figure A.

*13.* Complete the code to calculate the Euclidean distance. Do not use NumPy.

*14.* Complete the code to calculate cosine similarity. Do not use NumPy.

*15.* Euclidean distance

Implement Euclidean distance using NumPy:

$$
d_{2}(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{i=1}^{\vert \mathbf{a} \vert} (a_{i} - b_{i})^{2}}
$$

where \(d_{2}\) is the Euclidean distance between vectors \(\mathbf{a}\) and \(\mathbf{b}\).

*16.* Dot product

Implement the function `dot_product` using NumPy. Your function should take two `np.ndarray`s representing \(\mathbf{x}\) and \(\mathbf{y}\) and return a single number.

*17.* What is \([3, 9, 1] \cdot [0, 3, -4]\)?

*18.* Dot product

Implement the function `dot_product` using NumPy. Your function should take two `np.ndarray`s representing \(\mathbf{x}\) and \(\mathbf{y}\) and return a single number.

*19.* Given \(\mathbf{a} = [7, 2]\), what vector has the same direction but 3 times the magnitude of \(\mathbf{a}\)?

*20.* Find the centroid

Implement the function `find_centroid` using NumPy. Your function should take one `np.ndarray` representing a multirow matrix \(X\) and return a vector representing the centroid of its rows.

*21.* Create a term-term matrix for the sentences in Figure A.

*22.* Create a term-term matrix for the sentences in Figure A.

*23.* Complete the code to calculate the 2-norm (aka Euclidean norm) of a vector. Do not use NumPy.

*24.* Complete the code to normalize a vector to unit length using the 2-norm (aka Euclidean norm). Do not use NumPy.

*25.* What does a vector's 2-norm represent?

*26.* What does a vector's 2-norm represent?

*27.* If \(\sum_{i=1}^{\vert \mathbf{x} \vert} x_{i}^{2} = 1\), what can you say about \(\mathbf{x}\)?

*28.* If \(\sum_{i=1}^{\vert \mathbf{x} \vert} x_{i}^{2} = 1\), what can you say about \(\mathbf{x}\)?

*29.* Complete the code to count how often each word co-occurs with the given word within a symmetric window (\(k = 1\)).

*30.* When \(k = 2\), what will be the target word and the context words in the \(k\)-window for the sentence "The Spanish galleon disappeared into the night"?

*31.* Provide symmetric \(k\)-windows of size \(k = 1\), \(k = 2\), \(k = 3\), and \(k = 4\) for "Shakespeare" in the following sentence:

>the greatest Shakespeare play is Macbeth
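
The following is a minimal pure-Python sketch for questions 13 and 14. It can also be used to check the arithmetic in questions 2 and 4 to 7. The sketch assumes vectors are plain Python lists of equal length. The names `euclidean_distance` and `cosine_similarity` are illustrative and are not taken from the assignment's starter code, which is not shown on this page. Cosine similarity is computed as the dot product divided by the product of the two vectors' lengths.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between equal-length vectors a and b (no NumPy)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (|a| |b|), computed without NumPy."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Question 2: the dot product is -25 and the norms are sqrt(34) and sqrt(30).
print(round(cosine_similarity([-3, 3, 4], [2, -5, -1]), 4))  # -0.7828
print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```

Cosine similarity ignores magnitude. For example, `[1, 2]` and `[2, 4]` score 1 up to floating-point rounding, because they point in the same direction.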
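
The following is a minimal NumPy sketch for questions 15, 16, 18, and 20. The names `dot_product` and `find_centroid` come from the questions. `euclidean_distance` is an assumed name for question 15. The sketch assumes 1-D arrays of equal length for the vector functions, and a 2-D array whose rows are the points for `find_centroid`.

```python
import numpy as np


def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """d2(a, b): square root of the summed squared element-wise differences."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


def dot_product(x: np.ndarray, y: np.ndarray) -> float:
    """Sum of the element-wise products of x and y, returned as a single number."""
    return float(np.dot(x, y))


def find_centroid(X: np.ndarray) -> np.ndarray:
    """Centroid of the rows of X: the column-wise mean, one value per dimension."""
    return X.mean(axis=0)


print(dot_product(np.array([3, 9, 1]), np.array([0, 3, -4])))  # 23.0
print(find_centroid(np.array([[0.0, 0.0], [2.0, 4.0]])))  # [1. 2.]
```

`np.linalg.norm(a - b)` gives the same distance. The explicit form is used here because it mirrors the formula in question 15 term by term.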
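
For question 10, PMI compares how often a word \(w\) and a context word \(c\) co-occur with how often they would co-occur if they were independent:

\(\mathrm{PMI}(w, c) = \log_2 \frac{P(w, c)}{P(w)P(c)}\)

PPMI replaces every negative PMI value with 0. The sketch below assumes a NumPy word-by-context count matrix, with rows for target words and columns for context words, and applies no smoothing. `ppmi` is a hypothetical helper name.

```python
import numpy as np


def ppmi(counts: np.ndarray) -> np.ndarray:
    """Positive PMI from a word-by-context count matrix (no smoothing)."""
    p_wc = counts / counts.sum()             # joint probabilities P(w, c)
    p_w = p_wc.sum(axis=1, keepdims=True)    # row marginals P(w)
    p_c = p_wc.sum(axis=0, keepdims=True)    # column marginals P(c)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(p_wc / (p_w * p_c))    # -inf (or nan) where a count is 0
    pmi = np.where(counts > 0, pmi, 0.0)     # unseen pairs contribute 0
    return np.maximum(pmi, 0.0)              # PPMI: clip negative PMI to 0


counts = np.array([[1.0, 1.0],
                   [1.0, 0.0]])
print(ppmi(counts))
```

In this toy matrix, cell (0, 0) has PMI \(\log_2 0.75 < 0\), so PPMI sets it to 0. Cells (0, 1) and (1, 0) keep their positive value, \(\log_2 1.5\).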
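
Figure A is not reproduced on this page, so this sketch for questions 11 and 12 uses two made-up documents instead. It assumes the log-weighted variant:

- \(\mathrm{tf}_{t,d} = \log_{10}(\mathrm{count}(t, d) + 1)\)
- \(\mathrm{idf}_t = \log_{10}(N / \mathrm{df}_t)\)

Other TF-IDF variants, such as raw counts or smoothed idf, give different numbers. `tf_idf` is a hypothetical helper name.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: tf-idf weight} dict per document.

    docs is a list of documents, each a list of tokens.
    tf  = log10(count(t, d) + 1)
    idf = log10(N / df(t)), where df(t) is the number of documents containing t.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: math.log10(count + 1) * math.log10(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = ["the cat sat on the mat".split(), "the dog sat".split()]
for w in tf_idf(docs):
    print(w)
```

Terms that appear in every document, such as "the" and "sat" here, get an idf of 0 and therefore a weight of 0. "cat", which appears once in one of the two documents, gets \(\log_{10} 2 \cdot \log_{10} 2 \approx 0.09\).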
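
The following is a pure-Python sketch for questions 23 and 24, assuming vectors are plain lists. `l2_norm`, `normalize`, and `scale` are hypothetical names. The sketch also illustrates three related points:

- Question 27: after normalization, the squared components sum to 1.
- Question 8: the dot product of two unit vectors equals their cosine similarity.
- Question 19: multiplying by a positive scalar keeps the direction and scales the length.

```python
import math


def l2_norm(v):
    """2-norm (Euclidean norm): the length of v, sqrt(sum of squared components)."""
    return math.sqrt(sum(x * x for x in v))


def normalize(v):
    """Scale v to unit length; the direction is unchanged."""
    norm = l2_norm(v)
    if norm == 0:
        raise ValueError("the zero vector cannot be normalized")
    return [x / norm for x in v]


def scale(v, c):
    """Multiply every component by c; for c > 0 this keeps the direction."""
    return [c * x for x in v]


a, b = [3, 4], [4, 3]
a_hat, b_hat = normalize(a), normalize(b)
print(l2_norm(a))                                            # 5.0
print(a_hat)                                                 # [0.6, 0.8]
print(round(sum(x * x for x in a_hat), 10))                  # 1.0
print(round(sum(x * y for x, y in zip(a_hat, b_hat)), 10))   # 0.96, i.e. cos(a, b) = 24/25
print(scale([7, 2], 3))                                      # [21, 6]
```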
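
The following is a sketch for questions 21 and 29 to 31. It assumes whitespace tokenization and a window that is truncated at the sentence boundary rather than padded. A symmetric \(k\)-window around position \(i\) holds up to \(k\) tokens on each side of the target word. `k_window`, `cooccurrence_counts`, and `term_term_matrix` are hypothetical names, not the assignment's starter code.

```python
from collections import Counter


def k_window(tokens, i, k):
    """Context words within k positions on each side of tokens[i]."""
    return tokens[max(0, i - k):i] + tokens[i + 1:i + 1 + k]


def cooccurrence_counts(tokens, target, k=1):
    """How often each word appears in a symmetric k-window around target."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            counts.update(k_window(tokens, i, k))
    return counts


def term_term_matrix(sentences, k=1):
    """Return (vocab, matrix) where matrix[r][c] counts vocab[c] near vocab[r]."""
    vocab = sorted({t for s in sentences for t in s})
    index = {t: n for n, t in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for s in sentences:
        for i, target in enumerate(s):
            for context in k_window(s, i, k):
                matrix[index[target]][index[context]] += 1
    return vocab, matrix


sentence = "the greatest Shakespeare play is Macbeth".split()
i = sentence.index("Shakespeare")
for k in range(1, 5):
    print(k, k_window(sentence, i, k))
# 1 ['greatest', 'play']
# 2 ['the', 'greatest', 'play', 'is']
# 3 ['the', 'greatest', 'play', 'is', 'Macbeth']
# 4 ['the', 'greatest', 'play', 'is', 'Macbeth']
```

With this boundary convention, \(k = 3\) and \(k = 4\) give the same window. Only two words precede "Shakespeare" and three follow it.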

Jurafsky, D., Martin, J.H.: Speech and Language Processing (3rd ed.). Retrieved from https://web.stanford.edu/~jurafsky/slp3/6.pdf (Apr 22, 2023)
