Unit 4.2: Count-based vectors

1.What's the difference between PMI and PPMI, and why is PPMI preferred for NLP problems?

2.Complete the code to calculate the frequency of words that co-occur within a symmetric window (\(k =1 \)) for the given word.

3.When \(k = 2\), what will be the target word and the context words in the \(k\)-window for the sentence "The Spanish galleon disappeared into the night"?

4.Explain the intuition behind TF-IDF.

5.Calculate TF-IDF for the documents in Figure A.

6.Complete the code to calculate the Euclidean distance. Do not use NumPy.

7.Complete the code to calculate cosine similarity. Do not use NumPy.

8.Complete the code to calculate the 2-norm (aka Euclidean norm) of a vector. Do not use NumPy.

9.Complete the code to normalize a vector to unit length using the 2-norm (aka Euclidean norm). Do not use NumPy for this.

10.What does it mean if the cosine similarity of two word vectors is 1?

11.What's the cosine similarity between [-3, 3, 4] and [2, -5, -1]?

12.Cosine similarity: Give the formula for cosine similarity. Describe what it measures.

13.What's the cosine similarity between [-68, -92, 21, -19, 54] and [28, 32, 51, -78, -10]?

14.What's the cosine similarity between [86, -36, -34, -51] and [51, 67, 27, 52]?

15.What's the cosine similarity between [-53, 81, 96, 3] and [3, 49, 93, -1]?

16.What's the cosine similarity between [-76.81, 5.51] and [44.62, 36.53]?

17.Euclidean distance: Implement Euclidean distance using NumPy: $$d_{2}(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{i=0}^{\vert \mathbf{a} \vert} (a_{i} - b_{i})^{2}}$$ where \(d_{2}\) represents the Euclidean distance between vectors \(\mathbf{a}\) and \(\mathbf{b}\).

18.Dot product: Implement the function dot_product using NumPy. Your function should take two np.ndarrays representing \(\mathbf{x}\) and \(\mathbf{y}\) and return a single number.

19.What is \([3, 9, 1] \cdot [0, 3, -4]\)?

20.Provide symmetric \(k\)-windows of size \(k=1\), \(k=2\), \(k=3\), and \(k=4\) for "Shakespeare" in the following sentence: "the greatest Shakespeare play is Macbeth"

21.What does a vector's 2-norm represent?

22.If \(\sum_{i=0}^{ \vert \mathbf{x} \vert } x_{i}^{2} = 1\), what can you say about \(\mathbf{x}\)?

23.When does \(\text{cos}(\mathbf{a}, \mathbf{b}) = \mathbf{a} \cdot \mathbf{b}\)?

24.Given \(\mathbf{a} = [7, 2]\), what vector has the same direction, but 3 times the magnitude of \(\mathbf{a}\)?

25.Find the centroid: Implement the function find_centroid using NumPy. Your function should take one np.ndarray representing a multirow matrix X and return a vector representing the centroid.

26.Create a term-term matrix for the sentences in Figure A.

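For checking answers to the vector questions above by hand, the following is a minimal pure-Python sketch (no NumPy) of the dot product, 2-norm, unit-length normalization, Euclidean distance, and cosine similarity. The function names and the choice to raise `ValueError` on mismatched lengths or zero vectors are assumptions made for illustration, not the course's reference solutions.

```python
import math


def dot_product(x, y):
    """Sum of element-wise products of two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))


def norm2(x):
    """2-norm (Euclidean norm): the square root of the sum of squares."""
    return math.sqrt(dot_product(x, x))


def normalize(x):
    """Scale x to unit length by dividing each component by the 2-norm."""
    n = norm2(x)
    if n == 0:
        raise ValueError("cannot normalize the zero vector")
    return [a / n for a in x]


def euclidean_distance(x, y):
    """2-norm of the difference between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Dot product divided by the product of the two 2-norms."""
    denom = norm2(x) * norm2(y)
    if denom == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(x, y) / denom
```

The example vectors used to try these out are deliberately different from the ones in the questions: `norm2([3, 4])` gives a length of 5, and `cosine_similarity([1, 0], [0, 1])` gives 0 because the vectors are orthogonal.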

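The symmetric \(k\)-window questions can be explored with a short sketch like the one below. It assumes the sentence is already split into tokens, that matching is case-sensitive, and that a window is simply truncated at the start or end of the sentence; the function name `window_cooccurrences` and the sample sentence are hypothetical.

```python
from collections import Counter


def window_cooccurrences(tokens, target, k=1):
    """Count the words within k positions to the left or right of each
    occurrence of target. The target position itself is excluded."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        left = tokens[max(0, i - k):i]    # up to k tokens before the target
        right = tokens[i + 1:i + 1 + k]   # up to k tokens after the target
        counts.update(left + right)
    return counts


# Hypothetical sample sentence, tokenized on whitespace
tokens = "the cat sat on the mat".split()
print(window_cooccurrences(tokens, "sat", k=2))
```

With `k=2` and the target `sat`, the window covers `the cat` on the left and `on the` on the right, so `the` is counted twice and `cat` and `on` once each.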