Tokenization webpage from Stanford NLP (pages 1, 2)
Think you've got it?
Natural Language Processing overview assignment
1.What is an n-gram language model and what is it used for?
2.How do you import linguistic data using NLTK?
3.Describe the general steps involved in creating a classifier with supervised learning in NLTK.
4.Write a program using Python and NLTK that will find all adverbs in “document.txt” that do not end in “-ly” and print out the 100 most frequent ones in order of decreasing frequency, with their frequencies.
5.What is tokenization in NLP?
6.What is the difference between statistical and symbolic NLP?
7.What are stopwords and how are they used in NLP?
8.What system of code does the following Python expression exploit and what are its results? Describe what the expression finds in detail, and name the NLP task that it carries out.
9.Why would 'linguistic ambiguity' pose a problem for NLP?
10.What is a 'chatbot' and which two important NLP processes does it need in order to function?
11.What is the Turing Test?
12.What is a probabilistic context-free grammar (PCFG)?
13.What is Natural Language Processing?
14.What is John Searle’s “Chinese Room” and what is it supposed to show?
15.Assign Penn Treebank POS tags to the words in the following sentence:
16.What is part-of-speech (POS) tagging and how can it be performed?
17.Who was Eliza and what was 'her' primary purpose?
18.Discuss challenges with POS-tagging the following sentence properly (by hand) using Penn Treebank tags: “One cannot look to reading for answers not well-known.”
19.Annotate the following sentence with Penn Treebank POS tags:
“All the king’s horses and all the king’s men could not put Humpty-Dumpty together again.”
20.What is WordNet?
21.What are semantic selection restrictions and how (roughly) could they be assigned using WordNet?
22.What are word vectors?
23.What is the dot product \(a \cdot b\) of the following vectors: \(a = [1, 7, -5]\), \(b = [-3, 3, -1]\)?
24.What is PropBank?
25.Describe the similarities and differences between PropBank “frame files” and FrameNet “frames” with regard to their notion of “frame” and the way they label the complements of verbs.
26.What are conditional frequency distributions and how are they related to n-grams and Markov models?
27.What is machine learning?
28.What's the difference between supervised and unsupervised machine learning techniques?
29.What's a decision tree?
30.What strategies exist to improve the performance of weak classifiers, such as decision trees?
31.What is overfitting and what is underfitting?
32.What is sentiment analysis?
33.In sentiment analysis, what are some strategies for identifying the following sentence as expressing negative sentiment: "I did not like the movie."
34.What are some example tasks of sentiment analysis?
35.Describe the difference between building a sentiment analysis classifier that classifies a text as positive or negative and one that rates a text on a scale from 1 to 5 in terms of likability.
36.What is a 'parse tree'?
37.How are chatbots trained today?
38.What is speech recognition?
39.What is semantic role labeling (SRL)?
40.What are five features of sentence constituents one would extract in order to train a Semantic Role classifier?
41.How, in general, are the meanings of ‘words’ (lexical units) most often represented in computational semantics today?
42.What is canonical form?
43.Explain in both computational and linguistic terms why Chinese cannot be word-tokenized as easily or definitively as English and other European languages.
44.How would an inference system use backward or forward chaining to answer the question, “How can I make my mother happy?”
45.What does the meaning of a word consist of in a vector model, in both (a) concrete and (b) philosophical terms?
46.What are some examples of text classification?
47.How can stop words be determined in text classification tasks?
48.What's the difference between multilabel and multinomial text classification?
49.What is word normalization and why might it be helpful for text classification tasks?
50.What is meant by "bag of words" in the context of text classification?
51.Explain the steps involved in calculating the probability of the phrase “tricks are for kids” occurring in a particular corpus using a bigram count-based model, relative frequencies, and log probabilities.
52.What is the sigmoid function and why is it useful?
53.What is one algorithm to handle stemming?