Bag of Words
Week 9-12
T stands for Theory, P stands for Pythonic, and D stands for Discussion.
Please take notes during the discussions.
Note: All images are sourced from the Internet under open licenses.
Text Data is Superficial
But Language is Complex...
What is NLP research?
Some Early NLP History
Bag-of-Words Model
Introduction
The bag-of-words model is a way of representing text data
when modeling text with machine learning algorithms.
In the worked example, we have already seen one very simple approach
to scoring: a binary scoring of the presence or absence of words.
The scores are a weighting in which not all words are equally important
or interesting.
The scores have the effect of highlighting words that are distinct
(contain useful information) in a given document.
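As a baseline for the weighted scores described above, the binary presence-or-absence scoring can be sketched in a few lines of Python. This is a minimal illustration, not the worked example from the lecture: the helper names (tokenize, build_vocabulary, binary_vector), the simple regex tokenizer, and the two sample sentences are all assumptions made for this sketch.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed, very simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(documents):
    """Collect the sorted list of unique tokens across all documents."""
    return sorted({token for doc in documents for token in tokenize(doc)})

def binary_vector(document, vocabulary):
    """Score each vocabulary word 1 if it occurs in the document, else 0."""
    present = set(tokenize(document))
    return [1 if word in present else 0 for word in vocabulary]

# Hypothetical two-document corpus, used only for illustration.
docs = ["the cat sat on the mat", "the dog sat"]
vocab = build_vocabulary(docs)
vectors = [binary_vector(d, vocab) for d in docs]

print(vocab)
for doc, vec in zip(docs, vectors):
    print(vec, doc)
```

Each document becomes a fixed-length vector with one position per vocabulary word. Word order is discarded, and a word that occurs twice scores the same as a word that occurs once.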
TF-IDF Calculation
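A hedged sketch of one common TF-IDF variant follows. It assumes term frequency is the word count divided by the document length and inverse document frequency is the natural log of (number of documents / number of documents containing the word). Other variants add smoothing or normalization, and the slides do not say which one is intended. The function name tf_idf, the whitespace tokenization, and the sample corpus are assumptions for illustration.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: score} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    df = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores

# Hypothetical corpus, used only for illustration.
docs = ["the cat sat", "the dog sat", "the cat ran"]
scores = tf_idf(docs)

for doc, doc_scores in zip(docs, scores):
    print(doc, {w: round(s, 3) for w, s in doc_scores.items()})
```

In this corpus "the" appears in every document, so its score is zero, while "dog" appears in only one document and receives the highest score there. This is the highlighting effect: words that are distinctive to a document are weighted up, and words common to all documents are weighted down.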
Limitations of Bag-of-Words
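The bag-of-words model is very simple to understand and implement
and offers a lot of flexibility for customization to your specific
text data.
However, it discards word order and context, and its vectors become
large and sparse as the vocabulary grows.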
● http://www.cs.cmu.edu/~arielpro/15381f16/slides/NLP_guest_lecture.pdf
Thank you