Topic Modeling refers to the task of identifying the topics present in a collection of documents. The model counts the words in each document and then groups them into topics by identifying patterns in which words tend to appear together. For example, instead of reading through hundreds of app reviews, you can use Topic Modeling to cluster all reviews into two topics: positive or negative. Positive and negative reviews each tend to contain their own distinct set of words, and those word patterns drive the grouping.
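To make the counting-and-grouping idea concrete, here is a minimal sketch of one common topic model, Latent Dirichlet Allocation (LDA), fit with collapsed Gibbs sampling using only the Python standard library. The function names `lda_gibbs` and `top_words`, the hyperparameters (`alpha`, `beta`, `iterations`, `seed`), and the toy reviews are illustrative assumptions, not part of any library. Each word is repeatedly reassigned to a topic in proportion to how often that topic already appears in its document and how often that word already appears in the topic, so words that co-occur gradually drift into the same topic.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Hypothetical helper: collapsed Gibbs sampling for LDA.

    docs is a list of token lists. Returns (doc_topic, topic_word, vocab),
    where doc_topic[d][k] counts tokens of document d assigned to topic k
    and topic_word[k][v] counts occurrences of word v assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start by giving every token a random topic and recording the counts.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized probability of each topic given all other tokens.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, vocab

def top_words(topic_word, vocab, n=3):
    """Return the n most frequent words in each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]

# Toy reviews invented for illustration.
reviews = [
    "battery drains fast battery dies",
    "screen cracked screen broken",
    "battery charge lasts battery",
    "screen bright screen sharp",
]
docs = [r.split() for r in reviews]
doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(topic_word, vocab)):
    print(f"topic {k}: {words}")
```

The toy reviews are built around two word groups (battery and screen) so the co-occurrence signal is easy to see; with a corpus this small the sampled topics are only indicative. On real data you would more likely reach for a maintained implementation such as scikit-learn's `LatentDirichletAllocation` or gensim's `LdaModel`.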
This type of technology is common in the real world. Online libraries use Latent Dirichlet Allocation (LDA), a widely used topic modeling algorithm, to recommend books based on a person’s past reading history, while news providers use it to group similar articles together.
Like other Machine Learning techniques, NLP models can be either supervised or unsupervised. Supervised models are trained on ‘labeled’ data, while unsupervised models work with ‘unlabeled’ data. To create labeled data, a human must manually ‘label’ each message in a conversation with a topic. An unsupervised model, by contrast, analyzes the messages and lets the algorithm suggest topics on its own.
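For the supervised case, here is a rough sketch of what training on labeled messages can look like. It uses a multinomial naive Bayes classifier, one simple choice among many, which learns word frequencies for each human-assigned topic and then predicts the most likely topic for a new message. The helper names `train_naive_bayes` and `predict`, the topic names, and the four labeled messages are hypothetical, made up for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_messages):
    """Hypothetical helper: labeled_messages is a list of (text, topic) pairs."""
    word_counts = defaultdict(Counter)  # topic -> word frequencies
    topic_counts = Counter()            # topic -> number of labeled messages
    vocab = set()
    for text, topic in labeled_messages:
        words = text.lower().split()
        word_counts[topic].update(words)
        topic_counts[topic] += 1
        vocab.update(words)
    return word_counts, topic_counts, vocab

def predict(model, text):
    """Return the topic with the highest naive Bayes log-probability."""
    word_counts, topic_counts, vocab = model
    total_messages = sum(topic_counts.values())
    best_topic, best_score = None, -math.inf
    for topic, n_messages in topic_counts.items():
        # log P(topic) plus log P(word | topic) for each word, add-one smoothed.
        score = math.log(n_messages / total_messages)
        topic_total = sum(word_counts[topic].values())
        for w in text.lower().split():
            score += math.log((word_counts[topic][w] + 1) / (topic_total + len(vocab)))
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic

# Messages labeled by hand (invented examples).
labeled = [
    ("my order has not arrived yet", "shipping"),
    ("when will my package ship", "shipping"),
    ("i was charged twice on my card", "billing"),
    ("please refund the extra charge", "billing"),
]
model = train_naive_bayes(labeled)
print(predict(model, "where is my package"))  # prints "shipping"
print(predict(model, "refund my charge"))     # prints "billing"
```

The cost of this approach is the labeling step: every topic the model can predict must first be assigned by a person. An unsupervised model would receive the same messages without the topic column and would have to discover the groupings from word patterns alone.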