This blog post is based on this report and on Cornelius’ post on topic models in R.

Everyone is talking about text analysis. Is it puzzling that this data source is so popular right now? Actually, no. Most of our datasets rely on (hand-coded) textual information, and extracting, processing, and analyzing this oasis of information is becoming increasingly relevant for a wide variety of research fields. This Methods Bites Tutorial by Cosima Meyer summarizes Cornelius Puschmann’s workshop at the MZES Social Science Data Lab in January 2019 on advancing text mining with R and the package **quanteda**. The workshop guided participants through the use of quanteda and covered various classification methods, including classification with known categories (dictionaries and supervised machine learning) and with unknown categories (unsupervised machine learning).

This post was updated in December 2020 to be consistent with version 2.1.2 of quanteda. For more information on the differences between quanteda versions, have a look at this excellent overview.