7 Text data visualization by tidytext

Xiaolin Sima and Yanni Chen

As our community contribution, we created a cheatsheet on text data visualization with tidytext. The cheatsheet can be found here

7.1 Motivation for the project and the need it addresses

Text mining and natural language processing form an essential but complicated field, with many tools and methods of analysis. We found tidytext to be a useful entry point into text mining. When working with text data, the tidytext package can make many text analysis tasks easier and more effective. Much of the infrastructure needed for text mining with tidy data frames already exists in other widely used packages such as dplyr, tidyr and ggplot2. We therefore created this cheatsheet, which provides functions with examples showing how to combine tidytext with that existing infrastructure for basic text analysis.
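As a rough illustration of how tidytext fits together with dplyr and ggplot2, the sketch below tokenizes a few short reviews, counts word frequencies, tags sentiment, and builds a bar chart. It is a minimal sketch and not taken from the cheatsheet: the `reviews` data frame and its text are made up for illustration, and the choice of the "bing" lexicon is an assumption.

```r
library(dplyr)
library(tidytext)
library(ggplot2)

# Hypothetical stand-in for review text (made-up data, not the cheatsheet's dataset)
reviews <- tibble(
  id = 1:3,
  text = c(
    "Bright cherry fruit with a smooth, elegant finish.",
    "Harsh tannins and a bitter, disappointing aftertaste.",
    "Fresh citrus aromas, crisp and wonderfully balanced."
  )
)

# Tidy format: one lowercase word per row, with common stop words removed
tidy_words <- reviews %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

# Word frequency table, most frequent first
word_counts <- tidy_words %>%
  count(word, sort = TRUE)

# Sentiment counts: keep only words found in the "bing" lexicon (an assumed choice)
sentiment_counts <- tidy_words %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(sentiment)

# Bar chart of the ten most frequent words
freq_plot <- word_counts %>%
  slice_max(n, n = 10, with_ties = FALSE) %>%
  ggplot(aes(x = n, y = reorder(word, n))) +
  geom_col() +
  labs(x = "Count", y = NULL)
```

Printing `freq_plot` draws the chart. Because every intermediate result is an ordinary data frame, the usual dplyr verbs and ggplot2 layers apply without any conversion step.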

7.2 Own evaluation of the project

By creating the cheatsheet, we not only learned how to use the tidytext package for word frequency and sentiment analysis, but also became familiar with combining tidytext with other data visualization tools, such as ggplot2 and wordcloud. The cheatsheet should be useful to anyone who wants to visualize text analysis results in R, specifically word frequency and sentiment analysis.

However, there is also room for improvement. Adding more examples beyond the wine tasting reviews would make the cheatsheet more general. Also, given more room, a deeper sentiment analysis of each category could be included for further analysis.