Recently I wanted to create a wordcloud of my tweets and do some further analysis. In this post I am going to show you how to connect to Twitter from R and how to make a wordcloud from your tweets. To follow this tutorial, you need a Twitter account.
First steps in R
Install required libraries twitteR and wordcloud and load them.
install.packages(c("wordcloud", "twitteR"))
library(twitteR)
library(wordcloud)
Create a Twitter app
To authenticate your API requests with the R package twitteR, you first need to create a Twitter App at https://apps.twitter.com/, which provides the credentials for that. Click “Create New App” and fill in the required fields with your values.
- Name: choose a name for your app. Unfortunately, it has to be unique. Most combinations of R and Twitter I could think of were already taken, so I just took veRenaTweeteR 😉
- Description: Some description.
- Website: They want you to provide a website URL, e.g. where your app can be downloaded. Since I don’t plan to “publish” my app in any way, I just put my blog address.
- Callback URL: You have to put http://127.0.0.1:1410 to be redirected after authentication.
Once you have successfully created your app, go to Keys and Access Tokens. There you will find the consumer key and consumer secret that you need to authenticate in R.
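Because the consumer key and secret are credentials, you may prefer not to paste them into a script that you might share later. Here is a minimal sketch, assuming you keep them in two environment variables. The variable names TWITTER_CONSUMER_KEY and TWITTER_CONSUMER_SECRET and the helper read_credential are my own invention for this example and not part of twitteR.

```r
# Hypothetical helper: read one credential from an environment variable
# and fail early with a readable message if it is not set.
read_credential <- function(var_name) {
  value <- Sys.getenv(var_name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable ", var_name, " is not set", call. = FALSE)
  }
  value
}

# Placeholder values so that this sketch runs on its own. In practice you
# would set the variables outside of your script, e.g. in your ~/.Renviron file.
Sys.setenv(TWITTER_CONSUMER_KEY = "your_twitter_key",
           TWITTER_CONSUMER_SECRET = "your_twitter_secret")

twitter_key <- read_credential("TWITTER_CONSUMER_KEY")
twitter_secret <- read_credential("TWITTER_CONSUMER_SECRET")
```

The two resulting variables can then be passed to setup_twitter_oauth in the same way as hard-coded strings.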
Authenticating and first steps with twitteR
Save the keys from your Twitter App.
twitter_key<-"your_twitter_key"
twitter_secret<-"your_twitter_secret"
oauth<-setup_twitter_oauth(twitter_key, twitter_secret)
After this, a browser window will open and ask you to log in with your Twitter account (unless you are already logged in) and to grant permissions to yourAppName. If you set the callback URL correctly, a confirmation message will appear in the browser once authentication is complete.
With the following command we get the 100 newest tweets of user "ExpectAPatronum" (which is me), but you can do it for other users as well. The second line will display the structure of the newest tweet.
myTweets<-userTimeline("ExpectAPatronum", n=100)
str(myTweets[[1]])
A tweet contains lots of information (from statusSource we can even tell I sent it using the iPhone app!).
Reference class 'status' [package "twitteR"] with 17 fields
$ text : chr "Don't agree with everything but still funny! https://t.co/2bMYBDkfGY"
$ favorited : logi FALSE
$ favoriteCount: num 0
$ replyToSN : chr(0)
$ created : POSIXct[1:1], format: "2016-01-18 07:21:31"
$ truncated : logi FALSE
$ replyToSID : chr(0)
$ id : chr "688984546289790976"
$ replyToUID : chr(0)
$ statusSource : chr "Twitter for iPhone"
$ screenName : chr "ExpectAPatronum"
$ retweetCount : num 0
$ isRetweet : logi FALSE
$ retweeted : logi FALSE
$ longitude : chr(0)
$ latitude : chr(0)
$ urls :'data.frame': 1 obs. of 5 variables:
..$ url : chr "https://t.co/2bMYBDkfGY"
..$ expanded_url: chr "https://twitter.com/jennybryan/status/688866722980364289"
..$ display_url : chr "twitter.com/jennybryan/sta…""| __truncated__
..$ start_index : num 45
..$ stop_index : num 68
and 53 methods, of which 39 are possibly relevant:
getCreated, getFavoriteCount, getFavorited, getId, getIsRetweet, getLatitude,
getLongitude, getReplyToSID, getReplyToSN, getReplyToUID, getRetweetCount,
getRetweeted, getRetweeters, getRetweets, getScreenName, getStatusSource, getText,
getTruncated, getUrls, initialize, setCreated, setFavoriteCount, setFavorited, setId,
setIsRetweet, setLatitude, setLongitude, setReplyToSID, setReplyToSN, setReplyToUID,
setRetweetCount, setRetweeted, setScreenName, setStatusSource, setText, setTruncated,
setUrls, toDataFrame, toDataFrame#twitterObj
Creating the wordcloud
With the following code I created the first wordcloud:
set.seed(1234) # to always get the same wordcloud and for better reproducibility
tweetTexts<-unlist(lapply(myTweets, function(t) { t$text})) # to extract only the text of each status object
words<-unlist(strsplit(tweetTexts, " "))
words<-tolower(words)
clean_words<-words[!grepl("http|@|#|ü|ä|ö", words)] # remove urls, usernames, hashtags and umlauts (the latter cannot be displayed by all fonts); unlike -grep(), !grepl() also works when nothing matches
wordcloud(clean_words, min.freq=2)
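To check which words will actually make it past min.freq, it can help to count them yourself before plotting. The sketch below uses a few made-up tweets (sample_tweets is invented for illustration, not my real timeline) and base R only. It also shows a caveat of filtering by pattern: negative indexing with -grep() returns an empty vector when the pattern matches nothing, while !grepl() keeps all words in that case.

```r
# Made-up example tweets, only for illustration
sample_tweets <- c("Learning R is fun http://example.com",
                   "R is great for plotting #rstats",
                   "@someone plotting in R is fun")

words <- tolower(unlist(strsplit(sample_tweets, " ")))

# grepl() returns a logical vector, so negating it keeps every word
# that does not match the pattern
clean_words <- words[!grepl("http|@|#", words)]

# Count how often each word occurs, most frequent first
word_freq <- sort(table(clean_words), decreasing = TRUE)
print(word_freq)

# Words that would be kept with min.freq=2
frequent_words <- names(word_freq)[as.vector(word_freq) >= 2]
print(frequent_words)

# The caveat: a pattern that matches nothing
no_match_negative <- words[-grep("xyz123", words)]  # empty: -integer(0) selects nothing
no_match_logical <- words[!grepl("xyz123", words)]  # all words are kept
```

wordcloud also accepts such counts directly through its words and freq parameters, so the table can be reused for plotting.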
Making it look nicer
Since I liked neither the default font nor the ones suggested in the examples section of the package, I started to look for other fonts. From the help I found out that wordcloud passes any additional parameters on to text {graphics}, so you can set the parameter vfont, which text accepts. vfont selects one of the Hershey fonts (8 font families, some of them with several faces like bold, italic, ...).
Playing around with that a little I generated a few more wordclouds.
wordcloud(clean_words, min.freq=2, vfont=c("serif", "plain"))
wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"))
wordcloud(clean_words, min.freq=2, vfont=c("gothic italian", "plain"))
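If you want to compare the Hershey typefaces before deciding on one, you can draw their names with text directly. This is only a small sketch using base graphics; the typeface names are the ones listed in ?Hershey, and I use the "plain" face for all of them.

```r
# The eight Hershey typefaces listed in ?Hershey
typefaces <- c("serif", "sans serif", "script",
               "gothic english", "gothic german", "gothic italian",
               "serif symbol", "sans serif symbol")

# pdf(NULL) opens a graphics device that writes no file. Remove this line
# and dev.off() to see the plot in an interactive session.
pdf(NULL)
plot.new()
plot.window(xlim = c(0, 1), ylim = c(0, length(typefaces) + 1))
for (i in seq_along(typefaces)) {
  # Draw the name of each typeface in that typeface
  text(0.5, i, labels = typefaces[i], vfont = c(typefaces[i], "plain"), cex = 1.5)
}
dev.off()
```

The two symbol typefaces render the label as symbols rather than Latin letters, so they are probably less useful for a wordcloud of tweets.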
Another important ingredient of a nice wordcloud is the font color. wordcloud uses the package RColorBrewer for that (which is automatically installed with wordcloud).
The package RColorBrewer provides several palettes of colors that look nice together. I chose the palette "Pastel1" with 7 colors (minimum is 3, maximum depends on the palette). Of course you can use par to change other settings of the plot.
pal<-brewer.pal(7, "Pastel1")
par(bg="darkgray")
wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"), colors=pal)
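To find out how many colors a palette offers, or which palettes exist at all, you can look at the data frame brewer.pal.info that comes with RColorBrewer. A short sketch:

```r
library(RColorBrewer)

# brewer.pal.info has one row per palette
max_pastel <- brewer.pal.info["Pastel1", "maxcolors"]
print(max_pastel)

# Names of all qualitative palettes (unordered categories)
qualitative <- rownames(brewer.pal.info)[brewer.pal.info$category == "qual"]
print(qualitative)

# A palette is just a character vector of hex color codes
pal <- brewer.pal(7, "Pastel1")
print(pal)
```

In an interactive session, display.brewer.all() plots all palettes next to each other, which makes choosing one easier.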
Other settings
As already seen, you can change the font (vfont) and the color (colors) of the wordcloud. There are a lot more settings in wordcloud:
- words
- freq
- scale (=c(4,.5)): range of the size of the words (largest and smallest)
- min.freq (=3): the minimum frequency of a word to be included. I always set it to at least 2.
- max.words (=Inf): maximum number of words in the wordcloud
- random.order (=TRUE): plot words in random order; if FALSE, words are plotted in decreasing frequency
- random.color (=FALSE)
- rot.per (=.1): proportion of words that are rotated by 90 degrees
- colors (= "black")
- ordered.colors (= FALSE)
- use.r.layout (=FALSE)
- fixed.asp (=TRUE)
- ...: any parameter that can be passed to text (e.g. vfont)
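Several of these settings can be combined in one call. The following is only a sketch with made-up words and counts (freq_table is invented for illustration), passing the frequencies explicitly through words and freq instead of letting wordcloud count them:

```r
library(wordcloud)
library(RColorBrewer)

# Made-up words and counts, only for illustration
freq_table <- c(twitter = 12, wordcloud = 9, tweets = 7, data = 5,
                plot = 4, font = 3, color = 3, package = 2)

set.seed(1234)
wordcloud(words = names(freq_table), freq = freq_table,
          scale = c(3, 0.5),      # size of the largest and smallest word
          min.freq = 2,           # drop words that occur less than twice
          max.words = 50,         # plot at most 50 words
          random.order = FALSE,   # plot words in decreasing frequency
          rot.per = 0.25,         # proportion of words rotated by 90 degrees
          colors = brewer.pal(6, "Dark2"))
```

With real tweets you would build the counts from your cleaned words instead of typing them in by hand.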
Source code
library(wordcloud)
library(twitteR)
install.packages("extrafont")
library(extrafont)
font_import()
twitter_key<-"your_key"
twitter_secret<-"your_secret"
oauth<-setup_twitter_oauth(twitter_key, twitter_secret)
myTweets<-userTimeline("ExpectAPatronum", n=100)
str(myTweets[[1]])
tweetTexts<-unlist(lapply(myTweets, function(t) { t$text}))
#### wordcloud
set.seed(1234)
words<-unlist(strsplit(tweetTexts, " "))
words<-tolower(words)
length(grep("http", words))
length(grep("@", words))
length(grep("#", words))
clean_words<-words[!grepl("http|@|#|ü|ä|ö", words)] # !grepl() also works when nothing matches
wordcloud(clean_words, min.freq=2)
#### playing with the settings
wordcloud(clean_words, min.freq=2, vfont=c("serif", "plain"))
wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"))
wordcloud(clean_words, min.freq=2, vfont=c("gothic italian", "plain"))
pal<-brewer.pal(7, "Pastel1")
par(bg="darkgray")
wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"), colors=pal)
#### feature image
pal<-brewer.pal(7, "Dark2")
par(bg="lightgray")
wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"), colors=pal)
Hey Verena,
First of all, thanks for the great tutorial. It is exactly what I was looking for.
Unfortunately, I always get an error at:
tweetTexts<-unlist(lapply(myTweets, function(t) { t$text}))
This is the error text:
Error in t$text : $ operator is invalid for atomic vectors
Do you know what might be causing this?
Best regards,
Clemens
Hello Clemens!
I just tried the code again and it works for me. At first I was worried that the package or the Twitter API might have changed. Do you get tweets back, or is the list myTweets empty? What is the output of str(myTweets[[1]])?
Best,
Verena
Thank you so much for this enlightening post! You have no idea how much time I spent looking for a possibility to change the font 😀 and how much happiness you gave me when I stumbled across your post. I’m just getting started in R and your posts are really great to get a glimpse of what goes on behind the code.
So thanks!
In terms of colour, I can very much recommend the wesanderson package that provides colour palettes that are dominant in his movies 😉