{"id":248,"date":"2012-09-11T22:48:07","date_gmt":"2012-09-11T22:48:07","guid":{"rendered":"http:\/\/blog.fellstat.com\/?p=248"},"modified":"2012-09-11T22:48:07","modified_gmt":"2012-09-11T22:48:07","slug":"wordcloud-makes-words-less-cloudy","status":"publish","type":"post","link":"https:\/\/blog.fellstat.com\/?p=248","title":{"rendered":"wordcloud makes words less cloudy"},"content":{"rendered":"<p>An update to the wordcloud package (2.2) has been released to CRAN. It includes a number of improvements to the basic wordcloud, notably that you may now pass it text and Corpus objects directly, as in:<\/p>\n<pre>#install.packages(c(\"wordcloud\",\"tm\"),repos=\"http:\/\/cran.r-project.org\")\nlibrary(wordcloud)\nlibrary(tm)<\/pre>\n<pre>\nwordcloud(\"May our children and our children's children to a \nthousand generations, continue to enjoy the benefits conferred \nupon us by a united country, and have cause yet to rejoice under \nthose glorious institutions bequeathed us by Washington and his \ncompeers.\",colors=brewer.pal(6,\"Dark2\"),random.order=FALSE)<\/pre>\n<p><a href=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/09\/blog_linc.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-250\" title=\"blog_linc\" src=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/09\/blog_linc-300x264.png\" alt=\"\" width=\"300\" height=\"264\" \/><\/a><\/p>\n<pre>data(SOTU)\nSOTU &lt;- tm_map(SOTU,function(x)removeWords(tolower(x),stopwords()))\nwordcloud(SOTU, colors=brewer.pal(6,\"Dark2\"),random.order=FALSE)<\/pre>\n<p><a href=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/09\/blog_corp.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-251\" title=\"blog_corp\" src=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/09\/blog_corp-300x264.png\" alt=\"\" width=\"300\" height=\"264\" \/><\/a><\/p>\n<p>The biggest improvement in this version, though, is a 
way to make your text plots more readable. A very common type of plot is a scatterplot in which case labels are plotted instead of points. This is accomplished with the text function in base R. Here is a simple artificial example:<\/p>\n<pre>\nstates &lt;- c('Alabama', 'Alaska', 'Arizona', 'Arkansas', \n\t'California', 'Colorado', 'Connecticut', 'Delaware', \n\t'Florida', 'Georgia', 'Hawaii', 'Idaho', 'Illinois',\n\t'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana',\n\t'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota',\n\t'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada', \n\t'New Hampshire', 'New Jersey', 'New Mexico', 'New York', \n\t'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Oregon', \n\t'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota', \n\t'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington', \n\t'West Virginia', 'Wisconsin', 'Wyoming')\n\nlibrary(mvtnorm) # provides rmvnorm()\nloc &lt;- rmvnorm(50,c(0,0),matrix(c(1,.7,.7,1),ncol=2))\n\nplot(loc[,1],loc[,2],type=\"n\")\ntext(loc[,1],loc[,2],states)<\/pre>\n<p><a href=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/09\/blog_text1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-252\" title=\"blog_text1\" src=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/09\/blog_text1-1024x901.png\" alt=\"\" width=\"550\" height=\"483\" \/><\/a><\/p>\n<p>Notice how many of the state names are unreadable due to overplotting, giving the scatterplot a cloudy appearance. 
The textplot function in wordcloud lets us plot the text without any of the words overlapping.<\/p>\n<pre>\ntextplot(loc[,1],loc[,2],states)<\/pre>\n<p><a href=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/09\/blog_text2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-253\" title=\"blog_text2\" src=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/09\/blog_text2-1024x901.png\" alt=\"\" width=\"550\" height=\"483\" \/><\/a><\/p>\n<p>A big improvement! The only remaining problem is that some of the state names are only partially visible in the plot. This can be fixed by setting x and y limits, which will cause the layout algorithm to stay in bounds.<\/p>\n<pre>mx &lt;- apply(loc,2,max)\nmn &lt;- apply(loc,2,min)\ntextplot(loc[,1],loc[,2],states,xlim=c(mn[1],mx[1]),ylim=c(mn[2],mx[2]))<\/pre>\n<p><a href=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/09\/blog_text3.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-255\" title=\"blog_text3\" src=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/09\/blog_text3-1024x901.png\" alt=\"\" width=\"550\" height=\"483\" \/><\/a><\/p>\n<p>Another great thing about this release is that the layout algorithm has been exposed so you can create your own beautiful custom plots. 
Just pass your desired coordinates (and word sizes) to wordlayout, and it will return bounding boxes close to the originals, but with no overlapping.<\/p>\n<pre>plot(loc[,1],loc[,2],type=\"n\")\nnc &lt;- wordlayout(loc[,1],loc[,2],states,cex=50:1\/20)\ntext(nc[,1] + .5*nc[,3],nc[,2]+.5*nc[,4],states,cex=50:1\/20)<\/pre>\n<p><a href=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/09\/blog_text4.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-256\" title=\"blog_text4\" src=\"http:\/\/blog.fellstat.com\/wp-content\/uploads\/2012\/09\/blog_text4-1024x901.png\" alt=\"\" width=\"550\" height=\"483\" \/><\/a><\/p>\n<p>Okay, so this one wasn&#8217;t very creative, but it invites some further thought. Now we have word clouds where not only the size can mean something, but also the x\/y position (roughly) and color. Done right, this could add a whole new layer of statistical richness to the visually pleasing but statistically shallow standard wordcloud.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>An update to the wordcloud package (2.2) has been released to CRAN. It includes a number of improvements to the basic wordcloud, notably that you may now pass it text and Corpus objects directly, 
as in: #install.packages(c(&#8220;wordcloud&#8221;,&#8221;tm&#8221;),repos=&#8221;http:\/\/cran.r-project.org&#8221;) library(wordcloud) library(tm) wordcloud(&#8220;May our children and our children&#8217;s children to a thousand generations, continue to enjoy the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,11],"tags":[],"class_list":["post-248","post","type-post","status-publish","format-standard","hentry","category-r","category-wordcloud"],"_links":{"self":[{"href":"https:\/\/blog.fellstat.com\/index.php?rest_route=\/wp\/v2\/posts\/248","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.fellstat.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.fellstat.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.fellstat.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.fellstat.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=248"}],"version-history":[{"count":0,"href":"https:\/\/blog.fellstat.com\/index.php?rest_route=\/wp\/v2\/posts\/248\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.fellstat.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=248"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.fellstat.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=248"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.fellstat.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=248"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}