Fells Stats Know Your Data


The OpenStreetMap Package Opens Up

A new version of the OpenStreetMap package is now up on CRAN, and should propagate to all the mirrors in the next few days. The primary purpose of the package is to provide high resolution map/satellite imagery for use in your R plots. The package supports base graphics and ggplot2, as well as transformations between spatial coordinate systems.

The bigest change in the new version is the addition of dozens of tile servers, giving the user the option of many different map looks, including those from Bing, MapQuest and Apple.

nm = c("osm", "maptoolkit-topo",
"waze", "mapquest", "mapquest-aerial",
"bing", "stamen-toner", "stamen-terrain",
"stamen-watercolor", "osm-german", "osm-wanderreitkarte",
"mapbox", "esri", "esri-topo",
"nps", "apple-iphoto", "skobbler",
 "opencyclemap", "osm-transport",
"osm-public-transport", "osm-bbike", "osm-bbike-german")

for(i in 1:length(nm)){
	map = openmap(c(lat= 38.05025395161289,   lon= -123.03314208984375),
			c(lat= 36.36822190085111,   lon= -120.69580078125),

You can also render maps from cloudmade.com which hosts tons of map tilings. Simply set the "type" parameter to "cloudmade-<id>" where <id> is the cloudmade identifier for the map you want to use. Here is a sample:

nm = c("cloudmade-2","cloudmade-999","cloudmade-998",
#setCloudMadeKey("<your key>")
for(i in 1:length(nm)){
	map = openmap(c(lat= 38.05025395161289,   lon= -123.03314208984375),
			c(lat= 36.36822190085111,   lon= -120.69580078125),

Maps are initially put in a sperical mercator projection which is the standard for most (all?) map tiling systems, but can easily be translated to long-lat (or any other projection) using the openproj function. Maps can be plotted in ggplot2 using the autoplot function.

mapLatLon = openproj(map)

The package also has a Java GUI to help with map type selection, and specification of coordinates to bound your map. clicking on the map will give you the latitude and longitude of the point clicked.


Probably the main alternative to OpenStreetMap is the ggmap package. ggmap is an excellent package, and it is somewhat unfortunate that there is a significant duplication of effort between it and OpenStreetMap. That said, there are some differences that may help you decide which to use:

Reasons to favor OpenStreetMap:

  • More maps: OpenStreetMap supports more map types.
  • Better image resolution: ggmap only fetches one png from the server, and thus is limited to the resolution of that png, whereas OpenStreetMap can download many map tiles and stich them together to get an arbitrarily high image resolution.
  • Transformations: OpenStreetMap can be used with any map coordinate system, whereas ggmap is limited to long-lat.
  • Base graphics: Both packages support ggplot2, but OpenStreetMap also supports base graphics.
Reasons to favor ggmap:
  • No Java dependency: ggmap does not require Java to be installed.
  • Geocoding: ggmap has functions to do reverse geo coding.
  • Google maps: While OpenStreetMap has more map types, it currently does not support google maps.





Interview by DecisionStats

Ajay Ohri interviewed
on his popular DecisionStats blog. Topics discussed
ranged widely from Fellows
, to Deducer, to statnet, to Poker A.I.,
to Big Data.    

Filed under: Deducer, R No Comments

Quickly Profiling Compiled Code within R on the Mac

This is a quick note on profiling your compiled code on the mac. It is important not to guess when figuring out where the bottlenecks in your code are, and for this reason, the R manual has several suggestions on how to profile compiled code running within R. All of the methods are platform dependent, with linux requiring command line tools (sprof, and oprofile). On the mac there are some convenient GUI tools built in to help you out.

Our test code is drawn from the ergm package, and fits a network model to a small graph. This model is fit using markov chain monte carlo, so we expect most of the time to be spent drawing MCMC samples.

flomarriage = network(flo,directed=FALSE)
flomarriage %v% "wealth" = c(10,36,27,146,55,44,20,8,42,103,48,49,10,48,32,3)
for(i in 1:10) gest = ergm(flomarriage ~ kstar(1:2) + absdiff("wealth") + triangle)

The quickest way to get an idea of where your program is spending time is to open Activity Monitor.app and select the R process while the above code is running. Then click sample, which should result in something like:

In order to get down to the interesting code, you'll have to wade though a bunch of recursive Rf_eval and do_begin calls used by R. Finally though we see the MCMCSample function in ergm.so which contains almost all of the samples. So it is as we expected, the MCMC sampling takes up all the time. Underneeth that we see calls to:

  • ChangeStats: calculates the network statistics
  • MH_TNT: calculates the metropolis proposal
  • ToggleEdge: alters the network to add (or remove) an edge

But how much these contribute is a bit difficult to see here. Enter the Time Profiler in Instruments.app. You can launch this program from within Xcode.app under Xcode - Open Developer Tool - Instruments. Once Instruments is open, click Library and drag the Time Profiler into the program. Then, while the R code is running, select 'Attach to Process' - R and click record. You should get something like:
Like in the Activity Monitor, we have to drill down past all of the recursive Rf_eval calls, but once we do, we see that ChangeStats is taking 44.6% of the time, MH_TNT takes 30.9% and ToggleEdges takes 11.7%. We could further expand these out to identify possible targets for optimization within each of these functions.

So there you have it. Apple makes it really easy to profile your compiled code. One issue that I ran into is to make sure that your Xcode is up-to-date and that Instruments is version 4.0 or above. I experienced some crashes profiling R with older versions.

Filed under: R No Comments