All the Things You Can Do With Coordinates

You’re looking at a new dataset, and notice that observations have latitude and longitude coordinates. “I can plot these, but what else can I do?”, you think to yourself.

Well, fear not! The sheer amount of free information you can get from nothing but a latitude and longitude coordinate is astonishing. This is possible thanks to reverse geocoding, the process of converting latitude and longitude coordinates into another geographic unit, such as an address, a named place, or an administrative area. For example, the lng/lat coordinates (-73.9857, 40.7484) reverse geocode to:

  • The Empire State Building
  • 20 W 34th St, New York, New York County, NY 10001
  • The individual parts of the above address: “Address line”, “City”, “County”, “State/Territory”, and “Zip Code”
  • Census Tract 76, New York County, New York

Census Data

I mentioned that the Empire State Building lies in Census Tract 76, New York County, New York. What does this mean?

The United States Census Bureau conducts a census in every year ending in 0. It has a constitutional mandate to count every person in the United States, regardless of age, citizenship, or housing status. The Census Bureau also runs the American Community Survey (ACS) every year. The ACS collects an expanded set of variables from a sample of households and publishes estimates down to the tract level through its 5-year estimates. The text of the 2018 American Community Survey (PDF warning) is available here.

A census tract is a small statistical area, usually home to between 1,200 and 8,000 people (with an optimum of about 4,000), drawn to be as homogeneous as possible in population characteristics. Each tract lies within a single county in a single state, but tracts ignore zip code and city boundaries.

You can use the tidycensus R package to download polygon geometries for census tracts, counties, and states, along with any demographic variables that the Census Bureau provides. To determine which tract each of your coordinates falls in, use the sf package, which performs spatial joins. A spatial join between a set of points and a set of polygons answers the question “which polygon(s) contain each point?” If your polygons do not overlap (which is the case for census tracts), each point will match at most one polygon, apart from points lying exactly on a shared boundary. Zero matches means the point isn’t in any census tract, which can happen if you accidentally have coordinates in Canada.
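The post does this with sf in R. To make the idea concrete, here is a minimal, language-neutral sketch of what a point-in-polygon spatial join computes, written in plain Python with no GIS library. It assumes a few simplifications. The two square "tracts" and their IDs (`tract_A`, `tract_B`) are hypothetical. Coordinates are treated as flat (x, y) pairs, which is tolerable for small polygons. The sketch does not handle points lying exactly on an edge specially. Real tools like sf also use spatial indexes, so they avoid testing every point against every polygon.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (lng, lat), treated as planar x, y
Polygon = List[Point]        # ring of vertices; the last vertex connects back to the first


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray casting: toggle 'inside' each time a ray going right from pt crosses an edge."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points: Dict[str, Point],
                 polygons: Dict[str, Polygon]) -> Dict[str, Optional[str]]:
    """Map each point ID to the ID of the first polygon containing it, or None."""
    return {
        pid: next((gid for gid, poly in polygons.items() if point_in_polygon(pt, poly)), None)
        for pid, pt in points.items()
    }


# Hypothetical, non-overlapping "tracts" and points, for illustration only.
TRACTS = {
    "tract_A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "tract_B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
POINTS = {"p1": (0.5, 0.5), "p2": (1.5, 0.25), "p3": (5.0, 5.0)}

print(spatial_join(POINTS, TRACTS))  # {'p1': 'tract_A', 'p2': 'tract_B', 'p3': None}
```

Point `p3` matches nothing, which is the "coordinates in Canada" case: it lies outside every polygon you supplied.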

Interesting demographic questions you can ask are:

  • What percent of households in the tract containing this coordinate are married-couple households?
  • What percent of people are renters?
  • How many households in the corresponding tract own at least one car?
  • How many people in the corresponding tract work in the manufacturing sector?
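Once every point carries a tract ID, answering questions like these is a lookup followed by a ratio. Below is a minimal sketch of the "renters" question. It approximates "percent of people who rent" with the share of occupied households that are renter-occupied. The tract IDs and counts are hypothetical numbers made up for illustration. In practice they would come from an ACS table downloaded per tract, and the point-to-tract mapping would come from a spatial join.

```python
from typing import Dict, Optional

# Hypothetical tract-level household counts, made up for illustration only.
TRACT_HOUSEHOLDS: Dict[str, Dict[str, int]] = {
    "tract_A": {"occupied": 1200, "renter_occupied": 900},
    "tract_B": {"occupied": 800, "renter_occupied": 200},
}


def percent_renters(tract_id: Optional[str]) -> Optional[float]:
    """Percent of occupied households that rent, or None if the tract is unknown or empty."""
    if tract_id is None or tract_id not in TRACT_HOUSEHOLDS:
        return None
    counts = TRACT_HOUSEHOLDS[tract_id]
    if counts["occupied"] == 0:
        return None
    return 100.0 * counts["renter_occupied"] / counts["occupied"]


# Hypothetical result of a point-to-tract spatial join.
point_tracts = {"p1": "tract_A", "p2": "tract_B", "p3": None}
for pid, tract in point_tracts.items():
    print(pid, tract, percent_renters(tract))
```

Every point in the same tract receives the same value. That is exactly why the ecological-fallacy caveat applies when interpreting these numbers.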

Caveats

I’m a statistician by trade, so of course there are going to be caveats!

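  • Census data is amazing for characterizing areas, but awful for characterizing individual people, because of the ecological fallacy. Knowing that 70% of households in a tract rent tells you nothing certain about whether the person at a particular coordinate rents.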

  • If you don’t have many points spread across many tracts, most tracts will contain only a handful of observations, and any tract-level inference will lack statistical power. In that case, aggregate your analysis to larger geographic units such as counties or states.
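Moving up to counties or states does not require a second spatial join. A census tract GEOID is an 11-digit code: a 2-digit state FIPS code, then a 3-digit county code, then a 6-digit tract code. Its prefixes therefore identify the enclosing county and state. The sketch below counts points per unit at each level. The GEOIDs are illustrative: `36061007600` follows this convention for Census Tract 76, New York County (state 36, county 061), and the others are example values. Keep GEOIDs as strings, because state codes can start with a zero (Alabama is 01).

```python
from collections import Counter
from typing import Dict, List


def geoid_parts(tract_geoid: str) -> Dict[str, str]:
    """Split an 11-digit tract GEOID into its state, county, and tract identifiers."""
    if len(tract_geoid) != 11 or not tract_geoid.isdigit():
        raise ValueError(f"expected an 11-digit tract GEOID, got {tract_geoid!r}")
    return {
        "state": tract_geoid[:2],   # state FIPS
        "county": tract_geoid[:5],  # state FIPS + county code
        "tract": tract_geoid,       # full tract GEOID
    }


def counts_by_level(tract_geoids: List[str], level: str) -> Counter:
    """Count observations per geographic unit at 'state', 'county', or 'tract' level."""
    return Counter(geoid_parts(g)[level] for g in tract_geoids)


# Illustrative tract GEOIDs, one per observed point.
point_geoids = ["36061007600", "36061007600", "36061010100", "36047000100"]

print(counts_by_level(point_geoids, "tract"))
print(counts_by_level(point_geoids, "county"))  # Counter({'36061': 3, '36047': 1})
print(counts_by_level(point_geoids, "state"))
```

If most tracts hold one or two points but counties hold dozens, that is a sign the county level is the more defensible unit of analysis.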

Historical Analysis

Do you have data in Turkey and want to know where in the Byzantine Empire it took place? Or data from 1980s Czechoslovakia, which no longer exists? Then you can reverse geocode against historical shapefiles.

Other Geography

There are countless kinds of territories and buildings, and you can reverse geocode to practically any of them via spatial joining as long as you have a shapefile for the given geography.

Figure 1: Image recognition might be hard, but reverse geocoding in 2018 is a breeze. (Source: https://xkcd.com/1425/)

Some interesting public shapefiles are:

Fictional Worlds

If you’re interested in worldbuilding and fictional places, you can even find shapefiles for those!

Caveats

One potential issue is that if you aren’t looking at public boundaries or natural geographic features, you will probably have to shell out a lot of money for the shapefiles you need. For example, Starbucks is unlikely to publish shapefiles for the footprint of every Starbucks in the world. But if a “good enough” solution works for you and you can get each location’s center coordinates, you can draw a circle of a given radius around each center point. That lets you ask questions like “Did this activity take place within half a mile of a Starbucks?”
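With only center coordinates, the “good enough” approach reduces to a distance check. Here is a minimal sketch using the haversine great-circle distance. It assumes a spherical Earth with mean radius 3958.8 miles, which is close enough for half-mile questions. The activity point reuses the Empire State Building coordinates from the start of the post. The two “store” coordinates are hypothetical points placed due north of it, not real Starbucks locations.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; treating Earth as a sphere is an approximation

Coord = Tuple[float, float]  # (lng, lat) in degrees, the same order used in the post


def haversine_miles(a: Coord, b: Coord) -> float:
    """Great-circle distance in miles between two (lng, lat) points."""
    lng1, lat1 = map(math.radians, a)
    lng2, lat2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2
    # min() guards against floating-point values slightly above 1 for near-antipodal points
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(h)))


def near_any(point: Coord, centers: Iterable[Coord], radius_miles: float) -> bool:
    """True if point lies within radius_miles of at least one center."""
    return any(haversine_miles(point, c) <= radius_miles for c in centers)


activity = (-73.9857, 40.7484)              # Empire State Building, from the post
hypothetical_stores = [(-73.9857, 40.7534),  # ~0.35 miles north (made up)
                       (-73.9857, 40.7684)]  # ~1.38 miles north (made up)

print(near_any(activity, hypothetical_stores, 0.5))      # True
print(near_any(activity, hypothetical_stores[1:], 0.5))  # False
```

A rectangular lat/lng bounding box is cheaper to test than a circle, but its corners reach farther than the radius. A common pattern is to prefilter candidates with a box and then confirm with the exact distance.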