Friday, November 7, 2014

Geocoding Frac Sand Mines in Wisconsin

Introduction:

Geocoding is the process of taking a description of a location, such as an address or a place name, and assigning it a position on the earth's surface.  It can be performed by manually entering one address at a time or by running a whole table of addresses through a geocoder.  In this portion of the project, information on all frac sand mines in Wisconsin was gathered from the Wisconsin Department of Natural Resources (DNR).  Each member of the class was then assigned around fifteen to twenty mines to geocode.  The assignments overlapped, so the same mines were geocoded by more than one person.  The objective of having multiple people geocode the same mines was to see how much error there was in the geocoding process.  Such error can come from mines that lack sufficient location information, which means some guessing is involved in locating them.

Methods:

The first step in the process was viewing the table of mine data provided by the Wisconsin DNR (Figure 1).  It was clear that the table was not ready to be run through the geocoder: geocoders require normalized table data to run correctly, and the table provided was almost entirely unnormalized.  At this point table normalization was begun.  The fields that had to be normalized for the geocoder to run accurately were address, city, and zip code.  The key was making sure that every mine had an accurate address listed.  This was difficult because most of the address fields combined an address, a city, and the Public Land Survey System (PLSS) description in a single entry.  This information had to be split into an address field, a city field, and a PLSS field.  Unfortunately, several records came with no address at all (perhaps the most crucial part of geocoding in this case) and had only PLSS information.  To locate these mines and record their addresses, a combination of PLSS data and aerial imagery had to be used along with the geocoder.  Getting the address was simply a matter of clicking on the map in ArcMap where the mine appeared to be in the aerial imagery.  This, however, isn't an exact process and could be a cause of the differences between the locations the class geocoded for the same mines.

This is the data as given by the DNR, reformatted slightly so that the PLSS descriptions and street addresses are split into separate fields.  Some of the mines didn't come with address or even city data.  (Figure 1)
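The field splitting described above can be sketched in Python.  This is only a rough illustration, not the method actually used for the lab: the comma-separated layout, the sample strings, and the names `PLSS_PATTERN` and `split_location` are all assumptions, since the real layout of the DNR table may differ.  It treats any comma-separated piece containing a township (like "T23N") and a range (like "R7W") as the PLSS description, and takes the remaining pieces as address and city.

```python
import re

# Assumption: a PLSS description contains a township such as "T23N" followed
# by a range such as "R7W". The real DNR entries may be laid out differently.
PLSS_PATTERN = re.compile(r"\bT\s*\d+\s*N\b.*\bR\s*\d+\s*[EW]\b", re.IGNORECASE)


def split_location(raw):
    """Split a combined location string into address, city, and PLSS parts."""
    # Break the entry on commas and drop empty pieces.
    parts = [p.strip() for p in raw.split(",") if p.strip()]
    # Pieces that look like PLSS go to one field; everything else is
    # assumed to be ordered as address first, then city.
    plss = [p for p in parts if PLSS_PATTERN.search(p)]
    rest = [p for p in parts if not PLSS_PATTERN.search(p)]
    return {
        "address": rest[0] if len(rest) > 0 else "",
        "city": rest[1] if len(rest) > 1 else "",
        "plss": plss[0] if plss else "",
    }


# Made-up sample entries, not taken from the DNR table.
samples = [
    "W1234 County Road A, Sampleton, SEC 12 T23N R7W",
    "SEC 5 T20N R10W",
]
for raw in samples:
    print(split_location(raw))
```

A record like the second sample comes back with an empty address and city, which flags it as one of the mines that would have to be located by hand from PLSS data and aerial imagery.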

At this point the table was normalized with all fields filled in (Figure 2) and geocoding was run.  Fortunately, the results came out well with all mines successfully matched.

This is the completed, normalized table that was run through the geocoder successfully.  All of the address fields were found and filled in; the PLSS data was not needed for the geocoding itself.  (Figure 2)

Once the whole class had completed their geocoding, the instructions were to merge all of the geocoded mines into one feature class and query out the mines whose unique IDs matched those of the mines I had geocoded individually (Figure 3).  This seemed like it would be a simple process: a few tools would produce a finished product that could be compared to the individually geocoded mines in order to find the error.  However, the lack of normalization among the geocoded mine shapefiles from the various class members made this rather difficult.  Eventually several shapefiles simply had to be excluded from the merge because they differed so much from the others.

Here is the SQL query that was used with the Select tool to query out, from the class's merged shapefile, the mines whose unique IDs matched those that had already been completed.  (Figure 3)
Once the class's mines were queried out, they could be compared with the mines that had already been completed.
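A query of this kind can be written out by hand, but with fifteen to twenty IDs it is easy to mistype one.  As a hedged sketch, the clause could be built in Python instead.  The helper name `build_id_query` and the field name `Mine_ID` are hypothetical, as are all of the IDs other than 163; the double quotes around the field name assume shapefile-style field delimiters.

```python
def build_id_query(field_name, mine_ids):
    """Return an SQL WHERE clause matching any of the given mine IDs."""
    # Remove duplicates and sort so the clause is easy to read and check.
    ids = sorted({int(i) for i in mine_ids})
    if not ids:
        raise ValueError("at least one mine ID is required")
    id_list = ", ".join(str(i) for i in ids)
    # Assumption: the field name is delimited with double quotes.
    return '"{}" IN ({})'.format(field_name, id_list)


# Hypothetical field name and IDs, for illustration only.
print(build_id_query("Mine_ID", [163, 20, 87, 20]))
```

The resulting string could then be pasted into the Select tool's expression box in place of a hand-typed query.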


Results and Discussion:


(Figure 4)
The Point Distance tool was run to compare the mines I geocoded with the class's mines that had the same unique IDs.  The majority of the class's mines were located in similar positions to, if not exactly the same positions as, the ones I geocoded.  This suggests that the geocoding was done well by most of the class.  However, several mines appear much farther away than they should.  The clearest example of a poorly geocoded mine, whether by me or by one of my classmates, was the mine with the unique ID 163.  One of the other points for this mine matches mine exactly, while the other is over 165 miles away.  This one point greatly inflated the average distance between matching mines.  However, the low median distance shows that overall the mines were well placed (Figure 4).
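To show why a single bad point pulls the average up while leaving the median low, here is a small Python sketch.  It is not the Point Distance tool: it substitutes the haversine great-circle formula on latitude and longitude, and the list of distances is made up for illustration (only the idea of one point roughly 165 miles off comes from the results above).  The function names are my own.

```python
import math
import statistics

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def summarize(distances):
    """Return the mean and median of a list of distances."""
    return {
        "mean": statistics.mean(distances),
        "median": statistics.median(distances),
    }


# Made-up distances in miles: four close matches and one large outlier.
example_distances = [0.0, 0.1, 0.2, 0.3, 165.0]
print(summarize(example_distances))
```

With these made-up numbers the mean comes out above 33 miles even though four of the five points are within a third of a mile, while the median stays at 0.2 miles.  This is the same pattern seen in Figure 4.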

This relative consistency in mine placement can also be seen in Figure 5, as can the one large outlier.  The errors that are present can likely be attributed to mistakes in locating mines that had only PLSS data and to the lack of consistent normalization among the class.





In this map the mines that were geocoded by me and by the class can be seen.  Most of the mines appear to be relatively close to each other.  There is, however, one large outlier in the far eastern part of the state.  (Figure 5)


Conclusion:

Geocoding is an extremely useful way to place objects on a map when it is run correctly.  When run poorly, however, it can produce results that don't reflect real life in any way.  This is troublesome, as everybody was given the same data, yet there were many different outputs.  In the future, the group should normalize the mines together to an agreed standard.  Also, looking up mine addresses using their PLSS descriptions and aerial imagery is a very unscientific method, though at times it seems necessary to accomplish the set task.
