Thursday, April 7, 2016

Data Normalization, Geocoding, and Error Assessment

Goals and objectives:
The purpose of this lab was to take a poorly organized Excel table and normalize it so that it could be brought into ArcMap. The lab also introduced the geocoding tool and gave us experience using it. The final objective was to compare our own geocoded results to those of our classmates and to the actual locations of the mines. The objectives of the lab are laid out in Figure 1.

Figure 1: List of Objectives for this Lab



Methods:
Data Normalization
    After being given an Excel table of frac sand mine locations in Wisconsin, we had to prepare it to be put into ArcMap. This is called data normalization: the process of putting data into a general format that can be read by many different programs. In this case, our goal was to eventually bring the table into ArcMap to be geocoded (Figures 2 and 3 show the table before and after normalization). Each student was assigned a handful of mines to work with, and the following steps were done only on these assigned mines.



Geocode
      This portion of the lab was completed in ArcMap. Geocoding is the process of assigning locations to addresses, which allows features to be portrayed accurately on a map. To geocode, you open the geocoding toolbar and upload the normalized table through the World Geocode Service. The software then assigns a point location to each address. In the Interactive Rematch inspector window, you can see the match score for each address. The higher the score, the more closely the address matched a known location. Only two of my locations were partial matches.
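The review step above can be sketched in a few lines of Python. This is only an illustration, not the ArcMap workflow itself: the field names "Status" and "Score", the status codes, the 90-point threshold, and the sample records are all assumptions made for the example.

```python
# Hypothetical threshold; the lab does not state which score counted as acceptable.
MIN_SCORE = 90.0


def needs_review(result, min_score=MIN_SCORE):
    """Return True when a geocoding result should be rematched by hand."""
    # Assumed fields: "Status" ("M" matched, "T" tied, "U" unmatched)
    # and "Score" (0 to 100). Both names are assumptions for this sketch.
    return result["Status"] != "M" or result["Score"] < min_score


# Made-up results standing in for a geocoded table (not real lab data).
results = [
    {"Mine_ID": 1, "Status": "M", "Score": 100.0},
    {"Mine_ID": 2, "Status": "T", "Score": 95.0},
    {"Mine_ID": 3, "Status": "M", "Score": 82.5},
    {"Mine_ID": 4, "Status": "U", "Score": 0.0},
]

# Collect the IDs of mines that would need a manual rematch.
review_ids = [r["Mine_ID"] for r in results if needs_review(r)]
print(review_ids)
```

Anything flagged this way would then be checked by hand, much as the partial matches were in the rematch window.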


 Locate PLSS Description
      After assigning the correct addresses for these locations, the next task was to select the mines that only had a Public Land Survey System (PLSS) location. PLSS is a method developed and used in the United States to survey and divide land. Mines with only a PLSS description have no street address for the geocoder to match, so their locations were not accurate and needed to be corrected. For each of these, I found the exact address and location with the help of Google Earth and assigned it to the mine (results shown in Figure 4).


  Compare the Results
       This portion of the lab was dedicated to assessing the error of our mine locations. To do so, I first had to find the mines that I had locations for by querying the Mine_ID field. I then created a new layer of these particular mines from my classmates' locations and another layer from the actual locations provided to us. I then compared my mines to my classmates' using the Point Distance tool. The resulting error table can be seen in Figure 5. Using the same tool, I then compared my mines to the actual locations of the mines (Figure 6).
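To make the idea behind this comparison concrete, here is a minimal Python sketch that pairs points by Mine_ID and measures the distance between each pair. It is not the Point Distance tool: it uses the haversine (great-circle) formula on latitude and longitude and reports meters, and the coordinates and IDs are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def location_errors(mine, reference):
    """Distance for every Mine_ID that appears in both dictionaries."""
    return {
        mine_id: haversine_m(*mine[mine_id], *reference[mine_id])
        for mine_id in mine
        if mine_id in reference
    }


# Hypothetical coordinates keyed by Mine_ID (not the lab's data).
my_points = {101: (44.80, -91.50), 102: (44.50, -91.00)}
actual_points = {101: (44.80, -91.50), 102: (44.51, -91.00), 103: (45.00, -92.00)}

errors = location_errors(my_points, actual_points)
for mine_id, meters in sorted(errors.items()):
    print(mine_id, round(meters, 1))
```

Matching on Mine_ID first mirrors the query step above: only mines present in both layers are compared, so each distance is the error for a single mine.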


Results:
Data Normalization
      Before normalization, the table provided to us included many unnecessary fields and multiple rows with no data. This would create an issue when importing the table to ArcMap. To fix this issue I determined what fields were needed in the project. These fields included: Mine_ID, PLSS, Street, City, State, Zip, and County. With this information ArcMap was able to geocode the mines accurately.
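The same cleanup can be sketched in Python as a rough illustration. The lab itself was done by hand in Excel; the sample rows and the extra "Operator" and "Notes" columns below are hypothetical, and only the list of kept fields comes from the lab.

```python
import csv
import io

# Fields kept for geocoding, as listed above.
KEEP_FIELDS = ["Mine_ID", "PLSS", "Street", "City", "State", "Zip", "County"]


def normalize_rows(rows):
    """Keep only the geocoding fields and drop rows with no data."""
    cleaned = []
    for row in rows:
        # Trim stray whitespace; treat missing fields as empty strings.
        record = {field: (row.get(field) or "").strip() for field in KEEP_FIELDS}
        # Skip rows where every kept field is blank.
        if any(record.values()):
            cleaned.append(record)
    return cleaned


# Hypothetical sample standing in for the raw table (not real DNR data).
RAW_CSV = """Mine_ID,Operator,PLSS,Street,City,State,Zip,County,Notes
1, Example Sand Co ,,100 Main St ,Exampleville,WI,54000,Example,active
,,,,,,,,
2,Another Co,"NE 1/4 S12 T20N R5W",,,WI,,Example,
"""

rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
normalized = normalize_rows(rows)
for record in normalized:
    print(record)
```

Note that the second mine keeps its PLSS description even though it has no street address, so it can still be located manually later.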

Figure 2: Table of mine information provided by the Wisconsin DNR before normalization.
Figure 3: Table of mine location information created for geocoding purposes in ArcMap.

Geocode/Locate PLSS Description
     The results of the automatic geocoding method, along with the manual assignment of addresses for my portion of the mines, are shown below. At a glance the mine locations look correct; however, almost all of them were slightly inaccurate, which is discussed further on.

Figure 4:



Compare the Results
     The following tables show the amount of error between my geocoded mines and those of my peers (Figure 5) and between my geocoded mines and the actual locations (Figure 6).



Figure 5: Error table between my mine locations and my peers'.
Figure 6: Error between my locations and the actual mine locations.

As one can see, there was some variation between my mine locations and both those of my peers and the actual locations. However, the amount of error was smaller when comparing my portion of the mine locations to the actual locations of the mines. This leads me to believe that most of the error came from my peers' geocoding rather than my own. Some mines showed large differences, such as mine 257, where the error almost reached 1 in both tables. This particular mine was difficult to find when determining its PLSS location and actual address, which is why I believe its error is so high. Overall, my geocoded mines were fairly accurate, which can be seen in Figure 7.
Figure 7: Comparison maps of my geocoded mine locations to my peers' and to the actual locations.



Discussion:
Error is something that will almost always occur when working with geographic data. Sources of this error include the original source maps, data automation and compilation, and data processing and analysis. Two types of error can arise from these sources: inherent error and operational error. Inherent error, as the name suggests, is embedded in the data itself. Often, this error is due to the generalization of features in a complex world. Operational error, on the other hand, is error that happens during operations and procedures. It is also known as user error or processing error.

Overall, my data error was larger when compared to my peers than when compared to the DNR. There are several possible explanations. To begin, there was definitely a large amount of human geocoding (operational) error. For the mines that only had a PLSS description or did not completely match the assigned address, we manually went in to find the correct location by looking at aerial imagery. This process of image interpretation can be difficult for users without experience and may have led to error.

Another possible source of error is the data we were given. Because we were not given the field survey methods, it is hard to completely trust that the data is accurate. There may be inherent errors in the data on the DNR's part.



Conclusion:
Geocoding is a very useful tool when it comes to importing and analyzing spatial data. However, when geocoding it is important to know the proper procedures. In this case, it was important to know how to normalize the given data table, how to match addresses to the locations of mines, and the basics of PLSS. Geocoding is not a perfect way of locating addresses, but when conducted with proper precaution it can minimize error and give very accurate results.