Friday, December 2, 2016

Lab 15: Dasymetric Mapping

In this lab we explored dasymetric mapping, a technique that uses ancillary data to redistribute values from one set of boundaries onto another, mismatched set. A common example is taking census block data and properly apportioning it among zones whose boundaries don't line up with the blocks. We did exactly that in this lab with census blocks and high school districts in Seminole County, Florida: we were required to divide the prospective students up into the proper high school districts using both areal weighting and dasymetric mapping. Dasymetric mapping has a track record of being the more accurate of the two, but unfortunately I couldn't reproduce that in this lab; my dasymetric estimate was twice as inaccurate as the areal weighted one.
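The areal weighting step, and the dasymetric refinement on top of it, can be sketched with made-up numbers (the district names, areas, and population below are hypothetical, not the lab's actual data):

```python
# Hypothetical example: a census block of 1,000 people overlaps two
# high-school districts. Areal weighting splits the population by raw
# overlap area; the dasymetric refinement first removes uninhabitable
# area (e.g. a lake) identified from ancillary land-cover data.

def areal_weight(pop, overlap_areas):
    """Apportion pop to each district in proportion to overlap area."""
    total = sum(overlap_areas.values())
    return {d: pop * a / total for d, a in overlap_areas.items()}

# Raw overlap: 6 km^2 of the block falls in district A, 4 km^2 in B.
overlaps = {"A": 6.0, "B": 4.0}
print(areal_weight(1000, overlaps))      # {'A': 600.0, 'B': 400.0}

# Dasymetric: ancillary data shows 2 km^2 of the B-side overlap is a
# lake, so only the inhabitable area carries population.
inhabitable = {"A": 6.0, "B": 2.0}
print(areal_weight(1000, inhabitable))   # {'A': 750.0, 'B': 250.0}
```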

Sunday, November 27, 2016

Lab 14: Gerrymandering

In this week's lab we had an assignment on gerrymandering. Gerrymandering is when voting districts are drawn in a way that favors one party over another. In this lab we had to look at compactness and community as factors for determining whether a district is gerrymandered.

Most heavily gerrymandered districts have a huge perimeter-to-area ratio; they end up long and snakey. This can be detected by measuring the district's compactness. I used the Polsby-Popper method, which compares a district's area to that of a circle with the same perimeter; the higher the score, the more compact the district. I have included a screenshot of what I found to be the least compact district below: Congressional District 12 of North Carolina.
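The Polsby-Popper score works out to 4*pi*A / P^2, which equals 1.0 for a circle and approaches 0 for long, snakey shapes. A minimal sketch (the shapes here are toy examples, not the actual districts):

```python
import math

def polsby_popper(area, perimeter):
    """Polsby-Popper compactness: 4*pi*A / P^2 (1.0 for a circle)."""
    return 4 * math.pi * area / perimeter ** 2

# A circle scores 1.0 regardless of size...
r = 5.0
print(polsby_popper(math.pi * r**2, 2 * math.pi * r))   # 1.0

# ...while a long, snakey 1 x 100 rectangle scores very low.
print(polsby_popper(1 * 100, 2 * (1 + 100)))            # ~0.031
```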

A lot of heavily gerrymandered districts also chop up counties that should stay whole, ending up composed of pieces of many counties. Congressional District 1, also of North Carolina (shown below), is a district made up of bits of counties that don't even neighbor each other. It is sometimes necessary for a district to contain multiple counties, or only a portion of one, because of population size; LA County, for example, is so densely populated that a single district couldn't contain it. District 1 of NC, on the other hand, contains little bits of many counties and few counties in their entirety.

Sunday, November 20, 2016

Lab 13: A Tale of Two DEMs


For Lab 13 I had to compare two DEMs of a coastal Californian water basin: one created from lidar data and one from SRTM. I created two maps to show some of the differences between them: one in which I subtracted one DEM from the other to show the differences in elevation, and a similar map for slope.
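The differencing itself is simple raster arithmetic; a minimal sketch with tiny hypothetical grids standing in for the lidar and SRTM rasters (same extent and cell size assumed):

```python
import numpy as np

# Two toy DEM grids (elevations in meters); the real rasters were much
# larger, but the math is identical cell by cell.
lidar = np.array([[10.0, 12.0], [11.0, 15.0]])
srtm  = np.array([[ 9.0, 13.0], [12.0, 14.0]])
cell = 30.0                                   # cell size in meters

diff = lidar - srtm                           # elevation difference map
print(diff)

def slope_deg(dem, cell):
    """Slope in degrees from the elevation gradient of a DEM grid."""
    dy, dx = np.gradient(dem, cell)
    return np.degrees(np.arctan(np.hypot(dx, dy)))

# Same idea for the slope comparison map:
slope_diff = slope_deg(lidar, cell) - slope_deg(srtm, cell)
```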

Sunday, November 13, 2016

Lab 12: OLS vs GWR

The biggest difference between Geographically Weighted Regression (GWR) and Ordinary Least Squares (OLS) is that GWR takes geographic distribution into account and explores local relationships, while OLS models the data globally and is a bit more rigid. GWR improved the model in this lab greatly because the data was organized spatially, with nearby features affecting each other more than distant ones. OLS is better for data without a strong spatial component.
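The core idea behind GWR can be sketched as a distance-weighted least squares fit repeated at each location. This is a simplified illustration on synthetic data (the Gaussian kernel and bandwidth value are common choices, not the lab's actual settings):

```python
import numpy as np

# Global OLS fits one slope/intercept for all observations; a GWR-style
# local fit re-weights observations by distance from a target location.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(50, 2))      # synthetic point locations
x = rng.normal(size=50)
y = 2.0 * x + 0.3 * coords[:, 0] + rng.normal(scale=0.1, size=50)

X = np.column_stack([np.ones(50), x])
beta_global, *_ = np.linalg.lstsq(X, y, rcond=None)   # one global fit

def local_fit(target, bandwidth=2.0):
    """Weighted least squares centered on one location (the GWR idea)."""
    d = np.linalg.norm(coords - target, axis=1)
    w = np.exp(-(d / bandwidth) ** 2)          # Gaussian kernel weights
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Coefficients now vary across the study area:
print(local_fit(np.array([1.0, 5.0])))
print(local_fit(np.array([9.0, 5.0])))
```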

Friday, November 4, 2016

Lab 11: Regression using a GIS

In this lab I applied the regression lessons we learned last week in ArcMap, which allowed me to familiarize myself with how ArcMap processes and reports regressions. The performance of a model can be evaluated using the six Ordinary Least Squares (OLS) checks. They are as follows:

  1. Are the explanatory variables helping your model? This can be checked by examining the p-values of the variables and confirming that they are statistically significant.
  2. Are the relationships what you expected? This is checked by looking at the signs of the coefficients and making sure they make sense. If a variable has a negative coefficient when you expect it to be positive, something is wrong with your model.
  3. Are any of the explanatory variables redundant? This is easily checked by examining the Variance Inflation Factors (VIF). If the VIF shows that any of the variables are redundant, you may have to remove one.
  4. Is the model biased? The Jarque-Bera test looks for biased or skewed residuals. If it is significant, you may have an issue with your model.
  5. Do you have all key explanatory variables? Running the Spatial Autocorrelation (Global Moran's I) tool tells you whether your residuals are clustered, dispersed, or random. If their distribution isn't random, you may need more key variables in your model.
  6. How well are you explaining your dependent variable? This is easily evaluated by looking at the adjusted r-squared value: the higher the number, the larger the percentage of variation explained by your model. If you are trying to compare this model to others, you can also look at the Akaike's Information Criterion (AIC). The AIC means nothing by itself, but it is useful for comparing models; the model with the lowest AIC is usually the better one.
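Several of these diagnostics can be reproduced by hand to see what ArcMap is reporting. A numpy-only sketch on synthetic data (the AIC formula below is one common OLS form; ArcMap's exact constant may differ):

```python
import numpy as np

# Synthetic data with two independent explanatory variables.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])
k = X.shape[1]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Check 6: adjusted r-squared and AIC.
sse = resid @ resid
sst = ((y - y.mean()) ** 2).sum()
r2 = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
aic = n * np.log(sse / n) + 2 * k

# Check 3: VIF, by regressing each variable on the others.
def vif(X, j):
    others = np.delete(X, j, axis=1)
    b, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    e = X[:, j] - others @ b
    r2_j = 1 - (e @ e) / ((X[:, j] - X[:, j].mean()) ** 2).sum()
    return 1 / (1 - r2_j)

# Check 4: Jarque-Bera from residual skewness and kurtosis.
s = resid / resid.std()
jb = n / 6 * (np.mean(s**3) ** 2 + (np.mean(s**4) - 3) ** 2 / 4)
print(adj_r2, aic, vif(X, 1), vif(X, 2), jb)
```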



Friday, October 28, 2016

Lab 10: Using Regression to Predict Rainfall

In this lab we had an assignment that involved predicting rainfall values using regression analysis. We were given an Excel file containing rainfall data for station A (Y) and station B (X) for the years 1931-2004. Unfortunately, station A was missing measurements from 1931-1949, and the goal was to predict those measurements. Using the regression analysis tool in Excel I got the slope (b) and intercept (a) of the regression. This allowed me to use the regression formula, Y = bX + a, to calculate the missing values, under the assumptions that the two stations record similar values in a given year and that the data is normally distributed.
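The same fill-in can be done outside Excel in a few lines. The rainfall numbers below are hypothetical stand-ins for the overlapping 1950-2004 records, not the lab's actual data:

```python
import numpy as np

# Hypothetical paired annual rainfall totals for the overlap years.
x_b = np.array([40.0, 55.0, 47.0, 62.0, 51.0])   # station B (X)
y_a = np.array([43.0, 58.0, 50.0, 66.0, 54.0])   # station A (Y)

b, a = np.polyfit(x_b, y_a, 1)        # slope b and intercept a

# Predict station A for a year where only station B was recorded:
x_missing = 45.0
print(b * x_missing + a)              # Y = bX + a
```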

Sunday, October 23, 2016

Lab 9: Assessment of the accuracy of DEMs

One can determine the quality of a Digital Elevation Model (DEM) using statistics. In this lab I used Microsoft Excel to calculate percentiles, Root Mean Squared Error (RMSE), and Mean Error (ME).
The percentiles tell you how much of the error falls within a given range. I used the 68th and 95th percentiles for this lab. This means, for example, that the difference between the DEM and the field-surveyed data for urban land cover is within 0.384 m 95% of the time.
RMSE tells you how similar one set of values is to another; the lower the number, the more accurate the data. RMSE does not tell you about the distribution of error. I found that bare earth and low grass land covers were the most accurate, while fully forested areas were the least accurate.
ME tells you about possible bias in the data: a negative number indicates underestimation, while a positive number indicates overestimation. The urban area was the most biased with an ME of 0.164, while bare earth was the least biased at -0.005. I have attached below a table summarizing the values I arrived at during this lab.
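All three metrics are easy to reproduce from a column of elevation errors. The error values below are hypothetical, just to show the computation:

```python
import numpy as np

# Hypothetical DEM-minus-field elevation errors in meters.
err = np.array([0.02, -0.05, 0.10, -0.01, 0.08, -0.12, 0.03])

p68 = np.percentile(np.abs(err), 68)   # 68th percentile of |error|
p95 = np.percentile(np.abs(err), 95)   # 95th percentile of |error|
rmse = np.sqrt(np.mean(err ** 2))      # overall closeness to the truth
me = err.mean()                        # sign shows bias direction
print(p68, p95, rmse, me)
```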

Land cover                      Sample size   Accuracy 68th (m)   Accuracy 95th (m)   RMSE (m)   ME (m)
Bare earth and low grass        48            0.098               0.163               0.105      -0.005
High grass, weeds, and crops    55            0.151               0.440               0.181      -0.069
Brush land and low trees        45            0.220               0.481               0.246      -0.103
Fully forested                  98            0.222               0.463               0.394       0.003
Urban                           41            0.189               0.384               0.200       0.164
Combined                        287           0.276               0.171               0.429      -0.006

Sunday, October 16, 2016

Lab 8: Interpolation Exploration

In Lab 8 I explored various interpolation methods. The lab covered Thiessen polygons, inverse distance weighting (IDW), regularized spline, and tension spline. The picture above is an example of tension spline used to interpolate water quality in Tampa Bay, FL. Thiessen polygons assign each location the value of the closest data point, which creates uniform areas that can't display anything subtle. IDW weights data points by their distance: the farther away a data point is, the less effect it has on the interpolated value. The two types of spline use similar formulas to create a "sheet" that best fits the surface the data points define. Regularized splines create smoother, more gradually changing surfaces, but the values can fall outside the data's original range; tension splines are a bit stiffer, and their values are more constrained by the data's original range.
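IDW is simple enough to write out directly. A minimal sketch with hypothetical sample points (power p=2 is a common default, not necessarily what the lab used):

```python
import numpy as np

# Three hypothetical sample points and their measured values.
pts = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
vals = np.array([1.0, 5.0, 9.0])

def idw(target, pts, vals, p=2):
    """IDW estimate: nearer samples get proportionally more weight."""
    d = np.linalg.norm(pts - target, axis=1)
    if np.any(d == 0):                 # exact hit on a sample point
        return vals[np.argmin(d)]
    w = 1.0 / d ** p
    return np.sum(w * vals) / np.sum(w)

# A target near the first sample is dominated by its value of 1.0:
print(idw(np.array([1.0, 1.0]), pts, vals))
```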

Friday, October 7, 2016

Lab 7: An exploration of TIN models

This week I explored Triangulated Irregular Networks (TIN) and compared them to Digital Elevation Models (DEM). Both are used to model elevation, but they are structured differently: DEMs are rasters that store elevation values in a grid, while TINs are networks of triangles built from nodes that carry elevation data. One big difference is that TINs can easily show slope, aspect, elevation, and more without further processing, whereas with DEMs you have to derive new rasters to display that data. Below is a picture of a TIN displaying edges, nodes, slope, and contour lines.
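Deriving one of those extra rasters from a DEM is straightforward. A toy sketch of slope and aspect from a tiny hypothetical elevation grid (the aspect sign convention below is one of several in use):

```python
import numpy as np

# Toy 3x3 DEM grid, elevations in meters, 10 m cells.
dem = np.array([[10.0, 10.5, 11.0],
                [10.2, 10.8, 11.4],
                [10.4, 11.1, 11.8]])
cell = 10.0

dy, dx = np.gradient(dem, cell)                   # elevation gradients
slope = np.degrees(np.arctan(np.hypot(dx, dy)))   # derived slope raster
aspect = np.degrees(np.arctan2(-dy, dx))          # derived aspect raster
print(slope.round(2))
```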


I also explored how to edit TINs. The triangles in a TIN sometimes don't create flat surfaces where they should exist, so you have to edit them in. In the example the lab provided, I had a TIN that wasn't properly displaying the flatness of a lake. I have attached before and after pictures.

Before lake feature was added
After lake feature was added
Editing the TIN created a hard breakline (in blue) that told the model to hold the area inside it at a set elevation, and created nodes along the breakline so it could be modeled.

Sunday, October 2, 2016

Lab 6: Location-Allocation Modeling

In Lab 6 we had to use Network Analyst to solve a location-allocation problem. This example features a company with 22 distribution centers that wants to optimize how it serves customers. The location-allocation analysis used a network dataset to determine the shortest distance between each customer and a distribution center. I then looked at which distribution center served the most customers in each market area and assigned the areas accordingly, as shown above. Most of the market areas did not change, but the 28 that did are highlighted with a red outline. The new market area assignments allow the company to serve its customers with less distance traveled than before.
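The allocation step at the heart of this can be sketched without a network dataset by using straight-line distance (the lab used network distance via Network Analyst; coordinates below are hypothetical):

```python
import numpy as np

# Hypothetical distribution centers and customer locations.
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
customers = np.array([[1.0, 2.0], [9.0, 8.0], [4.0, 4.0], [8.0, 9.0]])

# Distance from every customer to every center, then pick the nearest.
d = np.linalg.norm(customers[:, None, :] - centers[None, :, :], axis=2)
assignment = d.argmin(axis=1)          # index of nearest center
print(assignment)                      # [0 1 0 1]
```

With network distances substituted for the Euclidean ones, the same argmin step decides which center each customer is allocated to.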

Friday, September 23, 2016

Lab 5: Vehicle Routing Problem

In this lab I had to solve a vehicle routing problem (VRP) for a trucking company whose goal was to provide continuity between drivers, customers, and service areas. Using GIS to solve a VRP is a fast, easy way to arrive at a solution.
I created a total of 16 routes that covered every order while keeping each route within one service area to provide continuity. The screenshot above shows the solution that ArcMap's Network Analyst calculated after I tweaked the settings. It strikes a nice balance between cost effectiveness and customer service.

Friday, September 16, 2016

Lab 4: Network Datasets

This week I learned how to create a network dataset and add functionality to it. The first iteration was just a basic network that could build a route between certain locations.
In the second iteration I connected the RestrictedTurns feature class to the network. This provided a more accurate picture of the route, because not all intersections allow every type of turn. The route ended up taking a little longer than in the first iteration, but at least now the driver won't be violating traffic laws.
The third and final functionality added was historical traffic data. The earlier models essentially showed the driver how long the route would take without any other vehicles on the road, which is not that realistic. This functionality gives the driver a better ability to anticipate the route and the time it will take to complete. This route took the longest of the three.

Saturday, September 10, 2016

Lab 3: Determining the Quality of Road Networks

The goal of this lab was to get more experience with accuracy assessment. The TIGER 2000 data had 258 grid squares that were more complete than Jackson County GIS; Jackson County GIS had 38.
The map above shows the percent difference in completeness between a set of road data from Jackson County GIS and one from TIGER 2000. The blue areas show grid sections that differ by less than 100%, while white to brown mark the more extreme values. One grid section had a difference of over 1700%! Negative values indicate TIGER 2000 had more data in that grid section; positive values favor Jackson County GIS.

Thursday, September 1, 2016

Lab 2: Quality of Road Network Data In Albuquerque, New Mexico

This lab was an exercise in testing data quality. I received two sets of street map data: one from the City of Albuquerque and the other from StreetMaps USA. I first used ArcCatalog to create a rough network dataset from each street map; this predicts where streets form intersections and places points there. I then used a sampling tool to randomly select one hundred points to use for the project. I had to find points that were present in both datasets, were examples of good intersections, and met the sampling rules, which dropped me from one hundred points to twenty-nine, displayed in the screenshot above. I matched up all the points in both datasets and created a reference set of points based on orthophotos of the study area. I then used the National Standard for Spatial Data Accuracy (NSSDA) to calculate how accurate both datasets are when compared to the reference set. Using the standard reporting statements presented in the Positional Accuracy Handbook1, I got two statements. For the City of Albuquerque data, I got:

Using the National Standard for Spatial Data Accuracy, the data set tested 26.7 feet
horizontal accuracy at 95% confidence level.

For the StreetMaps USA data, I got:

Using the National Standard for Spatial Data Accuracy, the data set tested 360.6 feet horizontal accuracy at 95% confidence level.

1 Positional Accuracy Handbook. 1999. Minnesota Planning, Land Management Information Center, St. Paul,
MN.
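The calculation behind those reporting statements can be sketched in a few lines. The offsets below are hypothetical; 1.7308 is the NSSDA's scale factor from radial RMSE to the 95% confidence level when the x and y error magnitudes are similar:

```python
import math

# Hypothetical test-minus-reference offsets at checkpoints, in feet.
dx = [3.0, -5.0, 2.0, 4.0]
dy = [1.0, 2.0, -3.0, 0.0]

# Radial RMSE over all checkpoints, then the 95% NSSDA statistic.
n = len(dx)
rmse_r = math.sqrt(sum(x*x + y*y for x, y in zip(dx, dy)) / n)
accuracy_95 = 1.7308 * rmse_r
print(round(accuracy_95, 1))           # the "tested __ feet" figure
```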

Saturday, August 27, 2016

Lab 1: Accuracy of a Handheld GPS Device

The average waypoint was 3.8 meters away from the reference waypoint, and there was a difference of 5.9 meters in elevation between the reference and the average, with the average waypoint at the higher elevation. It seems the device was more precise horizontally than vertically.
Accuracy means a measurement truly reflects the real situation; accurate results have little to no error. Precision is low variability between repeated measurements: precise measurements cluster around a point, while imprecise ones scatter widely.