Wednesday, May 31, 2017

Module 2: Physical Spatial Database Design

We learned how to create and import data into PostgreSQL (PGSQL). This is the Entity Relationship Diagram (ERD) that I created for this project. It shows how the different datasets are related and the type of information involved. The ERD was important for creating the tables, especially when importing the .csv file used to populate the sales table. The tool used to build tables from the parcels and parks shapefiles creates the tables and classifies the data automatically. For this project, I decided to type most of the columns with broader data types to make sure the import went smoothly. This was my first time using PGSQL, and I enjoy learning things with a hands-on approach.

Friday, December 2, 2016

Lab 15: Dasymetric Mapping

In this lab we explored dasymetric mapping. Dasymetric mapping uses ancillary data to break one dataset down so it can be redistributed across another dataset with mismatched boundaries. A common example is taking census block data and properly dividing it among other boundaries that don't line up with the blocks. We did that in this lab with census blocks and high school districts in Seminole County, Florida. We were required to divide the prospective students into the proper high school districts using both areal weighting and dasymetric mapping. Dasymetric mapping has a track record of being more accurate, but unfortunately in this lab I couldn't seem to make that happen: my dasymetric map was twice as inaccurate as the areal weighted one.
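The areal weighting step can be sketched in a few lines: each block's count is split among the districts in proportion to how much of the block's area falls in each one. The numbers below are invented for illustration, not the lab's Seminole County data (dasymetric mapping would refine this by first masking out uninhabited area with the ancillary layer).

```python
def areal_weight(block_total, overlap_areas):
    """Split block_total across districts in proportion to overlap area.

    overlap_areas: dict mapping district name -> area of the block
    that falls inside that district (any consistent area unit).
    """
    total_area = sum(overlap_areas.values())
    return {d: block_total * a / total_area for d, a in overlap_areas.items()}

# A block of 300 prospective students whose area is split 2:1
# between two hypothetical high school districts.
alloc = areal_weight(300, {"District A": 2.0, "District B": 1.0})
print(alloc)  # {'District A': 200.0, 'District B': 100.0}
```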

Sunday, November 27, 2016

Lab 14: Gerrymandering

In this week's lab we had an assignment on gerrymandering. Gerrymandering is when voting districts are drawn in a way that favors one party over another. In this lab we had to look at compactness and community as factors for determining whether a district is gerrymandered.

Most heavily gerrymandered districts have a huge perimeter-to-area ratio; they end up long and snaky. This can be detected by looking at the district's compactness. I used the Polsby-Popper method to evaluate compactness. It compares a district's area to its perimeter as if the district were a circle: the higher the score, the more compact the district. I have included a screenshot of what I found to be the least compact district below. It is Congressional District 12 of North Carolina.

Many heavily gerrymandered districts also chop up counties that should be contiguous and end up being made of pieces of many counties. Congressional District 1, also of North Carolina (shown below), is a district made up of bits of counties that don't even neighbor each other. It is necessary for some districts to contain multiple counties, or portions of counties, due to population size. LA County, for example, is so densely populated that it has to be split up because a single district couldn't contain it. District 1 of NC, on the other hand, contains little bits of many counties and few counties in their entirety.

Sunday, November 20, 2016

Lab 13: A Tale of Two DEMs

For lab 13 I had to compare two DEMs of a coastal Californian water basin: one created from lidar and one from SRTM. I created two maps to show some of the differences between them. In one I subtracted the DEMs from each other to show the differences in elevation; I also made a similar map for slope.
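The elevation-difference map is just a cell-by-cell subtraction of the two rasters. A toy sketch of that operation, using tiny made-up grids in place of the actual DEMs (in ArcMap this is the raster Minus tool):

```python
# Two tiny stand-in elevation grids (meters); values are invented.
lidar = [[120.5, 121.0],
         [119.8, 120.2]]
srtm  = [[118.0, 122.5],
         [119.0, 121.0]]

# Cell-by-cell difference: positive where lidar reads higher than SRTM.
diff = [[a - b for a, b in zip(row_l, row_s)]
        for row_l, row_s in zip(lidar, srtm)]
print(diff)
```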

Sunday, November 13, 2016

Lab 12: OLS vs GWR

The biggest difference between Geographically Weighted Regression (GWR) and Ordinary Least Squares (OLS) is that GWR takes geographic distribution into account and explores local relationships, while OLS models the data globally and is a bit more rigid. GWR improved the model in this lab greatly because the data was organized spatially and nearby features influenced each other more than distant ones. OLS is better suited to data without a strong spatial component.

Friday, November 4, 2016

Lab 11: Regression using a GIS

In this lab I took the lessons in regression we learned last week and applied them in ArcMap. This allowed me to familiarize myself with how ArcMap processes and reports regressions. The performance of a model can be evaluated using the six Ordinary Least Squares (OLS) checks. They are as follows:

  1. Are the explanatory variables helping your model? This can be checked by examining the p-value of the variables and confirming that they are statistically significant.
  2. Are the relationships what you expected? This is checked by looking at the signs of the coefficients and making sure they make sense. If a variable has a negative coefficient when you expect it to be positive, something is wrong with your model.
  3. Are any of the explanatory variables redundant? This is easily checked by examining the Variance Inflation Factors (VIF). If the VIF shows that any of the variables are redundant, you may have to remove one.
  4. Is the model biased? The Jarque-Bera test looks for bias or skewed residuals. If it is statistically significant, you may have an issue with your model.
  5. Do you have all key explanatory variables? If you run the Spatial Autocorrelation (Global Moran's I) tool, it will tell you whether your residuals are clustered, dispersed, or random. If their distribution isn't random, you may need more key variables in your model.
  6. How well are you explaining your dependent variable? This is easily evaluated by looking at the adjusted r-squared value. The higher the number, the bigger the percentage of variation explained by your model. If you are trying to compare this model to others, you can also look at Akaike's Information Criterion (AIC). The AIC means nothing by itself, but it is useful for comparing models. The model with the lowest AIC is usually the better one.
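The two numbers from check 6 can be computed by hand. This is a sketch using one common OLS formulation (AIC variants differ by an additive constant, which cancels when comparing models fit to the same data; ArcMap actually reports a corrected variant, AICc); the residual numbers below are invented:

```python
import math

def adjusted_r2(r2, n, k):
    # Adjusted R^2 penalizes adding explanatory variables:
    # 1 - (1 - R^2) * (n - 1) / (n - k - 1), for n observations, k variables.
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def ols_aic(sse, n, k):
    # One common OLS form: n * ln(SSE / n) + 2 * (k + 1).
    # Meaningless alone; lower is better when comparing models.
    return n * math.log(sse / n) + 2 * (k + 1)

print(round(adjusted_r2(0.80, n=50, k=3), 3))   # 0.787
print(round(ols_aic(sse=120.0, n=50, k=3), 2))  # 51.77
```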

Friday, October 28, 2016

Lab 10: Using Regression to Predict Rainfall.

In this lab we had an assignment that involved predicting rainfall values using regression analysis. We were given an Excel file containing rainfall data for station A (Y) and station B (X) for the years 1931-2004. Unfortunately, station A was missing measurements from 1931-1949, and the goal was to predict those measurements. Using the regression analysis tool in Excel I got the slope (b) and intercept coefficient (a) of the regression. This allowed me to use the regression formula Y = bX + a to calculate the missing values, assuming that the two stations would see similar rainfall in a given year and that the data was normally distributed.
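The same fit Excel produces can be sketched from the least-squares formulas: b is the covariance of X and Y over the variance of X, and a = mean(Y) - b * mean(X). The rainfall values below are invented stand-ins, not the lab's actual 1931-2004 data:

```python
def fit_line(xs, ys):
    """Least-squares slope b and intercept a for y = b*x + a."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return b, a

# Years where both stations reported (station B = X, station A = Y):
x_b = [40.0, 50.0, 60.0, 70.0]
y_a = [44.0, 52.0, 61.0, 71.0]
b, a = fit_line(x_b, y_a)

# Predict station A's rainfall for a year with only a station B reading:
print(round(b * 55.0 + a, 2))  # 57.0
```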