Assignment 5: Correlation and Spatial Autocorrelation

Goal 

The goal of this assignment is to use different techniques and software to calculate correlations. Excel, SPSS, and ArcMap are used together to analyze several data sets. Correlation measures the strength and the direction of the linear association between two variables. The result of a correlation always falls between -1 and 1. The closer the value is to 1 or -1, the stronger the relationship between the variables. If the value is close to 0, there is little or no linear relationship. If y tends to increase as x increases, the correlation is positive (closer to 1). If y tends to decrease as x increases, the correlation is negative (closer to -1).
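As a rough illustration of the definition above, Pearson's r can be computed by hand in a few lines of Python. This is only a sketch: the numbers are made up for the example and are not the assignment data, and `pearson_r` is a helper name chosen here, not something from Excel or SPSS.

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up values, not the assignment data.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]     # rises as x rises
y_down = [10, 8, 6, 4, 2]   # falls as x rises

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(round(r_up, 3), round(r_down, 3))
```

The first pair rises perfectly together and the second falls perfectly, so the two results sit at the positive and negative ends of the range.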


Part 1: Correlation

Section: 1

a) For the first task, a data set showing Distance (ft) and Sound Level (dB) was given (Figure 1), and a scatter plot was created in Excel (Figure 2).

Figure 1: Data given to create the scatter plot.
Figure 2: Scatter plot created in Excel.

b) Find the Pearson correlation in SPSS.
c) Show a Pearson correlation chart using SPSS (Figure 3).
Figure 3: Pearson's r correlation for sound level related to distance.
d) What is the hypothesis?
Null Hypothesis: There is no linear relationship between sound level and distance.
Alternative Hypothesis: There is a linear relationship between sound level and distance.
e) Summary of findings.
In the Excel scatter plot, the trend line slopes downward, showing that the sound level decreases the farther away one moves from the source. Figure 2 also shows a strong negative relationship because the points sit quite tightly around the trend line. The Pearson correlation chart (Figure 3) confirms this: the r value of -0.896 is very close to -1, which indicates a strong negative relationship, and the correlation is significant at the 0.01 level. Therefore we reject the null hypothesis and conclude that the sound level drops as distance from the source increases.
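The decision to reject the null hypothesis can also be checked by hand with the t statistic for a correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below uses the r value from Figure 3. The sample size is not stated in this write-up, so n = 10 is purely an assumption for illustration, and the critical value is the standard two-tailed t-table entry for that assumed n.

```python
import math

def t_statistic(r, n):
    """t statistic for testing H0: no linear relationship (rho = 0)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = -0.896          # Pearson's r reported by SPSS (Figure 3)
n = 10              # ASSUMED sample size, for illustration only
t_critical = 3.355  # two-tailed critical t for alpha = 0.01, df = n - 2 = 8

t = t_statistic(r, n)
# Reject the null hypothesis when |t| exceeds the critical value.
reject_null = abs(t) > t_critical
print(f"t = {t:.2f}, reject null: {reject_null}")
```

With a different sample size, both the degrees of freedom and the critical value would change, so the real n from Figure 1 should be substituted before drawing any conclusion from this sketch.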

Section: 2

Census Tracts and Population in Detroit, MI


Create a correlation matrix with all the data given in the Detroit spreadsheet (Figure 4).

Codes for the correlation matrix:
White = White Pop. for the 1000 Census Tracts in and around Detroit
Black = Black Pop.
Asian = Asian Pop.
His = Hispanic Pop.
BachDegree = Number with a Bachelor's Degree
MedHHInc = Median Household Income
MedHomeValue = Median Home Value
Manu = Number of Manufacturing Employees
Retail = Number of Retail Employees
Finance = Number of Finance Employees

Figure 4: Pearson's r correlation matrix for the Detroit data on ethnicity, education, income, and job types.
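SPSS builds the matrix in Figure 4 by computing Pearson's r for every pair of columns. A minimal sketch of the same idea in Python follows. The column names are taken from the codes above, but the five rows of values are hypothetical and are not the Detroit data, and `pearson_r` is a helper defined here.

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for five census tracts; NOT the Detroit data.
columns = {
    "White": [1200, 300, 2500, 800, 1900],
    "BachDegree": [400, 90, 900, 150, 700],
    "MedHHInc": [52000, 31000, 78000, 36000, 61000],
}

names = list(columns)
# Every variable is paired with every other variable, including itself.
matrix = {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

# Print the matrix as a small table with one row per variable.
print(" " * 12 + "".join(f"{n:>12}" for n in names))
for a in names:
    print(f"{a:<12}" + "".join(f"{matrix[a][b]:>12.3f}" for b in names))
```

The diagonal is always 1 because each variable correlates perfectly with itself, and the matrix is symmetric, which is why only one triangle of Figure 4 needs to be read.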

Results

The correlation matrix in Figure 4 shows the relationships between the population of each ethnic group and education, income, and job type across the census tracts. First, we compare each group with the number of people holding a bachelor's degree by looking at the r values: White 0.698, Black -0.305, Asian 0.559, and Hispanic -0.058. White and Asian populations show a moderately strong positive relationship with bachelor's degrees, Black population shows a moderately weak negative correlation, and Hispanic population has almost no correlation with bachelor's degrees.

Next we compare ethnicity with Median Household Income. The r values are as follows: White 0.554, Black -0.408, Asian 0.388, and Hispanic -0.078. White has a moderate positive correlation, Asian has a low to moderate positive correlation, Black has a low to moderate negative correlation, and Hispanic has a very low negative correlation with Median Household Income.

Then we look at ethnicity and Median Home Value. The r values are as follows: White 0.486, Black -0.362, Asian 0.436, and Hispanic -0.092. White and Asian have moderate positive correlations, Black has a low to moderate negative correlation, and Hispanic has a very low negative correlation with Median Home Value.

Then we look at ethnicity and manufacturing jobs. The r values are as follows: White 0.011, Black -0.085, Asian 0.077, and Hispanic -0.009. White and Asian have very low positive relationships, and Black and Hispanic both have very low negative relationships with manufacturing jobs.

Then we look at ethnicity and retail jobs. The r values are as follows: White 0.184, Black -0.146, Asian 0.259, and Hispanic -0.004. White and Asian have low positive relationships with retail jobs, and Black and Hispanic have low to very low negative relationships with retail jobs.

Finally, we look at ethnicity and finance jobs. The r values are as follows: White -0.007, Black -0.042, Asian 0.097, and Hispanic -0.034. White, Black, and Hispanic all have very low negative correlations with finance jobs, and Asian has a very low positive correlation with finance jobs.


Conclusion


White and Asian populations tend to have a positive relationship with bachelor's degrees, which coincides with their higher correlations with Median Household Income and Median Home Value. This suggests that tracts with more higher education also tend to have higher incomes and home values, although correlation alone does not show that education is the cause. Black and Hispanic populations have negative linear relationships with these variables, which could be related to their negative correlations with bachelor's degrees. Asian population has positive relationships with all three job variables, and White population has positive relationships with manufacturing and retail but a near-zero negative one with finance (r = -0.007). Black and Hispanic populations have negative relationships with all three job variables. This may be related to education once again, but all of the job correlations are low, and without an in-depth evaluation of these specific variables it is hard to state why different ethnic groups relate to these variables in different ways.

Part 2: Spatial Autocorrelation

Introduction

I have obtained data from the Texas Election Commission (TEC) for the 1980 and 2012 presidential elections. The data contains the percent Democratic vote as well as the voter turnout for each election. Hispanic population data from the 2010 U.S. Census is used as well. The mission is to analyze the election data and determine whether there are clusters of voting patterns and voter turnout within the state. The Hispanic population data is used to examine how areas with larger Hispanic populations vote. Using GeoDa and SPSS, a report is created that analyzes the election patterns over the last 32 years. This information will be provided to the governor of Texas.

Methods 

Hispanic population data from the 2010 census was joined to a Texas county shapefile in ArcMap, along with the TEC data that was provided. The joined shapefile was exported from ArcMap and brought into GeoDa, where Moran's I scatter plots and LISA maps were created for the different voting variables to aid the analysis. The Excel spreadsheet was also brought into SPSS, where a correlation matrix was made to better see the linear relationships in the data.

TEC Data codes:
  • VTP80 = Voter Turnout 1980
  • VTP12 = Voter Turnout 2012
  • PRES80D = Percent Democratic Vote 1980
  • PRES12D = Percent Democratic Vote 2012
  • HD02_S02 = Percent Hispanic Population 2010


GeoDa aids in performing spatial autocorrelation. It produces Moran's I scatter plots (Figure 5), which use the four quadrants of an x, y graph to show how each value relates to its neighbors:
  • High, High (+,+): high values touching high values
  • High, Low (+,-): high values touching low values (outlier)
  • Low, High (-,+): low values touching high values (outlier)
  • Low, Low (-,-): low values touching low values
Moran's I acts like Pearson's r in that its values range from -1 to +1. Values near +1 mean that similar values are clustered together, values near 0 mean there is little spatial pattern, and values near -1 mean that values are dispersed, with high values next to low values.
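To make the meaning of Moran's I more concrete, here is a minimal sketch of the global statistic, I = (n / S0) * sum(w_ij * d_i * d_j) / sum(d_i^2), where d is the deviation from the mean and S0 is the sum of all weights. It assumes six made-up regions in a row with simple binary neighbor weights. GeoDa builds its weights from the county shapefile and typically row-standardizes them, so this is not a reproduction of its output, and `morans_i` and `chain` are names invented for the example.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary weights (1 if neighbors, else 0)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Cross-products of deviations over every (region, neighbor) pair.
    cross = sum(dev[i] * dev[j] for i in neighbors for j in neighbors[i])
    # S0 is the total of all weights, i.e. the number of neighbor links.
    s0 = sum(len(nbrs) for nbrs in neighbors.values())
    return (n / s0) * cross / sum(d * d for d in dev)

# Six hypothetical regions in a row; each touches the one before and after.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

clustered = [1, 1, 1, 5, 5, 5]     # similar values sit side by side
alternating = [1, 5, 1, 5, 1, 5]   # every high value touches low values

i_clustered = morans_i(clustered, chain)
i_alternating = morans_i(alternating, chain)
print(round(i_clustered, 3), round(i_alternating, 3))
```

The clustered layout gives a positive value and the alternating layout gives a negative one, which matches the interpretation of the sign described above.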

Figure 5: Example of a Moran's I Scatter plot from Dr. Ryan Weichelt.

Results

Figure 6 shows the correlation matrix of the TEC data along with the Hispanic data. First, there is a moderate to strong negative correlation between the percent Democratic vote and voter turnout in 1980 (r = -0.612). There is a similar moderate to strong negative correlation between the percent Democratic vote and voter turnout in 2012 (r = -0.623). In both elections, counties with higher turnout tended to have a lower percent Democratic vote.

In 1980 the percent Democratic vote has a very weak correlation with Hispanic population (r = 0.093). In 2012 the correlation is strong and positive (r = 0.718). Because the Hispanic data comes from 2010, the 1980 comparison assumes the Hispanic population was distributed much as it was in 2010; under that assumption there was almost no linear relationship in 1980. In 2012, by contrast, the strong positive correlation suggests that counties with a higher percentage of Hispanic residents cast a higher percentage of Democratic votes.
Figure 6: Correlation Matrix of TEC and Hispanic Data.

Voter Turnout 1980

First, voter turnout in 1980 was examined (Figures 7 and 8). The highest turnout is clustered in northern Texas, and the lowest is in southern Texas. Low-low clusters appear in dark blue on the map (Figure 7) and in the lower left quadrant of the Moran's I graph (Figure 8). High-high clusters appear in red on the map and in the upper right quadrant of the graph. The outliers are the lighter colors on the Figure 7 map: magenta (Low, High) and salmon (High, Low). The Moran's I value of 0.468058 indicates a moderate amount of clustering, which is also visible in the LISA map.
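The quadrant labels on a LISA map come from comparing each area's value with the average of its neighbors. The sketch below shows only that labeling step, on six hypothetical regions in a row with made-up turnout values. It leaves out the permutation-based significance test that GeoDa applies before coloring a county, and `lisa_quadrants` is a name invented for the example.

```python
def lisa_quadrants(values, neighbors):
    """Label each region by its own deviation and its neighbors' mean deviation."""
    mean = sum(values) / len(values)
    dev = [v - mean for v in values]
    labels = []
    for i, d in enumerate(dev):
        # Spatial lag: the average deviation of the neighboring regions.
        lag = sum(dev[j] for j in neighbors[i]) / len(neighbors[i])
        own = "High" if d >= 0 else "Low"
        around = "High" if lag >= 0 else "Low"
        labels.append(f"{own}-{around}")
    return labels

# Six hypothetical regions in a row; each touches the one before and after.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
turnout = [10, 12, 11, 2, 3, 1]   # made-up values, not the Texas data

labels = lisa_quadrants(turnout, chain)
for region, label in enumerate(labels):
    print(region, label)
```

Regions labeled High-High or Low-Low correspond to the cluster colors on the map, while High-Low and Low-High correspond to the outlier colors.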

Figure 7: LISA map showing voter turnout in 1980.

Figure 8: Moran's I chart showing voter turnout in 1980.

Voter Turnout 2012

Next, voter turnout in 2012 was examined (Figures 9 and 10). The LISA map in Figure 9 shows sparse clusters of high turnout, which means there were few areas with high values right next to each other. In the southern portion of Texas there is a large area of low turnout, which means the surrounding area as a whole had low values. The Moran's I chart in Figure 10 agrees: there are few high-high values in the top right quadrant and more clustered low-low values in the bottom left quadrant, which is why more dark blue appears on the map. The Moran's I value of 0.335851 is closer to 0, which matches the map: there are a few small clusters but no large areas of clustering.
Figure 9: LISA map of 2012 voter turnout.

Figure 10: Moran's I chart for 2012 voter turnout data.

Percent Democratic vote 1980

Next we look at the 1980 percent Democratic vote (Figures 11 and 12). Higher Democratic votes are clustered in the southern and eastern portions of Texas, with clusters of lower Democratic votes in the northwestern portion. This is consistent with the Moran's I chart (Figure 12), where the moderate Moran's I value of 0.575173 explains why there are a few large clusters while the map as a whole is not clustered.


Figure 11: LISA map for 1980 percent Democratic vote data.
Figure 12: Moran's I chart for 1980 percent Democratic vote data.

Percent Democratic Vote 2012

Next, the 2012 percent Democratic vote was examined (Figures 13 and 14). As opposed to 1980, the high cluster of Democratic votes is still in the south, but it has moved to the western side of the state rather than the eastern side. Clusters of lower Democratic votes are in the northern and central portions of Texas. There are larger clusters in this map, consistent with the moderate to strong Moran's I value of 0.695853 (Figure 14). Overall, the clusters of higher Democratic votes shift westward and the clusters of lower votes shift eastward.


Figure 13: LISA map for 2012 percent Democratic vote data.
Figure 14: Moran's I chart for 2012 percent Democratic vote data.

Percent Hispanic Population 2010 

Finally, a LISA map and Moran's I chart were created for the percent Hispanic population in 2010 (Figures 15 and 16). Large clusters of high Hispanic population percentages are in the southwestern portion of the state, and low percentages are clustered in the eastern portion. The large clusters are consistent with the high Moran's I value of 0.778655 (Figure 16). The southwestern portion is probably more clustered due to its location next to the Mexico border.

Figure 15: LISA map for 2010 percent Hispanic population data.

Figure 16: Moran's I chart for 2010 percent Hispanic population data.


Conclusion

It can be inferred that locations with high percentages of Hispanic population tend to see larger clusters of Democratic votes. However, it also appears that turnout is low where the Hispanic population is high; this is seen in the Pearson's correlation matrix (Figure 6), where moderate to strong negative correlations relate low voter turnout to Hispanic population. Understanding why heavily Hispanic areas vote Democratic when they do vote would require more analysis of the candidates and the potential causes of voter turnout.
