Assignment 6: Regression Analysis

Part 1

Introduction

In the first part of the assignment, a simple regression analysis is performed on data relating the percent of kids in an area who receive free lunch to the crime rate in the same area of a town. The pairing seems silly at first, but a regression analysis can help show whether and how the two variables relate. The questions to be answered are: Is the news correct that areas where a larger percent of kids receive free lunch have a higher crime rate? If a new area of the town were identified with 30% free lunch, what would the corresponding crime rate be? And how confident is this result?

Methodology


Definitions 

  • Ordinary Least Squares (OLS): Fitting a line through a set of points so that the sum of the squared vertical distances between the observed points and the trend line is minimized.

  • Regression Analysis: Studies the relation between a dependent variable and an independent variable.

  • Linear Regression Analysis: Assumes a linear relation between the dependent variable (y) and the independent variable (x), and determines whether x affects y. This is done by fitting a straight line to the data and examining how well the data fit the line. Most importantly, y values can potentially be predicted from x values.

  • Coefficient of Determination (r2): Ranges from 0 to 1 and describes how much of the variation in the dependent variable is explained by the independent variable.

  • Residual: The amount by which an observed data value deviates from the trend line.
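
As a rough illustration of how these definitions fit together, the sketch below fits an OLS line by hand and then computes the residuals and r2. The five data points are made up for the example and are not the assignment data; the function names `ols_fit` and `r_squared` are also just illustrative choices.

```python
# Hypothetical (x, y) pairs for illustration only; NOT the assignment data.
x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [22.0, 41.0, 38.0, 60.0, 59.0]


def ols_fit(x, y):
    """Return (intercept, slope) of the line that minimizes the sum of
    squared vertical distances between the points and the line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variation in y that the fitted line explains."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


intercept, slope = ols_fit(x, y)

# Residual: observed value minus the value predicted by the trend line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
r2 = r_squared(x, y, intercept, slope)

print(f"y = {intercept:.3f} + {slope:.3f}x, r2 = {r2:.3f}")
```

SPSS reports the same quantities in its Coefficients table (B column) and Model Summary (R Square), so a hand calculation like this is mainly useful for checking what those numbers mean.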

In this first portion of the assignment, an Excel file containing percent free lunch and crime data was imported into SPSS, and a linear regression analysis was performed (Figure 1).
Figure 1: Linear Regression window in SPSS showing the independent and dependent variables.

Results

Figure 2 shows a summary of the statistics that were run. The important numbers to focus on are the R Square value in the Model Summary, and the B column and Sig. column in the Coefficients table. Figure 3 shows how the two variables relate, using a scatter plot with an OLS trend line.
Figure 2: Regression analysis output showing the percent of kids with free lunch (independent variable) and crime rate (dependent variable).


Figure 3: This image shows a scatter plot with Crime Rate being the dependent variable, and the independent variable being Percent Free Lunch.

Conclusion

The news station is correct that there is a linear relation. The significance value is .005, which is below 0.05, so the relation is statistically significant. However, the r square value is only 0.173, meaning percent free lunch explains about 17.3% of the variation in crime rate, so the relation is weak. A question that was posed was: if a new area of the town were identified as having 30 percent free lunch, what would its crime rate be? Given the weak fit, this estimate carries little confidence, but according to the scatter plot the crime rate could potentially be about 50.

Part 2

Introduction

Portland, OR is concerned with having adequate responses to 911 calls. The city wants to figure out what might explain where most 911 calls come from, in order to find an appropriate area for a new hospital that will be built. The goal is to provide ideas as to what influences more or fewer calls, and where a potential location for the new hospital could be so that it can respond to these 911 calls.

Three variables are chosen for analysis using regression analysis. The dependent variable is always the number of 911 calls per census tract. The independent variables, which will try to explain the number of calls in a given area, are the number of people with no high school degree, alcohol sales, and the number of unemployed people. For each, the strength of the relationship will be determined, along with whether to reject or fail to reject the null hypothesis, which states that there is no linear relation between the two variables.

Null Hypothesis: There is no linear relationship between 911 calls and low education, alcohol sales, and being unemployed.
Alternative Hypothesis: There is a linear relationship between 911 calls and low education, alcohol sales, and being unemployed.

Results


For all three independent variables that were selected (low education, alcohol sales, and unemployed), the null hypothesis was rejected. This is because all three were significant at the .000 level, which is lower than 0.05, so the relations are statistically significant. Figure 4 shows the variable low education, which has the highest r-square value of 0.567. Figure 5 shows the variable alcohol sales with an r-square value of 0.152. Figure 6 shows the variable unemployment with an r-square value of 0.543. These r-square values give the proportion of variance in the dependent variable, 911 calls, that is explained by each independent variable. The highest is low education at 56.7%, which makes it the strongest predictor of the three variables that were chosen, based on its high r-square value and a significance level of .000.

Equations were created to represent the trend line between each set of data.
Low education: y=3.931+.166x
Alcohol Sales: y=9.590+.00003069x
Unemployed: y=1.106+.507x
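
To show how these trend-line equations could be used, here is a small sketch that plugs an x value into each one to get a predicted number of 911 calls. The intercepts and slopes are copied from the equations above, with the low education equation read as intercept plus slope like the other two. The input values (100 people with no high school degree, and so on) are hypothetical and chosen only for the arithmetic; `predict_calls` is an illustrative name.

```python
# (intercept, slope) pairs copied from the trend-line equations above.
# Assumption: the low education equation is read as 3.931 + 0.166x,
# matching the form of the other two equations.
MODELS = {
    "low_education": (3.931, 0.166),
    "alcohol_sales": (9.590, 0.00003069),
    "unemployed": (1.106, 0.507),
}


def predict_calls(variable, x):
    """Predicted number of 911 calls for a tract, given one predictor value."""
    intercept, slope = MODELS[variable]
    return intercept + slope * x


# Hypothetical tract values, for illustration only.
examples = {
    "low_education": 100,     # people with no high school degree
    "alcohol_sales": 100000,  # alcohol sales, in the units of the data
    "unemployed": 50,         # unemployed people
}

for variable, x in examples.items():
    print(variable, round(predict_calls(variable, x), 3))
```

Each equation is a separate one-variable model, so the three predictions are not meant to be added together.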


Low education

 Figure 4: This shows the SPSS output for linear regression analysis between Low Education, and 911 calls.
Alcohol Sales


 Figure 5: This shows the SPSS output for linear regression analysis between Alcohol Sales, and 911 calls.
Unemployed

 Figure 6: This shows the SPSS output for linear regression analysis between Unemployment, and 911 calls.

A couple of maps were created to represent the analysis that was performed. Figure 7 shows a map of the number of 911 calls by census tract in Portland. A high level of calls can be seen in the tracts colored dark brown, specifically tracts 60, 65, and 66.

A standard deviation map of the Ordinary Least Squares residuals, showing how far each tract falls from the trend line, is shown in Figure 8. To help interpret the map, the low education data have a mean of 125, a median of 98, a mode of 38, and a standard deviation of 129. Census tracts 60 and 65 are on the high end of the standard deviation histogram, and a direct relation to the choropleth map in Figure 7 can be seen, with a high number of 911 calls in the same tracts. Areas that fall in the middle of the standard deviation range can be seen to have low numbers of calls, specifically tract 27.
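
The idea behind a standard deviation map of residuals can be sketched as follows: compute each tract's residual from the trend line, then express it as a number of standard deviations from the mean residual. The tract labels and the x and y values below are invented for the example, and the line used is the low education equation read as y = 3.931 + 0.166x. The mapping software may standardize residuals slightly differently, so treat this as an approximation of the concept rather than the exact calculation behind Figure 8.

```python
import statistics

# Hypothetical tracts: (label, people with low education, 911 calls).
# These values are made up for illustration; they are NOT the Portland data.
tracts = [("A", 38, 8), ("B", 98, 25), ("C", 125, 20), ("D", 400, 95)]

# Assumed trend line for low education: y = 3.931 + 0.166x.
INTERCEPT, SLOPE = 3.931, 0.166

# Residual = observed calls minus calls predicted by the trend line.
residuals = [calls - (INTERCEPT + SLOPE * x) for _, x, calls in tracts]

# Express each residual as a z-score: how many standard deviations
# it lies above or below the mean residual.
mean_res = statistics.mean(residuals)
sd_res = statistics.pstdev(residuals)
z_scores = [(r - mean_res) / sd_res for r in residuals]

for (label, _, _), z in zip(tracts, z_scores):
    print(f"tract {label}: {z:+.2f} standard deviations")
```

Tracts with large positive values have more calls than the trend line predicts, and tracts with large negative values have fewer.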

Figure 7: This map illustrates the number of 911 calls per census tract.
Figure 8: This map shows the standard deviation of residuals for people with low education and the number of 911 calls.

Conclusion

The variable found to have the most influence on the number of 911 calls in the Portland area was low education, and the specific areas where this occurs are census tracts 60, 65, and 66. Therefore, based on just this strong relation, it would potentially be most useful to place the new hospital in one of these tracts; additionally, most high values fall around these tracts. More analysis of different factors would be required to pinpoint a specific location, but based on this data alone those tracts would be the best spot.
