Assignment 4 - Hypothesis Testing

Hypothesis Testing

The goal of this assignment is to distinguish between z-tests and t-tests, to learn how to calculate each through the steps of hypothesis testing, and to apply that process when connecting real-world data to statistics and geography.

z-test
A z-test is a statistical test used to determine whether two means are significantly different. It is used when the sample size is larger than 30.

(z-test equation, taken from Ryan Weichelt)
t-test
A t-test is a statistical test used to determine whether two means are significantly different. It is used when the sample size is smaller than 30.
(t-test equation, taken from Ryan Weichelt)
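The equation images credited above did not carry over into this text. Judging from the calculations worked later in this post, such as (6.8-4.4)/(4.2/sqrt(17)), both tests appear to use the same one-sample form. It is written here with the symbols this post uses: μ for the sample mean, μh for the population mean, σ for the standard deviation, and n for the sample size. This is a reconstruction under that assumption, not a copy of the original figures.

```latex
z \text{ or } t = \frac{\mu - \mu_h}{\sigma / \sqrt{n}}
```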
Steps of Hypothesis Testing:

  1. State the null hypothesis
  2. State the alternative hypothesis
  3. Choose a statistical test
  4. Choose the level of significance (α)
  5. Calculate the statistics
  6. Make a decision about the null and alternative hypothesis

Null Hypothesis
The null hypothesis always states that there is no significant difference between the sample mean and the population mean. The sample mean is calculated from the data the researcher collected, and the population mean is the mean of the larger group from which the sample was taken.
Alternative Hypothesis
The alternative hypothesis always states that there is a difference between the sample mean and the population mean. The alternative hypothesis cannot be used to determine how big or small the difference is; it only states that there is a difference.

The null hypothesis is never accepted; the only two options are to reject or fail to reject the null hypothesis.
Part I: t and z tests
1)
In part 1 of this assignment we were given a chart to fill out in which we determined the level of significance (α), whether a z-test or t-test should be used, and what the z or t value should be. We were given the interval type, confidence level, and sample size (n), which is the information needed to determine the rest of the chart.
To determine the level of significance (α), first identify the interval type (one-tailed or two-tailed). Then convert the confidence level from a percentage to a decimal and subtract it from 1; the result is the level of significance. For a one-tailed test the whole α sits in a single tail. For a two-tailed test, α is divided by 2 to give the level of significance in each tail. For example, a 95 percent confidence level gives α = 1 - .95 = .05, or .025 in each tail of a two-tailed test.
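The arithmetic above can be sketched in a few lines of Python. The function name `significance_level` and its return format are illustrative choices, not something the assignment provided.

```python
def significance_level(confidence_percent, tails):
    """Return (alpha, area per tail) for a confidence level given in percent."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Convert the percentage to a decimal and subtract it from 1.
    alpha = 1 - confidence_percent / 100
    # A two-tailed test splits alpha evenly between the two tails.
    per_tail = alpha / tails
    # Rounding hides floating-point noise such as 0.050000000000000044.
    return round(alpha, 6), round(per_tail, 6)


print(significance_level(95, 1))  # one-tailed: (0.05, 0.05)
print(significance_level(95, 2))  # two-tailed: (0.05, 0.025)
```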

Next, the choice between a z-test and a t-test was made by looking at the sample size: if it is above 30 a z-test is used, and if it is below 30 a t-test is used.
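As a small sketch, the sample-size rule can be written as a one-line function. The name `choose_test` is hypothetical, and because the rule only covers sizes above and below 30, treating a sample of exactly 30 as a z-test is an assumption made here. The sample sizes tried are the ones that appear in the worked problems of this post.

```python
def choose_test(n):
    """Pick the test from the sample size, following the rule in this post."""
    # Assumption: the post only covers "above 30" and "below 30";
    # a sample of exactly 30 is treated as a z-test here.
    return "z" if n >= 30 else "t"


for n in (23, 17, 53):
    print(n, choose_test(n))
```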

Then the z or t value was determined using charts. The standard statistical table (figure 1) is used to determine z values. For a one-tailed test with a confidence level of 95 percent, you look for the table entry closest to .95, which falls at z = 1.64. This 1.64 is the critical value: the cutoff in the tail that the test statistic will either fall beyond or not. For a two-tailed z-test, the significance level is split between the two tails, so you add the confidence level and half of the significance level together to get the decimal to look up. For example, if the confidence level is 90 percent, the significance level is .10, which leaves .05 in each tail, so you look up .90 + .05 = .95 on the standard statistical table. This gives the value 1.64, and because one tail is negative and the other is positive, you end up with critical values of -1.64 and 1.64.


Figure 1: Statistical chart used for determining z-values.
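The table lookup can be cross-checked in Python. The sketch below uses `NormalDist` from the standard library's `statistics` module in place of the printed table, and `z_critical` is a hypothetical helper name. The exact one-tailed 95 percent value is about 1.645, which rounds to the 1.64 read from the chart.

```python
from statistics import NormalDist


def z_critical(confidence_percent, tails):
    """Critical z value for a one-tailed (1) or two-tailed (2) test."""
    alpha = 1 - confidence_percent / 100
    # Cumulative area to look up: everything except the area in one tail.
    cumulative = 1 - alpha / tails
    return NormalDist().inv_cdf(cumulative)


print(round(z_critical(95, 1), 2))  # one-tailed, 95 percent
print(round(z_critical(90, 2), 2))  # two-tailed, 90 percent: use -z and +z
```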
To determine a t value, a t-value chart (figure 2) is used. This chart lists the level of significance along the top and the degrees of freedom, which are the sample size minus 1, along the side. For a one-tailed t-test, you find the column for the significance level being used and the row for the sample size minus 1, and read the value where they meet. For example, a one-tailed t-test at a 99 percent confidence level with a sample size of 23 uses the .01 column and the row for 22 degrees of freedom, giving a t value of 2.508. For a two-tailed t-test at a 99 percent confidence level with a sample size of 15, the .005 column and the row for 14 degrees of freedom give a value of 2.977; the second tail takes the negative value, so the critical values are -2.977 and 2.977.
Figure 2: t-test chart used for determining t-test values.
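The chart lookup can be sketched as a dictionary keyed by degrees of freedom and the area in one tail. `T_TABLE` holds only the four critical values quoted in this post, not a full t-table, and `t_critical` is a hypothetical helper name.

```python
# Critical values quoted in this post, keyed by (degrees of freedom, area per tail).
# This is only the handful of chart cells the post mentions, not a full t-table.
T_TABLE = {
    (22, 0.01): 2.508,
    (14, 0.005): 2.977,
    (22, 0.025): 2.074,
    (16, 0.05): 1.746,
}


def t_critical(confidence_percent, n, tails):
    """Look up the critical t value(s) for a sample of size n."""
    df = n - 1  # degrees of freedom
    per_tail = round((1 - confidence_percent / 100) / tails, 3)
    value = T_TABLE[(df, per_tail)]  # raises KeyError if the cell is not in the excerpt
    # A two-tailed test has a negative and a positive critical value.
    return (-value, value) if tails == 2 else (value,)


print(t_critical(99, 23, 1))  # (2.508,)
print(t_critical(99, 15, 2))  # (-2.977, 2.977)
```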


2)
Next we were given some real-world data from the Department of Agriculture and Live Stock Development organization in Kenya. It estimates that yields in a certain district should approach the following amounts in metric tons per hectare (averages based on data from the whole country): groundnuts, 0.55; cassava, 3.8; and beans, 0.28. A survey of 23 farmers gave these results: groundnuts, 0.51 (standard deviation 0.3); cassava, 3.4 (standard deviation 0.74); and beans, 0.33 (standard deviation 0.13).


Crop           μ       σ       μh      n
Groundnuts     0.51    0.3     0.55    23
Cassava        3.4     0.74    3.8     23
Beans          0.33    0.13    0.28    23
(μ = sample mean, σ = standard deviation, μh = population mean, n = sample size)
The hypothesis testing steps will be used to analyze these three pairs of means.

Ground Nuts Hypothesis Test
  1. There is no significant difference between the farmers' sample mean (μ) of groundnuts and the population mean (μh) of groundnuts for the entire country of Kenya.
  2. There is a significant difference between the farmers' sample mean (μ) of groundnuts and the population mean (μh) of groundnuts for the entire country of Kenya.
  3. t-test, because the sample of farmers is less than 30.
  4. Confidence level of 95% with a two-tailed t-test.
  5. t = -0.6394
  6. Fail to reject the null hypothesis because there is no significant difference between the sample mean and the population mean. The t value does not exceed a critical value; it falls between -2.074 and 2.074.
Cassava Hypothesis Test 

  1. There is no significant difference between the farmers' sample mean (μ) of cassava and the population mean (μh) of cassava for the entire country of Kenya.
  2. There is a significant difference between the farmers' sample mean (μ) of cassava and the population mean (μh) of cassava for the entire country of Kenya.
  3. t-test, because the sample of farmers is less than 30.
  4. Confidence level of 95% with a two-tailed t-test.
  5. t = -2.5923
  6. Reject the null hypothesis because there is a significant difference between the sample mean and the population mean. The t value falls outside of the critical values: -2.592 lies outside the range from -2.074 to 2.074.
Beans Hypothesis Test
  1. There is no significant difference between the farmers' sample mean (μ) of beans and the population mean (μh) of beans for the entire country of Kenya.
  2. There is a significant difference between the farmers' sample mean (μ) of beans and the population mean (μh) of beans for the entire country of Kenya.
  3. t-test, because the sample of farmers is less than 30.
  4. Confidence level of 95% with a two-tailed t-test.
  5. t = 1.844
  6. Fail to reject the null hypothesis because there is no significant difference between the sample mean and the population mean. The t value falls within the critical values: 1.844 lies between -2.074 and 2.074.
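The three crop tests can be reproduced with a short script. This is a sketch that takes the survey figures and the 2.074 critical value from the text above and assumes the statistic is (sample mean - population mean) / (standard deviation / sqrt(n)), the form used in the calculations later in this post. The names `crops` and `t_statistic` are illustrative.

```python
import math

CRITICAL_T = 2.074  # two-tailed, 95% confidence, 22 degrees of freedom (from the post)
N = 23              # farmers surveyed

# crop: (sample mean, standard deviation, national mean), in metric tons per hectare
crops = {
    "groundnuts": (0.51, 0.30, 0.55),
    "cassava": (3.40, 0.74, 3.80),
    "beans": (0.33, 0.13, 0.28),
}


def t_statistic(sample_mean, std_dev, hypothesized_mean, n):
    """Difference of the means divided by the standard error."""
    return (sample_mean - hypothesized_mean) / (std_dev / math.sqrt(n))


results = {}
for crop, (mean, sd, national) in crops.items():
    t = t_statistic(mean, sd, national, N)
    # Two-tailed test: reject when t lies beyond either critical value.
    decision = "reject" if abs(t) > CRITICAL_T else "fail to reject"
    results[crop] = (t, decision)
    print(f"{crop}: t = {t:.3f}, {decision} the null hypothesis")
```

The printed decisions should match the three step 6 conclusions above; small differences in the last digit of t come down to rounding.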
Similarities:
Both groundnuts and beans fail to reject the null hypothesis, which means there is no significant difference between the sample mean and the population mean. In turn, they follow the trend of the rest of the country, with the district producing about the same metric tons per hectare of beans and groundnuts as the whole country. Both groundnuts and cassava had a smaller sample mean than the population mean, yet one failed to reject and the other rejected.
Differences:
Cassava was the only crop that rejected the null hypothesis. This means that its yield in the district is not consistent with the population mean for the country. Beans was the only crop to have a larger sample mean than the population mean.

3)
In the next question we were given a scenario about stream pollutants. The level of a stream's pollutants was higher than the allowable limit of 4.4 mg/L. A sample (n) of 17 streams gave a mean pollutant level of 6.8 mg/L, with a standard deviation (σ) of 4.2. We are to provide conclusions using a one-tailed test with a 95% confidence level, using the hypothesis testing steps.

  1. There is no significant difference between the sample mean of the streams tested and the allowable pollutant limit of 4.4 mg/L.
  2. There is a significant difference between the sample mean of the streams tested and the allowable pollutant limit of 4.4 mg/L.
  3. t-test, because the sample of 17 is below 30.
  4. Confidence level of 95% with a one-tailed t-test.
  5. t = 2.356 [(6.8 - 4.4) / (4.2 / sqrt(17)) = 2.356]
  6. Reject the null hypothesis because there is a significant difference between the sample mean and the allowable limit. This is known because the t value of 2.356 falls above the critical value of 1.746 (16 degrees of freedom).
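Steps 5 and 6 can be checked with a few lines of Python. This is a minimal sketch using the numbers from the scenario and the 1.746 value read from the t chart; the variable names are illustrative.

```python
import math

sample_mean, limit, std_dev, n = 6.8, 4.4, 4.2, 17
critical_t = 1.746  # one-tailed, 95% confidence, 16 degrees of freedom (from the post)

# Standard error of the mean, then the t statistic.
standard_error = std_dev / math.sqrt(n)
t = (sample_mean - limit) / standard_error

# Upper-tail test: reject only when t lies above the critical value.
reject_null = t > critical_t
print(f"standard error = {standard_error:.3f}, t = {t:.3f}, reject null: {reject_null}")
```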
Conclusions:
This means that the sampled streams have a greater amount of pollution than what is allowable to be considered safe. This could also be caused by outliers within the sample data, as suggested by the large standard deviation compared to the mean. Such a large standard deviation may indicate a positive skew in the data, so not all 17 of the sampled streams are necessarily high in pollutants, but there are some that definitely are at an unsafe level.

Part II

In this portion of the lab we used real-world data on average home values in the county of Eau Claire to perform a hypothesis test to determine whether the average value of homes in the city of Eau Claire is significantly different from that of the entire county of Eau Claire. We were given shapefiles of the county of Eau Claire and were to create a map that shows the difference in the value of homes.

Steps of hypothesis testing for average value of homes in the city of Eau Claire compared to the entire county of Eau Claire.

  1. There is no significant difference between the average value of homes in the city of Eau Claire and the average value of homes in the entire county of Eau Claire.
  2. There is a significant difference between the average value of homes in the city of Eau Claire and the average value of homes in the entire county of Eau Claire.
  3. z-test, because the sample size of home values in the city of Eau Claire is greater than 30. The sample size is 53.
  4. Confidence level of 95% with a one-tailed z-test on the negative (lower) tail of the distribution.
  5. z = -2.572 [(151876.51 - 169438.13) / (49706.919 / sqrt(53)) = -2.572]
  6. Reject the null hypothesis because there is a significant difference between the sample mean of home values in the city of Eau Claire and the mean value of all homes in the county of Eau Claire. This is known because the z value of -2.572 falls beyond the critical value of -1.64.
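The z calculation and the lower-tail decision can be reproduced as follows. This sketch takes the means, standard deviation, and sample size from step 5 and uses `NormalDist` from Python's standard library in place of the printed table, so the critical value comes out as about -1.645 rather than the -1.64 read from the chart.

```python
import math
from statistics import NormalDist

city_mean, county_mean = 151876.51, 169438.13
std_dev, n = 49706.919, 53

z = (city_mean - county_mean) / (std_dev / math.sqrt(n))

# Lower-tail critical value at 95% confidence: the z with 5% of the curve below it.
critical_z = NormalDist().inv_cdf(0.05)

# Lower-tail test: reject only when z lies below the critical value.
reject_null = z < critical_z
print(f"z = {z:.3f}, critical z = {critical_z:.3f}, reject null: {reject_null}")
```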

Results:
The city of Eau Claire does have significantly lower average home values. This is shown in the map, where the dark brown colors are located within the city of Eau Claire in the top left corner of the county. These dark brown classes range from less than -1.5 to -0.5 standard deviations, which places them on the negative side of the standard deviation curve. Home values there are lower than those in the rest of the county, which tend to fall on the positive end of the curve. The means differ by roughly $17,600, with homes in the city worth less than homes in the county.
