Assignment 2

Descriptive Statistics and Mean Centers

The purpose of this assignment is to become familiar with different statistical methods. The first part focuses on Range, Mean, Median, Mode, Kurtosis, Skewness, and Standard Deviation. The second part focuses on analyzing the mean centers of Wisconsin. Throughout this assignment we also become more familiar with using Excel to perform statistical analysis.

Definitions

Range: 

Range is the difference between the highest and lowest value in a data set.
Example: If a set of numbers was 1, 2, 5, 5, 8, 10, 12, 15, the range would be 14 (15 - 1 = 14).
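The same calculation can be sketched in Python using the example set above. The helper name `data_range` is hypothetical and chosen only for illustration.

```python
def data_range(values):
    """Range: the largest value minus the smallest value."""
    if not values:
        raise ValueError("range is undefined for an empty data set")
    return max(values) - min(values)

example = [1, 2, 5, 5, 8, 10, 12, 15]
print(data_range(example))  # 14
```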

Mean:

Mean is the average of all numbers within a data set. The Mean is calculated by adding all the numbers together and dividing that total by how many numbers are in the data set.
Example: If a set of numbers was 1, 2, 5, 5, 8, 10, 12, 15, the Mean is 7.25 ((1 + 2 + 5 + 5 + 8 + 10 + 12 + 15) / 8 = 7.25).
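Here is a minimal Python sketch of the same steps: sum the values, then divide by the count. The function name `mean` is an illustrative choice.

```python
def mean(values):
    # Add every value together, then divide by how many values there are.
    return sum(values) / len(values)

example = [1, 2, 5, 5, 8, 10, 12, 15]
print(mean(example))  # 7.25
```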

Median:

Median is the middle number in a data set once the numbers have been put in order from lowest to highest. Finding the Median is easy when the data set has an odd number of values. When it has an even number of values, one must add the two middle numbers together and divide by two. The average of those two numbers is, in turn, the Median.
Example: If a set of numbers was 1, 2, 5, 5, 8, 10, 12, 15, the Median is 6.5 ((5 + 8) / 2 = 6.5).
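The odd/even rule above can be sketched in Python as follows. The function sorts first, because the middle value only makes sense for ordered data. The name `median` is illustrative.

```python
def median(values):
    if not values:
        raise ValueError("median is undefined for an empty data set")
    ordered = sorted(values)   # the data must be in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]    # odd count: the single middle value
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 2, 5, 5, 8, 10, 12, 15]))  # 6.5
print(median([3, 1, 2]))                    # 2
```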

Mode:

Mode is whichever number occurs most often in a data set, that is, the most frequent value.
Example: If a set of numbers was 1, 2, 5, 5, 8, 10, 12, 15, the Mode is 5. This is because 5 is the only number in the set that occurs more than once.
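A short Python sketch of finding the Mode by counting frequencies follows. It returns a list because a data set can have more than one mode. The helper name `modes` is hypothetical. By this sketch's choice, if every value occurs once, every value is returned.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 5, 5, 8, 10, 12, 15]))  # [5]
```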

Kurtosis:

Kurtosis describes the shape of a histogram, specifically whether it is flatter or more peaked than a normal curve. A more peaked distribution is Leptokurtic, a flatter distribution is Platykurtic, and a normal shaped curve is Mesokurtic (Figure 1).

Figure 1: This graph shows the three types of Kurtosis (https://www.bogleheads.org/wiki/Excess_kurtosis).
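Excel's KURT function reports excess kurtosis, so a normal curve scores about 0. A negative value is platykurtic and a positive value is leptokurtic. Below is a sketch of the formula Microsoft documents for KURT. It assumes at least four values that are not all identical. The name `excel_kurt` is hypothetical.

```python
import math

def excel_kurt(values):
    """Sample excess kurtosis, following the formula documented for Excel's KURT."""
    n = len(values)
    if n < 4:
        raise ValueError("KURT needs at least four values")
    m = sum(values) / n
    s = math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))  # sample SD
    if s == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    fourth = sum(((x - m) / s) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

k = excel_kurt([1, 2, 5, 5, 8, 10, 12, 15])
print(round(k, 3))  # -1.001: negative, so platykurtic (flatter than normal)
```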

Skewness:

Skewness describes how balanced the histogram is: it can be positively skewed, negatively skewed, or have no skew. No skew is a symmetric, normal bell curve shape (Figure 2). A positively skewed curve is high on the left side and tails off to the right (Figure 2). A negatively skewed curve is high on the right side and tails off to the left (Figure 2).

Figure 2: This series of graphs shows the different types of skewness (https://www.kullabs.com/classes/subjects/units/lessons/notes/note-detail/9958).
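Here is a sketch of the skewness formula Microsoft documents for Excel's SKEW function. It assumes at least three values that are not all identical, and `excel_skew` is a hypothetical name. A positive result means a longer tail to the right.

```python
import math

def excel_skew(values):
    """Sample skewness, following the formula documented for Excel's SKEW."""
    n = len(values)
    if n < 3:
        raise ValueError("SKEW needs at least three values")
    m = sum(values) / n
    s = math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))  # sample SD
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

sk = excel_skew([1, 2, 5, 5, 8, 10, 12, 15])
print(round(sk, 3))  # 0.297: positive, so the tail is on the right
```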

Standard Deviation:

Standard Deviation is a measurement of how spread out the numbers in a data set are from the Mean. Values can be described by how many standard deviations they fall from the mean (Figure 3). For data that follow a normal distribution, about 68.3% of the values fall within 1 Standard Deviation of the mean, about 95.4% fall within 2 Standard Deviations, and about 99.7% fall within 3 Standard Deviations. There are two types of Standard Deviation: population standard deviation and sample standard deviation. The main difference between them is the number used in the denominator. Population standard deviation divides by the exact number of observations, whereas sample standard deviation divides by the number of observations minus 1 (Figure 4). Subtracting 1 corrects a sample's tendency to underestimate the spread of the whole population, which reduces sampling bias.

Figure 3: This image shows the percent break down of standard deviation around the Mean.
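The 68.3% / 95.4% / 99.7% figures can be reproduced for a normal distribution with the error function from Python's standard library. This is a sketch: the percentages hold only for normally distributed data, and `share_within` is a hypothetical name.

```python
import math

def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```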


Figure 4: This shows both standard deviation equations, and how the second equation reduces sampling bias by dividing by N-1.
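Written out with conventional symbols (which may differ slightly from those in Figure 4), the two equations are shown below. They are followed by a worked example for the set 1, 2, 5, 5, 8, 10, 12, 15, whose mean is 7.25 and whose sum of squared deviations is 167.5.

```latex
\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}
\qquad
s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}

\sigma = \sqrt{\frac{167.5}{8}} \approx 4.5758
\qquad
s = \sqrt{\frac{167.5}{7}} \approx 4.8917
```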

Part 1: Hand Calculations of Data

In this first part we were given the task of using the statistical methods listed above to analyze test scores from two local high schools in the Eau Claire School District in Wisconsin. We were given two sets of standardized test scores taken by juniors (Figure 5): one set is from Eau Claire North and the other is from Eau Claire Memorial High School. Memorial typically has the student with the highest test score, which leads people to question the teaching methods at Eau Claire North. The public perceives that because North has lower scores, its teachers should be fired.

Figure 5: These are the two sets of test scores from Eau Claire North (left) and Eau Claire Memorial (right).

Methods

The first step in analyzing these two sets of test scores was to determine the standard deviation of each set. Although a computer can calculate standard deviation much more conveniently, this step was done by hand with only a calculator, to understand the process the computer goes through to produce these numbers. First I had to decide which type of standard deviation to calculate. I used the sample standard deviation (dividing by n - 1) to reduce sampling bias. Next I performed the calculation on both sets of test scores (Figures 6 and 7).
Figure 6: This shows the calculations that were performed to arrive at the standard deviation of 23.6354 for Eau Claire North High School.


Figure 7: This shows the calculations that were performed to come up with the standard deviation of 27.1573 for Eau Claire Memorial High School.
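The hand calculation follows the same sequence of steps a computer does. The Python sketch below mirrors it and can optionally print each intermediate row. The Eau Claire scores from Figure 5 are not reproduced in the text, so the sketch uses the example set from the Definitions section instead. The name `sample_sd_by_hand` is hypothetical.

```python
import math

def sample_sd_by_hand(values, show_steps=False):
    """Sample standard deviation, computed step by step as on a calculator."""
    n = len(values)
    if n < 2:
        raise ValueError("sample standard deviation needs at least two values")
    m = sum(values) / n                        # step 1: the mean
    squares = [(x - m) ** 2 for x in values]   # step 2: squared deviations from the mean
    ss = sum(squares)                          # step 3: sum of squares
    variance = ss / (n - 1)                    # step 4: divide by n - 1 (sample)
    if show_steps:
        for x, sq in zip(values, squares):
            print(f"x = {x:>3}   x - mean = {x - m:>6.2f}   squared = {sq:>8.4f}")
        print(f"mean = {m}, sum of squares = {ss}, variance = {variance:.4f}")
    return math.sqrt(variance)                 # step 5: square root

example = [1, 2, 5, 5, 8, 10, 12, 15]
print(round(sample_sd_by_hand(example, show_steps=True), 4))  # 4.8917
```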

After calculating the standard deviation for both sets of test scores, it was important to gather the rest of the statistics to determine whether the public's claim that the teachers at North should be fired is correct. For this step I used Excel to calculate the range, mean, median, mode, kurtosis, and skewness of both data sets. The results for both schools, including the standard deviations, are shown in the table below.

Figure 8: The results of all the statistical calculations are shown in this table.

Discussion

After calculating the statistics for both sets of test scores, I conclude that Eau Claire North should NOT fire its teachers. The statistics suggest that the majority of students at Eau Claire North actually did better than the majority of Eau Claire Memorial students. It just so happens that the highest test score came from one student at Memorial, who scored 198 out of 200. The range shows that North's scores span a smaller spread, meaning there is less difference between North's highest and lowest scores. With a lower range, North's test scores are more tightly grouped, whereas Memorial has a bigger gap between its highest and lowest scores. North also had a higher mean (North: 160.923, Memorial: 158.538), so on average North's students scored higher on the standardized test. The mode shows the most frequent score, and North again comes out ahead: North's most frequent score was high, while Memorial's most frequent score was low (North: 170, Memorial: 120). Both kurtosis values are negative, so both distributions are platykurtic (flatter than a normal curve), but North's value is closer to mesokurtic (North: -0.557, Memorial: -1.174). This means North's scores are more concentrated than Memorial's, although kurtosis by itself does not say whether scores are high or low. Both North and Memorial are negatively skewed. This means both have a longer tail of low scores, with most scores bunched toward the high end. North also has a lower standard deviation than Memorial, which again favors North: its scores are clustered more tightly around its mean than Memorial's are.

Based on this analysis of each school's statistics, Eau Claire North should not fire its teachers. Apart from the one student at Memorial who did very well, the majority of students at Memorial did not do as well as the majority of students at North.


Part 2: Calculating Mean Centers and Weighted Mean Centers

The purpose of this part of the assignment is to analyze the state of Wisconsin by determining its geographic mean center. It also determines weighted mean centers based on population data from 2000 and 2015, obtained from the U.S. Census Bureau. Using these results, we create a map with three points: the Geographic Mean Center and the weighted mean centers for the 2000 and 2015 populations.

Definitions

Geographic Mean Center
The geographic mean center is the center point of an area on an x,y (Cartesian) plane. It is found by averaging the x coordinates and the y coordinates of the features that make up the area.
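The unweighted mean center is simply the average x and the average y, as in the Python sketch below. The coordinates are hypothetical and assumed to be projected (for example, in metres); averaging raw latitude and longitude is only an approximation.

```python
def mean_center(points):
    """Unweighted mean center: the average x and the average y of (x, y) points."""
    if not points:
        raise ValueError("need at least one point")
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    return x_bar, y_bar

# Hypothetical projected coordinates, e.g. feature centroids.
pts = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
print(mean_center(pts))  # (2.0, 1.0)
```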

Weighted Mean Center
The weighted mean center also finds a center point, but each feature's coordinates are weighted by an attribute value, such as population. Features with larger weights pull the center point toward themselves, while features with smaller weights pull it less.
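The weighted version multiplies each coordinate by its weight and divides by the total weight. Here is a minimal sketch with hypothetical points and weights; the name `weighted_mean_center` is illustrative, not a GIS library call.

```python
def weighted_mean_center(points, weights):
    """Weighted mean center: sum of weight * coordinate, divided by total weight."""
    if len(points) != len(weights):
        raise ValueError("need exactly one weight per point")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    x_w = sum(w * x for (x, _), w in zip(points, weights)) / total
    y_w = sum(w * y for (_, y), w in zip(points, weights)) / total
    return x_w, y_w

pts = [(0.0, 0.0), (10.0, 0.0)]
print(weighted_mean_center(pts, [1, 1]))  # (5.0, 0.0): equal weights give the plain mean center
print(weighted_mean_center(pts, [1, 3]))  # (7.5, 0.0): pulled toward the heavier point
```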

Methods

To determine the geographic mean center of Wisconsin, I first downloaded a shapefile of Wisconsin and added it to ArcGIS. I then ran the Mean Center tool (Figure 9), which determined the center point of Wisconsin based on the area of the state. By entering only the Wisconsin shapefile into the tool and leaving the weight, case, and attribute fields blank, the tool calculates the center point from the state's geometry alone, without any weighting (Figure 10).

Figure 9: This is the Mean Center tool which is used for determining the geographic mean center of Wisconsin.
Figure 10: This shows the geographic mean center of the state of Wisconsin.
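A center point "based on the area" of a single shape corresponds to the polygon's area centroid. The Python sketch below uses the standard shoelace-based centroid formula. It assumes a simple, non-self-intersecting polygon with vertices listed in order in projected coordinates. It is an illustration of the idea, not a description of how ArcGIS computes its result internally. The L-shaped polygon is hypothetical and shows that the area centroid (about 1.357, 1.357) differs from the plain average of the vertices (about 1.667, 1.667).

```python
def polygon_centroid(vertices):
    """Area centroid of a simple polygon (vertices in order, first not repeated at the end)."""
    n = len(vertices)
    a2 = 0.0          # twice the signed area
    cx = cy = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]   # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        a2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if a2 == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3 * a2), cy / (3 * a2)

# Hypothetical L-shaped polygon.
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
print(tuple(round(c, 4) for c in polygon_centroid(l_shape)))  # (1.3571, 1.3571)
```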

After determining the geographic mean center, I next found the weighted mean centers of Wisconsin based on the 2000 and 2015 population data. The Mean Center tool was used again to find these points. This time it is important to select the data used to weight the center. First, the 2000 population was entered into the "weight field" of the tool (Figure 11). Running the tool placed a new point southeast of the previous one (Figure 12). Finally, the same process was repeated using the 2015 population (Figures 13 and 14).


Figure 11: This shows the mean center tool but this time with the weight field containing population 2000 data.


Figure 12: This shows where the weighted mean center point is in the state of Wisconsin according to population 2000 data.


Figure 13: This shows the mean center tool but this time with the weight field containing population 2015 data.


Figure 14: This shows where the weighted mean center point is in the state of Wisconsin according to population 2015 data.

After gathering the three points, a map was created to show how the points move from the geographic center to the weighted centers. The map shows the locations of the Geographic Mean Center (yellow) and the weighted mean centers for the 2000 population (red) and the 2015 population (green) (Figure 15).

Figure 15: This map shows all of the Mean Center points on the same map which is helpful when analyzing why these center points moved. 

Discussion

The map in Figure 15 shows that the geographic mean center of Wisconsin falls in Wood County. It falls here because this point is the center of the state's area, the point around which the state's land is balanced. Weighting the mean center by the 2000 population moves it southeast, from Wood County into Green Lake County. This is because the southeast portion of the state has many more large cities, such as Kenosha, Racine, and, the biggest, Milwaukee. These population concentrations in the southeast pull the mean center in that direction. The only relatively large cities in the northwest part of the state are Eau Claire and Superior, and they do not have nearly as many people as the many large cities in the southeast. The weighted mean center for 2015 (green) still falls in Green Lake County, but it is slightly farther south than in 2000. This could be because southern counties had a somewhat larger population increase than northern counties, which would shift the point south, as the map shows.
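The reasoning about faster southern growth can be illustrated with a toy example. All numbers below are HYPOTHETICAL (two made-up counties and made-up growth rates, not Census data). They simply show that when the southern weight grows faster than the northern one, the weighted mean center moves south even though no county moves.

```python
def weighted_mean_center(points, weights):
    """Weighted mean center: sum of weight * coordinate, divided by total weight."""
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return x, y

# HYPOTHETICAL: a northern county at y = 100 km and a southern county at y = 0 km.
counties = [(0.0, 100.0), (0.0, 0.0)]
pop_2000 = [100, 300]
pop_2015 = [105, 330]   # made-up rates: north grows 5%, south grows 10%

c2000 = weighted_mean_center(counties, pop_2000)
c2015 = weighted_mean_center(counties, pop_2015)
print(c2000)               # (0.0, 25.0)
print(round(c2015[1], 2))  # 24.14, so the center moved south
```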

Sources

"Excess kurtosis." Excess kurtosis - Bogleheads. December 26, 2016. Accessed October 11, 2017. https://www.bogleheads.org/wiki/Excess_kurtosis.

Simplified, Learning. "Skewness with Example." World's Fastest Growing Educational Portal. Accessed October 11, 2017. https://www.kullabs.com/classes/subjects/units/lessons/notes/note-detail/9958.



