Note: Only for Reference
In the second week of our Data Analysis and Business Intelligence course, we covered chapters 3 and 4 in full and started chapter 5. We discussed measures of location (mean, median, mode), measures of dispersion (range, standard deviation, variance), charts for displaying and exploring data (dot plots, stem-and-leaf displays, box plots), measures of position, skewness and the coefficient of variation, and the interpretation of scatter diagrams and contingency tables.
We also looked at some basic concepts of probability: ways of assigning probability, types of probability, mutually exclusive events, the rules of addition and multiplication, and the complement rule. The week 2 discussion question focuses mainly on the concepts of chapters 3 and 4.
“Mean is affected by extreme values but the median is not”.
The statement “Mean is affected by extreme values but the median is not” is true. The mean is the average of a set of values, computed by dividing the total of all values by the number of values. The median is the value of the variate that divides the total frequency into two halves.
The median is the value that lies exactly at the center of the distribution once the data are arranged in order, whether ascending or descending. The mean, on the other hand, takes every value into account, so a very large or very small value at either end pulls the mean toward it. Therefore, the mean is affected by extreme values but the median is not.
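To make the effect concrete, here is a minimal sketch using Python's standard `statistics` module. The cost values are hypothetical and chosen only for illustration; they are not taken from the bus data.

```python
from statistics import mean, median

# Hypothetical costs, for illustration only (not from the bus data).
costs = [3000, 3500, 4000, 4500, 5000]

# Same data, but the largest value is replaced by an extreme value.
costs_extreme = [3000, 3500, 4000, 4500, 50000]

print("mean:  ", mean(costs), "->", mean(costs_extreme))
print("median:", median(costs), "->", median(costs_extreme))
```

In this sketch the mean moves from 4000 to 13000, while the median stays at 4000 because the middle value of the ordered data has not changed.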
Numerical Illustration for Lincolnville School District bus data
Given Information:
- Number of observations (N) = 80
- Minimum Value = 505
- Maximum Value = 10575
| Class Interval | Frequency (F) | Mid Value (M) | Cum. Frequency (C.F) | F×M | (M − X̅) | (M − X̅)² | F(M − X̅)² |
|---|---|---|---|---|---|---|---|
| 500–1950 | 8 | 1225 | 8 | 9800 | -3280 | 10758400 | 86067200 |
| 1950–3400 | 19 | 2675 | 27 | 50825 | -1830 | 3348900 | 63629100 |
| 3400–4850 | 27 | 4125 | 54 | 111375 | -380 | 144400 | 3898800 |
| 4850–6300 | 11 | 5575 | 65 | 61325 | 1070 | 1144900 | 12593900 |
| 6300–7750 | 5 | 7025 | 70 | 35125 | 2520 | 6350400 | 31752000 |
| 7750–9200 | 5 | 8475 | 75 | 42375 | 3970 | 15760900 | 78804500 |
| 9200–10650 | 5 | 9925 | 80 | 49625 | 5420 | 29376400 | 146882000 |
| Total | N = 80 | | | ∑FM = 360450 | | | ∑F(M − X̅)² = 423627500 |
Working Notes:
For the number of classes: According to the “2 to the k rule”, 2^k > N, i.e. 2^k > 80. Trying k = 6 gives 2^6 = 64, which is less than 80, so 6 classes are not enough. Trying the next value, k = 7, gives 2^7 = 128, which is greater than 80, so the suggested number of classes is 7.
For the value of the class interval: i = (highest value − lowest value) / k = (10575 − 505) / 7 = 1438.57. We take i = 1450, since the class width should be a convenient round number such as a multiple of 10 or 100.
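The working notes above can be sketched in a few lines of Python. The function name `number_of_classes` is hypothetical, and rounding the width up to the next multiple of 50 is an assumption made here only because it reproduces the 1450 chosen in the text.

```python
import math

def number_of_classes(n):
    """Return the smallest k with 2**k > n (the "2 to the k" rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

n, lowest, highest = 80, 505, 10575

k = number_of_classes(n)             # 2**6 = 64 is too small, 2**7 = 128 works
raw_width = (highest - lowest) / k   # 10070 / 7

# Assumption: round the width up to the next multiple of 50.
width = math.ceil(raw_width / 50) * 50

print(k, round(raw_width, 2), width)
```

With these inputs the sketch gives 7 classes, a raw width of 1438.57 and a rounded width of 1450. Seven classes of width 1450 starting at 500 reach 10650, which covers the maximum of 10575.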
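Mean (X̅) = ∑FM / N = 360450 / 80 = 4505.625, taken as 4505 in the calculations here
Standard Deviation (σ) = √(∑F(M − X̅)² / N) = √(423627500 / 80) = 2301.16
Median Class = 3400 – 4850
Median = L + [(∑F/2 − C.F) / F] × (H − L) = 3400 + [(80/2 − 27) / 27] × (4850 − 3400) = 4098.15 ≈ 4098, where C.F = 27 is the cumulative frequency of the classes below the median class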
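As a cross-check on the arithmetic, the grouped mean, standard deviation and median can be recomputed from the class limits and frequencies in the frequency table. This is a minimal sketch: the variable names are illustrative, and the mean is truncated to 4505 only to match the deviation columns of the table.

```python
import math

# (lower limit, upper limit, frequency) for each class in the frequency table
classes = [
    (500, 1950, 8), (1950, 3400, 19), (3400, 4850, 27), (4850, 6300, 11),
    (6300, 7750, 5), (7750, 9200, 5), (9200, 10650, 5),
]

n = sum(f for _, _, f in classes)
mids = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

# Grouped mean: sum of F*M divided by N.
mean_exact = sum(f * m for f, m in zip(freqs, mids)) / n
mean_used = 4505  # truncated value used in the table's deviation columns

# Population standard deviation from the grouped data.
sigma = math.sqrt(sum(f * (m - mean_used) ** 2 for f, m in zip(freqs, mids)) / n)

# Grouped median: interpolate inside the first class whose
# cumulative frequency reaches N/2.
half, cum = n / 2, 0
for lo, hi, f in classes:
    if cum + f >= half:
        median = lo + (half - cum) / f * (hi - lo)
        break
    cum += f

print(mean_exact, round(sigma, 2), round(median, 2))
```

The sketch reproduces a mean of 4505.625, a standard deviation of 2301.16 and a median of about 4098.15.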
For Range:
Range = Maximum value – Minimum value = 10575-505 =10070
- Answer
Looking at the frequency table of bus maintenance costs for the Lincolnville School District, the data tend to cluster between 3400 and 4850, the class with the highest frequency (27 of the 80 observations). The mean maintenance cost was determined to be 4505, with a median cost of 4098.
Yes, one measure could be more representative of the typical cost than the others. The mean is not well suited to cost data with large fluctuations, because it takes every value into account and is pulled toward the extremes, so the median is more appropriate in such cases. For data with smaller fluctuations, the mean is the better option. Depending on the situation, then, one measure can represent the data more accurately than the others.
- Answer
The range of the maintenance costs is 10070, from a minimum of 505 to a maximum of 10575. After evaluation, the standard deviation was determined to be 2301.16.
Empirical rule = X̅ ± 2 σ
= 4505 ± (2*2301.16)
= 4505 ± 4602.32
= 4505 – (2 * 2301.16) to 4505+ (2 *2301.16)
= -97.32 to 9107.32
According to the Empirical Rule, about 95% of the maintenance costs lie within two standard deviations of the mean, so the low and high ends of the 95% interval are -97.32 and 9107.32.
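The interval above is simple arithmetic, sketched here in Python with the mean and standard deviation found earlier. The variable names are illustrative.

```python
mean_cost = 4505   # grouped mean used in the text
sigma = 2301.16    # grouped standard deviation
k = 2              # two standard deviations, about 95% under the Empirical Rule

lower = mean_cost - k * sigma
upper = mean_cost + k * sigma

print(round(lower, 2), "to", round(upper, 2))
```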
- Answer
| Measure | Value | Difference (from previous value) |
|---|---|---|
| Minimum | 505 | 505 |
| Q1 | 3081 | 2576 |
| Q2 (Median) | 4178.5 | 1097.5 |
| Q3 | 5408 | 1229.5 |
| Maximum | 10575 | 5167 |
Fig: Box-and-Whisker Plot
From the above box-and-whisker plot, we can see that the distribution is positively skewed: the median is not centrally located in the box, and the distance from the first quartile to the median (1097.5) is less than the distance from the median to the third quartile (1229.5).
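The comparison behind this conclusion can be sketched in Python from the five-number summary. The maximum is taken as 10575, the value listed in the given information. The dictionary keys and variable names are illustrative.

```python
# Five-number summary of the maintenance costs
summary = {"min": 505, "q1": 3081, "median": 4178.5, "q3": 5408, "max": 10575}

# Distances between consecutive summary points
left_whisker = summary["q1"] - summary["min"]
lower_box = summary["median"] - summary["q1"]
upper_box = summary["q3"] - summary["median"]
right_whisker = summary["max"] - summary["q3"]

# The upper half of the box and the right whisker are both longer
# than their lower counterparts, which suggests positive skew.
positively_skewed = upper_box > lower_box and right_whisker > left_whisker

print(left_whisker, lower_box, upper_box, right_whisker, positively_skewed)
```

The four distances match the Difference column for Q1 through Maximum: 2576, 1097.5, 1229.5 and 5167.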