Continuation of Part 3.
Central Tendency
- Given a histogram, how would you choose a number (or numbers) that accurately represents the typical salary?
- The value at which the frequency is highest is called the mode. It is the most common value, so it certainly works for describing the distribution.
- The value in the middle is called the median, and this also works.
- The average (mean) is a statistic that rests at a specific spot in the middle of the distribution.
 
- Mode: the value that occurs with the highest frequency.
- So what is the mode of [2, 5, 5, 9, 8, 3]?
- Answer = 5, because it occurs twice in the dataset while every other value occurs once.
- When a histogram holds thousands of data points, the mode is the range (bin) with the highest frequency, because we cannot see the individual values but we can see which bin is tallest.
- When the entire histogram is at the same level, it is called a uniform distribution; such distributions have no mode.
- A distribution with two distinct, clear peaks has two modes and is called bimodal; with more than two peaks it is multimodal.
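The three cases above (one mode, no clear mode, two modes) can be checked with a minimal sketch using Python's standard `statistics` module. The `uniform` and `bimodal` lists are made-up illustration data, not values from these notes.

```python
from statistics import multimode

# The worked example from the notes: 5 is the only repeated value.
data = [2, 5, 5, 9, 8, 3]
modes = multimode(data)  # list of every value tied for the highest count
print(modes)  # [5]

# Made-up data where every value occurs once: all values tie,
# which the notes describe as a uniform distribution with no mode.
uniform = [1, 2, 3, 4]
print(multimode(uniform))  # [1, 2, 3, 4]

# Made-up data with two equally common values: a bimodal case.
bimodal = [1, 1, 2, 3, 3]
print(multimode(bimodal))  # [1, 3]
```

Note that `multimode` returns every tied value, so deciding that a flat result means "no mode" is an interpretation left to the reader.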
 
- Mode Explained: the mode can be used to describe any type of data, whether numerical or categorical.
- Not all scores in the dataset affect the mode; only the most frequently repeated ones do.
- If we take many samples from the same population, the mode can differ from sample to sample.
- The mode changes when the bin size changes.
- There is no equation for the mode. There is a procedure to find it, but we cannot describe it with an equation, since it depends on how we present the data.
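To see how the modal bin depends on bin size, here is a small sketch. The helper `modal_bin` and the `values` list are hypothetical, written only for illustration; bins are assumed to be half-open ranges `[start, start + width)` starting at 0.

```python
from collections import Counter

def modal_bin(values, bin_width):
    """Return (bin_start, count) for the fullest bin of width bin_width."""
    # Map each value to the start of the bin that contains it.
    counts = Counter((v // bin_width) * bin_width for v in values)
    # Highest count wins; ties go to the lowest bin start.
    return max(counts.items(), key=lambda kv: (kv[1], -kv[0]))

values = [1, 2, 4, 5, 6, 6, 9, 10, 11, 14]  # made-up data

print(modal_bin(values, 5))  # (5, 4): the bin [5, 10) is tallest
print(modal_bin(values, 3))  # (9, 3): the bin [9, 12) is tallest
```

The same data gives a different modal range for each bin width, which is why the mode of a histogram is a procedure rather than a fixed formula.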
 
- Mean: the sum of all the values divided by the number of values.
- If the data is [1, 2, 3, 4, 5], then mean = (1 + 2 + 3 + 4 + 5)/5 = 3
- For a sample we write x bar = Sigma of x divided by n (small n, the sample size)
- For a population we write mu = Sigma of x divided by N (capital N, the population size)
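As a quick check of the arithmetic, the mean can be computed by hand and compared against Python's standard `statistics.mean`. This is only a sketch on the five values from the example above.

```python
from statistics import mean

data = [1, 2, 3, 4, 5]

# Sum of all values divided by how many values there are.
manual_mean = sum(data) / len(data)
print(manual_mean)  # 3.0

# The standard library gives the same number.
library_mean = mean(data)
print(library_mean == manual_mean)  # True
```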
 
- Properties of Mean:
- All scores in the distribution affect the mean.
- Think of the mean as a pivot keeping a scale balanced: if we add or remove a score, the scale goes off balance and the mean has to be recalculated to re-balance it.
 
- The mean can be described with a formula.
- Many samples from the same population will have roughly similar means.
- The mean of the sample can be used to make inferences about the population it came from.
- The mean will change if we add an extreme value to the dataset.
- Such a value is known as an outlier: a value that is unexpectedly different from the other observed values.
- Outliers skew a distribution by pulling the mean towards the outlier, which makes the average/mean misleading.
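The balance idea and the pull of an outlier can both be seen in a few lines. This is a sketch on made-up numbers (imagine salaries in thousands); none of the values come from the notes.

```python
scores = [30, 32, 35, 38, 40]  # made-up data

mean_before = sum(scores) / len(scores)
print(mean_before)  # 35.0

# Balance point: distances below and above the mean cancel out.
balance = sum(x - mean_before for x in scores)
print(balance)  # 0.0

# Add one extreme value and recompute.
with_outlier = scores + [500]
mean_after = sum(with_outlier) / len(with_outlier)
print(mean_after)  # 112.5
```

After the outlier is added, the mean is larger than five of the six values, so it no longer describes a typical score.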
 
 
- Median:
- Sort the data
- Find the middle value of the data
- The median of an even number of values is calculated by:
- First sorting them
- Then selecting the two middle values
- Taking the average of these two middle values
 
- When the data has an outlier, the median is not affected much by such departures from the norm; this property of the median is called robustness.
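Robustness can be illustrated by measuring how far the median and the mean each move when one extreme value is added. The `ages` list is made-up illustration data.

```python
from statistics import mean, median

ages = [21, 22, 23, 24, 25]        # made-up data
ages_with_outlier = ages + [90]    # one extreme value added

median_shift = median(ages_with_outlier) - median(ages)
mean_shift = mean(ages_with_outlier) - mean(ages)

print(median(ages), median(ages_with_outlier))  # 23 23.5
print(median_shift)                             # 0.5
print(mean_shift > median_shift)                # True
```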
 
- Median Formula, where x_(k) is the value at position k in the sorted data and n is the number of values:
- For even n: median = (x_(n/2) + x_(n/2 + 1)) / 2, i.e. find the two middle values and take their average
- For odd n: median = x_((n + 1)/2)
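The positional formula translates directly into code once the 1-based positions are converted to Python's 0-based indexes. The function name `median_by_position` is hypothetical; this is a sketch, and in practice `statistics.median` does the same job.

```python
def median_by_position(values):
    """Median via the positional formula on the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2, shifted by 1 for 0-based indexing.
        return ordered[(n + 1) // 2 - 1]
    # Even n: positions n/2 and n/2 + 1, again shifted by 1.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

print(median_by_position([7, 1, 3]))           # 3
print(median_by_position([2, 5, 5, 9, 8, 3]))  # 5.0
```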
- Positively Skewed (high frequency towards the left, long tail to the right):
- The mode will be towards the left, because that is where the frequency is highest
- The mean will be pulled towards the right, because a few large, infrequent values sit in the right tail
- The median will typically lie between the mode and the mean
- So the mode is less than the median, which is less than the mean (Mode < Median < Mean)
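The ordering for a positively skewed distribution can be checked on a small made-up dataset with a long right tail. The values are illustrative only, and the ordering is typical rather than guaranteed for every skewed dataset.

```python
from statistics import mean, median, mode

# Made-up data: most values are small, a few large ones form the right tail.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

skew_mode = mode(right_skewed)      # 2
skew_median = median(right_skewed)  # 3.0
skew_mean = mean(right_skewed)      # 4.6

print(skew_mode, skew_median, skew_mean)
print(skew_mode < skew_median < skew_mean)  # True
```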
 
- Normally Distributed (frequency highest in the centre): the mean equals the median, which equals the mode (Mean = Median = Mode)
- The mode occurs in the centre bin, where the frequency is highest.
- Since the distribution is symmetrical, the mean and the median also both occur pretty much right in the centre.
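For a symmetric, single-peaked dataset the three measures coincide. The list below is made-up data shaped like a small bell, used only as a sketch of the idea.

```python
from statistics import mean, median, mode

# Made-up symmetric data: counts rise to a peak at 3 and fall off evenly.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]

sym_mean = mean(symmetric)
sym_median = median(symmetric)
sym_mode = mode(symmetric)

print(sym_mean, sym_median, sym_mode)    # 3 3 3
print(sym_mean == sym_median == sym_mode)  # True
```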
 
- Measure Of Centre
- Mean:
- The mean has a simple equation
- The mean will always change if any data value changes
- The mean is not affected by a change in bin size; it will always be the same, no matter how we visualize the data with the histogram
- The mean is severely affected by outliers
- The mean is not easy to find just by looking at the histogram
 
- Median:
- The median does not have a simple equation
- The median will not always change if any data value changes
- The median is not affected by a change in bin size
- The median is not severely affected by outliers
- The median is not easy to find just by looking at the histogram
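The claim that the mean always reacts to a changed value while the median may not can be sketched by editing a single value. Both lists are made-up illustration data.

```python
from statistics import mean, median, multimode

before = [1, 2, 2, 3, 7]
after = [1, 2, 2, 3, 9]  # only the largest value was changed

print(mean(before), mean(after))            # 3 3.4
print(median(before), median(after))        # 2 2
print(multimode(before), multimode(after))  # [2] [2]
```

Here the mean moves, while the median and the most common value stay where they were.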
 
- Mode:
- The mode does not have an equation
- The mode will not always change if any data value changes
- The mode is affected by a change in bin size
- The mode is not severely affected by outliers
- The mode is easy to find just by looking at the histogram, because it is the bin with the highest frequency
- It can be used to describe categorical data, such as gender or country of origin
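Because the mode only needs counts, it works on labels as well as numbers. A minimal sketch with `collections.Counter` follows; the country list is invented for illustration.

```python
from collections import Counter

# Made-up categorical data: country of origin for six people.
countries = ["India", "Kenya", "India", "Brazil", "Kenya", "India"]

counts = Counter(countries)
# most_common(1) returns a list with the single (label, count) pair
# that has the highest count.
top_label, top_count = counts.most_common(1)[0]

print(top_label, top_count)  # India 3
```

A mean or median of these labels would be meaningless, since country names cannot be added or ordered numerically.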
 
 
- Mean:
- In an introductory statistics course, the same number of students scored below 75% as above 75% on the final exam. What shape(s) could the distribution of final exam scores have?
- This is another way of saying that 75% was the median score on the exam. All of these distributions can have a median of 75%:
- Uniform
- Normal
- Bimodal
- Positively Skewed
- Negatively Skewed
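To make the answer concrete, here is a sketch with five tiny made-up score lists, one per shape, that all share a median of 75. The lists are rough caricatures of each shape, invented for illustration, not real exam data.

```python
from statistics import median

# Each list is a made-up caricature of the named shape.
shapes = {
    "uniform": [55, 65, 75, 85, 95],
    "normal": [65, 70, 75, 75, 75, 80, 85],
    "bimodal": [60, 60, 60, 75, 90, 90, 90],
    "positively skewed": [70, 72, 75, 85, 99],  # tail stretches right
    "negatively skewed": [40, 60, 75, 77, 78],  # tail stretches left
}

medians = {name: median(scores) for name, scores in shapes.items()}
for name, value in medians.items():
    print(name, value)  # every shape prints 75
```

Knowing only the median therefore says nothing about the shape of the distribution.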
