Codecademy Learn Statistics With Python
Histograms
A histogram is a graphical representation of the distribution of numerical data. It shows how many values fall into each sub-range of the data.
Range = max(data) – min(data)
A bin is a sub-range of values that falls within the range of a dataset. In these notes, all bins in a
histogram are the same width.
Create an array in NumPy (imported with import numpy as np) with:
np.array([values])
Find max and min values:
np.amax(npArray)
np.amin(npArray)
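As a quick illustration of the array, max, min, and range steps, here is a minimal sketch. The data values and variable names are made up for the example and are not from the course.

```python
import numpy as np

# Hypothetical sample data (an assumption for illustration only)
data = np.array([2, 5, 7, 9, 12, 14, 18])

largest = np.amax(data)    # maximum value in the array
smallest = np.amin(data)   # minimum value in the array

# Range = max(data) - min(data)
data_range = largest - smallest
print(data_range)
```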
A count is the number of values that fall within the bin’s range.
Use np.histogram(inputArray, range=(minValue, maxValue), bins=numberOfBins) to find the counts for an array. It returns the counts and the bin edges.
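The sketch below shows one way to call np.histogram. Note that NumPy's positional order is (array, bins, range), so passing bins and range as keyword arguments avoids mixing them up. The sample data is hypothetical.

```python
import numpy as np

# Hypothetical sample data (an assumption for illustration only)
data = np.array([1, 2, 2, 3, 5, 6, 6, 7, 9, 10])

# Three equal-width bins covering 1 to 10: [1, 4), [4, 7), [7, 10]
counts, bin_edges = np.histogram(data, bins=3, range=(1, 10))

print(counts)      # number of values in each bin
print(bin_edges)   # the boundaries of the bins (one more edge than bins)
```

The last bin includes its right edge, so the value 10 is counted in the third bin.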
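Plotting uses Matplotlib (import matplotlib.pyplot as plt), not NumPy:
plt.hist(inputArray, range=(minValue, maxValue), bins=numberOfBins) to draw the histogram
plt.title("Title") for a title
plt.xlabel("xLabel") for the xlabel
plt.ylabel("yLabel") for the ylabel
plt.show() to show the histogram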
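A minimal plotting sketch follows, assuming Matplotlib is installed and its pyplot module is imported as plt (title, xlabel, ylabel, and show are pyplot functions). The data, labels, and the choice of a non-interactive backend are assumptions made so the example runs anywhere.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data (an assumption for illustration only)
data = np.array([1, 2, 2, 3, 5, 6, 6, 7, 9, 10])

# plt.hist bins the data and draws the bars in one step
counts, bin_edges, _ = plt.hist(data, range=(1, 10), bins=3)

plt.title("Distribution of sample values")
plt.xlabel("Value")
plt.ylabel("Count")
plt.show()
```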
Mean
The mean is found by adding all the numbers in the set and then dividing by how many numbers
there are.
np.average(array) allows us to find the mean from an array
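To connect the definition to the function call, this small sketch computes the mean by hand and with NumPy. The numbers are hypothetical.

```python
import numpy as np

# Hypothetical sample data (an assumption for illustration only)
scores = np.array([4, 8, 15, 16, 23, 42])

# Add all the numbers, then divide by how many there are
manual_mean = scores.sum() / scores.size

# np.average gives the same result for unweighted data (as does np.mean)
numpy_mean = np.average(scores)

print(manual_mean, numpy_mean)
```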
Median
The median is the middle number in a set of sorted numbers. If there is an even number of
values in a dataset, you either report both of the middle two values or their average.
np.median(array) allows us to find the median from an array
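The following sketch shows the odd and even cases. np.median does not need sorted input, and for an even number of values it returns the average of the two middle values. The data is hypothetical.

```python
import numpy as np

# Hypothetical sample data (an assumption for illustration only)
odd_count = np.array([9, 3, 7, 1, 5])    # sorted: 1 3 5 7 9
even_count = np.array([10, 2, 8, 4])     # sorted: 2 4 8 10

median_odd = np.median(odd_count)    # middle value of the sorted data
median_even = np.median(even_count)  # average of the two middle values, 4 and 8

print(median_odd, median_even)
```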
Mode
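The mode is the most frequently occurring observation in a dataset. A dataset can have multiple
modes.
stats.mode(array) finds the mode (requires from scipy import stats). If there are several modes, it reports only the smallest one.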
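Here is a hedged sketch of finding the mode, assuming SciPy is installed. Because stats.mode reports only the smallest of several tied values, the example also uses np.unique to list every mode. The data is hypothetical, and np.ravel is used only so the code behaves the same across SciPy versions, which differ in whether the result is a scalar or an array.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data with two modes, 16 and 24 (an assumption for illustration)
data = np.array([24, 16, 30, 16, 24, 10])

result = stats.mode(data)
# np.ravel copes with both scalar and array results from different SciPy versions
smallest_mode = int(np.ravel(result.mode)[0])
mode_count = int(np.ravel(result.count)[0])

# To list all modes, count each distinct value and keep those with the top count
values, counts = np.unique(data, return_counts=True)
all_modes = values[counts == counts.max()]

print(smallest_mode, mode_count, all_modes)
```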
Variance
The variance is the average of the squared differences from the mean.
The variance is usually represented by the symbol sigma squared (σ²).
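The definition can be checked step by step, as in this sketch. It uses np.var, which by default computes the population variance (dividing by the number of values), matching the definition given here. The data is hypothetical.

```python
import numpy as np

# Hypothetical sample data (an assumption for illustration only)
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mean = np.mean(data)                   # 40 / 8 = 5.0
squared_diffs = (data - mean) ** 2     # squared difference of each value from the mean
manual_variance = np.mean(squared_diffs)

# np.var divides by N by default (ddof=0), i.e. the population variance
numpy_variance = np.var(data)

print(manual_variance, numpy_variance)
```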
Standard Deviation
The standard deviation is a measure of how spread out numbers are.
Standard deviation is computed by taking the square root of the variance. Sigma is the symbol
used for standard deviation.
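As a small sketch of that relationship, the code below takes the square root of the variance and compares it with np.std, which by default computes the population standard deviation. The data is hypothetical.

```python
import numpy as np

# Hypothetical sample data (an assumption for illustration only)
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

variance = np.var(data)              # population variance (ddof=0)
std_from_variance = np.sqrt(variance)

# np.std computes the same quantity directly
numpy_std = np.std(data)

print(std_from_variance, numpy_std)
```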