
Lab02 Summary Measures - Ipynb

The document discusses calculating summary statistics such as the mean, median, mode, variance and percentiles, mostly with NumPy functions (the mode uses Python's statistics module). It creates sample data and computes each summary measure in a single line of code.

Uploaded by

joumana.r.daher
Copyright
© All Rights Reserved

{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":

[],"collapsed_sections":[],"toc_visible":true},"kernelspec":
{"name":"python3","display_name":"Python 3"},"language_info":
{"name":"python"}},"cells":[{"cell_type":"markdown","source":["***Objective:***\n",
"\n",
"Summary statistics summarize your sample data: they tell you something about the values in your data set, such as their center and their spread.\n",
"\n",
"The main objective of this notebook is to learn how to use built-in functions from NumPy (and Python's statistics module) to compute summary measures such as the mean, mode, median, maximum, minimum, quantiles and percentiles, each in a single line of code instead of calculating them by hand.\n",
"\n"],"metadata":
{"id":"KFnwvRv68Zmj"}},{"cell_type":"markdown","source":["# Summary Statistics"],"metadata":{"id":"kU3lKYnyA3El"}},{"cell_type":"markdown","source":
["**Importing Libraries**\n"],"metadata":{"id":"18qlkn6q8n2O"}},
{"cell_type":"code","source":["import numpy as np"],"metadata":
{"id":"OGuh4PFJBAmN"},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":["**Creating the sample data:** the iris data set has 5 columns (sepal length, sepal width, petal length, petal width and species). Instead of a full pandas DataFrame, we start with a NumPy array holding a few sepal lengths."],"metadata":{"id":"u4UCVDd982nB"}},{"cell_type":"code","source":
["# First, create a list of sepal lengths and convert it to a NumPy array\n",
"x = np.array([5.1, 4.9, 5.8, 7.9, 5.9, 6.2, 6.3, 5.7])\n",
"# y = [4.9, 3, 1.4, 0.2, 5.4, 3.9, 1.7, 0.4]\n"],"metadata":
{"id":"6e8Ao8O_Dg5k"},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":["## **Sample Mean**\n"," \n"],"metadata":
{"id":"kfEubN1gJLj-"}},{"cell_type":"markdown","source":["$$\\bar{x} = \\frac{1}{n} \\sum_{i=1}^{n} x_{i}$$"],"metadata":{"id":"yj3vCfwPF9Ft"}},
{"cell_type":"markdown","source":["**Let's calculate the mean of the sepal length values**"],"metadata":{"id":"yJOyfx109lGC"}},
{"cell_type":"markdown","source":["**Use the built-in function `np.mean()`**\n"],"metadata":{"id":"qMGAGwbnAfNH"}},
{"cell_type":"code","source":["x_mean = np.mean(x)\n","x_mean"],"metadata":
{"id":"MDIXJxkDJYFD"},"execution_count":null,"outputs":[]},
{"cell_type":"code","source":["# Alternative: call the array's .mean() method\n","x_mean2 = x.mean()\n","x_mean2"],"metadata":
{"id":"_3gTpZlF4G0P"},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":["## **Population Variance**\n"],"metadata":
{"id":"bglLkh0qJdLZ"}},{"cell_type":"markdown","source":["$$\\sigma^2 = \\frac{\\displaystyle\\sum_{i=1}^{N}(x_i - \\mu)^2}{N}$$"],"metadata":
{"id":"CYSJtRTTGb_8"}},{"cell_type":"markdown","source":["**Use the built-in function `np.var()`**, which divides by N by default (population variance)\n"],"metadata":{"id":"nWCxyGXAB7UQ"}},
{"cell_type":"code","source":["\n","var = np.var(x)\n","var"],"metadata":
{"id":"pyBrhH8MJjrf"},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":["## **Population Standard Deviation**"],"metadata":{"id":"IlVmmfvvJvaU"}},{"cell_type":"markdown","source":
["$$\\sigma = \\sqrt{\\frac{\\sum_{i=1}^{N}(x_i - \\mu)^2}{N}}$$"],"metadata":
{"id":"9idGZJbpt3ZC"}},{"cell_type":"code","source":["# 1st method: square root of the population variance\n","np.sqrt(np.var(x))"],"metadata":
{"id":"bgVGjFi2JzXO"},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":["**Now use the built-in function `np.std()` for the standard deviation**"],"metadata":{"id":"DBNq55NVCYcZ"}},
{"cell_type":"code","source":["# 2nd method: np.std() (also divides by N by default)\n","std = np.std(x)\n","std"],"metadata":{"id":"2DXTfHQdJ3gA"},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":["## **Sample Variance**"],"metadata":
{"id":"SqnkzrHs41yZ"}},{"cell_type":"markdown","source":["$$s^2 = \\frac{\\displaystyle\\sum_{i=1}^{n}(x_i - \\bar{x})^2}{n-1}$$"],"metadata":
{"id":"sNXd9a2X5G58"}},{"cell_type":"code","source":["# ddof=1 makes np.var divide by n - 1 instead of n\n","sample_var = np.var(x, ddof=1)\n","sample_var"],"metadata":{"id":"lvGFCQuD4zVa"},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":["## **Minimum and Maximum**"],"metadata":
{"id":"RAy5G05aKFSu"}},{"cell_type":"code","source":["np.min(x)"],"metadata":
{"id":"KYslx3SZKAmv"},"execution_count":null,"outputs":[]},
{"cell_type":"code","source":["np.max(x)"],"metadata":{"id":"n-NMIbKPKISo"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":
["## **Mode**"],"metadata":{"id":"8p-hH_F6KKk-"}},{"cell_type":"code","source":
["import statistics\n","z = [4, 1, 2, 2, 3, 5]\n","statistics.mode(z)"],"metadata":
{"id":"sNB07xBbKOLa"},"execution_count":null,"outputs":[]},
{"cell_type":"code","source":["z1 = [\"few\", \"few\", \"many\", \"some\", \"many\", \"few\"]\n","statistics.mode(z1)"],"metadata":{"id":"P-dq0EoFEXhM"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":
["\n","## **Percentiles**\n"],"metadata":{"id":"um8Xn-p-C7gR"}},
{"cell_type":"markdown","source":["### **25th Percentile**"],"metadata":{"id":"S-yimGUIC0ZA"}},{"cell_type":"code","source":["# Store the following values in a list named arr (NumPy functions accept lists too)\n","arr = [20, 2, 7, 1, 34]\n","print(\"arr : \", arr)"],"metadata":{"id":"2qOBHMIlWvSS"},"execution_count":null,"outputs":[]},
{"cell_type":"code","source":["print(\"25th percentile of arr : \",\n","      np.percentile(arr, 25))"],"metadata":
{"id":"3cwoutQeVhO6"},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":["### **75th Percentile**"],"metadata":
{"id":"JSno9J_lC_8I"}},{"cell_type":"code","source":["print(\"75th percentile of arr : \",\n","      np.percentile(arr, 75))"],"metadata":
{"id":"OWR8YN6_KaRE"},"execution_count":null,"outputs":[]},
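{"cell_type":"code","source":["# 25th and 75th percentiles using the nearest data point below instead of interpolating\n","# (method= needs NumPy >= 1.22; older versions call this keyword interpolation=)\n","np.percentile(arr, q=[25, 75], method='lower')"],"metadata":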
{"id":"RxWr7yAWKc6C"},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":["### **50th Percentile or Median**"],"metadata":
{"id":"rTyskfjJKtAH"}},{"cell_type":"code","source":["# Easiest way: the built-in function np.median()\n","np.median(arr)"],"metadata":
{"id":"xISFQtZqK0XP"},"execution_count":null,"outputs":[]},
{"cell_type":"code","source":["# Or we can use the 50th percentile\n","print(\"The median is : \",\n","      np.percentile(arr, 50))"],"metadata":
{"id":"lyNqedwBXHpO"},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":["**To-do exercise:**\n",
"\n",
"1. Create a NumPy array that contains the following numbers: 4.9, 3, 1.4, 0.2, 5.4, 3.9, 1.7, 0.4\n",
"2. Apply the following summary measures to it: mean, variance, standard deviation, minimum, maximum and 40th percentile.\n",
"3. Finally, calculate the median using 2 different methods."],"metadata":{"id":"7j2glQvPXinq"}},{"cell_type":"markdown","source":
["https://github.com/rasbt/data-science-tutorial/blob/master/code/summary-stats.ipynb\n"],"metadata":{"id":"bEeSg-zpFEAh"}}]}
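
The notebook's mean, variance and standard-deviation cells rely on NumPy built-ins. As a minimal sketch that is not part of the original lab, the same quantities can be computed directly from the formulas in the notebook and checked against `np.mean`, `np.var` and `np.std`. The variable names (`mean_manual`, `sq_dev` and so on) are illustrative only.

```python
import numpy as np

# Same sepal-length sample as the notebook's x
x = np.array([5.1, 4.9, 5.8, 7.9, 5.9, 6.2, 6.3, 5.7])
n = x.size

# Sample mean: (1/n) * sum of the x_i
mean_manual = x.sum() / n

# Squared deviations from the mean, shared by all dispersion measures
sq_dev = (x - mean_manual) ** 2

# Population variance divides by N; sample variance divides by n - 1
pop_var_manual = sq_dev.sum() / n
sample_var_manual = sq_dev.sum() / (n - 1)

# Population standard deviation is the square root of the population variance
pop_std_manual = np.sqrt(pop_var_manual)

# Each hand-computed value should match the NumPy built-in
print(np.isclose(mean_manual, np.mean(x)))
print(np.isclose(pop_var_manual, np.var(x)))
print(np.isclose(pop_std_manual, np.std(x)))
print(np.isclose(sample_var_manual, np.var(x, ddof=1)))
```

The population variance and the sample variance differ only in the divisor. That is exactly what `ddof` controls: NumPy divides the sum of squared deviations by `N - ddof`.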
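
Each of the notebook's mode examples has a single most frequent value. The sketch below is an illustration rather than part of the lab. It counts frequencies with `collections.Counter` to show what the mode is, then contrasts `statistics.mode` with `statistics.multimode` when two values tie. The tie behaviour shown assumes Python 3.8 or newer, and the `tied` list is a made-up example.

```python
import statistics
from collections import Counter

# Same data as the notebook's mode cells
z = [4, 1, 2, 2, 3, 5]
z1 = ["few", "few", "many", "some", "many", "few"]

# The mode is the value with the highest frequency, so count each value
counts = Counter(z1)
top = max(counts.values())
modes_by_hand = [value for value, c in counts.items() if c == top]

print(counts)
print(modes_by_hand, statistics.mode(z1))
print(statistics.mode(z))

# With a tie, mode() returns the first mode it encounters (Python 3.8+),
# while multimode() lists every tied value
tied = [1, 1, 2, 2, 3]
print(statistics.mode(tied), statistics.multimode(tied))
```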

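By default, `np.percentile` interpolates between data points. The sketch below spells out that default rule with a hypothetical helper, `percentile_linear`, which is not a NumPy function. The rule first finds the position `(n - 1) * q / 100` in the sorted data, then interpolates linearly between the two neighbouring values. The sketch compares the helper with NumPy on the notebook's `arr` and assumes NumPy 1.22 or newer for the `method=` keyword.

```python
import math

import numpy as np


def percentile_linear(values, q):
    """Hypothetical helper: q-th percentile (0-100) by linear interpolation,
    mirroring NumPy's default method='linear'."""
    s = sorted(values)
    h = (len(s) - 1) * q / 100      # fractional position in the sorted data
    lo = math.floor(h)               # index of the neighbour at or below h
    hi = min(lo + 1, len(s) - 1)     # index of the neighbour above (clamped)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


arr = [20, 2, 7, 1, 34]  # sorted: [1, 2, 7, 20, 34]

# 25, 50 and 75 land exactly on data points; 40 falls between 2 and 7
for q in (25, 40, 50, 75):
    print(q, percentile_linear(arr, q), np.percentile(arr, q))

# The median is the 50th percentile
print(np.isclose(np.median(arr), np.percentile(arr, 50)))

# method='lower' (NumPy >= 1.22) takes the data point at or below the position
print(np.percentile(arr, 40, method="lower"))
```

For the 40th percentile the position is 4 × 0.40 = 1.6. The result therefore lies 60% of the way from 2 to 7, giving 2 + 0.6 × 5 = 5, whereas `method='lower'` returns 2.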