Handout 1 - Introduction To Setting Up Python
INTRODUCTION TO PYTHON/JUPYTER NOTEBOOK
By
PYTHON/JUPYTER NOTEBOOK: Course Outline
• Data summarization
• Frequencies
• Crosstabs
• Normality Test (Checking that your data is normally distributed)
• Bar plots
• Stacked graphs
• Histograms
• Box plots
• Area graphs
• Line graphs
• Grouped plots
• Scatterplots
This series of workshops is designed for students who plan to use the free
programming language PYTHON for statistical analysis and graphical presentation.
Python can be used through more than one interface, and I will also introduce
JUPYTER NOTEBOOK, a code environment that makes Python easier to use.
PYTHON has become one of the most popular software environments for working
with data, alongside SQL and R. In addition, it is free and open source, so if you
can use Python, then you will never be constrained by your future employer's
choice of statistical software. This means that the skills you learn now can follow
you for the rest of your working life. Python is becoming a primary language of data
science and statistics and is being adopted across academia, government, and
businesses to help manage and learn from the growing volume of data being
collected. Hopefully, you will get a sense of some of the power of Python from
these workshops.
Look at the table below and see for yourself how Python compares with R and with
the commercial statistical software packages (SPSS, SAS and STATA) available on
the market.
A comparison of tools for data analysis
Features            SPSS         SAS          STATA        R            Python 3.0
Data manipulation   Strong       Very strong  Strong       Very strong  Very strong
Data analysis       Very strong  Very strong  Very strong  Very strong  Strong
What is Python?
Python is a high-level scripting language which can be used for a wide variety of
text processing, system administration and internet-related tasks. Unlike many
similar languages, its core language is very small and easy to master, while
allowing the addition of modules to perform a virtually limitless variety of tasks.
Python is a true object-oriented language, and is available on a wide variety of
platforms. There is even a python interpreter written entirely in Java, further
enhancing python’s position as an excellent solution for internet-based problems.
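To illustrate the idea of a small core language extended by modules, here is a
minimal sketch that loads Python's standard-library statistics module to
summarize a few numbers. The scores are made-up values used only for
illustration.

```python
import statistics  # standard-library module, loaded only when you need it

# Hypothetical exam scores, used only for illustration
scores = [62, 75, 75, 81, 90]

print(statistics.mean(scores))    # 76.6
print(statistics.median(scores))  # 75
print(statistics.mode(scores))    # 75
```

Nothing extra needs to be installed for this example: the module is part of
every Python installation and only becomes available after the import line.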
Python was developed in the early 1990s by Guido van Rossum, then at CWI in
Amsterdam and later at CNRI in Virginia. In some ways, python grew out of
a project to design a computer language which would be easy for beginners to
learn yet would be powerful enough for even advanced users. This heritage is
reflected in python’s small, clean syntax and the thoroughness of the
implementation of ideas like object-oriented programming, without eliminating
the ability to program in a more traditional style. So, python is an excellent choice
as a first programming language without sacrificing the power and advanced
capabilities that users will eventually need.
Python 2.0 was released on 16 October 2000 and had many major new features,
including a cycle-detecting garbage collector and support for Unicode.
Python 3.0 (py3k) was released on 3 December 2008 after a long testing period.
Although pictures of snakes often appear on python books and websites, the
name is derived from Guido van Rossum’s favourite TV show, “Monty Python’s
Flying Circus”. For this reason, lots of online and print documentation for the
language has a light and humorous touch. Interestingly, many experienced
programmers report that python has brought back a lot of the fun they used to
have programming, so van Rossum’s inspiration may be well expressed in the
language itself.
1. WHY PYTHON?
There are several reasons for data scientists to adopt Python as their preferred
programming language, including:
1. Open-source nature and active community
2. General purpose, suitable for:
- analysis of financial data
- web programming (e.g. with the DJANGO framework)
- many other fields
3. High-level language: employs syntax closer to human language, which makes
the language easier to learn and implement
4. Shorter learning curve: easy to learn, with a syntax that is clear and
intuitive
5. It provides the large ecosystem of a general-purpose programming language (a
large collection of powerful, standardized libraries as well as THIRD PARTY
SOFTWARE) together with the depth of good scientific computing libraries
6. Very powerful: Powerful integration with fast, compiled languages (e.g.
C/C++) for numerical computation primitives (as used in NumPy and
pandas)
7. Ease of integrating the core modeling process with database access, data
wrangling and post-processing, such as visualization and web serving
8. Availability and continued development of Pythonic interfaces to Big
Data frameworks and databases such as Apache Spark or MongoDB
9. Support and development of Python libraries by large and influential
organizations such as Google or Facebook (e.g. TensorFlow and
PyTorch)
10. Python has certain advantages that can improve coding, especially in
large corporations and other professional environments.
2. WHY JUPYTER?
Jupyter Notebook
• Language kernels: Jupyter + Python {Jupyter notebook files use the .ipynb
extension: IPython notebook document}; kernels for R and Julia are also available
• Text: you can create your notes and save them together with your code and
formulae [MARKDOWN]
• Code: you can type your code and execute it in a CODE cell
Python Datatypes
Python offers several powerful data structures, and it pays off to make yourself
familiar with them.
o NumPy arrays to work with numerical data. (NumPy also offers the data type
matrix. However, it is recommended to use arrays, since many
numerical and scientific functions will not accept input data in matrix
format.)
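The arrays mentioned above are NumPy arrays (NumPy is a third-party library that
ships with Anaconda). As a minimal sketch with made-up values, the example below
shows that ordinary arrays handle both element-wise arithmetic and matrix
multiplication (through the @ operator), which is one reason the separate matrix
type is rarely needed.

```python
import numpy as np

# Two small 2x2 arrays of numerical data (values chosen for illustration)
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b     # multiplies matching entries: [[5, 12], [21, 32]]
matrix_product = a @ b  # matrix multiplication on plain arrays, no matrix type needed

print(elementwise)
print(matrix_product)
print(a.mean())         # 2.5
```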
PYTHON LIBRARIES FOR DATA ANALYSIS
SETTING UP PYTHON
Installing Python
PLEASE copy and paste the following link into your browser:
www.anaconda.com
Scroll down until you get to this part (below) of the home page
Then click on the Individual Edition, which will take you to this page (below)
Click DOWNLOAD and then select the installer for one of the 3 operating systems
(PLEASE make sure it matches your computer: 64-Bit or 32-Bit):
• Windows
• MacOS
• Linux
To verify the system you are using, open the Control Panel and then select
System
On the screen that will appear, please check whether your operating system is
32-bit or 64-bit, and then choose the matching version of Python
Select Python version 3.8 or the latest version available on the
download page
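Once Anaconda is installed, you can confirm which Python build you ended up
with by running the following in a notebook cell. This is a minimal sketch using
only the standard library; note that it reports the bitness of the Python
interpreter itself, which matches the installer you chose, not necessarily the
operating system (a 32-bit Python can run on 64-bit Windows).

```python
import platform
import struct
import sys


def interpreter_bits() -> int:
    """Return 32 or 64: the pointer size of the running Python build, in bits."""
    return struct.calcsize("P") * 8


print(sys.version)          # the Python version you installed
print(platform.machine())   # processor architecture reported by the OS, e.g. 'AMD64' or 'x86_64'
print(interpreter_bits(), "bit Python")
```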
After downloading, follow the instructions to the end, as illustrated below, for
either the Windows or Mac OS operating system
---------------------------------------------------------------------------------------------------------------------
A Step-By-Step Guide on How to Install Python and Jupyter Notebook in
Anaconda for owners of computers running Windows or Mac OS
Install Anaconda
What is Anaconda?
Anaconda is a free, open-source distribution of both the Python and R
programming languages. Anaconda is widely used by the scientific community and
by data scientists to carry out machine learning projects or data analysis.
Step 2) Accept the License Agreement
Step 4) Select Destination Folder and Click Next
Step 6) Installation will begin
FOR: Mac User
By default, the download page that opens in your browser (here, Chrome) matches
your system. In this section, installation is done for Mac. If you run Windows or
Linux, download the Anaconda 5.1 installer for Windows or for Linux instead.
Step 2) You are now ready to install Anaconda. Double-click on the downloaded
file to begin the installation. It is a .pkg file for Mac and an .exe file for
Windows. You will be asked to confirm the installation. Click the Continue button.
You are redirected to the Anaconda3 Installer.
Step 3) The next window displays the ReadMe. After you are done reading the
document, click Continue.
Step 4) This window shows the Anaconda End User License Agreement. Click
Continue to agree.
Step 5) When you are prompted to agree, click Agree to go to the next step.
Step 6) Click Change Install Location to set the location of Anaconda. By default,
Anaconda is installed in the user environment: Users/YOURNAME/.
Select the destination by clicking on Install for me only. This means Anaconda
will be accessible only to this user.
Step 7) You can now install Anaconda. Click Install to proceed. Anaconda takes up
around 2.5 GB on your hard drive.
A message box appears asking you to confirm by typing your password. Then click
Install Software.
Step 8) Anaconda asks you if you want to install Microsoft VSCode. You can
ignore it and click Continue.
You are asked if you want to move the "Anaconda3" installer to the Trash. Click
Move to Trash.
THE JUPYTER NOTEBOOK INTERFACE
or click on Anaconda Navigator (Anaconda 3)
After clicking OK, click LAUNCH under JUPYTER NOTEBOOK
or
and it will open in your browser
4. JUPYTER’S INTERFACE – THE DASHBOARD
• UPLOAD: loads a file (Python scripts, data sets or PDF documents) into the
current folder
• NEW: creates one of the following
- Text file
- Folder
- Notebook
FIRST CREATE A FOLDER FOR EACH PROJECT
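The dashboard's NEW > Folder option is the simplest way to do this. As an
alternative, here is a minimal sketch that creates a project folder from a
notebook cell with the standard-library pathlib module. The folder name
my_first_project and the helper make_project_folder are hypothetical, and the
demonstration runs inside a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path


def make_project_folder(base: Path, name: str) -> Path:
    """Create base/name (and any missing parents) and return its path.

    exist_ok=True means running the cell again does not raise an error.
    """
    folder = base / name
    folder.mkdir(parents=True, exist_ok=True)
    return folder


# Demonstration inside a throwaway temporary directory;
# in a notebook you could pass Path.cwd() instead.
with tempfile.TemporaryDirectory() as tmp:
    project = make_project_folder(Path(tmp), "my_first_project")  # hypothetical name
    created_ok = project.is_dir()
    # Calling it a second time is safe and returns the same path
    same_again = make_project_folder(Path(tmp), "my_first_project") == project
    print(created_ok, same_again)
```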
Open the newly renamed FOLDER
Then click on NEW and select PYTHON 3; you will then see a screen similar to
the one below
When you create a new notebook document, you will be presented with
the notebook name, a menu bar, a toolbar and an empty code cell.
Notebook name: The name displayed at the top of the page, next to the
Jupyter logo, reflects the name of the MyFileName.ipynb file. Clicking on the
notebook name brings up a dialog which allows you to rename it. Thus,
renaming a notebook from “Untitled” to “My first notebook” in the browser
renames the Untitled.ipynb file to My first notebook.ipynb.
Menu bar: The menu bar presents different options that may be used to
manipulate the way the notebook functions.
Toolbar: The toolbar gives a quick way of performing the most-used operations
within the notebook, by clicking on an icon.
Code cell: the default type of cell; read on for an explanation of cells.
NAMING
You will notice that at the top of the page is the word Untitled. This is the title
for the page and the name of your Notebook. Since that is not a very
descriptive name, let us change it!
Just move your mouse over the word Untitled and click on the text. You should
now see an in-browser dialog titled Rename Notebook. Let us rename this one
to Hello Jupyter:
Structure of a notebook document
CODE CELL
The notebook consists of a sequence of cells. A cell is a multiline text input field,
and its contents can be executed by using Shift-Enter, or by clicking either the
“Play” button in the toolbar, or Cell > Run in the menu bar. The execution
behaviour of a cell is determined by the cell’s type. There are three types of
cells: code cells, markdown cells, and raw cells. Every cell starts off being
a code cell, but its type can be changed by using a drop-down on the toolbar
(which will be “Code”, initially), or via keyboard shortcuts indicated below.
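Behind the scenes, an .ipynb file is a JSON document that stores the cells in
order, each tagged with its cell_type of "code", "markdown" or "raw". The sketch
below builds a hand-written, minimal notebook in the nbformat 4 layout and counts
its cell types using only the standard library. The notebook content and the
helper count_cell_types are hypothetical; real notebooks contain more metadata.

```python
import json
from collections import Counter

# A minimal, hand-written notebook in the nbformat 4 JSON layout (hypothetical content)
notebook_text = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Hello Jupyter"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": ["1 + 1"]},
        {"cell_type": "raw", "metadata": {}, "source": ["Text passed to nbconvert as-is"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
})


def count_cell_types(ipynb_json: str) -> Counter:
    """Count how many code, markdown and raw cells a notebook contains."""
    nb = json.loads(ipynb_json)
    return Counter(cell["cell_type"] for cell in nb["cells"])


print(count_cell_types(notebook_text))
```

To inspect one of your own notebooks, read the file as text first, for example
with open("Hello Jupyter.ipynb", encoding="utf-8") and pass f.read() to the
function.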
Keyboard shortcuts
All actions in the notebook can be performed with the mouse, but keyboard
shortcuts are also available for the most common ones. The essential shortcuts
to remember are the following:
For the full list of available shortcuts, click Help > Keyboard Shortcuts in the
notebook menus.
For more information on the different things you can do in a notebook, see
the collection of examples at the following link:
https://nbviewer.jupyter.org/github/jupyter/notebook/tree/master/docs/source/examples/Notebook/
Code cells
A code cell allows you to edit and write new code, with full syntax highlighting
and tab completion. The programming language you use depends on
the kernel, and the default kernel (IPython) runs Python code.
When a code cell is executed, code that it contains is sent to the kernel
associated with the notebook. The results that are returned from this
computation are then displayed in the notebook as the cell’s output. The
output is not limited to text; many other forms of output are also possible,
including matplotlib figures and HTML tables (as used, for example, in the pandas
data analysis package). This is known as IPython’s rich display capability.
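One way rich display works is that IPython asks the object produced by the last
line of a cell for richer representations, such as the HTML returned by a
_repr_html_ method. The class below is a hypothetical example (not part of pandas
or Jupyter) that shows the idea with only the standard library: in a notebook the
last line renders as an HTML table, while print() or a terminal shows the
plain-text __repr__.

```python
from __future__ import annotations

import html


class ScoreTable:
    """Hypothetical helper class that Jupyter can render as an HTML table."""

    def __init__(self, scores: dict[str, float]) -> None:
        self.scores = scores

    def __repr__(self) -> str:
        # Plain-text form, used by print() and in a terminal
        return f"ScoreTable({self.scores!r})"

    def _repr_html_(self) -> str:
        # IPython calls this when the object is the last expression in a cell
        rows = "".join(
            f"<tr><td>{html.escape(name)}</td><td>{value}</td></tr>"
            for name, value in self.scores.items()
        )
        return f"<table><tr><th>Name</th><th>Score</th></tr>{rows}</table>"


table = ScoreTable({"Ada": 91.5, "Grace": 88.0})
table  # in a notebook cell this line displays an HTML table; in a script it does nothing
```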
The MENU BAR
The Jupyter Notebook has several menus that you can use to interact with your
Notebook. The menu runs along the top of the Notebook just like menus do in
other applications. Here is a list of the current menus:
• File
• Edit
• View
• Insert
• Cell
• Kernel
• Widgets
• Help
Let us go over the menus one by one. I will not go into detail for every single
option in every menu, but I will focus on the items that are unique to the
Notebook application.
The first menu is the File menu. In it, you can create a new Notebook or open
a pre-existing one. This is also where you would go to rename a Notebook. I
think the most interesting menu item is the Save and Checkpoint option. This
allows you to create checkpoints that you can roll back to if you need to.
Next is the Edit menu. Here you can cut, copy, and paste cells. This is also where
you would go if you wanted to delete, split, or merge a cell. You can reorder
cells here too.
Note that some of the items in this menu are greyed-out. The reason for this is
that they do not apply to the currently selected cell. For example, a code cell
cannot have an image inserted into it, but a Markdown cell can. If you see a
greyed-out menu item, try changing the cell’s type and see if the item
becomes available to use.
The View menu is useful for toggling the visibility of the header and toolbar. You
can also toggle Line Numbers within cells on or off. This is also where you would
go if you want to mess about with the cell’s toolbar.
The Insert menu is just for inserting cells above or below the currently selected
cell.
The Cell menu allows you to run one cell, a group of cells, or all the cells. You
can also go here to change a cell’s type, although I personally find the toolbar
to be more intuitive for that.
The other handy feature in this menu is the ability to clear a cell’s output. If you
are planning to share your Notebook with others, you will probably want to
clear the output first so that the next person can run the cells themselves.
The Kernel menu is for working with the kernel that is running in the background.
Here you can restart the kernel, reconnect to it, shut it down, or even change
which kernel your Notebook is using.
You probably will not be working with the Kernel all that often, but when you
are debugging a Notebook you may find that you need to restart the Kernel.
When that happens, this is where you would go.
The Widgets menu is for saving and clearing widget state. Widgets are
basically JavaScript widgets that you can add to your cells to make dynamic
content using Python (or another Kernel).
Finally, you have the Help menu, which is where you go to learn about the
Notebook’s keyboard shortcuts, a user interface tour, and lots of reference
material.
JUPYTER’S INTERFACE – Prerequisites for coding
INPUT FIELD
• A green border and a pen (pencil) icon show that you are in EDIT MODE
• To leave EDIT MODE, PRESS ESC so that you go to COMMAND MODE
• To execute the code in a cell, either press CTRL + ENTER or press the RUN
icon in the TOOLBAR
• SHIFT + ENTER also runs the cell and then moves on to the next cell
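CUT, COPY & PASTE CELLS (in COMMAND MODE)
X – cuts the selected cell
C – copies the selected cell
V – pastes the cut or copied cell below the selected cell
The up and down arrow buttons in the toolbar allow you to move the IN & OUT
FIELDS TOGETHER either up or down
DELETING A CELL: select the cell and press D twice (D, D) in COMMAND MODE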
NOTEBOOK CELLS
CODE CELLS
- This is the cell where you write your Python code, which is computed by the
IPython kernel; the output is displayed under the cell.
- When such a cell is run, its result is displayed in an output cell. The output
may be text, images, matplotlib plots or HTML tables. Code cells have rich
output capability.
MARKDOWN CELL
- This is where you add the documentation, using text formatted with Markdown
syntax.
- All kinds of formatting features are available, like making text bold and
italic, displaying ordered or unordered lists, rendering tabular contents etc.
To convert a cell into a Markdown cell, select Markdown from the dropdown
menu as shown below
OR
Select the CELL and then PRESS M (the keyboard shortcut, in COMMAND MODE) and
then start typing
You can use a backslash to generate literal characters which would otherwise
have special meaning in the Markdown syntax. For example, typing
\*literal asterisks\*
displays the asterisks themselves instead of italic text:
*literal asterisks*
RAW CELLS
Contents in raw cells are not evaluated by the notebook kernel. When passed
through nbconvert, they will be rendered as desired. If you type LaTeX in a raw
cell, rendering will happen after nbconvert is applied.
- The Raw NBConvert cell type is only intended for special use cases when
using the nbconvert command line tool.
- Basically, it allows you to control the formatting in a very specific way when
converting your Jupyter notebook into another file format like PDF,
HTML, etc.
HEADING
MAKING TITLES AND SUBTITLES
You make titles using hash symbols (#). A single hash gives you a title, two
hashes give you a subtitle and so on, as shown below:
# Title
## Subtitle
**bold**
*italics*
* bullet 1
* bullet 2
PERFORMING BASIC MATHEMATICAL CALCULATIONS
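+    add()                          addition
-    sub(), subtract()              subtraction
*    mul(), multiply()              multiplication
/    div(), divide(), truediv()     division
//   floordiv()                     floor (integer) division
%    mod()                          modulus (remainder)
**   pow()                          exponentiation (power)
&    AND (bitwise on integers; element-wise logical AND when combining True/False conditions)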
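As a quick check of the operators listed above, here is a minimal sketch using
plain Python numbers. It pairs each operator with the function of the same
meaning from the standard-library operator module (add, sub, mul, truediv,
floordiv, mod, pow, and_). The method names in the list, such as subtract() and
multiply(), are the names used by pandas Series and DataFrame objects; this
sketch sticks to the standard library, and the values 17 and 5 are arbitrary.

```python
import operator

a, b = 17, 5

# Each arithmetic operator next to the equivalent function in the operator module
results = {
    "a + b":  (a + b,  operator.add(a, b)),
    "a - b":  (a - b,  operator.sub(a, b)),
    "a * b":  (a * b,  operator.mul(a, b)),
    "a / b":  (a / b,  operator.truediv(a, b)),
    "a // b": (a // b, operator.floordiv(a, b)),
    "a % b":  (a % b,  operator.mod(a, b)),
    "a ** b": (a ** b, operator.pow(a, b)),
    "a & b":  (a & b,  operator.and_(a, b)),  # bitwise AND on integers
}

for expr, (with_operator, with_function) in results.items():
    assert with_operator == with_function  # both spellings give the same answer
    print(f"{expr:7} = {with_operator}")
```

Note that / always returns a decimal (float) result, while // rounds down to a
whole number and % gives the remainder that // leaves behind.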
HAPPY CODING!!