Module 4 - Introduction To Jupyter Notebook
Learning Competencies
4.1. Understand the different uses of Jupyter in the field of data science.
4.2. Process relevant information to understand the many projects in the Jupyter
ecosystem.
4.3. Become familiar with the user interface of Jupyter Notebook.
4.4. Identify kernels that can be used to run a Jupyter notebook.
4.5. Learn the different architectural designs of the core pieces in the Jupyter
ecosystem.
G11: 39
MODULE 4:
Open-source tools are software tools that are freely available without a commercial license. The term
also refers to openly distributed program code that is free of charge and can be used and modified by
the end user with few restrictions. Many kinds of open-source tools help developers and others with
programming, maintaining technologies, and other technology tasks.
Open-source software differs from commercially licensed software, which is offered to users for a price.
Many software products, such as the big-data tool Hadoop and related tools from the Apache Software
Foundation, are well-known examples of open-source technologies. Rather than a firm profiting from
the software, most of these tools are freely available, with the licensing held by a user community.
The advantages of open source include code that can be inspected and modified to solve problems,
adapted to specific needs, and freely redistributed. The code also remains open to improvement
through the intervention, revisions, and new ideas of developers. Open source also promotes a higher
standard of quality and allows projects to continue even after a change of programmers or
responsible parties.
PROJECT JUPYTER
Data science is a science. Anything that makes it easier for data scientists to adapt and explore,
whether it is elastic infrastructure or Jupyter Notebooks, can help them make progress. Data cleaning
and transformation, numerical simulation, exploratory data analysis, statistical modeling, machine
learning, and data visualization are just a few of the areas where Jupyter is leading the industry.
If you are going to work in data science, you must familiarize yourself with Jupyter. It is a hugely
popular open-source project, known as Project Jupyter and best known for the Jupyter Notebook.
Project Jupyter is a non-profit, open-source project that exists to develop open-source software,
open standards, and services for interactive data science and scientific computing across all
programming languages. It was born out of the IPython Project in 2014, as IPython grew to include
increasingly language-agnostic components such as the notebook format, message protocol, qtconsole,
and notebook web application. IPython itself is focused on interactive Python, and part of its role
is to provide a Python kernel for Jupyter.
Project Jupyter is a broad collaboration of a large community that develops open-source tools for
interactive and exploratory computing. The tools support over 100 programming languages, with a focus
on Python, and include the Jupyter Notebook, JupyterHub, and an ecosystem of extensions.
Data science is used in many industries, and the predominant technologies and algorithms used by
these industries are available in Jupyter. Some of the industries that are larger users of data science
include the following:
Jupyter Notebook
The Jupyter Notebook’s popularity is fueled by its adoption as a favorite environment for doing data
science. It is also a growing platform used in classrooms to develop teaching materials, share
lessons and tutorials, and create computational stories. A notebook document can contain
text narratives with images and math, combined with executable code in many supported languages
and the output of that code. This combination of content and code makes it a powerful form of
data-based communication, which is why more educators are choosing Jupyter for teaching.
Jupyter Notebook is part of Project Jupyter; it is the software that allows the user to create
notebook documents. The notebook extends the console-based approach to interactive computing in a
qualitatively new direction. It provides a web-based application suitable for capturing the whole
computation process: developing, documenting, and executing code, and communicating the results. The
uses of Jupyter Notebook include data cleaning and transformation, numerical simulation, statistical
modeling, data visualization, machine learning, and many more. Jupyter Notebook has the following
features:
1. Language of Choice. Jupyter supports over 40 programming languages including Python, R,
Julia, and Scala.
2. Share Notebooks. Notebooks can be shared with others using email, Dropbox, GitHub and
the Jupyter Notebook Viewer.
3. Interactive Output. Your code can produce rich, interactive output such as HTML, images,
video, LaTeX, and custom MIME types.
4. Big Data Integration. It leverages big data tools, such as Apache Spark, from Python, R and
Scala. Explore that same data with pandas, scikit-learn, ggplot2 and TensorFlow.
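The "Interactive Output" feature can be illustrated with a small sketch. In the IPython kernel, an object that defines a `_repr_html_` method is displayed as HTML output instead of plain text. The `ScoreTable` class and its sample data below are hypothetical names made up for this illustration; they are not part of Jupyter.

```python
from html import escape


class ScoreTable:
    """Hypothetical example class; not part of Jupyter itself."""

    def __init__(self, rows):
        # rows is a list of (name, score) pairs.
        self.rows = list(rows)

    def __repr__(self):
        # Plain-text fallback for frontends that cannot show HTML.
        return f"ScoreTable({self.rows!r})"

    def _repr_html_(self):
        # IPython looks for this method and, when it exists, sends the
        # returned markup to the notebook as text/html output.
        body = "".join(
            f"<tr><td>{escape(str(name))}</td><td>{score}</td></tr>"
            for name, score in self.rows
        )
        return f"<table><tr><th>Name</th><th>Score</th></tr>{body}</table>"


table = ScoreTable([("Ana", 92), ("Ben", 85)])
html = table._repr_html_()
```

When `table` is the last expression in a notebook code cell, the cell output should appear as a rendered table; in a plain terminal, the `__repr__` text is shown instead.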
It also has the Notebook Dashboard, which serves as the control panel. It shows your local files and
lets you open notebook documents and shut down their kernels.
The Notebook Dashboard is shown first when you launch the Jupyter Notebook App. This
component is mainly used to open notebook documents and manage running kernels.
Its features are similar to those of a file manager, such as navigating folders and renaming and
deleting files. It has three tabs: Files, Running, and Clusters.
Files tab – lists all objects available to Jupyter. All files used by Jupyter are stored as
regular files on your disk. Jupyter provides context managers that know how to process the
different types of files and programs used. Jupyter notebook files (with the .ipynb file
extension) can be seen when you view your files in Windows Explorer.
Running tab – lists all the notebooks that have been started and lets you control which
notebooks are running at any time. Jupyter keeps track of which notebooks are running.
Clusters tab – this tab is for environments where several machines are used to run
Jupyter.
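Because notebooks are regular files on disk, they can be inspected with ordinary tools. An .ipynb file is a JSON document, so the sketch below builds a very small notebook as a Python dictionary, saves it, and reads it back. The file name `hello.ipynb` and the cell contents are made up for this illustration, and only the most basic fields of the notebook format (version 4) are shown.

```python
import json
import os
import tempfile

# A minimal notebook document: one Markdown cell and one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My first notebook"],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # the cell has not been run yet
            "metadata": {},
            "outputs": [],
            "source": ["print('Hello, Jupyter')"],
        },
    ],
}

# Save the notebook as a regular file in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "hello.ipynb")
with open(path, "w", encoding="utf-8") as f:
    json.dump(notebook, f, indent=1)

# Read it back like any other JSON file.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

cell_types = [cell["cell_type"] for cell in loaded["cells"]]
```

A file written this way should show up in the Files tab and open as a notebook, assuming it is saved in a folder that Jupyter can see.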
The prompt "Select items to perform actions on them" tells you that you can select multiple items and
then perform the same action on all of them. Most of the actions in the menus can be performed on a
single item or on a selected set of items.
The Upload button presents a prompt to select the file you will upload to Jupyter. It is
used to move a data file into the project when Jupyter is running as a website in a
remote location, where you cannot simply copy the file to the disk on which Jupyter is running.
The New button pulls down a menu that lists the kinds of Jupyter project
you can create, based on the available kernels.
The menu also lists other objects that Jupyter can create, such as text files and folders.
Once you create a new notebook, it opens in a new browser tab and appears as a new entry in
the notebook list of the dashboard.
Opening Notebooks
An open notebook has exactly one interactive session connected to a kernel, which executes the
code sent by the user and transmits the results back. This kernel remains active even if you
close your web browser, and if you reopen the same notebook from the dashboard, the web
application reconnects to the same kernel.
Notebook Name – the name displayed at the top, next to the Jupyter logo. Clicking on
it allows you to rename your notebook.
Menu Bar – presents different options that you can use to manipulate some functions in the
notebook.
Toolbar – gives you quick access to commonly used operations on the notebook
by clicking on each icon.
Code Cell – the default type of cell. It allows you to write and edit code, with tab completion
and syntax highlighting; the programming language you use depends on the kernel. The default
kernel (IPython) runs Python code. When a code cell is executed, its code is sent to the
kernel associated with the notebook. The results of the execution are displayed as the
cell output in the notebook.
Markdown Cell – your computation process can be documented in a literate way by
alternating descriptive text with code, using rich text. This is accomplished by marking up text
with the Markdown language. The corresponding cells are called Markdown cells.
Markdown provides a simple way to mark up text, for example to specify which parts
of the text should be emphasized. To provide structure for your document, you can use Markdown
headings. A heading consists of 1 to 6 hash signs (#) followed by a space
and the title of your section. Each Markdown heading is converted to a clickable link for a section
of the notebook. When a Markdown cell is executed, its Markdown code is converted to the
corresponding formatted rich text.
With Markdown cells, you can also easily include mathematics using standard LaTeX
notation: $...$ for inline math and $$...$$ for displayed math. When the
Markdown cell is run, the LaTeX portions are rendered as equations with high-quality typography in
the HTML output. This is made possible by MathJax, which supports a large subset of LaTeX features.
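The heading rule (1 to 6 hash signs, a space, then the title) and the two math delimiters can be sketched in Python. The helper names `markdown_heading` and `heading_level` are hypothetical and exist only for this illustration; the check is a simplified version of the rule described above, not a full Markdown parser.

```python
import re


def markdown_heading(level, title):
    """Build a Markdown heading line such as '## Setup'."""
    if not 1 <= level <= 6:
        raise ValueError("heading level must be between 1 and 6")
    return "#" * level + " " + title


def heading_level(line):
    """Return the heading level (1 to 6), or 0 if the line is not a heading."""
    match = re.match(r"^(#{1,6}) \S", line)
    return len(match.group(1)) if match else 0


# Text you could type into a Markdown cell: a heading, inline math
# between single dollar signs, and displayed math between double ones.
cell_source = "\n".join([
    markdown_heading(1, "Results"),
    r"The mean is $\mu$ and the variance is:",
    r"$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$$",
])
```

Printing `cell_source` and pasting the result into a Markdown cell should give a level-1 heading followed by one inline and one displayed equation.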
Raw Cell – gives you a place to write output directly. Raw cells are not
evaluated by the notebook. When passed through nbconvert, raw cells arrive in the target format
unmodified. For instance, you can type full LaTeX into a raw cell,
and it will be rendered by LaTeX only after conversion by nbconvert.
Basic Workflow
The normal workflow in a notebook is quite similar to a standard IPython session. The difference is
that you can edit cells in place multiple times until you obtain the desired result, instead of
rerunning separate scripts.
Computational problems are usually worked on in pieces, organizing related ideas into cells and moving
forward once the previous parts work correctly. This is more convenient for interactive exploration
than breaking up a computation into scripts that must be executed together, as was previously
necessary, especially if parts of them take a long time to run. To interrupt a calculation that is
taking too long, open the Kernel menu on the Menu Bar and click the Interrupt command, or press I, I
on your keyboard. Choose the Restart command to restart the entire computational process, or press
0, 0 on your keyboard.
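The effect of the Interrupt command can be sketched as follows. This assumes the default IPython kernel, where an interrupt reaches the running Python code as a `KeyboardInterrupt` exception. The function name `slow_sum` is made up for this illustration.

```python
import time


def slow_sum(n, delay=0.0):
    """Add the numbers 0..n-1 slowly, stopping cleanly if interrupted."""
    total = 0
    done = 0
    try:
        for i in range(n):
            time.sleep(delay)  # stands in for a slow calculation step
            total += i
            done += 1
    except KeyboardInterrupt:
        # Reached when the calculation is interrupted part-way through.
        print(f"Interrupted after {done} of {n} steps")
    return total, done


total, done = slow_sum(5)
```

Running `slow_sum(1000, delay=1)` in a cell and then choosing Interrupt should stop the loop early while keeping the partial result, so the kernel and its variables remain usable. Restart, by contrast, discards all variables.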
Keyboard Shortcuts
All notebook actions can be done with the mouse, but keyboard
shortcuts are also available. The following are some of the essential shortcuts to remember:
Shift+Enter (Run Cell) – executes the current cell, shows any output, and jumps
to the next cell below. This is the same as clicking Cell, then Run on the menu bar, or the
play button on the toolbar.
Esc (Command Mode) – in this mode, you can navigate the notebook using keyboard
shortcuts.
Enter (Edit Mode) – in this mode, you can edit text in cells.
Plotting
The ability to display plots produced by executed code cells is a key feature of the Jupyter
Notebook. To provide this functionality, the IPython kernel is designed to work seamlessly with the
matplotlib plotting library.
Trusting Notebooks
To prevent untrusted code from executing on the user’s behalf when a notebook is opened, a signature
is stored for each trusted notebook. When the notebook is opened, the notebook server verifies this
signature. If no matching signature is found, JavaScript and HTML output will not be
displayed until they are regenerated by re-executing the cells.
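The idea of signing a notebook and checking the signature later can be sketched with Python's standard `hmac` module. This is a simplified illustration of the general technique, not Jupyter's actual implementation; the function names, the secret key, and the sample notebooks are all hypothetical.

```python
import hashlib
import hmac
import json


def sign_notebook(notebook, secret):
    """Return a signature (hex digest) for the notebook's contents."""
    # Sorting the keys makes the same content always give the same bytes.
    payload = json.dumps(notebook, sort_keys=True).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def is_trusted(notebook, secret, stored_signature):
    """Check whether the notebook still matches a stored signature."""
    current = sign_notebook(notebook, secret)
    return hmac.compare_digest(current, stored_signature)


secret = b"example-secret-key"  # made-up key, known only to the user
nb = {"cells": [{"cell_type": "code", "source": "1 + 1"}]}
signature = sign_notebook(nb, secret)

# A notebook whose contents were changed by someone else no longer matches.
tampered = {"cells": [{"cell_type": "code", "source": "import os"}]}
```

Here `is_trusted(nb, secret, signature)` is true, while `is_trusted(tampered, secret, signature)` is false, which corresponds to the case where HTML and JavaScript output stays hidden until the cells are re-executed.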
Browser Compatibility
The Jupyter Notebook aims to support the latest versions of the following browsers: Chrome, Safari,
and Firefox. Up-to-date versions of Opera and Edge may also work; if they do not, you
can use one of the supported browsers.
JupyterLab
JupyterLab is the next-generation web-based user interface for Project Jupyter. It is a web-based
interactive development environment for Jupyter notebooks, code, and data. JupyterLab is flexible:
you can configure and arrange the user interface to support a wide range of workflows in data science,
scientific computing, and machine learning. JupyterLab is also extensible and modular: you can write
plug-ins that add new components and integrate with existing ones.
JupyterLab enables the user to work with documents and activities such as Jupyter notebooks, text
editors, terminals, and custom components in a flexible, integrated, and extensible manner. Using tabs
and splitters, you can arrange multiple documents and activities side by side. Documents and
activities integrate with each other, enabling new workflows for interactive computing. For example:
Code Consoles provide transient scratchpads for running code interactively, with full support
for rich output. A code console can be linked to a notebook kernel as a computation log from the
notebook.
Kernel-backed documents enable code in any text file to be run interactively in any Jupyter
kernel.
Notebook cell outputs can be mirrored into their own tab, side by side with the notebook,
enabling simple dashboards with interactive controls backed by a kernel.
Multiple views of documents with different editors or viewers enable live editing of documents,
reflected in the other viewers.
JupyterLab offers an integrated model for viewing and handling data formats. It reads file formats
such as images, CSV, JSON, Markdown, Vega, and Vega-Lite. It also displays rich kernel output in these
formats.
JupyterLab also offers keyboard shortcuts that you can customize.
Its extensions can customize or enhance any part of JupyterLab, including new themes, file editors,
and custom components. JupyterLab is served from the same server and uses the same notebook document
format as the classic Jupyter Notebook.
JupyterHub
JupyterHub is a multi-user version of the notebook designed for companies, classrooms, and research
laboratories. JupyterHub brings the power of notebooks to large groups of users. It gives them
access to computational environments and resources without the burden of installation and
maintenance tasks. Users, including students, researchers, and data scientists, can get their work
done at their own pace on shared resources that can be managed efficiently by system
administrators.
JupyterHub runs in the cloud or on your own hardware and makes it possible to serve a pre-configured
data science environment to any user, wherever they are. It is customizable and scalable, and is
suitable for small and large teams, academic courses, and large-scale infrastructure. Here are the
key features of JupyterHub:
Jupyter Kernels
Kernels are programming-language-specific processes that run independently and interact with
Jupyter applications and their user interfaces. IPython is the reference Jupyter kernel, providing a
powerful environment for interactive computing in Python. Because the Jupyter notebook server
depends on IPython kernel functionality, the Jupyter team maintains this kernel. In addition, kernels
for many other languages can be used in the notebook.
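How Jupyter learns which kernels are available can be sketched with a kernel spec. Each installed kernel is described by a small JSON file named kernel.json that tells Jupyter how to start the kernel process. The dictionary below mirrors the typical fields for the IPython kernel; the display name is only an example, and the exact values on your machine may differ.

```python
import json

# The fields of a typical kernel.json file for the IPython kernel.
kernel_spec = {
    # Command used to start the kernel process. Jupyter replaces
    # {connection_file} with the path of a file that tells the
    # kernel how to connect to the frontend.
    "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
    # Name shown in the New menu and in the kernel dropdown.
    "display_name": "Python 3",
    # Programming language that this kernel runs.
    "language": "python",
}

kernel_json = json.dumps(kernel_spec, indent=2)
```

Running `jupyter kernelspec list` in a terminal should show the kernels installed on your computer and the folders where their kernel.json files are stored.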
IPython provides a rich architecture for interactive computing with a powerful interactive shell. It
supports interactive data visualization and the use of GUI toolkits. It offers flexible, embeddable
interpreters to load into your own projects, and easy-to-use, high-performance tools
for parallel computing.
Aside from IPython, another kernel that runs in Jupyter is IRkernel. To run Jupyter with the R kernel,
you need at least the following: Jupyter and a current R installation. If you have installed
Jupyter, you can create a notebook and switch to the IR kernel from the dropdown menu.
Another kernel is IJulia, a Julia-language backend combined with the Jupyter interactive
environment, which is also used by IPython. This combination allows the user to interact with the
Julia language using Jupyter/IPython’s powerful graphical notebook, which combines code, formatted
text, math, and multimedia in a single document. IJulia is a Jupyter language kernel and works with a
variety of notebook user interfaces. In addition to the classic Jupyter Notebook, it also works with
JupyterLab. The nteract notebook desktop also supports IJulia, with detailed instructions available
for installing it with nteract.
Jupyter Architecture
The Jupyter ecosystem consists of several core pieces with different architectural designs. Some of
these are individual projects, and others show relationships between projects.
IPython Kernel
IPython has two fundamental roles: terminal IPython as the familiar REPL
(read-eval-print loop), and the IPython kernel, which provides computation and communication with
frontend interfaces such as the notebook.
Terminal IPython – when you type ipython, you get the original IPython interface running in the
terminal. In essence, it prompts the user for some code and, when the user has entered it,
executes it in the same process. The real interface is more complex, because it has to deal with
multi-line code, tab completion using readline, magic commands, and so on, but the model is
the same. This model is often called a REPL.
IPython Kernel – all the other interfaces (the Notebook, the Qt console, ipython console in the
terminal, and third-party interfaces) use the IPython Kernel. It is a separate process that is
responsible for running user code and for things like computing possible completions. Frontends
such as the Notebook or the Qt console communicate with the IPython Kernel using JSON messages sent
over ZeroMQ sockets. The protocol used between the frontends and the IPython Kernel is described in
Messaging in Jupyter.
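The REPL model that terminal IPython follows can be sketched in a few lines of Python. This is a bare-bones illustration, not IPython's actual code: the function name `run_repl`, the `exit` command, and the scripted input lines are made up for this example, and the result of each line is not printed automatically as it would be in a real REPL.

```python
def run_repl(read_line=input, prompt=">>> "):
    """Read code line by line and execute it in the same process."""
    namespace = {}  # variables created by the user live here
    while True:
        try:
            code = read_line(prompt)      # read
        except EOFError:
            break
        if code.strip() == "exit":
            break
        try:
            exec(code, namespace)         # evaluate in this same process
        except Exception as error:
            print(f"Error: {error}")      # report the error and keep looping
    return namespace


# Scripted input so that the example runs without a keyboard.
lines = iter(["x = 2", "y = x * 21", "exit"])
namespace = run_repl(read_line=lambda prompt: next(lines))
```

Calling `run_repl()` with no arguments would prompt for real keyboard input instead. The important point is that the code runs in the same process as the prompt, which is exactly what the separate kernel process changes.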
The core execution machinery for the kernel is shared with terminal IPython.
[Figure: Terminal IPython and the IPython Kernel share the core execution machinery; the kernel
exchanges messages with frontends.]
A kernel process can be connected to more than one frontend simultaneously. In this case, the same
variables are available in the different frontends. This design was intended to allow easy
development of different frontends based on the same kernel, but it also made it possible to support
new languages in the same frontends by developing kernels in those languages.
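The JSON messages exchanged between a frontend and a kernel can be sketched as Python dictionaries. The example below builds an `execute_request` message, which a frontend sends when the user runs a cell. It is a simplified illustration: the function name `make_execute_request` and the user name are made up, and on the real connection the parts of a message are sent as separate, signed pieces over ZeroMQ rather than as one JSON string.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="student"):
    """Build a simplified execute_request message as a dictionary."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,        # unique ID for this message
            "session": session_id,             # identifies the sender's session
            "username": username,
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",     # what the frontend is asking for
            "version": "5.3",                  # messaging protocol version
        },
        "parent_header": {},                   # empty: this is not a reply
        "metadata": {},
        "content": {
            "code": code,                      # the source of the cell to run
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
    }


session = uuid.uuid4().hex
message = make_execute_request("print(1 + 1)", session)
wire_text = json.dumps(message)
```

Because the message contains only the code and a few options, any program that can produce such messages can act as a frontend, and any program that can answer them can act as a kernel.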
Presently, there are two ways to develop a kernel for another language:
Wrapper Kernel – reuse the communications machinery from IPython and implement only
the core execution part; and
Native Kernel – implement both execution and communications in the target language.
[Figure: a Wrapper Kernel combines $Language execution with IPython's JSON and 0MQ machinery; a
Native Kernel implements both $Language execution and the JSON and 0MQ machinery.]
Because the notebook server, not the kernel, is responsible for saving and loading notebooks, you
can still edit a notebook even if you do not have the kernel for its language, but you will not be
able to run its code. The kernel does not know anything about the notebook document; it only receives
cells of code to execute when the user runs them.
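This division of work can be sketched with a small helper that picks out what a kernel would actually receive from a notebook document. The function name `code_to_execute` and the sample notebook are hypothetical; the sketch only assumes the basic notebook structure, in which every cell has a `cell_type` and a `source`.

```python
def code_to_execute(notebook):
    """Return the source of each code cell, in notebook order."""
    sources = []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue  # Markdown and raw cells are never sent to the kernel
        source = cell["source"]
        # The source may be stored as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        sources.append(source)
    return sources


notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Demo"]},
        {"cell_type": "code", "source": ["x = 1\n", "print(x)"]},
        {"cell_type": "raw", "source": "plain text"},
        {"cell_type": "code", "source": "x + 1"},
    ]
}

to_run = code_to_execute(notebook)
```

Everything else in the document (headings, explanations, saved outputs) is handled by the notebook server and the browser, which is why a notebook can still be opened and edited when its kernel is not installed.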
4. The following statements are TRUE about the Jupyter Notebook Application except:
A. Jupyter Notebook can be installed and viewed only with an internet connection.
B. Notebook documents produced by the app contain computer code, such as Python, and
rich text elements.
C. It is a server-client application that enables the user to edit and execute notebook
documents.
D. It is a growing platform used in classrooms to develop teaching materials, share lessons
and tutorials, and create computational stories.
7. It serves as the control panel of the Jupyter Notebook, which shows the local files and allows
opening notebook documents and shutting down their kernels.
A. Settings
B. Files
C. Dashboard
D. Menu
8. These are programming-language-specific processes that run independently and interact
with Jupyter applications and their user interfaces.
A. Algorithm
B. Kernels
C. Architecture
D. Framework
9. It is one of the features of JupyterHub that allows it to run on a variety of infrastructure which
includes commercial cloud providers, virtual machines, or even your own laptop hardware.
A. Customizable
B. Flexible
C. Scalable
D. Portable
10. This kernel has flexible, embeddable interpreters to load into projects and easy-to-use,
high-performance tools designed for parallel computing.
A. IPython
B. IR Kernel
C. IJulia
D. Panda
16. This prompt tells you that you can select multiple items and then perform the same action on all
of them.
17. It pulls down a menu that presents a list of choices of what kind of Jupyter project you can
create, based on the available kernels.
18. It presents different options you can use to manipulate some functions in the notebook.
19. Keyboard shortcut to restart an entire computational process.
20. Keyboard shortcut to interrupt calculations that take some time.
Direction: For your Virtual Expo Entry #4, create a Mad Lib program that will prompt the user to
enter words (noun, verb, adjective, place, etc.) and then use these words to construct a story.
Mad Libs
Example:
Enter a Marvel hero: Captain America
Enter a female celebrity who played any Marvel hero: Elizabeth Olsen
Marvel heroes are all on vacation except for Captain America and Antman
when the Trolls attack. The unlikely friend must band together to blow up
the Troll's ship by hitting a button that only Antman can hit.
But oh no! He shrinks
Because after saving the world the last time, He can't eat and been
drinking water all day
We think Black Widow is in it but turns out the big cameo is Elizabeth
Olsen and saves the day.
In the mid-credits, they got a message from Ironman delivered to them by a
pizza delivery boy who is actually played by Stan Lee.
PROGRAMMING RUBRICS
Criterion: Shows imagination and thought.
Story is original and shows original thought.
Story has some originality.
Story is simple and shows little creative effort.