Data Science
Skills Network Labs is a free virtual lab environment that lets you practice your skills in Python, R,
and more while you learn data science.
Skills Network Labs is an environment that contains tools such as RStudio, Jupyter Notebook, and
Zeppelin Notebook. These tools provide an interactive environment for you to perform data
analysis, data visualization, machine learning, and image recognition. For example, in a Jupyter
Notebook you can train a model that recognizes the denomination of a euro banknote using
just an image of the banknote.
It's ready to use in your web browser. Without having to install anything on your computer, you
can immediately start using popular data science tools like RStudio IDE, Jupyter Notebooks, Apache
Zeppelin, and more. In these videos, you'll learn about these popular tools and how to get
comfortable working in their environments. Skills Network Labs is also the virtual lab environment
for cognitiveclass.ai, a popular website with lots of free online courses in data science, artificial
intelligence, and data engineering. Throughout these videos, you'll also hear us refer to Data
Scientist Workbench, which is simply the former name of Skills Network Labs. Skills
Network Labs and Data Scientist Workbench are one and the same. Let's first go through parts of
the user interface on Skills Network Labs, shown in the image below, to familiarize you with the
environment before jumping into the data science tools.
Welcome to Skills Network Labs, which you can access at labs.cognitiveclass.ai. On the Skills
Network Labs homepage, you have all the data science tools at your fingertips. Once you're on the
main page of Skills Network Labs, you'll find several buttons. Let's take a closer look at each of
these building analytics tools.
The JupyterLab button takes you to JupyterLab, an interactive environment that lets you create and
run notebooks that execute code in Python, Scala (on Apache Toree), and R.
Let's take a look at a Python 3 notebook. A notebook is an interactive document that allows you
to execute code in smaller chunks called cells. When you execute a cell, the notebook prints any
output immediately into the output cell. Doing so allows you to do a number of things. For instance,
you can write code to import data, print the data, clean the data, print the cleaned data, create a
model, print the model output, and so on. You can change the code in an input cell and rerun
the cell as often as you'd like. That's not all: the notebook also supports rendering markup cells
inline, so you can embed text, Markdown, HTML, images, videos, and even interactive widgets, all
within a notebook.
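The cell-by-cell workflow described above can be sketched roughly as follows. This is only an illustration using Python's standard library: the inline dataset, the column names `hours` and `score`, and the choice of a straight-line model are all assumptions, and each commented section stands in for one notebook cell.

```python
import csv
import io
from statistics import mean

# Cell 1: import the data (hypothetical inline CSV; one row has a missing value).
raw_csv = """hours,score
1,52
2,58
,61
4,70
5,75
"""
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Cell 2: print the raw data.
print(rows)

# Cell 3: clean the data by dropping rows with a missing field.
clean = [(float(r["hours"]), float(r["score"])) for r in rows if r["hours"] and r["score"]]

# Cell 4: print the cleaned data.
print(clean)

# Cell 5: create a simple model (a least-squares line) and print its output.
xs = [x for x, _ in clean]
ys = [y for _, y in clean]
x_bar, y_bar = mean(xs), mean(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in clean) / sum((x - x_bar) ** 2 for x in xs)
intercept = y_bar - slope * x_bar
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
```

In a notebook, editing only the cleaning cell and rerunning it along with the cells after it would update the model without reloading the data.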
Let's look at another building analytics tool called Zeppelin Notebooks. Zeppelin notebooks allow
interactive data analytics. As with Jupyter, you use notebooks to ingest, discover, analyze, visualize,
and collaborate on your data.
Currently, Apache Zeppelin supports many interpreters, such as Apache Spark, Python, JDBC,
Markdown, and Shell.
Let's look at the Tutorial for Scala. Apache Zeppelin provides built-in Apache Spark integration, so you
don't need to build a separate module, plugin, or library for it. For data visualization, some basic
charts are already included in Zeppelin, allowing you to convert data tables directly into
visualizations without any code. Zeppelin also aggregates values and displays them in pivot
charts with simple drag and drop. You can easily create charts with multiple aggregated values,
including sum, count, average, minimum, and maximum.
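To make the aggregations named above concrete, here is a minimal sketch in plain Python of what a pivot-style summary computes for each group. It does not use or imitate Zeppelin's own API; the region names and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (group key, numeric value).
rows = [("north", 10), ("south", 4), ("north", 6), ("south", 8), ("north", 2)]

# Collect the values that belong to each group key.
groups = defaultdict(list)
for key, value in rows:
    groups[key].append(value)

# Compute the five aggregations for every group.
summary = {
    key: {
        "sum": sum(vals),
        "count": len(vals),
        "average": sum(vals) / len(vals),
        "minimum": min(vals),
        "maximum": max(vals),
    }
    for key, vals in groups.items()
}

for key, stats in sorted(summary.items()):
    print(key, stats)
```

A pivot chart plots one of these aggregated values per group, which is why no code is needed once the data table exists.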
Finally, let's look at RStudio in the Building Analytics tools. RStudio IDE allows you to analyze data,
take advantage of many statistical packages, and create beautiful visualizations and web applications.
Like other IDEs, RStudio allows you to code in a console or a script editor as well as keep track of
your variables and history. You can display your plots, manage your packages, and see help
documentation for R. Taking it a step further, with the R Shiny library, you can make your
visualizations interactive. Using Shiny, you can create all sorts of web-based interactive apps using
just R code.
Thank you for reading this article.
1. To launch Skills Network Labs, click on the "Open Tool" button below.
2. Once Skills Network Labs loads, you should see the following screen:
Fantastic! Now you're on your way to learning more about popular open data science tools.
Please note that practice exercises are not graded, and you do not need to submit anything.
This course uses a third-party tool, Lab - Getting Started with Skills Network Labs, to
enhance your learning experience. The tool will reference basic information like your
name, email, and Coursera ID.
1. If you already have Skills Network Labs open in a tab, you can skip to the next step.
Otherwise, click on the "Open Tool" button below to launch Skills Network Labs. (Note: you
may see errors if you have multiple tabs with Skills Network Labs open.)
2. Take a look at the My Data page. This is where you can upload and store your data for your
account. It may take a few minutes to load.
3. Try uploading a file. If you don't know what to upload, you can use the following sample CSV file
(student_grades.csv):
student_grades.csv
4. Great! Now that you've uploaded the file, notice that you have a filepath at the top of the screen.
You can use filepaths for any of your uploaded files to use with Jupyter Notebooks, RStudio IDE,
and the other tools available on Skills Network Labs.
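As a rough sketch of how a filepath might be used from a Python notebook, the example below reads a CSV file with the standard `csv` module. So that it runs anywhere, it first writes a small stand-in file to a temporary folder; the column names and rows are hypothetical and are not the contents of the course's student_grades.csv. In the lab, `file_path` would be the path shown on the My Data page.

```python
import csv
import tempfile
from pathlib import Path

# Stand-in for an uploaded file: write a tiny CSV to a temporary folder.
# The column names and values are assumptions made for this illustration.
data_dir = Path(tempfile.mkdtemp())
file_path = data_dir / "student_grades.csv"
file_path.write_text("student,grade\nAna,88\nBen,92\n", encoding="utf-8")

# Open the file by its path and read each row as a dictionary keyed by column name.
with open(file_path, newline="", encoding="utf-8") as f:
    records = list(csv.DictReader(f))

print(records[0])
print(len(records), "rows read")
```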
This course uses a third-party tool, Lab - Exploring My Data on Skills Network Labs, to
enhance your learning experience. The tool will reference basic information like your
name, email, and Coursera ID.
1. If you already have Skills Network Labs open in a tab, you can click on JupyterLab on the main
page.
1. In your new empty notebook (from Exercise 1), click within the gray code cell and write
some code, like "1 + 1" (without quotation marks).
2. Execute the code by either clicking the Play button in the menu above the notebook or
pressing Shift+Enter in your notebook.
3. You should see "2" in the output.
4. Try executing other code (try simple math operations).
5. Great! Now you know how to write and run code in Jupyter Notebooks.
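For step 4, a few simple Python math operations to try are sketched below. They are collected in one block and printed explicitly so the example also runs outside a notebook; in a notebook, each expression could go in its own cell, where the value of the last expression is displayed automatically.

```python
# Each expression could be typed into its own notebook cell.
results = {
    "1 + 1": 1 + 1,
    "7 - 3": 7 - 3,
    "6 * 7": 6 * 7,
    "7 / 2": 7 / 2,      # true division returns a float
    "7 // 2": 7 // 2,    # floor division returns an integer
    "2 ** 10": 2 ** 10,  # exponentiation
}

for expression, value in results.items():
    print(expression, "=", value)
```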
1. In your Jupyter notebook, first click on any of the existing cells to select the cell.
2. From the menu, click on "Insert", then "Insert Cell Above" or "Insert Cell Below".
3. Great! Now you know how to insert new cells in Jupyter Notebooks. Note you can use the
keyboard shortcuts: [a] - Insert a Cell Above; [b] - Insert a Cell Below.
1. In your notebook, click on any code cell, and in the drop-down menu in the menu above,
change the cell type from "Code" to "Markdown". As you'll notice, you cannot create Markdown
cells without first creating cells and converting them from "Code" to "Markdown".
2. In the Markdown cell, write some text like "My Title".
3. To render the Markdown text, make sure the cell is selected (by clicking within it), and press
Play in the menu, or Shift+Enter.
4. Your Markdown cell should now be rendered!
5. To edit your Markdown cell, double-click anywhere within the cell. Note you can use the
keyboard shortcut: [m] - Convert Cell to Markdown
Please note that practice exercises are not graded, and you do not need to submit anything.
This course uses a third-party tool, Lab - Jupyter Notebooks - The Basics, to enhance
your learning experience. The tool will reference basic information like your name,
email, and Coursera ID.
Note: If you have JupyterLab on Skills Network Labs open in a tab already, continue using that tab.
Otherwise, click on the "Open Tool" button below to launch Jupyter Notebooks.
1. In the list of notebooks in the right-hand panel, click on the arrow (>) to the left of the
notebook name to expand the list of options.
2. Click on "Rename" to rename your notebook to something like "My_Notebook.ipynb".
Exercise 6 - Save and Download your Jupyter Notebook from Skills Network Labs
to your computer
Although your notebooks and data are preserved in your account after signing out, sometimes you
may want to download your notebook to your computer.
1. To upload a Jupyter Notebook from your computer to Skills Network Labs, drag and drop a
.ipynb file from your computer directly into the browser. You can use the notebook you
downloaded from Exercise 6.
2. Your notebook should open up automatically once it has finished uploading.
This course uses a third-party tool, Lab - Jupyter Notebooks - More Features, to
enhance your learning experience. The tool will reference basic information like your
name, email, and Coursera ID.
Note: If you have JupyterLab on Skills Network Labs open in a tab already, continue using that tab.
Otherwise, click on the "Open Tool" button below to launch JupyterLab.
1. You can write HTML code in your Markdown cells. Try executing the following HTML code in
a Markdown cell: <a href="https://www.cognitiveclass.ai">Cognitive Class</a>. You should now
see a hyperlink to https://www.cognitiveclass.ai that appears as "Cognitive Class".
2. You can also use Markdown formatting. For example, to use an H1 Header, try running the
following in a Markdown cell: # Header 1. You can find more rules for Markdown formatting in
this Markdown Cheatsheet: https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet
3. Using only Markdown, try creating a table, a list of shopping items, and try embedding
an image.
This course uses a third-party tool, Lab - Jupyter Notebooks - Advanced Features, to
enhance your learning experience. The tool will reference basic information like your
name, email, and Coursera ID.
1. Launch Zeppelin Notebooks on Skills Network Labs by clicking on the "Open Tool" button
below.
2. Click on "Create new note" to create a new Zeppelin notebook. You can give it a name like
"My_Zeppelin_Notebook". You can keep the default interpreter as "Spark".
1. In the same notebook as Exercise 2, in an empty cell, type in: "%pyspark" in the first line,
followed by "1 + 1" in the second line of the same code cell. Make sure not to include the
quotation marks. Then press Play or Shift+Enter to run the cell.
2. The %pyspark command indicates that you want to run the cell with Python. The resulting
output of the cell should now say "2".
1. To create a new cell, move your cursor just slightly below one of the existing cells in your
notebook. You should see a bar that shows a "+" sign. Click and a new cell will be created.
1. To add a title to your code cell, click on the gear for "Settings" in the upper right-hand
corner of the code cell, then click on "Show title". Now you should be able to see the title of
the code cell ("Untitled" by default).
1. As with Python, for Markdown cells you must first specify that you want to use Markdown in
your code cell by typing %md on the first line. Below the %md line, you can now write
Markdown and run the cell with Shift+Enter to render the Markdown cell.
This course uses a third-party tool, Lab - Zeppelin Notebooks - The Basics, to enhance
your learning experience. The tool will reference basic information like your name,
email, and Coursera ID.
1. On the main Zeppelin page, click on one of the available tutorials. You can open either the
"Tutorial for Scala" or "Tutorial for Python". If you don't see the main Zeppelin page and you
have Zeppelin open already on Skills Network Labs, click on the Zeppelin logo in the top-left
corner to go to the main Zeppelin page.
This course uses a third-party tool, Lab - Zeppelin Notebooks - Tutorial, to enhance
your learning experience. The tool will reference basic information like your name,
email, and Coursera ID.
1. Go to "File", then "Save" to save your R script. Alternatively, you can go to "File" then "Save
As..." to choose a filename and where you want to save your file.
2. In the bottom right-hand window for "Files", you should now be able to navigate
the folders and find your current R Script.
3. If you wish, you can download the R script to your computer. To do so, check the
checkbox to the left of the filename, and click on "Export" to download your file.
This course uses a third-party tool, Lab - RStudio - The Basics, to enhance your
learning experience. The tool will reference basic information like your name, email, and
Coursera ID.
map_new_york_city.R
2. Upload the R script to RStudio IDE by navigating to the "Upload" button in the lower right-hand
window under the "Files" tab.
4. Try running each line of code, one by one from the top, using "Run" or by pressing "Ctrl+Enter
(Windows)" or "Cmd+Enter (Mac)" on your keyboard. Don't worry too much about what the code
means or how to write similar code.
5. Once the map is created, you should see it in the lower right-hand corner. You can try to zoom in
and out of the map view.
6. (Optional) Try changing the coordinates of the map (its latitude and longitude values) in the script
window, then re-run the code to re-create the map. For example, you can do a Google search
for a popular landmark in the world to find its latitude and longitude. You may need to figure out
which lines of code you need to change, which lines of code you need to re-run, and which lines of
code you do not need to re-run.
This course uses a third-party tool, Lab - RStudio - Creating an interactive map in R, to
enhance your learning experience. The tool will reference basic information like your
name, email, and Coursera ID.
IBM Watson Studio, formerly known as Data Science Experience or DSX, is an enterprise-ready
environment for data scientists and developers, and includes some of the tools you have learned
about so far on Cognitive Class Labs. You may find that many of the features are similar to what you
have seen on Cognitive Class Labs. However, because IBM Watson Studio was designed for scalability
and enterprise usage, you will find some extra features, including (1) collaboration with team
members, (2) scalability with Spark clusters to analyze big data, and (3) connections to various data
sources.
IBM Watson Studio also comes with a free trial, which includes Jupyter Notebooks, RStudio, space
for object storage, 2 Spark executors, and a community that includes notebooks and tutorials that
you can use.
Note that the videos in this course may still include references to "Data Science Experience" or
"DSX", but in your mind, simply replace those terms with "Watson Studio". Thank you for your
patience with us as we update the videos!
What is CRISP-DM?
The CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology is a process aimed at
increasing the use of data mining over a wide variety of business applications and industries. The
intent is to take case-specific scenarios and general behaviors and make them domain-neutral.
CRISP-DM comprises six steps that an entity has to implement in order to have a reasonable chance of
success. The six steps are shown in the following diagram:
1. Business Understanding: This stage is the most important because this is where the
intention of the project is outlined. Foundational Methodology and CRISP-DM are aligned here.
It requires communication and clarity. The difficulty here is that stakeholders have different
objectives, biases, and modalities of relating information. They don’t all see the same things or
see them in the same manner. Without a clear, concise, and complete perspective of what the
project goals are, resources will be needlessly expended.
2. Data Understanding: Data understanding relies on business understanding. Data is
collected at this stage of the process. The understanding of what the business wants and needs
will determine what data is collected, from what sources, and by what methods. CRISP-DM
combines the stages of Data Requirements, Data Collection, and Data Understanding from the
Foundational Methodology outline.
3. Data Preparation: Once the data has been collected, it must be transformed into a usable
subset unless it is determined that more data is needed. Once a dataset is chosen, it must then
be checked for questionable, missing, or ambiguous cases. Data Preparation is common to
CRISP-DM and Foundational Methodology.
4. Modeling: Once prepared for use, the data must be expressed through whatever appropriate
models give meaningful insights and, hopefully, new knowledge. This is the purpose of data
mining: to create knowledge, that is, information that has meaning and utility. The use of models
reveals patterns and structures within the data that provide insight into the features of interest.
Models are selected on a portion of the data, and adjustments are made if necessary. Model
selection is both an art and a science. Both Foundational Methodology and CRISP-DM require
modeling before the subsequent stage.
5. Evaluation: The selected model must be tested. This is usually done by running the trained
model on a preselected test set. This allows you to see the effectiveness of the model on
a set it sees as new. The results are used to determine the efficacy of the model and
foreshadow its role in the next and final stage.
6. Deployment: In the deployment step, the model is used on new data outside the scope of
the dataset and by new stakeholders. The new interactions at this phase might reveal new
variables and needs for the dataset and model. These new challenges could initiate revision of
either business needs and actions, or the model and data, or both.
CRISP-DM is a highly flexible and cyclical model. Flexibility is required at each step, along with
communication, to keep the project on track. At any of the six stages it may be necessary to revisit an
earlier stage and make changes. The key point of this process is that it’s cyclical: even
at the finish, you return to business understanding to discuss the model's viability after
deployment. The journey continues.
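The Modeling and Evaluation stages described above (fit on a portion of the data, then test on a preselected set the model has not seen) can be sketched as follows. This is a minimal illustration, not part of CRISP-DM itself: the synthetic dataset, the 80/20 split, the straight-line model, and mean absolute error as the metric are all assumptions chosen to keep the example short.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the split is repeatable

# Hypothetical dataset: y is roughly 2*x + 1 with a little noise.
data = [(x, 2 * x + 1 + random.uniform(-0.5, 0.5)) for x in range(20)]

# Preselect a test set before modeling; the model never sees it during fitting.
random.shuffle(data)
split = int(len(data) * 0.8)
train, test = data[:split], data[split:]

# Modeling: fit a least-squares line on the training portion only.
x_bar = mean(x for x, _ in train)
y_bar = mean(y for _, y in train)
slope = sum((x - x_bar) * (y - y_bar) for x, y in train) / sum((x - x_bar) ** 2 for x, _ in train)
intercept = y_bar - slope * x_bar

# Evaluation: measure the error on data the model treats as new.
mae = mean(abs(y - (slope * x + intercept)) for x, y in test)
print(f"train={len(train)} test={len(test)} test MAE={mae:.3f}")
```

If the test error were unacceptable, the cyclical nature of the process would send the project back to an earlier stage, such as data preparation or model selection.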
The optional IBM Cloud promo code is useful for users outside North America who may have
encountered problems provisioning services in IBM Cloud.
This promo code gives you a 6-month trial with enhanced access to some of the
"Platform-as-a-Service (PaaS)" services (e.g. Watson, Db2) on IBM Cloud. (It does not provide
access to Infrastructure/IaaS services.)
Once you have obtained the promo code by clicking below, follow these instructions to apply the
promo code to your account.
This course uses a third-party tool, OPTIONAL: IBM Cloud Promo Code, to enhance your
learning experience. The tool will reference basic information like your name, email, and
Coursera ID.