
Data Science

Skills Network Labs, previously known as Data Scientist Workbench or BDU Labs, is a free virtual lab environment that provides tools like Jupyter Notebook, RStudio, and Zeppelin Notebook. These tools allow users to perform data analysis, visualization, machine learning, and image recognition in an interactive browser-based environment without installing software. Skills Network Labs contains popular data science tools and is also the virtual lab for cognitiveclass.ai, a website with free online data science and AI courses.

Uploaded by

Daniel
Copyright
© All Rights Reserved

Welcome to Skills Network Labs, previously known as Data Scientist Workbench or BDU Labs. It's a
free virtual lab environment that lets you practice your skills in Python, R, and more while you
learn data science.

Skills Network Labs is an environment that contains tools such as RStudio, Jupyter Notebook, and
Zeppelin Notebook. These tools provide an interactive environment for you to perform data
analysis, data visualization, machine learning, and image recognition. For example, in a Jupyter
notebook you can train a model that recognizes the denomination of a euro banknote using
just an image of the banknote.
It's ready to use in your web browser, so without having to install anything on your computer, you
can immediately start using popular data science tools like RStudio IDE, Jupyter Notebooks, Apache
Zeppelin, and more. In these videos, you'll learn about these popular tools and get
familiar with working in their environments. Skills Network Labs is also the virtual lab environment
for cognitiveclass.ai, a popular website with many free online courses in data science, artificial
intelligence, and data engineering. Throughout these videos, you'll also hear us refer to Data
Scientist Workbench, which is simply the former name of Skills Network Labs. In other words, Skills
Network Labs and Data Scientist Workbench are one and the same. Let's first go through the
Skills Network Labs user interface, shown in the image below, to familiarize you with the
environment before jumping into the data science tools.

Thank you for reading this article.

Welcome to Skills Network Labs, which you can access at labs.cognitiveclass.ai. On the Skills
Network Labs homepage, you have all the data science tools at your fingertips. Once you're on the
main page of Skills Network Labs, you'll find several buttons. Let's take a closer look at each of
these building analytics tools.
The JupyterLab button takes you to JupyterLab, an interactive environment that lets you create and
run notebooks with code in Python, Scala (on Apache Toree), and R.
Let's take a look at a Python 3 notebook. A notebook is an interactive document that allows you
to execute code in smaller chunks called cells. When you execute a cell, the notebook prints any
output immediately in the output cell. This allows you to do a number of things. For instance,
you can write code to import data, print the data, clean the data, print the cleaned data, create a
model, print the model output, and so on. You can change the code in an input cell and rerun
the cell as often as you'd like. But that's not all: the notebook also supports rendering markup cells
inline, so that you can embed text, Markdown, HTML, images, videos, and even interactive widgets, all
within a notebook.
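
The import, clean, and model sequence described above might look like the following minimal sketch, where each commented section stands in for one notebook cell. The inline sample data, the column names, and the mean-score "model" are assumptions made for illustration and are not part of the course material.

```python
# Cell 1: import data (a small inline sample stands in for a real file)
import csv
import io
from statistics import mean

raw_text = """name,score
Ana,82
Ben,
Chloe,91
"""
rows = list(csv.DictReader(io.StringIO(raw_text)))
print(rows)

# Cell 2: clean the data by dropping rows with a missing score
cleaned = [
    {"name": row["name"], "score": float(row["score"])}
    for row in rows
    if row["score"].strip()
]
print(cleaned)

# Cell 3: a deliberately trivial "model": the mean of the cleaned scores
mean_score = mean(row["score"] for row in cleaned)
print(f"Mean score: {mean_score:.1f}")
```

In a notebook, each section would sit in its own cell, so you could edit the cleaning step and rerun only that cell and the ones after it.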
Let's look at another building analytics tool called Zeppelin Notebooks. Zeppelin notebooks allow
interactive data analytics. As with Jupyter, you use notebooks to ingest, discover, analyze, visualize,
and collaborate on your data.

Currently, Apache Zeppelin supports many interpreters, such as Apache Spark, Python, JDBC,
Markdown, and Shell.
Let's look at the Tutorial for Scala. Apache Zeppelin provides built-in Apache Spark integration, so you
don't need to build a separate module, plugin, or library for it. For data visualization, some basic
charts are already included in Zeppelin, allowing you to convert data tables directly into
visualizations without any code. Zeppelin also aggregates values and displays them in pivot
charts with simple drag and drop. You can easily create charts with multiple aggregated values,
including sum, count, average, minimum, and maximum.
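
To make the aggregation options concrete, here is a small Python sketch that computes the same five aggregates by hand for a grouped table. It does not use Zeppelin's own charting; the `sales` rows and the region names are hypothetical sample data.

```python
from collections import defaultdict

# Hypothetical table rows: (region, amount)
sales = [
    ("north", 10.0),
    ("south", 4.0),
    ("north", 6.0),
    ("south", 8.0),
    ("north", 2.0),
]

# Group the amounts by region, as a pivot chart would group by a key column.
groups = defaultdict(list)
for region, amount in sales:
    groups[region].append(amount)

# Compute the aggregates named in the text for each group.
summary = {
    region: {
        "sum": sum(values),
        "count": len(values),
        "average": sum(values) / len(values),
        "minimum": min(values),
        "maximum": max(values),
    }
    for region, values in groups.items()
}

for region, stats in sorted(summary.items()):
    print(region, stats)
```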
Finally, let's look at RStudio in the building analytics tools. RStudio IDE allows you to analyze data, take
advantage of many statistical packages, and create beautiful visualizations and web applications. Like
other IDEs, RStudio allows you to code in a console or a script editor, as well as keep track of your
variables and history. You can display your plots, manage your packages, and see help
documentation for R. Taking it a step further, with the R Shiny library you can make your visualizations
interactive. Using Shiny, you can create all sorts of web-based interactive apps using just R code.
Thank you for reading this article.

Lab - Getting Started with Skills Network Labs


This course uses Skills Network Labs, an online virtual lab environment to help you get hands-on
experience with various data science tools without the hassle of installing and configuring the tools.
You will get access to popular open-source data science tools right inside your browser, like Jupyter
Notebooks and RStudio IDE.

Exercise 1 - Launching Skills Network Labs

1. To launch Skills Network Labs, click on the "Open Tool" button below.
2. Once Skills Network Labs loads, you should see the following screen:
Fantastic! Now you're on your way to learning more about popular open data science tools.

Please note that practice exercises are not graded, and you do not need to submit anything.

This course uses a third-party tool, Lab - Getting Started with Skills Network Labs, to
enhance your learning experience. The tool will reference basic information like your
name, email, and Coursera ID.

Lab - Exploring My Data on Skills Network Labs

Exercise 1 - Using "My Data" on Skills Network Labs

1. If you already have Skills Network Labs open in a tab, you can skip to the next step.
Otherwise, click on the "Open Tool" button below to launch Skills Network Labs. (Note: you
may see errors if you have multiple tabs with Skills Network Labs open.)
2. Take a look at the My Data page. This is where you can upload and store your data for your
account. It may take a few minutes to load.

3. Try uploading a file. If you don't know what to upload, you can use the following sample CSV file
(student_grades.csv):

Sample CSV file:

student_grades.csv

4. Great! Now that you've uploaded the file, notice that you have a filepath at the top of the screen.
You can use filepaths for any of your uploaded files to use with Jupyter Notebooks, RStudio IDE,
and the other tools available on Skills Network Labs.
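
As a rough illustration of using a file path from a Python notebook, the sketch below reads a CSV with the standard library. The lab does not show the actual path or the columns of student_grades.csv, so the temporary directory and the `student` and `grade` columns here are stand-ins; in the lab you would use the path displayed on the My Data page instead.

```python
import csv
import tempfile
from pathlib import Path

# Stand-in for the file path shown on the My Data page. The real path and
# the real columns of student_grades.csv are not given in this lab, so a
# small sample file is written to a temporary directory instead.
data_dir = Path(tempfile.mkdtemp())
file_path = data_dir / "student_grades.csv"
file_path.write_text("student,grade\nA,90\nB,75\n", encoding="utf-8")

# Read the file by path, one dictionary per row.
with file_path.open(newline="", encoding="utf-8") as handle:
    records = list(csv.DictReader(handle))

print(f"Read {len(records)} rows from {file_path.name}")
for record in records:
    print(record)
```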

Lab - Jupyter Notebooks - The Basics


Special note about the lab environment:
These Jupyter notebook labs now use the latest environment, JupyterLab, which uses
the same .ipynb Jupyter notebook files.

Exercise 1 - Create a new notebook in Python

1. If you already have Skills Network Labs open in a tab, you can click on JupyterLab on the main
page.

2. To create a new notebook, click on any of the languages under "Notebook".

Exercise 2 - Write and execute code

1. In your new empty notebook (from Exercise 1), click within the gray code cell and write
some code, like "1 + 1" (without quotation marks).
2. Execute the code, by either clicking the Play button in the menu above the notebook, or by
pressing Shift+Enter on your notebook.
3. You should see in the output, "2".
4. Try executing other code (try simple math operations).
5. Great! Now you know how to write and run code in Jupyter Notebooks.
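
For step 4, a few simple math operations you might try in a Python notebook are sketched below. In a notebook, the value of the last expression in a cell is displayed automatically; `print()` is used here only so that the sketch also works as a plain script. The variable names are arbitrary.

```python
total = 1 + 1        # addition
difference = 7 - 3   # subtraction
product = 6 * 7      # multiplication
quotient = 7 / 2     # true division always returns a float
floored = 7 // 2     # floor division drops the fractional part
power = 2 ** 10      # exponentiation

print(total, difference, product, quotient, floored, power)
```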

Exercise 3 - Create new cells

1. In your Jupyter notebook, first click on any of the existing cells to select the cell.
2. From the menu, click on "Insert", then "Insert Cell Above" or "Insert Cell Below".
3. Great! Now you know how to insert new cells in Jupyter Notebooks. Note you can use the
keyboard shortcuts: [a] - Insert a Cell Above; [b] - Insert a Cell Below.

Exercise 4 - Create Markdown cells and add text

1. In your notebook, click on any code cell, and in the drop-down menu in the menu above,
change the cell type from "Code" to "Markdown". As you'll notice, you cannot create Markdown
cells without first creating cells and converting them from "Code" to "Markdown".
2. In the Markdown cell, write some text like "My Title".
3. To render the Markdown text, make sure the cell is selected (by clicking within it), and press
Play in the menu, or Shift+Enter.
4. Your Markdown cell should now be rendered!
5. To edit your Markdown cell, double-click anywhere within the cell. Note you can use the
keyboard shortcut: [m] - Convert Cell to Markdown.

Please note that practice exercises are not graded, and you do not need to submit anything.

Lab - Jupyter Notebooks - More Features


This lab is a continuation of the previous lab.

Note: If you have JupyterLab on Skills Network Labs open in a tab already, continue using that tab.
Otherwise, click on the "Open Tool" button below to launch Jupyter Notebooks.

Exercise 5 - Rename your Notebook

1. In the list of notebooks in the right-hand panel, click on the arrow (>) to the left of the
notebook name to expand the list of options.
2. Click on "Rename" to rename your notebook to something like "My_Notebook.ipynb".

Exercise 6 - Save and Download your Jupyter Notebook from Skills Network Labs
to your computer

Although your notebooks and data are preserved in your account after signing out, sometimes you
may want to download your notebook to your computer.

1. To save, click on the Save button in the menu, or go to "Save Notebook".


2. To download the notebook, click on the Files pane to open the list of files. Navigate to your
notebook, and right-click on the file to download the notebook.

Exercise 7 - Upload a Jupyter Notebook to Skills Network Labs

1. To upload a Jupyter Notebook from your computer to Skills Network Labs, drag and drop a
.ipynb file from your computer directly into the browser. You can use the notebook you
downloaded from Exercise 6.
2. Your notebook should open up automatically once it has finished uploading.

Exercise 8 - Change your kernel (to Python 3, R) in JupyterLab


1. With a notebook open, click on the kernel name (e.g., Python 3) in the top right-hand corner
of the notebook to open a pop-up window where you can change it to a different language.

Lab - Jupyter Notebooks - Advanced Features


This lab is a continuation of the previous lab.

Note: If you have JupyterLab on Skills Network Labs open in a tab already, continue using that tab.
Otherwise, click on the "Open Tool" button below to launch JupyterLab.

Exercise 9 - Advanced Markdown styling

1. You can write HTML code in your Markdown cells. Try executing the following HTML code in
a Markdown cell: <a href=https://www.cognitiveclass.ai>Cognitive Class</a>. You should now
see a hyperlink to https://www.cognitiveclass.ai that appears as "Cognitive Class".
2. You can also use Markdown formatting. For example, to use an H1 Header, try running the
following in a Markdown cell: # Header 1. You can find more rules for Markdown formatting in
this Markdown Cheatsheet: https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet
3. Using only Markdown, try creating a table, a list of shopping items, and try embedding
an image.
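
If you would like a starting point for step 3, the Python sketch below builds Markdown source for a table, a shopping list, and an image, then prints it so you can paste the output into a Markdown cell. The `markdown_table` helper, the shopping items, and the example.com image URL are all hypothetical placeholders and not part of the lab.

```python
def markdown_table(headers, rows):
    """Return Markdown table source for the given headers and rows."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


shopping_items = ["Apples", "Bread", "Milk"]

table = markdown_table(["Item", "Quantity"], [["Apples", 3], ["Bread", 1]])
shopping_list = "\n".join(f"- {item}" for item in shopping_items)
# Placeholder URL: replace it with the address of a real image.
image = "![Placeholder image](https://example.com/image.png)"

# Blank lines separate the Markdown blocks so each one renders on its own.
markdown_source = "\n\n".join(["# Header 1", table, shopping_list, image])
print(markdown_source)
```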

Lab - Zeppelin Notebooks - The Basics


Exercise 1 - Create a new Zeppelin notebook

1. Launch Zeppelin Notebooks on Skills Network Labs by clicking on the "Open Tool" button
below.
2. Click on "Create new note" to create a new Zeppelin notebook. You can give it a name like
"My_Zeppelin_Notebook". You can keep the default interpreter as "Spark".

Exercise 2 - Running Scala code in Zeppelin


1. By default, you will be running Scala code. To write your first Scala code, simply click in the
empty cell and type in "1 + 1" without the quotation marks, and press the Play button in the
upper right-hand corner of the cell. Alternatively, you can run the cell by
pressing Shift+Enter on your keyboard.
2. It may take a moment to load, but when it is finished executing the code, you should see
"res0: Int = 2"

Exercise 3 - Running Python code in Zeppelin

1. In the same notebook as Exercise 2, in an empty cell, type in: "%pyspark" in the first line,
followed by "1 + 1" in the second line of the same code cell. Make sure not to include the
quotation marks. Then press Play or Shift+Enter to run the cell.
2. The %pyspark command indicates that you want to run the cell with Python. The resulting
output of the cell should now say "2".

Exercise 4 - Create a new cell in Zeppelin

1. To create a new cell, move your cursor just slightly below one of the existing cells in your
notebook. You should see a bar that shows a "+" sign. Click and a new cell will be created.

Exercise 5 - Give a title to a cell

1. To add a title to your code cell, in the upper right-hand corner of your code cell, click on
the gear for "Settings", then click on "Show title". You should now be able to see the title of
the code cell ("Untitled" by default).

Exercise 6 - Create a markdown cell

1. As with Python, for Markdown cells you must first specify that you want to use Markdown in
your code cell by typing %md in the first line of the cell. Below the %md, you can now write
Markdown and run the cell with Shift+Enter to render the Markdown cell.

Lab - Zeppelin Notebooks - Tutorial


Note: If you have Skills Network Labs open in a tab already, continue using that tab. Otherwise, click
on the "Open Tool" button below to launch Zeppelin Notebooks.

Exercise 7 - Explore a Zeppelin tutorial on the Welcome page

1. On the main Zeppelin page, click on one of the available tutorials. You can open either the
"Tutorial for Scala" or "Tutorial for Python". If you don't see the main Zeppelin page and you
have Zeppelin open already on Skills Network Labs, click on the Zeppelin logo in the top-left
corner to go to the main Zeppelin page.

Lab - RStudio - The Basics


Exercise 1 - Running your first R code in RStudio IDE

1. To launch RStudio IDE on Skills Network Labs, click on the "Open Tool" button below.


2. With RStudio IDE open in your browser, click within the Console window in the lower
left-hand corner.
3. Type in "1 + 1", without quotation marks and press Enter to run the R code.
4. You should see the result: "[1] 2". Note that the "[1]" simply refers to the [n]th item returned
in the output.
5. Now try running "1:1000" in your console, which should display the integers from 1
to 1000.

Exercise 2 - Create a new R script

1. Go to "File" then "New File" then "R Script".


2. In your R script window, type "x <- 1" or "x = 1". Both do the same thing, but in R, variables
are traditionally assigned using the "<-" syntax, which resembles a leftward-pointing arrow.
3. Execute the code from the script window, either by pressing the "Run" button or by
pressing "Ctrl+Enter (Windows)" or "Cmd+Enter (Mac)" on your keyboard. You should see
that the code was properly run in the console.
4. Notice that in the upper right-hand window under "Environment", you should now see a new
variable for "x", with a value of "1".
5. Try adding 100 to x from the script window and confirm the result in the Console.

Exercise 3 - Save your new R script

1. Go to "File", then "Save" to save your R script. Alternatively, you can go to "File" then "Save
As..." to choose a filename and where you want to save your file.
2. In the bottom right-hand window for "Files", you should now be able to navigate
the folders and find your current R Script.
3. If you wish, you can download the R script to your computer. To do so, check the
checkbox to the left of the filename, and click on "Export" to download your file.

Lab - RStudio - Creating an interactive map in R

Note: If you have Skills Network Labs open in a tab already, continue using that tab. You can
navigate to RStudio IDE by clicking on "RStudio IDE" in the menu (click on the three lines in the top-left
corner). Otherwise, click on the "Open Tool" button below to launch RStudio IDE. If you have two tabs
or windows open with RStudio IDE, only the most recent tab/window will be active.

Exercise 4 - Upload an R script, install an R library, and plot an interactive map of New York City in R

1. Download the following R script:

map_new_york_city.R

2. Upload the R script to RStudio IDE by navigating to the "Upload" button in the lower right-hand
window under the "Files" tab.

3. Once the upload is complete, click on the map_new_york_city.R script in RStudio to open it in the
script editor window.

4. Try running each line of code, one by one from the top, using "Run" or by pressing "Ctrl+Enter
(Windows)" or "Cmd+Enter (Mac)" on your keyboard. Don't worry too much about what the code
means or how to write similar code.

5. Once the map is created, you should see it in the lower right-hand corner. You can try to zoom in
and out of the map view.

6. (Optional) Try changing the coordinates of the map (its latitude and longitude values) in the script
window, then re-run the code to re-create the map. For example, you can do a Google search
for a popular landmark to find its latitude and longitude. You may need to figure out
which lines of code you need to change, which lines of code you need to re-run, and which lines of
code you do not need to re-run.

IBM Watson Studio, formerly known as Data Science Experience or DSX, is an enterprise-ready
environment for data scientists and developers, and it includes some of the tools you have learned
about so far on Cognitive Class Labs. You may find that many of the features are similar to what you have
seen on Cognitive Class Labs. However, because IBM Watson Studio was designed for scalability
and enterprise usage, you will find some extra features, including (1) collaboration with team
members, (2) scalability with Spark clusters to analyze big data, and (3) connections to various data
sources.

IBM Watson Studio also comes with a free trial, which includes Jupyter Notebooks, RStudio, space
for object storage, 2 Spark executors, and a community that includes notebooks and tutorials that
you can use.

Note that the videos in this course may still include references to "Data Science Experience" or
"DSX", but in your mind, simply replace those terms with "Watson Studio". Thank you for your
patience with us as we update the videos!

You can register for IBM Watson Studio here: https://cocl.us/Watson_Studio_Coursera_DS0105

OPTIONAL: IBM Cloud Promo Code


Click on the Open Tool button below to obtain a unique Promo code to upgrade your IBM Cloud Lite
account.

Module 1: From Problem to Approach and from Requirements to Collection

• Business Understanding
• Analytic Approach
• Data Requirements
• Data Collection
• Lab: From Problem to Approach
• Lab: From Requirement to Collection
• Quiz: From Problem to Approach
• Quiz: From Requirement to Collection

Module 2: From Understanding to Preparation and from Modeling to Evaluation

• Data Understanding
• Data Preparation
• Modeling
• Evaluation
• Lab: From Understanding to Preparation
• Lab: From Modeling to Evaluation
• Quiz: From Understanding to Preparation
• Quiz: From Modeling to Evaluation

Module 3: From Deployment to Feedback

• Deployment
• Feedback
• Quiz: From Deployment to Feedback
• Peer-review Assignment

Data Science Methodologies


This course focuses on the Foundational Methodology for Data Science by John Rollins, which was
introduced in the previous video. However, it is not the only methodology that you will encounter in
data science. For example, in data mining, the Cross-Industry Standard Process for Data Mining
(CRISP-DM) methodology is widely used.

What is CRISP-DM?
The CRISP-DM methodology is a process aimed at increasing the use of data mining over a wide
variety of business applications and industries. The intent is to take case-specific scenarios and
general behaviors and make them domain neutral. CRISP-DM comprises six steps that an entity has
to implement in order to have a reasonable chance of success. The six steps are shown in the
following diagram:

Fig.1 CRISP-DM model, IBM Knowledge Center, CRISP-DM Help Overview

1. Business Understanding: This stage is the most important because this is where the
intention of the project is outlined. Foundational Methodology and CRISP-DM are aligned here.
It requires communication and clarity. The difficulty here is that stakeholders have different
objectives, biases, and modalities of relating information. They don’t all see the same things or
in the same manner. Without a clear, concise, and complete perspective of what the project goals
are, resources will be needlessly expended.
2. Data Understanding: Data understanding relies on business understanding. Data is
collected at this stage of the process. The understanding of what the business wants and needs
will determine what data is collected, from what sources, and by what methods. CRISP-DM
combines the stages of Data Requirements, Data Collection, and Data Understanding from the
Foundational Methodology outline.
3. Data Preparation: Once the data has been collected, it must be transformed into a usable
subset, unless it is determined that more data is needed. Once a dataset is chosen, it must then
be checked for questionable, missing, or ambiguous cases. Data Preparation is common to
CRISP-DM and Foundational Methodology.
4. Modeling: Once prepared for use, the data must be expressed through whatever appropriate
models give meaningful insights and, hopefully, new knowledge. This is the purpose of data
mining: to create knowledge, that is, information that has meaning and utility. The use of models reveals
patterns and structures within the data that provide insight into the features of interest. Models
are selected on a portion of the data, and adjustments are made if necessary. Model selection is
both an art and a science. Both Foundational Methodology and CRISP-DM require this step before the
subsequent stage.
5. Evaluation: The selected model must be tested. This is usually done by running the trained
model on a preselected test set. This allows you to see the effectiveness of the model on
a set it sees as new. The results are used to determine the efficacy of the model and
foreshadow its role in the next and final stage.
6. Deployment: In the deployment step, the model is used on new data outside the scope of
the dataset and by new stakeholders. The new interactions at this phase might reveal new
variables and needs for the dataset and model. These new challenges could initiate revision of
either business needs and actions, or the model and data, or both.
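
The Data Preparation, Modeling, and Evaluation stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `records` data (hours studied and a pass/fail label), the 70/30 split, and the single-threshold "model" are invented for the example and are not prescribed by CRISP-DM or by the course.

```python
import random

# Hypothetical labelled records: (hours_studied, passed).
# None marks a missing value that data preparation must handle.
records = [
    (1.0, 0), (2.0, 0), (2.5, 0), (None, 1), (3.0, 0), (4.5, 1),
    (5.0, 1), (5.5, 1), (6.0, 1), (1.5, 0), (7.0, 1), (3.5, 0),
]

# Data Preparation: drop cases with missing values.
prepared = [(x, y) for x, y in records if x is not None]

# Set aside a test set before modeling; a fixed seed keeps the split repeatable.
rng = random.Random(0)
shuffled = prepared[:]
rng.shuffle(shuffled)
split = int(len(shuffled) * 0.7)
train, test = shuffled[:split], shuffled[split:]


def accuracy(threshold, data):
    """Fraction of cases where the rule (x >= threshold) matches the label."""
    hits = sum(1 for x, y in data if int(x >= threshold) == y)
    return hits / len(data)


# Modeling: choose the threshold that performs best on the training portion.
candidates = sorted(x for x, _ in train)
best_threshold = max(candidates, key=lambda t: accuracy(t, train))

# Evaluation: score the chosen model on data it has not seen.
print(
    f"threshold={best_threshold}, "
    f"train accuracy={accuracy(best_threshold, train):.2f}, "
    f"test accuracy={accuracy(best_threshold, test):.2f}"
)
```

The test accuracy, not the training accuracy, is the number to carry forward, because it reflects how the model behaves on cases it sees as new.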

CRISP-DM is a highly flexible and cyclical model. Flexibility is required at each step, along with
communication, to keep the project on track. At any of the six stages it may be necessary to revisit an
earlier stage and make changes. The key point of this process is that it’s cyclical; therefore, even
at the finish you have another business understanding encounter to discuss the viability after
deployment. The journey continues.

The IBM Cloud promo code is useful for users outside North America who may have encountered
problems provisioning services in IBM Cloud.

This promo code gives you 6 months of trial with enhanced access to some of the "Platform-as-a-
Service (PaaS)" services (e.g. Watson, Db2) on IBM Cloud. (It does not provide access to
Infrastructure/IaaS services).

Once you have obtained the promo code by clicking below, follow these instructions to apply the
promo code to your account.
