Tools For Data Science
In this module, you will learn about the different types and categories of tools that data scientists use
and popular examples of each. You will also become familiar with Open Source, Cloud-based, and
Commercial options for data science tools.
Learning Objectives
Describe the components of a Data Scientist's toolkit and list various tool categories.
List examples of Open Source, Commercial, and Cloud-based tools in various categories.
This module introduces the criteria for determining which language you should learn. You will
learn about the benefits of Python, R, SQL, and other common languages such as Java, Scala, C++,
JavaScript, and Julia, and explore how these languages are used in Data Science. You will also
review sites that offer more information about the languages.
Learning Objectives
Identify the criteria and roles for determining the language to learn.
This module will give you in-depth knowledge of the different libraries, APIs, dataset sources, and
models used by data scientists.
Learning Objectives
List examples of the various libraries: scientific, visualization, machine learning, and deep
learning.
List the tasks that a data scientist needs to perform to build a model.
This module introduces the Jupyter Notebook and JupyterLab. You will learn how to work with different
kernels and the basic Jupyter architecture. In addition, you will identify the tools in an Anaconda Jupyter
environment. Finally, the module provides an overview of cloud-based Jupyter environments and their
data science features.
Learning Objectives
This module starts with an introduction to R and RStudio and ends with GitHub usage. You will
learn about the different R visualization packages and how to create visual charts using the plot function.
Later in the module, you will develop the essential conceptual and hands-on skills to work with Git
and GitHub. You will start with an overview of Git and GitHub, then create a GitHub account and a project
repository, add files, and commit your changes using the web interface. Next, you will become
familiar with Git workflows involving branches, pull requests (PRs), and merges. You will also complete a
project at the end to apply and demonstrate your newly acquired skills.
Learning Objectives
Explain version control and describe the Git and GitHub environment.
Describe the purpose of source repositories and explain how GitHub satisfies the needs of a
source repository.
In this module, you will work on a final project to demonstrate some of the skills learned in the course.
You will also be tested on your knowledge of the various components and tools in a Data Scientist's
toolkit covered in the previous modules.
Learning Objectives
This is an optional module for those interested in learning about and working with data science tools
from IBM such as Watson Studio.
Learning Objectives
Find common resources in Watson Studio and IBM Cloud Pak for Data.
Use different types of Jupyter Notebook templates and kernels on IBM Watson Studio.
Describe how to connect a Watson Studio account and publish a notebook in GitHub.
Module 1 Summary
Congratulations! You have completed this module. At this point in the course, you know:
o Data Integration and Transformation - streamline data pipelines and automate data processing tasks
o Code Asset Management - store and manage code, track changes, and enable collaborative development
o Data Asset Management - organize and manage data, provide access control, and back up assets
The data science ecosystem consists of many open source and commercial options and includes
traditional desktop applications and server-based tools, as well as cloud-based services that can be
accessed through web browsers and mobile interfaces.
Data Management Tools: include Relational Databases, NoSQL Databases, and Big Data platforms.
MySQL and PostgreSQL are examples of Open Source Relational Database Management Systems
(RDBMS), while IBM Db2 and SQL Server are examples of commercial RDBMSes that are also
available as Cloud services.
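As a concrete illustration, here is a minimal sketch of querying a relational database from Python
using the standard-library sqlite3 module; the orders table and its rows are invented for the example:

    import sqlite3

    # Create an in-memory database and a small hypothetical table.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    cur.executemany("INSERT INTO orders (region, amount) VALUES (?, ?)",
                    [("EMEA", 120.0), ("APAC", 75.5), ("EMEA", 42.0)])

    # Aggregate with SQL, the common language across RDBMSes.
    cur.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
    print(cur.fetchall())  # e.g. [('APAC', 75.5), ('EMEA', 162.0)]
    conn.close()

The same SQL would run, with minor dialect differences, on MySQL, PostgreSQL, Db2, or SQL Server.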
Apache Hadoop and Apache Spark are used for Big Data analytics.
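For a taste of the kind of distributed analysis Spark supports, here is a minimal PySpark sketch;
it assumes pyspark is installed, and "sales.csv" with its region column is a made-up input:

    from pyspark.sql import SparkSession

    # Start (or reuse) a local Spark session.
    spark = SparkSession.builder.appName("example").getOrCreate()

    # Read a hypothetical CSV file and count rows per region.
    df = spark.read.csv("sales.csv", header=True, inferSchema=True)
    df.groupBy("region").count().show()

    spark.stop()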
Data Integration and Transformation Tools: include Apache Airflow and Apache Kafka.
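As an illustration, here is a hedged sketch of a minimal Apache Airflow pipeline definition (a DAG);
it assumes Airflow 2.4 or later (older versions use schedule_interval instead of schedule), and the
DAG and task names are invented:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        # Placeholder for a real extraction step in a data pipeline.
        print("extracting data...")

    # A one-task DAG that runs only when triggered manually.
    with DAG(dag_id="example_pipeline",
             start_date=datetime(2024, 1, 1),
             schedule=None,
             catchup=False):
        PythonOperator(task_id="extract", python_callable=extract)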
Data Visualization Tools: include commercial offerings such as Cognos Analytics, Tableau, and Power BI,
which can be used to build dynamic and interactive dashboards.
Code Asset Management Tools: Git is an essential code asset management tool, and GitHub is a popular
web-based platform for storing and managing source code. Its features, including version control, issue
tracking, and project management, make it an ideal tool for collaborative software development.
Development Environments: Popular development environments for Data Science include Jupyter
Notebooks and RStudio.
Jupyter Notebooks provide an interactive environment for creating and sharing code,
descriptive text, data visualizations, and other computational artifacts in a web browser-based
interface.
RStudio is an integrated development environment (IDE) designed specifically for working with
the R programming language, which is a popular tool for statistical computing and data analysis.
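The following is the kind of cell you might run in a Jupyter Notebook with a Python kernel; it
assumes pandas and matplotlib are installed, and the numbers are invented:

    import pandas as pd
    import matplotlib.pyplot as plt

    # Build a tiny hypothetical dataset and chart it inline.
    df = pd.DataFrame({"year": [2021, 2022, 2023], "users": [120, 340, 560]})
    df.plot(x="year", y="users", kind="bar", title="Users per year")
    plt.show()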
WEEK 1.1
Code asset management provides a unified view in which you manage an inventory of assets. When you
develop a model, you may need to update it, fix bugs, or improve its features incrementally. All of this
requires version control, which developers use to track and manage changes to a software project's code.
When working on a model, you use a centralized repository where everyone can upload, edit, and manage
the code files simultaneously. Collaboration allows multiple people to share and update the same project
together. GitHub is a good example of a code asset management platform: it is web-based and provides
sharing, collaboration, and access control features.

As a data scientist, you want to store and properly organize all your images, videos, text, and other data
in a central location, and you also want to control who can access, edit, and manage your data. Data asset
management, also called digital asset management (DAM), is the organization and management of
important data collected from different sources. DAM takes place on a DAM platform, which enables
version control and collaboration. DAM platforms also support replication, backup, and access-rights
management for the stored data.

Development environments, also called integrated development environments or "IDEs", provide a
workspace and tools for developing, implementing, running, testing, and deploying source code. IDEs
such as IBM Watson Studio offer testing and simulation tools that emulate the real world, so you can see
how your code will behave after you deploy it. An execution environment has libraries for compiling
source code and system resources that run and verify the code. Cloud-based execution environments are
not tied to any specific hardware or software and offer tools such as IBM Watson Studio for data
preprocessing, model training, and deployment. Finally, fully integrated visual tools such as IBM Watson
Studio and IBM Cognos Dashboard Embedded cover all of the preceding tool components and can be
used to develop deep learning and machine learning models.

In this video, you learned that the categories of data science tasks are: data management, data
integration and transformation, data visualization, model building, model deployment, and model
monitoring and evaluation. Data science tasks are supported by data asset management, code asset
management, execution environments, and development environments.