What Is Data Science - IBM
15 May 2020
As a result, data scientists (as data science practitioners are called) require
computer science and pure science skills beyond those of a typical data analyst. A
data scientist must be able to do the following:
– Use a wide range of tools and techniques for evaluating and preparing data,
from SQL to data mining to data integration methods
– Extract insights from data using predictive analytics and artificial intelligence
(AI), including machine learning and deep learning models
This combination of skills is rare, and it’s no surprise that data scientists are currently
in high demand. According to an IBM survey, the number of job openings in the field
continues to grow at over 5% per year, with over 60,000 openings forecast for 2020.
– Capture: This is the gathering of raw structured and unstructured data from all
relevant sources via just about any method, from manual entry and web
scraping to capturing data from systems and devices in real time.
– Prepare and maintain: This involves putting the raw data into a consistent
format for analytics, machine learning, or deep learning models. This can
include everything from cleansing, deduplicating, and reformatting the data to
using ETL (extract, transform, load) or other data integration technologies to
combine the data into a data warehouse, data lake, or other unified store for
analysis.
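The preparation step can be sketched as a tiny ETL pipeline in Python, using only the standard library. This is an illustration under stated assumptions, not a reference implementation: the raw records, the two accepted date formats, and the choice of normalized email address as the deduplication key are all hypothetical, and an in-memory SQLite table stands in for a data warehouse or data lake.

```python
import sqlite3
from datetime import datetime

# Extract: hypothetical raw records from two sources with inconsistent formats.
raw_records = [
    {"email": " Ada@Example.com ", "signup": "2020-05-15", "source": "web"},
    {"email": "ada@example.com", "signup": "15/05/2020", "source": "crm"},
    {"email": "grace@example.com", "signup": "2020-05-16", "source": "web"},
    {"email": "", "signup": "2020-05-17", "source": "web"},
]


def parse_date(text):
    """Reformat a date from either of two assumed input formats into ISO 8601."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unrecognized format


def transform(records):
    """Cleanse, reformat, and deduplicate records (the 'T' in ETL)."""
    seen = set()
    clean = []
    for rec in records:
        email = rec["email"].strip().lower()  # cleanse: trim whitespace, normalize case
        signup = parse_date(rec["signup"])    # reformat: one consistent date format
        if not email or signup is None:       # cleanse: drop unusable rows
            continue
        if email in seen:                     # deduplicate on the normalized key
            continue
        seen.add(email)
        clean.append((email, signup))
    return clean


def load(rows, conn):
    """Load cleaned rows into a single table standing in for a unified store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (email TEXT PRIMARY KEY, signup TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(raw_records), conn)
print(conn.execute("SELECT email, signup FROM customers ORDER BY email").fetchall())
# [('ada@example.com', '2020-05-15'), ('grace@example.com', '2020-05-16')]
```

Deduplicating only after cleansing matters here: the first two records differ as raw strings but describe the same customer once whitespace, letter case, and date format are normalized.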
Data scientists must be able to build and run code in order to create models. The
most popular programming languages among data scientists are open source tools
that include or support pre-built statistical, machine learning and graphics
capabilities. These languages include:
For a deep dive into the differences between these approaches, check out "Python
vs. R: What's the Difference?"
Data scientists need to be proficient in the use of big data processing platforms, such
as Apache Spark and Apache Hadoop. They also need to be skilled with a wide range
of data visualization tools, including the simple graphics tools included with business
presentation and spreadsheet applications, built-for-purpose commercial
visualization tools like Tableau and Microsoft Power BI, and open source tools like
D3.js (a JavaScript library for creating interactive data visualizations) and
RAWGraphs.
Open source technologies are widely used in data science tool sets. When they’re
hosted in the cloud, teams don’t need to install, configure, maintain, or update them
locally. Several cloud providers also offer prepackaged tool kits that enable data
scientists to build models without coding, further democratizing access to the
innovations and insights that this discipline is making available.
Here are a few representative use cases for data science and AI:
IBM’s data science and AI lifecycle product portfolio is built upon our longstanding
commitment to open source technologies and includes a range of capabilities that
enable enterprises to unlock the value of their data in new ways.
The IBM Cloud Pak for Data platform provides a fully integrated and extensible data
and information architecture built on the Red Hat OpenShift Container Platform that
runs on any cloud. With IBM Cloud Pak for Data, enterprises can more easily collect,
organize, and analyze data, making it possible to infuse insights from AI throughout
the entire organization.
Want to learn more about building and running data science models on IBM Cloud?
Get started at no charge by signing up for an IBM Cloud account today.