What Is Data Science - IBM

Data science combines various disciplines like statistics, programming, analytics, and storytelling to extract insights from large amounts of data. It involves preparing, analyzing, and communicating data to reveal patterns and enable informed decisions. Data scientists require skills in math, science, programming, and business communication. They use tools like R and Python to analyze data and create visualizations. Cloud computing makes large-scale data science more accessible for organizations. Data science is used across industries to improve processes, personalization, and decision-making through data-driven insights.

Uploaded by

waqar ahmad


By: IBM Cloud Education

15 May 2020

Analytics
Data science

What is data science?

Data science combines the scientific method, math and statistics, specialized
programming, advanced analytics, AI, and even storytelling to uncover and explain the
business insights buried in data.

Data science is a multidisciplinary approach to extracting actionable insights from
the large and ever-increasing volumes of data collected and created by today’s
organizations. Data science encompasses preparing data for analysis and processing,
performing advanced data analysis, and presenting the results to reveal patterns and
enable stakeholders to draw informed conclusions.

Data preparation can involve cleansing, aggregating, and manipulating data so that it
is ready for specific types of processing. Analysis requires the development and use
of algorithms, analytics, and AI models. It’s driven by software that combs through
data to find patterns and transforms those patterns into predictions that support
business decision-making. The accuracy of these predictions must be validated
through scientifically designed tests and experiments. And the results should be
shared through the skillful use of data visualization tools that make it possible for
anyone to see the patterns and understand trends.

As a result, data scientists (as data science practitioners are called) require
computer science and pure science skills beyond those of a typical data analyst. A
data scientist must be able to do the following:

– Apply mathematics, statistics, and the scientific method

– Use a wide range of tools and techniques for evaluating and preparing data—
everything from SQL to data mining to data integration methods

– Extract insights from data using predictive analytics and artificial intelligence
(AI), including machine learning and deep learning models

– Write applications that automate data processing and calculations

– Tell—and illustrate—stories that clearly convey the meaning of results to
decision-makers and stakeholders at every level of technical knowledge and
understanding

– Explain how these results can be used to solve business problems

This combination of skills is rare, and it’s no surprise that data scientists are currently
in high demand. According to an IBM survey (PDF, 3.9 MB), the number of job
openings in the field continues to grow at over 5% per year, with over 60,000
forecast for 2020.

The data science lifecycle

The data science lifecycle—also called the data science pipeline—includes anywhere
from five to sixteen (depending on whom you ask) overlapping, continuing processes.
The processes common to just about everyone’s definition of the lifecycle include the
following:

– Capture: This is the gathering of raw structured and unstructured data from all
relevant sources via just about any method—from manual entry and web
scraping to capturing data from systems and devices in real time.

– Prepare and maintain: This involves putting the raw data into a consistent
format for analytics or machine learning or deep learning models. This can
include everything from cleansing, deduplicating, and reformatting the data, to
using ETL (extract, transform, load) or other data integration technologies to
combine the data into a data warehouse, data lake, or other unified store for
analysis.

– Preprocess or process: Here, data scientists examine biases, patterns, ranges,
and distributions of values within the data to determine the data’s suitability for
use with predictive analytics, machine learning, and/or deep learning algorithms
(or other analytical methods).

– Analyze: This is where the discovery happens—where data scientists perform
statistical analysis, predictive analytics, regression, machine learning and deep
learning algorithms, and more to extract insights from the prepared data.

– Communicate: Finally, the insights are presented as reports, charts, and other
data visualizations that make the insights—and their impact on the business—
easier for decision-makers to understand. A data science programming
language such as R or Python (see below) includes components for generating
visualizations; alternatively, data scientists can use dedicated visualization
tools.

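
As a rough illustration of the prepare, preprocess, and analyze steps above, the following sketch uses only Python's standard library. The records, field names, and helper functions (`prepare`, `profile`, `totals_by_region`) are hypothetical and stand in for whatever sources and tools a real project would use.

```python
import statistics

# Hypothetical raw records, as they might arrive from the capture step.
raw = [
    {"id": "1", "region": " north ", "amount": "120.5"},
    {"id": "2", "region": "South", "amount": "80"},
    {"id": "2", "region": "South", "amount": "80"},  # exact duplicate
    {"id": "3", "region": "NORTH", "amount": ""},    # missing value
    {"id": "4", "region": "south", "amount": "99.5"},
]


def prepare(records):
    """Deduplicate by id, normalize text, and drop rows with no amount."""
    seen, clean = set(), []
    for rec in records:
        if rec["id"] in seen or not rec["amount"].strip():
            continue
        seen.add(rec["id"])
        clean.append({
            "id": int(rec["id"]),
            "region": rec["region"].strip().lower(),
            "amount": float(rec["amount"]),
        })
    return clean


def profile(records):
    """Summarize the range and distribution of the amount column."""
    amounts = [r["amount"] for r in records]
    return {
        "count": len(amounts),
        "min": min(amounts),
        "max": max(amounts),
        "mean": statistics.fmean(amounts),
    }


def totals_by_region(records):
    """Aggregate amounts per region, a simple stand-in for the analyze step."""
    totals = {}
    for r in records:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals


clean = prepare(raw)
summary = profile(clean)
totals = totals_by_region(clean)
print(summary)
print(totals)
```

Keeping each stage in its own function mirrors the lifecycle: the output of one step is the input to the next, and any stage can be swapped for a heavier tool (an ETL job, a trained model) without changing the others.
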
Data science tools

Data scientists must be able to build and run code in order to create models. The
most popular programming languages among data scientists are open source tools
that include or support pre-built statistical, machine learning and graphics
capabilities. These languages include:

– R: An open source programming language and environment for developing
statistical computing and graphics, R is the most popular programming
language among data scientists. R provides a broad variety of libraries and tools
for cleansing and prepping data, creating visualizations, and training and
evaluating machine learning and deep learning algorithms. It’s also widely used
among data science scholars and researchers.

– Python: Python is a general-purpose, object-oriented, high-level programming
language that emphasizes code readability through its distinctive, generous use
of white space. Several Python libraries support data science tasks, including
NumPy for handling large multidimensional arrays, pandas for data manipulation
and analysis, and Matplotlib for building data visualizations.
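
A minimal sketch of how the three Python libraries named above might fit together, assuming NumPy, pandas, and Matplotlib are installed. The store names and sales figures are made up for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# NumPy: hypothetical unit sales for 2 stores over 3 days, held in an array.
units = np.array([[12, 15, 11],
                  [20, 18, 25]])

# pandas: reshape into a labeled table and aggregate per store.
df = pd.DataFrame({
    "store": np.repeat(["A", "B"], 3),
    "day": np.tile([1, 2, 3], 2),
    "units": units.ravel(),
})
per_store = df.groupby("store")["units"].sum()

# Matplotlib: draw the totals as a bar chart and write it to an in-memory PNG.
fig, ax = plt.subplots()
ax.bar(per_store.index, per_store.values)
ax.set_xlabel("store")
ax.set_ylabel("units sold")
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)

print(per_store.to_dict())
```

Writing the chart to a buffer keeps the example free of file paths; in practice the same `savefig` call would usually target a file or a notebook cell.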

For a deep dive into the differences between these approaches, check out "Python
vs. R: What's the Difference?"

Data scientists need to be proficient in the use of big data processing platforms, such
as Apache Spark and Apache Hadoop. They also need to be skilled with a wide range
of data visualization tools, including the simple graphics tools included with business
presentation and spreadsheet applications, built-for-purpose commercial
visualization tools like Tableau and Microsoft Power BI, and open source tools like
D3.js (a JavaScript library for creating interactive data visualizations) and
RAWGraphs.

Data science and cloud computing

Cloud computing is bringing many data science benefits within reach of even small
and midsized organizations.
Data science’s foundation is the manipulation and analysis of extremely large data
sets; the cloud provides access to storage infrastructures capable of handling large
amounts of data with ease. Data science also involves running machine learning
algorithms that demand massive processing power; the cloud makes available the
high-performance compute that’s necessary for the task. To purchase equivalent
on-site hardware would be far too expensive for many enterprises and research teams,
but the cloud makes access affordable with per-use or subscription-based pricing.

Cloud infrastructures can be accessed from anywhere in the world, making it
possible for multiple groups of data scientists to share access to the data sets they’re
working with in the cloud—even if they’re located in different countries.

Open source technologies are widely used in data science tool sets. When they’re
hosted in the cloud, teams don’t need to install, configure, maintain, or update them
locally. Several cloud providers also offer prepackaged tool kits that enable data
scientists to build models without coding, further democratizing access to the
innovations and insights that this discipline is making available.

Data science use cases

There’s no limit to the number or kind of enterprises that could potentially benefit
from the opportunities data science is creating. Nearly any business process can be
made more efficient through data-driven optimization, and nearly every type of
customer experience (CX) can be improved with better targeting and personalization.

Here are a few representative use cases for data science and AI:

– An international bank created a mobile app offering on-the-spot decisions to
loan applicants using machine learning-powered credit risk models and a hybrid
cloud computing architecture that is both powerful and secure.

– An electronics firm is developing ultra-powerful 3D-printed sensors that will
guide tomorrow’s driverless vehicles. The solution relies on data science and
analytics tools to enhance its real-time object detection capabilities.

– A robotic process automation (RPA) solution provider developed a cognitive
business process mining solution that reduces incident handling times by between
15% and 95% for its client companies. The solution is trained to understand
the content and sentiment of customer emails, directing service teams to
prioritize those that are most relevant and urgent.

– A digital media technology company created an audience analytics platform
that enables its clients to see what’s engaging TV audiences as they’re offered a
growing range of digital channels. The solution employs deep analytics and
machine learning to gather real-time insights into viewer behavior.

– An urban police department created statistical incident analysis tools to help
officers understand when and where to deploy resources in order to prevent
crime. The data-driven solution creates reports and dashboards to augment
situational awareness for field officers.

– A smart healthcare company developed a solution enabling seniors to live
independently for longer. Combining sensors, machine learning, analytics, and
cloud-based processing, the system monitors for unusual behavior and alerts
relatives and caregivers, while conforming to the strict security standards that
are mandatory in the healthcare industry.

Data science and IBM Cloud

IBM Cloud offers a highly secure public cloud infrastructure with a full-stack platform
that includes more than 170 products and services, many of which were designed to
support data science and AI.

IBM’s data science and AI lifecycle product portfolio is built upon our longstanding
commitment to open source technologies and includes a range of capabilities that
enable enterprises to unlock the value of their data in new ways.

AutoAI, a powerful new automated development capability in IBM Watson Studio,
speeds the data preparation, model development, and feature engineering stages of
the data science lifecycle. This allows data scientists to be more efficient and helps
them make better-informed decisions about which models will perform best for
real-world use cases. AutoAI simplifies enterprise data science across any cloud
environment.

The IBM Cloud Pak for Data platform provides a fully integrated and extensible data
and information architecture built on the Red Hat OpenShift Container Platform that
runs on any cloud. With IBM Cloud Pak for Data, enterprises can more easily collect,
organize and analyze data, making it possible to infuse insights from AI throughout
the entire organization.

Want to learn more about building and running data science models on IBM Cloud?
Get started at no charge by signing up for an IBM Cloud account today.

IBM named a Leader

IBM is named a Leader in the 2021 Gartner Magic Quadrant for Data Science and Machine
Learning Platforms.


Data science community

Connect with experts and peers to elevate technical expertise, solve problems and share
insights.
