Data Science Course From Packt

The document discusses the application of data science. It describes the typical steps in a data science project as defining the problem, collecting data, analyzing and preparing data, training a model, assessing performance, communicating findings, and deploying the model. It emphasizes that properly defining the problem scope is critical. Examples of using data science in healthcare, education, and business are provided such as predicting medical outcomes, student dropout risk, and customer churn.

Uploaded by

mahmuda mimi

Application of Data Science

As mentioned in the introduction, data science is a multidisciplinary approach to analyzing
and identifying complex patterns and extracting valuable insights from data. Running a data
science project usually involves multiple steps, including the following:

1. Defining the business problem to be solved
2. Collecting or extracting existing data
3. Analyzing, visualizing, and preparing data
4. Training a model to spot patterns in data and make predictions
5. Assessing a model's performance and making improvements
6. Communicating and presenting findings and gained insights
7. Deploying and maintaining a model

As the name implies, data science projects require data, but it is even more important to
first define a clear business problem to solve. If the problem is not framed correctly, a project
may lead to incorrect results because you may have used the wrong information, prepared the
data improperly, or led a model to learn the wrong patterns. So, it is critical to properly
define the scope and objective of a data science project with your stakeholders.

There are a lot of data science applications in real-world situations or in business
environments. For example, healthcare providers may train a model for predicting a medical
outcome or its severity based on medical measurements, or a high school may want to predict
which students are at risk of dropping out within a year's time based on their historical grades
and past behaviors. Corporations may be interested to know the likelihood of a customer
buying a certain product based on his or her past purchases. They may also need to better
understand which customers are more likely to stop using existing services and churn. These
are examples where data science can be used to achieve a clearly defined goal, such as
increasing the number of patients detected with a heart condition at an early stage or reducing
the number of customers canceling their subscriptions after six months. That sounds exciting,
right? Soon enough, you will be working on such interesting projects.

What Is Machine Learning?

When we mention data science, we usually think about machine learning, and some people
may not understand the difference between them. Machine learning is the field of building
algorithms that can learn patterns by themselves without being programmed explicitly. So
machine learning is a family of techniques that can be used at the modeling stage of a data
science project.

Machine learning is composed of three different types of learning:

• Supervised learning
• Unsupervised learning
• Reinforcement learning

Supervised Learning
Supervised learning refers to a type of task where an algorithm is trained to learn patterns
based on prior knowledge. That means this kind of learning requires the labeling of the
outcome (also called the response variable, dependent variable, or target variable) to be
predicted beforehand. For instance, if you want to train a model that will predict whether a
customer will cancel their subscription, you will need a dataset with a column (or variable)
that already contains the churn outcome (cancel or not cancel) for past or existing customers.
This outcome has to be labeled by someone prior to the training of a model. If this dataset
contains 5,000 observations, then all of them need to have the outcome populated. The
objective of the model is to learn the relationship between this outcome column and the other
features (also called independent variables or predictor variables). Following is an example of
such a dataset:

Figure 1.1: Example of customer churn dataset

The Cancel column is the response variable. This is the column you are interested in, and you
want the model to predict accurately the outcome for new input data (in this case, new
customers). All the other columns are the predictor variables.

The model, after being trained, may find the following pattern: a customer is more likely to
cancel their subscription after 12 months if their average monthly spend is over $50. So,
if a new customer has gone through 15 months of subscription and is spending $85 per
month, the model will predict that this customer will cancel their contract in the future.
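
The pattern described above can be written down as a simple rule to make the idea concrete. This is only a hand-written sketch: a trained model learns such thresholds from the labeled data rather than having them coded by hand, and the function name predict_cancel and its parameter names are hypothetical.

```python
def predict_cancel(months_subscribed, avg_monthly_spend):
    """Hand-coded version of the example pattern (not a trained model)."""
    # Predict cancellation when both conditions from the text hold
    return months_subscribed > 12 and avg_monthly_spend > 50


# The new customer from the text: 15 months of subscription, $85 per month
print(predict_cancel(15, 85))
# A customer who has only subscribed for 6 months
print(predict_cancel(6, 85))
```

In a real project, the thresholds (12 months, $50) would come out of the training step and not be chosen manually.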

When the response variable contains a limited number of possible values (or classes), it is a
classification problem (you will learn more about this in Chapter 3, Binary Classification,
and Chapter 4, Multiclass Classification with RandomForest). The model will learn how to
predict the right class given the values of the independent variables. The churn example we
just mentioned is a classification problem as the response variable can only take two different
values: yes or no.

On the other hand, if the response variable is numeric and can take any value from a
continuous (potentially unbounded) range, it is called a regression problem.

An example of a regression problem is where you are trying to predict the exact number of
mobile phones produced every day for some manufacturing plants. This value can potentially
range from 0 to an infinite number (or a number big enough to have a large range of potential
values), as shown in Figure 1.2.

Figure 1.2: Example of a mobile phone production dataset

In the preceding figure, you can see that the values for Daily output can take any value
from 15000 to more than 50000. This is a regression problem, which we will look at
in Chapter 2, Regression.

Unsupervised Learning
Unsupervised learning is a type of algorithm that doesn't require any response variables at all.
In this case, the model will learn patterns from the data by itself. You may ask what kind of
pattern it can find if there is no target specified beforehand.

This type of algorithm usually can detect similarities between variables or records, so it will
try to group those that are very close to each other. This kind of algorithm can be used for
clustering (grouping records) or dimensionality reduction (reducing the number of variables).
Clustering is very popular for performing customer segmentation, where the algorithm will
look to group customers with similar behaviors together from the data. Chapter 5,
Performing Your First Cluster Analysis, will walk you through an example of clustering
analysis.
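
To give a feel for what grouping similar records means, here is a minimal sketch that splits customers into two groups based on a single variable. It assumes made-up monthly spend values and uses a basic k-means-style loop with two groups; the function name two_groups and the data are hypothetical, and real projects would normally rely on a library implementation instead.

```python
# Hypothetical average monthly spend per customer (made-up values)
monthly_spend = [20, 25, 22, 80, 85, 90]


def two_groups(values, iterations=10):
    """Split numbers into a low and a high group with a basic k-means loop (k=2)."""
    # Start with the smallest and largest values as the two centers
    low_center, high_center = min(values), max(values)
    low_group, high_group = list(values), []
    for _ in range(iterations):
        # Assign each value to the nearest center
        low_group = [v for v in values
                     if abs(v - low_center) <= abs(v - high_center)]
        high_group = [v for v in values
                      if abs(v - low_center) > abs(v - high_center)]
        # Move each center to the mean of its group
        low_center = sum(low_group) / len(low_group)
        if high_group:
            high_center = sum(high_group) / len(high_group)
    return low_group, high_group


low_spenders, high_spenders = two_groups(monthly_spend)
print(low_spenders)
print(high_spenders)
```

No label is provided anywhere in this code: the two groups emerge only from how close the values are to each other.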

Reinforcement Learning
Reinforcement learning is another type of algorithm that learns how to act in a specific
environment based on the feedback it receives. You may have seen some videos where
algorithms are trained to play Atari games by themselves. Reinforcement learning techniques
are being used to teach the agent how to act in the game based on the rewards or penalties it
receives from the game.

For instance, in the game Pong, the agent will learn to not let the ball drop after multiple
rounds of training in which it receives high penalties every time the ball drops.

Note: Reinforcement learning algorithms are out of scope and will not be covered in this
course.

Overview of Python
As mentioned earlier, Python is one of the most popular programming languages for data
science. But before diving into Python's data science applications, let's have a quick
introduction to some core Python concepts.

Types of Variable
In Python, you can handle and manipulate different types of variables. Each has its own
specificities and benefits. We will not go through every single one of them but rather focus
on the main ones that you will have to use in this book. For each of the following code
examples, you can run the code in Google Colab to view the given output.

Numeric Variables
The most basic variable type is numeric. This can contain integer or decimal (or float)
numbers, and some mathematical operations can be performed on top of them.

Let's use an integer variable called var1 that will take the value 8 and another one
called var2 with the value 160.88, and add them together with the + operator, as shown
here:

var1 = 8
var2 = 160.88
var1 + var2

You should get the following output:

Figure 1.3: Output of the addition of two variables

Very simple, right? In Python, you can perform other mathematical operations on numerical
variables, such as multiplication (with the * operator) and division (with /).
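
As a short sketch of these other operations, the following reuses var1 and var2 from the previous example. The names product, quotient, and total are introduced here only for illustration.

```python
var1 = 8
var2 = 160.88

# Multiplication with the * operator
product = var1 * var2
# Division with the / operator
quotient = var2 / var1
# Adding an integer and a float gives a float
total = var1 + var2

print(product)
print(quotient)
print(type(total))
```

Note that mixing an integer with a float always produces a float result.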

Text Variables
Another interesting type of variable is string, which contains textual information. You can
create a variable with some specific text using the single or double quote, as shown in the
following example:

var3 = 'Hello, '
var4 = 'World'

In order to display the content of a variable, you can call the print() function:

print(var3)
print(var4)

You should get the following output:

Figure 1.4: Printing the two text variables

Python also provides a feature called f-strings for printing text together with the values of
defined variables. It is very handy when you want to print results with additional text that
makes them more readable and easier to interpret. It is also quite common to use f-strings to
print logs. You need to add f before the opening quote (single or double) to specify that the
text is an f-string. Then you can add an existing variable inside the quotes and display the
text with the value of this variable. You need to wrap the variable in curly brackets, {}. For
instance, if we want to print Text: before the values of var3 and var4, we will write the
following code:

print(f"Text: {var3} {var4}!")

You should get the following output:

Figure 1.5: Printing with f-strings
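
F-strings can also evaluate simple expressions and format values inside the curly brackets. The following is a minimal sketch reusing var1 and var2 from the numeric example; the .2f format specifier (two decimal places) and the message variable are additions for illustration and are not part of the original example.

```python
var1 = 8
var2 = 160.88

# An expression inside {} is evaluated; :.2f keeps two decimal places
message = f"Total: {var1 + var2:.2f}"
print(message)
```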

You can also perform some text-related transformations with string variables, such as
capitalizing or replacing characters. In addition, you can concatenate the two variables
together with the + operator:

var3 + var4

You should get the following output:

Figure 1.6: Concatenation of the two text variables
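
The text mentions capitalizing and replacing characters without showing them, so here is a small sketch using the built-in string methods .upper(), .capitalize(), and .replace(). The variable names greeting, shouted, capitalized, and replaced are introduced only for this illustration.

```python
var3 = 'Hello, '
var4 = 'World'

greeting = var3 + var4

# Convert every character to uppercase
shouted = greeting.upper()
# Uppercase the first character and lowercase the rest
capitalized = 'data science'.capitalize()
# Replace one piece of text with another
replaced = greeting.replace('World', 'Python')

print(shouted)
print(capitalized)
print(replaced)
```

These methods return a new string each time; the original variables are left unchanged.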

Python List
Another very useful type of variable is the list. It is a collection of items that can be changed
(you can add, update, or remove items). To declare a list, you will need to use square
brackets, [], like this:

var5 = ['I', 'love', 'data', 'science']

print(var5)

You should get the following output:

Figure 1.7: List containing only string items

A list can have different item types, so you can mix numerical and text variables in it:

var6 = ['Packt', 15019, 2020, 'Data Science']

print(var6)

You should get the following output:

Figure 1.8: List containing numeric and string items

An item in a list can be accessed by its index (its position in the list). To access the first
(index 0) and third elements (index 2) of a list, you do the following:

print(var6[0])
print(var6[2])

Note: In Python, all indexes start at 0.

You should get the following output:

Figure 1.9: The first and third items in the var6 list

Python provides slice notation to access a range of items using the : operator. You just need
to specify the starting index on the left side of the operator and the ending index on the right
side. The ending index is always excluded from the range. So, if you want to get the first
three items (index 0 to 2), you should do as follows:

print(var6[0:3])

You should get the following output:

Figure 1.10: The first three items of var6
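
A few more slicing variations can help make the start and end rules concrete. This sketch goes slightly beyond the text: omitting an index and using a negative index are standard Python behavior, and the names first_three, same_slice, last_two, and last_item are introduced only for illustration.

```python
var6 = ['Packt', 15019, 2020, 'Data Science']

# Start index 0, end index 3 (excluded): items at index 0, 1, and 2
first_three = var6[0:3]
# Omitting the start index means "from the beginning"
same_slice = var6[:3]
# Omitting the end index means "until the end of the list"
last_two = var6[2:]
# A negative index counts from the end of the list
last_item = var6[-1]

print(first_three)
print(last_two)
print(last_item)
```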

You can also iterate through every item of a list using a for loop. If you want to print every
item of the var6 list, you should do this:

for item in var6:
    print(item)

You should get the following output:

Figure 1.11: Output of the for loop

You can add an item at the end of the list using the .append() method:

var6.append('Python')
print(var6)

You should get the following output:

Figure 1.12: Output of var6 after inserting the 'Python' item

To delete an item from the list, you use the .remove() method:

var6.remove(15019)
print(var6)

You should get the following output:

Figure 1.13: Output of var6 after removing the '15019' item
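
The list section also mentions updating items, which can be done by assigning to an index. The sketch below starts from the state of var6 after the previous steps and additionally shows that .remove() raises a ValueError when the item is not in the list; the replacement value 2021 is an arbitrary choice for illustration.

```python
var6 = ['Packt', 2020, 'Data Science', 'Python']

# Update the item at index 1 by assigning a new value to it
var6[1] = 2021
print(var6)

# .remove() raises a ValueError if the item is not in the list
try:
    var6.remove(15019)
except ValueError:
    print('15019 is not in the list')
```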

Python Dictionary
Another very popular Python variable used by data scientists is the dictionary type. For
example, it can be used to load JSON data into Python so that it can then be converted into a
DataFrame (you will learn more about the JSON format and DataFrames in the following
sections). A dictionary contains multiple elements, like a list, but each element is
organized as a key-value pair. A dictionary is not indexed by numbers but by keys. So, to
access a specific value, you will have to call the item by its corresponding key. To define a
dictionary in Python, you will use curly brackets, {}, and specify the keys and values
separated by :, as shown here:

var7 = {'Topic': 'Data Science', 'Language': 'Python'}

print(var7)

You should get the following output:

Figure 1.14: Output of var7

To access a specific value, you need to provide the corresponding key name. For instance, if
you want to get the value Python, you do this:

var7['Language']

You should get the following output:

Figure 1.15: Value for the 'Language' key

Note: Each key in a dictionary needs to be unique. The same value can appear under several keys.
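
To illustrate the uniqueness constraint in the note above, here is a short sketch showing that assigning to an existing key overwrites its value instead of creating a second entry. It works on a copy (the name var7_copy and the value 'R' are hypothetical) so that var7 itself keeps its original content for the rest of this section.

```python
var7 = {'Topic': 'Data Science', 'Language': 'Python'}

# Work on a copy so that var7 stays unchanged
var7_copy = dict(var7)

# Assigning to an existing key replaces its value
var7_copy['Language'] = 'R'

print(var7_copy)
print(len(var7_copy))
```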

Python provides a method to access all the key names from a dictionary, .keys(), which is
used as shown in the following code snippet:

var7.keys()

You should get the following output:

Figure 1.16: List of key names

There is also a method called .values(), which is used to access all the values of a
dictionary:

var7.values()

You should get the following output:

Figure 1.17: List of values

You can iterate through all items from a dictionary using a for loop and
the .items() method, as shown in the following code snippet:

for key, value in var7.items():
    print(key)
    print(value)

You should get the following output:

Figure 1.18: Output after iterating through the items of a dictionary

You can add a new element in a dictionary by providing the key name like this:

var7['Publisher'] = 'Packt'
print(var7)

You should get the following output:

Figure 1.19: Output of a dictionary after adding an item

You can delete an item from a dictionary with the del command:

del var7['Publisher']
print(var7)

You should get the following output:

Figure 1.20: Output of a dictionary after removing an item
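
The start of this section mentioned that dictionaries can be used to load JSON data into Python. As a minimal sketch of that idea, the standard library json module can turn JSON text into a dictionary; the json_text string below is a made-up example that mirrors var7.

```python
import json

# A made-up JSON document stored as text
json_text = '{"Topic": "Data Science", "Language": "Python"}'

# json.loads() parses JSON text into Python objects (here, a dictionary)
parsed = json.loads(json_text)

print(type(parsed))
print(parsed['Language'])
```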

In Exercise 1.01, we will put the concepts we have just covered into practice.

Note: If you are interested in exploring Python in more depth, head over to our website to get
yourself the Python Workshop.
