
Data Science: Real-World Applications

• Data Science has become an essential tool for businesses and organizations to make informed decisions and drive growth.

• As the amount of data being generated continues to grow, the importance of data science in solving complex problems and making data-driven decisions will only increase.

• Data science combines elements of mathematics, statistics, computer science, and domain expertise to solve real-world problems and make data-driven decisions.

• Data science involves the analysis of large and complex datasets using statistical methods, machine learning techniques, and data visualization.

• A dataset is a collection of related data.

• Machine learning is a subset of artificial intelligence that involves training computer systems to learn and improve from experience without being explicitly programmed.

• In other words, it is a way of teaching computers to learn from data, identify patterns, and make predictions or decisions.
• Data science has a wide range of applications across many industries.

• One of the most important applications of data science is in the healthcare industry. Machine learning algorithms can be used to analyze medical images and diagnose diseases, e.g. using chest X-ray images to detect whether a person has TB or COVID-19.

• Data science is also widely used in the finance industry. Financial institutions use data science to detect fraudulent transactions and prevent financial losses.
• In the education sector, data science can be used to improve student outcomes. Student performance data can be analyzed to identify areas of weakness and provide personalized learning experiences.

• Data science is also being used to improve transportation systems. Real-time traffic data can be used to optimize traffic flow and reduce congestion.

• Below is an example of the entire process of solving a real-world problem using Data Science.
Problem

You are the Senior Data Scientist at a major private bank. Over the last 6 months, the number of customers who are not able to repay their loans has increased. With this in mind, you have to look at your customer data and analyze which customers should be approved for a loan and which customers should be denied.
Tasks to be performed
• Domain: Banking
• Programming language: Python
• Note that Python is the most widely used programming language in data science.

1. Data collection

• The first step in applying data science to loan default prediction is to collect relevant data.

• The relevant data is primarily information about the borrower, such as gender, income, employment history, and other financial details, as shown below. The data was obtained from the bank in line with our problem statement.
• The structure of our dataset is a DataFrame. A DataFrame is a two-dimensional data structure composed of rows and columns. It is a fundamental data structure for data manipulation and analysis in Python.

• The dataset has 12 independent variables and 1 target variable (i.e. Loan_Status).
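As a minimal sketch of what such a DataFrame looks like, here is a small illustrative example built with pandas. The column names and values are assumptions for illustration only; the real bank dataset has 12 independent variables plus Loan_Status.

```python
import pandas as pd

# Illustrative rows with a few of the kinds of columns described above.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "ApplicantIncome": [5849, 4583, 3000],
    "CoapplicantIncome": [0.0, 1508.0, 0.0],
    "LoanAmount": [130.0, 128.0, 66.0],
    "Loan_Status": ["Y", "N", "Y"],       # the target variable
})

print(df.shape)   # (rows, columns)
print(df.dtypes)  # one dtype per column
```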
2. Data Cleaning
• Once the data has been collected, it is important to clean and preprocess it. Data preprocessing is the process of preparing the raw data and making it suitable for a machine learning model.

• This is often the lengthiest task. Without it, you'll likely fall victim to garbage-in, garbage-out.

• This task involves removing any duplicates, correcting errors, filling in missing values, and transforming the data into a format that can be easily analyzed. For example, you might convert string values that store numbers to numeric values so that you can perform mathematical operations.
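The cleaning steps above can be sketched in pandas. This is a toy example on made-up data, not the bank's actual preprocessing; it shows deduplication, string-to-numeric conversion, and median imputation of missing values.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "ApplicantIncome": ["5849", "4583", "4583", "3000"],  # numbers stored as strings
    "LoanAmount": [130.0, np.nan, np.nan, 66.0],          # contains missing values
})

# Remove exact duplicate rows.
clean = raw.drop_duplicates().copy()

# Convert string values that store numbers to a numeric dtype.
clean["ApplicantIncome"] = pd.to_numeric(clean["ApplicantIncome"])

# Fill missing loan amounts with the column median.
clean["LoanAmount"] = clean["LoanAmount"].fillna(clean["LoanAmount"].median())

print(clean)
```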
3. Feature Engineering
• Feature engineering involves creating new variables or features that can be used to improve the accuracy of the loan default prediction model.
• Based on domain knowledge, we can come up with new features that might affect the Loan_Status variable. We will create the following three new features:

a) Total Income – the sum of the Applicant Income and Coapplicant Income. If the total income is high, the chances of loan approval might also be high.
b) Equated Monthly Installment (EMI) – the monthly amount to be paid by the applicant to repay the loan. The idea behind this variable is that applicants with a high EMI might find it difficult to pay back the loan. We calculate the EMI as the ratio of the loan amount to the loan amount term.
c) Balance Income – the income left after the EMI has been paid. The idea behind this variable is that if this value is high, the chances are high that the person will repay the loan, which increases the chances of loan approval.

Let us now drop the variables we used to create these new features. The reason is that the correlation between those old features and the new features will be very high, which may result in a noisy dataset, so removing correlated features helps reduce the noise.
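The three features and the subsequent drop can be sketched as below. The column names (ApplicantIncome, Loan_Amount_Term, etc.) and the assumption that LoanAmount is recorded in thousands are illustrative guesses about the dataset, not confirmed details.

```python
import pandas as pd

df = pd.DataFrame({
    "ApplicantIncome": [5849, 4583, 3000],
    "CoapplicantIncome": [0.0, 1508.0, 0.0],
    "LoanAmount": [130.0, 128.0, 66.0],         # assumed to be in thousands
    "Loan_Amount_Term": [360.0, 360.0, 360.0],  # term in months
})

# a) Total Income: applicant plus co-applicant income.
df["Total_Income"] = df["ApplicantIncome"] + df["CoapplicantIncome"]

# b) EMI: ratio of loan amount to loan amount term.
df["EMI"] = df["LoanAmount"] / df["Loan_Amount_Term"]

# c) Balance Income: income left after the EMI is paid
#    (multiplied by 1000 under the assumption LoanAmount is in thousands).
df["Balance_Income"] = df["Total_Income"] - df["EMI"] * 1000

# Drop the highly correlated source variables to reduce noise.
df = df.drop(columns=["ApplicantIncome", "CoapplicantIncome",
                      "LoanAmount", "Loan_Amount_Term"])

print(df.columns.tolist())
```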
Checking the dataset after feature engineering
4. Model selection
• After feature engineering, the next step is to select a predictive model. Different classification models such as LightGBM, Decision Trees, Random Forest, Support Vector Machines, Logistic Regression, Neural Networks, or other machine learning algorithms can be used for this purpose.
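One common way to choose among candidate models is to compare their cross-validated accuracy. The sketch below uses scikit-learn with a synthetic stand-in dataset (the real loan features are not reproduced here) and two of the model families listed above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the cleaned, feature-engineered loan data.
X, y = make_classification(n_samples=300, n_features=6, random_state=42)

candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in candidates.items():
    # Mean accuracy over 5 cross-validation folds for each candidate.
    scores[name] = cross_val_score(model, X, y, cv=5).mean()

best = max(scores, key=scores.get)
print(scores, "->", best)
```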

5. Model training
• The selected model is trained on the cleaned and preprocessed data.
• The model is iteratively adjusted and fine-tuned until it can accurately
predict loan defaults.
• At this stage, the dataset is split into a training set and test set.
• A common split ratio is 70-30, which means that 70% of the data is
used for training and 30% is used for testing.
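The 70-30 split and the training step can be sketched with scikit-learn. Again, a synthetic dataset stands in for the real loan data, and Logistic Regression stands in for whichever model was selected.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# 70-30 split: 70% of the rows for training, 30% held back for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print(len(X_train), len(X_test))  # 140 60
```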
6. Model evaluation
• Once your machine learning model is built (with your training data), you need unseen data to test it. This data is called testing data, and you can use it to evaluate the performance and progress of your model's training and adjust or optimize it for improved results.
• This can be done by computing various evaluation metrics such as accuracy, precision, recall, F1 score, and so on.
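The metrics named above can be computed with scikit-learn. The labels and predictions below are made up for illustration (1 = loan repaid, 0 = default), not real test-set output.

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

# Hypothetical true labels and model predictions on a test set.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))   # correct / total
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("f1       :", f1_score(y_true, y_pred))         # harmonic mean of the two
```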

7. Model deployment
• Once the model has been trained and evaluated, it can be deployed in
a real-world scenario to predict loan defaults.
• This may involve integrating the model into an existing loan processing
system or developing a new system specifically for loan default
prediction.
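One simple deployment pattern is to serialize the trained model so that a loan-processing system can load it and score new applicants. The sketch below uses joblib (commonly used with scikit-learn); the filename and the trained model are illustrative assumptions.

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train a stand-in model on synthetic data.
X, y = make_classification(n_samples=100, n_features=6, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Serialize the trained model to disk.
joblib.dump(model, "loan_default_model.joblib")

# In the serving system: load the model and score a new applicant.
deployed = joblib.load("loan_default_model.joblib")
prediction = deployed.predict(X[:1])
print(prediction)
```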
8. Continuous Improvement
• The final stage of the process is continuous improvement. It involves monitoring the model's performance, updating the model as required, improving the data quality, and integrating new data sources. This stage ensures that the model continues to provide accurate predictions over time.
Questions
