Data Science Real World Applications
• Data science is also widely used in the finance industry. Financial institutions
use data science to detect fraudulent transactions and prevent financial
losses.
• In the education sector, data science can be used to improve student
outcomes. Student performance data can be analyzed to identify
areas of weakness and provide personalized learning experiences.
You are the Senior Data Scientist at a major private bank. Over the last
6 months, the number of customers who are unable to repay their
loans has increased. With this in mind, you have to analyze your
customer data and determine which customers should be approved for a
loan and which should be denied.
Tasks to be performed
• Domain: Banking
• Programming language: Python
• Python is among the most widely used programming languages
in data science.
1. Data collection
• This is often the lengthiest task. Without careful data collection, the
model will likely fall victim to garbage in, garbage out.
Let us now drop the variables that we used to create these new
features. The old features are highly correlated with the new features
derived from them, so keeping both adds redundant, noisy information
to the dataset. Removing the correlated source features helps reduce
that noise.
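The step above can be sketched in plain Python. This is a minimal illustration, not the case study's actual code: the column names (`loan_amount`, `annual_income`, `age`), the derived feature `loan_to_income`, and the sample rows are all hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical rows; the column names are illustrative only.
rows = [
    {"loan_amount": 100, "annual_income": 400, "age": 25},
    {"loan_amount": 200, "annual_income": 500, "age": 40},
    {"loan_amount": 300, "annual_income": 600, "age": 31},
    {"loan_amount": 400, "annual_income": 1000, "age": 52},
]

# Engineered feature derived from two existing columns.
for row in rows:
    row["loan_to_income"] = row["loan_amount"] / row["annual_income"]

# Measure how strongly the new feature tracks one of its source columns.
r = pearson(
    [row["loan_amount"] for row in rows],
    [row["loan_to_income"] for row in rows],
)

# Drop the source columns that were used to build the new feature.
source_columns = ["loan_amount", "annual_income"]
reduced = [
    {name: value for name, value in row.items() if name not in source_columns}
    for row in rows
]
```

If the data is held in a pandas DataFrame `df`, the same drop could be written as `df.drop(columns=source_columns)`.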
Checking the dataset after feature engineering
4. Model selection
• After feature engineering, the next step is to select a predictive model.
Classification models such as LightGBM, Decision Trees, Random Forest,
Support Vector Machines, Logistic Regression, and Neural Networks, or
other machine learning algorithms, can be used for this purpose.
5. Model training
• The selected model is trained on the cleaned and preprocessed data.
• The model is iteratively adjusted and fine-tuned until it can accurately
predict loan defaults.
• At this stage, the dataset is split into a training set and a test set.
• A common split ratio is 70-30: 70% of the data is used for training
and 30% is used for testing.
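A 70-30 split can be sketched with the standard library alone. The function name `split_train_test`, the fixed seed, and the stand-in records are assumptions made for this illustration.

```python
import random

def split_train_test(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and return (train, test)."""
    shuffled = list(records)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Stand-in for 100 customer records.
records = list(range(100))
train, test = split_train_test(records)
```

Shuffling before splitting avoids a test set made up only of the most recent or last-listed customers. In scikit-learn, `train_test_split` with `test_size=0.3` serves the same purpose.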
6. Model evaluation
• Once your machine learning model is built on the training data, you
need unseen data to test it. This held-out data is called the test data.
You use it to evaluate how well the trained model performs, and the
results guide how you adjust or optimize the model.
• This can be done by computing various evaluation metrics such as
accuracy, precision, recall, F1 score and so on.
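The metrics named above can be computed directly from the true and predicted labels. The sketch below assumes binary labels where 1 means "defaulted" and 0 means "repaid"; the function name and the sample labels are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical test-set labels: 1 = defaulted, 0 = repaid.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

For loan approval, recall on the "defaulted" class is often worth watching alongside accuracy, since a missed defaulter is usually more costly than a wrongly flagged customer.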
7. Model deployment
• Once the model has been trained and evaluated, it can be deployed in
a real-world scenario to predict loan defaults.
• This may involve integrating the model into an existing loan processing
system or developing a new system specifically for loan default
prediction.
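One simple way to picture the integration is a scoring function that loads saved model parameters and returns a decision. The sketch below is illustrative only: it assumes a logistic regression model whose coefficients are stored as JSON, and the feature names, coefficient values, file name, and 0.5 decision threshold are all hypothetical.

```python
import json
import math
import os
import tempfile

def save_model(path, weights, bias):
    """Persist model parameters so another system can load them."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"weights": weights, "bias": bias}, f)

def load_model(path):
    """Read the saved parameters back into a dictionary."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def predict_default_probability(model, features):
    """Apply the logistic function to the weighted sum of the features."""
    z = model["bias"] + sum(
        weight * features[name] for name, weight in model["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, standing in for a trained model.
weights = {"loan_to_income": 2.0, "missed_payments": 0.5}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "loan_default_model.json")
    save_model(path, weights, bias=-3.0)
    model = load_model(path)

# Score one hypothetical applicant inside the loan processing flow.
applicant = {"loan_to_income": 0.5, "missed_payments": 2}
probability = predict_default_probability(model, applicant)
decision = "deny" if probability >= 0.5 else "approve"
```

Keeping the saved parameters separate from the scoring code means the model can be retrained and replaced without changing the loan processing system itself.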
8. Continuous improvement
• The final stage of the process is continuous improvement. It involves
monitoring the model's performance, updating the model as
required, improving the data quality, and integrating new data
sources. This stage ensures that the model continues to provide
accurate predictions over time.
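Monitoring can be as simple as comparing recent accuracy with the accuracy measured at evaluation time. The sketch below assumes that the actual repayment outcome eventually becomes known for each prediction; the function name, the 0.05 tolerance, and the sample data are hypothetical.

```python
def needs_retraining(baseline_accuracy, recent_outcomes, tolerance=0.05):
    """Flag the model when recent accuracy falls below the baseline.

    recent_outcomes is a list of (predicted_label, actual_label) pairs.
    """
    if not recent_outcomes:
        return False  # Nothing to compare against yet.
    correct = sum(1 for predicted, actual in recent_outcomes if predicted == actual)
    recent_accuracy = correct / len(recent_outcomes)
    return recent_accuracy < baseline_accuracy - tolerance

# Hypothetical recent predictions paired with the outcomes observed later.
recent_outcomes = [
    (1, 1), (0, 0), (1, 0), (0, 0), (1, 1),
    (0, 1), (0, 0), (1, 1), (0, 0), (1, 1),
]
retrain = needs_retraining(baseline_accuracy=0.90, recent_outcomes=recent_outcomes)
```

Here 8 of the 10 recent predictions are correct, so recent accuracy is 0.80, which is below 0.90 minus the 0.05 tolerance, and the check flags the model for an update.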
Questions