DS Report 1

This document discusses the initial steps in analyzing the Australian Credit Dataset to develop a predictive model for loan approval. It contains descriptions of 20 variables for 1000 loan applicants and whether they presented a good or bad credit risk. The solution reviews the roles of predictor variables in credit decisions, performs descriptive statistics and data visualization on the dataset, and develops six hypotheses about factors that could influence loan approval outcomes, such as applicants with a good account status, savings account, full employment, real estate background, proper credit history being more likely to be approved, and younger applicants being less likely to be approved.

Uploaded by

05Bala Saatvik

71762108005 21AD46

COIMBATORE INSTITUTE OF TECHNOLOGY


COIMBATORE – 641014

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE


FOUNDATIONS IN DATA SCIENCE
TUTORIAL – 1

SUBMITTED BY

T. BALA SAATVIK

71762108005


DESCRIPTION:

The Australian Credit Dataset consists of 20 variables that describe the demographic and
socio-economic characteristics of 1000 loan applicants and one outcome variable that
indicates whether the applicants are a “good credit risk” (i.e. likely to repay the loan) or a
“bad credit risk” (i.e. unlikely to repay the loan). A predictive model, developed based on
this dataset, is expected to provide guidance for a bank manager to decide whether to
approve a loan based on the profile of a loan applicant.

PROBLEM STATEMENT:

In Phase 1 of the Data Analytics Lifecycle, a data science team learns the business
domain, assesses the resources available and formulates initial hypotheses (IHs) to test
and begin learning the data. The file AUS_CREDIT.xlsx contains two spreadsheets: one
containing a dataset of 21 variables from 1000 loan applicants and one containing
descriptions of all variables in the dataset. Use the following steps to formulate appropriate
hypotheses that can be tested with the given dataset:

1. Review and discuss the roles of the predictor variables in a credit decision.

2. Develop a set of hypotheses based on your discussion in Question 1.

SOLUTION:

i) Data Entry & Descriptive Statistical Analysis:
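
The original code for this step was not preserved with the report, so the following is only a minimal sketch of the kind of descriptive summary meant here, using Python's standard library. The records are hypothetical toy values, not rows from AUS_CREDIT.xlsx; only the column names Age, Credit_amount and Credit_duration come from the report. With pandas, `DataFrame.describe()` gives a comparable summary.

```python
import statistics

# Hypothetical toy records for illustration only.
# The real values live in AUS_CREDIT.xlsx and are not reproduced here.
applicants = [
    {"Age": 25, "Credit_amount": 1200, "Credit_duration": 12},
    {"Age": 40, "Credit_amount": 5000, "Credit_duration": 36},
    {"Age": 33, "Credit_amount": 2500, "Credit_duration": 24},
    {"Age": 52, "Credit_amount": 800, "Credit_duration": 6},
]


def describe(records, column):
    """Return count, mean, sample standard deviation, min, median and max."""
    values = [row[column] for row in records]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


# Print one summary per numeric column.
for name in ("Age", "Credit_amount", "Credit_duration"):
    print(name, describe(applicants, name))
```

Categorical variables such as Account_status would be summarised with frequency counts instead of means.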


OUTPUT:

ii) Data Visualization:

OUTPUT:


This heatmap was generated with the Seaborn library to identify the
Predictor Variables in the aus_credit dataset.

From the heatmap, we can observe that:

Attributes with the MAXIMUM POSITIVE correlation with Creditability in the heatmap
are given higher preference as Predictor Variables.

The attributes Credit_duration, Credit_purpose and Credit_amount have the MAXIMUM
NEGATIVE correlation with Creditability. A negative coefficient indicates an inverse
relationship, not the absence of one, so these attributes rank lowest under the
positive-correlation criterion above and are not considered as Predictor Variables here.
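
The values shown in such a heatmap are pairwise Pearson correlation coefficients. As a rough sketch of how the column for Creditability could be computed and ranked, the example below uses plain Python on hypothetical toy columns (the numbers are illustrative assumptions, not values from the dataset). It ranks by absolute value, because a coefficient near -1 is as informative as one near +1. With pandas and Seaborn, the equivalent matrix would come from `DataFrame.corr()` and be drawn with `seaborn.heatmap`.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(columns, target):
    """Correlate every column with the target and sort by absolute strength."""
    scores = {
        name: pearson(values, columns[target])
        for name, values in columns.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy columns: 1 = good credit risk, 0 = bad credit risk.
columns = {
    "Creditability": [1, 1, 0, 1, 0, 0],
    "Credit_duration": [12, 6, 48, 24, 36, 60],
    "Age": [45, 38, 23, 51, 29, 35],
}

for name, r in rank_by_correlation(columns, "Creditability"):
    print(f"{name}: {r:+.2f}")
```

Pearson correlation is only meaningful for numeric or ordinal codes; for a nominal attribute such as Credit_purpose the coefficient depends on how the categories happen to be numbered.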

REVIEW & ROLE OF PREDICTOR VARIABLES IN CREDIT DECISION:

Account_status: It refers to the current state of a financial account, such as a bank account
or credit card account.

Saving_account: It is a type of deposit account offered by banks and other financial
institutions that allows individuals to deposit and save money while earning interest on
their savings.


Employment_length: It indicates the stability of the applicant's income source and
employment history, which may affect their ability to repay the loan.

Real_estate: It refers to property consisting of land and the buildings, structures, or natural
resources on it. Real estate can be residential, commercial, or industrial in nature, and
may be used for a variety of purposes, such as housing, retail, office space,
manufacturing, or agriculture.

Credit history: It provides information about past credit behaviour, such as timely
payments, defaults, or bankruptcies, and can indicate the applicant's ability to pay back a
loan.

Age: It may affect the applicant's ability to repay the loan and may be correlated with other
variables such as employment status.

SET OF HYPOTHESES:

H1: Applicants with a good Account Status are more likely to be approved for a loan.

H2: Applicants with a Savings Account are more likely to be approved for a loan.

H3: Applicants who are employed full-time are more likely to be approved for a loan.

H4: Applicants who own Real Estate are more likely to be approved for a loan.

H5: Applicants with a proper Credit History are more likely to be approved for a loan.

H6: Younger applicants are less likely to be approved for a loan.
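
Each hypothesis compares the share of good outcomes between two groups of applicants, so one possible way to test it is a chi-square test of independence on a 2x2 table. The sketch below illustrates this for H2 under stated assumptions: the counts are hypothetical, not taken from the dataset, and the test is the plain Pearson chi-square without continuity correction. For one degree of freedom the p-value can be obtained from the complementary error function.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts for H2 (illustrative assumption, not from the dataset).
good_with_savings, bad_with_savings = 60, 20
good_without_savings, bad_without_savings = 40, 40

rate_with = good_with_savings / (good_with_savings + bad_with_savings)
rate_without = good_without_savings / (good_without_savings + bad_without_savings)

stat, p_value = chi_square_2x2(
    good_with_savings, bad_with_savings,
    good_without_savings, bad_without_savings,
)

print(f"good-outcome rate with savings account:    {rate_with:.2f}")
print(f"good-outcome rate without savings account: {rate_without:.2f}")
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```

A small p-value together with a higher rate in the first group would support the hypothesis; the same function can be reused for H1 and H3 to H6 once each predictor is split into two groups.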
