Python Developer Assignment

Company Assignment

Uploaded by

shivangi jain
Copyright
© All Rights Reserved
Python Pandas Assignment

Datasets
Consider the following datasets:
Transactions: https://websdk-assets.s3.ap-south-1.amazonaws.com/public/txns+(13).csv
This dataset represents the “loans” booked. “transactionId” is a unique ID for a
loan. “day” is the date on which the loan was confirmed.
“downPaymentAmount” is the amount paid upfront by the customer for the
loan (in paise). “emiPrincipal” is the principal component of the EMI that the
customer is expected to pay each month (in paise). “emiInterest” is the interest
component of the EMI that the customer is expected to pay each month (in paise).
“tenure” is the loan tenure in months (the number of EMIs).
Repayments: https://websdk-assets.s3.ap-south-1.amazonaws.com/public/repayments+(2).csv
This dataset represents the payments made by customers towards their
loans (including down payments and EMI payments). “transactionId”
identifies the loan towards which the payment was made. “repaymentDay”
is the date on which the payment was made. “amount” is the amount
paid towards the loan.
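
As a rough illustration of the two schemas, the sketch below builds small stand-in tables from inline CSV text and converts the paise columns to rupees. The sample rows are invented, the ISO date format is an assumption, and treating the repayments “amount” as paise is also an assumption that the description above does not confirm.

```python
import io

import pandas as pd

# Invented sample rows; the real files come from the URLs above.
TXNS_CSV = """transactionId,day,downPaymentAmount,emiPrincipal,emiInterest,tenure
T1,2023-01-10,200000,100000,10000,3
T2,2023-02-05,0,50000,5000,6
"""
REPAYMENTS_CSV = """transactionId,repaymentDay,amount
T1,2023-01-10,200000
T1,2023-02-09,110000
T2,2023-03-07,55000
"""

PAISE_PER_RUPEE = 100

# Assumption: dates are stored in a format that pandas can parse directly.
txns = pd.read_csv(io.StringIO(TXNS_CSV), parse_dates=["day"])
repayments = pd.read_csv(io.StringIO(REPAYMENTS_CSV), parse_dates=["repaymentDay"])

# Convert the documented paise columns to rupees in a copy of the table.
paise_cols = ["downPaymentAmount", "emiPrincipal", "emiInterest"]
txns_rupees = txns.assign(**{col: txns[col] / PAISE_PER_RUPEE for col in paise_cols})

# Assumption: repayment amounts are in paise as well.
repayments_rupees = repayments.assign(amount=repayments["amount"] / PAISE_PER_RUPEE)
```

Keeping the raw paise tables alongside the rupee copies makes it easier to do integer arithmetic first and convert only for the final output.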

Queries
1. For every loan, what is the current outstanding (unpaid) principal amount (in
rupees)? Expected output columns: “transactionId”, “outstandingAmount”.

2. The sum of outstandingAmount in the above data is termed AUM. What
is the AUM as of a specific date in the past (the date should be an input in the
format YYYY-MM-DD)? Expected output: a decimal number.

3. Calculate portfolio performance: For every cohort (transactions [loans]
booked in the same month belong to the same cohort), calculate the
distribution of amounts across different DPD buckets. DPD is the number
of days past the due date. The first due date is calculated from the loan
confirmation date (“day” column in the transactions dataset) by adding 30 days
to it. Subsequent EMIs are also due 30 days apart. Thus each EMI of
each loan can be categorised into a DPD bucket based on its DPD
value (described below). For every cohort, calculate the outstanding amount
(in rupees) in each DPD bucket. Expected output columns: “month”,
“amount_not_due”, “amount_dpd_0_30”, “amount_dpd_30_60”,
“amount_dpd_60_90”, “amount_dpd_90_above”. The DPD buckets are as
described below:

a. if DPD < 0: “Not due”

b. if DPD between 0-30: “DPD 0-30”

c. if DPD between 30-60: “DPD 30-60”

d. if DPD between 60-90: “DPD 60-90”

e. if DPD ≥ 90: “DPD 90+”
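
For queries 1 and 2, one possible approach is sketched below. It is not a reference solution and rests on assumptions the brief leaves open: the total principal is taken as emiPrincipal × tenure, the down payment is assumed to be settled first without reducing principal, every paise paid beyond the down payment is assumed to split in the EMI's principal-to-interest ratio, and repayment amounts are assumed to be in paise. The function names and the sample rows are invented for illustration.

```python
import pandas as pd

PAISE_PER_RUPEE = 100


def outstanding_principal(txns, repayments, as_of=None):
    """Outstanding principal per loan in rupees, optionally as of a YYYY-MM-DD date."""
    if as_of is not None:
        cutoff = pd.to_datetime(as_of, format="%Y-%m-%d")
        txns = txns[txns["day"] <= cutoff]
        repayments = repayments[repayments["repaymentDay"] <= cutoff]

    # Total paid per loan; loans without any repayment get 0.
    paid = (repayments.groupby("transactionId", as_index=False)["amount"].sum()
            .rename(columns={"amount": "paid"}))
    df = txns.merge(paid, on="transactionId", how="left")
    df["paid"] = df["paid"].fillna(0)

    emi = df["emiPrincipal"] + df["emiInterest"]  # assumed to be nonzero
    total_principal = df["emiPrincipal"] * df["tenure"]
    # Assumption: the down payment is settled first and does not reduce principal.
    emi_paid = (df["paid"] - df["downPaymentAmount"]).clip(lower=0)
    # Assumption: EMI payments split in the EMI's principal/interest ratio.
    principal_paid = emi_paid * df["emiPrincipal"] / emi
    outstanding = (total_principal - principal_paid).clip(lower=0)
    return pd.DataFrame({
        "transactionId": df["transactionId"],
        "outstandingAmount": outstanding / PAISE_PER_RUPEE,
    })


def aum(txns, repayments, as_of=None):
    """Sum of outstanding principal (AUM) in rupees."""
    result = outstanding_principal(txns, repayments, as_of)
    return float(result["outstandingAmount"].sum())


# Invented sample rows, amounts in paise.
txns = pd.DataFrame({
    "transactionId": ["T1", "T2"],
    "day": pd.to_datetime(["2023-01-10", "2023-02-05"]),
    "downPaymentAmount": [200000, 0],
    "emiPrincipal": [100000, 50000],
    "emiInterest": [10000, 5000],
    "tenure": [3, 6],
})
repayments = pd.DataFrame({
    "transactionId": ["T1", "T1", "T2"],
    "repaymentDay": pd.to_datetime(["2023-01-10", "2023-02-09", "2023-03-07"]),
    "amount": [200000, 110000, 55000],
})

current = outstanding_principal(txns, repayments)
aum_on_date = aum(txns, repayments, "2023-02-01")
```

Filtering both tables by the cut-off date before aggregating means that loans booked after that date and payments made after it are ignored, which is one reading of “AUM as of a specific date”.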
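
For query 3, the EMI schedule and bucketing could be sketched as follows. Several choices here are assumptions rather than part of the brief: payments beyond the down payment are assumed to clear whole EMIs oldest first, each unpaid EMI is assumed to contribute its emiPrincipal to the bucket, DPD is measured against a hypothetical as_of date, and the overlapping bucket boundaries are read as half-open ranges (for example, a DPD of exactly 30 falls in “DPD 30-60”). The sample rows are invented.

```python
import pandas as pd

# Bucket name -> (lower bound inclusive, upper bound exclusive), in days past due.
BUCKETS = {
    "amount_not_due": (float("-inf"), 0),
    "amount_dpd_0_30": (0, 30),
    "amount_dpd_30_60": (30, 60),
    "amount_dpd_60_90": (60, 90),
    "amount_dpd_90_above": (90, float("inf")),
}


def dpd_by_cohort(txns, repayments, as_of):
    """Unpaid EMI principal (rupees) per booking month and DPD bucket."""
    as_of = pd.Timestamp(as_of)
    df = txns[txns["day"] <= as_of].reset_index(drop=True)
    repayments = repayments[repayments["repaymentDay"] <= as_of]

    paid = repayments.groupby("transactionId")["amount"].sum()
    df["paid"] = df["transactionId"].map(paid).fillna(0)
    emi = df["emiPrincipal"] + df["emiInterest"]  # assumed to be nonzero
    emi_paid = (df["paid"] - df["downPaymentAmount"]).clip(lower=0)
    # Assumption: only whole EMIs count as paid, oldest first.
    df["emisPaid"] = (emi_paid // emi).astype(int)

    # Expand to one row per scheduled EMI, numbered 1..tenure within each loan.
    emis = df.loc[df.index.repeat(df["tenure"])].copy()
    emis["emiNumber"] = emis.groupby(level=0).cumcount() + 1
    emis = emis[emis["emiNumber"] > emis["emisPaid"]].copy()

    # EMI k falls due 30 * k days after the loan confirmation date.
    due = emis["day"] + pd.to_timedelta(30 * emis["emiNumber"], unit="D")
    dpd = (as_of - due).dt.days
    rupees = emis["emiPrincipal"] / 100
    for name, (low, high) in BUCKETS.items():
        emis[name] = rupees.where((dpd >= low) & (dpd < high), 0.0)

    emis["month"] = emis["day"].dt.strftime("%Y-%m")
    return emis.groupby("month", as_index=False)[list(BUCKETS)].sum()


# Invented sample rows, amounts in paise.
txns = pd.DataFrame({
    "transactionId": ["T1", "T2"],
    "day": pd.to_datetime(["2023-01-10", "2023-02-05"]),
    "downPaymentAmount": [200000, 0],
    "emiPrincipal": [100000, 50000],
    "emiInterest": [10000, 5000],
    "tenure": [3, 6],
})
repayments = pd.DataFrame({
    "transactionId": ["T1", "T1", "T2"],
    "repaymentDay": pd.to_datetime(["2023-01-10", "2023-02-09", "2023-03-07"]),
    "amount": [200000, 110000, 55000],
})

cohorts = dpd_by_cohort(txns, repayments, "2023-04-15")
```

Writing each bucket as its own column before grouping keeps all five output columns present even when a bucket is empty for every cohort.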

Submission instructions
The submission should be a Python notebook file (.ipynb) along with the
requirements as requirements.txt or as a Pipfile. Running all cells should:

1. Fetch the datasets from the URLs provided as above

a. The variables holding the URLs should be declared in the first cell so
that they are easy to replace with other datasets that we want to
test your code with. The code should automatically fetch from the
internet or the local filesystem based on the URL protocol used (https/http/file).

2. Render pandas DataFrames for all the queries above (no need to write to
CSVs etc.).
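
The fetching requirement could be handled along the lines of the sketch below. It relies on pandas.read_csv accepting http, https and file URLs directly, so the helper only checks the scheme before delegating. The helper name load_csv and the placeholder file paths are hypothetical; in the notebook, the first cell would hold the actual dataset URLs.

```python
from urllib.parse import urlparse

import pandas as pd

# First-cell variables. The paths are hypothetical placeholders that are
# meant to be swapped for the real http(s):// or file:// dataset URLs.
TRANSACTIONS_URL = "file:///path/to/txns.csv"
REPAYMENTS_URL = "file:///path/to/repayments.csv"

SUPPORTED_SCHEMES = {"http", "https", "file"}


def load_csv(url, **read_csv_kwargs):
    """Load a CSV from an http(s) or file URL into a DataFrame."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"Unsupported URL scheme {scheme!r} in {url!r}")
    # pandas.read_csv resolves http, https and file URLs on its own.
    return pd.read_csv(url, **read_csv_kwargs)


# Example usage in a later cell (not executed here):
# txns = load_csv(TRANSACTIONS_URL, parse_dates=["day"])
# repayments = load_csv(REPAYMENTS_URL, parse_dates=["repaymentDay"])
```

Rejecting unknown schemes early gives a clearer error than letting a mistyped URL fail deep inside the reader.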
