Proceedings of the Second International Conference on Computing Methodologies and Communication (ICCMC 2018)
IEEE Conference Record # 42656; IEEE Xplore ISBN:978-1-5386-3452-3
Data Analysis of Consumer Complaints in Banking Industry using Hybrid Clustering

Surbhit Chugani1, Govinda K.2, Somula Ramasubbareddy3
1,2,3 SCOPE, VIT, Vellore, India
Abstract—This paper focuses on exploring and analyzing Consumer Finance Complaints data to find how many similar complaints relate to the same bank, service, or product. The complaints in these datasets fall under Credit Reporting, Mortgage, Debt Collection, Consumer Loan, and Banking Accounting. Using data mining techniques, cluster analysis as well as predictive modeling is applied to obtain valuable information about complaints in certain regions of the country. Banks that receive customer complaints can analyse the complaint data to see where the most complaints are being filed, which products or services are producing the most complaints, and other useful data. Our model will assist banks in identifying the location and types of errors for resolution, leading to increased customer satisfaction to drive revenue and profitability.

Keywords—Consumer, Complaint, analysis, clustering, predictive.

I. INTRODUCTION

In today's business-driven era, receiving a complaint from a consumer happens almost every day. A consumer's complaint presents the bank or reporting agency with an opportunity to identify and rectify specific problems with its current product or service. Service complaint management is a critical part of business management. A good complaint-management strategy yields the best customer-relationship outcome with minimal human-resource investment, so we hope to find a correlation between complaints, companies, and consumers to refine company applications to better accommodate consumer needs. Increasingly, companies are recognizing the value of a customer complaint: it is feedback on the customer's experience and an opportunity to resolve a problem not only for that particular customer but perhaps also for a much larger number of customers. This inevitably produces large amounts of data that have to be analyzed, and specific functions are used to aggregate the analysis results.

Clustering is regarded as a crucial unsupervised learning problem that searches for similar structures in an unlabeled data set. These similar structures are groups of data, usually referred to as clusters. The items within a cluster are comparable (or close) to the other items in that cluster and dissimilar to (or distant from) items that belong to other clusters. The goal of these mining techniques is to detect the intrinsic grouping of a data set. In hierarchical clustering, a tree-like cluster structure (dendrogram) is created through recursive partitioning (divisive methods) or combining (agglomerative methods) of existing clusters, whereas k-means clustering divides the data points into k clusters with reference to centroids, which helps if we already know which data points are probable and relevant to the output. We hope to find a correlation between complaints, companies, and consumers to refine company applications to better accommodate consumer needs using a hybrid approach of hierarchical and k-means clustering.

II. LITERATURE REVIEW

A number of studies have been conducted on services to customers and customer awareness. We review some of them here.

Kamakodi (2007) concluded that the present-day generation is influenced by the computing features used by banks, and so banks study the factors influencing customer preferences. Residence relocation, salary fluctuation, and unavailability of banking services are reasons enough to change banks.

Uppal and Kaur (2007) examined consumers' awareness of the web applications used by banks and proposed some measures to make these applications more successful. They concluded that the limitation of today's web applications lies in spreading awareness of the varied features offered.

Mishra and Jain (2007) took up the dimensions of consumer satisfaction in nationalized and private banks. The study describes satisfaction as the foremost asset of an organization, one that provides an unmatched competitive edge and helps achieve customer loyalty. They also discussed how a high level of customer satisfaction leads to loyalty. The study observed ten factors and five areas of satisfaction for both nationalized and private sector banks.

Jain and Jain (2006) demonstrated that the banking sector, both private and public, has undergone radical as well as revolutionary change due to the liberalization of 1991. Retail banking is the consumer-preferred choice, which
978-1-5386-3452-3/18/$31.00 ©2018 IEEE 74

articulates itself in responses received from 200 customers of HDFC Bank, ICICI Bank, and some other banks in the city of Varanasi, Uttar Pradesh. The study looked at the schemes offered by the banks, quantified satisfaction with different types of services, expectations about these schemes, and the degree of segmentation among the services offered.

Singh (2006) discusses CRM approaches in various banks. He emphasized how management targets customers in order to gain insight and provides value-added services and products. The web has provided a smooth user experience, giving access to the various features used by customers and thereby achieving customer satisfaction. Management has to strive to ensure end-to-end delivery and customer satisfaction, which is essential to banks for maintaining the high regard and loyalty of their customers.

Bhaskar (2004) found that the expansion of banking is directly proportional to the quality of services provided by banks, and that satisfaction is highly regarded because customer feedback is the only thing to lean on in the highly competitive banking industry. Arguably, India's banking industry is thriving and depends heavily on customer morale and loyalty.

Furthermore, Hasanbanu (2004) stated that rural India is unaware of the various schemes and benefits offered by banks to ensure financial welfare. The majority of the rural population has no access to the banks' web services and continues to prefer local moneylenders charging high interest rates. The study was conclusive and based on data provided by the RBI; however, it also relies on questionnaires and surveys.

Singh (2004) examined the reality of banks in terms of providing customer support and found that customers are influenced by the bank's location and by the minutest banking details, including interest rates, as well as by the attitudes and customer support provided by personnel.

III. METHODOLOGY AND PROCEDURE

A. Hierarchical Clustering

Probably the most widely applied method in economics is agglomerative hierarchical cluster analysis. It is based on a proximity matrix that holds the similarity evaluation for all pairs of objects. This means that various similarity or dissimilarity measures for different types of variables (quantitative, qualitative, and binary) can be used. Moreover, different approaches for evaluating cluster similarity (single linkage, complete linkage, average linkage, Ward's method, etc.) can also be applied.

Given:
    A set X of objects {x1, x2, ..., xn}
    A distance function dist(c1, c2)

for i = 1 to n
    ci = {xi}
end for
C = {c1, ..., cn}
l = n + 1
while C.size > 1 do
    (cmin1, cmin2) = the pair with minimum dist(ci, cj) over all distinct ci, cj in C
    remove cmin1 and cmin2 from C
    add the merged cluster {cmin1, cmin2} to C
    l = l + 1
end while

B. K-Clustering

In k-clustering, the set of objects is divided into a given number (k) of clusters. Different approaches can be distinguished from different points of view. The first classification is into hard and fuzzy clustering. In hard clustering, an object is assigned to exactly one cluster. The result is a membership matrix for objects and clusters with ones (the object is assigned to the cluster) and zeroes (the object is not assigned to the cluster). In fuzzy clustering, membership degrees are calculated for all cluster-object pairs. Moreover, some other approaches to expressing uncertainty in cluster analysis have been proposed.

1. Initialize cluster centroids $\mu_1, \mu_2, \ldots, \mu_k \in \mathbb{R}^n$ randomly.
2. Repeat until convergence: {
   For every $i$, set
   $$c^{(i)} := \arg\min_{j} \lVert x^{(i)} - \mu_j \rVert^2$$
   For each $j$, set
   $$\mu_j := \frac{\sum_{i} 1\{c^{(i)} = j\}\, x^{(i)}}{\sum_{i} 1\{c^{(i)} = j\}}$$
}

C. Multi-linear Regression

As a predictive analysis, multiple linear regression is used to explain the relationship between one continuous dependent variable and two or more independent variables. The independent variables can be continuous or categorical. It is relevant here for understanding the correlation among our variables and their relationship to the single response.

D. Outlier Analysis

In data mining, anomaly detection (also called outlier detection) is the identification of items, events, or observations which do not conform to an expected pattern or to other items in a dataset. A cluster analysis algorithm may be able to detect the micro-clusters formed by such patterns.
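
The agglomerative loop of Section III-A and the two k-means steps of Section III-B can be combined in more than one way, and the paper does not spell out how its hybrid is wired together. The following is a minimal Python sketch under one common assumption: the hierarchical pass is stopped at k clusters and its cluster means seed the k-means iterations. The paper reports using R; Python, single linkage, Euclidean distance, and all function names here (`agglomerate`, `kmeans`, `hybrid_cluster`) are choices made for this illustration only.

```python
import math

def centroid(points):
    """Mean of a non-empty list of equal-length numeric tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def single_link(a, b):
    """Single-linkage distance: the closest pair between two clusters."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerate(points, k):
    """Merge the two closest clusters until only k remain."""
    clusters = [[p] for p in points]  # every object starts as its own cluster
    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs,
                   key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters

def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps from the given start centroids."""
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(len(centroids)),
                      key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(centroid(members) if members else old)
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return labels, centroids

def hybrid_cluster(points, k):
    """Assumed hybrid: hierarchical clusters provide the k-means seeds."""
    seeds = [centroid(c) for c in agglomerate(points, k)]
    return kmeans(points, seeds)

# Toy two-dimensional points, invented for illustration.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = hybrid_cluster(points, k=2)
```

Seeding k-means from the hierarchical result replaces the random initialization in step 1 of Section III-B, which makes the outcome reproducible; other hybrids, such as using the dendrogram only to choose k, are equally compatible with the text.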
In addition, we ran the Eclat and Apriori algorithms as well as topic modeling.
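
The paper does not describe how Apriori was configured, so the following is only a minimal Python sketch of the level-wise idea, assuming each complaint is represented as a set of words and that support is counted as a number of complaints. The function name `apriori`, the `min_support` parameter, and the toy complaints are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Candidates of size k are unions of two frequent (k-1)-itemsets.
        candidates = {a | b for a, b in combinations(list(frequent), 2)
                      if len(a | b) == k}
        # Apriori property: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent
                             for sub in combinations(c, k - 1))}
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result

# Toy complaints as word sets, invented for illustration.
complaints = [
    {"credit", "report", "incorrect"},
    {"credit", "report", "dispute"},
    {"mortgage", "loan", "modification"},
    {"credit", "report", "incorrect"},
]
frequent_sets = apriori(complaints, min_support=3)
```

With these toy inputs only "credit", "report", and the pair of the two reach the support threshold. Eclat finds the same itemsets but works on per-item lists of transaction identifiers instead of rescanning the transactions.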
Each week the Consumer Financial Protection Bureau sends thousands of consumers' complaints about financial products and services to companies for response. Those complaints are then compiled into a large dataset. The data specifically contains complaint information from Americans who have general debts such as student loans, mortgages, credit cards, and consumer loans. Having learned about this data set, we set out to analyze it and find patterns that help explain the characteristics of finance complaints. Excel and Jupyter are the tools we plan to use to explore the data. Excel will be used to explore the data visually and to determine which parts of the data are going to be most useful.

Jupyter will then be used to write code in R to clean, reduce, and draw preliminary relationships in the data. We will clean the data by auto-filling certain blank columns of each observation. We will then remove columns that hold irrelevant information and remove observations that are missing a value in a crucial attribute column. We will stick to the basic outline of data pre-processing. Removing observations is preferable to auto-filling, so that the dataset's size is reduced. We will then model the data in such a way as to reveal any pattern or correlation that can help solve or isolate certain complaints.

IV. RESULTS

By performing hierarchical clustering and k-means clustering, we gained better insight with 5 clusters. Figure 1 and Figure 2 show that some clusters have the highest density, while the rest have a lower density in comparison. The plots show the results for our clustering model. Clusters 2 and 3 have 5 elements each, while clusters 1, 4, and 5 have significantly more elements.

Figure 1. Clustering of frequently appeared words

Figure 2. Clustering dendrogram of the most frequent words with marked clusters

Reflecting on our cluster modeling results, we evaluated the performance of our model. Our clusters were graphed to depict the relationship between the various features and the known tags. We focus on the top five companies in our data set. However, these five companies are of different types. Three of them are banks: Bank of America, Wells Fargo, and JPMorgan. The other two are consumer credit reporting agencies: Experian and Equifax. As can be seen in Figure 3, the main product complaints for Bank of America, Wells Fargo, and JPMorgan concern mortgages, credit cards, and bank accounts or services, while the main product complaints for Experian and Equifax concern credit reporting.
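
The row-removal step of the pre-processing and the per-company product tally behind the results can be sketched in a few lines. This is an illustrative Python sketch, not the authors' R code: the field names `company` and `product`, the choice of which attributes count as crucial, and the toy records are all assumptions.

```python
from collections import Counter

# Hypothetical records; field names and values are invented for illustration.
records = [
    {"company": "Bank of America", "product": "Mortgage"},
    {"company": "Bank of America", "product": "Mortgage"},
    {"company": "Bank of America", "product": "Credit card"},
    {"company": "Equifax", "product": "Credit reporting"},
    {"company": "Equifax", "product": "Credit reporting"},
    {"company": "Experian", "product": ""},         # missing crucial value
    {"company": "", "product": "Debt collection"},  # missing crucial value
]

CRUCIAL = ("company", "product")  # assumed crucial attribute columns

def clean(rows):
    """Drop observations that lack a value in any crucial attribute."""
    return [r for r in rows if all(r.get(field) for field in CRUCIAL)]

def main_product_by_company(rows, n_companies):
    """Most frequent product for each of the n most-complained-about companies."""
    company_counts = Counter(r["company"] for r in rows)
    result = {}
    for company, _ in company_counts.most_common(n_companies):
        products = Counter(r["product"] for r in rows if r["company"] == company)
        result[company] = products.most_common(1)[0][0]
    return result

cleaned = clean(records)
main_products = main_product_by_company(cleaned, n_companies=2)
```

Dropping incomplete rows rather than filling them follows the preference stated in the methodology; on the real dataset `n_companies` would be 5 to match the top five companies discussed in the results.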
Figure 3. Products of Companies

Figure 4 shows that the most frequent issue for Experian and Equifax is Incorrect information on credit report, and the most frequent issue for Bank of America, Wells Fargo, and JPMorgan is Loan modification, collection, and foreclosure.

Figure 4. Companies with the number of records

Bank of America, Wells Fargo, and JPMorgan received more complaints than Experian and Equifax, as shown in Figure 5. This helps explain our data mining result that 80% of the Products data consists of only Mortgage, Credit Reporting, and Debt Collection, and that 80% of the problems were caused by Bank of America, Wells Fargo & Company, and JP Morgan Chase & Co.

Figure 5. Company's data with the most records

Next, we take a closer look at each company's main issue in its products, which can be seen in Figure 6. For Experian and Equifax, the issue with the highest number of complaints in their main product, credit reporting, was incorrect information on the credit report. Credit reports contain the customer's personal information, creditor information, lines of credit, and credit inquiries. We found that credit reporting agencies may list incorrect information on a credit report, which is harmful to the customer's credit. In fact, the most frequently recorded complaint against credit reporting was Incorrect Information on Credit Report, at 77%.

Figure 6. The top five issues in the product of credit reporting
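
Percentages such as the 80% and 77% figures quoted in this section are category shares and cumulative shares of complaint counts. The following Python sketch shows one way such figures could be computed; the function names and the toy product labels are hypothetical, and the numbers it produces are not the paper's results.

```python
from collections import Counter

def shares(labels):
    """(label, fraction of all records) pairs, most frequent first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, count / total) for label, count in counts.most_common()]

def categories_covering(labels, threshold):
    """Smallest set of top categories whose cumulative share reaches threshold."""
    covered, chosen = 0.0, []
    for label, share in shares(labels):
        if covered >= threshold:
            break
        chosen.append(label)
        covered += share
    return chosen, covered

# Toy product column, invented for illustration.
products = (["Mortgage"] * 4 + ["Credit reporting"] * 3
            + ["Debt collection"] * 2 + ["Student loan"] * 1)
top, covered = categories_covering(products, threshold=0.8)
```

In this toy column three products are needed to pass the 80% threshold. Applying `shares` to the issue column of a single product would give per-issue fractions of the same kind as the 77% figure.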
CONCLUSION
The results show what problems customers are having with specific products in particular regions of the country. This valuable information shows where companies need to invest to improve their overall performance in the view of their customers, which will lead to improved customer satisfaction. By maximizing customer satisfaction, a company can increase the opportunity for repeat sales to customers. Customer satisfaction also helps to increase customer loyalty, reducing the need to allocate marketing budget to acquire new customers. Satisfied customers may also recommend the company's products or services to other potential customers, increasing the potential for additional revenue and profit. Future research can collect more complaint data so that the analysis can be performed for other products, services, and companies.
REFERENCES
[1] Goyal S, Thakur KS (2008). A Study of Customer Satisfaction Public and Private Sector Banks of India. Punjab J. Bus. Stud., 3(2): 121-127.
[2] Uppal RK (2007). Customer Service in Banks: An Empirical Study. Bankers Conference Proceedings, pp. 36-42.
[3] Kamakodi N (2007). Customer Preferences on e-Banking Services: Understanding through a Sample Survey of Customers of Present Day Banks in India. Contributors, Banknet Publications, 4: 30-43.
[4] Mishra JK, Jain M (2007). Constituent Dimensions of Customer Satisfaction: A Study of Nationalized and Private Banks. Prajnan, 35(4): 390-398.
[5] Jain AK, Jain P (2006). Customer Satisfaction in Retail Banking Services. NICE J. Bus. Stud., 1(2): 95-102.
[6] Singh SB (2006). Customer Management in Banks. Vinimaya, 37(3): 31-35.
[7] Bhaskar PV (2004). Customer Service in Banks. IBA Bulletin, 36(8): 9-13.
[8] Hasanbanu S (2004). Customer Service in Rural Banks: An Analytical Study of Attitude of Different Types of Customers towards Banking Services. IBA Bulletin, 36(8): 21-25.
[9] Singh S (2004). An Appraisal of Customer Service of Public Sector Banks. IBA Bulletin, 36(8): 30-33.
[10] Shankar AG (2004). Customer Service in Banks. IBA Bulletin, 36(8): 5-7.
[11] Ganesh C, Varghese ME (2003). Customer Service in Banks: An Empirical Study. Vinimaya, 36(2): 14-26.