Computer Science, 2019, Vol. 46, Issue (10): 7-13. doi: 10.11896/jsjkx.181102216

• Big Data & Data Science •

Individual Credit Risk Assessment Based on Stacked Denoising Autoencoder Networks

YANG De-jie1, ZHANG Ning1, YUAN Ji2, BAI Lu1

  1. School of Information, Central University of Finance and Economics, Beijing 100081, China
  2. College of Civil, Geo and Environmental Engineering, Technical University of Munich, Munich 80333, Germany
  • Received: 2018-11-29 Revised: 2019-04-15 Online: 2019-10-15 Published: 2019-10-21
  • Corresponding author: ZHANG Ning (born 1975), female, Ph.D., professor. Her main research interests include financial technology and personal information protection. E-mail: [email protected]
  • About the authors: YANG De-jie (born 1987), male, Ph.D. candidate, senior engineer. His main research interests include machine learning and financial risk control. E-mail: [email protected]; YUAN Ji (born 1985), male, Ph.D., teaching assistant, senior engineer. His main research interests include Bayesian inverse analysis and stochastic finite element methods; BAI Lu (born 1987), male, Ph.D., associate professor, CCF member. His main research interests include machine learning and feature selection.
  • Funding:
    This work was supported by the National Key R&D Program of China (2017YFB1400701) and the Key Program of the National Social Science Foundation of China (13AXW010).

Abstract: Personal credit has long been the most important factor banks use to gauge the risk that an individual will fail to meet their obligations. In recent years, as borrowing demand in China has grown steadily, traditional credit evaluation based solely on credit card information can no longer fully meet the needs of the banking industry. To build a richer credit profile of each user, this paper extracts credit risk assessment features from bank big data. To address the curse of dimensionality and the noise that financial big data brings, the paper takes the correlation between data features into account and improves the stacked denoising autoencoder neural network by introducing a truncated Karhunen-Loève expansion as the noise input term. A series of experiments was conducted on the big data platform of a commercial bank. The results show that, compared with using credit card information alone, using bank big data raises the K-S value, an indicator of how well positive and negative samples are separated, by about 11%. The improved stacked denoising autoencoder achieves better risk assessment performance, with accuracy about 3% higher than that of the original model, which validates the effectiveness of credit risk assessment in a bank big data environment.

Key words: Big data, Credit risk assessment, Curse of dimensionality, Deep learning, Feature selection, Stacked denoising
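
The abstract describes replacing the usual independent corruption noise of a denoising autoencoder with noise built from a truncated Karhunen-Loève expansion, so that the noise respects correlations between features. The paper's exact formulation is not given on this page, so the following is only a minimal sketch under stated assumptions: the discrete Karhunen-Loève expansion is taken over the empirical feature covariance, the coefficients are standard normal, and the names `truncated_kl_noise`, `corrupt`, `n_terms`, and `scale` are hypothetical.

```python
import numpy as np


def truncated_kl_noise(X, n_terms, scale=0.1, rng=None):
    """Draw correlated noise from a truncated discrete Karhunen-Loeve expansion.

    Assumption (not from the paper): the expansion uses the eigenpairs of the
    empirical feature covariance of X, keeping the n_terms largest eigenvalues.
    """
    rng = np.random.default_rng(rng)
    cov = np.cov(X, rowvar=False)                # (d, d) feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_terms]  # indices of the largest modes
    lam = np.clip(eigvals[order], 0.0, None)     # guard against tiny negatives
    phi = eigvecs[:, order]                      # (d, n_terms) retained modes
    xi = rng.standard_normal((X.shape[0], n_terms))  # one coefficient set per row
    # noise_row = scale * sum_i sqrt(lam_i) * xi_i * phi_i
    return scale * (xi * np.sqrt(lam)) @ phi.T


def corrupt(X, n_terms, scale=0.1, rng=None):
    """Return the corrupted input a denoising autoencoder would be trained on."""
    return X + truncated_kl_noise(X, n_terms, scale, rng)
```

In this sketch the autoencoder would be trained to reconstruct `X` from `corrupt(X, ...)`. Truncating to `n_terms` modes confines the noise to the dominant correlation directions of the features, which is one plausible reading of the idea stated in the abstract.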

CLC Number: TP181
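
The K-S value cited in the abstract measures how well a model's scores separate positive and negative samples. As a hedged illustration (the paper's own evaluation code is not shown on this page), the statistic can be computed as the largest gap between the empirical score distributions of the two classes. The function name `ks_statistic` and its arguments are hypothetical.

```python
import numpy as np


def ks_statistic(y_true, y_score):
    """Kolmogorov-Smirnov separation between positive and negative score distributions."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos = np.sort(y_score[y_true])    # scores of positive samples
    neg = np.sort(y_score[~y_true])   # scores of negative samples
    thresholds = np.unique(y_score)   # evaluate both CDFs at every observed score
    cdf_pos = np.searchsorted(pos, thresholds, side="right") / pos.size
    cdf_neg = np.searchsorted(neg, thresholds, side="right") / neg.size
    return float(np.max(np.abs(cdf_pos - cdf_neg)))
```

A value of 1.0 means the two classes are perfectly separated by some score threshold, while 0.0 means their score distributions coincide.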