Journal Article
An Intelligent System for Default Identification in Commercial Bank
by Mia Anderson, Oliver King and Qin Tsai
Abstract
Credit default identification requires high classification accuracy and a low false positive rate. Traditional machine-learning models for credit default identification achieve low overall classification accuracy on high-dimensional, imbalanced financial transaction data. Moreover, with the development of big data, credit transaction data have become large-scale and high-dimensional, so the credit default recognition model must be built efficiently and in a timely manner. Semi-supervised learning reduces the cost of manual labeling and allows the model to be trained promptly by making full use of a large amount of unlabeled data together with a small amount of labeled data, which gives it important research significance in the field of credit default recognition. In addition, a traditional supervised learning model can only recognize default behaviors that have occurred before, whereas default behaviors vary; a semi-supervised learning algorithm can also recognize previously unknown default behaviors. From the perspective of semi-supervised learning, this paper proposes a default identification method, DBN-iForest, that combines the feature learning ability of the deep belief network (DBN), a semi-supervised deep learning framework, with the ability of the unsupervised isolation forest (iForest) algorithm to identify abnormal behaviors. The paper first introduces the experimental data, including the data source and the preprocessing of the original data. It then describes in detail the performance evaluation criteria applied to all experimental models, followed by the steps of the DBN-iForest algorithm. Next, it introduces two algorithms for optimizing iForest: particle swarm optimization and simulated annealing. Finally, a comparative analysis of experimental results at different levels demonstrates the advantages of the DBN-iForest model: classification performance is improved, and the shortage of data resources in credit default recognition is overcome. The semi-supervised DBN-iForest method improves on the classification performance of DBN and iForest while using less labeled data.
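The general idea of pairing learned features with an isolation forest can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, uses two stacked `BernoulliRBM` layers as a rough stand-in for the unsupervised pre-training stage of a DBN, and runs on synthetic data rather than the paper's credit dataset. The layer sizes, learning rate, and `contamination` value are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for credit transaction data (hypothetical, not the paper's data):
# 1000 normal records and 30 "default" records drawn from a shifted distribution.
X_normal = rng.normal(0.0, 1.0, size=(1000, 20))
X_default = rng.normal(4.0, 1.0, size=(30, 20))
X = np.vstack([X_normal, X_default])
y = np.r_[np.zeros(1000, dtype=int), np.ones(30, dtype=int)]  # 1 = default

# Two stacked RBMs approximate DBN-style unsupervised feature learning.
# Inputs are scaled to [0, 1] because BernoulliRBM expects values in that range.
feature_extractor = Pipeline([
    ("scale", MinMaxScaler()),
    ("rbm1", BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=8, learning_rate=0.05, n_iter=20, random_state=0)),
])
features = feature_extractor.fit_transform(X)

# The isolation forest flags anomalies in the learned feature space.
# fit_predict returns -1 for anomalies and +1 for inliers.
forest = IsolationForest(n_estimators=100, contamination=0.03, random_state=0)
pred = (forest.fit_predict(features) == -1).astype(int)

# Recall and false positive rate, the two quantities the abstract emphasizes.
tp = int(((pred == 1) & (y == 1)).sum())
fn = int(((pred == 0) & (y == 1)).sum())
fp = int(((pred == 1) & (y == 0)).sum())
tn = int(((pred == 0) & (y == 0)).sum())
recall = tp / (tp + fn)
fpr = fp / (fp + tn)
print(f"recall={recall:.3f}  false positive rate={fpr:.3f}")
```

The labels `y` are used here only for evaluation; both the feature extractor and the isolation forest are fitted without them. In a semi-supervised setting, the small labeled subset would typically be used to fine-tune the network or to select hyperparameters such as `contamination`, for example with the particle swarm or simulated annealing search the abstract mentions.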