Data Mining Applications in Banking Sector While Preserving Customer Privacy

Özge Doğuç


In real-life data mining applications, organizations often cooperate on the same task by pooling their data to obtain more accurate results, even though each may have different security and privacy concerns. Privacy-preserving data mining (PPDM) comprises rules and techniques that allow parties to collaborate on data mining applications while keeping their own data private. The objective of this paper is to present a number of PPDM protocols and show how they can be applied to data mining in the banking sector. To this end, the paper discusses homomorphic cryptosystems and secure multiparty computation. Supported by experimental analysis, the paper demonstrates that data mining tasks commonly used in banking, such as clustering and Bayesian network (association rule) learning, can be performed efficiently and securely. This is the first study to combine PPDM protocols with banking data mining applications.
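The additive homomorphism that makes cryptosystems such as Paillier's useful for PPDM can be illustrated with a toy sketch. Several pieces are assumptions chosen only for readability: the function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`), the tiny primes, and the two-bank aggregation scenario are hypothetical and are not the protocols evaluated in the paper. Real deployments use moduli of at least 2048 bits and a vetted library rather than hand-rolled code.

```python
import math
import secrets


def lcm(a: int, b: int) -> int:
    return a * b // math.gcd(a, b)


def keygen(p: int, q: int):
    """Toy Paillier key generation (hypothetical helper; primes far too small for real use)."""
    n = p * q
    g = n + 1  # common simplification g = n + 1
    lam = lcm(p - 1, q - 1)
    # mu = L(g^lam mod n^2)^-1 mod n, where L(x) = (x - 1) // n.
    # pow(..., -1, n) raises ValueError if the inverse does not exist.
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu, n)


def encrypt(pub, m: int) -> int:
    """Encrypt plaintext 0 <= m < n with fresh randomness r coprime to n."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(priv, c: int) -> int:
    lam, mu, n = priv
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n * mu) % n


def add_encrypted(pub, c1: int, c2: int) -> int:
    """Multiplying ciphertexts mod n^2 yields an encryption of the sum of plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)


if __name__ == "__main__":
    # Hypothetical scenario: two banks each encrypt a private total under the
    # key holder's public key; an aggregator combines the ciphertexts blindly.
    pub, priv = keygen(1009, 1013)
    c_bank_a = encrypt(pub, 120000)
    c_bank_b = encrypt(pub, 250000)
    c_sum = add_encrypted(pub, c_bank_a, c_bank_b)
    print(decrypt(priv, c_sum))  # prints 370000
```

The aggregator never sees either bank's value, and only the private-key holder learns the combined total; the sum must stay below `n`, since decryption works modulo `n`.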


DOI: 10.28991/ESJ-2022-06-06-014


Keywords: Data Management; Data Security; Data Mining; Banking Processes.

Copyright (c) 2022 Ozge Doguc