Exploiting Pairing Attribute-Based VDM for Enhanced Similarity Learning

Document Type: Research Article

Authors

1 Department of Computer Science, University of Mohaghegh Ardabili, Ardabil, Iran

2 Department of Computer Science, Yazd University, Yazd, Iran

3 Department of Mathematics, Yazd University, Yazd, Iran

Abstract

The value difference metric (VDM) is a well-established similarity measure for nominal attributes in classification tasks. However, it suffers from a critical limitation: it assigns a zero distance to differing attribute values with identical class distributions, reducing discriminatory power. To address this, we propose the pairing attribute value difference metric (PAVDM), which enhances similarity evaluation by jointly considering pairs of attribute values. While PAVDM improves discrimination, it introduces higher computational costs. To mitigate this, we introduce two optimization strategies: CSPAVDM, which leverages Cramér's V for correlation-based pairing, and ASPAVDM, which employs AdaBoost to prioritize impactful attributes. Results show that PAVDM and its optimized variants outperform classical VDM in accuracy, precision, F1-score, and ROC AUC under a fair evaluation protocol.
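The two building blocks named in the abstract, the classical VDM distance between two values of a nominal attribute and the Cramér's V statistic used by CSPAVDM for correlation-based pairing, can be sketched as follows. This is a minimal illustration of the standard textbook definitions, not the paper's implementation; the function names and the exponent parameter `q` are illustrative choices.

```python
from collections import Counter, defaultdict

def vdm_distance(values, labels, x, y, q=2):
    """Classical VDM between two values x, y of one nominal attribute:
    sum over classes c of |P(c | a=x) - P(c | a=y)|**q,
    estimated from the training values and their class labels."""
    # Conditional class counts: counts[v][c] = #(a = v and class = c)
    counts = defaultdict(Counter)
    for v, c in zip(values, labels):
        counts[v][c] += 1
    nx = sum(counts[x].values())  # occurrences of value x
    ny = sum(counts[y].values())  # occurrences of value y
    return sum(abs(counts[x][c] / nx - counts[y][c] / ny) ** q
               for c in set(labels))

def cramers_v(a, b):
    """Cramér's V between two nominal variables, in [0, 1]:
    sqrt(chi2 / (n * (min(r, c) - 1))), with chi2 computed from
    the observed vs. expected contingency counts."""
    n = len(a)
    joint, ca, cb = Counter(zip(a, b)), Counter(a), Counter(b)
    chi2 = 0.0
    for x in ca:
        for y in cb:
            expected = ca[x] * cb[y] / n
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    k = min(len(ca), len(cb))  # min(#rows, #columns) of the table
    return (chi2 / (n * (k - 1))) ** 0.5
```

The first assertion below reproduces the limitation the abstract points out: two distinct values whose class distributions coincide get VDM distance zero, which is exactly the case PAVDM is designed to discriminate.

```python
# Identical class distributions for 'a' and 'b' -> VDM = 0
vdm_distance(['a', 'a', 'b', 'b'], [0, 1, 0, 1], 'a', 'b')   # 0.0
# Perfectly associated attributes -> Cramér's V = 1
cramers_v([0, 0, 1, 1], ['x', 'x', 'y', 'y'])                # 1.0
```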


[1] Cramér, H. Mathematical methods of statistics, Princeton University Press, (1999).
[2] Denis, D. J. Applied univariate, bivariate, and multivariate statistics: Understanding statistics for social and natural scientists, with applications in SPSS and R, John Wiley & Sons, (2021).
[3] Kibler, D., Aha, D. W., and Albert, M. K. Instance-based prediction of real-valued attributes, Computational Intelligence, 5(2), 51–57, (1989).
[4] Kasif, S., Salzberg, S., Waltz, D., Rachlin, J., and Aha, D. W. A probabilistic framework for memory-based reasoning, Artificial Intelligence, 104(1-2), 287–311, (1998).
[5] Chen, Y., Miao, D., and Zhang, H. Neighborhood outlier detection, Expert Systems with Applications, 37(12), 8745–8749, (2010).
[6] Jiang, L. and Li, C. An augmented value difference measure, Pattern Recognition Letters, 34(10), 1169–1174, (2013).
[7] Wilson, D. R. and Martinez, T. R. Improved heterogeneous distance functions, Journal of Artificial Intelligence Research, 6, 1–34, (1997).
[8] Li, C., Jiang, L., Li, H., Wu, J., and Zhang, P. Toward value difference metric with attribute weighting, Knowledge and Information Systems, 50, 795–825, (2017).
[9] Li, C., Jiang, L., and Li, H. Local value difference metric, Pattern Recognition Letters, 49, 62–68, (2014).
[10] Li, C., Jiang, L., Li, H., and Wang, S. Attribute weighted value difference metric, In: 2013 IEEE 25th International Conference on Tools with Artificial Intelligence, 575–580, (2013).
[11] Li, C. and Li, H. One dependence value difference metric, Knowledge-Based Systems, 24(5), 589–594, (2011).
[12] Li, Y., Fan, X., and Gaussier, E. Supervised categorical metric learning with Schatten p-norms, IEEE Transactions on Cybernetics, 52(4), 2059–2069, (2020).
[13] Gu, D., Liang, C., and Zhao, H. A case-based reasoning system based on weighted heterogeneous value distance metric for breast cancer diagnosis, Artificial Intelligence in Medicine, 77, 31–47, (2017).
[14] Li, C. and Li, H. Selective value difference metric, Journal of Computers, 8(9), 2232–2238, (2013).
[15] Li, C., Jiang, L., and Li, H. Naive Bayes for value difference metric, Frontiers of Computer Science, 8, 255–264, (2014).
[16] Skowron, A. and Wojna, A. K nearest neighbor classification with local induction of the simple value difference metric, In: Rough Sets and Current Trends in Computing: 4th International Conference, RSCTC 2004, Uppsala, Sweden, June 1-5, 2004, 229–234, (2004).
[17] Ortakaya, A. F. Independently weighted value difference metric, Pattern Recognition Letters, 97, 61–68, (2017).
[18] Liu, F., Vanschoenwinkel, B., Chen, Y., and Manderick, B. A modified value difference metric kernel for context-dependent classification tasks, In: 2006 International Conference on Machine Learning and Cybernetics, 3432–3437, (2006).
[19] Wilson, D. R. and Martinez, T. R. Value difference metrics for continuously valued attributes, In: Proceedings of the International Conference on Artificial Intelligence, Expert Systems and Neural Networks, 11–14, (1996).
[20] Rodriguez, Y., De Baets, B., Garcia, M. M., Morell, C., and Grau, R. A correlation-based distance function for nearest neighbor classification, In: Iberoamerican Congress on Pattern Recognition, 284–291, (2008).
[21] Kurian, M. J. and Gladston, R. S. An analysis on the performance of a classification based outlier detection system using feature selection, International Journal of Computer Applications, 132(8), 15–21, (2015).
[22] Townsend, J. T. Theoretical analysis of an alphabetic confusion matrix, Perception & Psychophysics, 9, 40–50, (1971).
[23] Freund, Y. and Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, 55(1), 119–139, (1997).
[24] Demir, S. and Sahin, E. K. An investigation of feature selection methods for soil liquefaction prediction based on tree-based ensemble algorithms using AdaBoost, gradient boosting, and XGBoost, Neural Computing and Applications, 35(4), 3173–3190, (2023).
[25] Pudjihartono, N., Fadason, T., Kempa-Liehr, A. W., and O'Sullivan, J. M. A review of feature selection methods for machine learning-based disease risk prediction, Frontiers in Bioinformatics, 2, 927312, (2022).
Volume 10, Issue 2
December 2025
Pages 271-286
  • Receive Date: 22 October 2025
  • Revise Date: 30 December 2025
  • Accept Date: 30 December 2025
  • Publish Date: 30 December 2025