Sparse Canonical Correlation Analysis With Preserved Sparsity

Published in IEEE Transactions on Knowledge and Data Engineering, 2025

Canonical correlation analysis (CCA) is a widely used multivariate analysis technique for explaining the relationship between two sets of variables. It does so by finding linear combinations of the variables in each set that are maximally correlated. Recently, under the assumption that the leading canonical directions are sparse, various penalized CCA procedures have been proposed for high-dimensional data applications. However, all these procedures have the drawback of not preserving the sparsity across the retained leading canonical directions. To address this issue, this paper proposes two new sparse CCA methods. The first method is obtained by diagonal thresholding of two square matrices derived from the cross-covariance matrix of the two sets of variables, where each matrix characterizes one set of variables. A model selection criterion is used to select the number of variables to retain from each matrix diagonal. The second method is derived within an adaptive alternating penalized least squares framework, where the ℓ1,2-norm is used as a penalty promoting block sparsity. Compared to existing sparse CCA methods, the proposed methods have the advantage of preserving the sparsity across the retained canonical loading vectors. Their performance is illustrated in an extended experimental study, which shows the superior performance of the proposed methods.
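To make the diagonal thresholding idea concrete, the following is a minimal sketch in Python, not the authors' implementation. It rests on several assumptions that the abstract does not state: the two square matrices are taken to be the products of the sample cross-covariance matrix with its transpose, the numbers of retained variables (`kx`, `ky`) are fixed by hand instead of being chosen by a model selection criterion, and the within-set covariances are ignored, so the loading vectors come from a plain SVD of the reduced cross-covariance matrix. The function name `diag_threshold_cca` and all parameter names are hypothetical.

```python
import numpy as np

def diag_threshold_cca(X, Y, kx, ky, r):
    """Illustrative diagonal-thresholding sparse CCA (assumed form).

    X: (n, p) array, Y: (n, q) array, rows are paired observations.
    kx, ky: numbers of variables kept from each set (assumed fixed here;
            the paper selects them with a model selection criterion).
    r: number of canonical loading vectors, r <= min(kx, ky).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxy = Xc.T @ Yc / (n - 1)            # sample cross-covariance, (p, q)

    # Diagonals of Sxy Sxy^T (p x p) and Sxy^T Sxy (q x q), computed
    # without forming the square matrices explicitly.
    dx = np.sum(Sxy ** 2, axis=1)
    dy = np.sum(Sxy ** 2, axis=0)

    # Keep the variables with the largest diagonal entries.
    ix = np.sort(np.argsort(dx)[-kx:])
    iy = np.sort(np.argsort(dy)[-ky:])

    # SVD of the reduced cross-covariance (assumption: no whitening).
    U, s, Vt = np.linalg.svd(Sxy[np.ix_(ix, iy)], full_matrices=False)

    # Embed the loadings back; rows outside ix / iy stay exactly zero,
    # so every retained loading vector shares the same support.
    A = np.zeros((X.shape[1], r))
    B = np.zeros((Y.shape[1], r))
    A[ix] = U[:, :r]
    B[iy] = Vt[:r].T
    return A, B, ix, iy
```

Because the variables are selected once, before any loading vector is computed, all `r` columns of `A` (and of `B`) have zeros in the same rows. This is the sense in which the sparsity is preserved across the retained loading vectors in this sketch.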