Regularization and kernelization of the maximin correlation approach
Robust classification becomes challenging when each class consists of multiple subclasses. Examples include multi-font optical character recognition and automated protein function prediction. In correlation-based nearest-neighbor classification, the maximin correlation approach (MCA) provides the worst-case optimal solution by minimizing the maximum misclassification risk through an iterative procedure. Despite this optimality, the original MCA has drawbacks that have limited its wide adoption in practice: it tends to be sensitive to outliers, cannot effectively handle nonlinearities in datasets, and suffers from high computational complexity. To address these limitations, we propose an improved solution, named regularized MCA (R-MCA). We first reformulate MCA as a quadratically constrained linear programming (QCLP) problem, incorporate regularization by introducing slack variables in the primal problem of the QCLP, and derive the corresponding Lagrangian dual. The dual formulation enables us to apply the kernel trick to R-MCA so that it can better handle nonlinearities. Our experimental results demonstrate that regularization and kernelization make the proposed R-MCA more robust and accurate than the original MCA across various classification tasks. Furthermore, when the data size or dimensionality grows, R-MCA runs substantially faster by solving either the primal or the dual of the QCLP, whichever has the smaller variable dimension. The source code of the proposed R-MCA methodology is available at http://data.snu.ac.kr/rmca.
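To make the QCLP formulation concrete, below is a minimal sketch of the maximin correlation template for a single class, written in Python with numpy and cvxpy. This is an illustration under stated assumptions, not the authors' released implementation: the function names maximin_template and kernel_maximin_template are hypothetical, the exemplars are assumed unit-normalized so that inner products equal correlations, and a generic convex solver stands in for the paper's specialized primal/dual procedure.

import numpy as np
import cvxpy as cp

def maximin_template(X):
    # Hypothetical sketch (not the paper's code). X is an (n, d) array of
    # one class's subclass exemplars, each row scaled to unit length so
    # that inner products equal correlations.
    n, d = X.shape
    w = cp.Variable(d)
    t = cp.Variable()
    # QCLP: linear objective (the worst-case correlation t) and linear
    # constraints, plus a single quadratic norm constraint on the template.
    prob = cp.Problem(cp.Maximize(t), [X @ w >= t, cp.sum_squares(w) <= 1])
    prob.solve()
    return w.value / np.linalg.norm(w.value)

def kernel_maximin_template(K):
    # Kernelized variant: expand the template over the exemplars,
    # w = sum_i alpha_i * phi(x_i), so only the (n, n) Gram matrix K of one
    # class's exemplars is needed. K is assumed normalized so K[i, i] == 1.
    n = K.shape[0]
    K = K + 1e-8 * np.eye(n)  # small jitter for numerical positive semidefiniteness
    alpha = cp.Variable(n)
    t = cp.Variable()
    # The correlation of w with exemplar j is (K @ alpha)[j], and the norm
    # constraint ||w||^2 <= 1 becomes alpha^T K alpha <= 1.
    prob = cp.Problem(cp.Maximize(t), [K @ alpha >= t, cp.quad_form(alpha, K) <= 1])
    prob.solve()
    return alpha.value

At test time, a sample is assigned to the class whose template it correlates with most strongly; for instance, with one template per class stacked into a matrix T and a unit-normalized test vector x, numpy.argmax(T @ x) picks the winner. The sketch omits the slack-variable regularization of R-MCA, whose effect would be to let a few outlying exemplars violate the correlation constraints at a penalty.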