New Balanced Active Learning Model and Optimization Algorithm

Xiaoqian Wang, Yijun Huang, Ji Liu, Heng Huang

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence

In many machine learning applications, unlabeled data are abundant while labels are extremely difficult to acquire. Active learning offers a feasible way to reduce the cost of training a model while maintaining its quality. Instead of acquiring labels for random samples, active learning methods carefully select the data to be labeled, so as to alleviate the impact of redundancy or noise in the selected data and improve the performance of the trained model. For early-stage experimental design, previous active learning methods adopted a data reconstruction framework so that the selected data maintained high representative power. However, these models did not consider the class structure of the data, so the selected samples could be dominated by samples from the major classes. Such a mechanism fails to include samples from the minor classes, and the selection therefore tends to be less "representative". To solve this challenging problem, we propose a novel active learning model for the early stage of experimental design. We use an exclusive sparsity norm to enforce that the selected samples are (roughly) evenly distributed among different groups. We provide a new efficient optimization algorithm and theoretically prove that it attains the optimal convergence rate of O(1/T^2). With a simple substitution, we reduce the computational load of each iteration from O(n^3) to O(n^2), which makes our algorithm more scalable than previous frameworks.
Keywords:
Machine Learning: Active Learning
Machine Learning: Machine Learning
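
The abstract does not spell out the exclusive sparsity norm, so the following is only a minimal sketch under an assumption: it uses the common "exclusive lasso" form, the sum over groups of the squared l1 norm of the weights in each group, which may differ from the paper's exact formulation. The function name `exclusive_sparsity_norm` and the toy weights and group labels are hypothetical and chosen for illustration. The sketch shows why such a penalty favors selections spread across groups: for the same total weight, concentrating the selection in one group costs more than balancing it.

```python
from collections import defaultdict


def exclusive_sparsity_norm(weights, groups):
    """Assumed form: sum over groups of (l1 norm of that group's weights)^2.

    weights: selection weights, one per sample (hypothetical toy values).
    groups:  group label of each sample, same length as weights.
    """
    group_l1 = defaultdict(float)
    for w, g in zip(weights, groups):
        group_l1[g] += abs(w)  # accumulate the l1 norm within each group
    return sum(v * v for v in group_l1.values())  # square, then sum over groups


# Four samples: the first two belong to group 0, the last two to group 1.
groups = [0, 0, 1, 1]

concentrated = [1.0, 1.0, 0.0, 0.0]  # both selected samples come from group 0
balanced = [1.0, 0.0, 1.0, 0.0]      # one selected sample from each group

penalty_concentrated = exclusive_sparsity_norm(concentrated, groups)  # (1 + 1)^2 + 0^2
penalty_balanced = exclusive_sparsity_norm(balanced, groups)          # 1^2 + 1^2

print(penalty_concentrated, penalty_balanced)
```

Both selections have the same total l1 weight, yet under this assumed form the balanced one incurs the smaller penalty, which is the mechanism that pushes selected samples toward an even distribution among groups.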