ControlMath: Controllable Data Generation Promotes Math Generalist Models

Chen, Nuo; Wu, Ning; Chang, Jianhui; Li, Jia

arXiv:2409.15376 (cs)

[Submitted on 20 Sep 2024]

Abstract: Utilizing large language models (LLMs) for data augmentation has yielded encouraging results in mathematical reasoning. However, these approaches face constraints in problem diversity, potentially restricting them to in-domain/distribution data generation. To this end, we propose ControlMath, an iterative method involving an equation-generator module and two LLM-based agents. The module creates diverse equations, which the Problem-Crafter agent then transforms into math word problems. The Reverse-Agent filters and selects high-quality data, adhering to the "less is more" principle, achieving better results with fewer data points. This approach enables the generation of diverse math problems, not limited to specific domains or distributions. As a result, we collect ControlMathQA, which involves 190k math word problems. Extensive results show that combining our dataset with in-domain datasets like GSM8K can help improve the model's mathematical ability to generalize, leading to improved performance both within and beyond specific domains.
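
The generate, craft, and filter loop described in the abstract can be illustrated with a toy sketch. The abstract does not say how any component is implemented, so everything below is an assumption: the function names, the single-step integer equations, the fixed sentence template standing in for the LLM-based Problem-Crafter, and the rule that the Reverse-Agent keeps a problem only if the equation recovered from its text matches the one it was generated from. No LLM is called.

```python
import random
import re

# Hypothetical toy stand-ins: none of these names or rules come from the paper.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
VERBS = {"+": "gains", "-": "loses", "*": "multiplies her count by"}


def generate_equation(rng):
    """Equation-generator stand-in: sample one single-step integer equation."""
    a, b = rng.randint(1, 50), rng.randint(1, 50)
    op = rng.choice(sorted(OPS))
    return {"a": a, "op": op, "b": b, "answer": OPS[op](a, b)}


def craft_problem(eq):
    """Problem-Crafter stand-in: a fixed template where a real agent would call an LLM."""
    return (f"Ana has {eq['a']} marbles and {VERBS[eq['op']]} {eq['b']}. "
            "How many marbles does she have now?")


def reverse_check(problem, eq):
    """Reverse-Agent stand-in: re-derive the equation from the text and compare."""
    numbers = [int(tok) for tok in re.findall(r"\d+", problem)]
    op = next((sym for sym, verb in VERBS.items() if verb in problem), None)
    if op is None or len(numbers) != 2:
        return False
    return numbers == [eq["a"], eq["b"]] and OPS[op](*numbers) == eq["answer"]


def build_dataset(n, seed=0, max_attempts=1000):
    """Generate, craft, and filter until n problems pass the check."""
    rng = random.Random(seed)
    kept = []
    for _ in range(max_attempts):
        if len(kept) == n:
            break
        eq = generate_equation(rng)
        problem = craft_problem(eq)
        if reverse_check(problem, eq):
            kept.append({"problem": problem, "answer": eq["answer"]})
    return kept
```

Calling `build_dataset(5)` returns five problem and answer pairs. With the template crafter every candidate passes the check; the filter only becomes meaningful once the crafter can produce text that drifts from its source equation, as an LLM might.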

Comments:	17 pages
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Report number:	EMNLP 2024 Main
Cite as:	arXiv:2409.15376 [cs.LG]
	(or arXiv:2409.15376v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2409.15376

Submission history

From: Nuo Chen
[v1] Fri, 20 Sep 2024 03:58:26 UTC (441 KB)
