Closing the Performance Gap: Data-Centric Fine-Tuning of Vision Language Models for the Standardized Exam Questions

Curated 161.4M-token reasoning corpus with SFT. Increased performance of an open-source model on par with top proprietary models. Accuracy on YKSUniform: 62.46% â 78.59% (rank 3). SFT on 8xH200 GPUs with FSDP.









