This is a bibliography of papers presumed to be at least tangentially related to OpenAI’s o1.
Cobbe, Karl, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, et al. 2021. “Training Verifiers to Solve Math Word Problems.” arXiv [Cs.LG]. http://arxiv.org/abs/2110.14168.
Gandhi, Kanishk, Denise Lee, Gabriel Grand, Muxin Liu, Winson Cheng, Archit Sharma, and Noah D Goodman. 2024. “Stream of Search (SoS): Learning to Search in Language.” arXiv [Cs.LG]. http://arxiv.org/abs/2404.03683.
Kazemnejad, Amirhossein, Milad Aghajohari, Eva Portelance, Alessandro Sordoni, Siva Reddy, Aaron Courville, and Nicolas Le Roux. 2024. “VinePPO: Unlocking RL Potential for LLM Reasoning Through Refined Credit Assignment.” arXiv [Cs.LG]. http://arxiv.org/abs/2410.01679.
Kirchner, Jan Hendrik, Yining Chen, Harri Edwards, Jan Leike, Nat McAleese, and Yuri Burda. 2024. “Prover-Verifier Games Improve Legibility of LLM Outputs.” arXiv [Cs.CL]. http://arxiv.org/abs/2407.13692.
Kumar, Aviral, Vincent Zhuang, Rishabh Agarwal, Yi Su, John D Co-Reyes, Avi Singh, Kate Baumli, et al. 2024. “Training Language Models to Self-Correct via Reinforcement Learning.” arXiv [Cs.LG]. http://arxiv.org/abs/2409.12917.
Lightman, Hunter, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. 2023. “Let’s Verify Step by Step.” arXiv [Cs.LG]. http://arxiv.org/abs/2305.20050.
Snell, Charlie, Jaehoon Lee, Kelvin Xu, and Aviral Kumar. 2024. “Scaling LLM Test-Time Compute Optimally Can Be More Effective Than Scaling Model Parameters.” arXiv [Cs.LG]. http://arxiv.org/abs/2408.03314.
Uesato, Jonathan, Nate Kushman, Ramana Kumar, Francis Song, Noah Siegel, Lisa Wang, Antonia Creswell, Geoffrey Irving, and Irina Higgins. 2022. “Solving Math Word Problems with Process- and Outcome-Based Feedback.” arXiv [Cs.LG]. http://arxiv.org/abs/2211.14275.
Wang, Junlin, Jue Wang, Ben Athiwaratkun, Ce Zhang, and James Zou. 2024. “Mixture-of-Agents Enhances Large Language Model Capabilities.” arXiv [Cs.CL]. http://arxiv.org/abs/2406.04692.
Wang, Xuezhi, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2022. “Self-Consistency Improves Chain of Thought Reasoning in Language Models.” arXiv [Cs.CL]. http://arxiv.org/abs/2203.11171.
Wu, Tianhao, Janice Lan, Weizhe Yuan, Jiantao Jiao, Jason Weston, and Sainbayar Sukhbaatar. 2024. “Thinking LLMs: General Instruction Following with Thought Generation.” arXiv [Cs.CL]. http://arxiv.org/abs/2410.10630.
Wu, Yangzhen, Zhiqing Sun, Shanda Li, Sean Welleck, and Yiming Yang. 2024. “Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models.” arXiv [Cs.AI]. http://arxiv.org/abs/2408.00724.
Xie, Yuxi, Kenji Kawaguchi, Yiran Zhao, Xu Zhao, Min-Yen Kan, Junxian He, and Qizhe Xie. 2023. “Self-Evaluation Guided Beam Search for Reasoning.” arXiv [Cs.CL]. http://arxiv.org/abs/2305.00633.
Xie, Yuxi, Anirudh Goyal, Wenyue Zheng, Min-Yen Kan, Timothy P Lillicrap, Kenji Kawaguchi, and Michael Shieh. 2024. “Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning.” arXiv [Cs.AI]. http://arxiv.org/abs/2405.00451.
Yoshida, Davis, Kartik Goyal, and Kevin Gimpel. 2024. “MAP’s Not Dead Yet: Uncovering True Language Model Modes by Conditioning Away Degeneracy.” In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 16164–215. Stroudsburg, PA, USA: Association for Computational Linguistics. https://aclanthology.org/2024.acl-long.855.pdf.
Zelikman, Eric, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, and Noah D Goodman. 2024. “Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking.” arXiv [Cs.CL]. http://arxiv.org/abs/2403.09629.
Zelikman, Eric, Yuhuai Wu, Jesse Mu, and Noah D Goodman. 2022. “STaR: Bootstrapping Reasoning with Reasoning.” arXiv [Cs.LG]. http://arxiv.org/abs/2203.14465.