Peiyang Song

I am a third-year undergraduate studying Computer Science at California Institute of Technology (Caltech), advised by Prof. Steven Low, with a minor in Robotics advised by Prof. Günter Niemeyer. I am a researcher in Berkeley AI Research (BAIR) Lab, advised by Prof. Dawn Song. I also work in Stanford AI Lab (SAIL), advised by Prof. Noah Goodman and Dr. Gabriel Poesia in the Computation & Cognition Lab (CoCoLab). I have been fortunate to work with Prof. Anima Anandkumar (Caltech), Dr. Kaiyu Yang (Meta), Prof. Tim Sherwood (UC Santa Barbara), and Dr. Jeremy Lau (Google) during my undergrad.

宋沛洋 / Email / CV / Bio / Google Scholar / GitHub / LinkedIn / Twitter

News

[June 2025] One paper accepted to ICML Workshop on Models of Human Feedback for AI Alignment (MoFA). Stay tuned for our release!
[June 2025] I am featured by Caltech in the Division of Engineering and Applied Science (EAS) Newsletter Spotlight. Thank you!
[June 2025] I am back in the bay this summer from mid-June to late-September. Hit me up if you are around!
[May 2025] Attending NeuS 2025 in Philadelphia, PA from 5/27 to 5/30, presenting Lean Copilot.
[May 2025] Honored to be recognized by ICLR 2025 as a notable reviewer.

Research

My current research interest is mainly in LLM reasoning, especially neuro-symbolic AI for formal math and verifiable code generation. In the past, I also published on neuro-symbolic methods for energy-efficient ML systems and neural machine translation.

	LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction Suozhi Huang, Peiyang Song, Robert Joseph George, and Anima Anandkumar ICLR VerifAI: AI Verification in the Wild Workshop, 2025 arXiv LLMs struggling with long proofs? We present LeanProgress, which uses a novel critic model where “distance” to the goal state acts as a key signal for step prediction, boosting performance on neural theorem proving in Lean. Our method achieves 75.1% prediction accuracy, with a 3.8% gain in proof search with step prediction.
	LeanAgent: Lifelong Learning for Formal Theorem Proving Adarsh Kumarappan, Mo Tiwari, Peiyang Song, Robert Joseph George, Chaowei Xiao, and Anima Anandkumar International Conference on Learning Representations (ICLR), 2025 arXiv / code / proceeding LeanAgent continuously learns and improves on ever-expanding mathematical knowledge without forgetting what it learned before. It has a curriculum learning strategy that optimizes the learning trajectory in terms of mathematical difficulty, a dynamic database for efficient management of evolving mathematical knowledge, and progressive training to balance stability and plasticity.
	Lean Copilot: Large Language Models as Copilots for Theorem Proving in Lean Peiyang Song, Kaiyu Yang, and Anima Anandkumar International Conference on Neuro-Symbolic Systems (NeuS), 2025 1.1k+ stars on Github, ranking 2nd (only after Mathlib4) among all Lean repositories arXiv / code / poster / demo / slides / tutorial / media We introduce Lean Copilot, a framework for running neural network inference directly in Lean. It enables various LLM-based proof automation tools that integrate seamlessly into the workflow of Lean users, including tools for suggesting proof steps (tactics), selecting premises, and searching for complete proofs using LLMs.
	Creative and Context-Aware Translation of East Asian Idioms with GPT-4 Kenan Tang, Peiyang Song*, Yao Qin, and Xifeng Yan ( Equal Contribution) Findings of Empirical Methods in Natural Language Processing (EMNLP), 2024 arXiv / code / proceeding To compile a dictionary of East Asian idiom translations demands much time and creativity even for expert translators. To alleviate such burden, we automate high-quality data generation with GPT-4, and discover Pareto-optimal prompting strategies on both faithfulness and creativity, outperforming existing translation engines and human baseline.
	In-Context Learning May Not Elicit Trustworthy Reasoning: A-Not-B Errors in Pretrained Language Models Pengrui Han, Peiyang Song*, Haofei Yu, and Jiaxuan You ( Equal Contribution) Findings of Empirical Methods in Natural Language Processing (EMNLP), 2024 arXiv / code / proceeding Motivated by the crucial cognitive phenomenon of A-not-B errors, we present the first systematic evaluation on the surprisingly vulnerable inhibitory control abilities of LLMs. We reveal that this weakness undermines LLMs' trustworthy reasoning capabilities across diverse domains, and introduce various mitigations.
	Energy Efficient Convolutions with Temporal Arithmetic Rhys Gretsch, Peiyang Song, Advait Madhavan, Jeremy Lau, and Tim Sherwood ACM Int'l Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024; IEEE Micro Top Pick 2025 proceeding We introduce energy-efficient convolution that improves the energy per pixel of each convolution frame by more than 2× compared to the state-of-the-art while improving the energy delay product by four orders of magnitude, by developing a new temporal arithmetic with a negative log transformation.
	LeanDojo: Theorem Proving with Retrieval-Augmented Language Models Kaiyu Yang, Aidan Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan Prenger, and Anima Anandkumar Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track, 2023, Oral Presentation arXiv / project / code / poster / slides / proceeding / media Can LLMs generate mathematical proofs that can be rigorously checked? We release LeanDojo: an open-source playground consisting of toolkits, benchmarks, and models for LLMs to prove formal theorems in the Lean proof assistant.