Peiyang Song

I am a 4th-year undergrad majoring in Computer Science at California Institute of Technology (Caltech), advised by Prof. Steven Low, with a minor in Robotics advised by Prof. Günter Niemeyer. I am a researcher in Berkeley AI Research (BAIR) Lab, working with Prof. Dawn Song and Dr. Jingxuan He. I am also part of Stanford AI Lab (SAIL), advised by Prof. Noah Goodman and Dr. Gabriel Poesia. I am fortunate to have worked with Prof. Anima Anandkumar (Caltech), Dr. Kaiyu Yang (Meta), Prof. Tim Sherwood (UC Santa Barbara), and Dr. Jeremy Lau (Google) during my undergrad.

I am currently applying for 26Fall PhD positions in Computer Science. Please feel free to explore my research interest and work below. Would love to connect if you see potential fits!

宋沛洋  /  Email  /  CV  /  Bio  /  Google Scholar  /  GitHub  /  LinkedIn  /  Twitter

profile photo
Recent News

[Oct 2025] I am giving an invited talk at CMU on LLM Reasoning for Math and Code. Thank you for having me!
[Sep 2025] Our work on The Personality Illusion is accepted to NeurIPS LAW Workshop. See you in San Diego!
[Sep 2025] We released our work discovering The Personality Illusion: LLMs do not have personalities in the way humans do.
[Jul 2025] Our paper on Delay Space Arithmetic and Architecture is awarded IEEE Micro Top Picks.
[Jul 2025] Our paper on LLM Reasoning Failures is accepted to ICML AI for Math Workshop. Stay tuned for our full release!

Research

My research centers on LLM reasoning, agentic AI, and neuro-symbolic AI. This agenda unfolds along one central axis and extends to two broader directions:

  • Central axis (AI for Math & Code): Integrating LLM reasoning, agentic AI, and neuro-symbolic methods by combining neural models (LLMs) with formal symbolic systems (such as Lean) to advance LLM-based agents for formal reasoning in math and code [LeanDojo], in a practical [Lean Copilot], predictable [LeanProgress], and extendable [LeanAgent] manner.
  • Broader LLM reasoning: Extending LLM reasoning beyond formal domains, exploring how LLMs can tackle informal reasoning in natural language, inspired by cognitive science principles and studies of human-like reasoning [Personality Illusion] [A-Not-B] [LLMRF].
  • Broader neuro-symbolic AI: Building general neuro-symbolic approaches towards fundamental AI systems beyond formal reasoning, such as developing energy-efficient inference [Delay Space] [DelayNet] and machine translation across idioms and languages [Idioms].

If any of these topics resonate with your interests, feel free to reach out and let's connect/collaborate!

A Survey on Large Language Model Reasoning Failures
Peiyang Song*, Pengrui Han*, and Noah Goodman (* Equal Contribution)
ICML AI for Math Workshop, 2025
preprint / full release coming soon

We present the first comprehensive survey dedicated to reasoning failures in LLMs. By unifying fragmented research efforts, our survey provides a structured perspective on systemic weaknesses in LLM reasoning, offering valuable insights and guiding future research towards building stronger, more reliable, and robust reasoning capabilities.

The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs
Pengrui Han*, Rafal D. Kocielnik*, Peiyang Song, Ramit Debnath, Dean Mobbs, Anima Anandkumar, and R. Michael Alvarez (* Equal Contribution)
NeurIPS LAW Workshop: Bridging Language, Agent, and World Models, 2025
arXiv / project / code

LLMs say they have personalities, but they don’t act like it. Alignment today shapes language, not behavior. This linguistic–behavioral dissociation cautions against equating coherent self-reports with cognitive depth.

Delay Space Arithmetic and Architecture
Rhys Gretsch, Peiyang Song, Advait Madhavan, Jeremy Lau, and Tim Sherwood
IEEE Micro, 2025, Top Pick Award
proceeding

What operations can you perform efficiently when you use the “time of arrival” of a signal’s edge to represent a number? We present negative-logarithmic delay space arithmetic as a completely new approach to temporal coding. Under this approach, general purpose arithmetic is transformed to a “soft” version of the standard temporal operations in such a way that preserves all of the algebraic identities.

LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction
Suozhi Huang, Peiyang Song, Robert Joseph George, and Anima Anandkumar
ICLR VerifAI: AI Verification in the Wild Workshop, 2025
arXiv / project

LLMs struggling with long proofs? We present LeanProgress, which uses a novel critic model where “distance” to the goal state acts as a key signal for step prediction, boosting performance on neural theorem proving in Lean. Our method achieves 75.1% prediction accuracy, with a 3.8% gain in proof search with step prediction.

LeanAgent: Lifelong Learning for Formal Theorem Proving
Adarsh Kumarappan, Mo Tiwari, Peiyang Song, Robert Joseph George, Chaowei Xiao, and Anima Anandkumar
International Conference on Learning Representations (ICLR), 2025
arXiv / project / code / proceeding / media

LeanAgent continuously learns and improves on ever-expanding mathematical knowledge without forgetting what it learned before. It has a curriculum learning strategy that optimizes the learning trajectory in terms of mathematical difficulty, a dynamic database for efficient management of evolving mathematical knowledge, and progressive training to balance stability and plasticity.

Lean Copilot: Large Language Models as Copilots for Theorem Proving in Lean
Peiyang Song, Kaiyu Yang, and Anima Anandkumar
International Conference on Neuro-Symbolic Systems (NeuS), 2025
1.2k+ stars on Github, ranking 2nd (only after Mathlib4) among all Lean projects
arXiv / project / code / proceeding / poster / demo / slides / tutorial / media

We introduce Lean Copilot, a framework for running neural network inference directly in Lean. It enables various LLM-based proof automation tools that integrate seamlessly into the workflow of Lean users, including tools for suggesting proof steps (tactics), selecting premises, and searching for complete proofs using LLMs.

Creative and Context-Aware Translation of East Asian Idioms with GPT-4
Kenan Tang*, Peiyang Song*, Yao Qin, and Xifeng Yan (* Equal Contribution)
Findings of Empirical Methods in Natural Language Processing (EMNLP), 2024
arXiv / code / proceeding / demo

To compile a dictionary of East Asian idiom translations demands much time and creativity even for expert translators. To alleviate such burden, we automate high-quality data generation with GPT-4, and discover Pareto-optimal prompting strategies on both faithfulness and creativity, outperforming existing translation engines and human baseline.

In-Context Learning May Not Elicit Trustworthy Reasoning: A-Not-B Errors in Pretrained Language Models
Pengrui Han*, Peiyang Song*, Haofei Yu, and Jiaxuan You (* Equal Contribution)
Findings of Empirical Methods in Natural Language Processing (EMNLP), 2024
arXiv / code / proceeding

Motivated by the crucial cognitive phenomenon of A-not-B errors, we present the first systematic evaluation on the surprisingly vulnerable inhibitory control abilities of LLMs. We reveal that this weakness undermines LLMs' trustworthy reasoning capabilities across diverse domains, and introduce various mitigations.

Energy Efficient Convolutions with Temporal Arithmetic
Rhys Gretsch, Peiyang Song, Advait Madhavan, Jeremy Lau, and Tim Sherwood
ACM Int'l Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024
proceeding

We introduce energy-efficient convolution that improves the energy per pixel of each convolution frame by more than 2× compared to the state-of-the-art while improving the energy delay product by four orders of magnitude, by developing a new temporal arithmetic with a negative log transformation.

LeanDojo: Theorem Proving with Retrieval-Augmented Language Models
Kaiyu Yang, Aidan Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan Prenger, and Anima Anandkumar
Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track, 2023, Oral Presentation
arXiv / project / code / dataset / model / poster / proceeding / media

Can LLMs generate mathematical proofs that can be rigorously checked? We release LeanDojo: an open-source playground consisting of toolkits, benchmarks, and models for LLMs to prove formal theorems in the Lean proof assistant.

Selected Media

My works have been covered by many media. Some representative ones include:

Selected Awards
  • Caltech FCC Appreciation Award (2025)
  • ICLR Notable Reviewer Award (2025)
  • George W. Housner Student Discovery Fund (2025)
  • IEEE Micro Top Pick Award (2025)
  • Early Research Scholarship (2023)
  • Caltech SURF Award (2023)
Teaching
Academic Services
  • Reviewer @ NeurIPS, ICLR, ARR, ACL, EMNLP, IJCNLP, AACL, etc.
  • Admissions Ambassador @ Undergraduate Admissions Office, Caltech.
  • First-Year Caltech Connector (FCC) @ Student & Family Engagement Office, Caltech.
  • Organizing Staff @ Agentic AI Summit 2025, UC Berkeley.

Last updated: Sep. 2025. Website template credit: Jon Barron.