I am a senior undergraduate majoring in Computer Science at California Institute of Technology (Caltech) with a minor in Robotics, advised by Prof. Steven Low and Prof. Günter Niemeyer. I am an incoming PhD student at Carnegie Mellon University (CMU), affiliated with the Language Technologies Institute (LTI) in the School of Computer Science, and the Institute for Computer-Aided Reasoning in Mathematics (ICARM). I am joining MiroMind AI in June, working with Dr. Kaiyu Yang on verifiable reasoning agents.
I build agentic reasoning systems: AI agents that can plan, act, verify, adapt, and improve over time. My work centers on making reasoning both capable and dependable, across formally verified environments and open-ended natural language. A recurring theme is leveraging structure—often neuro-symbolic—to provide control, interpretability, and reliability in increasingly autonomous systems.
(1) Advancing formal reasoning from static models to adaptive agents. In theorem proving, correctness is verifiable—but effective agency is hard. My work advances formal reasoning systems by enabling learning through environment and data infrastructures [LeanDojo], bringing learned capabilities back into the prover via neural-assisted automation [Lean Copilot] and human-centered tooling [Human-AI Formalization], and more recently moving beyond static prediction toward adaptive agents [Adaptation] that operate over evolving libraries [LeanAgent] and long-horizon contexts [LeanProgress].
(2) Diagnosing and strengthening reasoning in natural language. Outside formal systems, correctness guarantees disappear—so reliability must come from principled diagnosis. My work takes a human-grounded, failure-driven approach: connecting classic cognitive phenomena to modern LLM behavior [A-Not-B], examining context sensitivity and cultural fairness in language [Idioms], extending from individual failure modes to questioning foundational assumptions in behavioral evaluation [Personality Illusion] and developing a systematic framework for understanding and mitigating LLM reasoning failures [Reasoning Failures].
(3) Enhancing the efficiency of reasoning systems. I complement algorithmic advances with a system-level perspective, developing architectural and temporal arithmetic techniques for efficient computation [Delay Space][DelayNet].
My long-term goal is to build AI systems whose reasoning is as creative as human intuition and as dependable as formal logic.
We present the first comprehensive survey dedicated to reasoning failures in LLMs. By unifying fragmented research efforts, our survey provides a structured perspective on systemic weaknesses in LLM reasoning, offering valuable insights and guiding future research towards building stronger, more reliable, and robust reasoning capabilities.
Cutting-edge agentic AI systems are built on foundation models that can be adapted to plan, reason, and interact with external tools to perform increasingly complex and specialized tasks. As these systems grow in capability and scope, adaptation becomes a central mechanism for improving performance, reliability, and generalization. In this paper, we unify the rapidly expanding research landscape into a systematic framework that spans both agent adaptations and tool adaptations.
We conduct an initial exploration into people's formalization process with and without AI. We collect more than 80 hours of video from seven participants formalizing informal proofs with and without AI on a range of mathematical problems covering different levels of difficulty and domains. We offer a first characterization of people's formalization process, noting places where AI assistance helps, and a few instances where it may hurt.
LLMs say they have personalities, but they don’t act like it. Alignment today shapes language, not behavior. This linguistic–behavioral dissociation cautions against equating coherent self-reports with cognitive depth.
We introduce Lean Copilot, a framework for running neural network inference directly in Lean. It enables various LLM-based proof automation tools that integrate seamlessly into the workflow of Lean users, including tools for suggesting proof steps (tactics), selecting premises, and searching for complete proofs using LLMs.
What operations can you perform efficiently when you use the “time of arrival” of a signal’s edge to represent a number? We present negative-logarithmic delay space arithmetic as a completely new approach to temporal coding. Under this approach, general purpose arithmetic is transformed to a “soft” version of the standard temporal operations in such a way that preserves all of the algebraic identities.
LeanAgent continuously learns and improves on ever-expanding mathematical knowledge without forgetting what it learned before. It has a curriculum learning strategy that optimizes the learning trajectory in terms of mathematical difficulty, a dynamic database for efficient management of evolving mathematical knowledge, and progressive training to balance stability and plasticity.
LLMs struggling with long proofs? We present LeanProgress, which uses a novel critic model where “distance” to the goal state acts as a key signal for step prediction, boosting performance on neural theorem proving in Lean. Our method achieves 75.1% prediction accuracy, with a 3.8% gain in proof search with step prediction.
Motivated by the crucial cognitive phenomenon of A-not-B errors, we present the first systematic evaluation on the surprisingly vulnerable inhibitory control abilities of LLMs. We reveal that this weakness undermines LLMs' trustworthy reasoning capabilities across diverse domains, and introduce various mitigations.
To compile a dictionary of East Asian idiom translations demands much time and creativity even for expert translators. To alleviate such burden, we automate high-quality data generation with GPT-4, and discover Pareto-optimal prompting strategies on both faithfulness and creativity, outperforming existing translation engines and human baseline.
We introduce energy-efficient convolution that improves the energy per pixel of each convolution frame by more than 2× compared to the state-of-the-art while improving the energy delay product by four orders of magnitude, by developing a new temporal arithmetic with a negative log transformation.
Can LLMs generate mathematical proofs that can be rigorously checked? We release LeanDojo: an open-source playground consisting of toolkits, benchmarks, and models for LLMs to prove formal theorems in the Lean proof assistant.
Selected Media
My research has been covered by many media. Some representative ones include:
I enjoy long walks, running, cycling, and badminton. I also love reading and writing, and sharing good meals with friends. When I’m on vacation, I like traveling, collecting fridge magnets, and doing nature photography — and I finally have time to dive into longer books.
Earlier on, I explored a range of spare-time activities, including math and algorithm contests, debate tournaments, and producing music.
Last updated: Apr. 2026. Website template credit: Jon Barron.