DeepSeek Prover V2
Z.Z. Ren*, Zhihong Shao*, Junxiao Song*, Huajian Xin†, Haocheng Wang†, Wanjia Zhao†, Liyue Zhang, Zhe Fu,
Qihao Zhu, Dejian Yang, Z.F. Wu, Zhibin Gou, Shirong Ma, Hongxuan Tang, Yuxuan Liu, Wenjun Gao,
Daya Guo, Chong Ruan
DeepSeek-AI
https://github.com/deepseek-ai/DeepSeek-Prover-V2
arXiv:2504.21801v1 [cs.CL] 30 Apr 2025
Abstract
[Figure 1: bar charts with panels MiniF2F-test, PutnamBench, and ProverBench-AIME 24&25. Axis labels: Pass Rate (%) and Number of Problems Solved (out of 15).]
Figure 1 | Benchmark performance of DeepSeek-Prover-V2. On the AIME benchmark, DeepSeek-V3
is evaluated using the standard find-answer task for natural-language reasoning, while
prover models generate Lean code to construct formal proofs for a given correct answer.
[Figure 2 panel contents, reconstructed from the two-column layout]

Problem
Show that for any integer n ≥ 4, we have n² ≤ n!.

    theorem induction_ineq_nsqlefactn (n : ℕ) (h₀ : 4 ≤ n) : n ^ 2 ≤ n ! := sorry

Chain-of-thought by DeepSeek-V3

Understanding the Problem
We need to show that for any integer n ≥ 4, the inequality n² ≤ n! holds. Here,
n! denotes the factorial of n, which is the product of all positive integers up to n.

Proof Sketch
To prove this formally, we can use mathematical induction. Here's how the
induction would work:
1. Base Case: Verify the inequality for n = 4.
2. Inductive Step: Assume the inequality holds for some k ≥ 4, i.e., k² ≤ k!.
   Then show that it holds for k + 1, i.e., (k + 1)² ≤ (k + 1)!.

Lean 4 Proof Structure

    theorem induction_ineq_nsqlefactn (n : ℕ) (h₀ : 4 ≤ n) : n ^ 2 ≤ n ! := by
      have base_case : 4 ^ 2 ≤ 4 ! := by
        sorry
      have inductive_step : ∀ k ≥ 4, k ^ 2 ≤ k ! → (k + 1) ^ 2 ≤ (k + 1) ! := by
        sorry
      have final_proof : ∀ n ≥ 4, n ^ 2 ≤ n ! := by
        sorry
      sorry

Recursively Solve Subgoals by DeepSeek-Prover-V2-7B

    lemma inductive_step (n : ℕ) (h₀ : 4 ≤ n) (base_case : 4 ^ 2 ≤ 4 !)
        : ∀ k ≥ 4, k ^ 2 ≤ k ! → (k + 1) ^ 2 ≤ (k + 1) ! := by
      sorry

Synthesize into Complete Formal Proofs

    theorem induction_ineq_nsqlefactn (n : ℕ) (h₀ : 4 ≤ n) : n ^ 2 ≤ n ! := by
      have base_case : 4 ^ 2 ≤ 4 ! := by
        simp [Nat.factorial]
      have inductive_step : ∀ k ≥ 4, k ^ 2 ≤ k ! → (k + 1) ^ 2 ≤ (k + 1) ! := by
        intro k h₁ h₂
        simp_all [Nat.factorial]
        nlinarith
      have final_proof : ∀ n ≥ 4, n ^ 2 ≤ n ! := by
        intro n hn
        induction' hn with k hk
        case refl => exact base_case
        case step =>
          apply inductive_step k hk
          exact by assumption
      apply final_proof
      exact h₀
the connection between informal mathematical reasoning and formal proof construction. Our
experiments show that reinforcement learning starting from the cold start of informal reasoning
in task decomposition significantly enhances the model’s capabilities in formal theorem proving.
The resulting DeepSeek-Prover-V2-671B model establishes a new state-of-the-art in neural
theorem proving across multiple benchmarks. On MiniF2F-test, it achieves 82.4% accuracy
with Pass@32, improving to 88.9% with Pass@8192. The model shows strong generalization
capabilities to college-level theorem proving, solving 37.1% of ProofNet-test problems with
Pass@1024 and tackling 49 out of 658 challenging PutnamBench problems. Additionally, we
contribute ProverBench, a benchmark dataset containing 325 formalized problems to advance
neural theorem proving research, including 15 from the prestigious AIME competitions (years
24-25). DeepSeek-Prover-V2-671B successfully solves 6 of these 15 challenging AIME problems,
further demonstrating its sophisticated mathematical reasoning capabilities.
2. Method
Decomposing the proof of a complex theorem into a sequence of smaller lemmas that serve as
stepping stones is an effective strategy commonly employed by human mathematicians. Several
previous studies have explored this hierarchical strategy in the context of neural theorem
proving, aiming to enhance proof search efficiency by leveraging the informal reasoning
capabilities of modern LLMs (Jiang et al., 2023; Zhao et al., 2023; Wang et al., 2024a; Dong
et al., 2024). In this paper, we develop a simple yet effective pipeline that utilizes
DeepSeek-V3 (DeepSeek-AI, 2024) as a unified tool for subgoal decomposition in formal theorem
proving.
[Figure 3 panel contents, reconstructed from the two-column layout]

Subgoal Decomposition

    theorem imo_1974_p5 (a b c d s : ℝ) (h₀ : 0 < a ∧ 0 < b ∧ 0 < c ∧ 0 < d)
        (h₁ : s = a / (a + b + d) + b / (a + b + c) + c / (b + c + d) + d / (a + c + d)) :
        1 < s ∧ s < 2 := by
      have term1_pos : 0 < a / (a + b + d) := by sorry
      have term1_less1 : a / (a + b + d) < 1 := by sorry
      have term2_pos : 0 < b / (a + b + c) := by sorry
      have term2_less1 : b / (a + b + c) < 1 := by sorry
      have term3_pos : 0 < c / (b + c + d) := by sorry
      have term3_less1 : c / (b + c + d) < 1 := by sorry
      have term4_pos : 0 < d / (a + c + d) := by sorry
      have term4_less1 : d / (a + c + d) < 1 := by sorry
      have lower_bound : 1 < s := by sorry
      have upper_bound : s < 2 := by sorry
      sorry

(a) Substitute the original goal

    lemma lower_bound (a b c d s : ℝ) (h₀ : 0 < a ∧ 0 < b ∧ 0 < c ∧ 0 < d)
        (h₁ : s = a / (a + b + d) + b / (a + b + c) + c / (b + c + d) + d / (a + c + d)) :
        1 < s := by
      sorry

(b) Incorporate preceding subgoals as premises

    lemma lower_bound (a b c d s : ℝ) (h₀ : 0 < a ∧ 0 < b ∧ 0 < c ∧ 0 < d)
        (h₁ : s = a / (a + b + d) + b / (a + b + c) + c / (b + c + d) + d / (a + c + d))
        (term1_pos : 0 < a / (a + b + d)) (term1_less1 : a / (a + b + d) < 1)
        (term2_pos : 0 < b / (a + b + c)) (term2_less1 : b / (a + b + c) < 1)
        (term3_pos : 0 < c / (b + c + d)) (term3_less1 : c / (b + c + d) < 1)
        (term4_pos : 0 < d / (a + c + d)) (term4_less1 : d / (a + c + d) < 1) :
        1 < s := by
      sorry
Sketching Formal Proofs from Natural Language Reasoning. Recent advances in large language
models have led to significant breakthroughs in informal reasoning capabilities. To bridge
the gap between formal and informal reasoning, we leverage cutting-edge general-purpose
LLMs, recognized for their strong mathematical reasoning and instruction-following abilities, to
construct the foundational framework of our theorem-proving system. Our findings indicate
that off-the-shelf models, such as DeepSeek-V3 (DeepSeek-AI, 2024), are capable of decomposing
proof steps and expressing them in formal languages. To prove a given formal theorem
statement, we prompt DeepSeek-V3 to first analyze the mathematical problem in natural language,
then decompose the proof into smaller steps, translating each step into a corresponding
Lean formal statement. Since general-purpose models are known to struggle with producing
complete Lean proofs, we instruct DeepSeek-V3 to generate only a high-level proof sketch with
the details omitted. The resulting chain of thought culminates in a Lean theorem composed of a
sequence of have statements, each concluded with a sorry placeholder indicating a subgoal to
be solved. This approach mirrors the human style of proof construction, in which a complex
theorem is incrementally reduced to a sequence of more manageable lemmas.
Curriculum Learning for Subgoal-based Theorem Proving. The training of prover models
requires large formal-language problem sets, typically derived by formalizing existing natural-
language mathematical corpora (Xin et al., 2024a; Ying et al., 2024; Lin et al., 2025). Although
formalization of human-authored texts provides high-quality and diverse formal content, the
resulting training signals for prover models are often sparse, as a large proportion of
computational attempts do not yield successful proofs and therefore offer no positive reward signals.
To generate denser training signals, Dong and Ma (2025) proposed a self-play approach that
enriches training problem sets by generating tractable conjectures closely related to the original
theorem statements, thereby enabling more efficient use of training compute. In this paper, we
implement a straightforward approach that leverages subgoals to expand the scope of formal
statements used for model training. We generate two types of subgoal theorems, one
incorporating preceding subgoals as premises and one without, corresponding to Figure 3(b) and
Figure 3(a), respectively. Both types are integrated into the expert iteration stage (Polu and
Sutskever, 2020), establishing a curriculum that progressively guides the prover model toward
systematically addressing a curated subset of challenging problems. This procedure builds on
the same underlying principle as AlphaProof’s test-time reinforcement learning (DeepMind,
2024), wherein variations of a target problem are generated to enhance the model’s capability in
solving challenging IMO-level problems.
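To make the two subgoal variants concrete, here is a minimal Python sketch that assembles both lemma statements from a decomposed proof sketch. The `Subgoal` class, the `make_lemmas` function, and the string templates are hypothetical names introduced for illustration only; Lean statements are treated as opaque strings, and nothing here is taken from the released pipeline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Subgoal:
    name: str       # label of the `have` statement, e.g. "term_pos"
    statement: str  # Lean proposition, e.g. "0 < a / (a + b)"


def make_lemmas(binders: str, subgoals: List[Subgoal], index: int) -> Tuple[str, str]:
    """Build both lemma variants for subgoals[index].

    The first variant keeps only the original hypotheses; the second also
    adds every preceding subgoal as an extra premise.
    """
    target = subgoals[index]
    plain = f"lemma {target.name} {binders} :\n  {target.statement} := by\n  sorry"
    premises = " ".join(f"({g.name} : {g.statement})" for g in subgoals[:index])
    extended = f"{binders} {premises}".rstrip()
    with_premises = (
        f"lemma {target.name} {extended} :\n  {target.statement} := by\n  sorry"
    )
    return plain, with_premises


# Toy decomposition (assumed for illustration, not from the paper).
binders = "(a b : ℝ) (h₀ : 0 < a ∧ 0 < b)"
goals = [
    Subgoal("term_pos", "0 < a / (a + b)"),
    Subgoal("term_lt_one", "a / (a + b) < 1"),
]
variant_a, variant_b = make_lemmas(binders, goals, 1)
print(variant_a)
print(variant_b)
```

For the first subgoal in a sketch there are no preceding subgoals, so the two variants coincide; the variant with premises only becomes easier than the plain one further down the sketch.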
The algorithmic framework discussed above operates in two stages, leveraging two complementary
models: DeepSeek-V3 for lemma decomposition and a 7B prover model to complete
the corresponding formal proof details. This pipeline provides a natural and scalable mechanism
for synthesizing formal reasoning data by combining high-level reasoning from language
models with precise formal verification. In this manner, we unify the capabilities of informal
mathematical reasoning and proof formalization within a single model.
Cold Start by Synthetic Data. We curate a subset of challenging problems that remain unsolved
by the 7B prover model in an end-to-end manner, but for which all decomposed subgoals have
been successfully resolved. By composing the proofs of all subgoals, we construct a complete
formal proof for the original problem. This proof is then appended to DeepSeek-V3’s
chain-of-thought, which outlines the corresponding lemma decomposition, thereby producing a cohesive
synthesis of informal reasoning and subsequent formalization. This procedure yields
hundreds of high-quality synthetic cold-start examples, which serve as the foundation for training
DeepSeek-Prover-V2. This cold-start dataset generation strategy differs from that of
Kimina-Prover (Wang et al., 2025), a concurrent work on formal reasoning models. Specifically, we
synthesize data by formalizing natural-language proofs directly into structured formal proof
sketches. In contrast, Kimina-Prover adopts a reverse workflow: it begins by collecting complete
formal proofs alongside their informal counterparts, then uses general-purpose reasoning
models to retrosynthesize intermediate natural-language reasoning steps into coherent thinking
blocks.
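The selection rule described above can be sketched in a few lines of Python. The `Problem` record and `build_cold_start` function are hypothetical names used for illustration; in particular, joining the subgoal proofs with newlines is a simplification that stands in for composing them into a single Lean proof of the original theorem.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Problem:
    name: str
    chain_of_thought: str                     # informal reasoning plus proof sketch
    end_to_end_proof: Optional[str]           # None if the prover failed directly
    subgoal_proofs: Dict[str, Optional[str]]  # subgoal name -> proof, or None if open


def build_cold_start(problems: List[Problem]) -> List[str]:
    """Keep problems that are unsolved end-to-end but have every subgoal solved."""
    examples = []
    for p in problems:
        if p.end_to_end_proof is not None:
            continue  # solvable directly, so not one of the hard problems
        if not p.subgoal_proofs or any(v is None for v in p.subgoal_proofs.values()):
            continue  # at least one subgoal is still open
        # Simplification: concatenate subgoal proofs in sketch order.
        full_proof = "\n".join(p.subgoal_proofs.values())
        examples.append(p.chain_of_thought + "\n" + full_proof)
    return examples


problems = [
    Problem("easy", "cot-easy", "direct proof", {"s1": "proof-s1"}),
    Problem("hard_solved", "cot-hard", None, {"s1": "proof-s1", "s2": "proof-s2"}),
    Problem("hard_open", "cot-open", None, {"s1": "proof-s1", "s2": None}),
]
cold_start = build_cold_start(problems)
print(cold_start)
```

Only `hard_solved` survives the filter in this toy input: it fails end-to-end, yet every one of its subgoals has a proof.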
2.3. Training Details of DeepSeek-Prover-V2
Consistent with DeepSeek-Prover-V1.5 (Xin et al., 2024b), the two generation modes, non-CoT
and chain-of-thought (CoT), are governed by two distinct guiding prompts (see Appendix A for
examples). In the first stage, we employ expert iteration within a curriculum learning
framework to train a non-CoT prover model while synthesizing proofs for hard problems through
subgoal-based recursive proving. The non-CoT generation mode is chosen to accelerate iterative
training and data collection, as it offers significantly faster inference and validation
cycles. Building on this foundation, the second stage leverages cold-start CoT data synthesized
by integrating DeepSeek-V3’s sophisticated mathematical reasoning patterns with our synthetic
formal proofs. The CoT mode is enhanced through a further reinforcement learning stage,
following the standard training pipeline commonly used for reasoning models.
Expert Iteration. The training procedure for the non-CoT mode of DeepSeek-Prover-V2 follows
the paradigm of expert iteration (Polu and Sutskever, 2020), a widely adopted framework for
developing formal theorem provers. In each training iteration, the current best prover policy is
used to generate proof attempts for the challenging problems that remained unsolved in prior
iterations. Successful attempts, verified by the Lean proof assistant, are incorporated into
the SFT dataset to train an improved model. This iterative loop ensures that the model not
only learns from the initial demonstration datasets but also distills its own successful reasoning
traces, progressively refining its ability to solve harder problems. The overall training procedure
remains largely aligned with that of DeepSeek-Prover-V1 (Xin et al., 2024a) and
DeepSeek-Prover-V1.5 (Xin et al., 2024b), with only two modifications to the distribution of training
problems. First, we incorporate additional problems derived from autoformalization and various
open-source datasets (Ying et al., 2024; Dong and Ma, 2025; Lin et al., 2025), broadening the coverage
of the training problem domains. Second, we augment the dataset with problems generated
through subgoal decomposition, aiming at solving more challenging instances from the valid
split of the MiniF2F benchmark (Zheng et al., 2022).
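The loop described above can be sketched as follows. This is a toy illustration under stated assumptions: `expert_iteration`, `make_toy_generator`, and `toy_verify` are hypothetical names, the generator and verifier are stand-ins for the prover policy and the Lean checker, and the fine-tuning step is left as a comment because it depends on the training stack.

```python
import itertools
from typing import Callable, List, Set, Tuple


def expert_iteration(
    problems: List[str],
    generate: Callable[[str], str],
    verify: Callable[[str, str], bool],
    attempts_per_problem: int,
    num_iterations: int,
) -> Tuple[List[Tuple[str, str]], Set[str]]:
    """Collect verified proofs for problems that are still unsolved."""
    sft_data: List[Tuple[str, str]] = []
    solved: Set[str] = set()
    for _ in range(num_iterations):
        for problem in problems:
            if problem in solved:
                continue  # only unsolved problems are attempted again
            for _ in range(attempts_per_problem):
                proof = generate(problem)
                if verify(problem, proof):
                    sft_data.append((problem, proof))
                    solved.add(problem)
                    break
        # A real pipeline would fine-tune the policy on sft_data here,
        # so that `generate` improves between iterations.
    return sft_data, solved


def make_toy_generator() -> Callable[[str], str]:
    # Stand-in policy: alternates between a failing and a passing candidate.
    candidates = itertools.cycle(["sorry", "rfl"])
    return lambda problem: next(candidates)


def toy_verify(problem: str, proof: str) -> bool:
    # Stand-in verifier: "rfl" closes every problem except the one named "hard".
    return proof == "rfl" and problem != "hard"


sft_data, solved = expert_iteration(
    ["easy_1", "easy_2", "hard"], make_toy_generator(), toy_verify,
    attempts_per_problem=2, num_iterations=2,
)
print(sft_data, solved)
```

The `hard` problem is retried in every iteration and never enters the dataset, which mirrors why the paper adds decomposed subgoals: they supply easier statements on which verified proofs can be collected.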
Reinforcement Learning. We employ Group Relative Policy Optimization (GRPO; Shao et al.,
2024) as our reinforcement learning algorithm, which has demonstrated superior effectiveness
and efficiency in reasoning tasks (DeepSeek-AI, 2025). Unlike PPO (Schulman et al., 2017),
GRPO eliminates the need for a separate critic model by sampling a group of candidate proofs
for each theorem prompt and optimizing the policy based on their relative rewards. Training
utilizes binary rewards, where each generated Lean proof receives a reward of 1 if verified as
correct and 0 otherwise. To ensure effective learning, we curate training prompts to include
only problems that are sufficiently challenging yet solvable by the supervised fine-tuned model.
During each iteration, we sample 256 distinct problems, generating 32 candidate proofs per
theorem with a maximum sequence length of 32,768 tokens.
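As a rough illustration of the group-relative signal, the sketch below normalizes binary rewards within one group of sampled proofs, following the mean and standard-deviation normalization of GRPO as described by Shao et al. (2024). The function name and the `eps` stabilizer are choices made here for illustration; the policy-gradient update itself is omitted.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward by the mean and standard deviation of its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One verified proof out of four samples: the success gets a positive
# advantage and the failures share a smaller negative one.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 0.0])

# All samples fail (or all succeed): every advantage is zero, so the
# group contributes no learning signal.
uniform = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

print(mixed)
print(uniform)
```

The second case suggests why the prompts are restricted to problems that are challenging yet solvable: a group whose 32 candidates all receive the same binary reward yields zero advantages.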
3. Experimental Results
In this section, we present a systematic evaluation of DeepSeek-Prover-V2 across diverse
benchmark datasets of formal theorem proving, covering both high school competition problems
and undergraduate-level mathematics. All experiments with DeepSeek-Prover-V2 are
conducted with Lean 4.9.0, using the same testing environment as DeepSeek-Prover-V1.5 (Xin
et al., 2024b). Unless otherwise specified, baseline evaluation results are sourced from their
respective original papers.
MiniF2F (Zheng et al., 2022) consists of 488 formalized problem statements sourced from a
diverse range of mathematical materials, including the AIME, AMC, and IMO competitions,
along with selected problems from the MATH dataset (Hendrycks et al., 2021). The benchmark
includes Olympiad-level problems covering core areas of elementary mathematics, including
algebra, number theory, and induction. These problems are divided into two equally sized
subsets, denoted by miniF2F-valid and miniF2F-test, each containing 244 problems with an
identical distribution across subject areas. We reserve the miniF2F-test set exclusively for
evaluating model performance, while the miniF2F-valid problems are incorporated into curriculum
learning with subgoal decomposition. We adopt the revised version of miniF2F released by
Wang et al. (2025), and further introduce two additional revisions to miniF2F-valid and one
revision to miniF2F-test (see Appendix C).
Method                                                Model size  Sample budget   miniF2F-test

Tree Search Methods
Hypertree Proof Search (Lample et al., 2022)          600M        64 × 5000       41.0%
InternLM2.5-StepProver + BFS + CG (Wu et al., 2024)   7B          256 × 32 × 600  65.9%
HunyuanProver v16 + BFS + DC (Li et al., 2024)        7B          600 × 8 × 400   68.4%
BFS-Prover (Xin et al., 2025)                         7B          2048 × 2 × 600  70.83% ± 0.89%

Whole-proof Generation Methods
Leanabell-Prover-GD-RL (Zhang et al., 2025)           7B          128             61.1%
Goedel-Prover-SFT (Lin et al., 2025)                  7B          25600           64.7%
STP (Dong and Ma, 2025)                               7B          25600           67.6%
Kimina-Prover-Preview-Distill (Wang et al., 2025)     7B          1               52.5%
                                                                  32              63.1%
                                                                  1024            70.8%
Kimina-Prover-Preview (Wang et al., 2025)             72B         1               52.94%
                                                                  32              68.85%
                                                                  1024            77.87%
                                                                  8192            80.74%
DeepSeek-Prover-V2 (non-CoT)                          7B          1               55.5% ± 1.4%
                                                                  32              68.0% ± 0.5%
                                                                  1024            73.2% ± 0.5%
                                                                  8192            75.0%
                                                      671B        1               59.5% ± 1.4%
                                                                  32              73.8% ± 0.4%
                                                                  1024            76.7% ± 0.2%
                                                                  8192            78.3%
DeepSeek-Prover-V2 (CoT)                              7B          1               58.6% ± 1.1%
                                                                  32              75.6% ± 0.5%
                                                                  1024            79.9% ± 0.3%
                                                                  8192            82.0%
                                                      671B        1               61.9% ± 1.6%
                                                                  32              82.4% ± 0.6%
                                                                  1024            86.6% ± 0.3%
                                                                  8192            88.9%
Table 1 | Comparison with state-of-the-art models on the miniF2F-test dataset. The notation
𝜇 ± 𝜎 denotes the average accuracy 𝜇 and the standard deviation 𝜎. The tags CoT and non-CoT
refer to two generation modes of a unified model, each guided by a different prompt.
Problem Category              miniF2F-valid                miniF2F-test
                              curriculum (+Pass@8192)      Pass@8192
Olympiad   IMO                10/20 = 50.0%                10/20 = 50.0%
           AIME               10(+2)/15 = 80.0%            14/15 = 93.3%
           AMC                39/45 = 86.7%                35/45 = 77.8%
MATH       Algebra            69/70 = 98.6%                70/70 = 100.0%
           Number Theory      58/60 = 96.7%                58/60 = 96.7%
Custom     Algebra            18/18 = 100.0%               15/18 = 83.3%
           Number Theory      8/8 = 100.0%                 7/8 = 87.5%
           Induction          8/8 = 100.0%                 8/8 = 100.0%
Overall Pass Rate             220(+2)/244 = 91.0%          217/244 = 88.9%
DeepSeek-Prover-V2-7B also exhibits competitive performance, surpassing all existing
open-source theorem provers in the literature. The comparative analysis further reveals a compelling
scaling pattern: as the sample budget increases from 1 to 8192, the performance gap between
the 7B and 671B variants widens considerably (in CoT mode, from 3.3 points at Pass@1 to 6.9
points at Pass@8192 in Table 1), with the larger model demonstrating superior
sample efficiency and a steeper improvement trajectory.
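For readers who want to reproduce Pass@K-style numbers from a fixed pool of samples, the sketch below implements the standard unbiased pass@k estimator commonly used in code-generation evaluation. The text above does not state how Pass@K is computed for these tables, so this is an assumption offered for orientation rather than a description of the authors' procedure; `pass_at_k` is a name chosen here.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples, drawn without replacement
    from n generated samples of which c are correct, is correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k incorrect samples exist,
    # in which case every size-k draw contains a correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


print(pass_at_k(n=4, c=1, k=1))
print(pass_at_k(n=4, c=1, k=4))
```

Averaging this quantity over all benchmark problems gives a dataset-level pass rate for budget k.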
ProofNet (Azerbayev et al., 2023) consists of 371 problems in Lean 3, drawn from a range of
popular undergraduate pure mathematics textbooks, covering topics such as real and complex
analysis, linear algebra, abstract algebra, and topology. We use the Lean 4 translation of ProofNet
made available by Xin et al. (2024b), which is further divided into two splits: ProofNet-valid
and ProofNet-test, containing 185 and 186 problems, respectively. The test split of ProofNet
is reserved exclusively for model evaluation, as variants of the ProofNet-valid problems are
included in the public synthetic dataset provided by Dong and Ma (2025), which is used in
our supervised fine-tuning. The results, shown in Table 4, indicate a substantial improvement
in the pass rate of DeepSeek-Prover-V2 when using CoT reasoning compared to the non-CoT
setting. Notably, despite the training data being predominantly drawn from high-school
level mathematics, the model exhibits strong generalization to more advanced, college-level
mathematical problems, underscoring its robust formal reasoning capabilities.

Method                                 Model size  Sample budget  ProofNet-test   PutnamBench
Goedel-Prover-SFT (Lin et al., 2025)   7B          32             15.6%           6/644
                                                   512            -               7/644
STP (Dong and Ma, 2025)                7B          128            19.5% ± 0.7%    7/644
                                                   3200           23.9% ± 0.6%    8/644
                                                   25600          26.9%           -
DeepSeek-Prover-V2 (non-CoT)           7B          32             21.6% ± 0.2%    11/658
                                                   128            23.1% ± 0.6%    15/658
                                                   1024           24.7%           23/658
                                       671B        32             23.8% ± 0.2%    9/658
                                                   128            27.2% ± 0.5%    11/658
                                                   1024           31.2%           16/658
DeepSeek-Prover-V2 (CoT)               7B          32             23.0% ± 0.4%    9/658
                                                   128            25.4% ± 0.7%    10/658
                                                   1024           29.6%           11/658
                                       671B        32             30.5% ± 0.7%    22/658
                                                   128            33.6% ± 0.3%    33/658
                                                   1024           37.1%           49/658

Table 4 | The experimental results on ProofNet-test and PutnamBench. The scores for
Goedel-Prover-SFT and STP on PutnamBench are sourced from their original papers, which conducted
evaluations on an earlier version of PutnamBench comprising 644 problems.
Method                         Model size  Sample budget  ProverBench
                                                          All             AIME 24&25
STP (Dong and Ma, 2025)        7B          32             27.5% ± 0.7%    0/15
                                           128            31.4% ± 1.1%    1/15
                                           512            36.3%           1/15
DeepSeek-Prover-V2 (non-CoT)   7B          32             47.7% ± 0.6%    1/15
                                           128            48.8% ± 0.2%    1/15
                                           512            49.5%           1/15
                               671B        32             49.5% ± 0.5%    1/15
                                           128            51.5% ± 0.3%    2/15
                                           512            52.3%           2/15
DeepSeek-Prover-V2 (CoT)       7B          32             49.0% ± 0.3%    1/15
                                           128            50.8% ± 0.5%    1/15
                                           512            51.7%           1/15
                               671B        32             52.9% ± 0.9%    4/15
                                           128            56.5% ± 0.5%    5/15
                                           512            59.1%           6/15
Table 6 | The experimental results on ProverBench. The All category represents the complete
evaluation set consisting of 325 problems, while AIME 24&25 denotes a subset of 15 problems
formalized from recent AIME competitions. The results for STP (Dong and Ma, 2025) are
evaluated using the open-source model weights.
To enhance existing benchmarks and advance research in formal theorem proving, we introduce
a benchmark dataset comprising 325 problems. Of these, 15 are formalized from number theory
and algebra questions featured in the recent AIME competitions (AIME 24 and 25), offering
authentic high-school competition-level challenges. The remaining 310 problems are drawn from
curated textbook examples and educational tutorials, contributing a diverse and pedagogically
grounded collection of formalized mathematical problems. This benchmark is designed to
enable more comprehensive evaluation across both high-school competition problems and
undergraduate-level mathematics.
AIME Formalization. The American Invitational Mathematics Examination (AIME) is an annual
mathematics competition designed to challenge and recognize talented high school students who
demonstrate exceptional proficiency in mathematics. The problems from AIME 24&25 have become a
standard benchmark for evaluating the reasoning capabilities of large language models. In
order to bridge the evaluation of model performance across formal and informal mathematical
reasoning, we curate and formalize a subset of problems from AIME 24&25. To ensure cleaner
formalizations, we filter out geometry, combinatorics, and counting problems whose
representations in Lean are potentially cumbersome. This results in 15 selected problems, covering
competition-level topics in elementary number theory and algebra. We evaluate DeepSeek-V3-0324
on the selected set of problems using the standard find-answer task for natural-language
mathematical reasoning. With majority voting over 16 sampled responses, the model successfully
solves 8 out of 15 problems. In comparison, DeepSeek-Prover-V2-671B, operating under the formal
proof generation setting with given correct answers, is able to construct valid formal proofs
for 6 of 15 problems. This comparison highlights that the performance gap between informal
mathematical reasoning and formal theorem proving is substantially narrowing, indicating growing
alignment between linguistic understanding and formal logical rigor in advanced language models.

Contest      Problems
AIME 24I     P2, P7, P13
AIME 24II    P4, P7, P13, P14
AIME 25I     P1, P8, P9, P11
AIME 25II    P2, P4, P13, P15

Table 7 | Selection of AIME 24&25 problems for formalization. Problems with underlined bolded
indices have been solved by DeepSeek-Prover-V2. Problems solved by DeepSeek-V3-0324 using
Maj@16 are highlighted with a gray background.
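The Maj@16 protocol mentioned above amounts to taking the most frequent final answer among 16 sampled responses. A minimal sketch follows; `majority_vote` is a name chosen here, the sample answers are made up, and answer extraction from the model's response is assumed to have already happened (with `None` marking responses that yielded no parsable answer).

```python
from collections import Counter
from typing import List, Optional


def majority_vote(answers: List[Optional[str]]) -> Optional[str]:
    """Return the most frequent answer, ignoring unparsable responses."""
    valid = [a for a in answers if a is not None]
    if not valid:
        return None
    # Ties are broken in favor of the answer that appeared first.
    return Counter(valid).most_common(1)[0][0]


# Sixteen hypothetical sampled answers to one problem.
samples = ["204"] * 9 + ["210"] * 5 + [None] * 2
voted = majority_vote(samples)
print(voted)
```

A problem counts as solved under this protocol when the voted answer matches the reference answer, whereas the formal setting requires a complete Lean proof that the verifier accepts.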
671B with CoT reasoning consistently outperforms all baselines, reinforcing the trends observed
in other benchmark evaluations.
4. Conclusion
In this work, we propose a comprehensive pipeline for synthesizing cold-start reasoning data
to advance formal theorem proving. Our data construction process is grounded in a recursive
theorem-proving framework, wherein DeepSeek-V3 serves as a unified model for both subgoal
decomposition and lemma formalization within the Lean 4 proof assistant. Our approach
combines high-level proof sketches with formal steps, creating a sequence of manageable subgoals
that can be efficiently solved using a smaller 7B model, significantly reducing computational
requirements. The curriculum learning framework we developed uses these decomposed subgoals
to generate increasingly difficult training tasks, creating a more effective learning progression.
By pairing complete formal proofs with DeepSeek-V3’s chain-of-thought reasoning, we
established valuable cold-start reasoning data that bridges informal mathematical thinking with
formal proof structures. The subsequent reinforcement learning stage substantially enhanced
this connection, leading to significant improvements in formal theorem proving capabilities.
The resulting model, DeepSeek-Prover-V2-671B, consistently outperforms all baselines across
a range of benchmarks, spanning both high-school competition problems and
undergraduate-level mathematics. Our future work will focus on scaling this paradigm to an AlphaProof-like
system with the ultimate aim of tackling IMO-level mathematical problems that represent the
frontier of automated theorem proving challenges.
References
Z. Azerbayev, B. Piotrowski, H. Schoelkopf, E. W. Ayers, D. Radev, and J. Avigad. ProofNet:
Autoformalizing and formally proving undergraduate-level mathematics. arXiv preprint
arXiv:2302.12433, 2023.
K. Dong and T. Ma. STP: Self-play llm theorem provers with iterative conjecturing and proving.
arXiv preprint arXiv:2502.00212, 2025.
K. Dong, A. Mahankali, and T. Ma. Formal theorem proving by rewarding llms to decompose
proofs hierarchically. arXiv preprint arXiv:2411.01829, 2024.
M. Eppe, C. Gumbsch, M. Kerzel, P. D. Nguyen, M. V. Butz, and S. Wermter. Intelligent problem-
solving as integrated hierarchical reinforcement learning. Nature Machine Intelligence, 4(1):
11–20, 2022.
A. Q. Jiang, S. Welleck, J. P. Zhou, T. Lacroix, J. Liu, W. Li, M. Jamnik, G. Lample, and Y. Wu.
Draft, sketch, and prove: Guiding formal theorem provers with informal proofs. In The
Eleventh International Conference on Learning Representations, 2023.
Y. Li, D. Du, L. Song, C. Li, W. Wang, T. Yang, and H. Mi. Hunyuanprover: A scalable data
synthesis framework and guided tree search for automated theorem proving. arXiv preprint
arXiv:2412.20735, 2024.
Y. Lin, S. Tang, B. Lyu, J. Wu, H. Lin, K. Yang, J. Li, M. Xia, D. Chen, S. Arora, et al. Goedel-
Prover: A frontier model for open-source automated theorem proving. arXiv preprint
arXiv:2502.07640, 2025.
J. Liu, X. Lin, J. Bayer, Y. Dillies, W. Jiang, X. Liang, R. Soletskyi, H. Wang, Y. Xie, B. Xiong,
et al. CombiBench: Benchmarking llm capability for combinatorial mathematics.
https://moonshotai.github.io/CombiBench/, 2025.
L. d. Moura and S. Ullrich. The Lean 4 theorem prover and programming language. In
Automated Deduction–CADE 28: 28th International Conference on Automated Deduction,
Virtual Event, July 12–15, 2021, Proceedings 28, pages 625–635. Springer, 2021.
S. Polu and I. Sutskever. Generative language modeling for automated theorem proving. arXiv
preprint arXiv:2009.03393, 2020.
Z. Shao, P. Wang, Q. Zhu, R. Xu, J. Song, M. Zhang, Y. Li, Y. Wu, and D. Guo. DeepSeekMath:
Pushing the limits of mathematical reasoning in open language models. arXiv preprint
arXiv:2402.03300, 2024.
H. Wang, H. Xin, Z. Liu, W. Li, Y. Huang, J. Lu, Y. Zhicheng, J. Tang, J. Yin, Z. Li, et al. Prov-
ing theorems recursively. In The Thirty-eighth Annual Conference on Neural Information
Processing Systems, 2024a.
H. Wang, H. Xin, C. Zheng, Z. Liu, Q. Cao, Y. Huang, J. Xiong, H. Shi, E. Xie, J. Yin, et al.
Lego-prover: Neural theorem proving with growing libraries. In The Twelfth International
Conference on Learning Representations, 2024b.
H. Wang, M. Unsal, X. Lin, M. Baksys, J. Liu, M. D. Santos, F. Sung, M. Vinyes, Z. Ying, Z. Zhu,
et al. Kimina-Prover Preview: Towards large formal reasoning models with reinforcement
learning. arXiv preprint arXiv:2504.11354, 2025.
Z. Wu, S. Huang, Z. Zhou, H. Ying, J. Wang, D. Lin, and K. Chen. InternLM2.5-StepProver:
Advancing automated theorem proving via expert iteration on large-scale lean problems.
arXiv preprint arXiv:2410.15700, 2024.
H. Xin, D. Guo, Z. Shao, Z. Ren, Q. Zhu, B. Liu, C. Ruan, W. Li, and X. Liang. DeepSeek-
Prover: Advancing theorem proving in llms through large-scale synthetic data. arXiv preprint
arXiv:2405.14333, 2024a.
H. Xin, Z. Ren, J. Song, Z. Shao, W. Zhao, H. Wang, B. Liu, L. Zhang, X. Lu, Q. Du, et al.
DeepSeek-Prover-V1.5: Harnessing proof assistant feedback for reinforcement learning and
monte-carlo tree search. arXiv preprint arXiv:2408.08152, 2024b.
R. Xin, C. Xi, J. Yang, F. Chen, H. Wu, X. Xiao, Y. Sun, S. Zheng, and K. Shen. BFS-Prover:
Scalable best-first tree search for llm-based automatic theorem proving. arXiv preprint
arXiv:2502.03438, 2025.
K. Yang, G. Poesia, J. He, W. Li, K. Lauter, S. Chaudhuri, and D. Song. Formal mathematical
reasoning: A new frontier in AI. arXiv preprint arXiv:2412.16075, 2024.
H. Ying, Z. Wu, Y. Geng, J. Wang, D. Lin, and K. Chen. Lean workbook: A large-scale lean
problem set formalized from natural language math problems. In The Thirty-eight Conference
on Neural Information Processing Systems Datasets and Benchmarks Track, 2024.
J. Zhang, Q. Wang, X. Ji, Y. Liu, Y. Yue, F. Zhang, D. Zhang, G. Zhou, and K. Gai. Leanabell-prover:
Posttraining scaling in formal reasoning. arXiv preprint arXiv:2504.06122, 2025.
X. Zhao, W. Li, and L. Kong. Decomposing the enigma: Subgoal-based demonstration learning
for formal theorem proving. arXiv preprint arXiv:2305.16366, 2023.
X. Zhao, L. Zheng, H. Bo, C. Hu, U. Thakker, and L. Kong. Subgoalxl: Subgoal-based expert
learning for theorem proving. arXiv preprint arXiv:2408.11172, 2024.
C. Zheng, H. Wang, E. Xie, Z. Liu, J. Sun, H. Xin, J. Shen, Z. Li, and Y. Li. Lyra: Orchestrating
dual correction in automated theorem proving. Transactions on Machine Learning Research,
2024.
K. Zheng, J. M. Han, and S. Polu. miniF2F: a cross-system benchmark for formal olympiad-level
mathematics. In International Conference on Learning Representations, 2022.
A. Examples of Non-CoT and CoT Prompting for Proof Generation
Input:
Complete the following Lean 4 code:
‘‘‘lean4
import Mathlib
import Aesop
set_option maxHeartbeats 0
Output:
‘‘‘lean4
import Mathlib
import Aesop
set_option maxHeartbeats 0
<;> nlinarith [Real.cos_sq_add_sin_sq (Real.pi / 7), Real.sin_le_one (Real.pi / 7),
Real.cos_le_one (Real.pi / 7)]
rw [h83 ]
rw [h82 ]
rw [h81 ]
rw [h7 , h8 ]
-- Simplify the expression using the derived identities
have h9 : Real.cos (Real.pi / 7) > 0 := h1
have h10 : Real.cos (Real.pi / 7) ≠ 0 := by linarith
-- Use the quadratic equation to solve for the sum of cosines
have h11 : Real.cos (Real.pi / 7) = Real.cos (Real.pi / 7) := rfl
have h12 : 8 * Real.cos (Real.pi / 7) ^ 3 - 4 * Real.cos (Real.pi / 7) ^ 2 - 4 * Real.cos (Real.pi / 7) + 1 = 0
:= by
-- Use the identity to form a polynomial equation and solve it
have h13 : Real.cos (Real.pi / 7) > 0 := h1
have h14 : Real.cos (Real.pi / 7) ≠ 0 := by linarith
have h15 : Real.cos (Real.pi / 7) = Real.cos (Real.pi / 7) := rfl
-- Use the polynomial equation to find the sum of cosines
have h16 : Real.cos (Real.pi / 7) ≥ 0 := by linarith
-- Use the identity to form a polynomial equation and solve it
have h17 : Real.sin (Real.pi / 7) > 0 := by
apply Real.sin_pos_of_pos_of_lt_pi
<;> linarith [Real.pi_gt_three]
have h18 : Real.sin (3 * (Real.pi / 7)) > 0 := by
apply Real.sin_pos_of_pos_of_lt_pi
<;>
(try ring_nf at * <;> nlinarith [Real.pi_gt_three]) <;>
(try ring_nf) <;>
(try nlinarith [Real.pi_gt_three])
have h19 : Real.sin (3 * (Real.pi / 7)) = 3 * Real.sin (Real.pi / 7) - 4 * Real.sin (Real.pi / 7) ^ 3 := by
have h191 : Real.sin (3 * (Real.pi / 7)) = 3 * Real.sin (Real.pi / 7) - 4 * Real.sin (Real.pi / 7) ^ 3 := by
rw [Real.sin_three_mul]
<;> ring_nf
<;> nlinarith [Real.cos_sq_add_sin_sq (Real.pi / 7), Real.sin_le_one (Real.pi / 7),
Real.cos_le_one (Real.pi / 7)]
rw [h191 ]
have h20 : Real.sin (2 * (Real.pi / 7)) = 2 * Real.sin (Real.pi / 7) * Real.cos (Real.pi / 7) := by
have h201 : Real.sin (2 * (Real.pi / 7)) = 2 * Real.sin (Real.pi / 7) * Real.cos (Real.pi / 7) := by
rw [Real.sin_two_mul]
<;> ring_nf
<;> nlinarith [Real.cos_sq_add_sin_sq (Real.pi / 7), Real.sin_le_one (Real.pi / 7),
Real.cos_le_one (Real.pi / 7)]
rw [h201 ]
have h21 : Real.sin (3 * (Real.pi / 7)) = Real.sin (Real.pi - 4 * (Real.pi / 7)) := by
have h211 : 3 * (Real.pi / 7) = Real.pi - 4 * (Real.pi / 7) := by ring
rw [h211 ]
have h22 : Real.sin (Real.pi - 4 * (Real.pi / 7)) = Real.sin (4 * (Real.pi / 7)) := by
rw [Real.sin_pi_sub]
have h23 : Real.sin (4 * (Real.pi / 7)) = 2 * Real.sin (2 * (Real.pi / 7)) * Real.cos (2 * (Real.pi / 7)) := by
have h231 : Real.sin (4 * (Real.pi / 7)) = 2 * Real.sin (2 * (Real.pi / 7)) * Real.cos (2 * (Real.pi / 7)) :=
by
have h232 : Real.sin (4 * (Real.pi / 7)) = Real.sin (2 * (2 * (Real.pi / 7))) := by ring
rw [h232 ]
have h233 : Real.sin (2 * (2 * (Real.pi / 7))) = 2 * Real.sin (2 * (Real.pi / 7)) * Real.cos (2 * (Real.pi /
7)) := by
rw [Real.sin_two_mul]
<;> ring_nf
rw [h233 ]
rw [h231 ]
have h24 : Real.sin (2 * (Real.pi / 7)) > 0 := by
apply Real.sin_pos_of_pos_of_lt_pi
<;> (try ring_nf at * <;> nlinarith [Real.pi_gt_three]) <;> (try ring_nf) <;> (try nlinarith
[Real.pi_gt_three])
have h25 : Real.cos (2 * (Real.pi / 7)) > 0 := by
have h251 : Real.cos (2 * (Real.pi / 7)) > 0 := by
apply Real.cos_pos_of_mem_Ioo
<;> constructor <;> (try ring_nf at * <;> nlinarith [Real.pi_gt_three]) <;> (try ring_nf) <;>
(try nlinarith [Real.pi_gt_three])
linarith
have h26 : Real.sin (3 * (Real.pi / 7)) > 0 := h18
have h27 : Real.sin (Real.pi / 7) > 0 := h17
have h28 : Real.sin (2 * (Real.pi / 7)) > 0 := h24
have h29 : Real.cos (2 * (Real.pi / 7)) > 0 := h25
−− Use the polynomial equation to find the sum of cosines
nlinarith [Real.sin_sq_add_cos_sq (Real.pi / 7), Real.sin_sq_add_cos_sq (2 * (Real.pi / 7)),
Real.sin_sq_add_cos_sq (3 * (Real.pi / 7)), Real.sin_le_one (Real.pi / 7), Real.sin_le_one (2 *
(Real.pi / 7)),
Real.sin_le_one (3 * (Real.pi / 7)), Real.cos_le_one (Real.pi / 7), Real.cos_le_one (2 * (Real.pi /
7)),
Real.cos_le_one (3 * (Real.pi / 7))]
−− Use the polynomial equation to find the sum of cosines
nlinarith [Real.cos_le_one (Real.pi / 7), Real.cos_le_one (2 * (Real.pi / 7)), Real.cos_le_one (3 *
(Real.pi / 7))]
−− Use the polynomial equation to find the sum of cosines
linarith
```
Input:
Complete the following Lean 4 code:
```lean4
import Mathlib
import Aesop
set_option maxHeartbeats 0
/-- Let 𝑝, 𝑞, 𝑟 be integers with 1 < 𝑝 < 𝑞 < 𝑟. Show that if (𝑝 − 1)(𝑞 − 1)(𝑟 − 1) divides
𝑝𝑞𝑟 − 1, then (𝑝, 𝑞, 𝑟) = (2, 4, 8) or (𝑝, 𝑞, 𝑟) = (3, 5, 15). -/
theorem imo_1992_p1 (p q r : ℤ) (h0 : 1 < p ∧ p < q ∧ q < r)
    (h1 : (p - 1) * (q - 1) * (r - 1) ∣ p * q * r - 1) :
    (p, q, r) = (2, 4, 8) ∨ (p, q, r) = (3, 5, 15) := by
  sorry
```
Before producing the Lean 4 code to formally prove the given theorem, provide
a detailed proof plan outlining the main proof steps and strategies.
The plan should highlight key ideas, intermediate lemmas, and proof
structures that will guide the construction of the final formal proof.
To enhance clarity, model outputs are reformulated using inline LaTeX expressions.
Output:
### Detailed Proof and Analysis
We are given integers 𝑝, 𝑞, 𝑟 such that 1 < 𝑝 < 𝑞 < 𝑟, and ( 𝑝 − 1)( 𝑞 − 1)( 𝑟 − 1) divides 𝑝𝑞𝑟 − 1. We need to
prove that ( 𝑝, 𝑞, 𝑟) is either (2, 4, 8) or (3, 5, 15).
But the precise calculation is not needed. Since 𝑝 ⩾ 2, 𝑞 ⩾ 3, 𝑟 ⩾ 4, 1/((𝑝 − 1)(𝑞 − 1)(𝑟 − 1)) ⩽ 1/6, while 𝑝𝑞𝑟 ⩾ 24,
so:
𝑘 = (𝑝𝑞𝑟 − 1)/((𝑝 − 1)(𝑞 − 1)(𝑟 − 1)) < 𝑝𝑞𝑟/((𝑝 − 1)(𝑞 − 1)(𝑟 − 1)) ⩽ 4
and the bound 𝑘 < 4 follows. Since 𝑘 is an integer, 𝑘 ∈ {1, 2, 3}.
However, we can tighten this further. The condition 𝑘 ⩾ 4 would require ( 𝑝, 𝑞, 𝑟) = (2, 3, 4), but
𝑝𝑞𝑟 − 1 = 23 and ( 𝑝 − 1)( 𝑞 − 1)( 𝑟 − 1) = 6, which does not divide. So we must have 𝑘 < 4.
Alternatively, we can note that for 𝑝 ⩾ 2, 𝑞 ⩾ 3, 𝑟 ⩾ 4, the product ( 𝑝 − 1)( 𝑞 − 1)( 𝑟 − 1) is at least 6, and
𝑝𝑞𝑟 − 1 < 𝑝𝑞𝑟 , so 𝑘 ⩽ 3.
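The boundary triple (2, 3, 4) mentioned above can be checked mechanically. The following is a minimal Lean 4 sketch, assuming a recent Mathlib in which `norm_num` evaluates integer arithmetic and divisibility between numerals; it is an independent sanity check, not part of the proof being developed.

```lean4
import Mathlib

-- The two quantities from the text at (p, q, r) = (2, 3, 4).
example : (2 * 3 * 4 - 1 : ℤ) = 23 := by norm_num
example : ((2 - 1) * (3 - 1) * (4 - 1) : ℤ) = 6 := by norm_num

-- 6 does not divide 23, so (2, 3, 4) does not satisfy the divisibility hypothesis.
example : ¬ ((6 : ℤ) ∣ 23) := by norm_num
```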
**Case 𝑘 = 1:**
𝑝𝑞𝑟 − 1 = ( 𝑝 − 1)( 𝑞 − 1)( 𝑟 − 1)
Expand the right side:
𝑝𝑞𝑟 − 𝑝𝑞 − 𝑝𝑟 − 𝑞𝑟 + 𝑝 + 𝑞 + 𝑟 − 1 = 𝑝𝑞𝑟 − 1
Simplify to:
− 𝑝𝑞 − 𝑝𝑟 − 𝑞𝑟 + 𝑝 + 𝑞 + 𝑟 = 0
or:
𝑝𝑞 + 𝑝𝑟 + 𝑞𝑟 = 𝑝 + 𝑞 + 𝑟
Since 𝑝, 𝑞, 𝑟 ⩾ 2, we have 𝑝𝑞 + 𝑝𝑟 + 𝑞𝑟 ⩾ 4 𝑝 + 4𝑞 + 4𝑟, which is much larger than 𝑝 + 𝑞 + 𝑟, so no solutions
exist.
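The impossibility claimed for 𝑘 = 1 can be stated as a standalone lemma. The sketch below assumes the sharper bounds 𝑝 ⩾ 2, 𝑞 ⩾ 3, 𝑟 ⩾ 4 derived earlier (rather than only 𝑝, 𝑞, 𝑟 ⩾ 2), and the hypothesis names `hp`, `hq`, `hr` are illustrative. The three product hints give 𝑝𝑞 + 𝑝𝑟 + 𝑞𝑟 ⩾ 7𝑝 + 6𝑞 + 5𝑟 − 26, which already exceeds 𝑝 + 𝑞 + 𝑟 under these bounds.

```lean4
import Mathlib

-- With 2 ≤ p, 3 ≤ q, 4 ≤ r, the equation pq + pr + qr = p + q + r cannot hold.
example (p q r : ℤ) (hp : 2 ≤ p) (hq : 3 ≤ q) (hr : 4 ≤ r) :
    p * q + p * r + q * r ≠ p + q + r := by
  intro h
  -- Hints: (p - 2)(q - 3) ≥ 0, (p - 2)(r - 4) ≥ 0, (q - 3)(r - 4) ≥ 0.
  nlinarith [mul_nonneg (sub_nonneg.mpr hp) (sub_nonneg.mpr hq),
    mul_nonneg (sub_nonneg.mpr hp) (sub_nonneg.mpr hr),
    mul_nonneg (sub_nonneg.mpr hq) (sub_nonneg.mpr hr)]
```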
**Case 𝑘 = 2:**
𝑝𝑞𝑟 − 1 = 2( 𝑝 − 1)( 𝑞 − 1)( 𝑟 − 1)
Expand and simplify:
𝑝𝑞𝑟 − 1 = 2( 𝑝𝑞𝑟 − 𝑝𝑞 − 𝑝𝑟 − 𝑞𝑟 + 𝑝 + 𝑞 + 𝑟 − 1)
𝑝𝑞𝑟 − 1 = 2 𝑝𝑞𝑟 − 2 𝑝𝑞 − 2 𝑝𝑟 − 2𝑞𝑟 + 2 𝑝 + 2𝑞 + 2𝑟 − 2
− 𝑝𝑞𝑟 + 2 𝑝𝑞 + 2 𝑝𝑟 + 2𝑞𝑟 − 2 𝑝 − 2𝑞 − 2𝑟 + 1 = 0
Multiply by −1:
𝑝𝑞𝑟 − 2 𝑝𝑞 − 2 𝑝𝑟 − 2𝑞𝑟 + 2 𝑝 + 2𝑞 + 2𝑟 − 1 = 0
Factor:
( 𝑝 − 2) ( 𝑞 − 2) ( 𝑟 − 2) = 𝑝𝑞𝑟 − 2 𝑝𝑞 − 2 𝑝𝑟 − 2𝑞𝑟 + 4 𝑝 + 4𝑞 + 4𝑟 − 8
But:
𝑝𝑞𝑟 − 2 𝑝𝑞 − 2 𝑝𝑟 − 2𝑞𝑟 + 2 𝑝 + 2𝑞 + 2𝑟 − 1 = 0
Thus:
( 𝑝 − 2) ( 𝑞 − 2) ( 𝑟 − 2) = ( 𝑝𝑞𝑟 − 2 𝑝𝑞 − 2 𝑝𝑟 − 2𝑞𝑟 + 2 𝑝 + 2𝑞 + 2𝑟 − 1) − (−2 𝑝 − 2𝑞 − 2𝑟 + 7)
Wait, no. It is easier to write:
( 𝑝 − 2) ( 𝑞 − 2) ( 𝑟 − 2) = 𝑝𝑞𝑟 − 2 𝑝𝑞 − 2 𝑝𝑟 − 2𝑞𝑟 + 4 𝑝 + 4𝑞 + 4𝑟 − 8
Then:
𝑝𝑞𝑟 − 2 𝑝𝑞 − 2 𝑝𝑟 − 2𝑞𝑟 + 2 𝑝 + 2𝑞 + 2𝑟 − 1 = 0
Subtract:
( 𝑝 − 2) ( 𝑞 − 2) ( 𝑟 − 2) − ( 𝑝𝑞𝑟 − 2 𝑝𝑞 − 2 𝑝𝑟 − 2𝑞𝑟 + 2 𝑝 + 2𝑞 + 2𝑟 − 1) = 2 𝑝 + 2𝑞 + 2𝑟 − 7
But from above:
( 𝑝 − 2) ( 𝑞 − 2) ( 𝑟 − 2) = 2 𝑝 + 2𝑞 + 2𝑟 − 7
Since 𝑝 ⩾ 2, the left is ⩾ 0, so 2 𝑝 + 2𝑞 + 2𝑟 − 7 ⩾ 0.
Similarly, set 𝑝 = 3:
3𝑞𝑟 − 1 = 2(2) ( 𝑞 − 1)( 𝑟 − 1)
3𝑞𝑟 − 1 = 4( 𝑞𝑟 − 𝑞 − 𝑟 + 1)
3𝑞𝑟 − 1 = 4𝑞𝑟 − 4𝑞 − 4𝑟 + 4
−𝑞𝑟 + 4𝑞 + 4𝑟 − 5 = 0
𝑞𝑟 − 4𝑞 − 4𝑟 + 5 = 0
Add 11:
𝑞𝑟 − 4𝑞 − 4𝑟 + 16 = 11
Factor:
( 𝑞 − 4) ( 𝑟 − 4) = 11
Since 𝑞 > 𝑝 = 3, 𝑞 ⩾ 4, and 𝑟 > 𝑞, possible factorings are:
− 𝑞 − 4 = 1, 𝑟 − 4 = 11: 𝑞 = 5, 𝑟 = 15
− 𝑞 − 4 = 11, 𝑟 − 4 = 1: But then 𝑞 = 15, 𝑟 = 5, but 𝑟 > 𝑞, invalid.
Thus, the only solution is ( 𝑝, 𝑞, 𝑟) = (3, 5, 15).
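The rearrangement used for 𝑝 = 3 is a pure ring identity, so it can be checked in isolation. This is a minimal Lean 4 sketch assuming Mathlib's `linear_combination` tactic; the hypothesis name `h` is illustrative and stands for the 𝑘 = 2 equation after substituting 𝑝 = 3.

```lean4
import Mathlib

-- From 3qr - 1 = 2 * 2 * (q - 1)(r - 1) one obtains (q - 4)(r - 4) = 11.
example (q r : ℤ) (h : 3 * q * r - 1 = 2 * (2 * (q - 1) * (r - 1))) :
    (q - 4) * (r - 4) = 11 := by
  -- The goal's difference of sides is the negative of the difference of sides of `h`.
  linear_combination (-1 : ℤ) * h

-- The factor pair (1, 11) corresponds to q = 5, r = 15.
example : ((5 - 4) * (15 - 4) : ℤ) = 11 := by norm_num
```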
For 𝑝 ⩾ 4, the term ( 𝑝 − 2)( 𝑞 − 2)( 𝑟 − 2) dominates 2 𝑝 + 2𝑞 + 2𝑟 − 7, so no solutions will exist. Formally,
since 𝑞 ⩾ 𝑝 + 1 ⩾ 5, 𝑟 ⩾ 𝑞 + 1 ⩾ 6, we have:
( 𝑝 − 2) ( 𝑞 − 2) ( 𝑟 − 2) ⩾ (4 − 2)(5 − 2)(6 − 2) = 24
while 2 𝑝 + 2𝑞 + 2𝑟 − 7 ⩽ 2( 𝑟 − 2) + 2( 𝑟 − 1) + 2𝑟 − 7 = 6𝑟 − 13, but no, let’s instead note that:
( 𝑝 − 2) ( 𝑞 − 2) ( 𝑟 − 2) ⩾ ( 𝑝 − 2)(( 𝑝 + 1) − 2)(( 𝑝 + 2) − 2) = ( 𝑝 − 2)( 𝑝 − 1)( 𝑝)
and 2 𝑝 + 2𝑞 + 2𝑟 − 7 < 6𝑟. But 𝑝 ⩾ 4, ( 𝑝 − 2)( 𝑝 − 1)( 𝑝) ⩾ 24, 𝑟 ⩾ 𝑝 + 2 ⩾ 6, but this is not directly leading to
a contradiction.
and 2 𝑝 + 2𝑞 + 2𝑟 − 7 ⩽ 2 𝑝 + 2( 𝑝 + 1) + 2( 𝑝 + 2) − 7 = 6 𝑝 − 1. But 6 𝑝 − 1 < ( 𝑝 − 2)( 𝑝 − 1)( 𝑝) for 𝑝 ⩾ 4, since 𝑝 = 4
gives 23 < 24, 𝑝 = 5 gives 29 < 60, etc. Thus no solutions exist for 𝑝 ⩾ 4.
But we have already found (3, 5, 15), and need to consider other 𝑘.
**Case 𝑘 = 3:**
The equation is:
𝑝𝑞𝑟 − 1 = 3( 𝑝 − 1)( 𝑞 − 1)( 𝑟 − 1)
Expand:
𝑝𝑞𝑟 − 1 = 3( 𝑝𝑞𝑟 − 𝑝𝑞 − 𝑝𝑟 − 𝑞𝑟 + 𝑝 + 𝑞 + 𝑟 − 1)
𝑝𝑞𝑟 − 1 = 3 𝑝𝑞𝑟 − 3 𝑝𝑞 − 3 𝑝𝑟 − 3𝑞𝑟 + 3 𝑝 + 3𝑞 + 3𝑟 − 3
−2 𝑝𝑞𝑟 + 3 𝑝𝑞 + 3 𝑝𝑟 + 3𝑞𝑟 − 3 𝑝 − 3𝑞 − 3𝑟 + 2 = 0
Multiply by −1:
2 𝑝𝑞𝑟 − 3 𝑝𝑞 − 3 𝑝𝑟 − 3𝑞𝑟 + 3 𝑝 + 3𝑞 + 3𝑟 − 2 = 0
As before, look for small 𝑝.
Set 𝑝 = 2:
4𝑞𝑟 − 6𝑞 − 6𝑟 + 6 + 3𝑞 + 3𝑟 − 2 = 0
Simplify:
4𝑞𝑟 − 3𝑞 − 3𝑟 + 4 = 0
Factor:
(4𝑞 − 3) ( 𝑟 ) − 3𝑞 + 4 = 0
Not obvious, but solve:
4𝑞𝑟 − 3𝑞 − 3𝑟 + 4 = 0
4𝑞𝑟 − 3𝑞 − 3𝑟 = −4
Not possible since 𝑞 ⩾ 3, 𝑟 ⩾ 4, so 4𝑞𝑟 − 3𝑞 − 3𝑟 ⩾ 4 · 3 · 4 − 3 · 3 − 3 · 4 = 48 − 9 − 12 = 27 > −4.
Wait, but we need to revisit the case when 𝑝 = 2. The correct equation is:
2𝑞𝑟 − 1 = 3(1) ( 𝑞 − 1)( 𝑟 − 1) = 3( 𝑞𝑟 − 𝑞 − 𝑟 + 1)
Thus:
2𝑞𝑟 − 1 = 3𝑞𝑟 − 3𝑞 − 3𝑟 + 3
−𝑞𝑟 + 3𝑞 + 3𝑟 − 4 = 0
Multiply by −1:
𝑞𝑟 − 3𝑞 − 3𝑟 + 4 = 0
Add 5:
𝑞𝑟 − 3𝑞 − 3𝑟 + 9 = 5
Factor:
( 𝑞 − 3) ( 𝑟 − 3) = 5
Since 𝑞 ⩾ 3, 𝑟 ⩾ 4, possible pairs are:
− 𝑞 − 3 = 1, 𝑟 − 3 = 5: 𝑞 = 4, 𝑟 = 8
− 𝑞 − 3 = 5, 𝑟 − 3 = 1: 𝑞 = 8, 𝑟 = 4, invalid as 𝑟 > 𝑞.
Thus, the only solution is (2, 4, 8).
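The same kind of check applies to the 𝑝 = 2 rearrangement. The sketch below again assumes Mathlib's `linear_combination`, with `h` an illustrative name for the 𝑘 = 3 equation after substituting 𝑝 = 2.

```lean4
import Mathlib

-- From 2qr - 1 = 3 * 1 * (q - 1)(r - 1) one obtains (q - 3)(r - 3) = 5.
example (q r : ℤ) (h : 2 * q * r - 1 = 3 * (1 * (q - 1) * (r - 1))) :
    (q - 3) * (r - 3) = 5 := by
  linear_combination (-1 : ℤ) * h

-- The factor pair (1, 5) corresponds to q = 4, r = 8.
example : ((4 - 3) * (8 - 3) : ℤ) = 5 := by norm_num
```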
Set 𝑝 = 3:
3𝑞𝑟 − 1 = 3(2) ( 𝑞 − 1)( 𝑟 − 1)
3𝑞𝑟 − 1 = 6( 𝑞𝑟 − 𝑞 − 𝑟 + 1)
3𝑞𝑟 − 1 = 6𝑞𝑟 − 6𝑞 − 6𝑟 + 6
−3𝑞𝑟 + 6𝑞 + 6𝑟 − 7 = 0
3𝑞𝑟 − 6𝑞 − 6𝑟 + 7 = 0
Multiply by 4:
12𝑞𝑟 − 24𝑞 − 24𝑟 + 28 = 0
Add 1:
12𝑞𝑟 − 24𝑞 − 24𝑟 + 49 = 1
Factor:
(3𝑞 − 6) (4𝑟 − 8) = −47
But 𝑞 ⩾ 4, 𝑟 ⩾ 5, so (3𝑞 − 6) ⩾ 6, (4𝑟 − 8) ⩾ 12, so left side is ⩾ 72, which cannot be −47. Thus no solution
here.
Set 𝑝 = 4:
4𝑞𝑟 − 1 = 3(3) ( 𝑞 − 1)( 𝑟 − 1)
4𝑞𝑟 − 1 = 9𝑞𝑟 − 9𝑞 − 9𝑟 + 9
−5𝑞𝑟 + 9𝑞 + 9𝑟 − 10 = 0
Multiply by 5:
−25𝑞𝑟 + 45𝑞 + 45𝑟 − 50 = 0
Factor:
(5𝑞 − 9) (5𝑟 − 9) = 31
Since 𝑞 ⩾ 5, 𝑟 ⩾ 6, (5𝑞 − 9) ⩾ 16, (5𝑟 − 9) ⩾ 21, so left side is ⩾ 16 · 21 = 336 > 31. Thus no solution here.
Similarly, for larger 𝑝 ⩾ 4, no solutions exist, because the term 𝑝𝑞𝑟 dominates.
3. **Case 𝑘 = 1**:
− Derive the equation 𝑝𝑞 + 𝑝𝑟 + 𝑞𝑟 = 𝑝 + 𝑞 + 𝑟 .
− This has no solutions since 𝑝, 𝑞, 𝑟 ⩾ 2, making the left side much larger than the right.
4. **Case 𝑘 = 2**:
− Derive the equation 𝑝𝑞𝑟 − 2 𝑝𝑞 − 2 𝑝𝑟 − 2𝑞𝑟 + 2 𝑝 + 2𝑞 + 2𝑟 − 1 = 0.
− For 𝑝 = 2, no solution.
− For 𝑝 = 3, solve to get ( 𝑝, 𝑞, 𝑟 ) = (3, 5, 15).
− For 𝑝 ⩾ 4, no solutions exist.
5. **Case 𝑘 = 3**:
− Derive the equation 2 𝑝𝑞𝑟 − 3 𝑝𝑞 − 3 𝑝𝑟 − 3𝑞𝑟 + 3 𝑝 + 3𝑞 + 3𝑟 − 2 = 0.
− For 𝑝 = 2, solve to get ( 𝑝, 𝑞, 𝑟 ) = (2, 4, 8).
− For 𝑝 ⩾ 3, no solutions exist.
6. **Conclusion**:
− The only valid triples are (2, 4, 8) and (3, 5, 15).
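As a quick check that both triples in the conclusion do satisfy the original hypothesis, the divisibility can be evaluated numerically. This sketch assumes a recent Mathlib in which `norm_num` decides divisibility between integer numerals.

```lean4
import Mathlib

-- (2, 4, 8): (p - 1)(q - 1)(r - 1) = 21 and pqr - 1 = 63 = 3 * 21, so k = 3.
example : ((2 - 1) * (4 - 1) * (8 - 1) : ℤ) ∣ 2 * 4 * 8 - 1 := by norm_num

-- (3, 5, 15): (p - 1)(q - 1)(r - 1) = 112 and pqr - 1 = 224 = 2 * 112, so k = 2.
example : ((3 - 1) * (5 - 1) * (15 - 1) : ℤ) ∣ 3 * 5 * 15 - 1 := by norm_num
```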
```lean4
theorem imo_1992_p1 (p q r : ℤ) (h0 : 1 < p ∧ p < q ∧ q < r)
    (h1 : (p - 1) * (q - 1) * (r - 1) ∣ p * q * r - 1) :
    (p, q, r) = (2, 4, 8) ∨ (p, q, r) = (3, 5, 15) := by
  have h2 : p ≥ 2 := by sorry
  have h3 : q ≥ 3 := by sorry
  have h4 : r ≥ 4 := by sorry
  have h5 : (p, q, r) = (2, 4, 8) ∨ (p, q, r) = (3, 5, 15) := by sorry
  sorry
```
```lean4
theorem imo_1992_p1 (p q r : ℤ) (h0 : 1 < p ∧ p < q ∧ q < r)
    (h1 : (p - 1) * (q - 1) * (r - 1) ∣ p * q * r - 1) :
    (p, q, r) = (2, 4, 8) ∨ (p, q, r) = (3, 5, 15) := by
  have h2 : p ≥ 2 := by
    linarith [h0.1]
  have h3 : q ≥ 3 := by
    by_contra h
    have h4 : q ≤ 2 := by linarith
    have h5 : q ≥ 2 := by linarith [h0.2.1]
    have h6 : q = 2 := by linarith
    have h7 : p < 2 := by linarith [h0.2.1]
    have h8 : p ≥ 2 := by linarith [h0.1]
    linarith
  have h4 : r ≥ 4 := by
    by_contra h
    have h5 : r ≤ 3 := by linarith
    have h6 : r ≥ 3 := by linarith [h0.2.2]
    have h7 : r = 3 := by linarith
    have h8 : q < 3 := by linarith [h0.2.2]
    have h9 : q ≥ 3 := by linarith [h3]
    linarith
have h17 : (p − 1 : Z) * (q − 1 : Z) ⩾ 2 := by
nlinarith
have h18 : (p − 1 : Z) * (q − 1 : Z) * (r − 1 : Z) ⩾ 6 := by
nlinarith
have h19 : (k : Z) * ((p − 1 : Z) * (q − 1 : Z) * (r − 1 : Z)) ⩾ 4 * ((p − 1 : Z) * (q − 1 : Z) * (r − 1 : Z)) := by
nlinarith
have h20 : (p * q * r − 1 : Z) ⩾ 4 * ((p − 1 : Z) * (q − 1 : Z) * (r − 1 : Z)) := by
linarith
have h21 : (p * q * r − 1 : Z) < 4 * ((p − 1 : Z) * (q − 1 : Z) * (r − 1 : Z)) := by
have h22 : (p : Z) * q * r < 4 * ((p − 1 : Z) * (q − 1 : Z) * (r − 1 : Z)) + 1 := by
nlinarith [mul_nonneg (sub_nonneg.mpr h2 ) (sub_nonneg.mpr h3 ),
mul_nonneg (sub_nonneg.mpr h2 ) (sub_nonneg.mpr h4 ),
mul_nonneg (sub_nonneg.mpr h3 ) (sub_nonneg.mpr h4 )]
linarith
linarith
have h13 : k = 1 ∨ k = 2 ∨ k = 3 := by
omega
−− We need to handle each case of k separately
rcases h13 with (rfl | rfl | rfl)
· −− Case k = 1
have h14 : p * q * r − 1 = (p − 1) * (q − 1) * (r − 1) := by
ring_nf at hk ⊢
<;> linarith
have h15 : p * q + p * r + q * r = p + q + r := by
have h151 : p * q * r − 1 = (p − 1) * (q − 1) * (r − 1) := by linarith
ring_nf at h151 ⊢
nlinarith
have h16 : p = 2 := by
by_contra h16
have h17 : p ⩾ 3 := by
by_contra h17
have h18 : p ⩽ 2 := by linarith
have h19 : p = 2 := by linarith
contradiction
have h20 : q ⩾ 3 := by linarith
have h21 : r ⩾ 4 := by linarith
have h22 : (p : Z) * q ⩾ 3 * 3 := by
nlinarith
have h23 : (p : Z) * r ⩾ 3 * 4 := by
nlinarith
have h24 : (q : Z) * r ⩾ 3 * 4 := by
nlinarith
nlinarith
have h17 : q = 3 := by
by_contra h17
have h18 : q ⩾ 4 := by
by_contra h18
have h19 : q ⩽ 3 := by linarith
have h20 : q = 3 := by linarith
contradiction
have h21 : p = 2 := by linarith
have h22 : (p : Z) * q ⩾ 2 * 4 := by
nlinarith
have h23 : (p : Z) * r ⩾ 2 * 4 := by
nlinarith
have h24 : (q : Z) * r ⩾ 4 * 4 := by
nlinarith
nlinarith
have h18 : r = 4 := by
by_contra h18
have h19 : r ⩾ 5 := by
by_contra h19
have h20 : r ⩽ 4 := by linarith
have h21 : r = 4 := by linarith
contradiction
have h22 : p = 2 := by linarith
have h23 : q = 3 := by linarith
have h24 : (p : Z) * q ⩾ 2 * 3 := by
nlinarith
have h25 : (p : Z) * r ⩾ 2 * 5 := by
nlinarith
have h26 : (q : Z) * r ⩾ 3 * 5 := by
nlinarith
nlinarith
exfalso
norm_num [h16 , h17 , h18 ] at h14 h15 hk h0 ⊢ <;> linarith
· −− Case k = 2
have h14 : p * q * r − 1 = 2 * ((p − 1) * (q − 1) * (r − 1)) := by
ring_nf at hk ⊢
<;> linarith
have h15 : p = 3 := by
by_contra h15
have h16 : p ≠ 3 := by tauto
−− We need to show that p cannot be greater than 3
have h17 : p ⩾ 4 := by
by_contra h17
have h18 : p ⩽ 3 := by linarith
have h19 : p = 2 := by
by_contra h19
have h20 : p ⩾ 3 := by omega
have h21 : p = 3 := by omega
contradiction
have h22 : p = 2 := by omega
have h23 : q ⩾ 3 := by linarith
have h24 : r ⩾ 4 := by linarith
have h25 : (p : Z) * q ⩾ 2 * 3 := by nlinarith
have h26 : (p : Z) * r ⩾ 2 * 4 := by nlinarith
have h27 : (q : Z) * r ⩾ 3 * 4 := by nlinarith
have h28 : (p : Z) * q * r ⩾ 2 * 3 * 4 := by nlinarith
have h29 : (p : Z) * q * r − 1 = 2 * ((p − 1) * (q − 1) * (r − 1)) := by linarith
have h30 : (p : Z) = 2 := by omega
have h31 : (q : Z) ⩾ 3 := by omega
have h32 : (r : Z) ⩾ 4 := by omega
have h33 : (p : Z) * q * r − 1 = 2 * ((p − 1) * (q − 1) * (r − 1)) := by linarith
have h34 : (p : Z) = 2 := by omega
have h35 : (q : Z) ⩾ 3 := by omega
have h36 : (r : Z) ⩾ 4 := by omega
have h37 : (p : Z) * q * r − 1 = 2 * ((p − 1) * (q − 1) * (r − 1)) := by linarith
have h38 : False := by
have h39 : (p : Z) = 2 := by omega
have h40 : (q : Z) ⩾ 3 := by omega
have h41 : (r : Z) ⩾ 4 := by omega
have h42 : (p : Z) * q * r − 1 = 2 * ((p − 1) * (q − 1) * (r − 1)) := by linarith
have h43 : 2 * q * r − 1 = 2 * (1 * (q − 1) * (r − 1)) := by
simp [h39 ] at h42 ⊢
<;> ring_nf at h42 ⊢ <;> linarith
have h44 : 2 * q * r − 1 = 2 * ((q − 1) * (r − 1)) := by
ring_nf at h43 ⊢ <;> linarith
have h45 : 2 * q * r − 1 = 2 * (q * r − q − r + 1) := by
ring_nf at h44 ⊢ <;> linarith
have h46 : 2 * q * r − 1 = 2 * q * r − 2 * q − 2 * r + 2 := by
ring_nf at h45 ⊢ <;> linarith
have h47 : −1 = −2 * q − 2 * r + 2 := by
linarith
have h48 : 2 * q + 2 * r = 3 := by linarith
have h49 : (q : Z) ⩾ 3 := by omega
have h50 : (r : Z) ⩾ 4 := by omega
have h51 : 2 * q + 2 * r ⩾ 14 := by
nlinarith
linarith
exact h38
have h19 : q ⩾ p + 1 := by omega
have h20 : r ⩾ q + 1 := by omega
have h21 : (p : Z) ⩾ 4 := by omega
have h22 : (q : Z) ⩾ 5 := by omega
have h23 : (r : Z) ⩾ 6 := by omega
have h24 : (p : Z) * q ⩾ 4 * 5 := by nlinarith
have h25 : (p : Z) * r ⩾ 4 * 6 := by nlinarith
have h26 : (q : Z) * r ⩾ 5 * 6 := by nlinarith
have h27 : (p : Z) * q * r ⩾ 4 * 5 * 6 := by nlinarith
have h28 : (p : Z) * q * r − 1 = 2 * ((p − 1) * (q − 1) * (r − 1)) := by linarith
have h29 : (p : Z) ⩾ 4 := by omega
have h30 : (q : Z) ⩾ 5 := by omega
have h31 : (r : Z) ⩾ 6 := by omega
have h32 : (p : Z) * q * r − 1 = 2 * ((p − 1) * (q − 1) * (r − 1)) := by linarith
have h33 : False := by
nlinarith [sq_nonneg ((p : Z) − 2), sq_nonneg ((q : Z) − 2), sq_nonneg ((r : Z) − 2),
mul_nonneg (sub_nonneg.mpr h2 ) (sub_nonneg.mpr h3 ),
mul_nonneg (sub_nonneg.mpr h2 ) (sub_nonneg.mpr h4 ),
mul_nonneg (sub_nonneg.mpr h3 ) (sub_nonneg.mpr h4 )]
exact h33
have h16 : q = 5 := by
have h17 : p = 3 := by linarith
have h18 : (p : Z) * q * r − 1 = 2 * ((p − 1) * (q − 1) * (r − 1)) := by linarith
have h19 : (p : Z) = 3 := by norm_num [h17 ]
have h20 : (q : Z) ⩾ 4 := by
by_contra h20
have h21 : q ⩽ 3 := by linarith
have h22 : q = 3 := by linarith
have h23 : (p : Z) = 3 := by norm_num [h17 ]
have h24 : (q : Z) = 3 := by norm_num [h22 ]
have h25 : (r : Z) ⩾ 4 := by linarith
norm_num [h17 , h22 , h23 , h24 ] at h18
<;>
(try omega) <;>
(try nlinarith) <;>
(try
{
nlinarith [mul_pos (sub_pos.mpr h0 .2.1) (sub_pos.mpr h0 .2.2)]
})
have h21 : (r : Z) ⩾ q + 1 := by linarith
have h22 : (q : Z) ⩾ 4 := by linarith
have h23 : (p : Z) = 3 := by norm_num [h17 ]
have h24 : (p : Z) * q * r − 1 = 2 * ((p − 1) * (q − 1) * (r − 1)) := by linarith
have h25 : 3 * q * r − 1 = 2 * (2 * (q − 1) * (r − 1)) := by
norm_num [h17 , h19 ] at h24 ⊢
<;> ring_nf at h24 ⊢ <;> linarith
have h26 : 3 * q * r − 1 = 4 * ((q − 1) * (r − 1)) := by
ring_nf at h25 ⊢
<;> nlinarith
have h27 : 3 * q * r − 1 = 4 * (q * r − q − r + 1) := by
ring_nf at h26 ⊢
<;> nlinarith
have h28 : 3 * q * r − 1 = 4 * q * r − 4 * q − 4 * r + 4 := by
ring_nf at h27 ⊢
<;> nlinarith
have h29 : − q * r + 4 * q + 4 * r − 5 = 0 := by
ring_nf at h28 ⊢
<;> nlinarith
have h30 : (q : Z) * r − 4 * q − 4 * r + 5 = 0 := by
ring_nf at h29 ⊢
<;> nlinarith
have h31 : (q − 4 : Z) * (r − 4 : Z) = 11 := by
ring_nf at h30 ⊢
<;> nlinarith
have h32 : (q : Z) − 4 = 1 ∧ (r : Z) − 4 = 11 ∨ (q : Z) − 4 = 11 ∧ (r : Z) − 4 = 1 ∨ (q : Z) − 4 = −1 ∧ (r : Z) − 4 =
−11 ∨ (q : Z) − 4 = −11 ∧ (r : Z) − 4 = −1 := by
have h33 : (q : Z) − 4 = 1 ∨ (q : Z) − 4 = 11 ∨ (q : Z) − 4 = −1 ∨ (q : Z) − 4 = −11 := by
have h34 : (q : Z) − 4 | 11 := by
use (r : Z) − 4
linarith
have h35 : (q : Z) − 4 = 1 ∨ (q : Z) − 4 = 11 ∨ (q : Z) − 4 = −1 ∨ (q : Z) − 4 = −11 := by
have h36 : (q : Z) − 4 = 1 ∨ (q : Z) − 4 = 11 ∨ (q : Z) − 4 = −1 ∨ (q : Z) − 4 = −11 := by
rw [← Int.natAbs_dvd_natAbs] at h34
−− We use the fact that the absolute value of (q − 4) divides the absolute value of 11
have h37 : ((q : Z) − 4).natAbs | 11 := by
simpa [Int.natAbs] using h34
−− Since the possible divisors of 11 are 1 and 11, we check the cases
have h38 : ((q : Z) − 4).natAbs = 1 ∨ ((q : Z) − 4).natAbs = 11 := by
have h39 : ((q : Z) − 4).natAbs | 11 := h37
have h40 : ((q : Z) − 4).natAbs ⩽ 11 := Nat.le_of_dvd (by decide) h39
interval_cases ((q : Z) − 4).natAbs <;> norm_num at h39 ⊢ <;> omega
cases h38 with
| inl h38 =>
have h41 : (q : Z) − 4 = 1 ∨ (q : Z) − 4 = −1 := by
have h42 : ((q : Z) − 4).natAbs = 1 := h38
have h43 : (q : Z) − 4 = 1 ∨ (q : Z) − 4 = −1 := by
rw [Int.natAbs_eq_iff] at h42
tauto
exact h43
cases h41 with
| inl h41 =>
tauto
| inr h41 =>
tauto
| inr h38 =>
have h41 : (q : Z) − 4 = 11 ∨ (q : Z) − 4 = −11 := by
have h42 : ((q : Z) − 4).natAbs = 11 := h38
have h43 : (q : Z) − 4 = 11 ∨ (q : Z) − 4 = −11 := by
rw [Int.natAbs_eq_iff] at h42
tauto
exact h43
cases h41 with
| inl h41 =>
tauto
| inr h41 =>
tauto
exact h36
exact h35
cases h33 with
| inl h33 =>
have h34 : (q : Z) − 4 = 1 := h33
have h35 : (r : Z) − 4 = 11 := by
have h36 : ((q : Z) − 4) * ((r : Z) − 4) = 11 := by
linarith
rw [h34 ] at h36
linarith
exact Or.inl ⟨h34 , h35 ⟩
| inr h33 =>
cases h33 with
| inl h33 =>
have h34 : (q : Z) − 4 = 11 := h33
have h35 : (r : Z) − 4 = 1 := by
have h36 : ((q : Z) − 4) * ((r : Z) − 4) = 11 := by
linarith
rw [h34 ] at h36
linarith
exact Or.inr (Or.inl ⟨h34 , h35 ⟩)
| inr h33 =>
cases h33 with
| inl h33 =>
have h34 : (q : Z) − 4 = −1 := h33
have h35 : (r : Z) − 4 = −11 := by
have h36 : ((q : Z) − 4) * ((r : Z) − 4) = 11 := by
linarith
rw [h34 ] at h36
linarith
exact Or.inr (Or.inr (Or.inl ⟨h34 , h35 ⟩))
| inr h33 =>
have h34 : (q : Z) − 4 = −11 := h33
have h35 : (r : Z) − 4 = −1 := by
have h36 : ((q : Z) − 4) * ((r : Z) − 4) = 11 := by
linarith
rw [h34 ] at h36
linarith
exact Or.inr (Or.inr (Or.inr ⟨h34 , h35 ⟩))
rcases h32 with (⟨h32 , h33 ⟩ | ⟨h32 , h33 ⟩ | ⟨h32 , h33 ⟩ | ⟨h32 , h33 ⟩)
· −− Case 1: q − 4 = 1 and r − 4 = 11
have h34 : q = 5 := by
linarith
have h35 : r = 15 := by
linarith
simp [h34 , h35 , h15 ]
<;> norm_num <;> omega
· −− Case 2: q − 4 = 11 and r − 4 = 1
have h34 : q = 15 := by
linarith
have h35 : r = 5 := by
linarith
have h36 : r > q := by
linarith
linarith
· −− Case 3: q − 4 = −1 and r − 4 = −11
have h34 : q = 3 := by
linarith
have h35 : r = −7 := by
linarith
have h36 : r > q := by
linarith
linarith
· −− Case 4: q − 4 = −11 and r − 4 = −1
have h34 : q = −7 := by
linarith
have h35 : r = 3 := by
linarith
have h36 : r > q := by
linarith
linarith
have h17 : r = 15 := by
have h18 : p = 3 := by linarith
have h19 : q = 5 := by linarith
have h20 : (p : Z) * q * r − 1 = 2 * ((p − 1) * (q − 1) * (r − 1)) := by linarith
have h21 : (p : Z) = 3 := by norm_num [h18 ]
have h22 : (q : Z) = 5 := by norm_num [h19 ]
norm_num [h18 , h19 , h21 , h22 ] at h20
<;> ring_nf at h20 ⊢ <;> omega
have h18 : (p, q, r) = (3, 5, 15) := by
simp_all [Prod.ext_iff]
<;> norm_num <;> linarith
exact Or.inr h18
· −− Case k = 3
have h14 : p * q * r − 1 = 3 * ((p − 1) * (q − 1) * (r − 1)) := by
ring_nf at hk ⊢
<;> linarith
have h15 : p = 2 := by
by_contra h15
have h16 : p ⩾ 3 := by
by_contra h16
have h17 : p ⩽ 2 := by linarith
have h18 : p = 2 := by linarith
contradiction
have h19 : q ⩾ p + 1 := by linarith
have h20 : r ⩾ q + 1 := by linarith
have h21 : (p : Z) ⩾ 3 := by linarith
have h22 : (q : Z) ⩾ 4 := by linarith
have h23 : (r : Z) ⩾ 5 := by linarith
have h24 : (p : Z) * q ⩾ 3 * 4 := by nlinarith
have h25 : (p : Z) * r ⩾ 3 * 5 := by nlinarith
have h26 : (q : Z) * r ⩾ 4 * 5 := by nlinarith
have h27 : (p : Z) * q * r ⩾ 3 * 4 * 5 := by nlinarith
have h28 : (p : Z) * q * r − 1 = 3 * ((p − 1) * (q − 1) * (r − 1)) := by linarith
have h29 : (p : Z) ⩾ 3 := by linarith
have h30 : (q : Z) ⩾ 4 := by linarith
have h31 : (r : Z) ⩾ 5 := by linarith
have h32 : (p : Z) * q * r − 1 = 3 * ((p − 1) * (q − 1) * (r − 1)) := by linarith
have h33 : False := by
nlinarith [sq_nonneg ((p : Z) − 2), sq_nonneg ((q : Z) − 2), sq_nonneg ((r : Z) − 2),
mul_nonneg (sub_nonneg.mpr h2 ) (sub_nonneg.mpr h3 ),
mul_nonneg (sub_nonneg.mpr h2 ) (sub_nonneg.mpr h4 ),
mul_nonneg (sub_nonneg.mpr h3 ) (sub_nonneg.mpr h4 )]
exact h33
have h16 : q = 4 := by
have h17 : p = 2 := by linarith
have h18 : (p : Z) * q * r − 1 = 3 * ((p − 1) * (q − 1) * (r − 1)) := by linarith
have h19 : (p : Z) = 2 := by norm_num [h17 ]
have h20 : (q : Z) ⩾ 3 := by
by_contra h20
have h21 : q ⩽ 2 := by linarith
have h22 : q = 2 := by linarith
have h23 : (p : Z) = 2 := by norm_num [h17 ]
have h24 : (q : Z) = 2 := by norm_num [h22 ]
have h25 : (r : Z) ⩾ 3 := by linarith
norm_num [h17 , h22 , h23 , h24 ] at h18
<;>
(try omega) <;>
(try nlinarith) <;>
(try
{
nlinarith [mul_pos (sub_pos.mpr h0 .2.1) (sub_pos.mpr h0 .2.2)]
})
have h21 : (r : Z) ⩾ q + 1 := by linarith
have h22 : (q : Z) ⩾ 3 := by linarith
have h23 : (p : Z) = 2 := by norm_num [h17 ]
have h24 : (p : Z) * q * r − 1 = 3 * ((p − 1) * (q − 1) * (r − 1)) := by linarith
have h25 : 2 * q * r − 1 = 3 * (1 * (q − 1) * (r − 1)) := by
norm_num [h17 , h19 ] at h24 ⊢
<;> ring_nf at h24 ⊢ <;> linarith
have h26 : 2 * q * r − 1 = 3 * ((q − 1) * (r − 1)) := by
ring_nf at h25 ⊢
<;> nlinarith
have h27 : 2 * q * r − 1 = 3 * (q * r − q − r + 1) := by
ring_nf at h26 ⊢
<;> nlinarith
have h28 : 2 * q * r − 1 = 3 * q * r − 3 * q − 3 * r + 3 := by
ring_nf at h27 ⊢
<;> nlinarith
have h29 : − q * r + 3 * q + 3 * r − 4 = 0 := by
ring_nf at h28 ⊢
<;> nlinarith
have h30 : (q : Z) * r − 3 * q − 3 * r + 4 = 0 := by
ring_nf at h29 ⊢
<;> nlinarith
have h31 : (q − 3 : Z) * (r − 3 : Z) = 5 := by
ring_nf at h30 ⊢
<;> nlinarith
have h32 : (q : Z) − 3 = 1 ∧ (r : Z) − 3 = 5 ∨ (q : Z) − 3 = 5 ∧ (r : Z) − 3 = 1 ∨ (q : Z) − 3 = −1 ∧ (r : Z) − 3 = −5
∨ (q : Z) − 3 = −5 ∧ (r : Z) − 3 = −1 := by
have h33 : (q : Z) − 3 = 1 ∨ (q : Z) − 3 = 5 ∨ (q : Z) − 3 = −1 ∨ (q : Z) − 3 = −5 := by
have h34 : (q : Z) − 3 | 5 := by
use (r : Z) − 3
linarith
have h35 : (q : Z) − 3 = 1 ∨ (q : Z) − 3 = 5 ∨ (q : Z) − 3 = −1 ∨ (q : Z) − 3 = −5 := by
have h36 : (q : Z) − 3 = 1 ∨ (q : Z) − 3 = 5 ∨ (q : Z) − 3 = −1 ∨ (q : Z) − 3 = −5 := by
rw [← Int.natAbs_dvd_natAbs] at h34
−− We use the fact that the absolute value of (q − 3) divides the absolute value of 5
have h37 : ((q : Z) − 3).natAbs | 5 := by
simpa [Int.natAbs] using h34
−− Since the possible divisors of 5 are 1 and 5, we check the cases
have h38 : ((q : Z) − 3).natAbs = 1 ∨ ((q : Z) − 3).natAbs = 5 := by
have h39 : ((q : Z) − 3).natAbs | 5 := h37
have h40 : ((q : Z) − 3).natAbs ⩽ 5 := Nat.le_of_dvd (by decide) h39
interval_cases ((q : Z) − 3).natAbs <;> norm_num at h39 ⊢ <;> omega
cases h38 with
| inl h38 =>
have h41 : (q : Z) − 3 = 1 ∨ (q : Z) − 3 = −1 := by
have h42 : ((q : Z) − 3).natAbs = 1 := h38
have h43 : (q : Z) − 3 = 1 ∨ (q : Z) − 3 = −1 := by
rw [Int.natAbs_eq_iff] at h42
tauto
exact h43
cases h41 with
| inl h41 =>
tauto
| inr h41 =>
tauto
| inr h38 =>
have h41 : (q : Z) − 3 = 5 ∨ (q : Z) − 3 = −5 := by
have h42 : ((q : Z) − 3).natAbs = 5 := h38
have h43 : (q : Z) − 3 = 5 ∨ (q : Z) − 3 = −5 := by
rw [Int.natAbs_eq_iff] at h42
tauto
exact h43
cases h41 with
| inl h41 =>
tauto
| inr h41 =>
tauto
exact h36
exact h35
cases h33 with
| inl h33 =>
have h34 : (q : Z) − 3 = 1 := h33
have h35 : (r : Z) − 3 = 5 := by
have h36 : ((q : Z) − 3) * ((r : Z) − 3) = 5 := by
linarith
rw [h34 ] at h36
linarith
exact Or.inl ⟨h34 , h35 ⟩
| inr h33 =>
cases h33 with
| inl h33 =>
have h34 : (q : Z) − 3 = 5 := h33
have h35 : (r : Z) − 3 = 1 := by
have h36 : ((q : Z) − 3) * ((r : Z) − 3) = 5 := by
linarith
rw [h34 ] at h36
linarith
exact Or.inr (Or.inl ⟨h34 , h35 ⟩)
| inr h33 =>
cases h33 with
| inl h33 =>
have h34 : (q : Z) − 3 = −1 := h33
have h35 : (r : Z) − 3 = −5 := by
have h36 : ((q : Z) − 3) * ((r : Z) − 3) = 5 := by
linarith
rw [h34 ] at h36
linarith
exact Or.inr (Or.inr (Or.inl ⟨h34 , h35 ⟩))
| inr h33 =>
have h34 : (q : Z) − 3 = −5 := h33
have h35 : (r : Z) − 3 = −1 := by
have h36 : ((q : Z) − 3) * ((r : Z) − 3) = 5 := by
linarith
rw [h34 ] at h36
linarith
exact Or.inr (Or.inr (Or.inr ⟨h34 , h35 ⟩))
rcases h32 with (⟨h32 , h33 ⟩ | ⟨h32 , h33 ⟩ | ⟨h32 , h33 ⟩ | ⟨h32 , h33 ⟩)
· −− Case 1: q − 3 = 1 and r − 3 = 5
have h34 : q = 4 := by
linarith
have h35 : r = 8 := by
linarith
simp [h34 , h35 , h15 ]
<;> norm_num <;> omega
· −− Case 2: q − 3 = 5 and r − 3 = 1
have h34 : q = 8 := by
linarith
have h35 : r = 4 := by
linarith
have h36 : r > q := by
linarith
linarith
· −− Case 3: q − 3 = −1 and r − 3 = −5
have h34 : q = 2 := by
linarith
have h35 : r = −2 := by
linarith
have h36 : r > q := by
linarith
linarith
· −− Case 4: q − 3 = −5 and r − 3 = −1
have h34 : q = −2 := by
linarith
have h35 : r = 2 := by
linarith
have h36 : r > q := by
linarith
linarith
have h17 : r = 8 := by
have h18 : p = 2 := by linarith
have h19 : q = 4 := by linarith
have h20 : (p : Z) * q * r − 1 = 3 * ((p − 1) * (q − 1) * (r − 1)) := by linarith
have h21 : (p : Z) = 2 := by norm_num [h18 ]
have h22 : (q : Z) = 4 := by norm_num [h19 ]
norm_num [h18 , h19 , h21 , h22 ] at h20
<;> ring_nf at h20 ⊢ <;> omega
have h18 : (p, q, r) = (2, 4, 8) := by
simp_all [Prod.ext_iff]
<;> norm_num <;> linarith
exact Or.inl h18
exact h5
```
/--
Let 𝐻 be an 𝑛 × 𝑛 matrix all of whose entries are ±1 and whose rows are mutually orthogonal. Suppose 𝐻 has an
𝑎 × 𝑏 submatrix whose entries are all 1. Show that 𝑎𝑏 ⩽ 𝑛.
-/
theorem putnam_2005_a4
    (n : ℕ)
    (H : Matrix (Fin n) (Fin n) ℝ)
    (a b : ℕ)
    (S : Matrix (Fin a) (Fin b) ℝ)
    (npos : n ≥ 1)
    (Hentries : ∀ i j : Fin n, H i j = 1 ∨ H i j = -1)
    (Hortho : H.HasOrthogonalRows)
    (hab : 1 ≤ a ∧ a ≤ n ∧ 1 ≤ b ∧ b ≤ n)
    (Ssub : ∃ (rri : Fin a → Fin n) (cri : Fin b → Fin n),
      rri.Injective ∧ cri.Injective ∧ S = H.submatrix rri cri)
    (Sentries : ∀ (i : Fin a) (j : Fin b), S i j = 1)
    : a * b ≤ n := by
  obtain ⟨rri, cri, hrinj, hcrij, hsub⟩ := Ssub
  have h1 : a ≤ n := hab.2.1
  have h2 : b ≤ n := hab.2.2.2
  have h3 : a * b ≤ n := by
    calc
      a * b = Cardinal.toNat (a * b) := by simp [Nat.cast_inj]
      _ ≤ Cardinal.toNat (n) := by
        apply?
      _ = n := by simp [Nat.cast_inj]
  exact h3
import Mathlib
-- fun n ↦ 2 ^ (n + 1)
/--
Let 𝑛 be a positive integer. Find the number of pairs 𝑃, 𝑄 of polynomials with real coefficients such that
\[
(P(X))^2 + (Q(X))^2 = X^{2n} + 1
\]
and deg 𝑃 > deg 𝑄.
-/
theorem putnam_2007_b4
    (n : ℕ) (npos : n > 0) :
    ({(P, Q) : (Polynomial ℝ) × (Polynomial ℝ) | P ^ 2 + Q ^ 2 = Polynomial.X ^ (2 * n) + 1 ∧
      P.degree > Q.degree}.ncard = ((fun n ↦ 2 ^ (n + 1)) : ℕ → ℕ) n) := by
  have h1 : ∀ n : ℕ, n > 0 → ({ (P, Q) : (Polynomial ℝ) × (Polynomial ℝ) |
      P ^ 2 + Q ^ 2 = Polynomial.X ^ (2 * n) + 1 ∧ P.degree > Q.degree }.ncard : ℕ) = 2 ^ (n + 1) := by
    intro n hn
    have h : ∀ n : ℕ, n > 0 → ({ (P, Q) : (Polynomial ℝ) × (Polynomial ℝ) |
        P ^ 2 + Q ^ 2 = Polynomial.X ^ (2 * n) + 1 ∧ P.degree > Q.degree }.ncard : ℕ) = 2 ^ (n + 1) := by
      intro n hn
      have h1 : ({ (P, Q) : (Polynomial ℝ) × (Polynomial ℝ) |
          P ^ 2 + Q ^ 2 = Polynomial.X ^ (2 * n) + 1 ∧ P.degree > Q.degree }.ncard : ℕ) = 2 ^ (n + 1) := by
        have h2 : ({ (P, Q) : (Polynomial ℝ) × (Polynomial ℝ) |
            P ^ 2 + Q ^ 2 = Polynomial.X ^ (2 * n) + 1 ∧ P.degree > Q.degree } :
            Set ((Polynomial ℝ) × (Polynomial ℝ))).ncard = 2 ^ (n + 1) := by
          rw [← Cardinal.natCast_inj]
          apply?
        exact_mod_cast h2
      exact h1
    exact h n hn
C. Revision to MiniF2F
1. mathd_algebra_247:
/-- Let $t = 2s - s^2$ and $s = n^2 - 2^n + 1$. What is the value of $t$ when $n = 3$? Show that it is 0. -/
theorem mathd_algebra_247 (t s : ℝ) (n : ℤ) (h0 : t = 2 * s - s ^ 2) (h1 : s = n ^ 2 - 2 ^ n + 1)
    (n) (_ : n = 3) : t = 0 := by
  sorry
-- revise to
theorem mathd_algebra_247 (t s : ℝ) (n : ℤ) (h0 : t = 2 * s - s ^ 2) (h1 : s = n ^ 2 - 2 ^ n + 1)
    (_ : n = 3) : t = 0 := by
  sorry
2. induction_sum_odd:
/-- Show that for positive integer $n$, $\sum_{k=0}^{n-1} (2k + 1) = n^2$. -/
theorem induction_sum_odd (n : ℕ) : (∑ k in Finset.range n, 2 * k) + 1 = n ^ 2 := by
  sorry
-- revise to
theorem induction_sum_odd (n : ℕ) : (∑ k in Finset.range n, (2 * k + 1)) = n ^ 2 := by
  sorry
3. induction_prod1p1onk3le3m1onn:
/-- Show that for any positive integer $n$, we have $\prod_{k=1}^{n} (1 + 1/k^3) \le 3 - 1/n$. -/
theorem induction_prod1p1onk3le3m1onn (n : ℕ) (h0 : 0 < n) :
    (∏ k in Finset.Icc 1 n, 1 + (1 : ℝ) / k ^ 3) ≤ (3 : ℝ) - 1 / ↑n := by
  sorry
-- revise to
theorem induction_prod1p1onk3le3m1onn (n : ℕ) (h0 : 0 < n) :
    (∏ k in Finset.Icc 1 n, (1 + (1 : ℝ) / k ^ 3)) ≤ (3 : ℝ) - 1 / ↑n := by
  sorry
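To see why the placement of parentheses matters in the induction_sum_odd revision, the two statements can be compared at a concrete value. The sketch below instantiates 𝑛 = 3 and writes the sums with the explicit `Finset.sum` function rather than the big-operator notation, which is an editorial choice to avoid depending on notation details of a particular Mathlib version. It assumes `decide` can evaluate these small closed sums over ℕ.

```lean4
import Mathlib

-- Original statement at n = 3: the 1 is added outside the sum, giving (0 + 2 + 4) + 1 = 7.
example : Finset.sum (Finset.range 3) (fun k => 2 * k) + 1 = 7 := by decide

-- Revised statement at n = 3: the 1 is inside the summand, giving 1 + 3 + 5 = 9 = 3 ^ 2.
example : Finset.sum (Finset.range 3) (fun k => 2 * k + 1) = 9 := by decide
```

Since 7 differs from 3 ^ 2 = 9, the original formal statement does not hold at 𝑛 = 3, whereas the revised one matches the informal identity there.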