
DeepSeek

Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd.,[3][4][5][a] doing
business as DeepSeek,[b] is a Chinese artificial intelligence company that develops large
language models (LLMs). Based in Hangzhou, Zhejiang, it is owned and funded by the Chinese
hedge fund High-Flyer. DeepSeek was founded in July 2023 by Liang Wenfeng, the co-founder of
High-Flyer, who also serves as the CEO for both companies.[7][8][9] The company launched an
eponymous chatbot alongside its DeepSeek-R1 model in January 2025.

Released under the MIT License, DeepSeek-R1 provides responses comparable to other contemporary large language models, such as OpenAI's GPT-4o and o1.[10] Its training cost is reported to be significantly lower than that of other LLMs. The company claims that it trained its V3 model for US$6 million, far less than the US$100 million cost of OpenAI's GPT-4 in 2023,[11] and using approximately one-tenth the computing power consumed by Meta's comparable model, Llama 3.1.[11][12][13][14] DeepSeek's success against larger and more established rivals has been described as "upending AI".[15][16]

DeepSeek's models are described as "open weight", meaning the exact parameters are openly shared, although certain usage conditions differ from typical open-source software.[17][18] The company reportedly recruits AI researchers from top Chinese universities[15] and also hires from outside traditional computer science fields to broaden its models' knowledge and capabilities.[12]

DeepSeek significantly reduced training expenses for its R1 model by incorporating techniques such as mixture of experts (MoE) layers.[19] The company also trained its models during ongoing trade restrictions on AI chip exports to China, using weaker AI chips intended for export and employing fewer units overall.[13][20] Observers say this breakthrough sent "shock waves" through the industry, threatening established AI hardware leaders such as Nvidia; Nvidia's share price dropped sharply, losing US$600 billion in market value, the largest single-company decline in U.S. stock market history.[21][22]

Company information:
Full name: Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd.
Native name: 杭州深度求索人工智能基础技术研究有限公司
Company type: Private
Industry: Information technology; artificial intelligence
Founded: 17 July 2023[1]
Founder: Liang Wenfeng
Headquarters: Hangzhou, Zhejiang, China
Key people: Liang Wenfeng (CEO)
Owner: High-Flyer
Number of employees: 160 (2025)[2]
Website: deepseek.com (https://www.deepseek.com/)

History

Founding and early years (2016–2023)

In February 2016, High-Flyer was co-founded by AI enthusiast Liang Wenfeng, who had been
trading since the 2007–2008 financial crisis while attending Zhejiang University.[23] The company
began stock trading using a GPU-dependent deep learning model on 21 October 2016; before
then, it had used CPU-based linear models. By the end of 2017, most of its trading was driven by
AI.[24]

Liang established High-Flyer as a hedge fund focused on developing and using AI trading
algorithms, and by 2021 the firm was using AI exclusively,[25] often using Nvidia chips.[26]

In 2019, the company began constructing its first computing cluster, Fire-Flyer, at a cost of
200 million yuan; it contained 1,100 GPUs interconnected at 200 Gbit/s and was retired after
1.5 years in operation.[24]

By 2021, Liang had started buying large quantities of Nvidia GPUs for an AI project,[26] reportedly
obtaining 10,000 Nvidia A100 GPUs[27] before the United States restricted chip sales to China.[25]
Computing cluster Fire-Flyer 2 began construction in 2021 with a budget of 1 billion yuan.[24]

In 2022, Fire-Flyer 2 was reported to have run at over 96% capacity utilization, totaling 56.74 million GPU hours; 27% of that was used to support scientific computing outside the company.[24]

During 2022, Fire-Flyer 2 had 5,000 PCIe A100 GPUs in 625 nodes, each containing 8 GPUs. It used the PCIe version of the A100 rather than the DGX version because the models it then trained could fit within the 40 GB of VRAM on a single GPU, so the higher interconnect bandwidth of DGX was unnecessary (i.e., it required only data parallelism, not model parallelism).[28] Later, it incorporated NVLink and NCCL (Nvidia Collective Communications Library) to train larger models that required model parallelism.[29][30]

On 14 April 2023,[31] High-Flyer announced the launch of an artificial general intelligence (AGI)
research lab, stating that the new lab would focus on developing AI tools unrelated to the firm's
financial business.[32][33] Two months later, on 17 July 2023,[1] that lab was spun off into an
independent company, DeepSeek, with High-Flyer as its principal investor and backer.[25][34][33]
Venture capital investors were reluctant to provide funding, as they considered it unlikely that the
venture would be able to quickly generate an "exit".[25]

On 16 May 2023, Beijing DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd. was incorporated. It was later brought under the full (100%) control of Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd., which was incorporated two months later.

Model releases (2023–present)

DeepSeek released its first model, DeepSeek Coder, on 2 November 2023, followed by the
DeepSeek-LLM series on 29 November 2023.[35]: section 5 In January 2024, it released two
DeepSeek-MoE models (Base and Chat),[36] and in April three DeepSeek-Math models (Base,
Instruct, and RL).[37]

DeepSeek-V2 was released in May 2024, followed a month later by the DeepSeek-Coder V2 series.[38] DeepSeek-V2.5 was introduced in September 2024 and revised in December.[39] On 20 November 2024, a preview of DeepSeek-R1-Lite became available via API and chat.[40][41] In December, DeepSeek-V3-Base and DeepSeek-V3 (chat) were released.[29]

The DeepSeek login page following a cyberattack around its 20 January 2025 launch.

On 20 January 2025, DeepSeek launched the DeepSeek chatbot, based on the DeepSeek-R1 model, free for iOS and Android. By 27 January, DeepSeek had surpassed ChatGPT as the most downloaded freeware app on the iOS App Store in the United States,[15] triggering an 18% drop in Nvidia's share price.[42][43]

On 24 March 2025, DeepSeek released DeepSeek-V3-0324 under the MIT License.[44][45]

Company operation

DeepSeek is headquartered in Hangzhou, Zhejiang, and is owned and funded by High-Flyer. Its
co-founder, Liang Wenfeng, serves as CEO. As of May 2024, Liang personally held an 84% stake
in DeepSeek through two shell corporations.[note 1][46]

Strategy

DeepSeek states that it focuses on research and does not have immediate plans for
commercialization.[47] This posture also means it can skirt certain provisions of China's AI
regulations aimed at consumer-facing technologies.[12]

DeepSeek's hiring approach emphasizes skills over lengthy work experience, resulting in many
hires fresh out of university.[33][12] The company likewise recruits individuals without computer
science backgrounds to expand the range of expertise incorporated into the models, for instance
in poetry or advanced mathematics.[15][12]

Training framework

High-Flyer/DeepSeek operates at least two primary computing clusters, Fire-Flyer (萤火一号) and Fire-Flyer 2 (萤火二号). Fire-Flyer 2 consists of co-designed software and hardware architecture. On the hardware side, Nvidia GPUs use 200 Gbit/s interconnects. The cluster is divided into two "zones", and the platform supports cross-zone tasks. The network topology was two fat trees, chosen for their high bisection bandwidth. On the software side, it includes:[30][24]

3FS (Fire-Flyer File System): A distributed parallel file system, specifically designed for
asynchronous random reads. It uses Direct I/O and RDMA Read. In contrast to standard
Buffered I/O, Direct I/O does not cache data. Caching is useless for this case, since each data
read is random and is not reused.[48][49]

hfreduce: Library for asynchronous communication, originally designed to replace the Nvidia Collective Communication Library (NCCL).[28] It is mainly used for allreduce, especially of gradients during backpropagation. It runs asynchronously on the CPU to avoid blocking kernels on the GPU,[30] and uses two-tree broadcast like NCCL.[28] (A hedged sketch of this pattern appears after this list.)

hfai.nn : Software library of commonly used operators for neural network training, similar to
torch.nn in PyTorch.

HaiScale Distributed Data Parallel (DDP): Parallel training library that implements various forms of parallelism such as Data Parallelism (DP), Pipeline Parallelism (PP), Tensor Parallelism (TP), Expert Parallelism (EP), Fully Sharded Data Parallel (FSDP), and Zero Redundancy Optimizer (ZeRO). It is similar to PyTorch DDP, which uses NCCL as its backend.

HAI Platform : Various applications such as task scheduling, fault handling, and disaster
recovery.[50]
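The CPU-offloaded, asynchronous gradient allreduce described for hfreduce above can be illustrated with a minimal sketch. This is not DeepSeek's code: it assumes a PyTorch process group already initialized with the gloo (CPU) backend, and the function name and structure are illustrative only.

```python
import threading

import torch.distributed as dist  # assumes dist.init_process_group("gloo") was already called


def async_cpu_allreduce(params):
    """Average gradients across workers on the CPU in a background thread,
    so that communication does not block compute kernels on the GPU."""
    def worker():
        world_size = dist.get_world_size()
        for p in params:
            if p.grad is None:
                continue
            cpu_grad = p.grad.detach().to("cpu")             # stage gradient in host memory
            dist.all_reduce(cpu_grad, op=dist.ReduceOp.SUM)  # CPU collective (gloo backend)
            p.grad.copy_(cpu_grad / world_size)              # write averaged gradient back

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t  # the training loop joins this thread before the optimizer step
```

A training loop would call this right after the backward pass and join the returned thread before the optimizer step, overlapping communication with any remaining GPU work.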

As of 2022, Fire-Flyer 2 had 5,000 PCIe A100 GPUs in 625 nodes, each containing 8 GPUs.[28] It later incorporated NVLink and NCCL to train larger models that required model parallelism.[29][30]

Development and release history

Major versions of DeepSeek models. SFT stands for supervised finetuning.

Major version | Release date | Major variants | Remarks
DeepSeek Coder | 2 Nov 2023 | Base (pretrained); Instruct (instruction-finetuned) | –
DeepSeek-LLM | 29 Nov 2023 | Base; Chat (with SFT) | The architecture is essentially the same as Llama.
DeepSeek-MoE | 9 Jan 2024 | Base; Chat | Developed a variant of mixture of experts (MoE).
DeepSeek-Math | Apr 2024 | Base; Instruct (with SFT); RL (using a process reward model) | Base initialized with DS-Coder-Base-v1.5. Developed Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO).
DeepSeek V2 | May 2024 | DeepSeek-V2, DeepSeek-V2-Chat; DeepSeek-V2-Lite, DeepSeek-V2-Lite-Chat; DeepSeek-Coder-V2; DeepSeek-V2.5 | Developed multi-head latent attention (MLA). Also used mixture of experts (MoE). Implemented KV caching.
DeepSeek V3 | Dec 2024 | DeepSeek-V3-Base; DeepSeek-V3 (a chat model) | The architecture is essentially the same as V2. Updated on 2025-03-24.
DeepSeek R1 | 20 Nov 2024 | DeepSeek-R1-Lite-Preview | Only accessed through API and a chat interface.
DeepSeek R1 | 20 Jan 2025 | DeepSeek-R1; DeepSeek-R1-Zero | Initialized from DeepSeek-V3-Base and sharing the V3 architecture.
DeepSeek R1 | 20 Jan 2025 | Distilled models | Initialized from other models, such as Llama and Qwen, then distilled from data synthesized by R1 and R1-Zero.[51]

The first DeepSeek models were essentially the same as Llama,[35] which were dense decoder-
only transformers. Later models incorporated the multi-head latent attention (MLA), Mixture of
Experts (MoE), and KV caching.[36][38]

A decoder-only transformer consists of multiple identical decoder layers. Each of these layers
features two main components: an attention layer and a FeedForward network (FFN) layer.[38] In
the attention layer, the traditional multi-head attention mechanism has been enhanced with multi-
head latent attention. This update introduces compressed latent vectors to boost performance
and reduce memory usage during inference.[38]

Meanwhile, the FFN layer adopts a variant of the mixture of experts (MoE) approach, effectively
doubling the number of experts compared to standard implementations. It distinguishes
between two types of experts: shared experts, which are always active to encapsulate general
knowledge, and routed experts, only a select few of which are activated to capture specialized
information.[36]
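The shared/routed split can be sketched in a few lines. The following is a simplified PyTorch illustration, not DeepSeek's implementation: the sizes, the softmax top-k router, and the per-token loop are assumptions chosen for readability rather than efficiency.

```python
import torch
import torch.nn as nn


class SharedRoutedMoE(nn.Module):
    """Sketch of an MoE feedforward layer with always-active shared experts and
    top-k routed experts (hypothetical sizes; real implementations batch by expert)."""

    def __init__(self, d_model=512, d_ff=1024, n_shared=2, n_routed=8, k=2):
        super().__init__()

        def expert():
            return nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))

        self.shared = nn.ModuleList(expert() for _ in range(n_shared))
        self.routed = nn.ModuleList(expert() for _ in range(n_routed))
        self.router = nn.Linear(d_model, n_routed)
        self.k = k

    def forward(self, x):                           # x: (n_tokens, d_model)
        out = sum(e(x) for e in self.shared)        # shared experts always contribute
        weights = self.router(x).softmax(dim=-1)    # routing probabilities per token
        top_w, top_i = weights.topk(self.k, dim=-1)
        rows = []
        for t in range(x.shape[0]):                 # per-token loop, for clarity only
            row = out[t]
            for w, i in zip(top_w[t], top_i[t]):    # only k routed experts run per token
                row = row + w * self.routed[int(i)](x[t])
            rows.append(row)
        return torch.stack(rows)


# usage: y = SharedRoutedMoE()(torch.randn(4, 512))   # y has shape (4, 512)
```

In a production MoE layer the tokens assigned to each expert are batched together and experts are sharded across devices; the loop above only shows which computations take place.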

Consider the current sequence of n tokens as input. To predict the next token based on the
current input, the attention mechanism involves extensive calculations of matrices, including
query (Q), key (K), and value (V) matrices. The dimensions of Q, K, and V are determined by the
current number of tokens and the model’s embedding size. Once the new token is generated, the
autoregressive procedure appends it to the end of the input sequence, and the transformer layers
repeat the matrix calculation for the next token. A mathematical analysis reveals that the new
token introduces a new query, key, and value vector, appended to Q, K, and V, respectively.
Appending these new vectors to the K and V matrices is sufficient for calculating the next token
prediction. Consequently, storing the current K and V matrices in memory saves time by avoiding
the recalculation of the attention matrix. This feature is known as K-V caching.[38] This technique
effectively reduces computational cost during inference.
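The caching step can be written out as a small single-head sketch. The projection matrices Wq, Wk, Wv and the cache dictionary are hypothetical; masking, multiple heads, and positional encodings are omitted.

```python
import torch


def attend(q, K, V):
    """Scaled dot-product attention of the newest query against all cached keys/values."""
    scores = (q @ K.T) / K.shape[-1] ** 0.5          # (1, t)
    return torch.softmax(scores, dim=-1) @ V         # (1, d)


def decode_step(h_new, Wq, Wk, Wv, cache):
    """One autoregressive step with a KV cache: only the newest token is projected,
    and its key/value vectors are appended instead of recomputing K and V for the prefix."""
    q, k, v = h_new @ Wq, h_new @ Wk, h_new @ Wv     # each (1, d)
    cache["K"] = k if cache["K"] is None else torch.cat([cache["K"], k])
    cache["V"] = v if cache["V"] is None else torch.cat([cache["V"], v])
    return attend(q, cache["K"], cache["V"]), cache


# usage: cache = {"K": None, "V": None}; call decode_step once per newly generated token
```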

Overview of models

DeepSeek's models are "open weight", which provides less freedom for modification than true
open source software.[17][52]

DeepSeek Coder

DeepSeek Coder is a series of eight models, four pretrained (Base) and four instruction-finetuned (Instruct). All have 16K context lengths. The models were made source-available under the DeepSeek License, which includes "open and responsible downstream usage" restrictions.[53]

The training program was:[54][55][56]


1. Pretraining: 1.8T tokens (87% source code, 10% code-related English (GitHub markdown
and Stack Exchange), and 3% code-unrelated Chinese).

2. Long-context pretraining: 200B tokens. This extends the context length from 4K to 16K. This
produced the Base models.

3. Supervised finetuning (SFT): 2B tokens of instruction data. This produced the Instruct
models.

They were trained on clusters of A100 and H800 Nvidia GPUs, connected by InfiniBand, NVLink,
NVSwitch.[54]

DeepSeek Coder properties[54]: Table 2 [57]

Params. | Layers | Model dim. | FFN dim. | Attention heads | KV heads
1.3B | 24 | 2048 | 5504 | 16 | 16
5.7B | 32 | 4096 | 11008 | 32 | 1[note 2]
6.7B | 32 | 4096 | 11008 | 32 | 32
33B | 62 | 7168 | 19200 | 56 | 7[note 2]

DeepSeek-LLM

The DeepSeek-LLM series was released in November 2023. It has 7B and 67B parameters in both
Base and Chat forms. DeepSeek's accompanying paper claimed benchmark results higher than
Llama 2 and most open-source LLMs at the time.[35]: section 5 The model code is under the source-
available DeepSeek License.[58]

The architecture was essentially the same as the Llama series. They used the pre-norm decoder-
only Transformer with RMSNorm as the normalization, SwiGLU in the feedforward layers, rotary
positional embedding (RoPE), and grouped-query attention (GQA). Both had vocabulary size
102,400 (byte-level BPE) and context length of 4096. They trained on 2 trillion tokens of English
and Chinese text obtained by deduplicating the Common Crawl.[35]

DeepSeek LLM properties[35]: Table 2

Params. | Layers | Model dim. | FFN dim. | Attention heads | KV heads
7B | 30 | 4096 | 11008 | 32 | 32
67B | 95 | 8192 | 22016 | 64 | 8[note 2]

The Chat versions of the two Base models were released concurrently, obtained by training Base with supervised finetuning (SFT) followed by direct preference optimization (DPO).[35]

MoE

The DeepSeek-MoE models (Base and Chat) each have 16B parameters (2.7B activated per token, 4K context length). The training was essentially the same as for DeepSeek-LLM 7B, and used a part of its training dataset. They claimed that the 16B MoE performed comparably to a 7B non-MoE model. It is a variant of the standard sparsely-gated MoE, with "shared experts" that are always queried and "routed experts" that might not be. They found this to help with expert balancing: in standard MoE, some experts can become overused while others are rarely used, wasting capacity, and attempts to balance expert usage push experts toward replicating the same capabilities. The shared experts are meant to learn core capabilities that are frequently used, while the routed experts learn peripheral capabilities that are rarely used.[36]

Math

DeepSeek-Math includes 3 models: Base, Instruct, and RL. Math was trained as follows:[37]

1. Initialize with a previously pretrained DeepSeek-Coder Base v1.5 7B.

2. Further pretrain with 500B tokens (56% DeepSeekMath Corpus, 4% AlgebraicStack, 10% arXiv, 20% GitHub code, 10% Common Crawl). This produced Base.

3. Train an instruction-following model by SFT Base with 776K math problems and tool-use-
integrated step-by-step solutions. This produced Instruct.

4. Reinforcement learning (RL): The reward model was a process reward model (PRM) trained from Base according to the Math-Shepherd method.[59] This reward model was then used to train Instruct using Group Relative Policy Optimization (GRPO) on a dataset of 144K math questions "related to GSM8K and MATH". The reward model was continuously updated during training to avoid reward hacking. This resulted in RL. (A minimal sketch of GRPO follows this list.)
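A minimal sketch of the GRPO idea referenced above: the baseline comes from a group of sampled answers to the same question rather than from a learned value function, and the resulting advantage feeds a PPO-style clipped objective. The shapes, names, and the omission of the KL penalty against a reference model are simplifications.

```python
import torch


def group_relative_advantages(rewards):
    """rewards: (n_questions, group_size) rewards for a group of sampled answers per question.
    Each answer's advantage is its reward normalized by the group mean and std,
    so no separate value network is required."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)


def grpo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate objective evaluated with group-relative advantages.
    logp_new / logp_old: log-probabilities of each sampled answer under the current
    and old policies, same shape as advantages."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```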

V2

The architecture of V2, showing both shared-routed MoE and MLA[60]: Figure 2

In May 2024, DeepSeek released the DeepSeek-V2 series. The series includes 4 models, 2 base
models (DeepSeek-V2, DeepSeek-V2 Lite) and 2 chatbots (Chat). The two larger models were
trained as follows:[60]

1. Pretrain on a dataset of 8.1T tokens, using 12% more Chinese tokens than English ones.

2. Extend context length from 4K to 128K using YaRN.[61] This resulted in DeepSeek-V2.

3. SFT with 1.2M instances for helpfulness and 0.3M for safety. This resulted in Chat SFT,
which was not released.

4. RL using GRPO in two stages. The first stage was trained to solve math and coding
problems. This stage used 1 reward model, trained on compiler feedback (for coding) and
ground-truth labels (for math). The second stage was trained to be helpful, safe, and follow
rules. This stage used 3 reward models. The helpfulness and safety reward models were
trained on human preference data. The rule-based reward model was manually
programmed. All trained reward models were initialized from Chat (SFT). This resulted in
the released version of Chat.

They opted for two-stage RL because they found that RL on reasoning data had "unique characteristics" different from RL on general data; for example, RL on reasoning could keep improving over more training steps.[60]

The two V2-Lite models were smaller, and trained similarly. DeepSeek-V2 Lite-Chat underwent
only SFT, not RL. They trained the Lite version to help "further research and development on MLA
and DeepSeekMoE".[60]

Architecturally, the V2 models were significantly different from the DeepSeek LLM series. They replaced the standard attention mechanism with a low-rank approximation called multi-head latent attention (MLA), and used the previously published mixture of experts (MoE) variant.[36]
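The low-rank idea behind MLA can be shown in a short sketch: the hidden state is compressed into a small latent vector, only that latent is cached, and keys and values are reconstructed from it when attention is computed. Sizes are illustrative, and the rotary-embedding and query-compression details of the actual models are omitted.

```python
import torch

d_model, d_latent, n_heads, d_head = 2048, 128, 16, 64                 # illustrative sizes

W_dkv = torch.randn(d_model, d_latent) * d_model ** -0.5               # shared down-projection
W_uk = torch.randn(d_latent, n_heads * d_head) * d_latent ** -0.5      # key up-projection
W_uv = torch.randn(d_latent, n_heads * d_head) * d_latent ** -0.5      # value up-projection


def compress_kv(h):          # h: (seq_len, d_model)
    """Only this (seq_len, d_latent) latent is stored in the cache,
    instead of (seq_len, 2 * n_heads * d_head) for full keys and values."""
    return h @ W_dkv


def expand_kv(c_kv):
    """Reconstruct per-head keys and values from the cached latent at attention time."""
    return c_kv @ W_uk, c_kv @ W_uv
```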

DeepSeek V2 properties[60]: Section 3.1.2, Appendix B [62][63]

Name | Params. | Active params | Layers | Context length | Shared experts | Routed experts
V2-Lite | 15.7B | 2.4B | 27 | 32K | 2 | 64
V2 | 236B | 21B | 60 | 128K | 2 | 160

The Financial Times reported that it was cheaper than its peers with a price of 2 RMB for every
million output tokens. The University of Waterloo Tiger Lab's leaderboard ranked DeepSeek-V2
seventh on its LLM ranking.[34]

The DeepSeek-Coder V2 series included V2-Base, V2-Lite-Base, V2-Instruct, and V2-Lite-Instruct. Training:[38][note 3]

1. Base models were initialized from corresponding intermediate checkpoints after pretraining
on 4.2T tokens (not the version at the end of pretraining), then pretrained further for 6T
tokens, then context-extended to 128K context length.

2. DeepSeek-Coder and DeepSeek-Math were used to generate 20K code-related and 30K
math-related instruction data, then combined with an instruction dataset of 300M tokens.
This was used for SFT.

3. RL with GRPO. The reward for math problems was computed by comparing with the ground-
truth label. The reward for code problems was generated by a reward model trained to
predict whether a program would pass the unit tests.

DeepSeek-V2.5 was made by combining DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct.[39]

V3

Multi-Token Prediction

DeepSeek-V3-Base and DeepSeek-V3 (a chat model) use essentially the same architecture as V2
with the addition of multi-token prediction, which (optionally) decodes extra tokens faster but
less accurately. Training process:[29]

1. Pretraining on 14.8T tokens of a multilingual corpus, mostly English and Chinese. It contained a higher ratio of math and programming than the pretraining dataset of V2.

2. Extend context length twice, from 4K to 32K and then to 128K, using YaRN.[61] This
produced DeepSeek-V3-Base.

3. SFT for 2 epochs on 1.5M samples of reasoning (math, programming, logic) and non-
reasoning (creative writing, roleplay, simple question answering) data. Reasoning data was
generated by "expert models". Non-reasoning data was generated by DeepSeek-V2.5 and
checked by humans.
The "expert models" were trained by starting with an unspecified base model, then SFT
on both <problem, original response> data, and synthetic <system prompt, prompt,
problem, R1 response> data generated by an internal DeepSeek-R1-Lite model. The
system prompt asked R1 to reflect and verify during thinking. Then the expert models were trained with RL using an undisclosed reward function.

Each expert model was trained to generate just synthetic reasoning data in one specific
domain (math, programming, logic).
Expert models were used instead of R1 itself, since the output from R1 itself suffered
"overthinking, poor formatting, and excessive length".

4. Model-based reward models were made by starting with a SFT checkpoint of V3, then
finetuning on human preference data containing both final reward and chain-of-thought
leading to the final reward. The reward model produced reward signals for both questions
with objective but free-form answers, and questions without objective answers (such as
creative writing).

5. An SFT checkpoint of V3 was trained by GRPO using both reward models and rule-based
reward. The rule-based reward was computed for math problems with a final answer (put in
a box), and for programming problems by unit tests. This produced DeepSeek-V3.

DeepSeek released its DeepSeek-V3-0324 model, which used the same architecture as V3, on 24
March 2025 under the MIT License.[64]

DeepSeek V3 properties[29]: Section 4.2 [65]

Name | Params. | Active params | Layers | Context length | Shared experts | Routed experts
V3 | 671B | 37B | 61 | 128K | 1 | 256

Mixed-precision framework for V3[29]: Figure 6

The DeepSeek team performed extensive low-level engineering to improve efficiency. They used
mixed-precision arithmetic. Much of the forward pass was performed in 8-bit floating point
numbers (5E2M: 5-bit exponent and 2-bit mantissa) rather than the standard 32-bit, requiring
special GEMM routines to accumulate accurately. They used a custom 12-bit float (E5M6) only
for the inputs to the linear layers after the attention modules. Optimizer states were in 16-bit
(BF16). They minimized communication latency by extensively overlapping computation and
communication, such as dedicating 20 streaming multiprocessors out of 132 per H800 for only
inter-GPU communication. They lowered communication by rearranging (every 10 minutes) the
exact machine each expert was on so as to avoid querying certain machines more often than
others, adding auxiliary load-balancing losses to the training loss function, and other load-
balancing techniques.[29]
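One of the load-balancing techniques mentioned above, an auxiliary loss added to the training objective, can be sketched generically. This follows the widely used Switch-Transformer-style formulation as an illustration; DeepSeek's exact losses are not reproduced here.

```python
import torch


def aux_load_balancing_loss(router_probs, chosen_expert, n_experts):
    """router_probs: (n_tokens, n_experts) softmax routing probabilities.
    chosen_expert: (n_tokens,) long tensor with the expert each token was dispatched to.
    The loss is smallest when both the dispatched tokens and the routing probability
    mass are spread evenly over experts, discouraging overuse of a few experts."""
    dispatch_frac = torch.bincount(chosen_expert, minlength=n_experts).float()
    dispatch_frac = dispatch_frac / chosen_expert.numel()   # fraction of tokens per expert
    prob_frac = router_probs.mean(dim=0)                    # mean routing probability per expert
    return n_experts * torch.dot(dispatch_frac, prob_frac)
```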

After training, it was deployed on clusters of H800 GPUs. The 8 H800 GPUs within a cluster were
connected by NVLink, and the clusters were connected by InfiniBand.[29]
Total cost of training the DeepSeek-V3 model[29]: Table 1

Stage | Cost (thousand GPU hours) | Cost (US$ millions)
Pre-training | 2,664 | 5.328
Context extension | 119 | 0.24
Fine-tuning | 5 | 0.01
Total | 2,788 | 5.576
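The dollar figures follow from the technical report's assumed rental rate of about US$2 per H800 GPU hour; for example, 2,788 thousand GPU hours × US$2 per hour ≈ US$5.576 million.[29]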

The cost has been discussed[66][67][68] and called misleading, because it covers only parts of the
true cost.[69]

Benchmark tests show that V3 outperformed Llama 3.1 and Qwen 2.5 while matching GPT-4o
and Claude 3.5 Sonnet.[33][70][71][72]

R1

In January 2025, DeepSeek released the DeepSeek-R1 model under the MIT License.[73]

DeepSeek-R1-Lite-Preview[40][41][note 4] was trained for logical inference, mathematical reasoning, and real-time problem-solving. DeepSeek claimed that it exceeded the performance of OpenAI o1 on benchmarks such as the American Invitational Mathematics Examination (AIME) and MATH.[74]
However, The Wall Street Journal reported that on 15 problems from the 2024 edition of AIME,
the o1 model reached a solution faster.[75]

DeepSeek-R1 and DeepSeek-R1-Zero[76] were initialized from DeepSeek-V3-Base and share its
architecture. DeepSeek-R1-Distill models were instead initialized from other pretrained open-
weight models, including LLaMA and Qwen, then fine-tuned on synthetic data generated by
R1.[51]

DeepSeek-R1-Zero was trained exclusively using GRPO RL without SFT. Unlike previous versions, it used no model-based reward. All reward functions were rule-based, "mainly" of two types (other types were not specified): accuracy rewards and format rewards. The accuracy reward checked whether a boxed answer is correct (for math) or whether a code sample passes tests (for programming). The format reward checked whether the model puts its thinking trace within a <think>...</think> tag.[51]

Template for DeepSeek-R1-Zero: "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>. User: <prompt>. Assistant:" (<prompt> is replaced with the specific reasoning question during training.)
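The rule-based rewards described above are simple enough to sketch directly. The helpers below are illustrative rather than DeepSeek's code: a real accuracy check would normalize mathematical expressions, or execute generated code against unit tests in a sandbox, instead of comparing raw strings.

```python
import re

TEMPLATE = re.compile(r"<think>.+?</think>\s*<answer>(.+?)</answer>", re.DOTALL)
BOXED = re.compile(r"\\boxed\{([^}]*)\}")


def format_reward(completion: str) -> float:
    """1.0 if the reasoning trace and answer are wrapped in the required tags, else 0.0."""
    return 1.0 if TEMPLATE.search(completion) else 0.0


def math_accuracy_reward(completion: str, reference_answer: str) -> float:
    """1.0 if the final boxed answer matches the reference answer string, else 0.0."""
    match = BOXED.search(completion)
    return 1.0 if match and match.group(1).strip() == reference_answer.strip() else 0.0
```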
R1-Zero has issues with readability and mixing languages. R1 was trained to address these issues and further improve reasoning:[51]

1. SFT DeepSeek-V3-Base on "thousands" of "cold-start" data all with the standard format of
|special_token|<reasoning_process>|special_token|<summary> , designed
to improve model output readability.

2. Apply the same GRPO RL process as R1-Zero, adding a "language consistency reward" to encourage it to respond monolingually. This produced an unreleased internal model.

3. Synthesize 600K reasoning data from the internal model, with rejection sampling (i.e. if the
generated reasoning had a wrong final answer, then it is removed). Synthesize 200K non-
reasoning data (writing, factual QA, self-cognition, translation) using DeepSeek-V3.

4. SFT DeepSeek-V3-Base on the 800K synthetic data for 2 epochs.

5. Apply the same GRPO RL process as R1-Zero with rule-based reward (for reasoning tasks),
but also model-based reward (for non-reasoning tasks, helpfulness, and harmlessness).
This produced DeepSeek-R1.

Distilled models were trained by SFT on 800K data synthesized from DeepSeek-R1, in a similar
way as step 3. They were not trained with RL.[51]

R2, the successor to R1, was originally planned for release in early May 2025, but the release schedule was reportedly accelerated.[77]

Significance

DeepSeek's success against larger and more established rivals has been described as "upending
AI".[15][78]

The DeepSeek-R1 model provides responses comparable to other contemporary large language
models, such as OpenAI's GPT-4o and o1.[79] Its training cost is reported to be significantly lower
than other LLMs.

The company claims that it trained V3, a predecessor of R1, for US$6 million compared to
$100 million for OpenAI's GPT-4 in 2023,[11] and approximately one tenth of the computing power
used for Meta's comparable model, LLaMA 3.1.[11][12][13][14]

After the January 2025 release of the R1 model, which offered significantly lower costs than competing models, some investors anticipated a price war in the American AI industry.[80] DeepSeek was dubbed the "Pinduoduo of AI", and other Chinese tech giants such as ByteDance, Tencent, Baidu, and Alibaba cut the prices of their AI models in response. Despite its low prices, DeepSeek was profitable, in contrast to its money-losing rivals.[47]

See also

Free and open-source software portal
Business portal
China portal

DeepSeek (chatbot) – Chatbot developed by DeepSeek

Artificial intelligence industry in China

OpenAI – Artificial intelligence research organization

Jevons paradox – Efficiency leads to increased demand

Liang Wenfeng – Founder of DeepSeek and High-Flyer

High-Flyer – Chinese hedge fund and owner of DeepSeek

Chinese Communist Party (CCP)

Tiananmen Square – Prohibited topic in DeepSeek chat queries

Notes

a. Chinese: 杭州深度求索人工智能基础技术研究有限公司.[6] Sometimes simply referred to in English as Hangzhou DeepSeek Artificial Intelligence.

b. Chinese: 深度求索; pinyin: Shēndù Qiúsuǒ

1. 宁波程信柔兆企业管理咨询合伙企业(有限合伙) and 宁波程恩企业管理咨询合伙企业(有限合伙)
2. The number of heads does not equal the number of KV heads, due to GQA.

3. Inexplicably, the model named DeepSeek-Coder-V2 Chat in the paper was released as
DeepSeek-Coder-V2-Instruct in HuggingFace.

4. At that time, the R1-Lite-Preview required selecting "Deep Think enabled", and every
user could use it only 50 times a day.

References

1. "DeepSeek 突传消息" (https://finance.sina.com.cn/jjxw/2025-02-01/doc-inehyqcx9694053.s


html) . Sina Corp. 1 February 2025. Retrieved 1 February 2025.
2. Wu, Zijing (14 March 2025). "DeepSeek focuses on research over revenue in contrast to
Silicon Valley" (https://www.ft.com/content/fb5c11bb-1d4b-465f-8283-451a19a3d425) .
Financial Times. Retrieved 14 March 2025.

3. "Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd" (https://ww
w.bloomberg.com/profile/company/2544189D:CH) . Bloomberg L.P.

4. "DeepSeek Coder Model Service Agreement" (https://chat.deepseek.com/downloads/Deep


Seek%20Coder%20Model%20Service%20Agreement_1019.pdf) (PDF), DeepSeek, 19
October 2023

5. "DeepSeek Coder Privacy Policy" (https://chat.deepseek.com/downloads/DeepSeek%20Co


der%20Privacy%20Policy_1019.pdf) (PDF). DeepSeek. Retrieved 19 February 2025.

6. " 全国互联网安全管理平台" (https://beian.mps.gov.cn/#/query/webSearch?code=33010502


011812) . beian.mps.gov.cn. Retrieved 9 February 2025.

7. "Beijing puts spotlight on China's new face of AI, DeepSeek's Liang Wenfeng" (https://www.s
cmp.com/tech/policy/article/3295662/beijing-meeting-puts-spotlight-chinas-new-face-ai-d
eepseek-founder-liang-wenfeng) . South China Morning Post. 21 January 2025. Retrieved
4 March 2025.

8. Baptista, Eduardo (28 January 2025). "Who is Liang Wenfeng, the founder of DeepSeek?" (ht
tp://web.archive.org/web/20250219122827/https://www.reuters.com/technology/deepsee
k-founder-liang-wenfeng-puts-focus-chinese-innovation-2025-01-28/?) . Reuters. Archived
from the original (https://www.reuters.com/technology/deepseek-founder-liang-wenfeng-pu
ts-focus-chinese-innovation-2025-01-28/) on 19 February 2025. Retrieved 4 March 2025.

9. "Behind DeepSeek lies a dazzling Chinese university" (https://www.economist.com/china/2


025/02/19/behind-deepseek-lies-a-dazzling-chinese-university) . The Economist.
ISSN 0013-0613 (https://search.worldcat.org/issn/0013-0613) . Archived (https://archive.t
oday/20250224111435/https://www.economist.com/china/2025/02/19/behind-deepseek-l
ies-a-dazzling-chinese-university) from the original on 24 February 2025. Retrieved
5 March 2025.

10. Gibney, Elizabeth (23 January 2025). "China's cheap, open AI model DeepSeek thrills
scientists" (https://www.nature.com/articles/d41586-025-00229-6) . Nature. 638 (8049):
13–14. Bibcode:2025Natur.638...13G (https://ui.adsabs.harvard.edu/abs/2025Natur.638...1
3G) . doi:10.1038/d41586-025-00229-6 (https://doi.org/10.1038%2Fd41586-025-00229-
6) . ISSN 1476-4687 (https://search.worldcat.org/issn/1476-4687) . PMID 39849139 (htt
ps://pubmed.ncbi.nlm.nih.gov/39849139) .

11. Vincent, James (28 January 2025). "The DeepSeek panic reveals an AI world ready to blow"
(https://www.theguardian.com/commentisfree/2025/jan/28/deepseek-r1-ai-world-chinese-
chatbot-tech-world-western) . The Guardian.
12. Metz, Cade; Tobin, Meaghan (23 January 2025). "How Chinese A.I. Start-Up DeepSeek Is
Competing With Silicon Valley Giants" (https://www.nytimes.com/2025/01/23/technology/
deepseek-bd-ai-chips.html?smid=fb-nytimes&smtyp=cur&fbclid=IwY2xjawIEynFleHRuA2Flb
QIxMQABHZYKXN7GJpUyNRsaGEDQVadxRBarp-aBp1GhiuRe3B57Ehe6HYv7oiK78Q_aem_
KTeDgqjV_-R80owNNWOBCQ) . The New York Times. ISSN 0362-4331 (https://search.worl
dcat.org/issn/0362-4331) . Retrieved 27 January 2025.

13. Cosgrove, Emma (27 January 2025). "DeepSeek's cheaper models and weaker chips call
into question trillions in AI infrastructure spending" (https://www.businessinsider.com/expla
ining-deepseek-chinese-models-efficiency-scaring-markets-2025-1) . Business Insider.

14. Erdil, Ege (17 January 2025). "How has DeepSeek improved the Transformer architecture?"
(https://epoch.ai/gradient-updates/how-has-deepseek-improved-the-transformer-architectu
re) . Epoch AI. Retrieved 3 February 2025.

15. Metz, Cade (27 January 2025). "What is DeepSeek? And How Is It Upending A.I.?" (https://w
ww.nytimes.com/2025/01/27/technology/what-is-deepseek-china-ai.html) . The New York
Times. ISSN 0362-4331 (https://search.worldcat.org/issn/0362-4331) . Retrieved
27 January 2025.

16. Roose, Kevin (28 January 2025). "Why DeepSeek Could Change What Silicon Valley Believes
About A.I." (https://www.nytimes.com/2025/01/28/technology/why-deepseek-could-chang
e-what-silicon-valley-believes-about-ai.html) The New York Times. ISSN 0362-4331 (http
s://search.worldcat.org/issn/0362-4331) . Retrieved 28 January 2025.

17. Delbert, Caroline (31 January 2025). "DeepSeek Is Cracking the 'Black Box' of Corporate AI
Wide Open" (https://www.popularmechanics.com/science/a63633889/deepseek-open-weig
ht/) . Popular Mechanics. Retrieved 12 February 2025.

18. Gibney, Elizabeth (23 January 2025). "China's cheap, open AI model DeepSeek thrills
scientists" (https://www.nature.com/articles/d41586-025-00229-6) . Nature. 638 (8049):
13–14. Bibcode:2025Natur.638...13G (https://ui.adsabs.harvard.edu/abs/2025Natur.638...1
3G) . doi:10.1038/d41586-025-00229-6 (https://doi.org/10.1038%2Fd41586-025-00229-
6) . PMID 39849139 (https://pubmed.ncbi.nlm.nih.gov/39849139) . Retrieved
12 February 2025.

19. Metz, Cade (12 February 2025). "How Did DeepSeek Build Its A.I. With Less Money?" (http
s://www.nytimes.com/2025/02/12/technology/deepseek-ai-chip-costs.html) . The New
York Times. Retrieved 21 March 2025.

20. Allen, Gregory C. (7 March 2025). "DeepSeek, Huawei, Export Controls, and the Future of the
U.S.-China AI Race" (https://www.csis.org/analysis/deepseek-huawei-export-controls-and-fu
ture-us-china-ai-race) . Center for Strategic and International Studies.
21. Saah, Jasper (13 February 2025). "DeepSeek sends shock waves across Silicon Valley" (http
s://liberationnews.org/deepseek-sends-shock-waves-across-silicon-valley/) . Liberation
News – The Newspaper of the Party for Socialism and Liberation. Retrieved 13 February
2025.

22. Sillars, James (28 January 2025). "DeepSeek: Tech firm suffers biggest drop in US stock
market history as low-cost Chinese AI company bites Silicon Valley" (https://news.sky.com/
story/deepseek-us-tech-stocks-tumble-on-fears-of-cheaper-chinese-ai-13297788) . Sky
News. Retrieved 13 February 2025.

23. Chen, Caiwei (24 January 2025). "How a top Chinese AI model overcame US sanctions" (htt
ps://www.technologyreview.com/2025/01/24/1110526/china-deepseek-top-ai-despite-sanc
tions/) . MIT Technology Review. Archived (https://web.archive.org/web/20250125180427/
https://www.technologyreview.com/2025/01/24/1110526/china-deepseek-top-ai-despite-s
anctions/) from the original on 25 January 2025. Retrieved 25 January 2025.

24. " 幻方 | 幻方历程" (https://www.high-flyer.cn/history/) . High-Flyer (in Chinese (China)).


Retrieved 2 February 2025.

25. Ottinger, Lily (9 December 2024). "Deepseek: From Hedge Fund to Frontier Model Maker" (ht
tps://www.chinatalk.media/p/deepseek-from-hedge-fund-to-frontier) . ChinaTalk. Archived
(https://web.archive.org/web/20241228030725/https://www.chinatalk.media/p/deepseek-f
rom-hedge-fund-to-frontier) from the original on 28 December 2024. Retrieved
28 December 2024.

26. Olcott, Eleanor; Wu, Zijing (24 January 2025). "How small Chinese AI start-up DeepSeek
shocked Silicon Valley" (https://www.ft.com/content/747a7b11-dcba-4aa5-8d25-403f56216
d7e) . Financial Times. Retrieved 31 January 2025.

27. Leswing, Kif (23 February 2023). "Meet the $10,000 Nvidia chip powering the race for A.I." (h
ttps://www.cnbc.com/2023/02/23/nvidias-a100-is-the-10000-chip-powering-the-race-for-ai
-.html) CNBC. Retrieved 30 January 2025.

28. "hfreduce | 高性能的多卡并行通信工具" (https://www.high-flyer.cn/blog/hf-reduce/) . High-


Flyer. 4 March 2020. Retrieved 3 February 2025.

29. DeepSeek-AI; Liu, Aixin; Feng, Bei; Xue, Bing; Wang, Bingxuan; Wu, Bochao; Lu, Chengda;
Zhao, Chenggang; Deng, Chengqi (27 December 2024), DeepSeek-V3 Technical Report,
arXiv:2412.19437 (https://arxiv.org/abs/2412.19437)
30. An, Wei; Bi, Xiao; Chen, Guanting; Chen, Shanhuang; Deng, Chengqi; Ding, Honghui; Dong,
Kai; Du, Qiushi; Gao, Wenjun; Guan, Kang; Guo, Jianzhong; Guo, Yongqiang; Fu, Zhe; He, Ying;
Huang, Panpan (17 November 2024). "Fire-Flyer AI-HPC: A Cost-Effective Software-
Hardware Co-Design for Deep Learning" (https://ieeexplore.ieee.org/document/1079319
3) . SC24: International Conference for High Performance Computing, Networking, Storage
and Analysis. IEEE. pp. 1–23. arXiv:2408.14158 (https://arxiv.org/abs/2408.14158) .
doi:10.1109/SC41406.2024.00089 (https://doi.org/10.1109%2FSC41406.2024.00089) .
ISBN 979-8-3503-5291-7.

31. " 独家|幻方量化回应市场关注:AGI不是用来炒股的,"和金融没关系" " (https://www.yicai.co


m/news/101732215.html) . Yicai. Retrieved 3 February 2025.

32. Yu, Xu (17 April 2023). "[Exclusive] Chinese Quant Hedge Fund High-Flyer Won't Use AGI to
Trade Stocks, MD Says" (https://www.yicaiglobal.com/news/exclusive-chinese-quant-fund-h
igh-flyer-will-not-use-agi-to-trade-stocks-managing-director-says) . Yicai Global. Archived (h
ttps://web.archive.org/web/20231231030712/https://www.yicaiglobal.com/news/exclusive
-chinese-quant-fund-high-flyer-will-not-use-agi-to-trade-stocks-managing-director-says)
from the original on 31 December 2023. Retrieved 28 December 2024.

33. Jiang, Ben; Perezi, Bien (1 January 2025). "Meet DeepSeek: the Chinese start-up that is
changing how AI models are trained" (https://www.scmp.com/tech/tech-trends/article/329
3050/meet-deepseek-chinese-start-changing-how-ai-models-are-trained) . South China
Morning Post. Archived (https://web.archive.org/web/20250122160046/https://www.scmp.
com/tech/tech-trends/article/3293050/meet-deepseek-chinese-start-changing-how-ai-mod
els-are-trained) from the original on 22 January 2025. Retrieved 1 January 2025.

34. McMorrow, Ryan; Olcott, Eleanor (9 June 2024). "The Chinese quant fund-turned-AI pioneer"
(https://www.ft.com/content/357f3c68-b866-4c2e-b678-0d075051a260) . Financial Times.
Archived (https://web.archive.org/web/20240717030903/https://www.ft.com/content/357f
3c68-b866-4c2e-b678-0d075051a260) from the original on 17 July 2024. Retrieved
28 December 2024.

35. DeepSeek-AI; Bi, Xiao; Chen, Deli; Chen, Guanting; Chen, Shanhuang; Dai, Damai; Deng,
Chengqi; Ding, Honghui; Dong, Kai (5 January 2024), DeepSeek LLM: Scaling Open-Source
Language Models with Longtermism, arXiv:2401.02954 (https://arxiv.org/abs/2401.02954)

36. Dai, Damai; Deng, Chengqi; Zhao, Chenggang; Xu, R. X.; Gao, Huazuo; Chen, Deli; Li, Jiashi;
Zeng, Wangding; Yu, Xingkai (11 January 2024), DeepSeekMoE: Towards Ultimate Expert
Specialization in Mixture-of-Experts Language Models, arXiv:2401.06066 (https://arxiv.org/ab
s/2401.06066)
37. Shao, Zhihong; Wang, Peiyi; Zhu, Qihao; Xu, Runxin; Song, Junxiao; Bi, Xiao; Zhang, Haowei;
Zhang, Mingchuan; Li, Y. K. (27 April 2024), DeepSeekMath: Pushing the Limits of
Mathematical Reasoning in Open Language Models, arXiv:2402.03300 (https://arxiv.org/abs/
2402.03300) .

38. DeepSeek-AI; Zhu, Qihao; Guo, Daya; Shao, Zhihong; Yang, Dejian; Wang, Peiyi; Xu, Runxin;
Wu, Y.; Li, Yukun (17 June 2024), DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source
Models in Code Intelligence, arXiv:2406.11931 (https://arxiv.org/abs/2406.11931)

39. "deepseek-ai/DeepSeek-V2.5 · Hugging Face" (https://huggingface.co/deepseek-ai/DeepSe


ek-V2.5) . Hugging Face. 3 January 2025. Retrieved 28 January 2025.

40. "Deepseek Log in page" (https://chat.deepseek.com/sign_in) . DeepSeek. Retrieved


30 January 2025.

41. "News | DeepSeek-R1-Lite Release 2024/11/20: 🚀 DeepSeek-R1-Lite-Preview is now live:


unleashing supercharged reasoning power!" (https://web.archive.org/web/2024112014132
4/https://api-docs.deepseek.com/news/news1120) . DeepSeek API Docs. Archived from
the original (https://api-docs.deepseek.com/news/news1120) on 20 November 2024.
Retrieved 28 January 2025.

42. Field, Hayden (27 January 2025). "China's DeepSeek AI dethrones ChatGPT on App Store:
Here's what you should know" (https://www.cnbc.com/2025/01/27/chinas-deepseek-ai-top
s-chatgpt-app-store-what-you-should-know.html) . CNBC.

43. Picchi, Aimee (27 January 2025). "What is DeepSeek, and why is it causing Nvidia and other
stocks to slump?" (https://www.cbsnews.com/news/what-is-deepseek-ai-china-stock-nvidia
-nvda-asml/) . CBS News.

44. Nuñez, Michael (24 March 2025). "DeepSeek-V3 now runs at 20 tokens per second on Mac
Studio, and that's a nightmare for OpenAI" (https://venturebeat.com/ai/deepseek-v3-now-ru
ns-at-20-tokens-per-second-on-mac-studio-and-thats-a-nightmare-for-openai/) .
VentureBeat. Retrieved 24 March 2025.

45. "deepseek-ai/DeepSeek-V3-0324 · Hugging Face" (https://huggingface.co/deepseek-ai/Dee


pSeek-V3-0324) . Hugging Face. Retrieved 24 March 2025.

46. " 大模型价格又砍一刀 这次"屠夫"竟是量化私募?" (https://www.cls.cn/detail/1672635) .


www.cls.cn. 10 May 2024. Retrieved 3 February 2025.

47. Schneider, Jordan (27 November 2024). "Deepseek: The Quiet Giant Leading China's AI
Race" (https://www.chinatalk.media/p/deepseek-ceo-interview-with-chinas) . ChinaTalk.
Retrieved 28 December 2024.

48. " 幻方力量 | 高速文件系统 3FS" (https://www.high-flyer.cn/blog/3fs/) . High-Flyer. 13 June


2019. Retrieved 3 February 2025.
49. deepseek-ai/3FS (https://github.com/deepseek-ai/3FS) , DeepSeek, 28 February 2025,
retrieved 28 February 2025

50. "HFAiLab/hai-platform" (https://github.com/HFAiLab/hai-platform) , High-Flyer, 2 February


2025, retrieved 3 February 2025

51. DeepSeek-AI; Guo, Daya; Yang, Dejian; Zhang, Haowei; Song, Junxiao; Zhang, Ruoyu; Xu,
Runxin; Zhu, Qihao; Ma, Shirong (22 January 2025), DeepSeek-R1: Incentivizing Reasoning
Capability in LLMs via Reinforcement Learning, arXiv:2501.12948 (https://arxiv.org/abs/250
1.12948)

52. Gibney, Elizabeth (23 January 2025). "China's cheap, open AI model DeepSeek thrills
scientists" (https://www.nature.com/articles/d41586-025-00229-6) . Nature. 638 (8049):
13–14. Bibcode:2025Natur.638...13G (https://ui.adsabs.harvard.edu/abs/2025Natur.638...1
3G) . doi:10.1038/d41586-025-00229-6 (https://doi.org/10.1038%2Fd41586-025-00229-
6) . PMID 39849139 (https://pubmed.ncbi.nlm.nih.gov/39849139) . Retrieved
12 February 2025.

53. "DeepSeek-Coder/LICENSE-MODEL at main · deepseek-ai/DeepSeek-Coder" (https://github.


com/deepseek-ai/DeepSeek-Coder/blob/main/LICENSE-MODEL) . GitHub. Archived (http
s://web.archive.org/web/20250122195853/https://github.com/deepseek-ai/deepseek-code
r/blob/main/LICENSE-MODEL) from the original on 22 January 2025. Retrieved
24 January 2025.

54. Guo, Daya; Zhu, Qihao; Yang, Dejian; Xie, Zhenda; Dong, Kai; Zhang, Wentao; Chen, Guanting;
Bi, Xiao; Wu, Y. (26 January 2024), DeepSeek-Coder: When the Large Language Model Meets
Programming – The Rise of Code Intelligence, arXiv:2401.14196 (https://arxiv.org/abs/2401.
14196)

55. "DeepSeek Coder" (https://deepseekcoder.github.io/) . deepseekcoder.github.io. Retrieved


27 January 2025.

56. deepseek-ai/DeepSeek-Coder (https://github.com/deepseek-ai/deepseek-coder/) ,


DeepSeek, 27 January 2025, retrieved 27 January 2025

57. "deepseek-ai/deepseek-coder-5.7bmqa-base · Hugging Face" (https://huggingface.co/deep


seek-ai/deepseek-coder-5.7bmqa-base) . Hugging Face. Retrieved 27 January 2025.

58. deepseek-ai/DeepSeek-LLM (https://github.com/deepseek-ai/DeepSeek-LLM) , DeepSeek,


27 January 2025, retrieved 27 January 2025

59. Wang, Peiyi; Li, Lei; Shao, Zhihong; Xu, R. X.; Dai, Damai; Li, Yifei; Chen, Deli; Wu, Y.; Sui,
Zhifang (19 February 2024), Math-Shepherd: Verify and Reinforce LLMs Step-by-step without
Human Annotations, arXiv:2312.08935 (https://arxiv.org/abs/2312.08935) .
60. DeepSeek-AI; Liu, Aixin; Feng, Bei; Wang, Bin; Wang, Bingxuan; Liu, Bo; Zhao, Chenggang;
Dengr, Chengqi; Ruan, Chong (19 June 2024), DeepSeek-V2: A Strong, Economical, and
Efficient Mixture-of-Experts Language Model, arXiv:2405.04434 (https://arxiv.org/abs/2405.0
4434) .

61. Peng, Bowen; Quesnelle, Jeffrey; Fan, Honglu; Shippole, Enrico (1 November 2023), YaRN:
Efficient Context Window Extension of Large Language Models, arXiv:2309.00071 (https://arx
iv.org/abs/2309.00071) .

62. "config.json · deepseek-ai/DeepSeek-V2-Lite at main" (https://huggingface.co/deepseek-ai/


DeepSeek-V2-Lite/blob/main/config.json) . Hugging Face. 15 May 2024. Retrieved
28 January 2025.

63. "config.json · deepseek-ai/DeepSeek-V2 at main" (https://huggingface.co/deepseek-ai/Deep


Seek-V2/blob/main/config.json) . Hugging Face. 6 May 2024. Retrieved 28 January 2025.

64. Feng, Coco (25 March 2025). "DeepSeek wows coders with more powerful open-source V3
model" (https://www.scmp.com/tech/big-tech/article/3303798/deepseeks-upgraded-found
ational-model-excels-coding-and-maths) . South China Morning Post. Retrieved 6 April
2025.

65. "config.json · deepseek-ai/DeepSeek-V3 at main" (https://huggingface.co/deepseek-ai/Deep


Seek-V3/blob/main/config.json) . Hugging Face. 26 December 2024. Retrieved 28 January
2025.

66. Patel, Dylan; Kourabi, AJ; O'Laughlin, Dylan; Knuhtsen, Doug (31 January 2025). "DeepSeek
Debates: Chinese Leadership On Cost, True Training Cost, Closed Model Margin Impacts" (h
ttps://semianalysis.com/2025/01/31/deepseek-debates/) . SemiAnalysis. Retrieved
13 February 2025.

67. Thubron, Rob (3 February 2025). "DeepSeek's AI costs far exceed $5.5 million claim, may
have reached $1.6 billion with 50,000 Nvidia GPUs" (https://www.techspot.com/news/1066
12-deepseek-ai-costs-far-exceed-55-million-claim.html) . TechSpot. Retrieved 13 February
2025.

68. Kajal, Kapil (31 January 2025). "Research exposes DeepSeek's AI training cost is not $6M,
it's a staggering $1.3B" (https://www.yahoo.com/news/research-exposes-deepseek-ai-traini
ng-165025904.html) . Yahoo News. Retrieved 13 February 2025.

69. "Martin Vechev of INSAIT: "DeepSeek $6M Cost Of Training Is Misleading" " (https://therecur
sive.com/martin-vechev-of-insait-deepseek-6m-cost-of-training-is-misleading/) .
TheRecursive.com. 28 January 2025. Retrieved 13 February 2025.
70. Jiang, Ben (27 December 2024). "Chinese start-up DeepSeek's new AI model outperforms
Meta, OpenAI products" (https://www.scmp.com/tech/tech-trends/article/3292507/chinese
-start-deepseek-launches-ai-model-outperforms-meta-openai-products) . South China
Morning Post. Archived (https://web.archive.org/web/20241227191529/https://www.scmp.
com/tech/tech-trends/article/3292507/chinese-start-deepseek-launches-ai-model-outperfo
rms-meta-openai-products) from the original on 27 December 2024. Retrieved
28 December 2024.

71. Sharma, Shubham (26 December 2024). "DeepSeek-V3, ultra-large open-source AI,
outperforms Llama and Qwen on launch" (https://venturebeat.com/ai/deepseek-v3-ultra-lar
ge-open-source-ai-outperforms-llama-and-qwen-on-launch/) . VentureBeat. Archived (http
s://web.archive.org/web/20241227195503/https://venturebeat.com/ai/deepseek-v3-ultra-l
arge-open-source-ai-outperforms-llama-and-qwen-on-launch/) from the original on 27
December 2024. Retrieved 28 December 2024.

72. Wiggers, Kyle (26 December 2024). "DeepSeek's new AI model appears to be one of the
best 'open' challengers yet" (https://techcrunch.com/2024/12/26/deepseeks-new-ai-model-
appears-to-be-one-of-the-best-open-challengers-yet/) . TechCrunch. Archived (https://web.
archive.org/web/20250102103526/https://techcrunch.com/2024/12/26/deepseeks-new-ai
-model-appears-to-be-one-of-the-best-open-challengers-yet/) from the original on 2
January 2025. Retrieved 31 December 2024.

73. Edwards, Benj (21 January 2025). "Cutting-edge Chinese "reasoning" model rivals OpenAI
o1—and it's free to download" (https://arstechnica.com/ai/2025/01/china-is-catching-up-wi
th-americas-best-reasoning-ai-models/) . Ars Technica. Retrieved 16 February 2025.

74. Franzen, Carl (20 November 2024). "DeepSeek's first reasoning model R1-Lite-Preview turns
heads, beating OpenAI o1 performance" (https://venturebeat.com/ai/deepseeks-first-reaso
ning-model-r1-lite-preview-turns-heads-beating-openai-o1-performance/) . VentureBeat.
Archived (https://web.archive.org/web/20241122010413/https://venturebeat.com/ai/deep
seeks-first-reasoning-model-r1-lite-preview-turns-heads-beating-openai-o1-performance/)
from the original on 22 November 2024. Retrieved 28 December 2024.

75. Huang, Raffaele (24 December 2024). "Don't Look Now, but China's AI Is Catching Up Fast"
(https://www.wsj.com/tech/ai/china-ai-advances-us-chips-7838fd20) . The Wall Street
Journal. Archived (https://web.archive.org/web/20241227183842/https://www.wsj.com/tec
h/ai/china-ai-advances-us-chips-7838fd20) from the original on 27 December 2024.
Retrieved 28 December 2024.
76. "Release DeepSeek-R1 · deepseek-ai/DeepSeek-R1@23807ce" (https://github.com/deepsee
k-ai/DeepSeek-R1/commit/23807ced51627276434655dd9f27725354818974) . GitHub.
Archived (https://web.archive.org/web/20250121104009/https://github.com/deepseek-ai/
DeepSeek-R1/commit/23807ced51627276434655dd9f27725354818974) from the
original on 21 January 2025. Retrieved 21 January 2025.

77. Eduardo Baptista; Julie Zhu; Fanny Potkin (25 February 2025). "DeepSeek rushes to launch
new AI model as China goes all in" (https://www.reuters.com/technology/artificial-intelligen
ce/deepseek-rushes-launch-new-ai-model-china-goes-all-2025-02-25/) . Reuters. Retrieved
25 February 2025.

78. Roose, Kevin (28 January 2025). "Why DeepSeek Could Change What Silicon Valley Believes
About A.I." (https://www.nytimes.com/2025/01/28/technology/why-deepseek-could-chang
e-what-silicon-valley-believes-about-ai.html) The New York Times. ISSN 0362-4331 (http
s://search.worldcat.org/issn/0362-4331) . Retrieved 28 January 2025.

79. Gibney, Elizabeth (23 January 2025). "China's cheap, open AI model DeepSeek thrills
scientists" (https://www.nature.com/articles/d41586-025-00229-6) . Nature. 638 (8049):
13–14. Bibcode:2025Natur.638...13G (https://ui.adsabs.harvard.edu/abs/2025Natur.638...1
3G) . doi:10.1038/d41586-025-00229-6 (https://doi.org/10.1038%2Fd41586-025-00229-
6) . ISSN 1476-4687 (https://search.worldcat.org/issn/1476-4687) . PMID 39849139 (htt
ps://pubmed.ncbi.nlm.nih.gov/39849139) .

80. Chow, Andrew R.; Perrigo, Billy (30 January 2025). "Is the DeepSeek Panic Overblown?" (http
s://time.com/7211646/is-deepseek-panic-overblown/) . TIME. Retrieved 17 March 2025.

External links

Official website (https://www.deepseek.com/)

DeepSeek (https://github.com/deepseek-ai) on GitHub

DeepSeek (https://huggingface.co/deepseek-ai/DeepSeek-V2.5-1210) on Hugging Face

Official API documentation (https://api-docs.deepseek.com/)

Anthology of DeepSeek papers (https://huggingface.co/collections/Presidentlin/deepseek-papers-674c536aa6acddd9bc98c2ac)

Research blog of High-Flyer (https://www.high-flyer.cn/blog/)
