
DeepSeek-Coder-V2: An Open-Source Marvel in Code Intelligence

Introduction

The world of code intelligence has seen a significant breakthrough with the introduction
of DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model
that rivals closed-source giants like GPT-4 Turbo. This powerful model is a further pre-
trained version of DeepSeek-V2, enriched with an additional 6 trillion tokens. This
extensive training enhances its capabilities in coding and mathematical reasoning while
maintaining robust performance in general language tasks.

Advancements and Features

Performance and Benchmarks

DeepSeek-Coder-V2 excels in standard benchmark evaluations, surpassing notable models such as GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro in both coding and math benchmarks. Its superiority is evident across a variety of code-related tasks and general reasoning abilities.

Expanded Language Support

One of the standout features of DeepSeek-Coder-V2 is its expanded support for programming languages, which has grown from 86 to an impressive 338. This makes it a versatile tool for developers working across different coding environments.

Context Length

The model also boasts an extended context length, increasing from 16K to 128K. This
allows for more comprehensive and nuanced understanding and generation of code,
accommodating larger and more complex coding projects.

Model Availability

DeepSeek-Coder-V2 is available in several versions to cater to different needs:

• DeepSeek-Coder-V2-Lite-Base: 16B total parameters, 2.4B active parameters, 128K context length.

• DeepSeek-Coder-V2-Lite-Instruct: 16B total parameters, 2.4B active parameters, 128K context length.

• DeepSeek-Coder-V2-Base: 236B total parameters, 21B active parameters, 128K context length.

• DeepSeek-Coder-V2-Instruct: 236B total parameters, 21B active parameters, 128K context length.

These models can be accessed and downloaded from Hugging Face.
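For example, the weights can be fetched programmatically with the huggingface_hub library. This is a minimal sketch: the repository ID matches the model names listed above, and the local directory is an arbitrary illustrative path.

# Minimal sketch: download the Lite-Base weights from Hugging Face.
# The local_dir value is an illustrative path, not a required location.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="deepseek-ai/DeepSeek-Coder-V2-Lite-Base",
    local_dir="./DeepSeek-Coder-V2-Lite-Base",
)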

Using DeepSeek-Coder-V2

Chat Website and API

You can interact with DeepSeek-Coder-V2 on its official chat website or through the OpenAI-compatible API provided by the DeepSeek Platform. The API uses a pay-as-you-go pricing model, making it accessible to a wide range of users.
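Because the API is OpenAI-compatible, it can be called with the standard openai Python client. The snippet below is a minimal sketch: the base URL and the model name "deepseek-coder" are assumptions to be checked against the DeepSeek Platform documentation, and the API key is a placeholder.

# Minimal sketch: calling the OpenAI-compatible DeepSeek API.
# The base_url and model name are assumptions; consult the platform docs.
from openai import OpenAI

client = OpenAI(api_key="<your-deepseek-api-key>", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-coder",
    messages=[{"role": "user", "content": "write a quick sort algorithm in python."}],
)
print(response.choices[0].message.content)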

Local Deployment

For those looking to run DeepSeek-Coder-V2 locally, the following examples illustrate
how to use the model with Hugging Face’s Transformers library:

Code Completion

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the base model and tokenizer (bfloat16 weights on GPU).
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()

# Complete a prompt: the model continues the comment with an implementation.
input_text = "#write a quick sort algorithm"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Code Insertion

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()

# Fill-in-the-middle prompt: the model generates the code that belongs at <｜fim▁hole｜>.
input_text = """<｜fim▁begin｜>def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    left = []
    right = []
<｜fim▁hole｜>
        if arr[i] < pivot:
            left.append(arr[i])
        else:
            right.append(arr[i])
    return quick_sort(left) + [pivot] + quick_sort(right)<｜fim▁end｜>"""

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
# Print only the generated infill, not the echoed prompt.
print(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):])

Chat Completion

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()

# Build a chat prompt using the model's chat template.
messages = [{"role": "user", "content": "write a quick sort algorithm in python."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
