Reference: Fine-tune Deepseek-R1 with a Synthetic Reasoning Dataset
https://huggingface.co/blog/sdiazlor/fine-tune-deepseek-with-a-synthetic-reasoning-data
Finally took a week off, and today is the day I fine-tune DeepSeek from Da Nang, Vietnam, haha!!
Model Load
from unsloth import FastLanguageModel
MODEL = "unsloth/DeepSeek-R1-Distill-Qwen-1.5B-unsloth-bnb-4bit"
# Load the 4-bit pre-quantized DeepSeek distill model and its tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = MODEL,
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)
# Add the LoRA adapters to the model
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)
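As a quick sanity check (not in the original tutorial), the PEFT-wrapped model can report how many parameters the LoRA adapters actually make trainable. A minimal sketch, assuming `print_trainable_parameters()` is available from the underlying PEFT wrapper and continuing from the cell above:
```python
# With r=16 adapters on the listed projection modules, only a small
# fraction of the 1.5B parameters should be reported as trainable.
model.print_trainable_parameters()
```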
Data Loader
from datasets import load_dataset
# Prepare the dataset
prompt_style = """Below is an instruction that describes a task, paired with a question that provides further context.
Write a response that appropriately answers the question.
Before answering, think carefully but concisely about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:
You are an expert programmer with advanced knowledge of Python. Your task is to provide concise and easy-to-understand solutions. Please answer the following python question.
### Question:
{}
### Response:
<think>
{}
"""
EOS_TOKEN = tokenizer.eos_token
def formatting_prompts_func(examples):
    prompts = examples["prompt"]
    completions = examples["completion"]
    texts = []
    for prompt, completion in zip(prompts, completions):
        text = prompt_style.format(prompt, completion) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }
dataset = load_dataset("sdiazlor/python-reasoning-dataset", split="train")
dataset = dataset.map(formatting_prompts_func, batched = True,)
dataset["text"][0]
Model Train
from huggingface_hub import login
hf_token = input("🤗 Enter your Hugging Face token: ").strip()
login(token=hf_token)
from trl import SFTTrainer
from transformers import TrainingArguments
fine_tuned_model = "seyeon-shijuan/deepseek-r1-distill-qwen-1.5-unsloth-sft-python"
MODEL_NAME = "./deepseek-r1-distill-qwen-1.5-unsloth-sft-python"
training_arguments = TrainingArguments(
    output_dir=MODEL_NAME,
    per_device_train_batch_size=4,
    num_train_epochs=3,
    logging_steps=10,
    save_steps=100,
    learning_rate=2e-4,
    # fp16=True,
    # bf16=False,
    fp16=False,  # <--- disable float16
    bf16=True,   # <--- use bfloat16
    push_to_hub=False,
    report_to="none"
)
# Configure the trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    dataset_num_proc=2,
    packing=False,
    args=training_arguments,
)
# Train the model
trainer_stats = trainer.train()
# Save the fine-tuned model
model.save_pretrained_merged(MODEL_NAME, tokenizer, save_method="merged_16bit")
model.push_to_hub_merged(fine_tuned_model, tokenizer, save_method="merged_16bit")
model.push_to_hub_gguf(
    f"{fine_tuned_model}_q4_k_m", tokenizer, quantization_method="q4_k_m"
)
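Once the merged 16-bit checkpoint has been written, it can also be reloaded with plain transformers for use outside Unsloth. A minimal sketch I added (not part of the original tutorial), assuming the local MODEL_NAME directory produced by save_pretrained_merged above:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reload the merged checkpoint that save_pretrained_merged wrote to disk.
merged_dir = "./deepseek-r1-distill-qwen-1.5-unsloth-sft-python"  # MODEL_NAME above
reloaded_tokenizer = AutoTokenizer.from_pretrained(merged_dir)
reloaded_model = AutoModelForCausalLM.from_pretrained(merged_dir, device_map="auto")
```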
Inference
# Run inference
question = "How can I get the prime numbers from 0 to 125?"
FastLanguageModel.for_inference(model)
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=2048,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
response
Output
["<|begin▁of▁sentence|>Below is an instruction that describes a task, paired with a question that provides further context.
Write a response that appropriately answers the question.
Before answering, think carefully but concisely about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:
You are an expert programmer with advanced knowledge of Python. Your task is to provide concise and easy-to-understand solutions. Please answer the following python question.
### Question:
How can I get the prime numbers from 0 to 125?
### Response:
<think>Before I can find the prime numbers between 0 and 125, I need to understand what a prime number is. A prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself. So, I need to check each number in that range to see if it meets this criterion.
I'll start by initializing an empty list to store the prime numbers. Then, I'll loop through each number from 0 to 125. For each number, I'll check if it's greater than 1 because 0 and 1 are not considered prime. If it is, I'll proceed to check its divisibility.
For each number, I'll check if it's divisible by any number from 2 up to the square root of the number. This is an optimization because if a number has a factor larger than its square root, the corresponding factor would be smaller than the square root. If any of these divisions result in a remainder of 0, the number is not prime. If none do, then the number is prime.
After checking each number, I'll add it to the list of primes if it satisfies the condition. Finally, I'll print out the list of prime numbers between 0 and 125.</think>
To find the prime numbers between 0 and 125, we need to identify numbers greater than 1 that have no divisors other than 1 and themselves. Here's a step-by-step explanation:
1. **Initialize an empty list** to store the prime numbers.
2. **Loop through each number** from 2 to 125.
3. **Check if the number is greater than 1**. If it's not, skip to the next number.
4. **Check for divisors** from 2 up to the square root of the number.
5. **If any number divides the current number evenly**, it's not prime. Move to the next number.
6. **If no divisors are found**, add the number to the list of primes.
7. **Print the list of primes**.
Here's the Python code implementing this:
```python
primes = []
for num in range(2, 126):
is_prime = True
for divisor in range(2, int(num ** 0.5) + 1):
if num % divisor == 0:
is_prime = False
break
if is_prime:
primes.append(num)
print(primes)
```
**Output:**
```
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
<|end▁of▁sentence|>"]
In DeepSeek, the special <think> token acts as the chain of thought and helps the model produce a better final output.
Until now I've controlled the output flow with techniques like CoT, Plan-and-Execute, and Reflection,
so I plan to study further how DeepSeek's think process connects to its output, along with fine-tuning, RL, and so on (a small sketch for separating the two parts follows below).
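As a follow-up to that note, the chain of thought and the final answer can be pulled apart by splitting the decoded generation on the closing </think> tag. A minimal sketch I added, assuming the `response` list and `tokenizer` from the inference cell above; the variable names are only illustrative.
```python
# Separate the chain-of-thought from the final answer in the decoded output.
decoded = response[0]
if "</think>" in decoded:
    reasoning, answer = decoded.split("</think>", 1)
    reasoning = reasoning.split("<think>", 1)[-1].strip()
    answer = answer.replace(tokenizer.eos_token, "").strip()
    print("Reasoning:\n", reasoning)
    print("\nAnswer:\n", answer)
```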
😍😍