Reference: Fine-tune Deepseek-R1 with a Synthetic Reasoning Dataset
https://huggingface.co/blog/sdiazlor/fine-tune-deepseek-with-a-synthetic-reasoning-data
Finally took a week off, and today is the day I fine-tune DeepSeek from Da Nang, Vietnam, haha!!
Model Load
from unsloth import FastLanguageModel
MODEL = "unsloth/DeepSeek-R1-Distill-Qwen-1.5B-unsloth-bnb-4bit"
# Load the 4-bit pre-quantized DeepSeek distill model and its tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = MODEL,
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)
# Add the LoRA adapters to the model
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)
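As a quick sanity check (not in the original tutorial), the PEFT-wrapped model can report how many parameters the LoRA adapters actually make trainable. A minimal sketch, assuming `print_trainable_parameters()` is available from the underlying PEFT wrapper and continuing from the cell above:
```python
# With r=16 adapters on the listed projection modules, only a small
# fraction of the 1.5B parameters should be reported as trainable.
model.print_trainable_parameters()
```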
Data Loader
from datasets import load_dataset
# Prepare the dataset
prompt_style = """Below is an instruction that describes a task, paired with a question that provides further context.
Write a response that appropriately answers the question.
Before answering, think carefully but concisely about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:
You are an expert programmer with advanced knowledge of Python. Your task is to provide concise and easy-to-understand solutions. Please answer the following python question.
### Question:
{}
### Response:
<think>
{}
"""
EOS_TOKEN = tokenizer.eos_token
def formatting_prompts_func(examples):
    prompts = examples["prompt"]
    completions = examples["completion"]
    texts = []
    for prompt, completion in zip(prompts, completions):
        text = prompt_style.format(prompt, completion) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }
dataset = load_dataset("sdiazlor/python-reasoning-dataset", split="train")
dataset = dataset.map(formatting_prompts_func, batched = True,)
dataset["text"][0]
Model Train
from huggingface_hub import login
hf_token = input("🤗 Enter your Hugging Face token: ").strip()
login(token=hf_token)
from trl import SFTTrainer
from transformers import TrainingArguments
fine_tuned_model = "seyeon-shijuan/deepseek-r1-distill-qwen-1.5-unsloth-sft-python"
MODEL_NAME = "./deepseek-r1-distill-qwen-1.5-unsloth-sft-python"
training_arguments = TrainingArguments(
    output_dir=MODEL_NAME,
    per_device_train_batch_size=4,
    num_train_epochs=3,
    logging_steps=10,
    save_steps=100,
    learning_rate=2e-4,
    # fp16=True,
    # bf16=False,
    fp16=False,  # <--- disable float16
    bf16=True,   # <--- use bfloat16
    push_to_hub=False,
    report_to="none"
)
# Configure the trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    dataset_num_proc=2,
    packing=False,
    args=training_arguments,
)
# Train the model
trainer_stats = trainer.train()
# Save the fine-tuned model
model.save_pretrained_merged(MODEL_NAME, tokenizer, save_method="merged_16bit")
model.push_to_hub_merged(fine_tuned_model, tokenizer, save_method="merged_16bit")
model.push_to_hub_gguf(
    f"{fine_tuned_model}_q4_k_m", tokenizer, quantization_method="q4_k_m"
)
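Once the merged 16-bit checkpoint has been written, it can also be reloaded with plain transformers for use outside Unsloth. A minimal sketch I added (not part of the original tutorial), assuming the local MODEL_NAME directory produced by save_pretrained_merged above:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reload the merged checkpoint that save_pretrained_merged wrote to disk.
merged_dir = "./deepseek-r1-distill-qwen-1.5-unsloth-sft-python"  # MODEL_NAME above
reloaded_tokenizer = AutoTokenizer.from_pretrained(merged_dir)
reloaded_model = AutoModelForCausalLM.from_pretrained(merged_dir, device_map="auto")
```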
Inference
# Run inference
question = "How can I get the prime numbers from 0 to 125?"
FastLanguageModel.for_inference(model)
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=2048,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
response
Output
["<|begin▁of▁sentence|>Below is an instruction that describes a task, paired with a question that provides further context.
Write a response that appropriately answers the question.
Before answering, think carefully but concisely about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:
You are an expert programmer with advanced knowledge of Python. Your task is to provide concise and easy-to-understand solutions. Please answer the following python question.
### Question:
How can I get the prime numbers from 0 to 125?
### Response:
<think>Before I can find the prime numbers between 0 and 125, I need to understand what a prime number is. A prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself. So, I need to check each number in that range to see if it meets this criterion.
I'll start by initializing an empty list to store the prime numbers. Then, I'll loop through each number from 0 to 125. For each number, I'll check if it's greater than 1 because 0 and 1 are not considered prime. If it is, I'll proceed to check its divisibility.
For each number, I'll check if it's divisible by any number from 2 up to the square root of the number. This is an optimization because if a number has a factor larger than its square root, the corresponding factor would be smaller than the square root. If any of these divisions result in a remainder of 0, the number is not prime. If none do, then the number is prime.
After checking each number, I'll add it to the list of primes if it satisfies the condition. Finally, I'll print out the list of prime numbers between 0 and 125.</think>
To find the prime numbers between 0 and 125, we need to identify numbers greater than 1 that have no divisors other than 1 and themselves. Here's a step-by-step explanation:
1. **Initialize an empty list** to store the prime numbers.
2. **Loop through each number** from 2 to 125.
3. **Check if the number is greater than 1**. If it's not, skip to the next number.
4. **Check for divisors** from 2 up to the square root of the number.
5. **If any number divides the current number evenly**, it's not prime. Move to the next number.
6. **If no divisors are found**, add the number to the list of primes.
7. **Print the list of primes**.
Here's the Python code implementing this:
```python
primes = []
for num in range(2, 126):
is_prime = True
for divisor in range(2, int(num ** 0.5) + 1):
if num % divisor == 0:
is_prime = False
break
if is_prime:
primes.append(num)
print(primes)
```
**Output:**
```
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
<|end▁of▁sentence|>"]
In DeepSeek, the special <think> token acts as the chain of thought and helps the model produce a better final output.
Until now I've controlled the output flow with techniques like CoT, Plan-and-Execute, and Reflection,
so I plan to study further how DeepSeek's think process connects to its output, along with fine-tuning, RL, and so on (a small sketch for separating the two parts follows below).
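As a follow-up to that note, the chain of thought and the final answer can be pulled apart by splitting the decoded generation on the closing </think> tag. A minimal sketch I added, assuming the `response` list and `tokenizer` from the inference cell above; the variable names are only illustrative.
```python
# Separate the chain-of-thought from the final answer in the decoded output.
decoded = response[0]
if "</think>" in decoded:
    reasoning, answer = decoded.split("</think>", 1)
    reasoning = reasoning.split("<think>", 1)[-1].strip()
    answer = answer.replace(tokenizer.eos_token, "").strip()
    print("Reasoning:\n", reasoning)
    print("\nAnswer:\n", answer)
```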
😍😍