Instructions for using tanliboy/llama-3.2-3b-dpo-2 with libraries and local apps.

Libraries

How to use tanliboy/llama-3.2-3b-dpo-2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="tanliboy/llama-3.2-3b-dpo-2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result; a bare pipe(messages) call shows nothing when run as a script
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tanliboy/llama-3.2-3b-dpo-2")
model = AutoModelForCausalLM.from_pretrained("tanliboy/llama-3.2-3b-dpo-2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use tanliboy/llama-3.2-3b-dpo-2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "tanliboy/llama-3.2-3b-dpo-2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "tanliboy/llama-3.2-3b-dpo-2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
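
The same request can also be sent from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on localhost:8000. The helper names `build_chat_request` and `ask` are illustrative and not part of vLLM; the response is assumed to follow the OpenAI chat completions layout.

```python
import json
import urllib.request

MODEL_ID = "tanliboy/llama-3.2-3b-dpo-2"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Needs a running server, so it is defined here but not called.
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` returns the reply text. Passing `base_url="http://localhost:30000"` should target an SGLang server started on port 30000 in the same way.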

Use Docker

docker model run hf.co/tanliboy/llama-3.2-3b-dpo-2

SGLang

How to use tanliboy/llama-3.2-3b-dpo-2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "tanliboy/llama-3.2-3b-dpo-2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "tanliboy/llama-3.2-3b-dpo-2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "tanliboy/llama-3.2-3b-dpo-2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "tanliboy/llama-3.2-3b-dpo-2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use tanliboy/llama-3.2-3b-dpo-2 with Docker Model Runner:
```
docker model run hf.co/tanliboy/llama-3.2-3b-dpo-2
```

tanliboy committed on Oct 1, 2024

Commit daad045 (verified) · 1 parent: e899843

Model save

Files changed (7)

README.md +90 -0
all_results.json +9 -0
generation_config.json +13 -0
model-00001-of-00002.safetensors +1 -1
model-00002-of-00002.safetensors +1 -1
train_results.json +9 -0
trainer_state.json +0 -0

README.md ADDED

@@ -0,0 +1,90 @@

+---
+library_name: transformers
+license: llama3.2
+base_model: tanliboy/llama-3.2-3b-sft-2
+tags:
+- trl
+- dpo
+- generated_from_trainer
+model-index:
+- name: llama-3.2-3b-dpo-2
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# llama-3.2-3b-dpo-2
+This model is a fine-tuned version of [tanliboy/llama-3.2-3b-sft-2](https://huggingface.co/tanliboy/llama-3.2-3b-sft-2) on the None dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.5808
+- Rewards/chosen: 1.8125
+- Rewards/rejected: -4.0822
+- Rewards/accuracies: 0.7880
+- Rewards/margins: 5.8947
+- Logps/rejected: -387.3112
+- Logps/chosen: -337.8669
+- Logits/rejected: 0.2355
+- Logits/chosen: 0.1785
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 5e-07
+- train_batch_size: 4
+- eval_batch_size: 4
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 8
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 128
+- total_eval_batch_size: 32
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.03
+- num_epochs: 3
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.7596        | 0.1741 | 100  | 0.7588          | 0.1349         | -1.4398          | 0.6994             | 1.5747          | -360.8871      | -354.6434    | 0.6135          | 0.5482        |
+| 0.6725        | 0.3483 | 200  | 0.6680          | 0.6247         | -2.7323          | 0.7278             | 3.3569          | -373.8118      | -349.7451    | 0.5335          | 0.4718        |
+| 0.6452        | 0.5224 | 300  | 0.6514          | 0.1770         | -3.8036          | 0.75               | 3.9807          | -384.5256      | -354.2216    | 0.5477          | 0.4866        |
+| 0.6259        | 0.6966 | 400  | 0.6328          | 0.9885         | -3.5382          | 0.7722             | 4.5267          | -381.8713      | -346.1070    | 0.4531          | 0.3927        |
+| 0.5709        | 0.8707 | 500  | 0.6219          | 0.9150         | -4.0091          | 0.7816             | 4.9242          | -386.5804      | -346.8415    | 0.4148          | 0.3563        |
+| 0.5835        | 1.0448 | 600  | 0.6094          | 1.5034         | -3.6390          | 0.7722             | 5.1423          | -382.8790      | -340.9584    | 0.3504          | 0.2933        |
+| 0.5571        | 1.2190 | 700  | 0.5992          | 1.5696         | -3.7206          | 0.7690             | 5.2901          | -383.6949      | -340.2962    | 0.3217          | 0.2649        |
+| 0.5532        | 1.3931 | 800  | 0.5954          | 1.7147         | -3.7261          | 0.7785             | 5.4408          | -383.7506      | -338.8453    | 0.2961          | 0.2383        |
+| 0.5168        | 1.5673 | 900  | 0.5930          | 1.9934         | -3.3982          | 0.7753             | 5.3916          | -380.4709      | -336.0577    | 0.2838          | 0.2266        |
+| 0.5232        | 1.7414 | 1000 | 0.5884          | 1.7308         | -4.0024          | 0.7816             | 5.7332          | -386.5127      | -338.6839    | 0.2787          | 0.2220        |
+| 0.5574        | 1.9155 | 1100 | 0.5849          | 1.8420         | -3.9351          | 0.7911             | 5.7771          | -385.8401      | -337.5714    | 0.2706          | 0.2134        |
+| 0.5077        | 2.0897 | 1200 | 0.5842          | 1.6188         | -4.2472          | 0.7880             | 5.8659          | -388.9607      | -339.8043    | 0.2657          | 0.2083        |
+| 0.4952        | 2.2638 | 1300 | 0.5837          | 1.9316         | -3.8913          | 0.7816             | 5.8229          | -385.4018      | -336.6759    | 0.2694          | 0.2115        |
+| 0.5236        | 2.4380 | 1400 | 0.5812          | 1.8289         | -4.0636          | 0.7880             | 5.8925          | -387.1253      | -337.7025    | 0.2465          | 0.1895        |
+| 0.5001        | 2.6121 | 1500 | 0.5814          | 1.7432         | -4.1735          | 0.7848             | 5.9167          | -388.2242      | -338.5596    | 0.2395          | 0.1826        |
+| 0.5246        | 2.7862 | 1600 | 0.5809          | 1.8622         | -4.0120          | 0.7880             | 5.8742          | -386.6093      | -337.3701    | 0.2395          | 0.1825        |
+| 0.5042        | 2.9604 | 1700 | 0.5808          | 1.8125         | -4.0822          | 0.7880             | 5.8947          | -387.3112      | -337.8669    | 0.2355          | 0.1785        |
+### Framework versions
+- Transformers 4.44.2
+- Pytorch 2.4.0+cu121
+- Datasets 2.19.1
+- Tokenizers 0.19.1
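
Two of the figures reported in this README can be reproduced from the others. This is a minimal sketch, assuming the usual Trainer convention that the total batch size is per-device batch × devices × accumulation steps, and that the DPO reward margin is the chosen reward minus the rejected reward. The variable names are illustrative.

```python
# Hyperparameters copied from the model card above.
train_batch_size = 4             # per device
eval_batch_size = 4              # per device
num_devices = 8
gradient_accumulation_steps = 4

total_train = train_batch_size * num_devices * gradient_accumulation_steps
total_eval = eval_batch_size * num_devices  # no accumulation at eval time
print(total_train, total_eval)  # 128 32

# (step, rewards/chosen, rewards/rejected, rewards/margins) for a few table rows.
rows = [
    (100, 0.1349, -1.4398, 1.5747),
    (900, 1.9934, -3.3982, 5.3916),
    (1700, 1.8125, -4.0822, 5.8947),
]
for step, chosen, rejected, margin in rows:
    # Reported values are rounded to 4 decimals, so allow a small tolerance.
    assert abs((chosen - rejected) - margin) < 1e-3, step
```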

all_results.json ADDED

@@ -0,0 +1,9 @@

+{
+    "epoch": 2.998693948628646,
+    "total_flos": 0.0,
+    "train_loss": 0.5855818307081304,
+    "train_runtime": 16735.6732,
+    "train_samples": 73493,
+    "train_samples_per_second": 13.174,
+    "train_steps_per_second": 0.103
+}
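
The throughput figures are consistent with the run length. The following is a rough sketch: the step count is not stated in this file and is inferred here from the batch settings in the README (8 devices, per-device batch 4, 4 accumulation steps), and the sharding scheme is an assumption about how the samples were split.

```python
import math

train_samples = 73493
train_runtime = 16735.6732       # seconds
num_epochs = 3
num_devices, per_device_batch, grad_accum = 8, 4, 4

# Assumed sharding: samples split across devices, then batched per device.
samples_per_device = math.ceil(train_samples / num_devices)
batches_per_epoch = math.ceil(samples_per_device / per_device_batch)
steps_per_epoch = batches_per_epoch // grad_accum        # optimizer steps
total_steps = steps_per_epoch * num_epochs

epoch_reached = total_steps * grad_accum / batches_per_epoch
samples_per_second = train_samples * num_epochs / train_runtime
steps_per_second = total_steps / train_runtime

print(total_steps, round(epoch_reached, 4))                      # 1722 2.9987
print(round(samples_per_second, 3), round(steps_per_second, 3))  # 13.174 0.103
print(round(train_runtime / 3600, 2), "hours")                   # 4.65 hours
```

Under these assumptions the fractional final epoch (2.9987 rather than 3) comes from dropping the incomplete accumulation step at the end of each epoch.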

generation_config.json ADDED

@@ -0,0 +1,13 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 128000,
+  "do_sample": true,
+  "eos_token_id": [
+    128001,
+    128008,
+    128009
+  ],
+  "temperature": 0.6,
+  "top_p": 0.9,
+  "transformers_version": "4.44.2"
+}
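
For readers unfamiliar with these settings: with `do_sample` enabled, `temperature` rescales the logits before the softmax, and `top_p` restricts sampling to the smallest set of tokens whose cumulative probability reaches the threshold. The following is a simplified pure-Python illustration of that idea with a made-up four-token vocabulary. It is not the Transformers implementation, and the function names are hypothetical.

```python
import math
import random


def nucleus_probs(logits, temperature=0.6, top_p=0.9):
    """Return renormalized probabilities after temperature scaling and top-p filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Zero out everything outside the nucleus and renormalize the rest.
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample(logits, rng=random, **kwargs):
    """Draw one token index from the filtered distribution."""
    probs = nucleus_probs(logits, **kwargs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.0, -3.0]          # hypothetical values
print(nucleus_probs(toy_logits))
print(sample(toy_logits, rng=random.Random(0)))
```

With these toy logits, only the two most likely tokens survive the filter, so the two low-probability tokens can never be sampled.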

model-00001-of-00002.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9044ca739884300817f6156576c86a3643b437ea29214b3547c2363fbe18922b
+oid sha256:9736e76efe6fc87015d8a7e38b9307c0e6955c509756b36986de9cecad3301f4
 size 4965799096

model-00002-of-00002.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e111d9658e94b5165e7626198f9d0a1c79b847e549b0a5a423ecd4a8bc231d76
+oid sha256:c52adbd020768e3816f99aa0fb43854c074a66f16b9b698142ec89da1b79a82f
 size 1459729952
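
Each safetensors entry above is a Git LFS pointer: the `oid` is the SHA-256 of the actual file and `size` is its length in bytes. A downloaded shard could be checked against its pointer along these lines. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the demo uses a small temporary file rather than a real shard.

```python
import hashlib
import os
import tempfile


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Check a local file's size and hash against a parsed pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Read in chunks so multi-gigabyte shards do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


# Demo with a stand-in file; a real check would pass a downloaded shard.
data = b"example bytes"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
pointer_text = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(data).hexdigest()}\n"
    f"size {len(data)}\n"
)
print(matches_pointer(tmp.name, parse_lfs_pointer(pointer_text)))  # True
os.remove(tmp.name)
```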

train_results.json ADDED

@@ -0,0 +1,9 @@

+{
+    "epoch": 2.998693948628646,
+    "total_flos": 0.0,
+    "train_loss": 0.5855818307081304,
+    "train_runtime": 16735.6732,
+    "train_samples": 73493,
+    "train_samples_per_second": 13.174,
+    "train_steps_per_second": 0.103
+}

trainer_state.json ADDED

The diff for this file is too large to render.