Instructions to use LumiOpen/Llama-Poro-2-8B-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use LumiOpen/Llama-Poro-2-8B-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="LumiOpen/Llama-Poro-2-8B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("LumiOpen/Llama-Poro-2-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("LumiOpen/Llama-Poro-2-8B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use LumiOpen/Llama-Poro-2-8B-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "LumiOpen/Llama-Poro-2-8B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LumiOpen/Llama-Poro-2-8B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/LumiOpen/Llama-Poro-2-8B-Instruct

SGLang

How to use LumiOpen/Llama-Poro-2-8B-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "LumiOpen/Llama-Poro-2-8B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LumiOpen/Llama-Poro-2-8B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "LumiOpen/Llama-Poro-2-8B-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LumiOpen/Llama-Poro-2-8B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use LumiOpen/Llama-Poro-2-8B-Instruct with Docker Model Runner:
```
docker model run hf.co/LumiOpen/Llama-Poro-2-8B-Instruct
```

On the mtbench settings

by RASMUS - opened Nov 20, 2025

RASMUS

Nov 20, 2025 (edited Nov 20, 2025)

Hi,

I am trying to get per-category MT-Bench results for this model, since they are not published in this repo, but I ran into an issue:

When I run MT-Bench on this model, I guess something is wrong, because the model mostly answers in English on the Finnish MT-Bench. Did you modify some settings?
My command, using the latest LumiOpen/FastChat:
!python gen_model_answer.py --model-path /home/jovyan/work/FastChat/Llama-Poro-2-8B-Instruct --model-id LumiOpen/Llama-Poro-2-8B-Instruct --lang fi --max-new-token 2048

Testing starts like this:

CONV:
Conversation(name='poro', system_template='<|im_start|>system\n{system_message}', system_message='', system_message_vision='', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=<SeparatorStyle.CHATML: 10>, sep='<|im_end|>', sep2=None, stop_str=None, stop_token_ids=[4, 5], max_image_size_mb=None)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:128001 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.

OUTPUT:
The Alps and the Rhine River have had a significant impact on the settlement and agriculture of Western Europe. Here are three key effects:

1. Natural Barrier and Corridor: The Alps acted as a natural barrier, separating Western Europe from Eastern Europe and the rest of Central Europe. However, they also created a corridor for trade and migration along the Rhine River, which flows from the Swiss Alps through Germany and into the North Sea. This corridor facilitated the movement of people, goods, and ideas, leading to the development of trade networks and cultural exchange.

2. Agricultural Limitations and Opportunities: The Alps presented significant agricultural challenges due to their rugged terrain and high altitudes. However, the valleys and plains along the Rhine River offered fertile land suitable for farming. This led to the development of settled agriculture and the establishment of communities in these areas. The availability of arable land and water resources supported the growth of populations and the emergence of complex societies.

3. Settlement Patterns and Cultural Diversity: The Alps and the Rhine River influenced settlement patterns in Western Europe, leading to a diverse range of cultural traditions. The mountainous regions were often inhabited by isolated communities with distinct cultures, while the river valleys attracted larger populations and facilitated cultural exchange between different groups. This diversity contributed to the rich cultural heritage of Western Europe.

ezosa

LumiOpen org Nov 27, 2025

Hi,

Poro2 uses the Llama-3 chat template, which is different from the ChatML template we used for the original Poro. You can use the llama-3 template specified in conversation.py:
https://github.com/LumiOpen/FastChat/blob/main/fastchat/conversation.py#L1592
https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py#L1752

We will update the Poro2 model cards for the SFT and Instruct models with the per-category and per-turn MTBench scores. Thank you for pointing this out!
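
To make the template mismatch concrete, here is a minimal sketch that renders the same single-turn conversation in both layouts. The two helper functions (`render_chatml`, `render_llama3`) are hypothetical and are not part of FastChat or Transformers. The ChatML layout follows the `poro` Conversation shown in the log above, and the Llama-3 layout follows the commonly documented Llama-3 special tokens. The template actually shipped with the model's tokenizer may add further details (for example a default system block), so treat this as an illustration rather than the exact prompt.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render messages in ChatML style (<|im_start|> ... <|im_end|>)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


def render_llama3(messages, add_generation_prompt=True):
    """Render messages in a simplified Llama-3 style (header ids and <|eot_id|>)."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    messages = [{"role": "user", "content": "Mikä on Suomen pääkaupunki?"}]
    print(repr(render_chatml(messages)))
    print(repr(render_llama3(messages)))
```

The two prompts share no special tokens, so a model fine-tuned on one layout sees the other as unfamiliar text, which would be consistent with the off-language answers reported above.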

ezosa changed discussion status to closed Dec 10, 2025
