Libraries

How to use maldv/winter-garden-7b-alpha with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="maldv/winter-garden-7b-alpha")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("maldv/winter-garden-7b-alpha")
model = AutoModelForCausalLM.from_pretrained("maldv/winter-garden-7b-alpha")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate a reply and decode only the newly generated tokens
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

Local Apps

vLLM

How to use maldv/winter-garden-7b-alpha with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "maldv/winter-garden-7b-alpha"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "maldv/winter-garden-7b-alpha",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
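
The same chat completion call can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and the default `base_url` assumes the vLLM server above is listening on port 8000 (pass `http://localhost:30000` for an SGLang server started as shown in the next section). The response parsing assumes the usual OpenAI-compatible response shape.

```python
import json
import urllib.request

MODEL = "maldv/winter-garden-7b-alpha"


def build_chat_request(prompt, model=MODEL, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Requires a running server; assumes the OpenAI-style
    choices[0].message.content layout in the response body.
    """
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("What is the capital of France?")` sends the same payload as the curl command.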

Use Docker

```
docker model run hf.co/maldv/winter-garden-7b-alpha
```

SGLang

How to use maldv/winter-garden-7b-alpha with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "maldv/winter-garden-7b-alpha" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "maldv/winter-garden-7b-alpha",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "maldv/winter-garden-7b-alpha" \
        --host 0.0.0.0 \
        --port 30000
```

Docker Model Runner
How to use maldv/winter-garden-7b-alpha with Docker Model Runner:
```
docker model run hf.co/maldv/winter-garden-7b-alpha
```

Winter Garden 7B - α - "Smart Assistant"

It was mentioned that we are in the open AI dark winter, so I thought I would make myself a nice winter garden.

An experiment

I've merged four partitions successfully in the past, so let's go for 9! I started with:

Mistral-7B-v0.1

and merged in

OmniBeagleSquaredMBX-v3-7B
ZySec-7B-v1
Omningotex-7b-slerp
Erosumika-7B
LemonadeRP-4.5.3
Thespis-Krangled-7b
pastiche-crown-clown-7b-dare
Snorkel-Mistral-PairRM-DPO
multi_verse_model

9-partition merge

All of the layers were partitioned into 9 random bins. Alternating models were slerped along [0...1] and [1...0] gradients, except for attention, which was slerped at 0.03.

This means that the model is still predominantly ordered around base Mistral, including half of the input and output layers and 28% of attention.
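
For readers unfamiliar with slerp, here is a minimal pure-Python sketch of spherical linear interpolation between two weight vectors, plus a helper that produces an ascending or descending gradient of interpolation weights across layers. This is not the script used for this merge: the function names, the flat-list representation of weights, and the reading of "[0...1]" as an evenly spaced ramp are all assumptions made for illustration.

```python
import math


def slerp(t, a, b):
    """Spherically interpolate from vector a (t=0) to vector b (t=1)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    lerp = [(1 - t) * x + t * y for x, y in zip(a, b)]
    if norm_a == 0.0 or norm_b == 0.0:
        return lerp  # the angle is undefined for a zero vector
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding error
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < 1e-6:
        return lerp  # nearly parallel vectors: fall back to linear interpolation
    weight_a = math.sin((1 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


def gradient(n_layers, ascending=True):
    """Evenly spaced weights over the layers: 0 to 1, or 1 to 0 if descending."""
    if n_layers == 1:
        return [0.0] if ascending else [1.0]
    ramp = [i / (n_layers - 1) for i in range(n_layers)]
    return ramp if ascending else ramp[::-1]


# Assumed constant weight for attention tensors, taken from the text above.
ATTENTION_T = 0.03
```

A small `t` such as 0.03 keeps the result very close to the first vector, which is why attention stays near the base weights under this scheme.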

Other

Includes a fast tokenizer.

Chat Template

I included a conversational chat template, which takes "name", "to" (optional), and "content" fields for each turn. It is designed to follow the transcript-style chat used by some of the merged models. This type of use case works best when you outline a scene and create a character card.

```
### {% title %}
{% metadata %}

USER: Hello

ASSISTANT: Hi, how are you?
```
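
To make the transcript layout concrete, here is a small sketch that renders a list of turns into the format shown above. It is not the model's actual chat template: `render_transcript` is a hypothetical helper, the title and metadata strings are placeholders, and the optional "to" field is accepted but not rendered because the source does not show how the real template prints it.

```python
def render_transcript(title, metadata, turns):
    """Render turns as a transcript-style prompt.

    Each turn is a dict with "name" and "content" keys; an optional "to"
    key is ignored here (its rendering is not documented in the source).
    """
    lines = [f"### {title}", metadata, ""]
    for turn in turns:
        lines.append(f"{turn['name']}: {turn['content']}")
        lines.append("")  # blank line between turns
    return "\n".join(lines).rstrip("\n") + "\n"


prompt = render_transcript(
    "Chat",                # placeholder title
    "A conversation.",     # placeholder metadata, e.g. a scene outline
    [
        {"name": "USER", "content": "Hello"},
        {"name": "ASSISTANT", "to": "USER", "content": "Hi, how are you?"},
    ],
)
print(prompt)
```

For real use, prefer `tokenizer.apply_chat_template`, which applies the template shipped with the model.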

It leans toward acting as a coder when given an ### Instruction prompt, follows the <s>[INST][/INST] format, and also works with <|user|> and <|assistant|> tags.

A quite cheery and intelligent model. It is very good with science and math, but still capable of a decent amount of creativity for a 7B model.

Scores

Metric	Score
Average	66.91
ARC	65.19
HellaSwag	85.36
MMLU	65.2
TruthfulQA	50.94
Winogrande	80.35
GSM8K	54.44
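
As a quick check, the reported average is the plain arithmetic mean of the six benchmark scores in the table. The sketch below recomputes it; the dictionary simply restates the table values.

```python
# Benchmark scores copied from the table above
scores = {
    "ARC": 65.19,
    "HellaSwag": 85.36,
    "MMLU": 65.2,
    "TruthfulQA": 50.94,
    "Winogrande": 80.35,
    "GSM8K": 54.44,
}

# Unweighted mean across the six benchmarks
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 66.91
```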

Details

Downloads last month: 121
Format: Safetensors
Model size: 7B params
Tensor type: F16

Model tree for maldv/winter-garden-7b-alpha

Base models (merged into this model):

CorticalStack/pastiche-crown-clown-7b-dare
KatyTheCutie/LemonadeRP-4.5.3
MTSAIR/multi_verse_model
ZySec-AI/SecurityLLM
cgato/Thespis-Krangled-7b
liminerity/Omningotex-7b-slerp
paulml/OmniBeagleSquaredMBX-v3-7B
snorkelai/Snorkel-Mistral-PairRM-DPO

Finetunes: 1 model
Quantizations: 3 models
Spaces using maldv/winter-garden-7b-alpha: 9

Evaluation results

Source: Open LLM Leaderboard

Benchmark	Metric	Split	Score
AI2 Reasoning Challenge (25-shot)	normalized accuracy	test	65.190
HellaSwag (10-shot)	normalized accuracy	validation	85.360
MMLU (5-shot)	accuracy	test	65.200
TruthfulQA (0-shot)	mc2	validation	50.940
Winogrande (5-shot)	accuracy	validation	80.350
GSM8k (5-shot)	accuracy	test	54.440