Instructions to use datatab/Yugo55-GPT-v4-4bit with libraries, inference providers, notebooks, and local apps.

Libraries

How to use datatab/Yugo55-GPT-v4-4bit with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="datatab/Yugo55-GPT-v4-4bit")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("datatab/Yugo55-GPT-v4-4bit")
model = AutoModelForCausalLM.from_pretrained("datatab/Yugo55-GPT-v4-4bit")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use datatab/Yugo55-GPT-v4-4bit with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "datatab/Yugo55-GPT-v4-4bit"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "datatab/Yugo55-GPT-v4-4bit",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
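
The same request can also be built from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is listening on localhost:8000, and `build_chat_request` and `send_chat_request` are illustrative helper names, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "datatab/Yugo55-GPT-v4-4bit"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build (without sending) a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request):
    """Send the request and decode the JSON body; needs a running server."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Building the request does not touch the network.
request = build_chat_request("What is the capital of France?")
print(request.full_url)
```

Calling `send_chat_request(request)` only succeeds while the server is running. Passing a different `base_url` should work for any other server that exposes the same OpenAI-compatible route.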

Use Docker

docker model run hf.co/datatab/Yugo55-GPT-v4-4bit

SGLang

How to use datatab/Yugo55-GPT-v4-4bit with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "datatab/Yugo55-GPT-v4-4bit" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "datatab/Yugo55-GPT-v4-4bit",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
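
The server answers with a JSON document in the OpenAI chat completions shape, where the generated text sits under `choices[0].message.content`. The sketch below shows one way to pull it out; `extract_reply` is a hypothetical helper and the sample dictionary is hand-written for illustration, not real model output.

```python
def extract_reply(response):
    """Return the assistant text from a chat completions response dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written sample in the OpenAI response shape (not real model output).
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ]
}

print(extract_reply(sample_response))
```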

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "datatab/Yugo55-GPT-v4-4bit" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "datatab/Yugo55-GPT-v4-4bit",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use datatab/Yugo55-GPT-v4-4bit with Docker Model Runner:
```
docker model run hf.co/datatab/Yugo55-GPT-v4-4bit
```

Yugo55-GPT-v4-4bit

Developed by: datatab
License: MIT
Quantized from model: datatab/Yugo55-GPT-v4
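
As a rough illustration of why a 4-bit variant is useful, the sketch below estimates the memory taken by the weights alone. It assumes the nominal 7B parameter count and ignores quantization metadata (scales, zero points), activations, and the KV cache, so real usage will be higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7e9  # assumption: nominal 7B parameter count

for label, bits in [("float16", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```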

🧩 Configuration

models:
  - model: datatab/Serbian-Mistral-Orca-Slim-v1
    parameters:
      weight: 1.0
  - model: mlabonne/AlphaMonarch-7B
    parameters:
      weight: 1.0
  - model: datatab/YugoGPT-Alpaca-v1-epoch1-good
    parameters:
      weight: 1.0
merge_method: linear
dtype: float16
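
To make `merge_method: linear` concrete: a linear merge takes a weighted element-wise average of the corresponding parameters of each model. The sketch below illustrates the idea on toy lists. It assumes the weights are normalized to sum to one, which this card does not state explicitly, and `linear_merge` is an illustrative function, not the merge tool's actual code.

```python
def linear_merge(tensors, weights, normalize=True):
    """Weighted element-wise combination of equally sized flat parameter lists."""
    if len(tensors) != len(weights):
        raise ValueError("need exactly one weight per tensor")
    if len({len(t) for t in tensors}) != 1:
        raise ValueError("all tensors must have the same length")
    # Assumption: weights are rescaled to sum to one before averaging.
    total = sum(weights) if normalize else 1.0
    return [
        sum(w * v for w, v in zip(weights, values)) / total
        for values in zip(*tensors)
    ]


# Toy values standing in for one parameter tensor from each of the three models.
model_a = [1.0, 2.0]
model_b = [4.0, 2.0]
model_c = [1.0, 8.0]

merged = linear_merge([model_a, model_b, model_c], [1.0, 1.0, 1.0])
print(merged)
```

With three equal weights, as in the configuration above, this reduces to a plain average of the three models' parameters.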

🏆 Results

Results obtained through the Serbian LLM evaluation, released by Aleksa Gordić: serbian-llm-eval

Evaluation was conducted on a 4-bit version of the model due to hardware resource constraints.

MODEL	ARC-E	ARC-C	Hellaswag	BoolQ	Winogrande	OpenbookQA	PiQA
*Yugo55-GPT-v4-4bit	51.41	36.00	57.51	80.92	65.75	34.70	70.54
Yugo55A-GPT	51.52	37.78	57.52	84.40	65.43	35.60	69.43
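
For a quick side-by-side view, the sketch below computes the unweighted mean of the seven task scores for each row. The mean is only a convenience summary derived from the table above; it is not a metric reported by serbian-llm-eval.

```python
TASKS = ["ARC-E", "ARC-C", "Hellaswag", "BoolQ", "Winogrande", "OpenbookQA", "PiQA"]

# Scores copied from the results table, in the same task order.
SCORES = {
    "Yugo55-GPT-v4-4bit": [51.41, 36.00, 57.51, 80.92, 65.75, 34.70, 70.54],
    "Yugo55A-GPT": [51.52, 37.78, 57.52, 84.40, 65.43, 35.60, 69.43],
}


def mean_score(values):
    """Unweighted arithmetic mean of a list of scores."""
    return sum(values) / len(values)


averages = {model: round(mean_score(values), 2) for model, values in SCORES.items()}

for model, average in averages.items():
    print(f"{model}: {average:.2f}")
```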

💻 Usage

!pip -q install git+https://github.com/huggingface/transformers # need to install from github
!pip install -q datasets loralib sentencepiece
!pip -q install bitsandbytes accelerate

from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)

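import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# device_map="auto" (needs accelerate, installed above) places the weights on the available GPU.
model = AutoModelForCausalLM.from_pretrained(
    "datatab/Yugo55-GPT-v4-4bit", torch_dtype="auto", device_map="auto"
)

# The tokenizer holds no tensors, so it does not take a torch_dtype argument.
tokenizer = AutoTokenizer.from_pretrained("datatab/Yugo55-GPT-v4-4bit")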

from transformers import TextStreamer


def generate(question="", input="Odgovaraj uvek na Srpskom jeziku!!!"):
    alpaca_prompt = """Ispod je uputstvo koje opisuje zadatak, upareno sa unosom koji pruža dodatni kontekst. Napišite odgovor koji na odgovarajući način kompletira zahtev.
  
  ### Uputstvo:
   {}
  ### Unos:
   {}
  ### Odgovor:
   {}"""

    inputs = tokenizer(
        [
            alpaca_prompt.format(
                question,  # instruction
                input,  # input
                "",  # output - leave this blank for generation!
            )
        ],
        return_tensors="pt",
    ).to(model.device)  # follow the model's device instead of hard-coding "cuda"

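    # Stream tokens to stdout as they are generated, omitting the prompt.
    text_streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    # Note: top_k=1 keeps only the single most likely token at each step,
    # so decoding is effectively greedy even though do_sample=True.
    _ = model.generate(
        **inputs,
        streamer=text_streamer,
        max_new_tokens=1024,
        temperature=0.1,
        repetition_penalty=1.11,
        top_p=0.92,
        top_k=1,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
        do_sample=True,
        use_cache=True
    )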

generate("Nabroj mi sve planete suncevog sistema i reci mi koja je najveca planeta")

generate("Koja je razlika između lame, vikune i alpake?")

generate("Napišite kratku e-poruku Semu Altmanu dajući razloge za GPT-4 otvorenog koda")
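
To inspect the exact text the model receives without loading any weights, the prompt construction can be separated from generation. The sketch below reproduces the Alpaca-style template from the usage code, including its whitespace; `build_prompt` and the named placeholders are illustrative and not part of the original snippet.

```python
# Template copied from the usage example, with named placeholders.
ALPACA_TEMPLATE = (
    "Ispod je uputstvo koje opisuje zadatak, upareno sa unosom koji pruža "
    "dodatni kontekst. Napišite odgovor koji na odgovarajući način kompletira zahtev.\n"
    "  \n"
    "  ### Uputstvo:\n"
    "   {instruction}\n"
    "  ### Unos:\n"
    "   {context}\n"
    "  ### Odgovor:\n"
    "   {response}"
)

DEFAULT_CONTEXT = "Odgovaraj uvek na Srpskom jeziku!!!"


def build_prompt(instruction, context=DEFAULT_CONTEXT, response=""):
    """Fill the template; leave response empty so the model writes the answer."""
    return ALPACA_TEMPLATE.format(
        instruction=instruction, context=context, response=response
    )


print(build_prompt("Koja je razlika između lame, vikune i alpake?"))
```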

Safetensors

Model size: 7B params
Tensor type: F32, F16

Model tree for datatab/Yugo55-GPT-v4-4bit

Base model: mlabonne/Monarch-7B
Finetuned: mlabonne/NeuralMonarch-7B
Finetuned: mlabonne/AlphaMonarch-7B
Finetuned: datatab/Yugo55-GPT-v4
Quantized (1): this model

Collection including datatab/Yugo55-GPT-v4-4bit

Yugo-GPT (Collection): Yugo-GPT class of LLM (45, 55, 60) • 12 items • Updated Mar 2