jsbaicenter/r1-1776-distill-llama-70b-FP8-Dynamic

Text Generation

text-generation-inference

compressed-tensors


laurahigueras committed 29 days ago

Commit ac9d344 · verified · 1 Parent(s): 348dbf5

improved description

Files changed (1)

README.md +21 -4


@@ -1,13 +1,30 @@
 ---
 license: mit
-base_model:
-- deepseek-ai/DeepSeek-R1-Distill-Llama-70B
 library_name: transformers
 ---
-The [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/perplexity-ai/r1-1776-distill-llama-70b) model quantized to fp8.
-# quantization using llm_compressor
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 from llmcompressor.transformers import oneshot

 ---
 license: mit
+base_model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
+pipeline_tag: text-generation
 library_name: transformers
+tags:
+- deepseek
+- llama
+- quantization
+- fp8
+- llm-compressor
+- text-generation
 ---
+# DeepSeek-R1-Distill-Llama-70B-FP8-Dynamic
+FP8 dynamic quantization pipeline for DeepSeek-R1-Distill-Llama-70B using `llm_compressor`.
+---
+## Overview
+- This repository demonstrates how to apply FP8 dynamic quantization to the DeepSeek-R1-Distill-Llama-70B model.
+- The goal is to reduce memory usage and improve inference efficiency while maintaining strong performance for large language model tasks.
+> ⚠️ This is a quantization pipeline, not a pre-quantized checkpoint.
+---
+## Usage
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 from llmcompressor.transformers import oneshot