laurahigueras committed
Commit ac9d344 · verified · 1 Parent(s): 348dbf5

improved description

Files changed (1): README.md (+21 −4)
README.md CHANGED
@@ -1,13 +1,30 @@
  ---
  license: mit
- base_model:
- - deepseek-ai/DeepSeek-R1-Distill-Llama-70B
+ base_model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
+ pipeline_tag: text-generation
  library_name: transformers
+ tags:
+ - deepseek
+ - llama
+ - quantization
+ - fp8
+ - llm-compressor
+ - text-generation
  ---

- The [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/perplexity-ai/r1-1776-distill-llama-70b) model quantized to fp8.
+ # DeepSeek-R1-Distill-Llama-70B-FP8-Dynamic
+ FP8 dynamic quantization pipeline for DeepSeek-R1-Distill-Llama-70B using `llm_compressor`.

- # quantization using llm_compressor
+ ---
+ ## Overview
+ - This repository demonstrates how to apply FP8 dynamic quantization to the DeepSeek-R1-Distill-Llama-70B model.
+ - The goal is to reduce memory usage and improve inference efficiency while maintaining strong performance for large language model tasks.
+
+ > ⚠️ This is a quantization pipeline, not a pre-quantized checkpoint.
+
+ ---
+
+ ## Usage
  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM
  from llmcompressor.transformers import oneshot
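The diff truncates the usage snippet right after its imports. As a hedged sketch of how such an FP8 dynamic quantization run typically looks with `llm-compressor` (this is not the repository's exact script: the `SAVE_DIR` naming and the `QuantizationModifier` recipe with the `FP8_DYNAMIC` scheme are assumptions based on llm-compressor's documented API), the pipeline might be completed as:

```python
# Hedged sketch: one-shot FP8 dynamic quantization with llm-compressor.
# Assumptions: llmcompressor exposes `oneshot` and `QuantizationModifier`
# as shown; SAVE_DIR is an illustrative output path, not from the source.
MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"
SAVE_DIR = MODEL_ID.split("/")[-1] + "-FP8-Dynamic"


def main():
    # Heavy imports kept inside main(): the run needs GPUs able to hold
    # the 70B model, so nothing is loaded at module import time.
    from transformers import AutoTokenizer, AutoModelForCausalLM
    from llmcompressor.transformers import oneshot
    from llmcompressor.modifiers.quantization import QuantizationModifier

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

    # FP8_DYNAMIC uses static per-channel weight scales and dynamic
    # per-token activation scales, so no calibration dataset is required.
    # lm_head is commonly left unquantized to preserve output quality.
    recipe = QuantizationModifier(
        targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
    )

    oneshot(model=model, recipe=recipe)

    model.save_pretrained(SAVE_DIR)
    tokenizer.save_pretrained(SAVE_DIR)


if __name__ == "__main__":
    main()
```

The quantized weights saved to `SAVE_DIR` can then be served with an FP8-capable runtime; since the scheme is dynamic, activation scales are computed on the fly at inference time.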