Qwen2.5-1.5B-Instruct-python

This version of Qwen2.5-1.5B-Instruct-python has been converted to run on the Axera NPU using w8a16 and w4a16 quantization.

This model has been optimized with the following LoRA:

Compatible with Pulsar2 version: 4.1

Feature

Support for longer contexts, in this sample it's 2.5k
Support context dialogue
System prompt kvcache is supported

Convert tools links:

For those who are interested in model conversion, you can try to export axmodel through the original repo : https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8

Pulsar2 Link, How to Convert LLM from Huggingface to axmodel

AXera NPU AXEngine LLM Runtime

AXera NPU AXCL LLM Runtime

Convert script

The follow show how to convert Qwen2.5-1.5B-Instruct-GPTQ-Int8

pulsar2 llm_build --input_path Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8  \
                  --output_path Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8-ctx-ax650 \
                  --hidden_state_type bf16 --kv_cache_len 2047 --prefill_len 128 \
                  --last_kv_cache_len 128 \
                  --last_kv_cache_len 256 \
                  --last_kv_cache_len 384 \
                  --last_kv_cache_len 512 \
                  --last_kv_cache_len 640 \
                  --last_kv_cache_len 768 \
                  --last_kv_cache_len 896 \
                  --last_kv_cache_len 1024 \
                  --chip AX650 -c 1 --parallel 8

Support Platform

AX650
- AX650N DEMO Board
- M4N-Dock(爱芯派Pro)
- M.2 Accelerator card
AX630C
- TBD

How to use

Download all files from this repository to the device

root@ax650:/mnt/qtang/llm-test/Qwen2.5-1.5B-Instruct-python# tree -L 1
.
├── chat.py
├── infer.py
├── infer_torch.py
├── Qwen2.5-1.5B-Instruct-GPTQ-Int8
├── Qwen2.5-1.5B-Instruct-GPTQ-Int8_axmodel
└── README.md

2 directories, 4 files

在 AXERA 650N 开发板上使用 python api 进行模型推理.

在当前目录执行以下命令:

$ python3 chat.py

当出现 prompt (输入 q 退出对话) >> 提示时输入文字, 等待模型输出, 具体示例如下:

$ python3 chat.py
...
The models have been loaded!
2025-07-21 14:23:46.137 | DEBUG    | __main__:<module>:143 - >>> 创建 LlamaChatSession >>>
>>> 系统提示: 你的名字叫小智(allen), 你是一个人畜无害的 AI 助手. 深圳市今天(4月1日)阴天, 愚人节, 气温在 14°C 至 19°C
之间, 微风.
2025-07-21 14:23:46.137 | INFO     | __main__:chat_loop:69 - Type 'q' to exit, Ctrl+c to stop current generation

prompt (输入 q 退出对话) >> 定义函数y=3x^3+2x+1,求解它的导数.
answer: >> 要找到函数 \( y = 3x^3 + 2x + 1 \) 的导数，我们需要对每个项分别求导，然后将它们相加起来。

1. 对 \( 3x^3 \) ���导，结果是 \( 3 \cdot 3x^{3-1} = 9x^2 \)。
2. 对 \( 2x \) ���导，结果是 \( 2 \cdot 1x^{1-1} = 2 \)。
3. 对常数项 \( 1 \) ���导，结果是 \( 0 \)。

将这些结果相加，我们得到：

\[ y' = 9x^2 + 2 \]

所以，函数 \( y = 3x^3 + 2x + 1 \) 的导数是 \( y' = 9x^2 + 2 \)。

prompt (输入 q 退出对话) >> 这个函数中自变量和因变量分别是什么?
answer: >> 在数学中，函数通常由两个变量组成：自变量（也称为输入变量）和因变量（也称为输出变量）。自变量是函数中的一个
  量，它的值决定了因变量的值。

在你提供的函数 \( y = 3x^3 + 2x + 1 \) 中：

- \( x \) 是自变量。
- \( y \) 是因变量。

自变量 \( x \) 的值决定了因变量 \( y \) 的值。例如，如果你给 \( x \) ���值为 2，那么 \( y \) ��等于 \( 3(2)^3 + 2(2) +
 1 = 24 + 4 + 1 = 29 \)。

因此，这个函数描述了一个关于 \( x \) 和 \( y \) 的关系，其中 \( x \) 是自变量，而 \( y \) 是因变量。通过改变 \( x \)
  值，你可以计算出相应的 \( y \) ���。

prompt (输入 q 退出对话) >> 这个函数中最高幂次和最低幂次分别是多少?
answer: >> 在函数 \( y = 3x^3 + 2x + 1 \) 中，最高次幂（最高幂次）是 \( x^3 \)，因此最高幂次是 3。

最低次幂（最低幂次）是 \( x^0 \)，因为 \( x^0 = 1 \) 对于任何 \( x \) ���成立，所以最低幂次是 0。

因此，这个函数的最高幂次是 3，最低幂次是 0。最高幂次和最低幂次的差值是 \( 3 - 0 = 3 \)。这意味着函数的图形是一个三次多
  式，它有一个顶点（如果最高幂次是偶数）或一个拐点（如果最高幂次是奇数）。在这个例子中，由于最高幂次是奇数，函数的图形
  有一个拐点。

当上下文窗口达到上限, 可以输入 reset 命令重置, 例如:

prompt (输入 q 退出对话) >> reset
上下文已重置
prompt (输入 q 退出对话) >> 你是谁?今天天气如何?
answer: >> 我是小智,一名人工智能助手。今天是阴天,愚人节,气温在14°C至19°C之间,微风。

Downloads last month: -; Downloads are not tracked for this model. How to track