ElliotGao

tclf90

AI & ML interests

None yet

Recent Activity

new activity 3 days ago

QuantTrio/GLM-5.2-Int4-Int8Mix:GLM-5.2-Int4-Int8Mix generates only "!!!!!" on H100 8×80GB with vLLM 0.23.0/0.24.0 — IndexShare sparse indexer K-cache never populated during prefill

liked a model 9 days ago

festr2/GLM-5.2-Int8Mix-NVFP4

new activity 10 days ago

QuantTrio/GLM-5.2-Int4-Int8Mix:AWQ 4bit

Organizations

New activity in QuantTrio/GLM-5.2-Int4-Int8Mix 3 days ago

GLM-5.2-Int4-Int8Mix generates only "!!!!!" on H100 8×80GB with vLLM 0.23.0/0.24.0 — IndexShare sparse indexer K-cache never populated during prefill

#3 opened 4 days ago by

kinggenguo

New activity in QuantTrio/GLM-5.2-Int4-Int8Mix 10 days ago

AWQ 4bit

#2 opened 10 days ago by

MatthieuZ

New activity in QuantTrio/GLM-4.7-AWQ about 2 months ago

Revert template

#8 opened about 2 months ago by

s-yanev

Support for structured output

#7 opened about 2 months ago by

s-yanev

New activity in QuantTrio/DeepSeek-V3.1-AWQ-Lite about 2 months ago

Fix chat_template crash when assistant message omits the `content` key

#5 opened about 2 months ago by

qgallouedec

New activity in QuantTrio/DeepSeek-V3.2-Exp-AWQ about 2 months ago

Fix chat_template crash when assistant message omits the `content` key

#4 opened about 2 months ago by

qgallouedec

New activity in QuantTrio/DeepSeek-V3.1-AWQ about 2 months ago

Fix chat_template crash when assistant message omits the `content` key

#5 opened about 2 months ago by

qgallouedec

New activity in QuantTrio/Qwen3.6-35B-A3B-AWQ 2 months ago

Any plans for a calibration-based AWQ build for better long-context stability?

#6 opened 2 months ago by

hyunw55

PPLX or KLD, or other benchmark

#4 opened 2 months ago by

HenkTenk

New activity in QuantTrio/GLM-5-AWQ 3 months ago

[Request] Great work! Do you have plans to also create GLM-5.1-AWQ?

🤗 1

#6 opened 3 months ago by

ag1988

New activity in QuantTrio/Qwen3.5-122B-A10B-AWQ 3 months ago

CUDA version 13?

#1 opened 3 months ago by

pathosethoslogos

New activity in QuantTrio/gemma-4-31B-it-AWQ 3 months ago

Request for awq of the gemma 4 26B A4B MoE

#1 opened 3 months ago by

rks2302

New activity in QuantTrio/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-AWQ 3 months ago

AWQ 4/5/6-bit request for Qwopus3.5-27B-v3

🚀❤️ 3

#2 opened 3 months ago by

celikburak

New activity in QuantTrio/Qwen3.5-27B-AWQ 3 months ago

AWQ 4-bit version of this Opus-Distilled-v2 model?

#5 opened 3 months ago by

celikburak

New activity in QuantTrio/Qwen3.5-27B-AWQ 4 months ago

--max-model-len 32768 seems a bit too small for agent use cases?

#3 opened 4 months ago by

edwarddukewu

My personal vLLM launch cmd on my old personal 2x3090 workstation

#1 opened 4 months ago by

tclf90

New activity in QuantTrio/Qwen3.5-35B-A3B-AWQ 4 months ago

Can't get vLLM running on 1xRTX 4090

#1 opened 4 months ago by

slyfox1186

New activity in cyankiwi/Qwen3.5-27B-AWQ-4bit 4 months ago

Easy to fall into infinite loop

👍 1

#2 opened 4 months ago by

dwaynedu

New activity in QuantTrio/GLM-5-AWQ 4 months ago

GLM-5-AWQ vLLM Deployment Guide

👍 1

#2 opened 4 months ago by

CharlesChen2023

Great work

#1 opened 4 months ago by

JoeyHwong
