
Thanks, I can run Qwen 3.6 27B with vLLM, but I was curious about antirez's tool.


Have you seen it get stuck in endless loops in maybe 10-20% of invocations? It seems to happen with both the Responses and Chat Completions APIs, and no matter what inference parameters I try, it happens for at least 1 in 10 requests. I've tried every compatible vLLM version and am currently running it from git (#main), yet the issue persists.

It seems to happen with various quantizations too, including the NVFP4 versions, so it looks like a deeper issue to me, or perhaps a hardware incompatibility.
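For anyone who wants to put a number on this, here is a rough sketch of how the looping rate could be counted offline over saved completions. The names `looks_like_loop` and `loop_rate` and the thresholds are my own assumptions for illustration, not anything from vLLM, and the check only catches exact back-to-back repeats at the end of the output, so near-repeats from sampling would slip through.

```python
def looks_like_loop(text: str, min_period: int = 20, min_repeats: int = 4) -> bool:
    """Heuristic: True if the tail of `text` is one chunk repeated back to back.

    min_period and min_repeats are arbitrary illustrative thresholds.
    """
    n = len(text)
    # A chunk of length `period` repeated `min_repeats` times must fit in the text.
    max_period = n // min_repeats
    for period in range(min_period, max_period + 1):
        chunk = text[n - period:]
        if text[n - period * min_repeats:] == chunk * min_repeats:
            return True
    return False


def loop_rate(outputs: list[str]) -> float:
    """Fraction of saved completions flagged by the heuristic above."""
    if not outputs:
        return 0.0
    return sum(looks_like_loop(o) for o in outputs) / len(outputs)
```

Running `loop_rate` over a few hundred saved responses per configuration would at least show whether a template or version change actually moves the rate, rather than relying on impressions.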


There’s a fixed version out there with corrected templates.


