Compat layer: qwen-compat

Compatibility test matrix and upstream bug fixes for running Qwen3.6 on consumer-grade local inference hardware. This is not a library; it is a collection of test scripts, upstream PRs, and a pass/fail grid across backends.

Bug tracker

| Bug | Backend | Status | Notes |
| --- | --- | --- | --- |
| mmproj clip runner missing qwen35moe | Ollama | PR submitted (#15899) | Community GGUFs with separate mmproj files fail with "unknown model architecture: 'qwen35moe'" |
| NVFP4 corrupted K-projection weights | Ollama | PR submitted (#15902) | in_proj_qkv and in_proj_z need BF16 exemption during NVFP4 quantization |
| Format ignored with think=false | Ollama | PR submitted (#15901) | Structured output format not applied when thinking is disabled |
| preserve_thinking on 27B GGUF | llama.cpp | Resolved upstream | Works on HEAD (build b8985). No PR needed. |
| IQ4 Metal throughput regression | llama.cpp | Resolved upstream | llama.cpp #21655 closed |
| TurboQuant + speculative decoding | vLLM | Open (#40831) | Degenerate token loops. Not an Ollama/llama.cpp issue. Detected and blocked by the qwen3.6-mtp tuner. |
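To illustrate the shape of the NVFP4 fix (#15902), here is a minimal sketch of a dtype-selection rule that exempts the affected projection tensors from quantization. The function name, tensor-naming scheme, and dtype strings are assumptions for illustration, not code from the actual PR:

```python
# Sketch only: keep tensors whose names contain the affected projections
# in BF16, and quantize everything else to NVFP4.
# The suffix list comes from the bug report; the naming scheme is assumed.
BF16_EXEMPT_PARTS = ("in_proj_qkv", "in_proj_z")

def target_dtype(tensor_name: str) -> str:
    """Return the storage dtype for a tensor during NVFP4 quantization."""
    if any(part in tensor_name for part in BF16_EXEMPT_PARTS):
        return "bf16"   # exempt: NVFP4 corrupts these weights
    return "nvfp4"      # default quantization target

# Hypothetical tensor names in a GGUF-style layout:
print(target_dtype("blk.3.in_proj_qkv.weight"))  # bf16
print(target_dtype("blk.3.out_proj.weight"))     # nvfp4
```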

Test matrix

Clone the repo and run the compatibility matrix:

git clone https://github.com/ArkaD171717/Qwen3.6-Compat
cd Qwen3.6-Compat
./benchmark/compat_matrix.sh --backend ollama
./benchmark/compat_matrix.sh --backend llamacpp

The matrix tests text generation, preserve_thinking, and quantization throughput across the vLLM, SGLang, Ollama, and llama.cpp backends.
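The kind of pass/fail grid the matrix produces can be sketched as follows. The test names, backend labels, and result format here are assumptions for illustration, not the real output of compat_matrix.sh:

```python
# Sketch: aggregate per-backend test results into a pass/fail grid,
# mirroring the grid described above. All data below is made up.
from collections import defaultdict

RESULTS = [  # (backend, test, passed)
    ("ollama", "text_generation", True),
    ("ollama", "preserve_thinking", True),
    ("llamacpp", "text_generation", True),
    ("llamacpp", "preserve_thinking", False),
]

def build_grid(results):
    """Map test name -> {backend: 'pass' | 'FAIL'}."""
    grid = defaultdict(dict)
    for backend, test, ok in results:
        grid[test][backend] = "pass" if ok else "FAIL"
    return dict(grid)

for test, row in build_grid(RESULTS).items():
    print(test, row)
```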