Session layer: qwen-think
Manages Qwen3.6's thinking state across sessions, backends, and frameworks. It normalizes the three backend-specific invocation patterns, routes each request to the appropriate thinking mode with an atomic swap of sampling parameters, and budgets context so that thinking output stays within the 128K window.
What it does
- ThinkingSession with lifecycle and budget tracking
- Dynamic router: complexity classifier that picks think vs. no-think mode per request
- Backend normalizers for vLLM, DashScope, and llama.cpp (each has a different flag format)
- 128K context budget guard: prevents context exhaustion from unbounded thinking
- Atomic sampling param swap: thinking mode uses temp=0.6/top_p=0.95/top_k=20; instruct mode uses temp=0.7/top_p=0.80/top_k=20/presence_penalty=1.5
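The atomic swap in the last bullet can be pictured as selecting one immutable preset per request instead of mutating individual fields. The following is a minimal sketch, not qwen-think's actual implementation: `SamplingParams`, `THINKING`, `INSTRUCT`, and `params_for` are hypothetical names, and only the numeric values come from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SamplingParams:
    """Immutable bundle, so a request never sees a half-updated mix of modes."""
    temperature: float
    top_p: float
    top_k: int
    presence_penalty: Optional[float] = None

    def as_dict(self) -> dict:
        # Drop unset fields so they are not sent to the backend at all.
        return {k: v for k, v in vars(self).items() if v is not None}


# Values taken from the feature list above; the names are illustrative.
THINKING = SamplingParams(temperature=0.6, top_p=0.95, top_k=20)
INSTRUCT = SamplingParams(temperature=0.7, top_p=0.80, top_k=20, presence_penalty=1.5)


def params_for(thinking: bool) -> SamplingParams:
    """Pick the whole preset in one step (hypothetical helper)."""
    return THINKING if thinking else INSTRUCT
```

Because the presets are frozen, switching modes replaces the entire object, which avoids a request going out with, say, the thinking temperature and the instruct `top_p`.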
Install
Usage
from qwen_think import ThinkingSession
session = ThinkingSession(
    model="Qwen/Qwen3.6-27B",
    backend="vllm",
    budget=200_000,
)
# Thinking mode
response = session.chat("Refactor this module for testability", thinking=True)
# Instruct mode
response = session.chat("What's the return type of foo()?", thinking=False)
# Let the router decide
response = session.chat("Explain merge sort", preserve=True)
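One way to picture the tracking behind the budget argument: keep a running token count and cap each turn's thinking allowance at whatever still fits. This is a rough sketch only. `ContextBudget`, `reserve`, `commit`, and `BudgetExceeded` are hypothetical names rather than qwen-think's API, and token counts are supplied by the caller instead of being computed with a real tokenizer.

```python
class BudgetExceeded(RuntimeError):
    """Raised when even the prompt no longer fits in the remaining budget."""


class ContextBudget:
    def __init__(self, limit: int) -> None:
        if limit <= 0:
            raise ValueError("limit must be positive")
        self.limit = limit
        self.used = 0

    @property
    def remaining(self) -> int:
        return self.limit - self.used

    def reserve(self, prompt_tokens: int, max_thinking_tokens: int) -> int:
        """Return the thinking allowance that still fits after the prompt."""
        if prompt_tokens > self.remaining:
            raise BudgetExceeded(
                f"prompt needs {prompt_tokens} tokens, {self.remaining} left"
            )
        return min(max_thinking_tokens, self.remaining - prompt_tokens)

    def commit(self, tokens: int) -> None:
        """Record tokens actually consumed by a finished turn."""
        self.used += tokens


budget = ContextBudget(limit=128_000)
budget.commit(120_000)  # earlier turns in the session
allowance = budget.reserve(prompt_tokens=2_000, max_thinking_tokens=32_000)
```

Here the requested 32,000 thinking tokens are trimmed to the 6,000 that remain after the prompt, so the turn degrades gracefully instead of exhausting the context.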
Backend normalization
The same enable_thinking: false flag is passed differently depending on the backend:
| Backend | Where the flag goes |
|---|---|
| vLLM | extra_body: { chat_template_kwargs: { enable_thinking: false } } (nested) |
| DashScope | extra_body: { enable_thinking: false } (top-level) |
| llama.cpp | Server-side flag, not in the request body |
qwen-think handles this so your application code doesn't change when you switch backends.
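To make the difference concrete, a normalizer of this kind could map a single boolean to the extra request arguments each backend expects. The sketch below is illustrative, not the library's code: `thinking_kwargs` and the backend strings `"dashscope"` and `"llama.cpp"` are assumptions. The vLLM branch nests the flag under `chat_template_kwargs`, which is how vLLM's OpenAI-compatible server accepts chat-template options.

```python
from typing import Any, Dict


def thinking_kwargs(backend: str, enable_thinking: bool) -> Dict[str, Any]:
    """Extra kwargs for an OpenAI-compatible chat call (hypothetical helper)."""
    if backend == "vllm":
        # Nested: the flag is forwarded to the chat template.
        return {
            "extra_body": {
                "chat_template_kwargs": {"enable_thinking": enable_thinking}
            }
        }
    if backend == "dashscope":
        # Top-level inside extra_body.
        return {"extra_body": {"enable_thinking": enable_thinking}}
    if backend == "llama.cpp":
        # Configured when the server starts, so nothing is added per request.
        return {}
    raise ValueError(f"unknown backend: {backend!r}")
```

The result would be splatted into the client call, for example `client.chat.completions.create(model=..., messages=..., **thinking_kwargs("vllm", False))`. Note that the flag is always set explicitly rather than omitted, since omitting it leaves a default-thinking model in thinking mode.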
Known upstream bugs this works around
- vLLM semantic-router #858: use_reasoning: false removes the field instead of setting enable_thinking: false. Qwen3.6 thinks by default, so removing the field has no effect.
- Ray Serve LLM v2.46: enable_thinking: false in HTTP body does not propagate to the model.
- Qwen3.6 removed prompt-prefix toggling: The /think and /no_think prompt prefixes from Qwen3 don't work on Qwen3.6. Any framework relying on prompt-level toggling is broken.