Add support for nemotron-3-nano-30b reasoning models
This commit adds full support for nemotron-3-nano-30b-a3b models with
reasoning/thinking capabilities, addressing differences between 9b-v2,
30b, and 49b model variants based on comprehensive testing.
File changed: `docs/enable-nemotron-thinking.md` (43 additions, 6 deletions)
As of NIM version 1.12, the Thinking Budget feature is supported on the following models:

- **nvidia-nemotron-nano-9b-v2**
- **nvidia/nemotron-3-nano-30b-a3b** (also accessible as `nvidia/nemotron-3-nano`)
For the latest supported models, refer to the [NIM Thinking Budget Control documentation](https://docs.nvidia.com/nim/large-language-models/latest/thinking-budget-control.html).
> **Note:** The model `nvidia/nemotron-3-nano` is an alias that can be used interchangeably with `nvidia/nemotron-3-nano-30b-a3b`. Both refer to the same underlying model.
>
> **Important - Model Naming:**
>
> - **For locally deployed NIMs:** Use the model name `nvidia/nemotron-3-nano`.
> - **For NVIDIA-hosted models:** Use the model name `nvidia/nemotron-3-nano-30b-a3b`.
### Enabling Thinking Budget on RAG
After enabling reasoning as described in the steps above, enable the thinking budget feature in RAG by including the following parameters in your API request:
| Parameter | Default | Description |
|-----------|---------|-------------|
| `min_thinking_tokens` | 1 | Minimum number of thinking tokens to allocate for reasoning models. |
| `max_thinking_tokens` | 8192 | Maximum number of thinking tokens to allocate for reasoning models. |
> **Note for `nvidia/nemotron-3-nano-30b-a3b` and `nvidia/nemotron-3-nano`**
>
> These models only use the `max_thinking_tokens` parameter.
>
> - `min_thinking_tokens` is ignored for these models.
> - Thinking budget is enabled by passing a positive `max_thinking_tokens` value in the request.
> - The RAG blueprint automatically handles the model-specific parameter mapping internally (`max_thinking_tokens` → `reasoning_budget`).
> - Unlike `nvidia-nemotron-nano-9b-v2`, these models return reasoning in a separate `reasoning_content` field rather than using `<think>` tags.
>
> **Controlling Reasoning for nemotron-3-nano:**
>
> - Set `ENABLE_NEMOTRON_3_NANO_THINKING=true` (the default) to enable reasoning/thinking mode.
> - Set `ENABLE_NEMOTRON_3_NANO_THINKING=false` to disable reasoning mode.
> - This variable controls the `enable_thinking` flag in `chat_template_kwargs`.
>
> **Model Behavior Differences:**
>
> | Model | Reasoning Control | Reasoning Output | Token Budget Parameter |
> |-------|-------------------|------------------|------------------------|
> | `nvidia-nemotron-nano-9b-v2` | `min_thinking_tokens`, `max_thinking_tokens` | In `content` field with `<think>` tags | `min_thinking_tokens`, `max_thinking_tokens` |
> | `nvidia/nemotron-3-nano-30b-a3b` | `ENABLE_NEMOTRON_3_NANO_THINKING` env var | In `reasoning_content` field | `reasoning_budget` (mapped from `max_thinking_tokens`) |
> | `nvidia/llama-3.3-nemotron-super-49b-v1.5` | System prompt (`/think` or `/no_think`) | In `content` field with `<think>` tags | N/A (controlled by prompt) |
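The parameter mapping and environment-variable control described above can be sketched as follows. This is an illustration of the behavior, not the blueprint's actual code; the helper name `build_reasoning_params` is hypothetical:

```python
import os

# Models that take a reasoning_budget instead of min/max thinking tokens.
NEMOTRON_3_NANO_MODELS = {"nvidia/nemotron-3-nano-30b-a3b", "nvidia/nemotron-3-nano"}

def build_reasoning_params(model: str, min_thinking_tokens: int,
                           max_thinking_tokens: int) -> dict:
    """Map user-facing thinking-budget settings to model-specific request fields."""
    if model in NEMOTRON_3_NANO_MODELS:
        # min_thinking_tokens is ignored; the budget is passed as reasoning_budget,
        # and ENABLE_NEMOTRON_3_NANO_THINKING toggles enable_thinking.
        enabled = os.getenv("ENABLE_NEMOTRON_3_NANO_THINKING", "true").lower() == "true"
        return {
            "reasoning_budget": max_thinking_tokens,
            "chat_template_kwargs": {"enable_thinking": enabled},
        }
    # nvidia-nemotron-nano-9b-v2 accepts both parameters directly.
    return {
        "min_thinking_tokens": min_thinking_tokens,
        "max_thinking_tokens": max_thinking_tokens,
    }
```

In practice you do not need to perform this mapping yourself; the RAG blueprint applies it when forwarding your request.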
**Example API requests:**
**For nvidia-nemotron-nano-9b-v2:**
```json
{
    "messages": [
        ...
    ],
    "min_thinking_tokens": 1,
    "max_thinking_tokens": 8192,
    "model": "nvidia-nemotron-nano-9b-v2"
}
```
**For nemotron-3-nano (locally deployed):**
173
+
```json
174
+
{
175
+
"messages": [
176
+
{
177
+
"role": "user",
178
+
"content": "What is the FY2017 operating cash flow ratio for Adobe?"
0 commit comments
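Because the nemotron-3-nano variants return reasoning in a separate `reasoning_content` field while `nvidia-nemotron-nano-9b-v2` and the 49b model embed it in `content` inside `<think>` tags, a client that supports all three has to check both places. A minimal sketch, assuming the message dict follows the chat-completion response shapes described above (this is not an official client):

```python
import re

def extract_reasoning(message: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from a chat-completion message dict.

    nemotron-3-nano variants expose reasoning via 'reasoning_content';
    nano-9b-v2 and the 49b model embed it in 'content' as <think>...</think>.
    """
    content = message.get("content") or ""
    reasoning = message.get("reasoning_content")
    if reasoning is not None:
        return reasoning, content
    match = re.search(r"<think>(.*?)</think>", content, flags=re.DOTALL)
    if match:
        # Everything after the closing tag is the final answer.
        return match.group(1).strip(), content[match.end():].lstrip()
    return "", content
```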