OpenAI-compatible endpoint · 130+ models · 2026-05-10
NVIDIA provides an OpenAI-compatible API for LLM inference, embeddings, vision models, and more.
All requests share one base URL: https://integrate.api.nvidia.com/v1
API key format: nvapi-...
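# Environment variable
export NVIDIA_API_KEY="nvapi-xxxxx"
# Or store it in a file (create the directory first)
mkdir -p ~/.config/nvidia
echo "nvapi-xxxxx" > ~/.config/nvidia/api-key
chmod 600 ~/.config/nvidia/api-key
Send the key in a header on every request: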
Authorization: Bearer $NVIDIA_API_KEY
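The same lookup can be sketched in Python. This is a minimal example, assuming the key lives either in `NVIDIA_API_KEY` or in `~/.config/nvidia/api-key` as set up above. The helper names `load_api_key` and `auth_headers` are illustrative and not part of any NVIDIA SDK.

```python
import os
from pathlib import Path

# Default file location mirrors the shell setup above.
DEFAULT_KEY_FILE = Path.home() / ".config" / "nvidia" / "api-key"


def load_api_key(env=None, key_file=DEFAULT_KEY_FILE):
    """Return the API key from the environment, falling back to the key file."""
    env = os.environ if env is None else env
    key = (env.get("NVIDIA_API_KEY") or "").strip()
    if not key:
        path = Path(key_file)
        if path.is_file():
            key = path.read_text(encoding="utf-8").strip()
    if not key:
        raise RuntimeError("No NVIDIA API key found in env or key file")
    return key


def auth_headers(api_key):
    """Build the headers that every request needs (hypothetical helper)."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Checking the environment first keeps the file as a fallback for shells where the variable is not exported.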
curl -s https://integrate.api.nvidia.com/v1/models \
-H "Authorization: Bearer $NVIDIA_API_KEY" \
| jq '.data[].id'
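The call returns 130+ models (about 137 entries when duplicate versions are counted), distributed as follows:
The chat endpoint is fully compatible with the OpenAI API format: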
curl -s https://integrate.api.nvidia.com/v1/chat/completions \
-H "Authorization: Bearer $NVIDIA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "meta/llama-3.3-70b-instruct",
"messages": [
{"role": "system", "content": "You are a professional assistant"},
{"role": "user", "content": "Explain what an agent flywheel is"}
],
"max_tokens": 500,
"temperature": 0.3
}'
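For callers who prefer Python over curl, the request above can be sketched with the standard library alone. The function names `build_chat_request` and `send` are hypothetical, the defaults simply mirror the curl example, and the 60-second timeout is an assumption rather than a documented limit.

```python
import json
import urllib.request

BASE_URL = "https://integrate.api.nvidia.com/v1"


def build_chat_request(api_key, model, messages, max_tokens=500, temperature=0.3):
    """Build (but do not send) a POST request for /chat/completions."""
    body = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Separating construction from sending makes it possible to inspect or log the payload before any network call is made.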
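Parameters used in the request above:
| Parameter | Description |
|------|------|
| model | Model ID, as returned by /v1/models |
| messages | Conversation so far, as a list of role/content objects |
| max_tokens | Upper limit on the number of tokens generated |
| temperature | Sampling temperature; lower values give more deterministic output |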
Example response:
{
"id": "chatcmpl-xxx",
"model": "meta/llama-3.3-70b-instruct",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "An agent flywheel is..."
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 41,
"completion_tokens": 124,
"total_tokens": 165
}
}
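A response of this shape can be checked for truncation before it is used. The sketch below assumes the standard OpenAI convention that `finish_reason` is `"length"` when `max_tokens` cut the reply short. `summarize_choice` is an illustrative name, and the sample dict only mimics the example above.

```python
def summarize_choice(response):
    """Return (text, truncated, total_tokens) for the first choice."""
    choice = response["choices"][0]
    text = choice["message"].get("content") or ""
    # In the OpenAI format, "length" means the reply hit the max_tokens cap.
    truncated = choice.get("finish_reason") == "length"
    total_tokens = response.get("usage", {}).get("total_tokens", 0)
    return text, truncated, total_tokens


# Sample data shaped like the example response (values are illustrative).
sample = {
    "id": "chatcmpl-xxx",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "An agent flywheel is..."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 41, "completion_tokens": 124, "total_tokens": 165},
}

text, truncated, total = summarize_choice(sample)
```

When `truncated` is true, retrying with a larger `max_tokens` is usually the simplest remedy.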
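The response structure differs slightly between models:
General models (Qwen / Llama / Mistral, etc.):
message.content → the answer
message.reasoning_content → absent
Kimi K2 Thinking:
message.content → the final answer (sometimes empty)
message.reasoning / message.reasoning_content → the reasoning process (includes <think> tags)
A recommended generic way to parse responses: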
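import json

def parse_response(raw):
    """Return (text, total_tokens) from a raw JSON response string."""
    d = json.loads(raw)
    msg = d['choices'][0]['message']
    # Fall back to the reasoning fields when content is empty or missing
    content = (msg.get('content') or msg.get('reasoning_content')
               or msg.get('reasoning') or '')
    tokens = d.get('usage', {}).get('total_tokens', 0)
    return content, tokens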
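Because the reasoning text can carry `<think>` tags, it may help to keep the answer and the reasoning apart rather than merging them. The following is a sketch under stated assumptions: `split_reasoning` is a hypothetical helper, and the handling of `<think>` blocks inlined in `content` is a guess about model behavior, not something the API guarantees.

```python
import re

THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(message):
    """Return (answer, reasoning) from a chat message dict."""
    content = message.get("content") or ""
    reasoning = message.get("reasoning_content") or message.get("reasoning") or ""
    # Assumption: some models may inline <think>...</think> inside content.
    inline = THINK_BLOCK.findall(content)
    answer = THINK_BLOCK.sub("", content).strip()
    if not reasoning and inline:
        reasoning = "\n".join(inline)
    # Drop any stray opening or closing tags left in the reasoning text.
    reasoning = re.sub(r"</?think>", "", reasoning).strip()
    return answer, reasoning
```

An empty `answer` paired with non-empty `reasoning` signals the "content is sometimes empty" case, which callers can then handle explicitly.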
Quick model test:
nvidia_test() {
  local model="$1"
  local prompt="${2:-Say hello}"
  # Build the JSON body with jq so quotes in the prompt are escaped correctly
  local body
  body=$(jq -n --arg model "$model" --arg prompt "$prompt" \
    '{model: $model, messages: [{role: "user", content: $prompt}], max_tokens: 100}')
  curl -s https://integrate.api.nvidia.com/v1/chat/completions \
    -H "Authorization: Bearer $(cat ~/.config/nvidia/api-key)" \
    -H "Content-Type: application/json" \
    -d "$body" \
    | jq -r '.choices[0].message | (.content | select(. != null and . != "")) // .reasoning_content // .reasoning'
}
# Usage
nvidia_test "qwen/qwen3.5-122b-a10b" "Explain what MoE is in one sentence"
List available models (deduplicated):
curl -s https://integrate.api.nvidia.com/v1/models \
-H "Authorization: Bearer $(cat ~/.config/nvidia/api-key)" \
| jq -r '[.data[].id] | unique | .[]'
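The deduplicated list can also be grouped to get a rough per-publisher breakdown. This sketch assumes the `/v1/models` payload has the `data[].id` shape used by the jq filter above and that IDs follow a `publisher/name` pattern, as in `meta/llama-3.3-70b-instruct`. `group_models` is an illustrative name.

```python
from collections import defaultdict


def group_models(payload):
    """Deduplicate model IDs and group them by the prefix before '/'."""
    groups = defaultdict(list)
    unique_ids = sorted({entry["id"] for entry in payload.get("data", [])})
    for model_id in unique_ids:
        # IDs without a slash go into a catch-all bucket.
        publisher = model_id.split("/", 1)[0] if "/" in model_id else "(none)"
        groups[publisher].append(model_id)
    return dict(groups)


# Illustrative payload; a real one comes from GET /v1/models.
example_payload = {
    "data": [
        {"id": "meta/llama-3.3-70b-instruct"},
        {"id": "qwen/qwen3.5-122b-a10b"},
        {"id": "meta/llama-3.3-70b-instruct"},
    ]
}
grouped = group_models(example_payload)
```

Counting `len(ids)` per publisher then gives the distribution without listing every model.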
Generated by Hermes Agent M3