List Models | Venice API Docs

GET /api/v1/models

curl --request GET \
  --url https://api.venice.ai/api/v1/models \
  --header 'Authorization: Bearer <token>'

{
  "data": [
    {
      "created": 1727966436,
      "id": "llama-3.2-3b",
      "model_spec": {
        "availableContextTokens": 131072,
        "capabilities": {
          "optimizedForCode": false,
          "quantization": "fp16",
          "supportsAudioInput": false,
          "supportsFunctionCalling": true,
          "supportsLogProbs": true,
          "supportsMultipleImages": false,
          "supportsReasoning": false,
          "supportsReasoningEffort": false,
          "supportsResponseSchema": true,
          "supportsTeeAttestation": false,
          "supportsE2EE": false,
          "supportsVision": false,
          "supportsVideoInput": false,
          "supportsWebSearch": true,
          "supportsXSearch": false
        },
        "constraints": {
          "temperature": {
            "default": 0.8
          },
          "top_p": {
            "default": 0.9
          }
        },
        "description": "Compact and efficient model for quick responses and lighter workloads.",
        "name": "Llama 3.2 3B",
        "modelSource": "https://huggingface.co/meta-llama/Llama-3.2-3B",
        "offline": false,
        "privacy": "private",
        "pricing": {
          "input": {
            "usd": 0.15,
            "diem": 0.15
          },
          "output": {
            "usd": 0.6,
            "diem": 0.6
          }
        },
        "traits": [
          "fastest"
        ]
      },
      "object": "model",
      "owned_by": "venice.ai",
      "type": "text"
    }
  ],
  "object": "list",
  "type": "text"
}
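The response is plain JSON, so choosing a model by capability means walking `data[].model_spec`. The following is a minimal Python sketch that assumes the response has already been fetched and decoded. The helper `models_supporting` and the trimmed `SAMPLE` payload are illustrative and not part of the Venice API. Skipping models with `"offline": true` is also an assumption about how a client would want to treat them.

```python
import json
from typing import List

# Trimmed copy of the example response; only the fields used below are kept.
SAMPLE = json.loads("""
{
  "data": [
    {
      "id": "llama-3.2-3b",
      "type": "text",
      "model_spec": {
        "availableContextTokens": 131072,
        "capabilities": {
          "supportsFunctionCalling": true,
          "supportsVision": false
        },
        "offline": false
      }
    }
  ],
  "type": "text"
}
""")


def models_supporting(response: dict, capability: str, min_context: int = 0) -> List[str]:
    """Return ids of online models whose capability flag is true and whose
    context window is at least min_context tokens (hypothetical helper)."""
    ids = []
    for model in response.get("data", []):
        spec = model.get("model_spec", {})
        if spec.get("offline", False):
            continue  # assumption: offline models are not usable right now
        if not spec.get("capabilities", {}).get(capability, False):
            continue  # a missing flag is treated as unsupported
        if spec.get("availableContextTokens", 0) < min_context:
            continue
        ids.append(model["id"])
    return ids


print(models_supporting(SAMPLE, "supportsFunctionCalling"))  # ['llama-3.2-3b']
print(models_supporting(SAMPLE, "supportsVision"))           # []
```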

Quality-Tier Pricing

For image models that accept the optional quality parameter (currently gpt-image-2 and gpt-image-2-edit), the response exposes a per-quality price matrix under model_spec.pricing.quality. Each top-level key is a resolution tier (1K, 2K, 4K) and each nested key is a quality level (low, medium, high) carrying its own usd and diem price:

"pricing": {
  "resolutions": {
    "1K": { "usd": 0.27, "diem": 0.27 },
    "2K": { "usd": 0.51, "diem": 0.51 },
    "4K": { "usd": 0.84, "diem": 0.84 }
  },
  "quality": {
    "1K": {
      "low":    { "usd": 0.02, "diem": 0.02 },
      "medium": { "usd": 0.07, "diem": 0.07 },
      "high":   { "usd": 0.26, "diem": 0.26 }
    },
    "2K": {
      "low":    { "usd": 0.03, "diem": 0.03 },
      "medium": { "usd": 0.13, "diem": 0.13 },
      "high":   { "usd": 0.50, "diem": 0.50 }
    },
    "4K": {
      "low":    { "usd": 0.05, "diem": 0.05 },
      "medium": { "usd": 0.21, "diem": 0.21 },
      "high":   { "usd": 0.83, "diem": 0.83 }
    }
  }
}

pricing.resolutions is the legacy per-image price schedule, retained for backward compatibility. pricing.quality is the per-(resolution, quality) matrix that applies whenever the model supports the quality parameter. Both fields are returned, so clients can detect quality support and surface the matrix in their own UIs.
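Reading a price out of this structure is a two-level lookup. The following is a minimal Python sketch that assumes `pricing` is the decoded model_spec.pricing object. The helpers `supports_quality` and `image_price` are hypothetical. This section does not say which schedule applies when a quality-capable model is called without the quality parameter, so the fallback to pricing.resolutions is an assumption.

```python
from typing import Optional

# Pricing block copied from the gpt-image-2 example above.
PRICING = {
    "resolutions": {
        "1K": {"usd": 0.27, "diem": 0.27},
        "2K": {"usd": 0.51, "diem": 0.51},
        "4K": {"usd": 0.84, "diem": 0.84},
    },
    "quality": {
        "1K": {"low": {"usd": 0.02, "diem": 0.02},
               "medium": {"usd": 0.07, "diem": 0.07},
               "high": {"usd": 0.26, "diem": 0.26}},
        "2K": {"low": {"usd": 0.03, "diem": 0.03},
               "medium": {"usd": 0.13, "diem": 0.13},
               "high": {"usd": 0.50, "diem": 0.50}},
        "4K": {"low": {"usd": 0.05, "diem": 0.05},
               "medium": {"usd": 0.21, "diem": 0.21},
               "high": {"usd": 0.83, "diem": 0.83}},
    },
}


def supports_quality(pricing: dict) -> bool:
    """A model exposes per-quality prices only if pricing.quality is present."""
    return "quality" in pricing


def image_price(pricing: dict, resolution: str,
                quality: Optional[str] = None, currency: str = "usd") -> float:
    """Look up the per-image price for a resolution tier and optional quality."""
    if quality is not None:
        if not supports_quality(pricing):
            raise ValueError("model has no pricing.quality matrix")
        return pricing["quality"][resolution][quality][currency]
    # Assumption: with no quality requested, use the legacy per-image schedule.
    return pricing["resolutions"][resolution][currency]


print(image_price(PRICING, "2K", "high"))         # 0.5
print(image_price(PRICING, "4K", "low", "diem"))  # 0.05
print(image_price(PRICING, "1K"))                 # 0.27
```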

Postman Collection

For additional examples, please see this Postman Collection.

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
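The following is a minimal sketch of attaching this header with Python's standard `urllib.request`. It builds the request but does not send it. The function name `models_request` is hypothetical. To actually call the API, pass the result to `urllib.request.urlopen` and decode the body with `json.load`.

```python
import urllib.request

MODELS_URL = "https://api.venice.ai/api/v1/models"


def models_request(token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the Bearer header."""
    if not token:
        raise ValueError("an API token is required")
    return urllib.request.Request(
        MODELS_URL,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


req = models_request("example-token")
print(req.get_method(), req.full_url)   # GET https://api.venice.ai/api/v1/models
print(req.get_header("Authorization"))  # Bearer example-token
```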

Query Parameters

type

Filter models by type. Use "all" to get all model types.

Available options: asr, embedding, image, music, text, tts, upscale, inpaint, video

Example: "text"
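The type filter is an ordinary query-string parameter. The following is a minimal Python sketch that validates the value client-side before building the URL. The name `models_url` is hypothetical. The allowed set is the listed options plus "all", as the description above mentions. This section does not state the behavior when the parameter is omitted, so the sketch simply leaves it off in that case.

```python
from typing import Optional
from urllib.parse import urlencode

MODELS_URL = "https://api.venice.ai/api/v1/models"

# Values listed under "Available options", plus "all" as described in the text.
MODEL_TYPES = {"asr", "embedding", "image", "music", "text", "tts",
               "upscale", "inpaint", "video", "all"}


def models_url(model_type: Optional[str] = None) -> str:
    """Return the List Models URL, optionally filtered by model type."""
    if model_type is None:
        return MODELS_URL  # omit the parameter; the server's default applies
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model type: {model_type!r}")
    return f"{MODELS_URL}?{urlencode({'type': model_type})}"


print(models_url("image"))  # https://api.venice.ai/api/v1/models?type=image
```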

Response

data (object[], required)

List of available models.

object (enum<string>, required)

Available options: list

type (required)

Type of models returned.

Available options: asr, embedding, image, music, text, tts, upscale, inpaint, video

Example: "text"
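The schema marks data, object, and type as required and fixes object to "list". The following is a minimal Python sketch of a client-side sanity check on a decoded response. `envelope_problems` is a hypothetical helper, and it deliberately leaves the per-model fields unchecked.

```python
from typing import Any, Dict, List

REQUIRED_TOP_LEVEL = ("data", "object", "type")


def envelope_problems(body: Dict[str, Any]) -> List[str]:
    """Return the ways `body` departs from the documented response schema;
    an empty list means the envelope looks well-formed."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in body:
            problems.append(f"missing required field {key!r}")
    if "object" in body and body["object"] != "list":
        problems.append(f"object should be 'list', got {body['object']!r}")
    if "data" in body and not isinstance(body["data"], list):
        problems.append("data should be an array of model objects")
    return problems


ok = {"data": [], "object": "list", "type": "text"}
print(envelope_problems(ok))                            # []
print(envelope_problems({"data": [], "type": "text"}))  # ["missing required field 'object'"]
```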
