Interface ServedModel

interface ServedModel {
    burstScalingEnabled?: boolean;
    creationTimestamp?: bigint;
    creator?: string;
    entityName?: string;
    entityVersion?: string;
    environmentVars?: Record<string, string>;
    externalModel?: ExternalModel;
    foundationModel?: FoundationModel;
    instanceProfileArn?: string;
    maxProvisionedConcurrency?: number;
    maxProvisionedThroughput?: number;
    minProvisionedConcurrency?: number;
    minProvisionedThroughput?: number;
    modelName?: string;
    modelVersion?: string;
    name?: string;
    provisionedModelUnits?: bigint;
    scaleToZeroEnabled?: boolean;
    state?: ServedModelState;
    workloadSize?: string;
}

Properties

`Optional`burstScalingEnabled

burstScalingEnabled?: boolean

Whether burst scaling is enabled. When enabled (default), the endpoint can automatically scale up beyond provisioned capacity to handle traffic spikes. When disabled, the endpoint maintains fixed capacity at provisioned_model_units.

`Optional`creationTimestamp

creationTimestamp?: bigint

`Optional`creator

creator?: string

`Optional`entityName

entityName?: string

The name of the entity to be served. The entity may be a model in the Databricks Model Registry, a model in Unity Catalog (UC), or a function of type FEATURE_SPEC in UC. For a UC object, give the full name in the form catalog_name.schema_name.model_name.

`Optional`entityVersion

entityVersion?: string

`Optional`environmentVars

environmentVars?: Record<string, string>

Optional, user-specified environment variables, given as key-value pairs, that are used when serving this entity. Note: this is an experimental feature and subject to change. Example environment variables that refer to secrets: {"OPENAI_API_KEY": "{{secrets/my_scope/my_key}}", "DATABRICKS_TOKEN": "{{secrets/my_scope2/my_key2}}"}
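The secret-reference format in the example above can be built programmatically. The following is a minimal sketch; `secretRef` is a hypothetical helper that is not part of the SDK, and the scope and key names are the placeholders from the example.

```typescript
// Hypothetical helper (not part of the SDK): formats a secret reference
// in the {{secrets/<scope>/<key>}} form shown in the description above.
function secretRef(scope: string, key: string): string {
  return `{{secrets/${scope}/${key}}}`;
}

// Matches the shape of ServedModel.environmentVars (Record<string, string>).
const environmentVars: Record<string, string> = {
  OPENAI_API_KEY: secretRef("my_scope", "my_key"),
  DATABRICKS_TOKEN: secretRef("my_scope2", "my_key2"),
};
```

Building the references through one function keeps the brace syntax in a single place, which may help avoid typos when many variables are configured.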

`Optional`externalModel

externalModel?: ExternalModel

The external model to be served. NOTE: Specify either external_model or the set (entity_name, entity_version, workload_size, workload_type, and scale_to_zero_enabled), not both; the latter set is used for custom model serving of a registered model. An existing endpoint with external_model cannot be updated to an endpoint without external_model, and an endpoint created without external_model cannot have external_model added later. All external models within an endpoint must have the same task type.
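The mutual-exclusivity rule above could be checked on the client before a request is sent. This is a sketch under assumptions: `ServedModelSketch` is a hypothetical local subset of the interface (the real `ExternalModel` type has more fields), `findExternalModelConflicts` is not an SDK function, and workload_type is omitted because it is not a property of this interface.

```typescript
// Hypothetical local subset of ServedModel, declared here so the sketch is self-contained.
interface ServedModelSketch {
  externalModel?: { name?: string };
  entityName?: string;
  entityVersion?: string;
  workloadSize?: string;
  scaleToZeroEnabled?: boolean;
}

// Returns the names of fields that should not be set together with externalModel.
// An empty array means no conflict was found.
function findExternalModelConflicts(model: ServedModelSketch): string[] {
  if (model.externalModel === undefined) {
    return [];
  }
  const conflicts: string[] = [];
  if (model.entityName !== undefined) conflicts.push("entityName");
  if (model.entityVersion !== undefined) conflicts.push("entityVersion");
  if (model.workloadSize !== undefined) conflicts.push("workloadSize");
  if (model.scaleToZeroEnabled !== undefined) conflicts.push("scaleToZeroEnabled");
  return conflicts;
}
```

The service remains the authority on what is accepted; a check like this only surfaces an obviously mixed configuration earlier.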

`Optional`foundationModel

foundationModel?: FoundationModel

`Optional`instanceProfileArn

instanceProfileArn?: string

ARN of the instance profile that the served entity uses to access AWS resources.

`Optional`maxProvisionedConcurrency

maxProvisionedConcurrency?: number

The maximum provisioned concurrency that the endpoint can scale up to. Do not use if workload_size is specified.

`Optional`maxProvisionedThroughput

maxProvisionedThroughput?: number

The maximum tokens per second that the endpoint can scale up to.

`Optional`minProvisionedConcurrency

minProvisionedConcurrency?: number

The minimum provisioned concurrency that the endpoint can scale down to. Do not use if workload_size is specified.

`Optional`minProvisionedThroughput

minProvisionedThroughput?: number

The minimum tokens per second that the endpoint can scale down to.

`Optional`modelName

modelName?: string

`Optional`modelVersion

modelVersion?: string

`Optional`name

name?: string

The name of a served entity. It must be unique across an endpoint. A served entity name can consist of alphanumeric characters, dashes, and underscores. If not specified for an external model, this field defaults to external_model.name with '.' and ':' replaced by '-'. If not specified for other entities, it defaults to entity_name-entity_version.
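The defaulting rule described above can be illustrated client-side. This is only a sketch of the stated rule, not the service's implementation; `NamedEntitySketch` and `defaultServedName` are hypothetical names introduced for illustration.

```typescript
// Hypothetical local subset of the fields involved in the naming rule.
interface NamedEntitySketch {
  name?: string;
  externalModel?: { name: string };
  entityName?: string;
  entityVersion?: string;
}

// Mirrors the documented default: an explicit name wins; an external model uses its
// own name with '.' and ':' replaced by '-'; other entities use entityName-entityVersion.
function defaultServedName(entity: NamedEntitySketch): string | undefined {
  if (entity.name !== undefined) {
    return entity.name;
  }
  if (entity.externalModel !== undefined) {
    return entity.externalModel.name.replace(/[.:]/g, "-");
  }
  if (entity.entityName !== undefined && entity.entityVersion !== undefined) {
    return `${entity.entityName}-${entity.entityVersion}`;
  }
  return undefined; // Not enough information to derive a default.
}
```

For instance, an external model named `gpt-3.5:turbo` (an illustrative value) would yield `gpt-3-5-turbo` under this rule.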

`Optional`provisionedModelUnits

provisionedModelUnits?: bigint

The number of model units provisioned.

`Optional`scaleToZeroEnabled

scaleToZeroEnabled?: boolean

Whether the compute resources for the served entity should scale down to zero.

`Optional`state

state?: ServedModelState

`Optional`workloadSize

workloadSize?: string

The workload size of the served entity. The workload size corresponds to a range of provisioned concurrency that the compute autoscales between. A single unit of provisioned concurrency can process one request at a time. Valid workload sizes are "Small" (4 - 4 provisioned concurrency), "Medium" (8 - 16 provisioned concurrency), and "Large" (16 - 64 provisioned concurrency). Additional custom workload sizes can also be used when available in the workspace. If scale-to-zero is enabled, the lower bound of the provisioned concurrency for each workload size is 0. Do not use if min_provisioned_concurrency and max_provisioned_concurrency are specified.
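The mapping from workload size to provisioned concurrency can be expressed as a small lookup. This sketch encodes only the three sizes and ranges listed above; `concurrencyRange` and `ConcurrencyRange` are hypothetical names, and custom workload sizes are treated as unknown because their ranges are not documented here.

```typescript
interface ConcurrencyRange {
  min: number;
  max: number;
}

// Ranges taken from the workloadSize description above.
const WORKLOAD_RANGES = new Map<string, ConcurrencyRange>([
  ["Small", { min: 4, max: 4 }],
  ["Medium", { min: 8, max: 16 }],
  ["Large", { min: 16, max: 64 }],
]);

// Returns the autoscaling range for a built-in workload size, or undefined for
// sizes this sketch does not know (for example, workspace-specific custom sizes).
function concurrencyRange(
  workloadSize: string,
  scaleToZeroEnabled: boolean = false,
): ConcurrencyRange | undefined {
  const base = WORKLOAD_RANGES.get(workloadSize);
  if (base === undefined) {
    return undefined;
  }
  // With scale-to-zero enabled, the lower bound becomes 0.
  return { min: scaleToZeroEnabled ? 0 : base.min, max: base.max };
}
```

Because each unit of provisioned concurrency processes one request at a time, the `max` value is also the largest number of requests the entity handles simultaneously at that size.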
