Databricks SDK for JavaScript
    interface ServedModel {
        burstScalingEnabled?: boolean;
        creationTimestamp?: bigint;
        creator?: string;
        entityName?: string;
        entityVersion?: string;
        environmentVars?: Record<string, string>;
        externalModel?: ExternalModel;
        foundationModel?: FoundationModel;
        instanceProfileArn?: string;
        maxProvisionedConcurrency?: number;
        maxProvisionedThroughput?: number;
        minProvisionedConcurrency?: number;
        minProvisionedThroughput?: number;
        modelName?: string;
        modelVersion?: string;
        name?: string;
        provisionedModelUnits?: bigint;
        scaleToZeroEnabled?: boolean;
        state?: ServedModelState;
        workloadSize?: string;
    }

    Properties

    burstScalingEnabled?: boolean

    Whether burst scaling is enabled. When enabled (default), the endpoint can automatically scale up beyond provisioned capacity to handle traffic spikes. When disabled, the endpoint maintains fixed capacity at provisioned_model_units.

    creationTimestamp?: bigint
    creator?: string
    entityName?: string

    The name of the entity to be served. The entity may be a model in the Databricks Model Registry, a model in the Unity Catalog (UC), or a function of type FEATURE_SPEC in the UC. If it is a UC object, the full name of the object should be given in the form of catalog_name.schema_name.model_name.
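
    As a rough illustration of the three-level form described above, the sketch below splits a Unity Catalog full name into its parts. `parseUcName` and `UcObjectName` are hypothetical names introduced here, not part of the SDK, and the split is deliberately naive: it assumes no segment itself contains a dot.

```typescript
// Hypothetical helper, not an SDK function: splits a Unity Catalog full name
// of the form catalog_name.schema_name.model_name into its three parts.
interface UcObjectName {
  catalog: string;
  schema: string;
  model: string;
}

function parseUcName(entityName: string): UcObjectName | undefined {
  const parts = entityName.split(".");
  // Anything other than exactly three non-empty segments is treated as
  // "not a UC full name" (for example, a workspace Model Registry name).
  if (parts.length !== 3 || parts.some((p) => p.length === 0)) {
    return undefined;
  }
  return { catalog: parts[0], schema: parts[1], model: parts[2] };
}

const parsed = parseUcName("main.ml.churn_model");
console.log(parsed);
```

    A name that does not split into three parts returns `undefined`, which a caller could treat as a Model Registry name rather than an error.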

    entityVersion?: string
    environmentVars?: Record<string, string>

    An object containing a set of optional, user-specified environment variable key-value pairs used for serving this entity. Note: this is an experimental feature and subject to change. Example entity environment variables that refer to secrets: {"OPENAI_API_KEY": "{{secrets/my_scope/my_key}}", "DATABRICKS_TOKEN": "{{secrets/my_scope2/my_key2}}"}
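
    The secret-reference syntax in the example above can be generated rather than typed by hand. This is a minimal sketch; `secretRef` is a hypothetical helper introduced for illustration, and the scope and key names are the placeholders from the description.

```typescript
// Hypothetical helper, not an SDK function: builds a secret reference in the
// {{secrets/<scope>/<key>}} form shown in the description above.
function secretRef(scope: string, key: string): string {
  return `{{secrets/${scope}/${key}}}`;
}

// Matches the shape of ServedModel.environmentVars: Record<string, string>.
const environmentVars: Record<string, string> = {
  OPENAI_API_KEY: secretRef("my_scope", "my_key"),
  DATABRICKS_TOKEN: secretRef("my_scope2", "my_key2"),
};

console.log(environmentVars);
```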

    externalModel?: ExternalModel

    The external model to be served. NOTE: Specify either external_model or the custom-model fields (entity_name, entity_version, workload_size, workload_type, and scale_to_zero_enabled), not both; the latter set is used for custom model serving of a registered model. An existing endpoint with external_model cannot be updated to remove it, and an endpoint created without external_model cannot have one added later. All external models within an endpoint must have the same task type.
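
    The either/or rule above can be checked on the client before a request is sent. The sketch below is one possible pre-flight check, not SDK behavior: `ServedModelSubset` is a local stand-in type, `checkExternalModelExclusivity` is a hypothetical name, and workload_type is left out because this interface does not declare it. The service remains the authority on what it accepts.

```typescript
// Local stand-in for the relevant subset of ServedModel; not imported from the SDK.
interface ServedModelSubset {
  externalModel?: object;
  entityName?: string;
  entityVersion?: string;
  workloadSize?: string;
  scaleToZeroEnabled?: boolean;
}

// Returns a list of problems; an empty list means the two groups are not mixed.
function checkExternalModelExclusivity(m: ServedModelSubset): string[] {
  const customFields: Array<keyof ServedModelSubset> = [
    "entityName",
    "entityVersion",
    "workloadSize",
    "scaleToZeroEnabled",
  ];
  // Collect every custom-model field that has been set.
  const present = customFields.filter((f) => m[f] !== undefined);
  if (m.externalModel !== undefined && present.length > 0) {
    return [`externalModel cannot be combined with: ${present.join(", ")}`];
  }
  return [];
}

console.log(checkExternalModelExclusivity({ externalModel: {}, entityName: "churn" }));
```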

    foundationModel?: FoundationModel
    instanceProfileArn?: string

    ARN of the instance profile that the served entity uses to access AWS resources.

    maxProvisionedConcurrency?: number

    The maximum provisioned concurrency that the endpoint can scale up to. Do not use if workload_size is specified.

    maxProvisionedThroughput?: number

    The maximum tokens per second that the endpoint can scale up to.

    minProvisionedConcurrency?: number

    The minimum provisioned concurrency that the endpoint can scale down to. Do not use if workload_size is specified.
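
    Because explicit concurrency bounds and workload_size are documented as alternatives, a client can flag the combination early. This is a sketch under assumptions: `ConcurrencySettings` and `concurrencyConflicts` are hypothetical names, and the min-not-greater-than-max check is an assumed sanity rule that the description does not state.

```typescript
// Local stand-in for the relevant subset of ServedModel; not imported from the SDK.
interface ConcurrencySettings {
  workloadSize?: string;
  minProvisionedConcurrency?: number;
  maxProvisionedConcurrency?: number;
}

// Returns a list of problems; an empty list means no conflict was found.
function concurrencyConflicts(s: ConcurrencySettings): string[] {
  const problems: string[] = [];
  const min = s.minProvisionedConcurrency;
  const max = s.maxProvisionedConcurrency;
  // Documented rule: do not combine explicit bounds with workloadSize.
  if (s.workloadSize !== undefined && (min !== undefined || max !== undefined)) {
    problems.push("workloadSize cannot be combined with min/maxProvisionedConcurrency");
  }
  // Assumed sanity check (not stated in the description).
  if (min !== undefined && max !== undefined && min > max) {
    problems.push("minProvisionedConcurrency exceeds maxProvisionedConcurrency");
  }
  return problems;
}

console.log(concurrencyConflicts({ workloadSize: "Small", minProvisionedConcurrency: 4 }));
```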

    minProvisionedThroughput?: number

    The minimum tokens per second that the endpoint can scale down to.

    modelName?: string
    modelVersion?: string
    name?: string

    The name of a served entity. It must be unique within an endpoint. A served entity name can consist of alphanumeric characters, dashes, and underscores. If not specified for an external model, this field defaults to external_model.name with '.' and ':' replaced by '-'. For other entities, it defaults to entity_name-entity_version.
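
    The naming rules above can be sketched as two small helpers. Both `isValidServedEntityName` and `defaultServedEntityName` are hypothetical names, and the default is derived by following the description literally; the service may treat some inputs (for example, Unity Catalog names that contain dots) differently.

```typescript
// Allowed characters per the description: alphanumerics, dashes, underscores.
const SERVED_ENTITY_NAME_PATTERN = /^[A-Za-z0-9_-]+$/;

// Hypothetical helper, not an SDK function.
function isValidServedEntityName(name: string): boolean {
  return SERVED_ENTITY_NAME_PATTERN.test(name);
}

// Hypothetical helper mirroring the documented defaults.
function defaultServedEntityName(opts: {
  externalModelName?: string;
  entityName?: string;
  entityVersion?: string;
}): string | undefined {
  if (opts.externalModelName !== undefined) {
    // External models: '.' and ':' are replaced with '-'.
    return opts.externalModelName.replace(/[.:]/g, "-");
  }
  if (opts.entityName !== undefined && opts.entityVersion !== undefined) {
    // Other entities: entity_name-entity_version.
    return `${opts.entityName}-${opts.entityVersion}`;
  }
  return undefined;
}

console.log(defaultServedEntityName({ externalModelName: "gpt-4.1:latest" }));
console.log(defaultServedEntityName({ entityName: "churn", entityVersion: "3" }));
```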

    provisionedModelUnits?: bigint

    The number of model units provisioned.

    scaleToZeroEnabled?: boolean

    Whether the compute resources for the served entity should scale down to zero.

    workloadSize?: string

    The workload size of the served entity. The workload size corresponds to a range of provisioned concurrency that the compute autoscales between. A single unit of provisioned concurrency can process one request at a time. Valid workload sizes are "Small" (4 - 4 provisioned concurrency), "Medium" (8 - 16 provisioned concurrency), and "Large" (16 - 64 provisioned concurrency). Additional custom workload sizes can also be used when available in the workspace. If scale-to-zero is enabled, the lower bound of the provisioned concurrency for each workload size is 0. Do not use if min_provisioned_concurrency and max_provisioned_concurrency are specified.
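
    The size-to-range mapping above can be written as a lookup table. This is a minimal sketch using only the three sizes named in the description; `concurrencyRange` and `ConcurrencyRange` are hypothetical names, and custom workload sizes are not modelled because their ranges are workspace-specific.

```typescript
interface ConcurrencyRange {
  min: number;
  max: number;
}

// Ranges copied from the description above; custom sizes are not listed.
const WORKLOAD_RANGES = new Map<string, ConcurrencyRange>([
  ["Small", { min: 4, max: 4 }],
  ["Medium", { min: 8, max: 16 }],
  ["Large", { min: 16, max: 64 }],
]);

// Hypothetical helper, not an SDK function.
function concurrencyRange(
  workloadSize: string,
  scaleToZeroEnabled = false,
): ConcurrencyRange | undefined {
  const range = WORKLOAD_RANGES.get(workloadSize);
  if (range === undefined) {
    return undefined; // unknown or custom size
  }
  // With scale-to-zero enabled, the lower bound becomes 0.
  return { min: scaleToZeroEnabled ? 0 : range.min, max: range.max };
}

console.log(concurrencyRange("Medium"));
console.log(concurrencyRange("Large", true));
```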