Jobs

Package: databricks.bundles.jobs

Classes

class Adlsgen2Info
destination: str

abfss destination, e.g. abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<directory-name>.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class AutoScale
max_workers: int | None = None

The maximum number of workers to which the cluster can scale up when overloaded. Note that max_workers must be strictly greater than min_workers.

min_workers: int | None = None

The minimum number of workers to which the cluster can scale down when underutilized. It is also the initial number of workers the cluster will have after creation.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
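
For illustration, a minimal sketch of building an autoscaling range from a plain dict via the documented from_dict classmethod; the dict keys are assumed to mirror the field names above, and the worker counts are arbitrary.

    from databricks.bundles.jobs import AutoScale

    # Scale between 1 and 8 workers; max_workers must be strictly greater than min_workers.
    autoscale = AutoScale.from_dict({"min_workers": 1, "max_workers": 8})
    print(autoscale.max_workers)  # 8
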
class AwsAttributes
availability: AwsAvailability | None = None
ebs_volume_count: int | None = None

The number of volumes launched for each instance. Users can choose up to 10 volumes. This feature is only enabled for supported node types. Legacy node types cannot specify custom EBS volumes. For node types with no instance store, at least one EBS volume needs to be specified; otherwise, cluster creation will fail.

These EBS volumes will be mounted at /ebs0, /ebs1, etc. Instance store volumes will be mounted at /local_disk0, /local_disk1, etc.

If EBS volumes are attached, Databricks will configure Spark to use only the EBS volumes for scratch storage because heterogeneously sized scratch devices can lead to inefficient disk utilization. If no EBS volumes are attached, Databricks will configure Spark to use instance store volumes.

Please note that if EBS volumes are specified, then the Spark configuration spark.local.dir will be overridden.

ebs_volume_iops: int | None = None

If using gp3 volumes, what IOPS to use for the disk. If this is not set, the maximum performance of a gp2 volume with the same volume size will be used.

ebs_volume_size: int | None = None

The size of each EBS volume (in GiB) launched for each instance. For general purpose SSD, this value must be within the range 100 - 4096. For throughput optimized HDD, this value must be within the range 500 - 4096.

ebs_volume_throughput: int | None = None

If using gp3 volumes, what throughput to use for the disk. If this is not set, the maximum performance of a gp2 volume with the same volume size will be used.

ebs_volume_type: EbsVolumeType | None = None
first_on_demand: int | None = None

The first first_on_demand nodes of the cluster will be placed on on-demand instances. If this value is greater than 0, the cluster driver node in particular will be placed on an on-demand instance. If this value is greater than or equal to the current cluster size, all nodes will be placed on on-demand instances. If this value is less than the current cluster size, first_on_demand nodes will be placed on on-demand instances and the remainder will be placed on availability instances. Note that this value does not affect cluster size and cannot currently be mutated over the lifetime of a cluster.

instance_profile_arn: str | None = None

Nodes for this cluster will only be placed on AWS instances with this instance profile. If omitted, nodes will be placed on instances without an IAM instance profile. The instance profile must have previously been added to the Databricks environment by an account administrator.

This feature may only be available to certain customer plans.

If this field is omitted, we will pull in the default from the conf if it exists.

spot_bid_price_percent: int | None = None

The bid price for AWS spot instances, as a percentage of the corresponding instance type’s on-demand price. For example, if this field is set to 50, and the cluster needs a new r3.xlarge spot instance, then the bid price is half of the price of on-demand r3.xlarge instances. Similarly, if this field is set to 200, the bid price is twice the price of on-demand r3.xlarge instances. If not specified, the default value is 100. When spot instances are requested for this cluster, only spot instances whose bid price percentage matches this field will be considered. Note that, for safety, we enforce this field to be no more than 10000.

The default value and documentation here should be kept consistent with CommonConf.defaultSpotBidPricePercent and CommonConf.maxSpotBidPricePercent.

zone_id: str | None = None

Identifier for the availability zone/datacenter in which the cluster resides. This string will be of a form like “us-west-2a”. The provided availability zone must be in the same region as the Databricks deployment. For example, “us-west-2a” is not a valid zone id if the Databricks deployment resides in the “us-east-1” region. This is an optional field at cluster creation, and if not specified, a default zone will be used. If the zone specified is “auto”, will try to place cluster in a zone with high availability, and will retry placement in a different AZ if there is not enough capacity. The list of available zones as well as the default value can be found by using the List Zones method.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class AwsAvailability

Availability type used for all subsequent nodes past the first_on_demand ones.

Note: If first_on_demand is zero, this availability type will be used for the entire cluster.

SPOT = 'SPOT'
ON_DEMAND = 'ON_DEMAND'
SPOT_WITH_FALLBACK = 'SPOT_WITH_FALLBACK'
class AzureAttributes
availability: AzureAvailability | None = None
first_on_demand: int | None = None

The first first_on_demand nodes of the cluster will be placed on on-demand instances. This value should be greater than 0, to make sure the cluster driver node is placed on an on-demand instance. If this value is greater than or equal to the current cluster size, all nodes will be placed on on-demand instances. If this value is less than the current cluster size, first_on_demand nodes will be placed on on-demand instances and the remainder will be placed on availability instances. Note that this value does not affect cluster size and cannot currently be mutated over the lifetime of a cluster.

log_analytics_info: LogAnalyticsInfo | None = None

Defines values necessary to configure and run Azure Log Analytics agent

spot_bid_max_price: float | None = None

The max bid price to be used for Azure spot instances. The max price for the bid cannot be higher than the on-demand price of the instance. If not specified, the default value is -1, which specifies that the instance cannot be evicted on the basis of price, and only on the basis of availability. Further, the value should be greater than 0 or be exactly -1.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class AzureAvailability

Availability type used for all subsequent nodes past the first_on_demand ones. Note: If first_on_demand is zero (which only happens on pool clusters), this availability type will be used for the entire cluster.

SPOT_AZURE = 'SPOT_AZURE'
ON_DEMAND_AZURE = 'ON_DEMAND_AZURE'
SPOT_WITH_FALLBACK_AZURE = 'SPOT_WITH_FALLBACK_AZURE'
class CleanRoomsNotebookTask
clean_room_name: str

The clean room that the notebook belongs to.

notebook_name: str

Name of the notebook being run.

etag: str | None = None

Checksum to validate the freshness of the notebook resource (i.e. the notebook being run is the latest version). It can be fetched by calling the cleanroomassets/get API.

notebook_base_parameters: dict[str, str]

Base parameters to be used for the clean room notebook job.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class ClientsTypes
jobs: bool | None = None

With jobs set, the cluster can be used for jobs

notebooks: bool | None = None

With notebooks set, this cluster can be used for notebooks

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class ClusterLogConf
dbfs: DbfsStorageInfo | None = None

destination needs to be provided. e.g. { "dbfs" : { "destination" : "dbfs:/home/cluster_log" } }

s3: S3StorageInfo | None = None

destination and either the region or endpoint need to be provided. e.g. { "s3": { "destination" : "s3://cluster_log_bucket/prefix", "region" : "us-west-2" } } The cluster IAM role is used to access S3; make sure the cluster IAM role referenced in instance_profile_arn has permission to write data to the S3 destination.

volumes: VolumesStorageInfo | None = None

destination needs to be provided. e.g. { "volumes" : { "destination" : "/Volumes/catalog/schema/volume/cluster_log" } }

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
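
As an illustration, a sketch of a log configuration that delivers cluster logs to a Unity Catalog volume, assuming from_dict accepts nested fields as plain dicts; the volume path is a placeholder.

    from databricks.bundles.jobs import ClusterLogConf

    # Deliver cluster logs to a Unity Catalog volume (delivered every 5 minutes).
    log_conf = ClusterLogConf.from_dict(
        {"volumes": {"destination": "/Volumes/catalog/schema/volume/cluster_log"}}
    )
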
class ClusterSpec
apply_policy_default_values: bool | None = None

When set to true, fixed and default values from the policy will be used for fields that are omitted. When set to false, only fixed values from the policy will be applied.

autoscale: AutoScale | None = None

Parameters needed in order to automatically scale clusters up and down based on load. Note: autoscaling works best with DB runtime versions 3.0 or later.

autotermination_minutes: int | None = None

Automatically terminates the cluster after it is inactive for this time in minutes. If not set, this cluster will not be automatically terminated. If specified, the threshold must be between 10 and 10000 minutes. Users can also set this value to 0 to explicitly disable automatic termination.

aws_attributes: AwsAttributes | None = None

Attributes related to clusters running on Amazon Web Services. If not specified at cluster creation, a set of default values will be used.

azure_attributes: AzureAttributes | None = None

Attributes related to clusters running on Microsoft Azure. If not specified at cluster creation, a set of default values will be used.

cluster_log_conf: ClusterLogConf | None = None

The configuration for delivering spark logs to a long-term storage destination. Three kinds of destinations (DBFS, S3 and Unity Catalog volumes) are supported. Only one destination can be specified for one cluster. If the conf is given, the logs will be delivered to the destination every 5 mins. The destination of driver logs is $destination/$clusterId/driver, while the destination of executor logs is $destination/$clusterId/executor.

cluster_name: str | None = None

Cluster name requested by the user. This doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.

custom_tags: dict[str, str]

Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS instances and EBS volumes) with these tags in addition to default_tags. Notes:

  • Currently, Databricks allows at most 45 custom tags

  • Clusters can only reuse cloud resources if the resources’ tags are a subset of the cluster tags

data_security_mode: DataSecurityMode | None = None
docker_image: DockerImage | None = None
driver_instance_pool_id: str | None = None

The optional ID of the instance pool to which the cluster driver belongs. If the driver pool is not assigned, the driver uses the instance pool specified by instance_pool_id.

driver_node_type_id: str | None = None

The node type of the Spark driver. Note that this field is optional; if unset, the driver node type will be set as the same value as node_type_id defined above.

enable_elastic_disk: bool | None = None

Autoscaling Local Storage: when enabled, this cluster will dynamically acquire additional disk space when its Spark workers are running low on disk space. This feature requires specific AWS permissions to function correctly - refer to the User Guide for more details.

enable_local_disk_encryption: bool | None = None

Whether to enable LUKS on cluster VMs’ local disks

gcp_attributes: GcpAttributes | None = None

Attributes related to clusters running on Google Cloud Platform. If not specified at cluster creation, a set of default values will be used.

init_scripts: list[InitScriptInfo]

The configuration for storing init scripts. Any number of destinations can be specified. The scripts are executed sequentially in the order provided. If cluster_log_conf is specified, init script logs are sent to <destination>/<cluster-ID>/init_scripts.

instance_pool_id: str | None = None

The optional ID of the instance pool to which the cluster belongs.

is_single_node: bool | None = None

This field can only be used when kind = CLASSIC_PREVIEW.

When set to true, Databricks will automatically set single node related custom_tags, spark_conf, and num_workers

node_type_id: str | None = None

This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory or compute intensive workloads. A list of available node types can be retrieved by using the clusters/listNodeTypes API call.

num_workers: int | None = None

Number of worker nodes that this cluster should have. A cluster has one Spark Driver and num_workers Executors for a total of num_workers + 1 Spark nodes.

Note: When reading the properties of a cluster, this field reflects the desired number of workers rather than the actual current number of workers. For instance, if a cluster is resized from 5 to 10 workers, this field will immediately be updated to reflect the target size of 10 workers, whereas the workers listed in spark_info will gradually increase from 5 to 10 as the new nodes are provisioned.

policy_id: str | None = None

The ID of the cluster policy used to create the cluster if applicable.

runtime_engine: RuntimeEngine | None = None
single_user_name: str | None = None

Single user name if data_security_mode is SINGLE_USER

spark_conf: dict[str, str]

An object containing a set of optional, user-specified Spark configuration key-value pairs. Users can also pass in a string of extra JVM options to the driver and the executors via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions respectively.

spark_env_vars: dict[str, str]

An object containing a set of optional, user-specified environment variable key-value pairs. Please note that a key-value pair of the form (X,Y) will be exported as is (i.e., export X='Y') while launching the driver and workers.

In order to specify an additional set of SPARK_DAEMON_JAVA_OPTS, we recommend appending them to $SPARK_DAEMON_JAVA_OPTS as shown in the example below. This ensures that all default Databricks-managed environment variables are included as well.

Example Spark environment variables: {"SPARK_WORKER_MEMORY": "28000m", "SPARK_LOCAL_DIRS": "/local_disk0"} or {"SPARK_DAEMON_JAVA_OPTS": "$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true"}

spark_version: str | None = None

The Spark version of the cluster, e.g. 3.3.x-scala2.11. A list of available Spark versions can be retrieved by using the clusters/sparkVersions API call.

ssh_public_keys: list[str]

SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to login with the user name ubuntu on port 2200. Up to 10 keys can be specified.

use_ml_runtime: bool | None = None

This field can only be used when kind = CLASSIC_PREVIEW.

effective_spark_version is determined by spark_version (DBR release), this field use_ml_runtime, and whether node_type_id is a GPU node or not.

workload_type: WorkloadType | None = None
classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
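
For illustration, a minimal sketch of a cluster spec suitable for a job cluster, assuming nested fields can be passed as plain dicts to from_dict; the Spark version, node type, tag, and Spark conf values are placeholders, not recommendations.

    from databricks.bundles.jobs import ClusterSpec

    cluster = ClusterSpec.from_dict({
        "spark_version": "15.4.x-scala2.12",   # placeholder DBR version
        "node_type_id": "i3.xlarge",           # placeholder node type
        "autoscale": {"min_workers": 1, "max_workers": 4},
        "custom_tags": {"team": "data-eng"},
        "spark_conf": {"spark.sql.shuffle.partitions": "200"},
    })
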
class ConditionTask
left: str

The left operand of the condition task. Can be either a string value or a job state or parameter reference.

op: ConditionTaskOp
  • EQUAL_TO, NOT_EQUAL operators perform string comparison of their operands. This means that “12.0” == “12” will evaluate to false.

  • GREATER_THAN, GREATER_THAN_OR_EQUAL, LESS_THAN, LESS_THAN_OR_EQUAL operators perform numeric comparison of their operands. “12.0” >= “12” will evaluate to true, “10.0” >= “12” will evaluate to false.

The boolean comparison to task values can be implemented with operators EQUAL_TO, NOT_EQUAL. If a task value was set to a boolean value, it will be serialized to “true” or “false” for the comparison.

right: str

The right operand of the condition task. Can be either a string value or a job state or parameter reference.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
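
As a sketch, a condition that passes only when a job parameter is non-zero; the parameter name and the {{...}} reference syntax are illustrative, and the op string is assumed to accept one of the ConditionTaskOp values listed below.

    from databricks.bundles.jobs import ConditionTask

    # Numeric comparison: continue only if row_count > 0.
    condition = ConditionTask.from_dict({
        "left": "{{job.parameters.row_count}}",
        "op": "GREATER_THAN",
        "right": "0",
    })
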
class ConditionTaskOp
  • EQUAL_TO, NOT_EQUAL operators perform string comparison of their operands. This means that “12.0” == “12” will evaluate to false.

  • GREATER_THAN, GREATER_THAN_OR_EQUAL, LESS_THAN, LESS_THAN_OR_EQUAL operators perform numeric comparison of their operands. “12.0” >= “12” will evaluate to true, “10.0” >= “12” will evaluate to false.

The boolean comparison to task values can be implemented with operators EQUAL_TO, NOT_EQUAL. If a task value was set to a boolean value, it will be serialized to “true” or “false” for the comparison.

EQUAL_TO = 'EQUAL_TO'
GREATER_THAN = 'GREATER_THAN'
GREATER_THAN_OR_EQUAL = 'GREATER_THAN_OR_EQUAL'
LESS_THAN = 'LESS_THAN'
LESS_THAN_OR_EQUAL = 'LESS_THAN_OR_EQUAL'
NOT_EQUAL = 'NOT_EQUAL'
class Continuous
pause_status: PauseStatus | None = None

Indicate whether the continuous execution of the job is paused or not. Defaults to UNPAUSED.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class CronSchedule
quartz_cron_expression: str

A Cron expression using Quartz syntax that describes the schedule for a job. See Cron Trigger for details. This field is required.

timezone_id: str

A Java timezone ID. The schedule for a job is resolved with respect to this timezone. See Java TimeZone for details. This field is required.

pause_status: PauseStatus | None = None

Indicate whether this schedule is paused or not.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
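
For example, a schedule that runs every day at 02:30 UTC; both required fields are shown, and the optional pause_status string is assumed to accept the PauseStatus values listed later in this module.

    from databricks.bundles.jobs import CronSchedule

    # Quartz fields: second, minute, hour, day-of-month, month, day-of-week.
    schedule = CronSchedule.from_dict({
        "quartz_cron_expression": "0 30 2 * * ?",
        "timezone_id": "UTC",
        "pause_status": "UNPAUSED",
    })
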
class DataSecurityMode

Data security mode decides what data governance model to use when accessing data from a cluster.

The following modes can only be used when kind = CLASSIC_PREVIEW. * DATA_SECURITY_MODE_AUTO: Databricks will choose the most appropriate access mode depending on your compute configuration. * DATA_SECURITY_MODE_STANDARD: Alias for USER_ISOLATION. * DATA_SECURITY_MODE_DEDICATED: Alias for SINGLE_USER.

The following modes can be used regardless of kind. * NONE: No security isolation for multiple users sharing the cluster. Data governance features are not available in this mode. * SINGLE_USER: A secure cluster that can only be exclusively used by a single user specified in single_user_name. Most programming languages, cluster features and data governance features are available in this mode. * USER_ISOLATION: A secure cluster that can be shared by multiple users. Cluster users are fully isolated so that they cannot see each other’s data and credentials. Most data governance features are supported in this mode. But programming languages and cluster features might be limited.

The following modes are deprecated starting with Databricks Runtime 15.0 and will be removed for future Databricks Runtime versions:

  • LEGACY_TABLE_ACL: This mode is for users migrating from legacy Table ACL clusters.

  • LEGACY_PASSTHROUGH: This mode is for users migrating from legacy Passthrough on high concurrency clusters.

  • LEGACY_SINGLE_USER: This mode is for users migrating from legacy Passthrough on standard clusters.

  • LEGACY_SINGLE_USER_STANDARD: This mode is for clusters that have neither Unity Catalog nor passthrough enabled.

DATA_SECURITY_MODE_AUTO = 'DATA_SECURITY_MODE_AUTO'
DATA_SECURITY_MODE_STANDARD = 'DATA_SECURITY_MODE_STANDARD'
DATA_SECURITY_MODE_DEDICATED = 'DATA_SECURITY_MODE_DEDICATED'
NONE = 'NONE'
SINGLE_USER = 'SINGLE_USER'
USER_ISOLATION = 'USER_ISOLATION'
LEGACY_TABLE_ACL = 'LEGACY_TABLE_ACL'
LEGACY_PASSTHROUGH = 'LEGACY_PASSTHROUGH'
LEGACY_SINGLE_USER = 'LEGACY_SINGLE_USER'
LEGACY_SINGLE_USER_STANDARD = 'LEGACY_SINGLE_USER_STANDARD'
class DbfsStorageInfo
destination: str

dbfs destination, e.g. dbfs:/my/path

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class DbtTask
catalog: str | None = None

Optional name of the catalog to use. The value is the top level in the 3-level namespace of Unity Catalog (catalog / schema / relation). The catalog value can only be specified if a warehouse_id is specified. Requires dbt-databricks >= 1.1.1.

commands: list[str]

A list of dbt commands to execute. All commands must start with dbt. This parameter must not be empty. A maximum of 10 commands can be provided.

profiles_directory: str | None = None

Optional (relative) path to the profiles directory. Can only be specified if no warehouse_id is specified. If no warehouse_id is specified and this folder is unset, the root directory is used.

project_directory: str | None = None

Path to the project directory. Optional for Git sourced tasks, in which case if no value is provided, the root of the Git repository is used.

schema: str | None = None

Optional schema to write to. This parameter is only used when a warehouse_id is also provided. If not provided, the default schema is used.

source: Source | None = None

Optional location type of the project directory. When set to WORKSPACE, the project will be retrieved from the local Databricks workspace. When set to GIT, the project will be retrieved from a Git repository defined in git_source. If the value is empty, the task will use GIT if git_source is defined and WORKSPACE otherwise.

  • WORKSPACE: Project is located in Databricks workspace.

  • GIT: Project is located in cloud Git provider.

warehouse_id: str | None = None

ID of the SQL warehouse to connect to. If provided, we automatically generate and provide the profile and connection details to dbt. It can be overridden on a per-command basis by using the --profiles-dir command line argument.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class DockerBasicAuth
password: str | None = None

Password of the user

username: str | None = None

Name of the user

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class DockerImage
basic_auth: DockerBasicAuth | None = None
url: str | None = None

URL of the docker image.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class EbsVolumeType

The type of EBS volumes that will be launched with this cluster.

GENERAL_PURPOSE_SSD = 'GENERAL_PURPOSE_SSD'
THROUGHPUT_OPTIMIZED_HDD = 'THROUGHPUT_OPTIMIZED_HDD'
class Environment

The environment entity used to preserve the serverless environment side panel and the jobs' environment for non-notebook tasks. In this minimal environment spec, only pip dependencies are supported.

client: str

Client version used by the environment. The client is the user-facing environment of the runtime. Each client comes with a specific set of pre-installed libraries. The version is a string, consisting of the major client version.

dependencies: list[str]

List of pip dependencies, as supported by the version of pip in this environment.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class FileArrivalTriggerConfiguration
url: str

URL to be monitored for file arrivals. The path must point to the root or a subpath of the external location.

min_time_between_triggers_seconds: int | None = None

If set, the trigger starts a run only after the specified amount of time passed since the last time the trigger fired. The minimum allowed value is 60 seconds

wait_after_last_change_seconds: int | None = None

If set, the trigger starts a run only after no file activity has occurred for the specified amount of time. This makes it possible to wait for a batch of incoming files to arrive before triggering a run. The minimum allowed value is 60 seconds.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class ForEachTask
inputs: str

Array for task to iterate on. This can be a JSON string or a reference to an array parameter.

task: Task

Configuration for the task that will be run for each element in the array

concurrency: int | None = None

An optional maximum allowed number of concurrent runs of the task. Set this value if you want to be able to execute multiple runs of the task concurrently.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
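
A sketch of fanning a nested notebook task out over a small JSON array, at most two iterations at a time; the task key, notebook path, and the {{input}} parameter reference are illustrative, and nested task fields are assumed to be accepted as plain dicts.

    from databricks.bundles.jobs import ForEachTask

    fan_out = ForEachTask.from_dict({
        "inputs": '["us", "eu", "apac"]',   # JSON string to iterate over
        "concurrency": 2,
        "task": {
            "task_key": "process_region",
            "notebook_task": {
                "notebook_path": "/Workspace/etl/process_region",
                "base_parameters": {"region": "{{input}}"},
            },
        },
    })
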
class GcpAttributes
availability: GcpAvailability | None = None
boot_disk_size: int | None = None

boot disk size in GB

google_service_account: str | None = None

If provided, the cluster will impersonate the google service account when accessing gcloud services (like GCS). The google service account must have previously been added to the Databricks environment by an account administrator.

local_ssd_count: int | None = None

If provided, each node (workers and driver) in the cluster will have this number of local SSDs attached. Each local SSD is 375GB in size. Refer to GCP documentation for the supported number of local SSDs for each instance type.

use_preemptible_executors: bool | None = None

This field determines whether the spark executors will be scheduled to run on preemptible VMs (when set to true) versus standard compute engine VMs (when set to false; default). Note: Soon to be deprecated, use the availability field instead.

zone_id: str | None = None

Identifier for the availability zone in which the cluster resides. This can be one of the following:

  • "HA" => High availability; spread nodes across availability zones for a Databricks deployment region [default]

  • "AUTO" => Databricks picks an availability zone to schedule the cluster on.

  • A GCP availability zone => Pick one of the available zones for (machine type + region) from https://cloud.google.com/compute/docs/regions-zones.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class GcpAvailability

This field determines whether the instance pool will contain preemptible VMs, on-demand VMs, or preemptible VMs with a fallback to on-demand VMs if the former is unavailable.

PREEMPTIBLE_GCP = 'PREEMPTIBLE_GCP'
ON_DEMAND_GCP = 'ON_DEMAND_GCP'
PREEMPTIBLE_WITH_FALLBACK_GCP = 'PREEMPTIBLE_WITH_FALLBACK_GCP'
class GcsStorageInfo
destination: str

GCS destination/URI, e.g. gs://my-bucket/some-prefix

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class GitProvider
GIT_HUB = 'gitHub'
BITBUCKET_CLOUD = 'bitbucketCloud'
AZURE_DEV_OPS_SERVICES = 'azureDevOpsServices'
GIT_HUB_ENTERPRISE = 'gitHubEnterprise'
BITBUCKET_SERVER = 'bitbucketServer'
GIT_LAB = 'gitLab'
GIT_LAB_ENTERPRISE_EDITION = 'gitLabEnterpriseEdition'
AWS_CODE_COMMIT = 'awsCodeCommit'
class GitSource

An optional specification for a remote Git repository containing the source code used by tasks. Version-controlled source code is supported by notebook, dbt, Python script, and SQL File tasks.

If git_source is set, these tasks retrieve the file from the remote repository by default. However, this behavior can be overridden by setting source to WORKSPACE on the task.

Note: dbt and SQL File tasks support only version-controlled sources. If dbt or SQL File tasks are used, git_source must be defined on the job.

git_provider: GitProvider

Unique identifier of the service used to host the Git repository. The value is case insensitive.

git_url: str

URL of the repository to be cloned by this job.

git_branch: str | None = None

Name of the branch to be checked out and used by this job. This field cannot be specified in conjunction with git_tag or git_commit.

git_commit: str | None = None

Commit to be checked out and used by this job. This field cannot be specified in conjunction with git_branch or git_tag.

git_tag: str | None = None

Name of the tag to be checked out and used by this job. This field cannot be specified in conjunction with git_branch or git_commit.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
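
For illustration, a Git source pinned to a branch of a GitHub repository; the URL and branch name are placeholders, and the git_provider string matches one of the GitProvider values above.

    from databricks.bundles.jobs import GitSource

    git_source = GitSource.from_dict({
        "git_provider": "gitHub",
        "git_url": "https://github.com/my-org/my-jobs-repo",
        "git_branch": "main",   # mutually exclusive with git_tag and git_commit
    })
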
class InitScriptInfo
abfss: Adlsgen2Info | None = None

Contains the Azure Data Lake Storage destination path

dbfs: DbfsStorageInfo | None = None

destination needs to be provided. e.g. { "dbfs" : { "destination" : "dbfs:/home/cluster_log" } }

file: LocalFileInfo | None = None

destination needs to be provided. e.g. { "file" : { "destination" : "file:/my/local/file.sh" } }

gcs: GcsStorageInfo | None = None

destination needs to be provided. e.g. { "gcs": { "destination": "gs://my-bucket/file.sh" } }

s3: S3StorageInfo | None = None

destination and either the region or endpoint need to be provided. e.g. { "s3": { "destination" : "s3://cluster_log_bucket/prefix", "region" : "us-west-2" } } The cluster IAM role is used to access S3; make sure the cluster IAM role referenced in instance_profile_arn has permission to write data to the S3 destination.

volumes: VolumesStorageInfo | None = None

destination needs to be provided. e.g. { "volumes" : { "destination" : "/Volumes/my-init.sh" } }

workspace: WorkspaceStorageInfo | None = None

destination needs to be provided. e.g. { "workspace" : { "destination" : "/Users/user1@databricks.com/my-init.sh" } }

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class Job
budget_policy_id: str | None = None

The id of the user specified budget policy to use for this job. If not specified, a default budget policy may be applied when creating or modifying the job. See effective_budget_policy_id for the budget policy used by this workload.

continuous: Continuous | None = None

An optional continuous property for this job. The continuous property will ensure that there is always one run executing. Only one of schedule and continuous can be used.

description: str | None = None

An optional description for the job. The maximum length is 27700 characters in UTF-8 encoding.

email_notifications: JobEmailNotifications | None = None

An optional set of email addresses that is notified when runs of this job begin or complete as well as when this job is deleted.

environments: list[JobEnvironment]

A list of task execution environment specifications that can be referenced by serverless tasks of this job. An environment is required to be present for serverless tasks. For serverless notebook tasks, the environment is accessible in the notebook environment panel. For other serverless tasks, the task environment is required to be specified using environment_key in the task settings.

git_source: GitSource | None = None

An optional specification for a remote Git repository containing the source code used by tasks. Version-controlled source code is supported by notebook, dbt, Python script, and SQL File tasks.

If git_source is set, these tasks retrieve the file from the remote repository by default. However, this behavior can be overridden by setting source to WORKSPACE on the task.

Note: dbt and SQL File tasks support only version-controlled sources. If dbt or SQL File tasks are used, git_source must be defined on the job.

health: JobsHealthRules | None = None
job_clusters: list[JobCluster]

A list of job cluster specifications that can be shared and reused by tasks of this job. Libraries cannot be declared in a shared job cluster. You must declare dependent libraries in task settings. If more than 100 job clusters are available, you can paginate through them using the jobs/get API.

max_concurrent_runs: int | None = None

An optional maximum allowed number of concurrent runs of the job. Set this value if you want to be able to execute multiple runs of the same job concurrently. This is useful for example if you trigger your job on a frequent schedule and want to allow consecutive runs to overlap with each other, or if you want to trigger multiple runs which differ by their input parameters. This setting affects only new runs. For example, suppose the job’s concurrency is 4 and there are 4 concurrent active runs. Then setting the concurrency to 3 won’t kill any of the active runs. However, from then on, new runs are skipped unless there are fewer than 3 active runs. This value cannot exceed 1000. Setting this value to 0 causes all new runs to be skipped.

name: str | None = None

An optional name for the job. The maximum length is 4096 bytes in UTF-8 encoding.

notification_settings: JobNotificationSettings | None = None

Optional notification settings that are used when sending notifications to each of the email_notifications and webhook_notifications for this job.

parameters: list[JobParameterDefinition]

Job-level parameter definitions

permissions: list[JobPermission]
queue: QueueSettings | None = None

The queue settings of the job.

run_as: JobRunAs | None = None
schedule: CronSchedule | None = None

An optional periodic schedule for this job. The default behavior is that the job only runs when triggered by clicking “Run Now” in the Jobs UI or sending an API request to runNow.

tags: dict[str, str]

A map of tags associated with the job. These are forwarded to the cluster as cluster tags for jobs clusters, and are subject to the same limitations as cluster tags. A maximum of 25 tags can be added to the job.

tasks: list[Task]

A list of task specifications to be executed by this job. If more than 100 tasks are available, you can paginate through them using the jobs/get API. Use the next_page_token field at the object root to determine if more results are available.

timeout_seconds: int | None = None

An optional timeout applied to each run of this job. A value of 0 means no timeout.

trigger: TriggerSettings | None = None

A configuration to trigger a run when certain conditions are met. The default behavior is that the job runs only when triggered by clicking “Run Now” in the Jobs UI or sending an API request to runNow.

webhook_notifications: WebhookNotifications | None = None

A collection of system notification IDs to notify when runs of this job begin or complete.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
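
Putting several of the pieces above together, a sketch of a small scheduled job with one notebook task on a shared job cluster; all names, paths, and cluster values are placeholders, and nested specs are assumed to be accepted as plain dicts by from_dict.

    from databricks.bundles.jobs import Job

    job = Job.from_dict({
        "name": "nightly-etl",
        "max_concurrent_runs": 1,
        "schedule": {"quartz_cron_expression": "0 0 3 * * ?", "timezone_id": "UTC"},
        "job_clusters": [{
            "job_cluster_key": "shared",
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
        }],
        "tasks": [{
            "task_key": "ingest",
            "job_cluster_key": "shared",
            "notebook_task": {"notebook_path": "/Workspace/etl/ingest"},
        }],
    })

Calling job.as_dict() serializes the job back into a plain dict of the same general shape.
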
class JobCluster
job_cluster_key: str

A unique name for the job cluster. This field is required and must be unique within the job. JobTaskSettings may refer to this field to determine which cluster to launch for the task execution.

new_cluster: ClusterSpec

If new_cluster, a description of a cluster that is created for each task.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class JobEmailNotifications
on_duration_warning_threshold_exceeded: list[str]

A list of email addresses to be notified when the duration of a run exceeds the threshold specified for the RUN_DURATION_SECONDS metric in the health field. If no rule for the RUN_DURATION_SECONDS metric is specified in the health field for the job, notifications are not sent.

on_failure: list[str]

A list of email addresses to be notified when a run completes unsuccessfully. A run is considered to have completed unsuccessfully if it ends with an INTERNAL_ERROR life_cycle_state or a FAILED or TIMED_OUT result_state. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent.

on_start: list[str]

A list of email addresses to be notified when a run begins. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent.

on_streaming_backlog_exceeded: list[str]

A list of email addresses to notify when any streaming backlog thresholds are exceeded for any stream. Streaming backlog thresholds can be set in the health field using the following metrics: STREAMING_BACKLOG_BYTES, STREAMING_BACKLOG_RECORDS, STREAMING_BACKLOG_SECONDS, or STREAMING_BACKLOG_FILES. Alerting is based on the 10-minute average of these metrics. If the issue persists, notifications are resent every 30 minutes.

on_success: list[str]

A list of email addresses to be notified when a run successfully completes. A run is considered to have completed successfully if it ends with a TERMINATED life_cycle_state and a SUCCESS result_state. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class JobEnvironment
environment_key: str

The key of an environment. It has to be unique within a job.

spec: Environment | None = None
classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class JobNotificationSettings
no_alert_for_canceled_runs: bool | None = None

If true, do not send notifications to recipients specified in on_failure if the run is canceled.

no_alert_for_skipped_runs: bool | None = None

If true, do not send notifications to recipients specified in on_failure if the run is skipped.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class JobParameterDefinition
default: str

Default value of the parameter.

name: str

The name of the defined parameter. May only contain alphanumeric characters, _, -, and .

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class JobPermission
level: JobPermissionLevel
group_name: str | None = None
service_principal_name: str | None = None
user_name: str | None = None
classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class JobPermissionLevel
CAN_MANAGE = 'CAN_MANAGE'
CAN_MANAGE_RUN = 'CAN_MANAGE_RUN'
CAN_VIEW = 'CAN_VIEW'
IS_OWNER = 'IS_OWNER'
class JobRunAs

Write-only setting. Specifies the user or service principal that the job runs as. If not specified, the job runs as the user who created the job.

Either user_name or service_principal_name should be specified. If not, an error is thrown.

service_principal_name: str | None = None

The application ID of an active service principal. Setting this field requires the servicePrincipal/user role.

user_name: str | None = None

The email of an active workspace user. Non-admin users can only set this field to their own email.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class JobsHealthMetric

Specifies the health metric that is being evaluated for a particular health rule.

  • RUN_DURATION_SECONDS: Expected total time for a run in seconds.

  • STREAMING_BACKLOG_BYTES: An estimate of the maximum bytes of data waiting to be consumed across all streams. This metric is in Public Preview.

  • STREAMING_BACKLOG_RECORDS: An estimate of the maximum offset lag across all streams. This metric is in Public Preview.

  • STREAMING_BACKLOG_SECONDS: An estimate of the maximum consumer delay across all streams. This metric is in Public Preview.

  • STREAMING_BACKLOG_FILES: An estimate of the maximum number of outstanding files across all streams. This metric is in Public Preview.

RUN_DURATION_SECONDS = 'RUN_DURATION_SECONDS'
STREAMING_BACKLOG_BYTES = 'STREAMING_BACKLOG_BYTES'
STREAMING_BACKLOG_RECORDS = 'STREAMING_BACKLOG_RECORDS'
STREAMING_BACKLOG_SECONDS = 'STREAMING_BACKLOG_SECONDS'
STREAMING_BACKLOG_FILES = 'STREAMING_BACKLOG_FILES'
class JobsHealthOperator

Specifies the operator used to compare the health metric value with the specified threshold.

GREATER_THAN = 'GREATER_THAN'
class JobsHealthRule
metric: JobsHealthMetric
op: JobsHealthOperator
value: int

Specifies the threshold value that the health metric should obey to satisfy the health rule.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class JobsHealthRules

An optional set of health rules that can be defined for this job.

rules: list[JobsHealthRule]
classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
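
For example, a single health rule that flags runs exceeding 30 minutes; the metric and operator strings are assumed to accept the enum values documented above.

    from databricks.bundles.jobs import JobsHealthRules

    health = JobsHealthRules.from_dict({
        "rules": [{
            "metric": "RUN_DURATION_SECONDS",
            "op": "GREATER_THAN",
            "value": 1800,   # seconds
        }]
    })
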
class Library
cran: RCranLibrary | None = None

Specification of a CRAN library to be installed as part of the library

egg: str | None = None

Deprecated. URI of the egg library to install. Installing Python egg files is deprecated and is not supported in Databricks Runtime 14.0 and above.

jar: str | None = None

URI of the JAR library to install. Supported URIs include Workspace paths, Unity Catalog Volumes paths, and S3 URIs. For example: { "jar": "/Workspace/path/to/library.jar" }, { "jar" : "/Volumes/path/to/library.jar" } or { "jar": "s3://my-bucket/library.jar" }. If S3 is used, please make sure the cluster has read access on the library. You may need to launch the cluster with an IAM role to access the S3 URI.

maven: MavenLibrary | None = None

Specification of a maven library to be installed. For example: { "coordinates": "org.jsoup:jsoup:1.7.2" }

pypi: PythonPyPiLibrary | None = None

Specification of a PyPI library to be installed. For example: { "package": "simplejson" }

requirements: str | None = None

URI of the requirements.txt file to install. Only Workspace paths and Unity Catalog Volumes paths are supported. For example: { "requirements": "/Workspace/path/to/requirements.txt" } or { "requirements" : "/Volumes/path/to/requirements.txt" }

whl: str | None = None

URI of the wheel library to install. Supported URIs include Workspace paths, Unity Catalog Volumes paths, and S3 URIs. For example: { "whl": "/Workspace/path/to/library.whl" }, { "whl" : "/Volumes/path/to/library.whl" } or { "whl": "s3://my-bucket/library.whl" }. If S3 is used, please make sure the cluster has read access on the library. You may need to launch the cluster with an IAM role to access the S3 URI.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
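
As a sketch, a pair of library specifications, one PyPI package and one workspace wheel; the package version and wheel path are placeholders.

    from databricks.bundles.jobs import Library

    libraries = [
        Library.from_dict({"pypi": {"package": "simplejson==3.8.0"}}),
        Library.from_dict({"whl": "/Workspace/path/to/library.whl"}),
    ]
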
class LocalFileInfo
destination: str

local file destination, e.g. file:/my/local/file.sh

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class LogAnalyticsInfo
log_analytics_primary_key: str | None = None

The primary key for the Azure Log Analytics agent configuration

log_analytics_workspace_id: str | None = None

The workspace ID for the Azure Log Analytics agent configuration

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class MavenLibrary
coordinates: str

Gradle-style maven coordinates. For example: "org.jsoup:jsoup:1.7.2".

exclusions: list[str]

List of dependencies to exclude. For example: ["slf4j:slf4j", "*:hadoop-client"].

Maven dependency exclusions: https://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html.

repo: str | None = None

Maven repo to install the Maven package from. If omitted, both Maven Central Repository and Spark Packages are searched.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class NotebookTask
notebook_path: str

The path of the notebook to be run in the Databricks workspace or remote repository. For notebooks stored in the Databricks workspace, the path must be absolute and begin with a slash. For notebooks stored in a remote repository, the path must be relative. This field is required.

base_parameters: dict[str, str]

Base parameters to be used for each run of this job. If the run is initiated by a call to jobs/runNow with parameters specified, the two parameters maps are merged. If the same key is specified in base_parameters and in run-now, the value from run-now is used. Use Task parameter variables to set parameters containing information about job runs.

If the notebook takes a parameter that is not specified in the job’s base_parameters or the run-now override parameters, the default value from the notebook is used.

Retrieve these parameters in a notebook using dbutils.widgets.get.

The JSON representation of this field cannot exceed 1MB.

source: Source | None = None

Optional location type of the notebook. When set to WORKSPACE, the notebook will be retrieved from the local Databricks workspace. When set to GIT, the notebook will be retrieved from a Git repository defined in git_source. If the value is empty, the task will use GIT if git_source is defined and WORKSPACE otherwise. * WORKSPACE: Notebook is located in Databricks workspace. * GIT: Notebook is located in cloud Git provider.

warehouse_id: str | None = None

Optional warehouse_id to run the notebook on a SQL warehouse. Classic SQL warehouses are NOT supported; use serverless or pro SQL warehouses instead.

Note that SQL warehouses only support SQL cells; if the notebook contains non-SQL cells, the run will fail.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
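
For illustration, a workspace notebook task with two base parameters; the notebook path and parameter values are placeholders and can be overridden per run with run-now parameters.

    from databricks.bundles.jobs import NotebookTask

    notebook = NotebookTask.from_dict({
        "notebook_path": "/Workspace/etl/ingest",
        "base_parameters": {"env": "prod", "run_date": "2024-01-01"},
        # Inside the notebook, read these with dbutils.widgets.get("env"), etc.
    })
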
class PauseStatus
UNPAUSED = 'UNPAUSED'
PAUSED = 'PAUSED'
class PeriodicTriggerConfiguration
interval: int

The interval at which the trigger should run.

unit: PeriodicTriggerConfigurationTimeUnit

The unit of time for the interval.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class PeriodicTriggerConfigurationTimeUnit
HOURS = 'HOURS'
DAYS = 'DAYS'
WEEKS = 'WEEKS'
class PipelineParams
full_refresh: bool | None = None

If true, triggers a full refresh on the delta live table.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class PipelineTask
pipeline_id: str

The full name of the pipeline task to execute.

full_refresh: bool | None = None

If true, triggers a full refresh on the delta live table.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class PythonPyPiLibrary
package: str

The name of the PyPI package to install. An optional exact version specification is also supported. Examples: "simplejson" and "simplejson==3.8.0".

repo: str | None = None

The repository where the package can be found. If not specified, the default pip index is used.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class PythonWheelTask
entry_point: str

Named entry point to use. If it does not exist in the metadata of the package, the function from the package is executed directly using $packageName.$entryPoint().

package_name: str

Name of the package to execute

named_parameters: dict[str, str]

Command-line parameters passed to the Python wheel task in the form of ["--name=task", "--data=dbfs:/path/to/data.json"]. Leave it empty if parameters is not null.

parameters: list[str]

Command-line parameters passed to Python wheel task. Leave it empty if named_parameters is not null.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
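
A sketch of a wheel task that calls the entry point main from a hypothetical package my_etl; set either named_parameters or parameters, not both.

    from databricks.bundles.jobs import PythonWheelTask

    wheel_task = PythonWheelTask.from_dict({
        "package_name": "my_etl",
        "entry_point": "main",
        "named_parameters": {"env": "prod", "data": "dbfs:/path/to/data.json"},
    })
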
class QueueSettings
enabled: bool

If true, enable queueing for the job. This is a required field.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class RCranLibrary
package: str

The name of the CRAN package to install.

repo: str | None = None

The repository where the package can be found. If not specified, the default CRAN repo is used.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class RunIf

An optional value indicating the condition that determines whether the task should be run once its dependencies have been completed. When omitted, defaults to ALL_SUCCESS.

Possible values are: * ALL_SUCCESS: All dependencies have executed and succeeded * AT_LEAST_ONE_SUCCESS: At least one dependency has succeeded * NONE_FAILED: None of the dependencies have failed and at least one was executed * ALL_DONE: All dependencies have been completed * AT_LEAST_ONE_FAILED: At least one dependency failed * ALL_FAILED: All dependencies have failed

ALL_SUCCESS = 'ALL_SUCCESS'
ALL_DONE = 'ALL_DONE'
NONE_FAILED = 'NONE_FAILED'
AT_LEAST_ONE_SUCCESS = 'AT_LEAST_ONE_SUCCESS'
ALL_FAILED = 'ALL_FAILED'
AT_LEAST_ONE_FAILED = 'AT_LEAST_ONE_FAILED'
class RunJobTask
job_id: int

ID of the job to trigger.

job_parameters: dict[str, str]

Job-level parameters used to trigger the job.

pipeline_params: PipelineParams | None = None

Controls whether the pipeline should perform a full refresh

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class RuntimeEngine

Determines the cluster’s runtime engine, either standard or Photon.

This field is not compatible with legacy spark_version values that contain -photon-. Remove -photon- from the spark_version and set runtime_engine to PHOTON.

If left unspecified, the runtime engine defaults to standard unless the spark_version contains -photon-, in which case Photon will be used.

NULL = 'NULL'
STANDARD = 'STANDARD'
PHOTON = 'PHOTON'
class S3StorageInfo
destination: str

S3 destination, e.g. s3://my-bucket/some-prefix. Note that logs will be delivered using the cluster IAM role; make sure you set a cluster IAM role and that the role has write access to the destination. Please also note that you cannot use AWS keys to deliver logs.

canned_acl: str | None = None

(Optional) Set a canned access control list for the logs, e.g. bucket-owner-full-control. If canned_acl is set, please make sure the cluster IAM role has s3:PutObjectAcl permission on the destination bucket and prefix. The full list of possible canned ACLs can be found at http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl. Please also note that by default only the object owner gets full control. If you are using a cross-account role for writing data, you may want to set bucket-owner-full-control so the bucket owner is able to read the logs.

enable_encryption: bool | None = None

(Optional) Flag to enable server side encryption, false by default.

encryption_type: str | None = None

(Optional) The encryption type; it can be either sse-s3 or sse-kms. It is used only when encryption is enabled, and the default type is sse-s3.

endpoint: str | None = None

S3 endpoint, e.g. https://s3-us-west-2.amazonaws.com. Either region or endpoint needs to be set. If both are set, endpoint will be used.

kms_key: str | None = None

(Optional) KMS key which will be used if encryption is enabled and the encryption type is set to sse-kms.

region: str | None = None

S3 region, e.g. us-west-2. Either region or endpoint needs to be set. If both are set, endpoint will be used.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class Source

Optional location type of the SQL file. When set to WORKSPACE, the SQL file will be retrieved from the local Databricks workspace. When set to GIT, the SQL file will be retrieved from a Git repository defined in git_source. If the value is empty, the task will use GIT if git_source is defined and WORKSPACE otherwise.

  • WORKSPACE: SQL file is located in Databricks workspace.

  • GIT: SQL file is located in cloud Git provider.

WORKSPACE = 'WORKSPACE'
GIT = 'GIT'
class SparkJarTask
main_class_name: str

The full name of the class containing the main method to be executed. This class must be contained in a JAR provided as a library.

The code must use SparkContext.getOrCreate to obtain a Spark context; otherwise, runs of the job fail.

parameters: list[str]

Parameters passed to the main method.

Use Task parameter variables to set parameters containing information about job runs.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class SparkPythonTask
python_file: str

The Python file to be executed. Cloud file URIs (such as dbfs:/, s3:/, adls:/, gcs:/) and workspace paths are supported. For python files stored in the Databricks workspace, the path must be absolute and begin with /. For files stored in a remote repository, the path must be relative. This field is required.

parameters: list[str]

Command line parameters passed to the Python file.

Use Task parameter variables to set parameters containing information about job runs.

source: Source | None = None

Optional location type of the Python file. When set to WORKSPACE or not specified, the file will be retrieved from the local Databricks workspace or cloud location (if the python_file has a URI format). When set to GIT, the Python file will be retrieved from a Git repository defined in git_source.

  • WORKSPACE: The Python file is located in a Databricks workspace or at a cloud filesystem URI.

  • GIT: The Python file is located in a remote Git repository.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class SparkSubmitTask
parameters: list[str]

Command-line parameters passed to spark submit.

Use Task parameter variables to set parameters containing information about job runs.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class SqlTask
warehouse_id: str

The canonical identifier of the SQL warehouse. Recommended to use with serverless or pro SQL warehouses. Classic SQL warehouses are only supported for SQL alert, dashboard and query tasks and are limited to scheduled single-task jobs.

alert: SqlTaskAlert | None = None

If alert, indicates that this job must refresh a SQL alert.

dashboard: SqlTaskDashboard | None = None

If dashboard, indicates that this job must refresh a SQL dashboard.

file: SqlTaskFile | None = None

If file, indicates that this job runs a SQL file in a remote Git repository.

parameters: dict[str, str]

Parameters to be used for each run of this job. The SQL alert task does not support custom parameters.

query: SqlTaskQuery | None = None

If query, indicates that this job must execute a SQL query.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
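
For illustration, a SQL task that runs a saved query on a SQL warehouse; both identifiers and the parameter value are placeholders.

    from databricks.bundles.jobs import SqlTask

    sql_task = SqlTask.from_dict({
        "warehouse_id": "1234567890abcdef",
        "query": {"query_id": "12345678-1234-1234-1234-abcdef123456"},
        "parameters": {"run_date": "2024-01-01"},
    })
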
class SqlTaskAlert
alert_id: str

The canonical identifier of the SQL alert.

pause_subscriptions: bool | None = None

If true, the alert notifications are not sent to subscribers.

subscriptions: list[SqlTaskSubscription]

If specified, alert notifications are sent to subscribers.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class SqlTaskDashboard
dashboard_id: str

The canonical identifier of the SQL dashboard.

custom_subject: str | None = None

Subject of the email sent to subscribers of this task.

pause_subscriptions: bool | None = None

If true, the dashboard snapshot is not taken, and emails are not sent to subscribers.

subscriptions: list[SqlTaskSubscription]

If specified, dashboard snapshots are sent to subscriptions.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class SqlTaskFile
path: str

Path of the SQL file. Must be relative if the source is a remote Git repository and absolute for workspace paths.

source: Source | None = None

Optional location type of the SQL file. When set to WORKSPACE, the SQL file will be retrieved from the local Databricks workspace. When set to GIT, the SQL file will be retrieved from a Git repository defined in git_source. If the value is empty, the task will use GIT if git_source is defined and WORKSPACE otherwise.

  • WORKSPACE: SQL file is located in Databricks workspace.

  • GIT: SQL file is located in cloud Git provider.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class SqlTaskQuery
query_id: str

The canonical identifier of the SQL query.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class SqlTaskSubscription
destination_id: str | None = None

The canonical identifier of the destination to receive email notification. This parameter is mutually exclusive with user_name. You cannot set both destination_id and user_name for subscription notifications.

user_name: str | None = None

The user name to receive the subscription email. This parameter is mutually exclusive with destination_id. You cannot set both destination_id and user_name for subscription notifications.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class Task
task_key: str

A unique name for the task. This field is used to refer to this task from other tasks. This field is required and must be unique within its parent job. On Update or Reset, this field is used to reference the tasks to be updated or reset.

clean_rooms_notebook_task: CleanRoomsNotebookTask | None = None

The task runs a clean rooms notebook when the clean_rooms_notebook_task field is present.

condition_task: ConditionTask | None = None

The task evaluates a condition that can be used to control the execution of other tasks when the condition_task field is present. The condition task does not require a cluster to execute and does not support retries or notifications.

dbt_task: DbtTask | None = None

The task runs one or more dbt commands when the dbt_task field is present. The dbt task requires both Databricks SQL and the ability to use a serverless or a pro SQL warehouse.

depends_on: list[TaskDependency]

An optional array of objects specifying the dependency graph of the task. All tasks specified in this field must complete before executing this task. The task will run only if the run_if condition is true. The key is task_key, and the value is the name assigned to the dependent task.

description: str | None = None

An optional description for this task.

disable_auto_optimization: bool | None = None

An option to disable auto optimization in serverless

email_notifications: TaskEmailNotifications | None = None

An optional set of email addresses that is notified when runs of this task begin or complete as well as when this task is deleted. The default behavior is to not send any emails.

environment_key: str | None = None

The key that references an environment spec in a job. This field is required for Python script, Python wheel and dbt tasks when using serverless compute.

existing_cluster_id: str | None = None

If existing_cluster_id, the ID of an existing cluster that is used for all runs. When running jobs or tasks on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs and tasks on new clusters for greater reliability

for_each_task: ForEachTask | None = None

The task executes a nested task for every input provided when the for_each_task field is present.

health: JobsHealthRules | None = None
job_cluster_key: str | None = None

If job_cluster_key is set, this task is executed by reusing the cluster specified in job.settings.job_clusters.

libraries: list[Library]

An optional list of libraries to be installed on the cluster. The default value is an empty list.

max_retries: int | None = None

An optional maximum number of times to retry an unsuccessful run. A run is considered to be unsuccessful if it completes with the FAILED result_state or INTERNAL_ERROR life_cycle_state. The value -1 means to retry indefinitely and the value 0 means to never retry.

min_retry_interval_millis: int | None = None

An optional minimal interval in milliseconds between the start of the failed run and the subsequent retry run. The default behavior is that unsuccessful runs are immediately retried.

new_cluster: ClusterSpec | None = None

If new_cluster, a description of a new cluster that is created for each run.

notebook_task: NotebookTask | None = None

The task runs a notebook when the notebook_task field is present.

notification_settings: TaskNotificationSettings | None = None

Optional notification settings that are used when sending notifications to each of the email_notifications and webhook_notifications for this task.

pipeline_task: PipelineTask | None = None

The task triggers a pipeline update when the pipeline_task field is present. Only pipelines configured to use triggered mode are supported.

python_wheel_task: PythonWheelTask | None = None

The task runs a Python wheel when the python_wheel_task field is present.

retry_on_timeout: bool | None = None

An optional policy to specify whether to retry a job when it times out. The default behavior is to not retry on timeout.

run_if: RunIf | None = None

An optional value specifying the condition determining whether the task is run once its dependencies have been completed.

  • ALL_SUCCESS: All dependencies have executed and succeeded

  • AT_LEAST_ONE_SUCCESS: At least one dependency has succeeded

  • NONE_FAILED: None of the dependencies have failed and at least one was executed

  • ALL_DONE: All dependencies have been completed

  • AT_LEAST_ONE_FAILED: At least one dependency failed

  • ALL_FAILED: All dependencies have failed

run_job_task: RunJobTask | None = None

The task triggers another job when the run_job_task field is present.

spark_jar_task: SparkJarTask | None = None

The task runs a JAR when the spark_jar_task field is present.

spark_python_task: SparkPythonTask | None = None

The task runs a Python file when the spark_python_task field is present.

spark_submit_task: SparkSubmitTask | None = None

(Legacy) The task runs the spark-submit script when the spark_submit_task field is present. This task can run only on new clusters and is not compatible with serverless compute.

In the new_cluster specification, libraries and spark_conf are not supported. Instead, use --jars and --py-files to add Java and Python libraries and --conf to set the Spark configurations.

master, deploy-mode, and executor-cores are automatically configured by Databricks; you cannot specify them in parameters.

By default, the Spark submit job uses all available memory (excluding reserved memory for Databricks services). You can set --driver-memory and --executor-memory to a smaller value to leave some room for off-heap usage.

The --jars, --py-files, and --files arguments support DBFS and S3 paths.
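
A sketch of the parameters a legacy spark-submit task typically carries, reflecting the constraints above. SparkSubmitTask and its parameters field are assumed from elsewhere in this reference; the class name, paths, and configuration values are placeholders.

    from databricks.bundles.jobs import SparkSubmitTask

    # --master, --deploy-mode, and --executor-cores are set by Databricks and must not appear here.
    submit = SparkSubmitTask(
        parameters=[
            "--class", "com.example.Main",
            "--conf", "spark.sql.shuffle.partitions=200",
            "--driver-memory", "4g",  # leave room for off-heap usage
            "dbfs:/jars/app.jar",     # DBFS and S3 paths are supported
        ],
    )
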

sql_task: SqlTask | None = None

The task runs a SQL query or file, or it refreshes a SQL alert or a legacy SQL dashboard when the sql_task field is present.

timeout_seconds: int | None = None

An optional timeout applied to each run of this job task. A value of 0 means no timeout.

webhook_notifications: WebhookNotifications | None = None

A collection of system notification IDs to notify when runs of this task begin or complete. The default behavior is to not send any system notifications.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
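
Putting several of the fields above together, a sketch of a notebook task that runs after an upstream task succeeds. NotebookTask and its notebook_path field are assumed from elsewhere in this reference, and RunIf is assumed to be an enum exposing the values listed above; keys and paths are placeholders.

    from databricks.bundles.jobs import NotebookTask, RunIf, Task, TaskDependency

    report_task = Task(
        task_key="build_report",
        notebook_task=NotebookTask(notebook_path="/Workspace/notebooks/build_report"),
        depends_on=[TaskDependency(task_key="ingest")],
        run_if=RunIf.ALL_SUCCESS,          # the default behavior, written out for clarity
        max_retries=2,                     # retry failed runs up to twice
        min_retry_interval_millis=60_000,  # wait at least a minute between attempts
        timeout_seconds=3600,
    )
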
class TaskDependency
task_key: str

The name of the task this task depends on.

outcome: str | None = None

Can only be specified on condition task dependencies. The outcome of the dependent task that must be met for this task to run.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
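
The outcome field only applies when the upstream task is a condition task. A sketch, assuming the condition task's outcome is reported as the string "true" or "false":

    from databricks.bundles.jobs import TaskDependency

    # Run the downstream task only when the condition task "is_weekend" evaluates to true.
    depends_on = [TaskDependency(task_key="is_weekend", outcome="true")]
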
class TaskEmailNotifications
on_duration_warning_threshold_exceeded: list[str]

A list of email addresses to be notified when the duration of a run exceeds the threshold specified for the RUN_DURATION_SECONDS metric in the health field. If no rule for the RUN_DURATION_SECONDS metric is specified in the health field for the job, notifications are not sent.

on_failure: list[str]

A list of email addresses to be notified when a run completes unsuccessfully. A run is considered to have completed unsuccessfully if it ends with an INTERNAL_ERROR life_cycle_state or a FAILED or TIMED_OUT result_state. If this is not specified on job creation, reset, or update, the list is empty, and notifications are not sent.

on_start: list[str]

A list of email addresses to be notified when a run begins. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent.

on_streaming_backlog_exceeded: list[str]

A list of email addresses to notify when any streaming backlog thresholds are exceeded for any stream. Streaming backlog thresholds can be set in the health field using the following metrics: STREAMING_BACKLOG_BYTES, STREAMING_BACKLOG_RECORDS, STREAMING_BACKLOG_SECONDS, or STREAMING_BACKLOG_FILES. Alerting is based on the 10-minute average of these metrics. If the issue persists, notifications are resent every 30 minutes.

on_success: list[str]

A list of email addresses to be notified when a run successfully completes. A run is considered to have completed successfully if it ends with a TERMINATED life_cycle_state and a SUCCESS result_state. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
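
A sketch of a task-level email configuration; the addresses are placeholders.

    from databricks.bundles.jobs import TaskEmailNotifications

    emails = TaskEmailNotifications(
        on_failure=["oncall@example.com"],
        on_duration_warning_threshold_exceeded=["oncall@example.com"],  # requires a RUN_DURATION_SECONDS health rule
        on_success=["team@example.com"],
    )
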
class TaskNotificationSettings
alert_on_last_attempt: bool | None = None

If true, do not send notifications to recipients specified in on_start for the retried runs and do not send notifications to recipients specified in on_failure until the last retry of the run.

no_alert_for_canceled_runs: bool | None = None

If true, do not send notifications to recipients specified in on_failure if the run is canceled.

no_alert_for_skipped_runs: bool | None = None

If true, do not send notifications to recipients specified in on_failure if the run is skipped.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
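
For tasks with retries it is common to suppress intermediate alerts. A minimal sketch:

    from databricks.bundles.jobs import TaskNotificationSettings

    settings = TaskNotificationSettings(
        alert_on_last_attempt=True,       # only alert once retries are exhausted
        no_alert_for_canceled_runs=True,
    )
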
class TriggerSettings
file_arrival: FileArrivalTriggerConfiguration | None = None

File arrival trigger settings.

pause_status: PauseStatus | None = None

Whether this trigger is paused or not.

periodic: PeriodicTriggerConfiguration | None = None

Periodic trigger settings.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
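
A sketch of a file-arrival trigger. FileArrivalTriggerConfiguration is assumed here to take the monitored storage location as a url field, mirroring the Jobs REST API, and PauseStatus is assumed to be an enum exposing UNPAUSED; the location is a placeholder.

    from databricks.bundles.jobs import FileArrivalTriggerConfiguration, PauseStatus, TriggerSettings

    trigger = TriggerSettings(
        file_arrival=FileArrivalTriggerConfiguration(
            url="/Volumes/catalog/schema/landing/",  # assumed field name; placeholder location
        ),
        pause_status=PauseStatus.UNPAUSED,
    )
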
class VolumesStorageInfo
destination: str

Unity Catalog volumes file destination, e.g. /Volumes/catalog/schema/volume/dir/file

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class Webhook
id: str
classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
class WebhookNotifications
on_duration_warning_threshold_exceeded: list[Webhook]

An optional list of system notification IDs to call when the duration of a run exceeds the threshold specified for the RUN_DURATION_SECONDS metric in the health field. A maximum of 3 destinations can be specified for the on_duration_warning_threshold_exceeded property.

on_failure: list[Webhook]

An optional list of system notification IDs to call when the run fails. A maximum of 3 destinations can be specified for the on_failure property.

on_start: list[Webhook]

An optional list of system notification IDs to call when the run starts. A maximum of 3 destinations can be specified for the on_start property.

on_streaming_backlog_exceeded: list[Webhook]

An optional list of system notification IDs to call when any streaming backlog thresholds are exceeded for any stream. Streaming backlog thresholds can be set in the health field using the following metrics: STREAMING_BACKLOG_BYTES, STREAMING_BACKLOG_RECORDS, STREAMING_BACKLOG_SECONDS, or STREAMING_BACKLOG_FILES. Alerting is based on the 10-minute average of these metrics. If the issue persists, notifications are resent every 30 minutes. A maximum of 3 destinations can be specified for the on_streaming_backlog_exceeded property.

on_success: list[Webhook]

An optional list of system notification IDs to call when the run completes successfully. A maximum of 3 destinations can be specified for the on_success property.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
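
Webhook destinations are referenced by ID, with at most three per property. A sketch with placeholder IDs:

    from databricks.bundles.jobs import Webhook, WebhookNotifications

    webhooks = WebhookNotifications(
        on_failure=[Webhook(id="placeholder-destination-id")],
        on_duration_warning_threshold_exceeded=[Webhook(id="placeholder-destination-id")],
    )
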
class WorkloadType
clients: ClientsTypes

Defines what type of clients can use the cluster, e.g. Notebooks, Jobs.

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
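
A sketch restricting a cluster to job workloads. ClientsTypes is assumed here to expose jobs and notebooks boolean flags, as in the Clusters API:

    from databricks.bundles.jobs import ClientsTypes, WorkloadType

    workload = WorkloadType(clients=ClientsTypes(jobs=True, notebooks=False))
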
class WorkspaceStorageInfo
destination: str

workspace files destination, e.g. /Users/user1@databricks.com/my-init.sh

classmethod from_dict(
value: dict,
) Self
as_dict(
self,
) dict
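
A sketch of the typical use, pointing a cluster init script at a workspace file. InitScriptInfo and its workspace field are assumed from elsewhere in this reference.

    from databricks.bundles.jobs import InitScriptInfo, WorkspaceStorageInfo

    init_script = InitScriptInfo(
        workspace=WorkspaceStorageInfo(destination="/Users/user1@databricks.com/my-init.sh"),
    )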