Optional apply
Optional autotermination: Automatically terminates the cluster after it has been inactive for this many minutes. If not set, the cluster is not terminated automatically. If specified, the threshold must be between 10 and 10000 minutes. Users can also set this value to 0 to explicitly disable automatic termination.
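As a rough sketch, the autotermination rules above can be written as a client-side check: an unset value means no automatic termination, 0 explicitly disables it, and any other value must fall between 10 and 10000 minutes. The function name `isValidAutoterminationMinutes` is hypothetical and not part of any SDK; the service performs its own validation.

```typescript
// Hypothetical client-side check mirroring the documented autotermination rules.
// `minutes` is undefined when the field is not set.
function isValidAutoterminationMinutes(minutes?: number): boolean {
  if (minutes === undefined) {
    return true; // not set: the cluster is never terminated automatically
  }
  if (minutes === 0) {
    return true; // 0 explicitly disables automatic termination
  }
  // Any other value must lie in the documented range [10, 10000].
  return minutes >= 10 && minutes <= 10000;
}

console.log(isValidAutoterminationMinutes(120)); // true
console.log(isValidAutoterminationMinutes(5)); // false
```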
Optional aws: Attributes related to clusters running on Amazon Web Services. If not specified at cluster creation, a set of default values will be used.
Optional azure: Attributes related to clusters running on Microsoft Azure. If not specified at cluster creation, a set of default values will be used.
Optional clone: When specified, this clones libraries from a source cluster during the creation of a new cluster.
Optional cluster: The configuration for delivering Spark logs to a long-term storage destination.
Three kinds of destinations (DBFS, S3, and Unity Catalog volumes) are supported. Only one destination can be specified
per cluster. If the configuration is given, logs are delivered to the destination every 5 minutes.
Driver logs are delivered to $destination/$clusterId/driver, and executor logs to $destination/$clusterId/executor.
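A minimal sketch of the log layout described above, deriving the driver and executor log locations from a destination and a cluster ID. The helper name, the destination string, and the cluster ID are illustrative assumptions; trimming a trailing slash from the destination is a convenience added here, not documented behavior.

```typescript
// Hypothetical helper: derives where driver and executor logs land for a cluster,
// following the $destination/$clusterId/{driver,executor} layout described above.
interface ClusterLogPaths {
  driver: string;
  executor: string;
}

function clusterLogPaths(destination: string, clusterId: string): ClusterLogPaths {
  // Drop trailing slashes so "dbfs:/cluster-logs/" and "dbfs:/cluster-logs" agree.
  const base = destination.replace(/\/+$/, "");
  return {
    driver: `${base}/${clusterId}/driver`,
    executor: `${base}/${clusterId}/executor`,
  };
}

// Illustrative destination and cluster ID only.
const logPaths = clusterLogPaths("dbfs:/cluster-logs", "0123-456789-abcdefgh");
console.log(logPaths.driver); // dbfs:/cluster-logs/0123-456789-abcdefgh/driver
console.log(logPaths.executor); // dbfs:/cluster-logs/0123-456789-abcdefgh/executor
```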
Optional cluster: Cluster name requested by the user. This doesn't have to be unique. If not specified at creation, the cluster name will be an empty string. For job clusters, the cluster name is automatically set based on the job and job run IDs.
Optional custom: Additional tags for cluster resources, applied in addition to default_tags. Notes:
Currently,
Clusters can only reuse cloud resources if the resources' tags are a subset of the cluster tags.
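The reuse condition above can be sketched as a subset test over tag maps. This assumes "subset" means every key/value pair on the resource also appears, with the same value, among the cluster tags; the helper name is hypothetical.

```typescript
// Hypothetical helper: treats "subset" as every key/value pair on the resource
// also appearing, with the same value, among the cluster tags.
type Tags = Record<string, string>;

function canReuseCloudResource(resourceTags: Tags, clusterTags: Tags): boolean {
  return Object.entries(resourceTags).every(
    ([key, value]) =>
      Object.prototype.hasOwnProperty.call(clusterTags, key) && clusterTags[key] === value,
  );
}

const clusterTags: Tags = { team: "data-eng", env: "prod" };
console.log(canReuseCloudResource({ team: "data-eng" }, clusterTags)); // true
console.log(canReuseCloudResource({ team: "ml" }, clusterTags)); // false
```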
Optional data
Optional docker: Custom Docker image (BYOC, bring your own container).
Optional driver: The optional ID of the instance pool to which the cluster's driver belongs. If no driver pool is assigned, the driver uses the instance pool specified by instance_pool_id.
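The fallback described above amounts to "driver pool if set, otherwise the cluster pool". A minimal sketch; the parameter names `driverPoolId` and `clusterPoolId` are hypothetical stand-ins for the driver pool field and instance_pool_id.

```typescript
// Hypothetical helper: driverPoolId stands for the driver instance pool ID field
// and clusterPoolId for instance_pool_id.
function effectiveDriverPoolId(
  driverPoolId: string | undefined,
  clusterPoolId: string | undefined,
): string | undefined {
  // The cluster-wide pool is used only when no driver pool is assigned.
  return driverPoolId ?? clusterPoolId;
}

console.log(effectiveDriverPoolId(undefined, "pool-workers")); // pool-workers
console.log(effectiveDriverPoolId("pool-driver", "pool-workers")); // pool-driver
```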
Optional driver: Flexible node type configuration for the driver node.
Optional driver: The node type of the Spark driver.
This field is optional; if unset, the driver node type is set to the same value as node_type_id.
This field, along with node_type_id, should not be set if virtual_cluster_size is set. If driver_node_type_id, node_type_id, and virtual_cluster_size are all specified, driver_node_type_id and node_type_id take precedence.
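A sketch of how the driver and worker node types above might be resolved. The input shape is hypothetical, the string type of virtual_cluster_size is assumed, and the sketch generalizes the stated precedence rule: it lets explicit node types win whenever either one is set, whereas the text only covers the case where all three fields are given.

```typescript
// Hypothetical input shape; field names follow the identifiers used in the text above.
interface NodeTypeSettings {
  node_type_id?: string;
  driver_node_type_id?: string;
  virtual_cluster_size?: string; // value type assumed for illustration
}

interface ResolvedNodeTypes {
  workerNodeType: string | undefined;
  driverNodeType: string | undefined;
  usesVirtualClusterSize: boolean;
}

function resolveNodeTypes(settings: NodeTypeSettings): ResolvedNodeTypes {
  const explicit =
    settings.node_type_id !== undefined || settings.driver_node_type_id !== undefined;
  if (explicit) {
    // Explicit node types take precedence over virtual_cluster_size (assumed here to
    // hold whenever either node type is set, not only when all three are).
    return {
      workerNodeType: settings.node_type_id,
      // An unset driver node type falls back to node_type_id.
      driverNodeType: settings.driver_node_type_id ?? settings.node_type_id,
      usesVirtualClusterSize: false,
    };
  }
  return {
    workerNodeType: undefined,
    driverNodeType: undefined,
    usesVirtualClusterSize: settings.virtual_cluster_size !== undefined,
  };
}

console.log(resolveNodeTypes({ node_type_id: "i3.xlarge" }).driverNodeType); // i3.xlarge
```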
Optional enable: Autoscaling local storage. When enabled, this cluster dynamically acquires additional disk space when its Spark workers are running low on disk space.
Optional enable: Whether to enable LUKS on the local disks of cluster VMs.
Optional gcp: Attributes related to clusters running on Google Cloud Platform. If not specified at cluster creation, a set of default values will be used.
Optional init: The configuration for storing init scripts. Any number of destinations can be specified.
The scripts are executed sequentially in the order provided.
If cluster_log_conf is specified, init script logs are sent to <destination>/<cluster-ID>/init_scripts.
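The two behaviors above, sequential execution in list order and the init script log location, can be modeled roughly as follows. Both helpers are hypothetical; the scripts actually run on the cluster, and the `run` callback here only stands in for that step.

```typescript
// Hypothetical model: init scripts run one after another in list order, and
// their logs go to <destination>/<cluster-ID>/init_scripts when log delivery is configured.
function runInitScriptsInOrder(scripts: string[], run: (script: string) => void): void {
  for (const script of scripts) {
    run(script); // each script completes before the next one starts
  }
}

function initScriptLogDir(logDestination: string, clusterId: string): string {
  // Trailing slashes on the destination are trimmed for tidier paths (a convenience).
  return `${logDestination.replace(/\/+$/, "")}/${clusterId}/init_scripts`;
}

const executed: string[] = [];
runInitScriptsInOrder(["install-deps.sh", "configure-env.sh"], (s) => executed.push(s));
console.log(executed.join(", ")); // install-deps.sh, configure-env.sh
console.log(initScriptLogDir("dbfs:/cluster-logs", "0123-456789-abcdefgh"));
// dbfs:/cluster-logs/0123-456789-abcdefgh/init_scripts
```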
Optional instance: The optional ID of the instance pool to which the cluster belongs.
Optional is: This field can only be used when kind = CLASSIC_PREVIEW.
When set to true, custom_tags, spark_conf, and num_workers
Optional kind
Optional node: This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory- or compute-intensive workloads. A list of available node types can be retrieved with the :method:clusters/listNodeTypes API call.
Optional policy: The ID of the cluster policy used to create the cluster, if applicable.
Optional remote: If set, the configurable throughput (in Mb/s) of the remote disk. Currently only supported for GCP HYPERDISK_BALANCED disks.
Optional runtime: Determines the cluster's runtime engine, either standard or Photon.
This field is not compatible with legacy spark_version values that contain -photon-.
To use Photon, remove -photon- from the spark_version and set runtime_engine to PHOTON.
If left unspecified, the runtime engine defaults to standard unless the spark_version contains -photon-, in which case Photon is used.
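A sketch of the runtime engine resolution described above. The enum spelling `"STANDARD"` is an assumption (the text only names PHOTON), the version strings are illustrative, and this sketch chooses to reject an explicit runtime_engine combined with a legacy `-photon-` version rather than guess how the service handles it.

```typescript
// "STANDARD" and "PHOTON" are assumed enum spellings; the text above only names PHOTON.
type RuntimeEngine = "STANDARD" | "PHOTON";

function resolveRuntimeEngine(sparkVersion: string, runtimeEngine?: RuntimeEngine): RuntimeEngine {
  const legacyPhotonVersion = sparkVersion.includes("-photon-");
  if (runtimeEngine !== undefined) {
    if (legacyPhotonVersion) {
      // runtime_engine is not compatible with a legacy -photon- spark_version.
      throw new Error("Remove -photon- from spark_version and set runtime_engine to PHOTON instead.");
    }
    return runtimeEngine;
  }
  // Unspecified: standard, unless the version string itself requests Photon.
  return legacyPhotonVersion ? "PHOTON" : "STANDARD";
}

console.log(resolveRuntimeEngine("13.3.x-scala2.12")); // STANDARD
console.log(resolveRuntimeEngine("13.3.x-photon-scala2.12")); // PHOTON
console.log(resolveRuntimeEngine("13.3.x-scala2.12", "PHOTON")); // PHOTON
```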
Optional single: Single user name if data_security_mode is SINGLE_USER.
Optional size: Number of worker nodes that this cluster should have. A cluster has one Spark driver
and num_workers executors, for a total of num_workers + 1 Spark nodes.
Note: When reading the properties of a cluster, this field reflects the desired number
of workers rather than the actual current number of workers. For instance, if a cluster
is resized from 5 to 10 workers, this field is immediately updated to reflect
the target size of 10 workers, whereas the workers listed in spark_info gradually
increase from 5 to 10 as the new nodes are provisioned.
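The arithmetic above can be sketched with two small helpers: the total node count is num_workers plus one driver, and during a scale-up the workers still being provisioned are the gap between the target in num_workers and the workers currently listed. Both function names are hypothetical.

```typescript
// Hypothetical helpers illustrating the num_workers arithmetic described above.
function totalSparkNodes(numWorkers: number): number {
  return numWorkers + 1; // one driver plus numWorkers executors
}

// num_workers reports the target size; during a scale-up, the workers still being
// provisioned are the gap between that target and the workers currently listed.
function workersStillProvisioning(targetWorkers: number, currentWorkers: number): number {
  return Math.max(0, targetWorkers - currentWorkers);
}

console.log(totalSparkNodes(10)); // 11
console.log(workersStillProvisioning(10, 7)); // 3
```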
Parameters needed to automatically scale clusters up and down based on load. Note: autoscaling works best with Databricks Runtime versions 3.0 or later.
Optional spark: An object containing a set of optional, user-specified Spark configuration key-value pairs.
Users can also pass a string of extra JVM options to the driver and the executors via
spark.driver.extraJavaOptions and spark.executor.extraJavaOptions, respectively.
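A minimal sketch of adding the same JVM options for both the driver and the executors through the two Spark configuration keys named above. The helper name is hypothetical, and appending to existing options (rather than replacing them) is a choice made for this sketch.

```typescript
type SparkConf = Record<string, string>;

const JVM_OPTION_KEYS = ["spark.driver.extraJavaOptions", "spark.executor.extraJavaOptions"];

// Hypothetical helper: returns a copy of the conf with jvmOptions added for driver and executors.
function withExtraJavaOptions(sparkConf: SparkConf, jvmOptions: string): SparkConf {
  const updated: SparkConf = { ...sparkConf }; // leave the caller's object untouched
  for (const key of JVM_OPTION_KEYS) {
    const existing = updated[key];
    // Append to any options already present rather than overwriting them.
    updated[key] = existing ? `${existing} ${jvmOptions}` : jvmOptions;
  }
  return updated;
}

const conf = withExtraJavaOptions({ "spark.sql.shuffle.partitions": "200" }, "-XX:+UseG1GC");
console.log(conf["spark.driver.extraJavaOptions"]); // -XX:+UseG1GC
```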
Optional spark: An object containing a set of optional, user-specified environment variable key-value pairs.
A key-value pair of the form (X,Y) is exported as is (i.e.,
export X='Y') when the driver and workers are launched.
To specify an additional set of SPARK_DAEMON_JAVA_OPTS, we recommend appending
them to $SPARK_DAEMON_JAVA_OPTS as shown in the example below. This ensures that all
default Databricks-managed environment variables are included as well.
Example Spark environment variables:
{"SPARK_WORKER_MEMORY": "28000m", "SPARK_LOCAL_DIRS": "/local_disk0"} or
{"SPARK_DAEMON_JAVA_OPTS": "$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true"}
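The appending recommendation above can be sketched as a helper that builds the SPARK_DAEMON_JAVA_OPTS value starting from the literal `$SPARK_DAEMON_JAVA_OPTS`, plus a renderer for the `export X='Y'` form. Both helpers are hypothetical; the renderer only mirrors the documented shape and does not model how the platform actually quotes values or expands variables.

```typescript
type SparkEnvVars = Record<string, string>;

// Hypothetical helper: appends options in the recommended form, starting from the
// literal "$SPARK_DAEMON_JAVA_OPTS" so the Databricks-managed defaults are kept.
function appendDaemonJavaOpts(env: SparkEnvVars, opts: string): SparkEnvVars {
  const current = env["SPARK_DAEMON_JAVA_OPTS"] ?? "$SPARK_DAEMON_JAVA_OPTS";
  return { ...env, SPARK_DAEMON_JAVA_OPTS: `${current} ${opts}` };
}

// Renders one pair in the export X='Y' form quoted above (shape only; no escaping).
function toExportLine(name: string, value: string): string {
  return `export ${name}='${value}'`;
}

const env = appendDaemonJavaOpts(
  { SPARK_LOCAL_DIRS: "/local_disk0" },
  "-Dspark.shuffle.service.enabled=true",
);
console.log(env.SPARK_DAEMON_JAVA_OPTS);
// $SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true
console.log(toExportLine("SPARK_LOCAL_DIRS", "/local_disk0"));
// export SPARK_LOCAL_DIRS='/local_disk0'
```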
Optional spark: The Spark version of the cluster, e.g. 3.3.x-scala2.11.
A list of available Spark versions can be retrieved by using
the :method:clusters/sparkVersions API call.
Optional ssh: SSH public key contents that will be added to each Spark node in this cluster. The
corresponding private keys can be used to log in with the user name ubuntu on port 2200.
Up to 10 keys can be specified.
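A small sketch of the two constraints above: at most 10 public keys, and login as `ubuntu` on port 2200. The helper names, the key string, and the host name are placeholders; the service enforces the key limit itself.

```typescript
const MAX_SSH_PUBLIC_KEYS = 10;

// Hypothetical client-side check for the documented key limit.
function checkSshPublicKeys(keys: string[]): void {
  if (keys.length > MAX_SSH_PUBLIC_KEYS) {
    throw new Error(`At most ${MAX_SSH_PUBLIC_KEYS} SSH public keys can be specified; got ${keys.length}.`);
  }
}

// Builds the login command for a node; the host name is a placeholder.
function sshLoginCommand(host: string): string {
  return `ssh -p 2200 ubuntu@${host}`;
}

checkSshPublicKeys(["ssh-ed25519 AAAAexamplekey user@example"]); // within the limit
console.log(sshLoginCommand("node.example.com")); // ssh -p 2200 ubuntu@node.example.com
```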
Optional total: If set, the total initial volume size (in GB) of the remote disks. Currently only supported for GCP HYPERDISK_BALANCED disks.
Optional use: This field can only be used when kind = CLASSIC_PREVIEW.
effective_spark_version is determined by spark_version (the DBR release), this field (use_ml_runtime), and whether node_type_id is a GPU node type.
Optional worker: Flexible node type configuration for worker nodes.
Optional workload
When set to true, fixed and default values from the policy will be used for fields that are omitted. When set to false, only fixed values from the policy will be applied.
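The paragraph above describes how a flag decides which policy values fill omitted fields. A rough sketch under a deliberately simplified, hypothetical policy model in which each governed field is either fixed or has a default; `applyDefaults` is a placeholder for the flag, which is not named here. Fields the user did set are left untouched, and policy validation of those fields is not modeled.

```typescript
// Simplified, hypothetical policy model: each governed field is either fixed or has a default.
interface PolicyRule {
  kind: "fixed" | "default";
  value: string;
}

type Spec = Record<string, string>;

function fillOmittedFromPolicy(
  spec: Spec,
  policy: Record<string, PolicyRule>,
  applyDefaults: boolean,
): Spec {
  const result: Spec = { ...spec };
  for (const [field, rule] of Object.entries(policy)) {
    if (Object.prototype.hasOwnProperty.call(spec, field)) {
      continue; // only omitted fields are filled
    }
    // Fixed values always fill omitted fields; defaults only when the flag is true.
    if (rule.kind === "fixed" || applyDefaults) {
      result[field] = rule.value;
    }
  }
  return result;
}

const policy: Record<string, PolicyRule> = {
  spark_version: { kind: "fixed", value: "13.3.x-scala2.12" },
  num_workers: { kind: "default", value: "2" },
};
console.log(fillOmittedFromPolicy({}, policy, false)); // { spark_version: '13.3.x-scala2.12' }
console.log(fillOmittedFromPolicy({}, policy, true));
```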