Pipelines¶

Package: databricks.bundles.pipelines

Classes¶

class Adlsgen2Info¶

destination: str¶: abfss destination, e.g. abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<directory-name>.

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class AwsAttributes¶

availability: AwsAvailability | None = None¶

ebs_volume_count: int | None = None¶

The number of volumes launched for each instance. Users can choose up to 10 volumes. This feature is only enabled for supported node types. Legacy node types cannot specify custom EBS volumes. For node types with no instance store, at least one EBS volume needs to be specified; otherwise, cluster creation will fail.

These EBS volumes will be mounted at /ebs0, /ebs1, and etc. Instance store volumes will be mounted at /local_disk0, /local_disk1, and etc.

If EBS volumes are attached, Databricks will configure Spark to use only the EBS volumes for scratch storage because heterogenously sized scratch devices can lead to inefficient disk utilization. If no EBS volumes are attached, Databricks will configure Spark to use instance store volumes.

Please note that if EBS volumes are specified, then the Spark configuration spark.local.dir will be overridden.

ebs_volume_iops: int | None = None¶: If using gp3 volumes, what IOPS to use for the disk. If this is not set, the maximum performance of a gp2 volume with the same volume size will be used.

ebs_volume_size: int | None = None¶: The size of each EBS volume (in GiB) launched for each instance. For general purpose SSD, this value must be within the range 100 - 4096. For throughput optimized HDD, this value must be within the range 500 - 4096.

ebs_volume_throughput: int | None = None¶: If using gp3 volumes, what throughput to use for the disk. If this is not set, the maximum performance of a gp2 volume with the same volume size will be used.

ebs_volume_type: EbsVolumeType | None = None¶

first_on_demand: int | None = None¶: The first first_on_demand nodes of the cluster will be placed on on-demand instances. If this value is greater than 0, the cluster driver node in particular will be placed on an on-demand instance. If this value is greater than or equal to the current cluster size, all nodes will be placed on on-demand instances. If this value is less than the current cluster size, first_on_demand nodes will be placed on on-demand instances and the remainder will be placed on availability instances. Note that this value does not affect cluster size and cannot currently be mutated over the lifetime of a cluster.

instance_profile_arn: str | None = None¶

Nodes for this cluster will only be placed on AWS instances with this instance profile. If ommitted, nodes will be placed on instances without an IAM instance profile. The instance profile must have previously been added to the Databricks environment by an account administrator.

This feature may only be available to certain customer plans.

If this field is ommitted, we will pull in the default from the conf if it exists.

spot_bid_price_percent: int | None = None¶

The bid price for AWS spot instances, as a percentage of the corresponding instance type’s on-demand price. For example, if this field is set to 50, and the cluster needs a new r3.xlarge spot instance, then the bid price is half of the price of on-demand r3.xlarge instances. Similarly, if this field is set to 200, the bid price is twice the price of on-demand r3.xlarge instances. If not specified, the default value is 100. When spot instances are requested for this cluster, only spot instances whose bid price percentage matches this field will be considered. Note that, for safety, we enforce this field to be no more than 10000.

The default value and documentation here should be kept consistent with CommonConf.defaultSpotBidPricePercent and CommonConf.maxSpotBidPricePercent.

zone_id: str | None = None¶: Identifier for the availability zone/datacenter in which the cluster resides. This string will be of a form like “us-west-2a”. The provided availability zone must be in the same region as the Databricks deployment. For example, “us-west-2a” is not a valid zone id if the Databricks deployment resides in the “us-east-1” region. This is an optional field at cluster creation, and if not specified, a default zone will be used. If the zone specified is “auto”, will try to place cluster in a zone with high availability, and will retry placement in a different AZ if there is not enough capacity. The list of available zones as well as the default value can be found by using the List Zones method.

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class AwsAvailability¶

Availability type used for all subsequent nodes past the first_on_demand ones.

Note: If first_on_demand is zero, this availability type will be used for the entire cluster.

SPOT = 'SPOT'¶

ON_DEMAND = 'ON_DEMAND'¶

SPOT_WITH_FALLBACK = 'SPOT_WITH_FALLBACK'¶

class AzureAttributes¶

availability: AzureAvailability | None = None¶

first_on_demand: int | None = None¶: The first first_on_demand nodes of the cluster will be placed on on-demand instances. This value should be greater than 0, to make sure the cluster driver node is placed on an on-demand instance. If this value is greater than or equal to the current cluster size, all nodes will be placed on on-demand instances. If this value is less than the current cluster size, first_on_demand nodes will be placed on on-demand instances and the remainder will be placed on availability instances. Note that this value does not affect cluster size and cannot currently be mutated over the lifetime of a cluster.

log_analytics_info: LogAnalyticsInfo | None = None¶: Defines values necessary to configure and run Azure Log Analytics agent

spot_bid_max_price: float | None = None¶: The max bid price to be used for Azure spot instances. The Max price for the bid cannot be higher than the on-demand price of the instance. If not specified, the default value is -1, which specifies that the instance cannot be evicted on the basis of price, and only on the basis of availability. Further, the value should > 0 or -1.

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class AzureAvailability¶

Availability type used for all subsequent nodes past the first_on_demand ones. Note: If first_on_demand is zero (which only happens on pool clusters), this availability type will be used for the entire cluster.

SPOT_AZURE = 'SPOT_AZURE'¶

ON_DEMAND_AZURE = 'ON_DEMAND_AZURE'¶

SPOT_WITH_FALLBACK_AZURE = 'SPOT_WITH_FALLBACK_AZURE'¶

class ClusterLogConf¶

dbfs: DbfsStorageInfo | None = None¶: destination needs to be provided. e.g. { “dbfs” : { “destination” : “dbfs:/home/cluster_log” } }

s3: S3StorageInfo | None = None¶: destination and either the region or endpoint need to be provided. e.g. { “s3”: { “destination” : “s3://cluster_log_bucket/prefix”, “region” : “us-west-2” } } Cluster iam role is used to access s3, please make sure the cluster iam role in instance_profile_arn has permission to write data to the s3 destination.

volumes: VolumesStorageInfo | None = None¶: destination needs to be provided. e.g. { “volumes” : { “destination” : “/Volumes/catalog/schema/volume/cluster_log” } }

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class DbfsStorageInfo¶

destination: str¶: dbfs destination, e.g. dbfs:/my/path

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class EbsVolumeType¶

The type of EBS volumes that will be launched with this cluster.

GENERAL_PURPOSE_SSD = 'GENERAL_PURPOSE_SSD'¶

THROUGHPUT_OPTIMIZED_HDD = 'THROUGHPUT_OPTIMIZED_HDD'¶

class EventLogSpec¶

Configurable event log parameters.

catalog: str | None = None¶: The UC catalog the event log is published under.

name: str | None = None¶: The name the event log is published to in UC.

schema: str | None = None¶: The UC schema the event log is published under.

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class FileLibrary¶

path: str | None = None¶: The absolute path of the file.

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class Filters¶

exclude: list[str]¶: Paths to exclude.

include: list[str]¶: Paths to include.

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class GcpAttributes¶

availability: GcpAvailability | None = None¶

boot_disk_size: int | None = None¶: boot disk size in GB

google_service_account: str | None = None¶: If provided, the cluster will impersonate the google service account when accessing gcloud services (like GCS). The google service account must have previously been added to the Databricks environment by an account administrator.

local_ssd_count: int | None = None¶: If provided, each node (workers and driver) in the cluster will have this number of local SSDs attached. Each local SSD is 375GB in size. Refer to GCP documentation for the supported number of local SSDs for each instance type.

use_preemptible_executors: bool | None = None¶: This field determines whether the spark executors will be scheduled to run on preemptible VMs (when set to true) versus standard compute engine VMs (when set to false; default). Note: Soon to be deprecated, use the availability field instead.

zone_id: str | None = None¶: Identifier for the availability zone in which the cluster resides. This can be one of the following: - “HA” => High availability, spread nodes across availability zones for a Databricks deployment region [default] - “AUTO” => Databricks picks an availability zone to schedule the cluster on. - A GCP availability zone => Pick One of the available zones for (machine type + region) from https://cloud.google.com/compute/docs/regions-zones.

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class GcpAvailability¶

This field determines whether the instance pool will contain preemptible VMs, on-demand VMs, or preemptible VMs with a fallback to on-demand VMs if the former is unavailable.

PREEMPTIBLE_GCP = 'PREEMPTIBLE_GCP'¶

ON_DEMAND_GCP = 'ON_DEMAND_GCP'¶

PREEMPTIBLE_WITH_FALLBACK_GCP = 'PREEMPTIBLE_WITH_FALLBACK_GCP'¶

class GcsStorageInfo¶

destination: str¶: GCS destination/URI, e.g. gs://my-bucket/some-prefix

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class IngestionConfig¶

report: ReportSpec | None = None¶: Select a specific source report.

schema: SchemaSpec | None = None¶: Select all tables from a specific source schema.

table: TableSpec | None = None¶: Select a specific source table.

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class IngestionPipelineDefinition¶

connection_name: str | None = None¶: Immutable. The Unity Catalog connection that this ingestion pipeline uses to communicate with the source. This is used with connectors for applications like Salesforce, Workday, and so on.

ingestion_gateway_id: str | None = None¶: Immutable. Identifier for the gateway that is used by this ingestion pipeline to communicate with the source database. This is used with connectors to databases like SQL Server.

objects: list[IngestionConfig]¶: Required. Settings specifying tables to replicate and the destination for the replicated tables.

table_configuration: TableSpecificConfig | None = None¶: Configuration settings to control the ingestion of tables. These settings are applied to all tables in the pipeline.

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class InitScriptInfo¶

abfss: Adlsgen2Info | None = None¶: Contains the Azure Data Lake Storage destination path

dbfs: DbfsStorageInfo | None = None¶: destination needs to be provided. e.g. { “dbfs” : { “destination” : “dbfs:/home/cluster_log” } }

file: LocalFileInfo | None = None¶: destination needs to be provided. e.g. { “file” : { “destination” : “file:/my/local/file.sh” } }

gcs: GcsStorageInfo | None = None¶: destination needs to be provided. e.g. { “gcs”: { “destination”: “gs://my-bucket/file.sh” } }

s3: S3StorageInfo | None = None¶: destination and either the region or endpoint need to be provided. e.g. { “s3”: { “destination” : “s3://cluster_log_bucket/prefix”, “region” : “us-west-2” } } Cluster iam role is used to access s3, please make sure the cluster iam role in instance_profile_arn has permission to write data to the s3 destination.

volumes: VolumesStorageInfo | None = None¶: destination needs to be provided. e.g. { “volumes” : { “destination” : “/Volumes/my-init.sh” } }

workspace: WorkspaceStorageInfo | None = None¶: destination needs to be provided. e.g. { “workspace” : { “destination” : “/Users/user1@databricks.com/my-init.sh” } }

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class LocalFileInfo¶

destination: str¶: local file destination, e.g. file:/my/local/file.sh

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class LogAnalyticsInfo¶

log_analytics_primary_key: str | None = None¶: The primary key for the Azure Log Analytics agent configuration

log_analytics_workspace_id: str | None = None¶: The workspace ID for the Azure Log Analytics agent configuration

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class MavenLibrary¶

coordinates: str¶: Gradle-style maven coordinates. For example: “org.jsoup:jsoup:1.7.2”.

exclusions: list[str]¶

List of dependences to exclude. For example: [“slf4j:slf4j”, “*:hadoop-client”].

Maven dependency exclusions: https://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html.

repo: str | None = None¶: Maven repo to install the Maven package from. If omitted, both Maven Central Repository and Spark Packages are searched.

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class NotebookLibrary¶

path: str | None = None¶: The absolute path of the notebook.

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class Notifications¶

alerts: list[str]¶

A list of alerts that trigger the sending of notifications to the configured destinations. The supported alerts are:

on-update-success: A pipeline update completes successfully.
on-update-failure: Each time a pipeline update fails.
on-update-fatal-failure: A pipeline update fails with a non-retryable (fatal) error.
on-flow-failure: A single data flow fails.

email_recipients: list[str]¶: A list of email addresses notified when a configured alert is triggered.

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class Pipeline¶

catalog: str | None = None¶: A catalog in Unity Catalog to publish data from this pipeline to. If target is specified, tables in this pipeline are published to a target schema inside catalog (for example, catalog.`target`.`table`). If target is not specified, no data is published to Unity Catalog.

channel: str | None = None¶: DLT Release Channel that specifies which version to use.

clusters: list[PipelineCluster]¶: Cluster settings for this pipeline deployment.

configuration: dict[str, str]¶: String-String configuration for this pipeline execution.

continuous: bool | None = None¶: Whether the pipeline is continuous or triggered. This replaces trigger.

development: bool | None = None¶: Whether the pipeline is in Development mode. Defaults to false.

edition: str | None = None¶: Pipeline product edition.

event_log: EventLogSpec | None = None¶: Event log configuration for this pipeline

filters: Filters | None = None¶: Filters on which Pipeline packages to include in the deployed graph.

id: str | None = None¶: Unique identifier for this pipeline.

ingestion_definition: IngestionPipelineDefinition | None = None¶: The configuration for a managed ingestion pipeline. These settings cannot be used with the ‘libraries’, ‘schema’, ‘target’, or ‘catalog’ settings.

libraries: list[PipelineLibrary]¶: Libraries or code needed by this deployment.

name: str | None = None¶: Friendly identifier for this pipeline.

notifications: list[Notifications]¶: List of notification settings for this pipeline.

permissions: list[PipelinePermission]¶

photon: bool | None = None¶: Whether Photon is enabled for this pipeline.

schema: str | None = None¶: The default schema (database) where tables are read from or published to.

serverless: bool | None = None¶: Whether serverless compute is enabled for this pipeline.

storage: str | None = None¶: DBFS root directory for storing checkpoints and tables.

target: str | None = None¶: Target schema (database) to add tables in this pipeline to. Exactly one of schema or target must be specified. To publish to Unity Catalog, also specify catalog. This legacy field is deprecated for pipeline creation in favor of the schema field.

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class PipelineCluster¶

apply_policy_default_values: bool | None = None¶: Note: This field won’t be persisted. Only API users will check this field.

autoscale: PipelineClusterAutoscale | None = None¶: Parameters needed in order to automatically scale clusters up and down based on load. Note: autoscaling works best with DB runtime versions 3.0 or later.

aws_attributes: AwsAttributes | None = None¶: Attributes related to clusters running on Amazon Web Services. If not specified at cluster creation, a set of default values will be used.

azure_attributes: AzureAttributes | None = None¶: Attributes related to clusters running on Microsoft Azure. If not specified at cluster creation, a set of default values will be used.

cluster_log_conf: ClusterLogConf | None = None¶: The configuration for delivering spark logs to a long-term storage destination. Only dbfs destinations are supported. Only one destination can be specified for one cluster. If the conf is given, the logs will be delivered to the destination every 5 mins. The destination of driver logs is $destination/$clusterId/driver, while the destination of executor logs is $destination/$clusterId/executor.

custom_tags: dict[str, str]¶

Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS instances and EBS volumes) with these tags in addition to default_tags. Notes:

Currently, Databricks allows at most 45 custom tags
Clusters can only reuse cloud resources if the resources’ tags are a subset of the cluster tags

driver_instance_pool_id: str | None = None¶: The optional ID of the instance pool for the driver of the cluster belongs. The pool cluster uses the instance pool with id (instance_pool_id) if the driver pool is not assigned.

driver_node_type_id: str | None = None¶: The node type of the Spark driver. Note that this field is optional; if unset, the driver node type will be set as the same value as node_type_id defined above.

enable_local_disk_encryption: bool | None = None¶: Whether to enable local disk encryption for the cluster.

gcp_attributes: GcpAttributes | None = None¶: Attributes related to clusters running on Google Cloud Platform. If not specified at cluster creation, a set of default values will be used.

init_scripts: list[InitScriptInfo]¶: The configuration for storing init scripts. Any number of destinations can be specified. The scripts are executed sequentially in the order provided. If cluster_log_conf is specified, init script logs are sent to <destination>/<cluster-ID>/init_scripts.

instance_pool_id: str | None = None¶: The optional ID of the instance pool to which the cluster belongs.

label: str | None = None¶: A label for the cluster specification, either default to configure the default cluster, or maintenance to configure the maintenance cluster. This field is optional. The default value is default.

node_type_id: str | None = None¶: This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory or compute intensive workloads. A list of available node types can be retrieved by using the :method:clusters/listNodeTypes API call.

num_workers: int | None = None¶

Number of worker nodes that this cluster should have. A cluster has one Spark Driver and num_workers Executors for a total of num_workers + 1 Spark nodes.

Note: When reading the properties of a cluster, this field reflects the desired number of workers rather than the actual current number of workers. For instance, if a cluster is resized from 5 to 10 workers, this field will immediately be updated to reflect the target size of 10 workers, whereas the workers listed in spark_info will gradually increase from 5 to 10 as the new nodes are provisioned.

policy_id: str | None = None¶: The ID of the cluster policy used to create the cluster if applicable.

spark_conf: dict[str, str]¶: An object containing a set of optional, user-specified Spark configuration key-value pairs. See :method:clusters/create for more details.

spark_env_vars: dict[str, str]¶

An object containing a set of optional, user-specified environment variable key-value pairs. Please note that key-value pair of the form (X,Y) will be exported as is (i.e., export X=’Y’) while launching the driver and workers.

In order to specify an additional set of SPARK_DAEMON_JAVA_OPTS, we recommend appending them to $SPARK_DAEMON_JAVA_OPTS as shown in the example below. This ensures that all default databricks managed environmental variables are included as well.

Example Spark environment variables: {“SPARK_WORKER_MEMORY”: “28000m”, “SPARK_LOCAL_DIRS”: “/local_disk0”} or {“SPARK_DAEMON_JAVA_OPTS”: “$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true”}

ssh_public_keys: list[str]¶: SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to login with the user name ubuntu on port 2200. Up to 10 keys can be specified.

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class PipelineClusterAutoscale¶

max_workers: int¶: The maximum number of workers to which the cluster can scale up when overloaded. max_workers must be strictly greater than min_workers.

min_workers: int¶: The minimum number of workers the cluster can scale down to when underutilized. It is also the initial number of workers the cluster will have after creation.

mode: PipelineClusterAutoscaleMode | None = None¶: Databricks Enhanced Autoscaling optimizes cluster utilization by automatically allocating cluster resources based on workload volume, with minimal impact to the data processing latency of your pipelines. Enhanced Autoscaling is available for updates clusters only. The legacy autoscaling feature is used for maintenance clusters.

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class PipelineClusterAutoscaleMode¶

Databricks Enhanced Autoscaling optimizes cluster utilization by automatically allocating cluster resources based on workload volume, with minimal impact to the data processing latency of your pipelines. Enhanced Autoscaling is available for updates clusters only. The legacy autoscaling feature is used for maintenance clusters.

ENHANCED = 'ENHANCED'¶

LEGACY = 'LEGACY'¶

class PipelineLibrary¶

file: FileLibrary | None = None¶: The path to a file that defines a pipeline and is stored in the Databricks Repos.

notebook: NotebookLibrary | None = None¶: The path to a notebook that defines a pipeline and is stored in the Databricks workspace.

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class PipelinePermission¶

level: PipelinePermissionLevel¶

group_name: str | None = None¶

service_principal_name: str | None = None¶

user_name: str | None = None¶

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class PipelinePermissionLevel¶

CAN_MANAGE = 'CAN_MANAGE'¶

CAN_RUN = 'CAN_RUN'¶

CAN_VIEW = 'CAN_VIEW'¶

IS_OWNER = 'IS_OWNER'¶

class ReportSpec¶

destination_catalog: str | None = None¶: Required. Destination catalog to store table.

destination_schema: str | None = None¶: Required. Destination schema to store table.

destination_table: str | None = None¶: Required. Destination table name. The pipeline fails if a table with that name already exists.

source_url: str | None = None¶: Required. Report URL in the source system.

table_configuration: TableSpecificConfig | None = None¶: Configuration settings to control the ingestion of tables. These settings override the table_configuration defined in the IngestionPipelineDefinition object.

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class S3StorageInfo¶

destination: str¶: S3 destination, e.g. s3://my-bucket/some-prefix Note that logs will be delivered using cluster iam role, please make sure you set cluster iam role and the role has write access to the destination. Please also note that you cannot use AWS keys to deliver logs.

canned_acl: str | None = None¶: (Optional) Set canned access control list for the logs, e.g. bucket-owner-full-control. If canned_cal is set, please make sure the cluster iam role has s3:PutObjectAcl permission on the destination bucket and prefix. The full list of possible canned acl can be found at http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl. Please also note that by default only the object owner gets full controls. If you are using cross account role for writing data, you may want to set bucket-owner-full-control to make bucket owner able to read the logs.

enable_encryption: bool | None = None¶: (Optional) Flag to enable server side encryption, false by default.

encryption_type: str | None = None¶: (Optional) The encryption type, it could be sse-s3 or sse-kms. It will be used only when encryption is enabled and the default type is sse-s3.

endpoint: str | None = None¶: S3 endpoint, e.g. https://s3-us-west-2.amazonaws.com. Either region or endpoint needs to be set. If both are set, endpoint will be used.

kms_key: str | None = None¶: (Optional) Kms key which will be used if encryption is enabled and encryption type is set to sse-kms.

region: str | None = None¶: S3 region, e.g. us-west-2. Either region or endpoint needs to be set. If both are set, endpoint will be used.

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class SchemaSpec¶

destination_catalog: str | None = None¶: Required. Destination catalog to store tables.

destination_schema: str | None = None¶: Required. Destination schema to store tables in. Tables with the same name as the source tables are created in this destination schema. The pipeline fails If a table with the same name already exists.

source_catalog: str | None = None¶: The source catalog name. Might be optional depending on the type of source.

source_schema: str | None = None¶: Required. Schema name in the source database.

table_configuration: TableSpecificConfig | None = None¶: Configuration settings to control the ingestion of tables. These settings are applied to all tables in this schema and override the table_configuration defined in the IngestionPipelineDefinition object.

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class TableSpec¶

destination_catalog: str | None = None¶: Required. Destination catalog to store table.

destination_schema: str | None = None¶: Required. Destination schema to store table.

destination_table: str | None = None¶: Optional. Destination table name. The pipeline fails if a table with that name already exists. If not set, the source table name is used.

source_catalog: str | None = None¶: Source catalog name. Might be optional depending on the type of source.

source_schema: str | None = None¶: Schema name in the source database. Might be optional depending on the type of source.

source_table: str | None = None¶: Required. Table name in the source database.

table_configuration: TableSpecificConfig | None = None¶: Configuration settings to control the ingestion of tables. These settings override the table_configuration defined in the IngestionPipelineDefinition object and the SchemaSpec.

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class TableSpecificConfig¶

primary_keys: list[str]¶: The primary key of the table used to apply changes.

sequence_by: list[str]¶: The column names specifying the logical order of events in the source data. Delta Live Tables uses this sequencing to handle change events that arrive out of order.

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class VolumesStorageInfo¶

destination: str¶: Unity Catalog volumes file destination, e.g. /Volumes/catalog/schema/volume/dir/file

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

class WorkspaceStorageInfo¶

destination: str¶: workspace files destination, e.g. /Users/user1@databricks.com/my-init.sh

classmethod from_dict( value: dict, ) → Self¶

as_dict( self, ) → dict¶

Navigation

Related Topics

Pipelines¶

Classes¶