SRA Components Breakdown

This section outlines the core components of the Security Reference Architecture (SRA). Several .tf scripts contain direct links to the Databricks Terraform documentation. The full Databricks Terraform Provider Documentation can be found here.

Network Configuration

Two network configurations are available for workspaces: Isolated and Custom.

  • Isolated (Default): Restricts all traffic from reaching the public internet. Communication is limited to AWS PrivateLink endpoints for AWS services and the Databricks control plane.

    Note: A Unity Catalog–only configuration is required for clusters running without internet access. Read the official documentation for details.

  • Custom: Enables specification of a VPC ID, subnet IDs, security group IDs, and Databricks PrivateLink endpoint IDs. This option is recommended when networking assets are provisioned in separate pipelines or pre-assigned by a centralized infrastructure team.
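In the Custom configuration, the pre-provisioned networking assets are registered with the Databricks account through the Terraform provider. The sketch below shows roughly what that registration looks like; the variable names (var.vpc_id, var.subnet_ids, and so on) are illustrative assumptions, not the repository's actual input names.

```hcl
# Sketch: registering a customer-managed (Custom) network with the
# Databricks account. All IDs are placeholders supplied by a separate
# networking pipeline or a centralized infrastructure team.
resource "databricks_mws_networks" "this" {
  account_id         = var.databricks_account_id
  network_name       = "sra-network"
  vpc_id             = var.vpc_id
  subnet_ids         = var.subnet_ids
  security_group_ids = var.security_group_ids

  # Databricks VPC endpoint registration IDs for back-end PrivateLink.
  vpc_endpoints {
    dataplane_relay = [var.relay_vpc_endpoint_id]
    rest_api        = [var.rest_api_vpc_endpoint_id]
  }
}
```

The registered network is then referenced when the workspace itself is created.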

Core AWS Components

  • Customer Managed VPC: A customer-managed VPC provides full control over network configurations to meet organizational cloud security and governance standards.
  • S3 Buckets: Three S3 buckets are created to support workspace root storage, Unity Catalog data, and audit log delivery.
  • IAM Roles: Three IAM roles are created to support workspace provisioning (the cross-account role), Unity Catalog storage access, and audit log delivery.
  • Scoped-down IAM Policy for the Databricks Cross-Account Role: A cross-account role is required for clusters provisioned within the classic compute plane. The role is scoped to the VPC, subnets, and security group associated with the deployment.
  • AWS VPC Endpoints for S3 Gateway, STS, and Kinesis: AWS PrivateLink is used to connect the VPC to AWS services without traversing public IP addresses. A gateway endpoint for S3 and interface endpoints for STS and Kinesis are best practice for enterprise Databricks deployments. Additional endpoints, such as those for AWS DynamoDB and AWS Glue, can be configured based on your use cases.

    NOTE: In the Isolated network mode, restrictive VPC endpoint policies are applied for S3, STS, and Kinesis. These must be updated if additional access is required through the classic compute plane.

  • Back-end AWS PrivateLink Connectivity: Ensures private communication between the classic compute plane and the Databricks control plane (Back-end PrivateLink) via Databricks-specific interface VPC endpoints. Front-end PrivateLink, which keeps user traffic on the AWS backbone, is available but not included in these templates.
  • AWS KMS Keys: Three AWS KMS keys are created to encrypt managed services data in the Databricks control plane, workspace storage (the root S3 bucket and cluster EBS volumes), and the Unity Catalog bucket.
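The S3, STS, and Kinesis endpoints described above can be sketched with standard AWS provider resources. The variable names here are assumptions for illustration; in an Isolated deployment each endpoint would additionally carry a restrictive policy document.

```hcl
# Sketch: AWS service endpoints for the classic compute plane.
# var.vpc_id, var.private_route_table_ids, var.privatelink_subnet_ids,
# and var.security_group_ids are assumed inputs.
data "aws_region" "current" {}

# S3 uses a gateway endpoint attached to the private route tables.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${data.aws_region.current.name}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.private_route_table_ids
}

# STS and Kinesis use interface endpoints in the PrivateLink subnets.
resource "aws_vpc_endpoint" "sts" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${data.aws_region.current.name}.sts"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.privatelink_subnet_ids
  security_group_ids  = var.security_group_ids
  private_dns_enabled = true
}

resource "aws_vpc_endpoint" "kinesis" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${data.aws_region.current.name}.kinesis-streams"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.privatelink_subnet_ids
  security_group_ids  = var.security_group_ids
  private_dns_enabled = true
}
```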
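Each KMS key is registered with the Databricks account before it can be attached to a workspace. A minimal sketch for the managed-services key, assuming the key and alias resources are defined elsewhere in the configuration:

```hcl
# Sketch: registering a customer-managed key with the Databricks
# account. aws_kms_key.managed_services and aws_kms_alias.managed_services
# are assumed to exist elsewhere; the storage key would use
# use_cases = ["STORAGE"] instead.
resource "databricks_mws_customer_managed_keys" "managed_services" {
  account_id = var.databricks_account_id
  aws_key_info {
    key_arn   = aws_kms_key.managed_services.arn
    key_alias = aws_kms_alias.managed_services.name
  }
  use_cases = ["MANAGED_SERVICES"]
}
```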

Core Databricks Components

  • Unity Catalog: Unity Catalog is a unified governance solution for data and AI assets such as files, tables, and machine learning models. Unity Catalog enforces fine-grained access controls, centralized policy management, auditing, and lineage tracking—all integrated into the Databricks workflow.
  • System Table Schemas: System Tables provide operational visibility across access, compute, Lakeflow, query, serving, and storage logs. These tables are located within the system catalog in Unity Catalog.
  • Audit Log Delivery: Enables low-latency delivery of Databricks audit logs to an S3 bucket within the customer's AWS account. Audit Logs capture both workspace-level and account-level events, with an option to enable verbose logging for more detailed event data.

    NOTE: Audit log delivery can only be configured twice per account. Once enabled, set audit_log_delivery_exists = true for subsequent runs.

  • Network Connectivity Configuration: Serverless network connectivity is managed with network connectivity configurations (NCCs): account-level, regional constructs used to manage private endpoint creation and firewall enablement at scale. An NCC is created and attached to the workspace; it supplies the list of stable IP addresses that serverless compute in the workspace uses to connect to customer cloud resources.
  • Restrictive Network Policy: Network Policies implement egress controls for serverless compute by enforcing a restrictive network policy that permits outbound traffic only to required data buckets.
  • Example Classic Cluster: Includes a sample cluster and associated cluster policy to illustrate secure configuration patterns.

    NOTE: Deploying this example creates a cluster within the Databricks workspace, including the underlying AWS EC2 instance.
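Audit log delivery ties together a log-writer credential and a storage configuration at the account level. A hedged sketch, assuming the referenced credentials and storage configuration resources are defined elsewhere:

```hcl
# Sketch: account-level audit log delivery to a customer-owned S3
# bucket. databricks_mws_credentials.log_writer and
# databricks_mws_storage_configurations.log_bucket are assumed names.
resource "databricks_mws_log_delivery" "audit_logs" {
  account_id               = var.databricks_account_id
  config_name              = "audit-log-delivery"
  credentials_id           = databricks_mws_credentials.log_writer.credentials_id
  storage_configuration_id = databricks_mws_storage_configurations.log_bucket.storage_configuration_id
  log_type                 = "AUDIT_LOGS"
  output_format            = "JSON"
}
```

Because delivery can only be configured a limited number of times per account, this resource is the piece gated behind the audit_log_delivery_exists flag on subsequent runs.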
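Creating an NCC and binding it to a workspace takes two provider resources. The name and variables below are illustrative assumptions:

```hcl
# Sketch: an account-level network connectivity configuration (NCC)
# bound to a workspace for serverless connectivity. var.region and
# var.workspace_id are assumed inputs.
resource "databricks_mws_network_connectivity_config" "ncc" {
  name   = "sra-ncc"
  region = var.region
}

resource "databricks_mws_ncc_binding" "this" {
  network_connectivity_config_id = databricks_mws_network_connectivity_config.ncc.network_connectivity_config_id
  workspace_id                   = var.workspace_id
}
```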
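The example cluster pattern pairs a cluster policy with a cluster that pins itself to that policy. A minimal sketch, in which the policy rule, instance type, and autotermination values are illustrative assumptions rather than the repository's actual settings:

```hcl
# Sketch: a sample cluster policy and a classic cluster governed by it.
# Applying this creates a real cluster backed by an AWS EC2 instance.
resource "databricks_cluster_policy" "example" {
  name = "sra-example-policy"
  definition = jsonencode({
    "autotermination_minutes" : {
      "type" : "range", "maxValue" : 60, "defaultValue" : 30
    }
  })
}

data "databricks_spark_version" "lts" {
  long_term_support = true
}

resource "databricks_cluster" "example" {
  cluster_name            = "sra-example-cluster"
  spark_version           = data.databricks_spark_version.lts.id
  node_type_id            = "m5.large" # assumed instance type
  policy_id               = databricks_cluster_policy.example.id
  data_security_mode      = "USER_ISOLATION"
  autotermination_minutes = 30
  num_workers             = 1
}
```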