
Getting Started

SRA Installation and Deployment Steps

Follow the steps below to deploy the Security Reference Architecture (SRA) using Terraform:

  1. Clone the SRA repository.
  2. Install Terraform.
  3. Navigate to the azure -> tf folder and open the template.tfvars.example file.
    • Fill in values for every required variable and for the feature flags relevant to your deployment (an illustrative sketch follows this list).
    • Rename the file to terraform.tfvars.
  4. From the terminal, ensure your working directory is the tf folder.
  5. Run terraform init.
  6. Run terraform validate.
  7. Run terraform plan.
  8. Run terraform apply.
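
A filled-in terraform.tfvars might look like the sketch below. The variable names are illustrative assumptions rather than the repository's actual schema; the authoritative set of required variables and feature flags is defined in template.tfvars.example.

# terraform.tfvars -- illustrative sketch; variable names are hypothetical.
databricks_account_id = "00000000-0000-0000-0000-000000000000" # Databricks account ID
location              = "eastus2"       # Azure region for the deployment
hub_vnet_cidr         = "10.0.0.0/22"   # address space for the hub virtual network
spoke_vnet_cidr       = "10.1.0.0/22"   # address space for the spoke virtual network

# Feature flags -- enable only what the deployment needs.
enable_sat = true # deploy the Security Analysis Tool (SAT)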

Provider Initialization with Azure CLI and Possible Errors

If you are using Azure CLI authentication, you may encounter the following error:

Error: cannot create mws network connectivity config: io.jsonwebtoken.IncorrectClaimException: Expected iss claim to be: https://sts.windows.net/00000000-0000-0000-0000-000000000000/, but was: https://sts.windows.net/ffffffff-ffff-ffff-ffff-ffffffffffff/

This typically happens if you are running Terraform in a tenant where you are a guest user, or if you have multiple Azure accounts configured.

To resolve this error, set the Azure Tenant ID by exporting the ARM_TENANT_ID environment variable:

export ARM_TENANT_ID="00000000-0000-0000-0000-000000000000"

Alternatively, you can set the tenant ID directly in the Databricks provider configuration (see the provider documentation for details).
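
For example, a minimal account-level provider block that pins the tenant might look like the sketch below; host, account_id, and azure_tenant_id are documented provider arguments, and the IDs are placeholders.

# Minimal sketch: pin the Azure tenant the Databricks provider authenticates
# against when multiple tenants or accounts are configured locally.
provider "databricks" {
  alias           = "account"
  host            = "https://accounts.azuredatabricks.net"
  account_id      = "00000000-0000-0000-0000-000000000000" # your Databricks account ID
  azure_tenant_id = "00000000-0000-0000-0000-000000000000" # tenant that owns the subscription
}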

You may also encounter errors like the following when Terraform begins provisioning workspace resources:


│ Error: cannot read current user: Unauthorized access to Org: 0000000000000000

│ with module.sat[0].module.sat.data.databricks_current_user.me,
│ on .terraform/modules/sat.sat/terraform/common/data.tf line 1, in data "databricks_current_user" "me":
│ 1: data "databricks_current_user" "me" {}

To fix this error:

  1. Log in to the newly created spoke workspace by clicking Launch Workspace in the Azure portal.

  2. Ensure you log in as the same user that is running Terraform.

  3. Alternatively, grant workspace admin permissions to the Terraform user after the first user launches the workspace (see the sketch below).
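
If you take the admin-grant route, the assignment can also be managed in Terraform with the databricks_mws_permission_assignment resource. The sketch below is not the SRA's own code: the provider alias, workspace reference, and principal variable are assumed placeholders.

# Minimal sketch: grant the Terraform identity ADMIN on the new workspace
# at the account level. All references below are placeholders.
resource "databricks_mws_permission_assignment" "terraform_admin" {
  provider     = databricks.account # assumed account-level provider alias
  workspace_id = azurerm_databricks_workspace.this.workspace_id # numeric workspace ID (assumed resource name)
  principal_id = var.terraform_principal_id # account-level principal ID of the Terraform user (assumed variable)
  permissions  = ["ADMIN"]
}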

Critical Next Steps

The following steps outline essential security configurations that should be implemented after the initial deployment to further harden and operationalize the Databricks environment.

  • Implement a Front-End Mitigation Strategy:
    • IP Access Lists: IP access lists let you control which networks can connect to your Azure Databricks account and workspaces (see the sketch after this list).
    • Front-End PrivateLink: Front-End PrivateLink establishes a private connection to the Databricks web application over the Azure backbone, preventing exposure to the public internet.
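
As a hedged example of the first option, IP access lists can be managed through the Databricks Terraform provider. The sketch below assumes a workspace-level provider; the label and CIDR are placeholders, and enableIpAccessLists must be switched on before any list takes effect.

# Minimal sketch: restrict front-end access to an approved network range.
resource "databricks_workspace_conf" "this" {
  custom_config = {
    "enableIpAccessLists" = true # lists are ignored until this flag is set
  }
}

resource "databricks_ip_access_list" "allowed" {
  depends_on   = [databricks_workspace_conf.this]
  label        = "corp-vpn"         # example label
  list_type    = "ALLOW"
  ip_addresses = ["203.0.113.0/24"] # placeholder documentation range
}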

Additional Security Recommendations

  • Segment Workspaces for Data Separation: Use distinct workspaces for different teams or functions (e.g., security, marketing) to enforce data access boundaries and reduce risk exposure.
  • Avoid Storing Production Datasets in Databricks File Store (DBFS): The DBFS root is accessible to all users in a workspace. Use external storage locations for production data and databases to ensure proper access control and auditing (see the first sketch after this list).
  • Back Up Assets from the Databricks Control Plane: Regularly export and back up notebooks, jobs, and configurations using tools such as the Databricks Terraform Exporter.
  • Regularly Restart Classic Compute Clusters: Restart clusters periodically to ensure the latest compute images and security patches are applied. Databricks recommends that admins restart clusters manually during a scheduled maintenance window to minimize the risk of disrupting scheduled jobs or workflows.
  • Implement a Tagging Strategy: Cluster and pool tags enable organizations to monitor costs and accurately attribute Databricks usage to specific business units or teams. These tags propagate to detailed DBU usage reports, supporting cost analysis and internal chargeback processes (see the second sketch after this list).
  • Integrate CI/CD and Code Management: Evaluate workflow needs for Git-based version control and CI/CD automation. Incorporate code scanning, permission enforcement, and secret detection to enhance governance and operational efficiency.
  • Run and Monitor the Security Analysis Tool (SAT): SAT analyzes your Databricks account and workspace configurations, providing recommendations to help you follow Databricks' security best practices.
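
For the DBFS recommendation above, one way to keep production data in governed external storage is a Unity Catalog external location. This sketch assumes an existing Databricks access connector; the storage account, container, and variable names are placeholders.

# Minimal sketch: production data sits in ADLS behind a storage credential
# and external location rather than the DBFS root. Names are placeholders.
resource "databricks_storage_credential" "prod" {
  name = "prod-credential"
  azure_managed_identity {
    access_connector_id = var.access_connector_id # existing access connector (assumed variable)
  }
}

resource "databricks_external_location" "prod" {
  name            = "prod-data"
  url             = "abfss://prod@examplestorage.dfs.core.windows.net/" # placeholder account/container
  credential_name = databricks_storage_credential.prod.name
  comment         = "Production datasets live outside the DBFS root"
}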
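
For the tagging recommendation, custom_tags set on clusters (and pools) flow through to DBU usage reports. The tag keys and values below are illustrative.

# Minimal sketch: attach chargeback tags to a cluster definition.
data "databricks_spark_version" "latest_lts" {
  long_term_support = true
}

data "databricks_node_type" "smallest" {
  local_disk = true
}

resource "databricks_cluster" "shared" {
  cluster_name            = "shared-autoscaling"
  spark_version           = data.databricks_spark_version.latest_lts.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 30

  autoscale {
    min_workers = 1
    max_workers = 4
  }

  custom_tags = {
    "team"        = "security" # surfaces in DBU usage reports
    "cost-center" = "cc-1234"  # example chargeback tag
  }
}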