
Getting Started

SRA Installation and Deployment Steps

Follow the steps below to deploy the Security Reference Architecture (SRA) using Terraform:

  1. Clone the SRA repository.
  2. Install Terraform.
  3. Navigate to the gcp -> examples folder, choose the deployment model to use (byo_gcp_workspace_deployment or simple_workspace_deployment), and open the *.tfvars.example file.
    • Fill in values for all of the variables and the feature flags relevant to your deployment (an illustrative terraform.tfvars sketch follows these steps).
    • Rename the file to terraform.tfvars.
  4. From the terminal, ensure your working directory is the folder containing the Terraform (.tf) files for the deployment model you chose.
  5. Run terraform init.
  6. Run terraform validate.
  7. Run terraform plan.
  8. Run terraform apply.
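
The sketch below illustrates what a completed terraform.tfvars might look like. The variable names shown are hypothetical placeholders; use the exact names defined in the *.tfvars.example file for the deployment model you selected.

# Illustrative terraform.tfvars (variable names are hypothetical placeholders)
databricks_account_id  = "00000000-0000-0000-0000-000000000000"
google_project         = "my-gcp-project"          # target GCP project
google_region          = "us-central1"             # region for the workspace
workspace_name         = "sra-workspace"           # name of the workspace to create

# Feature flags controlling optional hardening (names illustrative)
enable_ip_access_lists = true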

NOTE: When deploying the workspace module, you must set the DATABRICKS_GOOGLE_SERVICE_ACCOUNT environment variable to the Service Account email that will be used for authentication.

Example:

export DATABRICKS_GOOGLE_SERVICE_ACCOUNT=<<Your GCP Service Account Email>>
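
If you prefer not to rely on the environment variable, the Databricks Terraform provider also accepts the same value through its google_service_account argument. The following is a minimal sketch assuming an account-level provider configuration; the host and service account email are placeholder values.

# Sketch only: equivalent provider-level configuration (values are placeholders)
provider "databricks" {
  alias                  = "accounts"
  host                   = "https://accounts.gcp.databricks.com"
  account_id             = var.databricks_account_id
  google_service_account = "deployer@my-project.iam.gserviceaccount.com"
}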

Critical Next Steps

The following steps outline essential security configurations that should be implemented after the initial deployment to further harden and operationalize the Databricks environment.

  • Implement a Front-End Mitigation Strategy:
    • IP Access Lists: Restrict which networks can connect to your Databricks account and workspaces on GCP (a Terraform sketch follows this list).
    • Front-End PrivateLink: Establish a private connection to the Databricks web application over the Google Cloud backbone, preventing exposure to the public internet.
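
A minimal Terraform sketch of IP access lists using the Databricks provider is shown below. The label and CIDR range are placeholders, and enforcement must be enabled on the workspace before the list takes effect.

# Sketch: enable IP access list enforcement for the workspace
resource "databricks_workspace_conf" "ip_access" {
  custom_config = {
    "enableIpAccessLists" = true
  }
}

# Sketch: allow connections only from approved egress ranges (placeholder CIDR)
resource "databricks_ip_access_list" "corp_vpn" {
  label        = "corp-vpn"
  list_type    = "ALLOW"
  ip_addresses = ["203.0.113.0/24"]

  depends_on = [databricks_workspace_conf.ip_access]
}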

Additional Security Recommendations

  • Segment Workspaces for Data Separation: Use distinct workspaces for different teams or functions (e.g., security, marketing) to enforce data access boundaries and reduce risk exposure.
  • Avoid Storing Production Datasets in the Databricks File System (DBFS): The DBFS root is accessible to all users in a workspace. Use external storage locations for production data and databases to ensure proper access control and auditing (see the external location sketch after this list).
  • Back Up Assets from the Databricks Control Plane: Regularly export and back up notebooks, jobs, and configurations using tools such as the Databricks Terraform Exporter.
  • Regularly Restart Classic Compute Clusters: Restart clusters periodically to ensure the latest compute images and security patches are applied. Databricks recommends that admins restart clusters manually during a scheduled maintenance window to minimize the risk of disrupting scheduled jobs or workflows.
  • Implement a Tagging Strategy: Cluster and pool tags enable organizations to monitor costs and accurately attribute Databricks usage to specific business units or teams. These tags propagate to detailed DBU usage reports, supporting cost analysis and internal chargeback processes (see the tagging sketch after this list).
  • Integrate CI/CD and Code Management: Evaluate workflow needs for Git-based version control and CI/CD automation. Incorporate code scanning, permission enforcement, and secret detection to enhance governance and operational efficiency.
  • Run and Monitor the Security Analysis Tool (SAT): SAT analyzes your Databricks account and workspace configurations, providing recommendations to help you follow Databricks' security best practices.
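
The following is a hedged sketch of registering a GCS bucket as a Unity Catalog external location with the Databricks Terraform provider, as an alternative to keeping production data on the DBFS root. The credential name, location name, and bucket path are placeholders.

# Sketch: storage credential backed by a Databricks-managed GCP service account
resource "databricks_storage_credential" "prod" {
  name = "prod-gcs-credential"
  databricks_gcp_service_account {}
}

# Sketch: external location pointing at a GCS bucket (placeholder path)
resource "databricks_external_location" "prod_data" {
  name            = "prod-data"
  url             = "gs://my-prod-bucket/datasets"
  credential_name = databricks_storage_credential.prod.name
  comment         = "Production datasets kept outside the DBFS root"
}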
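
As a starting point for a tagging strategy, the sketch below applies custom tags to a cluster definition. The tag keys, values, and node type are illustrative; align them with your organization's cost-attribution conventions.

# Sketch: cluster with custom tags that flow into DBU usage reports
data "databricks_spark_version" "lts" {
  long_term_support = true
}

resource "databricks_cluster" "team_etl" {
  cluster_name            = "marketing-etl"                      # illustrative name
  spark_version           = data.databricks_spark_version.lts.id
  node_type_id            = "n2-highmem-4"                       # common GCP node type; adjust as needed
  num_workers             = 2
  autotermination_minutes = 30

  custom_tags = {
    "CostCenter"   = "CC-1234"
    "BusinessUnit" = "marketing"
    "Environment"  = "prod"
  }
}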