
Lakebase plugin

Provides a PostgreSQL connection pool for Databricks Lakebase Autoscaling with automatic OAuth token refresh.

Key features:

  • Standard pg.Pool compatible with any PostgreSQL library or ORM
  • Automatic OAuth token refresh (1-hour tokens, 2-minute refresh buffer)
  • Token caching to minimize API calls
  • Built-in OpenTelemetry instrumentation (query duration, pool connections, token refresh)
  • AppKit logger configured by default for query and connection events
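The refresh timing above can be sketched as pure logic: with 1-hour tokens and a 2-minute buffer, a cached token is reused until it is within two minutes of expiry. This is a minimal sketch of the caching decision, not the plugin's actual internals:

```typescript
// Hypothetical sketch of the token-caching decision: reuse a cached OAuth
// token until it is within the refresh buffer of its expiry time.
const TOKEN_LIFETIME_MS = 60 * 60 * 1000; // Lakebase tokens live ~1 hour
const REFRESH_BUFFER_MS = 2 * 60 * 1000; // refresh 2 minutes before expiry

interface CachedToken {
  value: string;
  expiresAt: number; // epoch ms
}

function needsRefresh(token: CachedToken | undefined, now: number): boolean {
  if (!token) return true;
  return now >= token.expiresAt - REFRESH_BUFFER_MS;
}

// A fresh token is reused; one 59 minutes old triggers a refresh.
const issued = Date.now();
const token: CachedToken = { value: "example-token", expiresAt: issued + TOKEN_LIFETIME_MS };
console.log(needsRefresh(token, issued)); // false
console.log(needsRefresh(token, issued + 59 * 60 * 1000)); // true
```

Because refreshes happen inside the pool, callers never see an expired token; they just issue queries.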

Getting started with Lakebase

The easiest way to get started with the Lakebase plugin is to use the Databricks CLI to create a new Databricks app with AppKit and the Lakebase plugin installed.

Prerequisites

Steps

  1. First, create a new Lakebase Postgres Autoscaling project by following the Get started documentation.
  2. To add the Lakebase plugin to your project, run the databricks apps init command and interactively select the Lakebase plugin. The CLI will guide you through picking a Lakebase project, branch, and database.
    • When asked, select Yes to deploy the app to Databricks Apps right after its creation.

Basic usage

import { createApp, lakebase, server } from "@databricks/appkit";

await createApp({
  plugins: [server(), lakebase()],
});

Accessing the pool

After initialization, access Lakebase through the AppKit.lakebase object:

const AppKit = await createApp({
  plugins: [server(), lakebase()],
});

await AppKit.lakebase.query(`CREATE SCHEMA IF NOT EXISTS app`);

await AppKit.lakebase.query(`CREATE TABLE IF NOT EXISTS app.orders (
  id SERIAL PRIMARY KEY,
  user_id VARCHAR(255) NOT NULL,
  amount DECIMAL(10, 2) NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)`);

const result = await AppKit.lakebase.query(
  "SELECT * FROM app.orders WHERE user_id = $1",
  [userId],
);

// Raw pg.Pool (for ORMs or advanced usage)
const pool = AppKit.lakebase.pool;

// ORM-ready config objects
const ormConfig = AppKit.lakebase.getOrmConfig(); // { host, port, database, ... }
const pgConfig = AppKit.lakebase.getPgConfig(); // pg.PoolConfig
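As a sketch of how the ORM config object might be consumed (the field names below follow the { host, port, database, ... } shape shown above; treat the exact shape as an assumption), you can assemble a standard Postgres connection URL:

```typescript
// Hypothetical helper: build a postgres:// URL from an ORM-style config
// object like the one returned by AppKit.lakebase.getOrmConfig().
interface OrmConfig {
  host: string;
  port: number;
  database: string;
  user: string;
  password: string; // with Lakebase OAuth, the token acts as the password
}

function toConnectionUrl(cfg: OrmConfig): string {
  const user = encodeURIComponent(cfg.user);
  const pass = encodeURIComponent(cfg.password);
  return `postgres://${user}:${pass}@${cfg.host}:${cfg.port}/${cfg.database}?sslmode=require`;
}

// Illustrative values only:
console.log(
  toConnectionUrl({
    host: "instance.database.cloud.databricks.com",
    port: 5432,
    database: "databricks_postgres",
    user: "app@example.com",
    password: "oauth-token",
  }),
);
```

Note that a URL built this way goes stale when the hourly token rotates, so for long-running processes prefer the live pool (or regenerate the config per connection) over a static connection string.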

Configuration

Environment variables

The required environment variables are:

  • LAKEBASE_ENDPOINT: Endpoint resource path (e.g. projects/.../branches/.../endpoints/...)
  • PGHOST: Lakebase host (auto-injected in production by the postgres Databricks Apps resource)
  • PGDATABASE: Database name (auto-injected in production by the postgres Databricks Apps resource)
  • PGSSLMODE: TLS mode, set to require (auto-injected in production by the postgres Databricks Apps resource)

When deployed to Databricks Apps with a postgres database resource configured, PGHOST, PGDATABASE, PGSSLMODE, PGUSER, PGPORT, and PGAPPNAME are automatically injected by the platform. Only LAKEBASE_ENDPOINT must be set explicitly:

env:
  - name: LAKEBASE_ENDPOINT
    valueFrom: postgres

For local development, the .env file is automatically generated by databricks apps init with the correct values for your Lakebase project.
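For a quick startup check of the required variables, a small guard can fail fast with a clear message. This is a sketch, not part of the plugin; the variable names come from the list above:

```typescript
// Sketch: report any required Lakebase environment variables that are
// missing or empty before initializing the app.
function requireEnv(
  env: Record<string, string | undefined>,
  names: string[],
): string[] {
  return names.filter((name) => !env[name] || env[name]!.trim() === "");
}

const missing = requireEnv(process.env, [
  "LAKEBASE_ENDPOINT",
  "PGHOST",
  "PGDATABASE",
  "PGSSLMODE",
]);
if (missing.length > 0) {
  console.error(`Missing Lakebase environment variables: ${missing.join(", ")}`);
}
```

In production these are injected by the platform, so the check mainly catches an incomplete local .env file.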

For the full configuration reference (SSL, pool size, timeouts, logging, ORM examples), see the @databricks/lakebase README.

Pool configuration

Pass a pool object to override any defaults:

await createApp({
  plugins: [
    lakebase({
      pool: {
        max: 10, // Max pool connections (default: 10)
        connectionTimeoutMillis: 5000, // Connection timeout ms (default: 10000)
        idleTimeoutMillis: 30000, // Idle connection timeout ms (default: 30000)
      },
    }),
  ],
});
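Conceptually, the overrides are merged over the documented defaults, so you only need to specify the options you want to change. A sketch of that merge semantics (not the plugin's actual code):

```typescript
// Sketch: combine user-supplied pool options with the documented defaults
// (max: 10, connectionTimeoutMillis: 10000, idleTimeoutMillis: 30000).
interface PoolOptions {
  max?: number;
  connectionTimeoutMillis?: number;
  idleTimeoutMillis?: number;
}

const POOL_DEFAULTS: Required<PoolOptions> = {
  max: 10,
  connectionTimeoutMillis: 10_000,
  idleTimeoutMillis: 30_000,
};

function resolvePoolOptions(overrides: PoolOptions = {}): Required<PoolOptions> {
  return { ...POOL_DEFAULTS, ...overrides };
}

console.log(resolvePoolOptions({ connectionTimeoutMillis: 5_000 }));
// → { max: 10, connectionTimeoutMillis: 5000, idleTimeoutMillis: 30000 }
```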

Database Permissions

When you create the app with the Lakebase resource using the Getting started guide, the Service Principal is automatically granted CONNECT_AND_CREATE permission on the postgres resource. This lets the Service Principal connect to the database and create new objects, but not access any existing schemas or tables.

Local development

To develop locally against a deployed Lakebase database:

  1. Deploy the app first. The Service Principal creates the database schema and tables on first deploy. Apps generated from databricks apps init handle this automatically - they check if tables exist on startup and skip creation if they do.

  2. Grant databricks_superuser via the Lakebase UI:

    1. Open the Lakebase Autoscaling UI and navigate to your project's Branch Overview page.
    2. Click Add role (or Edit role if your OAuth role already exists).
    3. Select your Databricks identity as the principal and check the databricks_superuser system role.
  3. Run locally - your Databricks user identity (email) is used for OAuth authentication. The databricks_superuser role gives full DML access (read/write data) but not DDL (creating schemas or tables) - that's why deploying first matters (see note below).

For other users, use the same Add role flow in the Lakebase UI to create an OAuth role with databricks_superuser for each user.

tip

Postgres password authentication is a simpler alternative that avoids OAuth role permission complexity. However, it requires you to set up a password for the user in the Branch Overview page in the Lakebase Autoscaling UI.

Why deploy first?

When the app is deployed, the Service Principal creates schemas and tables and becomes their owner. A databricks_superuser has full DML access (SELECT, INSERT, UPDATE, DELETE) to these objects, but cannot run DDL (CREATE SCHEMA, CREATE TABLE) on schemas owned by the Service Principal. Deploying first ensures all objects exist before local development begins.

Fine-grained permissions

For most use cases, databricks_superuser is sufficient. If you need schema-level grants instead, refer to the official documentation:

SQL script for fine-grained grants

Deploy and run the app at least once before executing these grants so the Service Principal initializes the database schema first.

Replace subject with the user email and schema with your schema name:

CREATE EXTENSION IF NOT EXISTS databricks_auth;

DO $$
DECLARE
  subject TEXT := 'your-subject'; -- User email like name@databricks.com
  schema TEXT := 'your_schema'; -- Replace 'your_schema' with your schema name
BEGIN
  -- Create OAuth role for the Databricks identity
  PERFORM databricks_create_role(subject, 'USER');

  -- Connection and schema access
  EXECUTE format('GRANT CONNECT ON DATABASE "databricks_postgres" TO %I', subject);
  EXECUTE format('GRANT ALL ON SCHEMA %s TO %I', schema, subject);

  -- Privileges on existing objects
  EXECUTE format('GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA %s TO %I', schema, subject);
  EXECUTE format('GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA %s TO %I', schema, subject);
  EXECUTE format('GRANT ALL PRIVILEGES ON ALL FUNCTIONS IN SCHEMA %s TO %I', schema, subject);
  EXECUTE format('GRANT ALL PRIVILEGES ON ALL PROCEDURES IN SCHEMA %s TO %I', schema, subject);

  -- Default privileges on future objects
  EXECUTE format('ALTER DEFAULT PRIVILEGES IN SCHEMA %s GRANT ALL ON TABLES TO %I', schema, subject);
  EXECUTE format('ALTER DEFAULT PRIVILEGES IN SCHEMA %s GRANT ALL ON SEQUENCES TO %I', schema, subject);
  EXECUTE format('ALTER DEFAULT PRIVILEGES IN SCHEMA %s GRANT ALL ON FUNCTIONS TO %I', schema, subject);
  EXECUTE format('ALTER DEFAULT PRIVILEGES IN SCHEMA %s GRANT ALL ON ROUTINES TO %I', schema, subject);
END $$;