Files plugin
File operations against Databricks Unity Catalog Volumes. Supports listing, reading, downloading, uploading, deleting, and previewing files with built-in caching, retry, and timeout handling via the execution interceptor pipeline.
Key features:
- Multi-volume: Define named volumes (e.g. uploads, exports) and access them independently
- CRUD operations on Unity Catalog Volume files
- Streaming downloads with content-type resolution
- Inline raw serving with XSS-safe content type enforcement
- Upload size limits with streaming enforcement
- Automatic cache invalidation on write operations
- Custom content type mappings
- Per-user execution context (OBO)
- Access policies: Per-volume policy functions that gate read and write operations
Basic usage
import { createApp, files, server } from "@databricks/appkit";
await createApp({
plugins: [
server(),
files(),
],
});
Set DATABRICKS_VOLUME_* environment variables in your app.yaml (or .env). The plugin auto-discovers them at startup:
DATABRICKS_VOLUME_UPLOADS=/Volumes/catalog/schema/uploads
DATABRICKS_VOLUME_EXPORTS=/Volumes/catalog/schema/exports
That's it — no volumes config needed. The env var suffix becomes the volume key (lowercased):
| Environment variable | Volume key |
|---|---|
| DATABRICKS_VOLUME_UPLOADS | uploads |
| DATABRICKS_VOLUME_EXPORTS | exports |
Auto-discovery
The plugin scans process.env for keys matching DATABRICKS_VOLUME_* and registers each as a volume with default {} config. Env vars with an empty value or the bare DATABRICKS_VOLUME_ prefix (no suffix) are skipped.
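The scan described above can be sketched as a small pure function. This is an illustration only, not the plugin's implementation: `discoverVolumes` and `VOLUME_ENV_PREFIX` are hypothetical names, and the environment is passed in as a plain object so the sketch runs without Node typings.

```typescript
// Hypothetical helper: maps DATABRICKS_VOLUME_* variables to volume keys.
const VOLUME_ENV_PREFIX = "DATABRICKS_VOLUME_";

function discoverVolumes(
  env: Record<string, string | undefined>,
): Record<string, string> {
  const volumes: Record<string, string> = {};
  for (const envName of Object.keys(env)) {
    if (!envName.startsWith(VOLUME_ENV_PREFIX)) continue;
    const suffix = envName.slice(VOLUME_ENV_PREFIX.length);
    const value = env[envName];
    // Skip the bare prefix (no suffix) and variables with an empty value.
    if (suffix === "" || !value) continue;
    // The lowercased suffix becomes the volume key.
    volumes[suffix.toLowerCase()] = value;
  }
  return volumes;
}

const discovered = discoverVolumes({
  DATABRICKS_VOLUME_UPLOADS: "/Volumes/catalog/schema/uploads",
  DATABRICKS_VOLUME_EXPORTS: "", // empty value: skipped
  DATABRICKS_VOLUME_: "/Volumes/catalog/schema/ignored", // no suffix: skipped
  PATH: "/usr/bin", // unrelated variable: ignored
});
// discovered is { uploads: "/Volumes/catalog/schema/uploads" }
```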
Merge semantics: auto-discovered volumes are always merged with explicitly configured ones. Explicit config wins for per-volume overrides (e.g., maxUploadSize), while discovered-only volumes get default settings.
// Explicit overrides for uploads; exports is auto-discovered from env
files({
volumes: {
uploads: { maxUploadSize: 100_000_000 },
},
});
This produces two volumes (uploads with a 100 MB limit, exports with defaults), assuming both DATABRICKS_VOLUME_UPLOADS and DATABRICKS_VOLUME_EXPORTS are set.
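As a rough model of the merge, assume discovery yields a list of volume keys and explicit config is a plain record. The helper name `mergeVolumes` and the trimmed-down `VolumeConfig` are made up for this sketch; the plugin's internal merge may differ in detail.

```typescript
// Trimmed-down stand-in for the plugin's VolumeConfig.
interface VolumeConfig {
  maxUploadSize?: number;
}

// Hypothetical helper: every discovered key starts with {} and explicit
// entries then overwrite those defaults.
function mergeVolumes(
  discoveredKeys: string[],
  explicit: Record<string, VolumeConfig>,
): Record<string, VolumeConfig> {
  const merged: Record<string, VolumeConfig> = {};
  for (const key of discoveredKeys) merged[key] = {};
  for (const key of Object.keys(explicit)) merged[key] = { ...explicit[key] };
  return merged;
}

const mergedVolumes = mergeVolumes(["uploads", "exports"], {
  uploads: { maxUploadSize: 100_000_000 },
});
// mergedVolumes is { uploads: { maxUploadSize: 100000000 }, exports: {} }
```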
Configuration
interface IFilesConfig {
/** Named volumes to expose. Each key becomes a volume accessor. */
volumes?: Record<string, VolumeConfig>;
/** Operation timeout in milliseconds. Overrides the per-tier defaults. */
timeout?: number;
/** Map of file extensions to MIME types (priority over built-in map). Inherited by all volumes. */
customContentTypes?: Record<string, string>;
/** Maximum upload size in bytes. Defaults to 5 GB. Inherited by all volumes. */
maxUploadSize?: number;
}
interface VolumeConfig {
/** Access policy for this volume. */
policy?: FilePolicy;
/** Maximum upload size in bytes for this volume. Overrides plugin-level default. */
maxUploadSize?: number;
/** Map of file extensions to MIME types for this volume. Overrides plugin-level default. */
customContentTypes?: Record<string, string>;
}
Per-volume overrides
Each volume inherits the plugin-level maxUploadSize and customContentTypes unless overridden:
files({
maxUploadSize: 5_000_000_000, // 5 GB default for all volumes
customContentTypes: { ".avro": "application/avro" },
volumes: {
uploads: { maxUploadSize: 100_000_000 }, // 100 MB limit for uploads only
exports: {}, // uses plugin-level defaults
},
});
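One way to picture the inheritance is a resolver that falls back from the volume to the plugin level and then to the documented 5 GB default. `effectiveSettings` and `UploadSettings` are hypothetical names, and the sketch assumes a per-volume customContentTypes map replaces the plugin-level map rather than merging with it, which the text above does not spell out.

```typescript
interface UploadSettings {
  maxUploadSize?: number;
  customContentTypes?: Record<string, string>;
}

const DEFAULT_MAX_UPLOAD_SIZE = 5_000_000_000; // 5 GB, the documented default

// Hypothetical resolver: volume value, else plugin value, else default.
function effectiveSettings(
  plugin: UploadSettings,
  volume: UploadSettings,
): Required<UploadSettings> {
  return {
    maxUploadSize:
      volume.maxUploadSize ?? plugin.maxUploadSize ?? DEFAULT_MAX_UPLOAD_SIZE,
    // Assumption: the per-volume map replaces the plugin-level map entirely.
    customContentTypes:
      volume.customContentTypes ?? plugin.customContentTypes ?? {},
  };
}

const pluginLevel: UploadSettings = {
  maxUploadSize: 5_000_000_000,
  customContentTypes: { ".avro": "application/avro" },
};
const uploadsSettings = effectiveSettings(pluginLevel, { maxUploadSize: 100_000_000 });
const exportsSettings = effectiveSettings(pluginLevel, {});
// uploadsSettings.maxUploadSize is 100000000; exportsSettings.maxUploadSize is 5000000000
```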
Permission model
There are three layers of access control in the files plugin. Understanding how they interact is critical for securing your app:
┌─────────────────────────────────────────────────┐
│ Unity Catalog grants │
│ (WRITE_VOLUME on the SP — set at deploy time) │
├─────────────────────────────────────────────────┤
│ Execution identity │
│ HTTP routes → always service principal │
│ Programmatic → SP by default, asUser() for OBO │
├─────────────────────────────────────────────────┤
│ File policies │
│ Per-volume (action, resource, user) → boolean │
│ Only app-level gate for HTTP routes │
└─────────────────────────────────────────────────┘
- UC grants control what the service principal can do at the Databricks level. These are set at deploy time via app.yaml resource bindings. The SP needs WRITE_VOLUME; the plugin declares this via resource requirements.
- Execution identity determines whose credentials are used for the actual API call. HTTP routes always use the SP. The programmatic API uses SP by default but supports asUser(req) for OBO.
- File policies are application-level checks evaluated before the API call. They receive the requesting user's identity (from the x-forwarded-user header) and decide allow/deny. This is the only gate that distinguishes between users on HTTP routes.
Since HTTP routes always execute as the service principal, removing a user's UC WRITE_VOLUME grant has no effect on HTTP access — the SP's grant is what's used. Policies are how you restrict what individual users can do through your app.
File policies are new. Volumes without an explicit policy now default to publicRead(), which denies all write operations (upload, mkdir, delete). If your app relies on write access, set an explicit policy — for example files.policy.allowAll() — on each volume that needs it.
Access policies
Attach a policy to a volume to control which actions are allowed:
import { files } from "@databricks/appkit";
files({
volumes: {
uploads: { policy: files.policy.publicRead() },
},
});
Actions
Policies receive an action string. The full list, split by category:
| Category | Actions |
|---|---|
| Read | list, read, download, raw, exists, metadata, preview |
| Write | upload, mkdir, delete |
Built-in policies
| Helper | Allows | Denies |
|---|---|---|
| files.policy.publicRead() | all read actions | all write actions |
| files.policy.allowAll() | everything | nothing |
| files.policy.denyAll() | nothing | everything |
Composing policies
Combine built-in and custom policies with three combinators:
- files.policy.all(a, b): AND. All policies must allow. Short-circuits on first denial.
- files.policy.any(a, b): OR. At least one policy must allow. Short-circuits on first allow.
- files.policy.not(p): Inverts a policy. For example, not(publicRead()) yields a write-only policy (useful for ingestion/drop-box volumes).
// Read-only for regular users, full access for the service principal
files({
volumes: {
shared: {
policy: files.policy.any(
(_action, _resource, user) => !!user.isServicePrincipal,
files.policy.publicRead(),
),
},
},
});
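To make the combinator semantics concrete, here is a self-contained model of AND, OR, and NOT over policy functions. It is a sketch rather than the library's code: `allOf`, `anyOf`, `notOf`, and `SyncPolicy` are local stand-ins for files.policy.all, any, and not, and the model is simplified to synchronous policies even though a real FilePolicy may also return a Promise.

```typescript
type FileAction =
  | "list" | "read" | "download" | "raw" | "exists" | "metadata" | "preview"
  | "upload" | "mkdir" | "delete";
interface FileResource { path: string; volume: string; size?: number }
interface FilePolicyUser { id: string; isServicePrincipal?: boolean }

// Simplification: synchronous only.
type SyncPolicy = (
  action: FileAction,
  resource: FileResource,
  user: FilePolicyUser,
) => boolean;

const WRITE_ACTIONS = new Set<FileAction>(["upload", "mkdir", "delete"]);

// Local model of publicRead(): allow reads, deny writes.
const publicRead = (): SyncPolicy => (action) => !WRITE_ACTIONS.has(action);

// AND: Array.prototype.every stops at the first denial.
const allOf = (...policies: SyncPolicy[]): SyncPolicy =>
  (action, resource, user) => policies.every((p) => p(action, resource, user));

// OR: Array.prototype.some stops at the first allow.
const anyOf = (...policies: SyncPolicy[]): SyncPolicy =>
  (action, resource, user) => policies.some((p) => p(action, resource, user));

// NOT: inverts the wrapped policy's decision.
const notOf = (policy: SyncPolicy): SyncPolicy =>
  (action, resource, user) => !policy(action, resource, user);

// Write-only "drop box": uploads are allowed, reads are denied.
const dropBox = notOf(publicRead());
```

Because every and some stop early, a cheap check placed first (for example the isServicePrincipal test) can spare a more expensive policy from running.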
Custom policies
FilePolicy is a function (action, resource, user) → boolean | Promise<boolean>, so you can inline arbitrary logic:
import { type FilePolicy, WRITE_ACTIONS } from "@databricks/appkit";
const ADMIN_IDS = ["admin-sp-id", "lead-user-id"];
const adminOnly: FilePolicy = (action, _resource, user) => {
if (WRITE_ACTIONS.has(action)) {
return ADMIN_IDS.includes(user.id);
}
return true; // reads allowed for everyone
};
files({
volumes: { reports: { policy: adminOnly } },
});
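A policy can also inspect the resource, not just the action. The sketch below is a hypothetical example: `ownFolderOnly`, the users/<id>/ layout, and the 10 MB cap are all invented for illustration, and it assumes resource.path arrives as a relative path without a leading slash. The types are redeclared locally so the snippet stands alone.

```typescript
type FileAction =
  | "list" | "read" | "download" | "raw" | "exists" | "metadata" | "preview"
  | "upload" | "mkdir" | "delete";
interface FileResource { path: string; volume: string; size?: number }
interface FilePolicyUser { id: string; isServicePrincipal?: boolean }

const WRITE_ACTIONS = new Set<FileAction>(["upload", "mkdir", "delete"]);
const MAX_USER_UPLOAD_BYTES = 10_000_000; // hypothetical 10 MB cap

// Hypothetical policy: everyone may read; the service principal may write
// anywhere; users may write only under users/<their id>/ and within the cap.
const ownFolderOnly = (
  action: FileAction,
  resource: FileResource,
  user: FilePolicyUser,
): boolean => {
  if (!WRITE_ACTIONS.has(action)) return true;
  if (user.isServicePrincipal) return true;
  if (!resource.path.startsWith(`users/${user.id}/`)) return false;
  // size is only present for uploads, so mkdir and delete skip the cap.
  return resource.size === undefined || resource.size <= MAX_USER_UPLOAD_BYTES;
};
```

A function with this shape could be passed as the policy of a volume in the same way as adminOnly.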
Enforcement
- HTTP routes: Policy checked before every operation. Denied → 403 JSON response with Policy denied "{action}" on volume "{volumeKey}".
- Programmatic API: Policy checked on both appkit.files("vol").list() (SP identity, isServicePrincipal: true) and appkit.files("vol").asUser(req).list() (user identity). Denied → throws PolicyDeniedError.
- No policy configured: Defaults to files.policy.publicRead(), so read actions are allowed and write actions are denied. A startup warning is logged encouraging you to set an explicit policy.
Custom content types
Override or extend the built-in extension → MIME map:
files({
volumes: { data: {} },
customContentTypes: {
".avro": "application/avro",
".ndjson": "application/x-ndjson",
},
});
Dangerous MIME types (text/html, text/javascript, application/javascript, application/xhtml+xml, image/svg+xml) are blocked to prevent stored-XSS when files are served inline via /raw.
HTTP routes
Routes are mounted at /api/files/*. All routes execute as the service principal. Policy enforcement checks user identity (from the x-forwarded-user header) before allowing operations — see Access policies.
| Method | Path | Query / Body | Response |
|---|---|---|---|
| GET | /volumes | — | { volumes: string[] } |
| GET | /:volumeKey/list | ?path (optional) | DirectoryEntry[] |
| GET | /:volumeKey/read | ?path (required) | text/plain body |
| GET | /:volumeKey/download | ?path (required) | Binary stream (Content-Disposition: attachment) |
| GET | /:volumeKey/raw | ?path (required) | Binary stream (inline for safe types, attachment for unsafe) |
| GET | /:volumeKey/exists | ?path (required) | { exists: boolean } |
| GET | /:volumeKey/metadata | ?path (required) | FileMetadata |
| GET | /:volumeKey/preview | ?path (required) | FilePreview |
| POST | /:volumeKey/upload | ?path (required), raw body | { success: true } |
| POST | /:volumeKey/mkdir | body.path (required) | { success: true } |
| DELETE | /:volumeKey | ?path (required) | { success: true } |
The :volumeKey parameter must match one of the configured volume keys. Unknown volume keys return a 404 with the list of available volumes.
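On the client side, the read-style routes in the table all share one URL shape, so a small builder keeps the encoding in one place. `filesUrl` and `ReadOp` are hypothetical names for this sketch; the route paths and the path query parameter come from the table above.

```typescript
// The GET routes that take an optional or required ?path query parameter.
type ReadOp =
  | "list" | "read" | "download" | "raw" | "exists" | "metadata" | "preview";

// Hypothetical helper: builds /api/files/:volumeKey/:op with an encoded path.
function filesUrl(volumeKey: string, op: ReadOp, filePath?: string): string {
  const base = `/api/files/${encodeURIComponent(volumeKey)}/${op}`;
  return filePath === undefined
    ? base
    : `${base}?path=${encodeURIComponent(filePath)}`;
}

const listRootUrl = filesUrl("uploads", "list");
// "/api/files/uploads/list"
const readReportUrl = filesUrl("exports", "read", "reports/q1 summary.csv");
// "/api/files/exports/read?path=reports%2Fq1%20summary.csv"
```

The resulting string can be handed to fetch or any other HTTP client.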
Path validation
All endpoints that accept a path parameter enforce:
- Path is required (non-empty)
- Maximum 4096 characters
- No null bytes
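The three checks translate directly into a small validator. This is a sketch of the rules as listed, not the plugin's code: `validatePath` is a hypothetical name, the error messages are invented, and it assumes the 4096 limit is counted in string characters.

```typescript
const MAX_PATH_LENGTH = 4096;

// Returns an error message, or null when the path passes all three checks.
function validatePath(candidate: string | undefined): string | null {
  if (!candidate) return "path is required";
  if (candidate.length > MAX_PATH_LENGTH) {
    return `path exceeds ${MAX_PATH_LENGTH} characters`;
  }
  if (candidate.includes("\0")) return "path must not contain null bytes";
  return null;
}
```

A route handler could map any non-null result to the 400 response described under Error responses.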
Raw endpoint security
The /:volumeKey/raw endpoint serves files inline for browser display but applies security headers:
- X-Content-Type-Options: nosniff
- Content-Security-Policy: sandbox
- Unsafe content types (HTML, JS, SVG) are forced to download via Content-Disposition: attachment
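The header logic can be modelled as a function from content type to response headers. `rawHeaders` and `UNSAFE_INLINE_TYPES` are hypothetical names; the unsafe set reuses the five MIME types this page lists as dangerous, and the assumption that the plugin compares on the bare MIME type (ignoring parameters such as charset) is mine.

```typescript
// The five MIME types the document names as dangerous for inline serving.
const UNSAFE_INLINE_TYPES = new Set([
  "text/html",
  "text/javascript",
  "application/javascript",
  "application/xhtml+xml",
  "image/svg+xml",
]);

// Hypothetical helper: computes the headers for a /raw response.
function rawHeaders(contentType: string): Record<string, string> {
  // Assumption: parameters like "; charset=utf-8" are ignored when comparing.
  const mime = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  return {
    "Content-Type": contentType,
    // Unsafe types are downloaded instead of rendered in the browser.
    "Content-Disposition": UNSAFE_INLINE_TYPES.has(mime) ? "attachment" : "inline",
    "X-Content-Type-Options": "nosniff",
    "Content-Security-Policy": "sandbox",
  };
}
```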
Execution defaults
Every operation runs through the interceptor pipeline with tier-specific defaults:
| Tier | Cache | Retry | Timeout | Operations |
|---|---|---|---|---|
| Read | 60 s | 3x | 30 s | list, read, exists, metadata, preview |
| Download | none | 3x | 30 s | download, raw |
| Write | none | none | 600 s | upload, mkdir, delete |
Retry uses exponential backoff with a 1 s initial delay.
The download timeout applies to the stream start, not the full transfer.
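For a feel of what the retry schedule might look like, the sketch below computes the wait before each retry. The doubling factor is an assumption (the text only says "exponential backoff" with a 1 s initial delay), the reading of 3x as three retries is also an assumption, and `backoffDelays` is a hypothetical helper.

```typescript
// Hypothetical helper: delay in milliseconds before each retry.
// Assumption: the delay doubles after every failed attempt.
function backoffDelays(
  retries: number,
  initialMs: number = 1000,
  factor: number = 2,
): number[] {
  return Array.from(
    { length: retries },
    (_unused, attempt) => initialMs * Math.pow(factor, attempt),
  );
}

const readTierDelays = backoffDelays(3);
// readTierDelays is [1000, 2000, 4000] under the doubling assumption
```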
Cache isolation
Cache keys include the volume key, ensuring volumes have independent caches. For example, uploads:list and exports:list are cached separately.
Write operations (upload, mkdir, delete) automatically invalidate the cached list entry for the parent directory of the affected volume.
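A toy model of the keying and invalidation rules follows. The exact key format is not given here, so "<volume>:<operation>:<path>" is an assumption that extends the uploads:list example with a path component; `cacheKey`, `parentDirectory`, and `keyToInvalidate` are hypothetical names.

```typescript
// Assumed key format: "<volume>:<operation>:<path>".
function cacheKey(volume: string, operation: string, dirPath: string = ""): string {
  return `${volume}:${operation}:${dirPath}`;
}

// Parent directory of a relative path; "" denotes the volume root.
function parentDirectory(filePath: string): string {
  const trimmed = filePath.replace(/\/+$/, "");
  const lastSlash = trimmed.lastIndexOf("/");
  return lastSlash === -1 ? "" : trimmed.slice(0, lastSlash);
}

// The cached list entry that a write to filePath would invalidate.
function keyToInvalidate(volume: string, filePath: string): string {
  return cacheKey(volume, "list", parentDirectory(filePath));
}

const invalidated = keyToInvalidate("uploads", "reports/2024/q1.csv");
// invalidated is "uploads:list:reports/2024"
```

Because the volume key is part of every cache key, a write to uploads never evicts an entry cached for exports.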
Programmatic API
The exports() API is a callable that accepts a volume key and returns a VolumeHandle. The handle exposes all VolumeAPI methods directly (service principal, logs a warning) and an asUser(req) method for OBO access (recommended).
// OBO access (recommended)
const entries = await appkit.files("uploads").asUser(req).list();
const content = await appkit.files("exports").asUser(req).read("report.csv");
// Service principal access (logs a warning encouraging OBO)
const entries = await appkit.files("uploads").list();
// Named accessor
const vol = appkit.files.volume("uploads");
await vol.asUser(req).list();
VolumeAPI methods
| Method | Signature | Returns |
|---|---|---|
| list | (directoryPath?: string) | DirectoryEntry[] |
| read | (filePath: string, options?: { maxSize?: number }) | string |
| download | (filePath: string) | DownloadResponse |
| exists | (filePath: string) | boolean |
| metadata | (filePath: string) | FileMetadata |
| upload | (filePath: string, contents: ReadableStream \| Buffer \| string, options?: { overwrite?: boolean }) | void |
| createDirectory | (directoryPath: string) | void |
| delete | (filePath: string) | void |
| preview | (filePath: string) | FilePreview |
read() loads the entire file into memory as a string. Files larger than 10 MB (default) are rejected; use download() for large files, or pass { maxSize: <bytes> } to override.
Path resolution
Paths can be absolute or relative:
- Absolute: starts with /, must begin with /Volumes/ (e.g. /Volumes/catalog/schema/vol/data.csv)
- Relative: prepended with the volume path resolved from the environment variable (e.g. data.csv → /Volumes/catalog/schema/uploads/data.csv)
Path traversal (../) is rejected. If a relative path is used and the volume's environment variable is not set, an error is thrown.
The list() method with no arguments lists the volume root.
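These rules can be sketched as a single resolver. `resolveVolumePath` is a hypothetical name and the error messages are invented; the sketch treats any ".." path segment as traversal and maps an empty path to the volume root, which is one plausible reading of the rules above.

```typescript
// Hypothetical resolver: volumeRoot is the value of the volume's
// DATABRICKS_VOLUME_* variable, or undefined when it is not set.
function resolveVolumePath(
  volumeRoot: string | undefined,
  requested: string,
): string {
  // Reject traversal segments in both absolute and relative paths.
  if (requested.split("/").some((segment) => segment === "..")) {
    throw new Error("Path traversal is not allowed");
  }
  if (requested.startsWith("/")) {
    if (!requested.startsWith("/Volumes/")) {
      throw new Error("Absolute paths must start with /Volumes/");
    }
    return requested;
  }
  if (!volumeRoot) {
    throw new Error("Volume environment variable is not set");
  }
  const root = volumeRoot.replace(/\/+$/, "");
  // An empty relative path refers to the volume root (list() with no args).
  return requested === "" ? root : `${root}/${requested}`;
}

const resolved = resolveVolumePath("/Volumes/catalog/schema/uploads", "data.csv");
// resolved is "/Volumes/catalog/schema/uploads/data.csv"
```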
Types
// Re-exported from @databricks/sdk-experimental
type DirectoryEntry = files.DirectoryEntry;
type DownloadResponse = files.DownloadResponse;
interface FileMetadata {
/** File size in bytes. */
contentLength: number | undefined;
/** MIME content type of the file. */
contentType: string | undefined;
/** ISO 8601 timestamp of the last modification. */
lastModified: string | undefined;
}
interface FilePreview extends FileMetadata {
/** First portion of text content, or null for non-text files. */
textPreview: string | null;
/** Whether the file is detected as a text format. */
isText: boolean;
/** Whether the file is detected as an image format. */
isImage: boolean;
}
type FileAction =
| "list" | "read" | "download" | "raw"
| "exists" | "metadata" | "preview"
| "upload" | "mkdir" | "delete";
interface FileResource {
/** Relative path within the volume. */
path: string;
/** The volume key (e.g. `"uploads"`). */
volume: string;
/** Content length in bytes — only present for uploads. */
size?: number;
}
interface FilePolicyUser {
/** User ID from the `x-forwarded-user` header. */
id: string;
/** `true` when the caller is the service principal (direct SDK call, not `asUser`). */
isServicePrincipal?: boolean;
}
type FilePolicy = (
action: FileAction,
resource: FileResource,
user: FilePolicyUser,
) => boolean | Promise<boolean>;
interface VolumeConfig {
/** Access policy for this volume. */
policy?: FilePolicy;
/** Maximum upload size in bytes for this volume. */
maxUploadSize?: number;
/** Map of file extensions to MIME types for this volume. */
customContentTypes?: Record<string, string>;
}
interface VolumeAPI {
list(directoryPath?: string): Promise<DirectoryEntry[]>;
read(filePath: string, options?: { maxSize?: number }): Promise<string>;
download(filePath: string): Promise<DownloadResponse>;
exists(filePath: string): Promise<boolean>;
metadata(filePath: string): Promise<FileMetadata>;
upload(filePath: string, contents: ReadableStream | Buffer | string, options?: { overwrite?: boolean }): Promise<void>;
createDirectory(directoryPath: string): Promise<void>;
delete(filePath: string): Promise<void>;
preview(filePath: string): Promise<FilePreview>;
}
/** Volume handle: all VolumeAPI methods (service principal) + asUser() for OBO. */
type VolumeHandle = VolumeAPI & {
asUser: (req: Request) => VolumeAPI;
};
Content-type resolution
contentTypeFromPath(filePath, reported?, customTypes?) resolves a file's MIME type:
- Check the customContentTypes map first (if configured).
- Match the file extension against the built-in map.
- Fall back to the server-reported type, or application/octet-stream.
Built-in extensions: .png, .jpg, .jpeg, .gif, .webp, .svg, .bmp, .ico, .html, .css, .js, .ts, .py, .txt, .md, .csv, .json, .jsonl, .xml, .yaml, .yml, .sql, .pdf, .ipynb, .parquet, .zip, .gz.
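The lookup order can be illustrated with a stand-in function. This is not the library's contentTypeFromPath: `resolveContentType` and `extensionOf` are hypothetical names, the built-in map is cut down to four entries, and lowercasing the extension before lookup is an assumption.

```typescript
// A small subset of the built-in map, for illustration only.
const BUILT_IN_TYPES: Record<string, string> = {
  ".png": "image/png",
  ".csv": "text/csv",
  ".json": "application/json",
  ".txt": "text/plain",
};

// Extension including the dot, lowercased; "" when there is none.
function extensionOf(filePath: string): string {
  const fileName = filePath.slice(filePath.lastIndexOf("/") + 1);
  const dot = fileName.lastIndexOf(".");
  return dot <= 0 ? "" : fileName.slice(dot).toLowerCase();
}

// Order: custom map, built-in map, server-reported type, generic fallback.
function resolveContentType(
  filePath: string,
  reported?: string,
  customTypes?: Record<string, string>,
): string {
  const ext = extensionOf(filePath);
  return (
    customTypes?.[ext] ??
    BUILT_IN_TYPES[ext] ??
    reported ??
    "application/octet-stream"
  );
}

const avroType = resolveContentType("data/events.avro", undefined, {
  ".avro": "application/avro",
});
// avroType is "application/avro"
```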
User context
HTTP routes always execute as the service principal — the SP's Databricks credentials are used for all API calls. User identity is extracted from the x-forwarded-user header and passed to the volume's access policy for authorization. This means UC grants on the SP (not individual users) determine what operations are possible, while policies control what each user is allowed to do through the app.
The programmatic API returns a VolumeHandle that exposes all VolumeAPI methods directly (service principal) and an asUser(req) method for OBO access. Calling any method without asUser() logs a warning encouraging OBO usage but does not throw. OBO access is strongly recommended for production use.
Resource requirements
Volume resources are declared dynamically via getResourceRequirements(config) based on discovered + configured volumes. Each volume key generates a required resource with WRITE_VOLUME permission and a DATABRICKS_VOLUME_{KEY_UPPERCASE} environment variable.
For example, if DATABRICKS_VOLUME_UPLOADS and DATABRICKS_VOLUME_EXPORTS are set, calling files() generates two required volume resources validated at startup — no explicit volumes config needed.
Error responses
All errors return JSON:
{
"error": "Human-readable message",
"plugin": "files"
}
| Status | Description |
|---|---|
| 400 | Missing or invalid path parameter |
| 403 | Policy denied "{action}" on volume "{volumeKey}" |
| 404 | Unknown volume key |
| 413 | Upload exceeds maxUploadSize |
| 500 | Operation failed (SDK, network, upstream, or unhandled error) |
Frontend components
The @databricks/appkit-ui package provides ready-to-use React components for building a file browser:
FileBrowser
A composable set of components for browsing, previewing, and managing files in a Unity Catalog Volume:
import {
DirectoryList,
FileBreadcrumb,
FilePreviewPanel,
} from "@databricks/appkit-ui/react";
function FileBrowserPage() {
return (
<div style={{ display: "flex", gap: 16 }}>
<div style={{ flex: 1 }}>
<FileBreadcrumb
rootLabel="uploads"
segments={["data"]}
onNavigateToRoot={() => {}}
onNavigateToSegment={() => {}}
/>
<DirectoryList
entries={[]}
onEntryClick={() => {}}
resolveEntryPath={(entry) => entry.path ?? ""}
/>
</div>
<FilePreviewPanel selectedFile={null} preview={null} />
</div>
);
}
See the Files (UC) components reference for the full props API.