Storage Backends

FlexFS supports four cloud object storage backends behind a unified Store interface. Every backend implements the same operations — GetBlock, PutBlock, DeleteBlock, DeleteBlocks, IterateBlocks — so the rest of the system is agnostic to the underlying storage provider.

API Code  Backend                                     SDK
s3        Amazon S3 (and S3-compatible stores)        AWS SDK for Go v2
gcs       Google Cloud Storage                        Google Cloud Go client
azure     Azure Blob Storage                          Azure SDK for Go
oci       Oracle Cloud Infrastructure Object Storage  OCI Go SDK

The API code is specified when creating a block store via configure.flexfs (Enterprise) or during installation (Community).

All four backends share the same block-level semantics:

  • GetBlock: Download a single block by its store key.
  • PutBlock: Upload a single block. For S3, optionally enables server-side encryption (SSE-S3 with AES-256).
  • DeleteBlock / DeleteBlocks: Remove one or many blocks. S3 supports batch deletion of up to 1,000 objects per call; GCS, Azure, and OCI delete blocks individually in parallel (concurrency limit of 10).
  • IterateBlocks: List all objects under the volume’s prefix and invoke a callback for each. Used by the metadata server for block reconciliation (garbage collection of orphaned blocks).
  • NewBlockKey: Generate a timestamp-based key in the format {unixSeconds}_{nanoseconds}. This format enables chronological ordering and point-in-time auditing.

Each backend manages its own HTTP client, credential refresh, retry logic, and concurrency control (via a semaphore of maxBops slots).
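The maxBops-slot semaphore is commonly implemented in Go as a buffered channel; the harness below is an illustrative sketch (the function and variable names are assumptions), showing that in-flight operations never exceed the slot count:

```go
package main

import (
	"fmt"
	"sync"
)

// runWithLimit launches n goroutines that each acquire one of max
// semaphore slots (a buffered channel) before doing work, and returns
// the peak number of operations observed in flight. This mirrors how
// each backend caps concurrency with maxBops slots.
func runWithLimit(n, max int) int {
	sem := make(chan struct{}, max) // one slot per allowed operation
	var wg sync.WaitGroup
	var mu sync.Mutex
	inFlight, peak := 0, 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot (blocks when full)
			defer func() { <-sem }() // release it on the way out
			mu.Lock()
			inFlight++
			if inFlight > peak {
				peak = inFlight
			}
			mu.Unlock()
			// ... perform the block operation here ...
			mu.Lock()
			inFlight--
			mu.Unlock()
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	const maxBops = 10 // slot count is configuration-dependent
	fmt.Println("peak in-flight:", runWithLimit(100, maxBops))
}
```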

Blocks are stored as objects with keys following this structure:

{prefix}/{inode}/{blockIndex}/{timestampKey}

For example, with prefix flexfs/vol-abc123, inode 42, block index 3, written at Unix timestamp 1711234567 with nanosecond offset 890123456:

flexfs/vol-abc123/42/3/1711234567_890123456

When the prefix contains the literal string partition, FlexFS replaces it with a 16-bit binary hash derived from the inode number and block index:

// 16-bit hash: first and last bytes of the MD5 digest, rendered as bits
hash := md5.Sum(fmt.Appendf(nil, "%d.%d", bid.Ino, bid.Idx))
hashString := fmt.Sprintf("%08b%08b", hash[0], hash[15])
prefix = strings.ReplaceAll(prefix, "partition", hashString)

This produces prefixes like flexfs/0110101110010011/42/3/1711234567_890123456, distributing objects across 65,536 possible prefix partitions. This is beneficial for storage backends that use key-prefix-based partitioning to scale throughput (notably S3, which partitions by prefix for request rate scaling).

Each backend supports multiple credential strategies, resolved in priority order:

S3

  1. Static credentials: Access key ID and secret access key provided in the block store configuration (username/password fields).
  2. EC2 instance role: Automatic credential retrieval from the EC2 instance metadata service, with a 30-minute expiry window.
  3. Local credentials: Falls back to ~/.aws/credentials or environment variables.

S3 also supports custom endpoints for S3-compatible stores (MinIO, Wasabi, Ceph RGW, etc.). When the endpoint does not end in amazonaws.com, path-style addressing is automatically enabled.
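That endpoint check reduces to a suffix test; a minimal sketch, assuming the helper name (real endpoint handling may also need to strip schemes and ports):

```go
package main

import (
	"fmt"
	"strings"
)

// usePathStyle reports whether path-style addressing should be enabled
// for a custom endpoint: anything that is not an amazonaws.com host is
// treated as an S3-compatible store, as described above.
func usePathStyle(endpoint string) bool {
	return !strings.HasSuffix(endpoint, "amazonaws.com")
}

func main() {
	fmt.Println(usePathStyle("minio.internal"))           // true: S3-compatible store
	fmt.Println(usePathStyle("s3.us-east-1.amazonaws.com")) // false: AWS itself
}
```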

GCS

  1. Bearer token: Injected by the metadata service for delegated authentication.
  2. Service account JSON key: Provided as the password in the block store configuration.
  3. Application default credentials: Uses the standard Google Cloud credential chain.

Azure

  1. Bearer token: Used when a session token is present (delegated authentication from the metadata service).
  2. Shared key credential: Storage account name (username) and access key (password).
  3. Default Azure credential: Managed identity or environment-based credential chain.

The Azure endpoint is constructed as https://{storageAccount}.blob.core.windows.net/.

OCI

  1. Static configuration: OCI user OCID, tenancy, region, and private key provided in the block store configuration.
  2. Instance principal: Automatic credential retrieval from the OCI instance metadata service.
  3. Default config provider: Uses ~/.oci/config.

OCI uses a structured bucket format (JSON containing namespace and name fields) because OCI Object Storage requires a namespace in addition to the bucket name.

All four backends share the same resilience strategy:

  • Retry with backoff: Operations retry on transient failures (HTTP 429, 503) with randomized exponential backoff, capped at 5 seconds per retry. The maximum retry window is 24 hours.
  • Credential refresh: On HTTP 401 or 403 errors, the backend resets its client and re-acquires credentials. Client resets are rate-limited to at most once per second to avoid thundering-herd credential refreshes.
  • Rate limiting: HTTP 429 (Too Many Requests) and 503 (Service Unavailable) responses trigger a random sleep of 500-3000 ms before retrying.
  • Concurrency control: Each backend limits concurrent operations via a semaphore (maxBops slots), preventing the client from overwhelming the storage service.

When a mount client initializes its block store, multiple store layers are composed in a decorator pattern. From outermost (closest to the application) to innermost (closest to storage):

[Diagram: store pipeline decorator layers, from Timed through MemCached, Processed, DiskCached, and Proxy down to the Backend]
  1. Timed: Logs round-trip times for each operation when store RTT logging is enabled.
  2. MemCached: LRU-based in-memory block cache. Concurrent fetches for the same block are coalesced via singleflight.
  3. Processed: Handles compression (LZ4, Snappy, zstd) and encryption (AES-256-GCM). On write: compress then encrypt. On read: decrypt then decompress.
  4. DiskCached: Persistent on-disk block cache with LRU eviction. Supports writeback mode where writes are acknowledged immediately and flushed to the downstream store asynchronously.
  5. Proxy: Routes block operations through a proxy group selected by lowest RTT. Falls back to the underlying backend store on proxy errors.
  6. Backend: The actual cloud storage implementation (S3, GCS, Azure, or OCI).

The disk cache is positioned between the Processed and Proxy layers. This means that when compression or encryption is enabled, disk-cached blocks are stored in their processed (compressed and/or encrypted) form, avoiding redundant processing on cache hits.