Storage Backends
FlexFS supports four cloud object storage backends behind a unified Store interface. Every backend implements the same operations — GetBlock, PutBlock, DeleteBlock, DeleteBlocks, IterateBlocks — so the rest of the system is agnostic to the underlying storage provider.
Supported backends
| API Code | Backend | SDK |
|---|---|---|
| s3 | Amazon S3 (and S3-compatible stores) | AWS SDK for Go v2 |
| gcs | Google Cloud Storage | Google Cloud Go client |
| azure | Azure Blob Storage | Azure SDK for Go |
| oci | Oracle Cloud Infrastructure Object Storage | OCI Go SDK |
The API code is specified when creating a block store via configure.flexfs (Enterprise) or during installation (Community).
Common abstraction
All four backends share the same block-level semantics:
- GetBlock: Download a single block by its store key.
- PutBlock: Upload a single block. For S3, optionally enables server-side encryption (SSE-S3 with AES-256).
- DeleteBlock / DeleteBlocks: Remove one or many blocks. S3 supports batch deletion of up to 1,000 objects per call; GCS, Azure, and OCI delete blocks individually in parallel (concurrency limit of 10).
- IterateBlocks: List all objects under the volume’s prefix and invoke a callback for each. Used by the metadata server for block reconciliation (garbage collection of orphaned blocks).
- NewBlockKey: Generate a timestamp-based key in the format {unixSeconds}_{nanoseconds}. This format enables chronological ordering and point-in-time auditing.
Each backend manages its own HTTP client, credential refresh, retry logic, and concurrency control (via a semaphore of maxBops slots).
Block key layout
Blocks are stored as objects with keys following this structure:

{prefix}/{inode}/{blockIndex}/{timestampKey}

For example, with prefix flexfs/vol-abc123, inode 42, block index 3, written at Unix timestamp 1711234567 with nanosecond offset 890123456:

flexfs/vol-abc123/42/3/1711234567_890123456

Partition-based key distribution
When the prefix contains the literal string partition, FlexFS replaces it with a 16-bit hash, rendered as a 16-character binary string, derived from the inode number and block index:

```go
hash := md5.Sum(fmt.Appendf(nil, "%d.%d", bid.Ino, bid.Idx))
hashString := fmt.Sprintf("%08b%08b", hash[0], hash[15])
prefix = strings.ReplaceAll(prefix, "partition", hashString)
```

This produces prefixes like flexfs/0110101110010011/42/3/1711234567_890123456, distributing objects across 65,536 possible prefix partitions. This is beneficial for storage backends that use key-prefix-based partitioning to scale throughput (notably S3, which scales request rates per key prefix).
Authentication and credentials
Each backend supports multiple credential strategies, resolved in priority order:
Amazon S3
- Static credentials: Access key ID and secret access key provided in the block store configuration (username/password fields).
- EC2 instance role: Automatic credential retrieval from the EC2 instance metadata service with a 30-minute expiry window.
- Local credentials: Falls back to ~/.aws/credentials or environment variables.
S3 also supports custom endpoints for S3-compatible stores (MinIO, Wasabi, Ceph RGW, etc.). When the endpoint does not end in amazonaws.com, path-style addressing is automatically enabled.
Google Cloud Storage
- Bearer token: Injected by the metadata service for delegated authentication.
- Service account JSON key: Provided as the password in the block store configuration.
- Application default credentials: Uses the standard Google Cloud credential chain.
Azure Blob Storage
- Bearer token: Used when a session token is present (delegated authentication from the metadata service).
- Shared key credential: Storage account name (username) and access key (password).
- Default Azure credential: Managed identity or environment-based credential chain.
The Azure endpoint is constructed as https://{storageAccount}.blob.core.windows.net/.
Oracle Cloud Infrastructure
- Static configuration: OCI user OCID, tenancy, region, and private key provided in the block store configuration.
- Instance principal: Automatic credential retrieval from the OCI instance metadata service.
- Default config provider: Uses ~/.oci/config.
OCI uses a structured bucket format (JSON containing namespace and name fields) because OCI Object Storage requires a namespace in addition to the bucket name.
Retry and resilience
All four backends share the same resilience strategy:
- Retry with backoff: Operations retry on transient failures (HTTP 429, 503) with randomized exponential backoff, capped at 5 seconds per retry. The maximum retry window is 24 hours.
- Credential refresh: On HTTP 401 or 403 errors, the backend resets its client and re-acquires credentials. Client resets are rate-limited to at most once per second to avoid thundering-herd credential refreshes.
- Rate limiting: HTTP 429 (Too Many Requests) and 503 (Service Unavailable) responses trigger a random sleep of 500-3000 ms before retrying.
- Concurrency control: Each backend limits concurrent operations via a semaphore (maxBops slots), preventing the client from overwhelming the storage service.
Store pipeline
When a mount client initializes its block store, multiple store layers are composed in a decorator pattern. From outermost (closest to the application) to innermost (closest to storage):
- Timed: Logs round-trip times for each operation when store RTT logging is enabled.
- MemCached: LRU-based in-memory block cache. Concurrent fetches for the same block are coalesced via singleflight.
- Processed: Handles compression (LZ4, Snappy, zstd) and encryption (AES-256-GCM). On write: compress then encrypt. On read: decrypt then decompress.
- DiskCached: Persistent on-disk block cache with LRU eviction. Supports writeback mode where writes are acknowledged immediately and flushed to the downstream store asynchronously.
- Proxy: Routes block operations through a proxy group selected by lowest RTT. Falls back to the underlying backend store on proxy errors.
- Backend: The actual cloud storage implementation (S3, GCS, Azure, or OCI).
The disk cache is positioned between the Processed and Proxy layers. This means that when compression or encryption is enabled, disk-cached blocks are stored in their processed (compressed and/or encrypted) form, avoiding redundant processing on cache hits.