# Data Flow

flexFS separates metadata operations from block data operations. Metadata flows through an RPC protocol to the metadata server; block data flows over HTTPS REST to object storage (or through proxy servers). This page traces both paths in detail.
## Read path

When an application reads a file, the request passes through the kernel’s FUSE layer, into mount.flexfs, and then fans out to the metadata server and block storage.
### Read path details

1. **FUSE dispatch**: The kernel delivers a `FUSE_READ` request containing the file handle, offset, and length. mount.flexfs translates the offset into a block index: `blockIdx = offset / blockSize`.
2. **Block key lookup**: The mount client looks up the block’s storage key from a local cache of inode-to-block-key mappings. If the mapping is not cached, it issues an RPC to the metadata server to retrieve the current block keys for the inode.
3. **Cache lookup**: The block is looked up by its composite key (inode number, block index, storage key) in the L1 memory cache first, then the L2 disk cache. The L1 cache is an in-memory LRU with configurable capacity. The L2 disk cache uses an LRU eviction policy for clean blocks and can optionally operate in writeback mode.
4. **Remote fetch**: On a cache miss, the block is fetched from either a proxy server (if proxy groups are configured and reachable) or directly from object storage. Concurrent requests for the same block are coalesced into a single downstream fetch using a singleflight mechanism.
5. **Processing pipeline**: The raw block from storage is first decrypted (AES-256-GCM, if encryption is enabled), then decompressed (LZ4, Snappy, or zstd, depending on volume configuration). The result is a full-size block matching the volume’s configured block size.
6. **Prefetching**: When a sequential read pattern is detected, mount.flexfs prefetches subsequent blocks in the background. Prefetched blocks populate the cache before they are needed, reducing read latency for sequential workloads.
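The lookup, caching, and fetch steps above can be sketched in a few dozen lines. This is a minimal illustration, not flexFS’s actual implementation: the 4 MiB block size, the `LRUCache`/`BlockReader` names, and the `fetch_remote` callback are all assumptions made for the example.

```python
import threading
from collections import OrderedDict

BLOCK_SIZE = 4 * 1024 * 1024  # hypothetical; the real block size is per-volume


def block_index(offset: int) -> int:
    """Translate a FUSE read offset into a block index (blockIdx = offset / blockSize)."""
    return offset // BLOCK_SIZE


class LRUCache:
    """Minimal LRU, standing in for both the L1 (memory) and L2 (disk) tiers."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)         # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


class BlockReader:
    def __init__(self, l1: LRUCache, l2: LRUCache, fetch_remote):
        self.l1, self.l2 = l1, l2
        self.fetch_remote = fetch_remote      # proxy or object-storage fetch
        self.lock = threading.Lock()
        self.inflight = {}                    # singleflight: key -> Event
        self.results = {}                     # key -> fetched block

    def read_block(self, ino: int, idx: int, storage_key: str) -> bytes:
        key = (ino, idx, storage_key)         # composite block key
        for tier in (self.l1, self.l2):       # L1 first, then L2
            block = tier.get(key)
            if block is not None:
                return block
        # Cache miss: coalesce concurrent requests for the same block.
        with self.lock:
            event = self.inflight.get(key)
            leader = event is None
            if leader:
                event = self.inflight[key] = threading.Event()
        if not leader:
            event.wait()                      # piggyback on the in-flight fetch
            return self.results[key]
        block = self.fetch_remote(key)        # one downstream request for all callers
        self.l2.put(key, block)
        self.l1.put(key, block)
        with self.lock:
            self.results[key] = block
            del self.inflight[key]
        event.set()
        return block
```

The singleflight pattern is the key point: however many application threads miss the cache on the same block at once, only the “leader” issues a downstream request, and the waiters reuse its result.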
## Write path

Writes follow a similar but reversed flow, with dirty block management adding an additional layer.
### Write path details

1. **FUSE dispatch**: The kernel delivers a `FUSE_WRITE` request. mount.flexfs determines the target block index and, for partial-block writes, reads the existing block first (read-modify-write).
2. **Dirty block cache**: The modified block is stored in the in-memory dirty block cache. The FUSE write returns immediately, giving the application low-latency write acknowledgment. The dirty cache has a configurable capacity; when it fills, the oldest dirty blocks are flushed synchronously.
3. **Sync trigger**: Dirty blocks are flushed to storage on any of these events:
   - Application calls `fsync()` or `fdatasync()`
   - File handle is flushed (`close()`)
   - Dirty cache capacity pressure
   - Periodic background sync
4. **Processing pipeline**: Before upload, the block is compressed using the volume’s configured algorithm (LZ4 by default, or Snappy, zstd, or none). If encryption is enabled, the compressed block is then encrypted with AES-256-GCM. The processing order on write is: compress, then encrypt. On read, it is reversed: decrypt, then decompress.
5. **Block key allocation**: Each new version of a block gets a fresh timestamp-based key (formatted as `unixSeconds_nanoseconds`). This allows flexFS to retain previous versions of blocks for the volume’s configured retention period, enabling point-in-time recovery.
6. **Upload**: The processed block is uploaded to object storage (or through a proxy server). If local disk writeback caching is enabled, the block is written to the disk cache first and the upload happens asynchronously in the background, further reducing write latency.
7. **Metadata update**: After the block is persisted, mount.flexfs updates the metadata server with the new block key mapping for that inode and block index.
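The processing-pipeline ordering and the key format above can be demonstrated with a small round-trip sketch. To keep it standard-library-only, zlib stands in for LZ4/Snappy/zstd and a keyed XOR stands in for AES-256-GCM; both substitutions are for illustration only (XOR is not encryption), and the function names are invented for this example.

```python
import time
import zlib


def new_storage_key() -> str:
    """Timestamp-based block key, formatted as unixSeconds_nanoseconds."""
    ns = time.time_ns()
    return f"{ns // 1_000_000_000}_{ns % 1_000_000_000}"


def xor_cipher(data: bytes, key: bytes) -> bytes:
    # Stand-in for AES-256-GCM so the example stays stdlib-only.
    # Never use XOR for real encryption.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def process_for_upload(block: bytes, key: bytes) -> bytes:
    """Write path: compress first, THEN encrypt."""
    return xor_cipher(zlib.compress(block), key)


def process_from_storage(obj: bytes, key: bytes) -> bytes:
    """Read path, reversed: decrypt first, THEN decompress."""
    return zlib.decompress(xor_cipher(obj, key))
```

The ordering matters: well-encrypted data is indistinguishable from random bytes and therefore incompressible, so compressing after encryption would save nothing. Compress-then-encrypt is the only order that both shrinks the block and protects it.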
## Block ID structure

Every block in object storage is identified by three components:
| Component | Description | Example |
|---|---|---|
| Inode number (`ino`) | The file’s unique inode number | `42` |
| Block index (`idx`) | The block’s position within the file (0-indexed) | `3` |
| Storage key (`key`) | Timestamp-based key for versioning | `1711234567_890123456` |
These components are combined to form the object key in the storage bucket:
`{prefix}/{ino}/{idx}/{key}`

For example: `flexfs/vol-abc123/42/3/1711234567_890123456`
When a prefix contains the string `partition`, flexFS replaces it with an MD5-based hash of the inode and block index to distribute objects across key prefixes. This improves throughput on storage backends that partition by key prefix.
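Key construction, including the partition substitution, can be sketched as below. The exact hashing scheme (digest length, input encoding) is not specified above, so the 8-hex-character MD5 prefix here is an assumption made for illustration.

```python
import hashlib


def object_key(prefix: str, ino: int, idx: int, key: str) -> str:
    """Build the object key {prefix}/{ino}/{idx}/{key}."""
    if "partition" in prefix:
        # Assumed scheme: replace the literal "partition" with a short
        # MD5-based hash of (ino, idx) to spread objects across prefixes.
        digest = hashlib.md5(f"{ino}/{idx}".encode()).hexdigest()[:8]
        prefix = prefix.replace("partition", digest)
    return f"{prefix}/{ino}/{idx}/{key}"
```

Because the hash depends on the inode and block index, blocks of the same file land under different prefixes, which is what lets prefix-partitioned backends serve them from different partitions in parallel.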
## Metadata RPC protocol

The mount client maintains a persistent TCP connection to the metadata server, communicating via a custom binary RPC protocol. Each RPC carries a request payload and returns a response with a status code. The protocol supports the full range of filesystem operations:
- Namespace: Lookup, Create, MkDir, MkNod, Symlink, Link, Rename, Unlink, RmDir
- Attributes: GetAttr, SetAttr
- Data: block key lookups and updates
- Directory: ReadDir, ReadDirPlus (paginated directory streams)
- Locking: GetLk, SetLk (both POSIX fcntl and BSD flock semantics)
- Extended attributes: GetXAttr, SetXAttr, ListXAttr, RemoveXAttr
- Access control: ACL checks, permission validation
- Session: Connect, Disconnect, StatFs
TLS is enabled by default on the RPC connection. It can be disabled for testing with the `--noMetaSSL` flag.
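flexFS’s actual wire format is not documented here, but the general shape of such a protocol (a length-prefixed binary frame carrying an opcode and payload, with a status code in the response) can be sketched generically. Every name and field layout below is invented for illustration and is not the flexFS protocol.

```python
import struct

# Illustrative opcodes only -- not flexFS's real opcode table.
OP_LOOKUP, OP_GETATTR, OP_STATFS = 1, 2, 3
STATUS_OK = 0


def encode_request(opcode: int, payload: bytes) -> bytes:
    """Frame: 4-byte big-endian body length, 2-byte opcode, payload."""
    return struct.pack(">IH", 2 + len(payload), opcode) + payload


def decode_request(frame: bytes):
    length, opcode = struct.unpack(">IH", frame[:6])
    return opcode, frame[6:6 + length - 2]


def encode_response(status: int, payload: bytes) -> bytes:
    """Frame: 4-byte body length, 1-byte status code, payload."""
    return struct.pack(">IB", 1 + len(payload), status) + payload


def decode_response(frame: bytes):
    length, status = struct.unpack(">IB", frame[:5])
    return status, frame[5:5 + length - 1]
```

Length-prefixing is what lets the client multiplex many RPCs over the single persistent TCP connection: the reader always knows exactly where one frame ends and the next begins.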