
Caching Architecture

FlexFS employs a three-tier caching architecture to minimize latency and reduce the number of requests to object storage. Each tier serves a different access pattern and can be independently configured.

Figure: Three-tier cache hierarchy from L1 Memory through L2 Disk and L3 Proxy to Object Storage
| Tier | Location | Eviction | Writeback | Edition |
|------|----------|----------|-----------|---------|
| L1 | mount client memory | LRU | No (read cache) | Both |
| L2 | mount client disk | LRU (clean blocks) | Optional | Both |
| L3 | proxy server disk | LRU (clean blocks) | Optional | Enterprise |

The L1 cache is an in-memory LRU (Least Recently Used) cache that sits at the top of the store pipeline, closest to the application. It caches processed (decompressed, decrypted) blocks, so cache hits avoid all processing overhead.

  • Capacity: Configurable via --memCapacity (in blocks). When set to 0 (the default), it is auto-sized to approximately 2.5% of total system RAM divided by the volume’s block size, with a minimum of 16 blocks.
  • Eviction: LRU. When the cache is full, the least recently used block is evicted and its buffer is returned to the pool.
  • Coalesced fetches: Concurrent requests for the same block are coalesced into a single downstream fetch using a singleflight mechanism. This prevents cache stampedes when many threads read the same file region simultaneously.
  • Prefetch integration: Prefetched blocks are inserted into the L1 cache. If a block is already cached, the prefetch is skipped.
  • Write-through: When a block is written (PutBlock), it is added to the L1 cache and simultaneously passed downstream.
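
The coalescing behavior can be sketched with a hand-rolled stand-in for the singleflight mechanism (the `Fetcher` type, its field names, and the fetch counter are hypothetical, not FlexFS's actual API):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// flight is one in-progress downstream fetch that late arrivals wait on.
type flight struct {
	done chan struct{}
	val  []byte
}

// Fetcher coalesces concurrent requests for the same block ID into a
// single downstream fetch, preventing cache stampedes.
type Fetcher struct {
	mu       sync.Mutex
	inFlight map[uint64]*flight
	fetches  int // downstream fetch count, for illustration only
}

func NewFetcher() *Fetcher {
	return &Fetcher{inFlight: make(map[uint64]*flight)}
}

// GetBlock issues at most one downstream fetch per block ID at a time,
// however many goroutines ask concurrently.
func (f *Fetcher) GetBlock(id uint64, fetch func(uint64) []byte) []byte {
	f.mu.Lock()
	if fl, ok := f.inFlight[id]; ok {
		f.mu.Unlock()
		<-fl.done // someone else is already fetching this block
		return fl.val
	}
	fl := &flight{done: make(chan struct{})}
	f.inFlight[id] = fl
	f.fetches++
	f.mu.Unlock()

	fl.val = fetch(id) // the single downstream fetch
	close(fl.done)

	f.mu.Lock()
	delete(f.inFlight, id) // future misses fetch again (or hit L1 first)
	f.mu.Unlock()
	return fl.val
}

func main() {
	f := NewFetcher()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ { // 8 threads read the same file region
		wg.Add(1)
		go func() {
			defer wg.Done()
			f.GetBlock(42, func(uint64) []byte {
				time.Sleep(100 * time.Millisecond) // simulate a slow read
				return []byte("block-data")
			})
		}()
	}
	wg.Wait()
	fmt.Println("downstream fetches:", f.fetches) // coalesced: almost surely 1
}
```

A prefetch issued through the same path benefits identically: it either finds the block in flight and waits, or becomes the single fetch that concurrent readers join.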

On a host with 32 GiB of RAM and a 4 MiB block size:

  • 0.025 × 32 GiB / 4 MiB ≈ 204 blocks (approximately 816 MiB of cached data)
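
The auto-sizing rule can be written out as a small function (the function name is hypothetical; the rule itself is the one documented for `--memCapacity`):

```go
package main

import "fmt"

// autoSizeL1 returns the L1 block capacity per the documented rule:
// ~2.5% of total system RAM divided by the volume block size, with a
// minimum of 16 blocks.
func autoSizeL1(totalRAM, blockSize uint64) uint64 {
	blocks := totalRAM / 40 / blockSize // 2.5% == 1/40
	if blocks < 16 {
		blocks = 16
	}
	return blocks
}

func main() {
	const (
		MiB = 1 << 20
		GiB = 1 << 30
	)
	fmt.Println(autoSizeL1(32*GiB, 4*MiB))  // 204
	fmt.Println(autoSizeL1(1*GiB, 128*MiB)) // floor hit: 16
}
```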

The L2 cache persists blocks to local disk, surviving process restarts and providing much larger capacity than memory. It caches blocks in their processed form (after compression and encryption), meaning cache hits still avoid network I/O but do require decompression and decryption.

| Flag | Default | Description |
|------|---------|-------------|
| --diskFolder | `/dev/shm/.flexfs-cache-<pid>` | Directory for cached block files |
| --diskQuota | (empty — disabled) | Maximum disk usage for the cache (e.g., 5%, 64M, 10G) |
| --diskMaxBlockSize | 131072 (128 KiB) | Maximum processed block size to cache on disk. Blocks larger than this after compression/encryption bypass the disk cache. Set to 0 for no limit. |
| --diskWriteback | false | Enable writeback mode (see below) |

The disk cache maintains an LRU list for clean blocks (blocks that have been persisted to object storage). When the cache needs space for new blocks:

  1. The oldest clean block is evicted from the LRU list.
  2. Its disk file is deleted.
  3. If no clean blocks remain (the cache is full of dirty blocks in writeback mode), the new block bypasses the cache entirely.

When --diskWriteback is enabled, the disk cache operates as a writeback cache for writes:

  1. The mount client writes the processed block to the disk cache.
  2. The write is acknowledged to the application immediately (low latency).
  3. A pool of background worker goroutines reads dirty blocks from disk and persists them to object storage asynchronously.
  4. After a dirty block is successfully persisted, it transitions to a clean state and becomes eligible for LRU eviction.
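
Steps 3 and 4 can be sketched as a worker pool draining a dirty-block queue (the `flushAll` function and its signature are illustrative; `persist` stands in for the PUT to object storage):

```go
package main

import (
	"fmt"
	"sync"
)

// flushAll drains dirty block IDs with a pool of background workers,
// persisting each block and marking it clean so it becomes eligible
// for LRU eviction.
func flushAll(dirty []uint64, workers int, persist func(uint64)) map[uint64]bool {
	queue := make(chan uint64, len(dirty))
	for _, id := range dirty {
		queue <- id // the write was already ack'd to the application
	}
	close(queue)

	var mu sync.Mutex
	clean := make(map[uint64]bool)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range queue {
				persist(id) // asynchronous w.r.t. the original write
				mu.Lock()
				clean[id] = true // dirty → clean transition
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return clean
}

func main() {
	clean := flushAll([]uint64{1, 2, 3, 4, 5, 6, 7, 8}, 4, func(uint64) {})
	fmt.Println("blocks flushed:", len(clean)) // 8
}
```
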
Figure: Writeback cache sequence showing write acknowledgment followed by background worker flushing to storage

Writeback mode is especially useful for:

  • Latency-sensitive workloads: Write latency is bounded by local disk speed rather than object storage round-trip time.
  • Bursty write patterns: The disk cache absorbs write bursts while background workers drain to storage at a sustainable rate.
  • On-premises deployments: When object storage is in a remote cloud region, writeback caching masks the network latency.

The number of writeback worker goroutines equals the maxBops setting (auto-sized based on CPU count if not set). Workers retry indefinitely on transient errors, with randomized backoff capped at 10 seconds.

Proxy groups provide a shared caching layer between mount clients and object storage. They function like a content delivery network (CDN) for block data.

  1. Group selection: When a mount client starts, it probes all proxy groups configured for its volume by sending a health-check request to the first address in each group. It selects the group with the lowest round-trip time (RTT), using a 125 ms timeout for the probe.

  2. Block routing: Within the selected group, blocks are distributed across proxy servers using rendezvous hashing (highest random weight hashing) with xxHash. This ensures that:

    • The same block always routes to the same proxy server (cache consistency).
    • Adding or removing a proxy server only redistributes a fraction of blocks (minimal cache disruption).
  3. Proxy-side caching: Each proxy server maintains its own disk cache. On a GET, if the block is cached locally, it is returned immediately. On a cache miss, the proxy fetches the block from object storage, caches it, and returns it to the client.

  4. Proxy-side writeback: Proxy servers can operate in writeback mode, where PUT requests are acknowledged as soon as the block is persisted to the proxy’s local disk. The proxy then asynchronously flushes the block to object storage.

  5. Graceful fallback: If the selected proxy group becomes unreachable (e.g., network failure), mount clients automatically bypass the proxy and communicate directly with object storage for 5 minutes before retrying the proxy.

Proxy groups are created and associated with volumes via configure.flexfs:

  • proxy-group: Defines a group with a provider, region, and comma-separated list of proxy server addresses.
  • volume-proxy-group: Associates a proxy group with a volume. A volume can have multiple proxy groups; the mount client selects the best one based on RTT.

Volumes have a maxProxied setting that limits how many blocks per file (by block index) are routed through proxies. Blocks beyond this index bypass the proxy and go directly to object storage. This is useful for large files where only the first portion benefits from proxy caching. When set to 0, all blocks are eligible for proxying.
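
As a sketch, the routing decision reduces to one comparison (the function name is hypothetical, and the exact boundary — strictly below maxProxied — is an assumption for illustration):

```go
package main

import "fmt"

// useProxy reports whether the block at blockIndex routes through a proxy
// under the volume's maxProxied setting; 0 means all blocks are eligible.
func useProxy(blockIndex, maxProxied uint64) bool {
	return maxProxied == 0 || blockIndex < maxProxied
}

func main() {
	// With maxProxied = 256 and 4 MiB blocks, only the first ~1 GiB of
	// each file would be served via the proxy tier.
	fmt.Println(useProxy(10, 256))   // true: early block, proxied
	fmt.Println(useProxy(4096, 256)) // false: deep block, direct to storage
	fmt.Println(useProxy(4096, 0))   // true: no limit configured
}
```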

In addition to the three read-cache tiers, mount.flexfs maintains an in-memory dirty block cache for write operations. This cache holds modified blocks that have not yet been synced to storage.

| Setting | Default | Description |
|---------|---------|-------------|
| --dirtyCapacity | Auto-sized (0.5% RAM / block size, capped) | Maximum number of dirty blocks in memory |
| --dirtyActive | Auto-sized (half of dirtyCapacity) | Maximum number of dirty blocks actively syncing |

Dirty blocks are flushed when:

  • The application calls fsync(), fdatasync(), or close()
  • The dirty cache reaches capacity (back-pressure flush)
  • The FUSE Flush operation is triggered

mount.flexfs includes a block prefetcher that proactively loads blocks into the L1 cache before they are requested.

| Setting | Default | Description |
|---------|---------|-------------|
| --prefetchActive | Auto-sized (maxBops/2, capped at memCapacity/4) | Maximum number of concurrent prefetch operations |
| --noPrefetch | false | Disable prefetching entirely |

Prefetched blocks share the singleflight mechanism with regular reads, so a prefetch and a concurrent read for the same block result in a single downstream fetch.

All block buffers are managed by a reusable buffer pool to minimize memory allocation overhead and garbage collection pressure. The pool capacity is auto-sized based on the combined active prefetch and dirty-sync concurrency.

| Setting | Default | Description |
|---------|---------|-------------|
| --poolCapacity | Auto-sized (prefetchActive + dirtyActive) | Number of reusable block buffers in the pool |
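
A fixed-capacity pool of reusable buffers can be sketched with a buffered channel, which serves as both the free list and the blocking point when every buffer is checked out (the `bufPool` type is hypothetical, not FlexFS's implementation):

```go
package main

import "fmt"

// bufPool is a fixed-capacity pool of reusable block buffers. All
// buffers are allocated once up front, eliminating per-I/O allocation
// and the GC pressure it would cause.
type bufPool struct{ free chan []byte }

func newBufPool(capacity, blockSize int) *bufPool {
	p := &bufPool{free: make(chan []byte, capacity)}
	for i := 0; i < capacity; i++ {
		p.free <- make([]byte, blockSize)
	}
	return p
}

// Get blocks until a buffer is available, bounding total memory use.
func (p *bufPool) Get() []byte { return <-p.free }

// Put returns a buffer, restored to its full capacity, for reuse.
func (p *bufPool) Put(b []byte) { p.free <- b[:cap(b)] }

func main() {
	pool := newBufPool(4, 4<<20) // e.g. prefetchActive + dirtyActive = 4
	b := pool.Get()
	fmt.Println("buffer size:", len(b)) // 4194304
	pool.Put(b)
}
```

A channel-based pool (rather than `sync.Pool`) matches the documented behavior of a fixed buffer count: exhaustion applies back-pressure instead of allocating more.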

In addition to the userspace caches described above, mount.flexfs configures the Linux FUSE kernel module’s caching behavior:

| Setting | Default | Description |
|---------|---------|-------------|
| --attrValid | 3600 (1 hour) | Seconds the kernel caches file attributes |
| --entryValid | 1 | Seconds the kernel caches directory entries |

These kernel-level caches reduce the number of FUSE round trips for repeated stat() and directory lookups. They are separate from and additive to the block caching tiers.

FlexFS also uses event-driven inode invalidation: when another mount client modifies a file or directory, the metadata server pushes a notification to all other connected clients, which immediately invalidate the affected kernel cache entries. The TTL values above are therefore a fallback for inodes that have not been explicitly invalidated by a remote update.