
High Availability

This guide covers high availability considerations for each flexFS component. flexFS is designed so that most component failures are either tolerated gracefully or recoverable with minimal downtime.

Admin Server

The admin server (admin.flexfs or free.flexfs) is queried at mount time for volume settings and periodically for auto-updates. It is not on the data path during normal filesystem operations.

Impact of failure: Active mounts continue operating. New mounts and credential initialization will fail. Auto-updates pause until the server returns.

Recommendations:

  • Run on a reliable host with systemd Restart=always configured.
  • Back up the admin database directory regularly.
  • The admin server is a single instance per deployment. For HA, use VM-level redundancy (e.g., auto-restart, warm standby).
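The systemd recommendation above can be sketched as a unit file. The unit name, binary path, and flags here are assumptions, not flexFS defaults; adjust them to your install.

```ini
# Hypothetical unit: /etc/systemd/system/flexfs-admin.service
[Unit]
Description=flexFS admin server
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/admin.flexfs
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Enable it with systemctl daemon-reload followed by systemctl enable --now flexfs-admin.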

Metadata Server

The metadata server (meta.flexfs) is on the critical path for all filesystem operations. Every file open, directory listing, and attribute lookup goes through it.

Impact of failure: Active mounts become unresponsive. Mount clients will retry and reconnect automatically when the server returns. No data is lost — all metadata is persisted to disk.

Recommendations:

  • Run on high-reliability infrastructure with local SSD storage.
  • Configure systemd with Restart=always.
  • Enable --sync for crash durability at the cost of write performance.
  • Back up the database folder regularly. The metadata database supports online backup.
  • Consider separate metadata servers for separate volume groups to limit blast radius.
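The backup recommendation can be sketched as a small script. The paths below are placeholders, and this plain copy does not exercise any flexFS-specific online-backup command; for a live server, prefer the database's online backup, run something like this from cron, and prune old snapshots.

```shell
#!/bin/sh
# Sketch: copy the metadata database folder to a timestamped backup directory.
# SRC and BACKUP_ROOT are placeholders; point SRC at your meta.flexfs database path.
set -eu
SRC="${SRC:-./meta-db}"
BACKUP_ROOT="${BACKUP_ROOT:-./backups}"
DST="$BACKUP_ROOT/meta-db-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$SRC" "$DST"   # creating SRC here only keeps the sketch self-contained
cp -a "$SRC/." "$DST/"
echo "backed up to $DST"
```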

Proxy Servers

Proxy servers (proxy.flexfs) are on the read path but are not required for correctness.

Impact of failure: Mount clients automatically fall back to direct object storage access. Reads continue without interruption, though with higher latency for uncached blocks.

Recommendations:

  • Deploy multiple proxy servers per proxy group for redundancy.
  • Rendezvous hashing redistributes blocks when a proxy leaves the group.
  • Dynamic membership allows adding replacement proxies without restarting mounts.
  • Monitor cache hit rates and disk usage via the proxy’s /metrics endpoint.
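As a monitoring sketch, a cache hit rate can be derived from counters scraped off /metrics. The metric names below are assumptions, not flexFS's actual metric names; substitute whatever counters your proxy's /metrics output exposes.

```shell
# In a real deployment you would scrape the endpoint, e.g.:
#   metrics="$(curl -s http://proxy-1.internal:8080/metrics)"  # host/port are deployment-specific
# Here hypothetical counter values stand in to show the calculation.
metrics='cache_hits_total 980
cache_misses_total 20'
echo "$metrics" | awk '
  /cache_hits_total/   { hits = $2 }
  /cache_misses_total/ { misses = $2 }
  END { printf "hit rate: %.1f%%\n", 100 * hits / (hits + misses) }'
# prints: hit rate: 98.0%
```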

Mount Client

The mount client (mount.flexfs) runs as a daemon on each compute host.

Impact of failure: The mount point becomes stale (“Transport endpoint is not connected”). The mount.flexfs start command detects stale mounts and cleans them up automatically.

Recommendations:

  • Use fstab entries with _netdev (wait for the network before mounting) and nofail (do not block boot if the mount is unavailable).
  • The auto-update mechanism performs seamless FUSE session handoff, so updates do not cause mount interruptions.
  • For containerized workloads, the CSI driver manages mount lifecycle automatically.
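A sketch of the corresponding fstab entry. The device field and filesystem type below are assumptions based on the usual FUSE mount-helper convention (an fstype that dispatches to mount.flexfs); check the flexFS mount documentation for the exact syntax.

```
# Hypothetical /etc/fstab entry: volume name, mount point, and fstype are placeholders.
myvolume  /mnt/flexfs  fuse.flexfs  _netdev,nofail  0  0
```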

Object Storage

Object storage (S3, GCS, Azure Blob, OCI) provides the durability layer for block data.

Impact of failure: Reads and writes to blocks will fail. This is severe but exceedingly rare: major cloud object storage services are designed for 99.99% availability or better, though contractual SLAs are often lower (e.g., 99.9% for S3 Standard).

Recommendations:

  • Use the default storage class for your cloud provider.
  • Enable versioning on the bucket as a defense-in-depth measure.
  • Enable S3 server-side encryption (--sse) for at-rest protection.
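For an S3 backend, the last two recommendations can be applied with the AWS CLI. The bucket name is a placeholder, and GCS, Azure Blob, and OCI have equivalent settings.

```shell
# Enable object versioning (defense in depth against accidental overwrite/delete).
aws s3api put-bucket-versioning \
  --bucket my-flexfs-bucket \
  --versioning-configuration Status=Enabled

# Enforce default server-side encryption at rest.
aws s3api put-bucket-encryption \
  --bucket my-flexfs-bucket \
  --server-side-encryption-configuration \
  '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
```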

Active-Passive Failover

For deployments requiring minimal downtime on the metadata server or admin server, an active-passive configuration can be used. In this setup, a standby instance is ready to take over if the primary fails. This approach applies to both meta.flexfs and admin.flexfs.

Shared storage: The active and standby nodes have access to a shared block device (e.g., an EBS volume, Azure Managed Disk, or a SAN LUN) containing the service’s database folder, but only one node attaches it and runs the service at a time. On failover, the shared storage is detached from the failed node, attached to the standby, and the service is started.

Replication: Use a block-level replication solution (e.g., DRBD or cloud-native disk replication) to mirror the database folder from the active node to the standby. On failover, the standby promotes its replica and starts the service.

In both approaches:

  1. Stop the service on the failed node (or confirm it is down).
  2. Ensure the standby node has access to the current database folder.
  3. Start the service on the standby node with the same --bindAddr and credentials.
  4. If the standby has a different IP address, update the relevant address record:
    • For meta.flexfs: configure.flexfs update meta-store <id> --address <new-address>
    • For admin.flexfs: update the adminAddr in the credentials files of dependent services (meta.flexfs, configure.flexfs, and mount clients)
  5. Active mount clients will reconnect automatically once the services are reachable.
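The failover steps above can be sketched as a runbook script for meta.flexfs. Hostnames, the meta-store ID, the new address, and the database path are placeholders for your deployment; the manage.flexfs and configure.flexfs commands are the ones this guide uses.

```shell
#!/bin/sh
set -eu

# 1. Fence the failed node first (see the fencing discussion below),
#    then confirm it is down.

# 2. Ensure the standby has the current database folder.
ssh admin@meta-standby 'test -d /var/lib/flexfs/meta-db'

# 3. Start the service on the standby with the same --bindAddr and credentials.
ssh admin@meta-standby 'manage.flexfs start meta'

# 4. If the standby's IP differs, update the meta-store address record.
configure.flexfs update meta-store 1 --address meta-standby.internal:7777

# 5. Active mounts reconnect automatically once the service is reachable.
```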

Fencing

In an active-passive cluster, it is critical to ensure that the failed node is truly stopped before the standby takes over. Without proper fencing, a “split-brain” scenario can occur where both nodes access the database simultaneously, leading to corruption.

Use a fencing mechanism to guarantee that the failed node is powered off or isolated before failover:

  • STONITH (Shoot The Other Node In The Head): Cluster managers such as Pacemaker/Corosync support STONITH agents that forcibly power off or reset the failed node via IPMI, cloud provider APIs (e.g., AWS EC2 stop-instances, Azure az vm deallocate), or PDU power control.
  • Storage fencing: With shared block devices, use SCSI persistent reservations or cloud-level disk detach operations to ensure the failed node cannot write to the shared storage after failover.
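As an example of cloud-API fencing, a failed AWS node can be force-stopped and confirmed down before failover proceeds. The instance ID is a placeholder.

```shell
# Force-stop the failed node and wait until it is fully stopped before failing over.
aws ec2 stop-instances --instance-ids i-0123456789abcdef0 --force
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0
```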
Restoring Metadata from Backup

To restore the metadata database from a backup:

  1. Stop the metadata service: manage.flexfs stop meta
  2. Restore the database folder from backup.
  3. Start the metadata service: manage.flexfs start meta
  4. Active mounts will reconnect automatically.
Full Restart Procedure

To cold-start an entire deployment, bring services up in dependency order:

  1. Start the admin server first: manage.flexfs start admin
  2. Start metadata servers: manage.flexfs start meta
  3. Start proxy servers: manage.flexfs start proxy
  4. Remount on clients: mount.flexfs start <name> <mount-point> or use update.flexfs --mount