# Scaling to 1000+ Mounts
This guide covers tuning recommendations for deployments with hundreds or thousands of concurrent mount clients accessing a single metadata server.
## Metadata Server Sizing

The metadata server is the primary scaling bottleneck. Each mount client maintains a persistent RPC session.
### Hardware Recommendations

| Mount Count | CPU Cores | RAM | Storage |
|---|---|---|---|
| 100-500 | 4-8 | 16-32 GiB | SSD |
| 500-1000 | 8-16 | 32-64 GiB | NVMe SSD |
| 1000+ | 16+ | 64+ GiB | NVMe SSD |
### Memory Tuning

The metadata server's database memory cache is controlled by the internal `--dbMemCapacity` flag (default: 40% of system RAM). The default is generally appropriate even for high session counts; monitor the metadata server's Prometheus metrics to confirm cache hit rates stay high.
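One way to check the cache hit rate is to compute it from the hit/miss counters in the Prometheus metrics output. A minimal sketch, using hypothetical metric names (`flexfs_meta_db_cache_hits_total` / `flexfs_meta_db_cache_misses_total` — substitute whatever counters your metadata server actually exposes) and a sample scrape saved to a file:

```sh
# Sample scrape; metric names here are assumptions, check your /metrics output.
cat > /tmp/metrics.txt <<'EOF'
flexfs_meta_db_cache_hits_total 980000
flexfs_meta_db_cache_misses_total 20000
EOF

# Compute hits / (hits + misses) as a percentage.
awk '/cache_hits_total/   {h=$2}
     /cache_misses_total/ {m=$2}
     END {printf "hit rate: %.1f%%\n", 100*h/(h+m)}' /tmp/metrics.txt
```

In production you would pipe a live scrape (e.g. `curl -s` against the server's metrics endpoint) into the same `awk` filter instead of a sample file.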
## Proxy Group Sizing

For large deployments, proxy groups reduce the load on object storage and improve read performance.
### Sizing Guidelines

- 2-4 proxy servers per group for up to 500 mount clients.
- 4-8 proxy servers per group for 500-2000 mount clients.
- Rendezvous hashing distributes blocks evenly across group members.
- Each proxy server should have fast local SSD storage sized to the active working set.
## Mount Client Tuning

### FUSE Tuning

For high-throughput workloads with many concurrent readers:
- Ensure the default `max_pages` setting is used (do not set `--noMaxPages`).
- The default `--attrValid` (3600 seconds) and `--entryValid` (1 second) values are appropriate for most workloads. Increase `--entryValid` for read-heavy workloads where the directory structure rarely changes.
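As an illustration, the flags above might be combined like this on a read-heavy node. The `60` second `--entryValid` value is an arbitrary example chosen for this sketch, not a recommendation from this guide:

```sh
# Raise --entryValid from its 1 s default on a read-heavy node whose
# directory tree rarely changes; 60 s is an illustrative value only.
# --attrValid is left at its 3600 s default (written out for clarity).
mount.flexfs start my-volume /mnt/flexfs \
  --attrValid 3600 \
  --entryValid 60
```

The trade-off: a larger `--entryValid` means renames and deletes made by other clients can go unnoticed for up to that long.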
### Block Cache

For compute nodes with available local disk:

```sh
mount.flexfs start my-volume /mnt/flexfs \
  --diskFolder /local-ssd/cache \
  --diskQuota 80%
```

This keeps repeated reads from hitting the metadata or proxy layer.
## Kernel Tuning

### Network Tuning

For hosts running many mount clients or a metadata server handling many sessions:

```sh
# Increase TCP connection backlog limits
sysctl -w net.core.somaxconn=4096
sysctl -w net.ipv4.tcp_max_syn_backlog=4096
```
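`sysctl -w` changes are lost on reboot. A sketch of making these settings persistent via a drop-in file (the `90-flexfs.conf` filename is a convention chosen here, not anything flexfs requires):

```sh
# Persist the backlog settings across reboots.
cat > /etc/sysctl.d/90-flexfs.conf <<'EOF'
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096
EOF

# Reload all sysctl configuration files.
sysctl --system
```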
## Deployment Patterns

### Separate Volume Groups

For very large deployments, split workloads across multiple volumes with separate metadata servers. This removes any single metadata server as a shared bottleneck:
- Volume A (team 1): meta-server-1, block-store-1
- Volume B (team 2): meta-server-2, block-store-2
Each volume can use the same or different proxy groups.
### fstab-Based Mass Deployment

For deploying mounts across many hosts, use fstab entries:

```
my-volume /mnt/flexfs flexfs _netdev,nofail 0 0
```

Combined with configuration management (Ansible, Puppet, Chef), credential initialization and fstab entries can be rolled out to thousands of hosts.
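A minimal, idempotent rollout sketch — suitable as the body of a configuration-management task run as root on each host (the mount point and volume name mirror the example above; adapt both to your deployment):

```sh
# Append the fstab entry only if it is not already present, then mount.
FSTAB_LINE='my-volume /mnt/flexfs flexfs _netdev,nofail 0 0'
grep -qxF "$FSTAB_LINE" /etc/fstab || echo "$FSTAB_LINE" >> /etc/fstab

mkdir -p /mnt/flexfs
mount /mnt/flexfs
```

The `grep -qxF` guard makes repeated runs safe, which matters when the same play is applied to thousands of hosts on every convergence cycle.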
### Kubernetes

The CSI driver automatically manages mount lifecycle in Kubernetes. For large clusters, deploy the CSI node DaemonSet on all worker nodes and use PersistentVolumeClaims for pod access.
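A sketch of what such a PersistentVolumeClaim might look like. The `storageClassName` and the `ReadWriteMany` access mode are assumptions for illustration — check your CSI driver's documentation for the class it provisions and the access modes it supports:

```yaml
# Hypothetical PVC; storageClassName "flexfs" is assumed, not defined here.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: flexfs-data
spec:
  accessModes:
    - ReadWriteMany          # assumed: many pods across nodes mount concurrently
  storageClassName: flexfs   # assumed class backed by the CSI driver
  resources:
    requests:
      storage: 100Gi
```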
## Monitoring at Scale

Monitor these key metrics across all metadata servers:
- RPC operations per second: Tracks overall load.
- RPC latency percentiles: Detects degradation.
- Active sessions: Counts connected mount clients.
- Volume size gauges: Capacity planning.
Set up Prometheus alerting for:
- RPC latency p99 exceeding thresholds.
- Session count approaching known limits.
- Metadata server process restarts.
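A sketch of what such alerting rules could look like. Metric names, thresholds, and durations here are placeholders — substitute the names your metadata server's metrics endpoint actually exposes and thresholds derived from your own baselines:

```yaml
# Hypothetical Prometheus rule file; all metric names are assumptions.
groups:
  - name: flexfs-metadata
    rules:
      - alert: FlexfsRpcLatencyHigh
        # p99 RPC latency over 100 ms for 10 minutes (threshold is illustrative).
        expr: histogram_quantile(0.99, sum(rate(flexfs_meta_rpc_duration_seconds_bucket[5m])) by (le)) > 0.1
        for: 10m
        labels:
          severity: warning
      - alert: FlexfsSessionCountHigh
        # Session count approaching a known limit (900 is illustrative).
        expr: flexfs_meta_active_sessions > 900
        for: 15m
        labels:
          severity: warning
```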