
Capacity Planning

This guide helps you estimate storage requirements, configure quotas, and understand the flexFS billing model.

FlexFS storage usage has three components:

| Component | Where stored | Sizing factors |
|---|---|---|
| Block data | Cloud object storage | Total file data size (after compression). Directly proportional to the data written. |
| Metadata | Metadata server local disk | Proportional to the number of files, directories, and their attributes. |
| Cache | Mount client / proxy local disk | Configurable via `--diskQuota`. Working-set dependent. |

Block data is the dominant storage cost. After compression, the actual storage used depends on the data type:

| Data type | Typical compression ratio (LZ4) | 1 TB raw data stored as |
|---|---|---|
| Text / source code | 3-5x | 200-330 GB |
| Genomics (BAM) | 1.1-1.3x | 770-910 GB |
| Compressed files (gzip, zstd) | 1.0x (no benefit) | ~1 TB |
| Binary / random data | 1.0x | ~1 TB |
| Parquet / columnar data | 1.5-2x | 500-670 GB |
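As a rough sketch, you can combine these ratios to estimate block storage for a mixed dataset. The ratios below are midpoints taken from the table above, not measured values for your data:

```python
# Estimate compressed block storage from raw data sizes and
# approximate LZ4 compression ratios (midpoints from the table above).
RATIOS = {
    "text": 4.0,           # text / source code: 3-5x
    "bam": 1.2,            # genomics (BAM): 1.1-1.3x
    "precompressed": 1.0,  # gzip/zstd archives, binary/random: no benefit
    "parquet": 1.75,       # Parquet / columnar: 1.5-2x
}

def estimate_block_storage_gb(raw_gb_by_type: dict) -> float:
    """Return estimated post-compression block storage in GB."""
    return sum(gb / RATIOS[kind] for kind, gb in raw_gb_by_type.items())

# Example: 500 GB of Parquet plus 200 GB of already-compressed archives.
print(estimate_block_storage_gb({"parquet": 500, "precompressed": 200}))
```

Measure the actual ratio on a sample of your data before committing to an estimate; compression varies widely even within a data type.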

When files are deleted or modified, old blocks are retired but may be retained for time-travel access during the retention period. Factor this into your storage estimates:

Total block storage = active blocks + retired blocks (within retention window)

Monitor retired blocks via the flexfs_meta_volume_blocks_retired metric.
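The formula above can be sketched as a simple estimate. The churn rate and retention window here are hypothetical inputs you would replace with your own workload figures:

```python
def total_block_storage_gb(active_gb: float,
                           daily_churn_gb: float,
                           retention_days: int) -> float:
    """Active blocks plus retired blocks kept for time-travel.

    Retired blocks accumulate at roughly the rate data is deleted or
    overwritten (daily_churn_gb) for the length of the retention window.
    """
    retired_gb = daily_churn_gb * retention_days
    return active_gb + retired_gb

# Example: 1 TB active data, 20 GB/day rewritten, 7-day retention.
print(total_block_storage_gb(1000, 20, 7))  # 1140.0
```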

Metadata storage scales with the number of filesystem objects, not with file data size.

| Filesystem objects | Approximate metadata size |
|---|---|
| 1 million files | 1-2 GB |
| 10 million files | 10-20 GB |
| 100 million files | 100-200 GB |
| 1 billion files | 1-2 TB |

Factors that increase metadata size per file:

  • Extended ACLs
  • Extended attributes
  • Long file names
  • Deep directory nesting (more directory entries)
  • Time-travel retention (historical versions of metadata)

Provision metadata server disk space at 2-3x the estimated metadata size to allow for:

  • Database compaction overhead
  • WAL (write-ahead log) files
  • Growth headroom

Monitor disk usage with the flexfs_meta_db_disk_usage_bytes and flexfs_meta_db_folder_disk_capacity_bytes metrics.
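Combining the ~1-2 GB per million files figure from the table with the 2-3x provisioning rule gives a quick sizing sketch. The per-file byte count and provisioning factor below are midpoints assumed from the guidance above, not exact values:

```python
def metadata_disk_gb(n_files: int,
                     bytes_per_file: float = 1500,    # midpoint of 1-2 KB/file
                     provision_factor: float = 2.5) -> float:  # midpoint of 2-3x
    """Recommended metadata server disk in GB for n_files filesystem objects."""
    metadata_gb = n_files * bytes_per_file / 1e9
    return metadata_gb * provision_factor

# Example: 100 million files -> ~150 GB metadata, ~375 GB of provisioned disk.
print(round(metadata_disk_gb(100_000_000)))  # 375
```

Raise `bytes_per_file` if your files carry extended ACLs, extended attributes, or long retention windows.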

The disk cache holds recently accessed blocks on local disk. Size it based on your working set:

| Working set | Recommended `--diskQuota` |
|---|---|
| Small (< 10 GB active data) | 20-50 GB |
| Medium (10-100 GB active data) | 100-500 GB |
| Large (> 100 GB active data) | 500 GB - 1 TB+ |

Use NVMe SSDs for the cache folder when low latency is important.
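A small helper can map a working-set estimate onto the tiers in the table. The tier boundaries are taken directly from the table; the helper itself is just a sketch:

```python
def recommend_disk_quota_gb(working_set_gb: float) -> tuple:
    """Return the (low, high) --diskQuota range in GB for a working set."""
    if working_set_gb < 10:
        return (20, 50)      # small working set
    if working_set_gb <= 100:
        return (100, 500)    # medium working set
    return (500, 1000)       # large working set: 500 GB - 1 TB+

print(recommend_disk_quota_gb(60))  # (100, 500)
```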

Proxy server caches are shared across all mount clients in the proxy group. Size them based on the combined working set of all clients:

Proxy cache size >= working set of all clients / number of proxy servers
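The inequality above translates directly into a per-proxy sizing calculation. The client working sets here are illustrative numbers:

```python
import math

def proxy_cache_gb(client_working_sets_gb: list, n_proxies: int) -> int:
    """Minimum per-proxy cache (GB) so the proxy group covers the
    combined working set of all mount clients."""
    return math.ceil(sum(client_working_sets_gb) / n_proxies)

# Example: four clients with 80 GB working sets each, two proxy servers.
print(proxy_cache_gb([80, 80, 80, 80], 2))  # 160
```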

Enterprise volumes support quotas to limit resource usage:

| Quota type | Description |
|---|---|
| `maxBlocks` | Maximum number of active blocks. Limits total data volume. |
| `maxInodes` | Maximum number of inodes (files + directories). Limits total file count. |
| `maxProxied` | Maximum number of blocks that can be proxied. Limits proxy cache consumption. |

Quotas are set during volume creation or update via configure.flexfs. When a quota is reached, write operations that would exceed the limit will fail.

Monitor quota usage via the flexfs_meta_volume_blocks, flexfs_meta_volume_inodes, and flexfs_meta_volume_size_bytes metrics.
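To alert before writes start failing, track quota utilization as a fraction. The metric values here would come from your monitoring system; the calculation itself is trivial:

```python
def quota_utilization(current: int, limit: int) -> float:
    """Fraction of a quota consumed, e.g. the flexfs_meta_volume_inodes
    metric value against the volume's maxInodes setting."""
    return current / limit

# Example: 8.5 M inodes used against a 10 M maxInodes quota.
u = quota_utilization(8_500_000, 10_000_000)
print(f"{u:.0%}")  # 85%
```

Alerting at around 80-90% utilization leaves time to raise the quota or clean up before writes are rejected.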

Enterprise flexFS usage is metered on a GB-month basis:

Monthly cost = sum(volume_size_bytes * hours_active) / (1 GB * hours_in_month)

Key points:

  • Billing is based on the logical storage size of the volume.
  • Proxy cache and mount client cache do not count toward billed storage.
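The metering formula above can be computed directly. This sketch assumes 1 GB = 10^9 bytes and a 730-hour month; multiply the result by your contract's price per GB-month to get cost:

```python
GB = 10**9  # assumed decimal gigabyte

def gb_months(samples, hours_in_month: float = 730) -> float:
    """samples: iterable of (volume_size_bytes, hours_active) pairs.
    Returns metered GB-months per the billing formula above."""
    return sum(size * hours for size, hours in samples) / (GB * hours_in_month)

# Example: a 500 GB volume active all month plus a 2 TB volume
# active for one week (168 hours).
usage = gb_months([(500 * GB, 730), (2000 * GB, 168)])
print(round(usage, 1), "GB-months")
```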
When planning capacity, work through these questions:

| Question | How to answer |
|---|---|
| How much data will I store? | Estimate total file sizes, apply compression ratio. |
| How many files will I have? | Count files and directories for metadata sizing. |
| What is my working set? | Identify the subset of data accessed frequently for cache sizing. |
| How long do I need time-travel? | Set the retention period; longer retention means more metadata and retired-block storage. |
| What are my write patterns? | High write rates need more dirty-cache capacity and writeback tuning. |