
Capacity Planning

This guide helps you estimate storage requirements, configure quotas, and understand the flexFS billing model.

FlexFS storage usage has three components:

| Component | Where stored | Sizing factors |
|---|---|---|
| Block data | Cloud object storage | Total file data size (after compression). Directly proportional to the data written. |
| Metadata | Metadata server local disk | Proportional to the number of files, directories, and their attributes. |
| Cache | Mount client / proxy local disk | Configurable via `--diskQuota`. Working-set dependent. |

Block data is the dominant storage cost. After compression, the actual storage used depends on the data type:

| Data type | Typical compression ratio (LZ4) | 1 TB raw data stored as |
|---|---|---|
| Text / source code | 3-5x | 200-330 GB |
| Genomics (BAM) | 1.1-1.3x | 770-910 GB |
| Compressed files (gzip, zstd) | 1.0x (no benefit) | ~1 TB |
| Binary / random data | 1.0x | ~1 TB |
| Parquet / columnar data | 1.5-2x | 500-670 GB |
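As a rough sketch, you can combine these ratios to estimate block storage for a mixed dataset. The ratios below are midpoints taken from the table above, not measured values for your data:

```python
# Estimate compressed block storage from raw data sizes and
# approximate LZ4 compression ratios (midpoints from the table above).
RATIOS = {
    "text": 4.0,           # text / source code: 3-5x
    "bam": 1.2,            # genomics (BAM): 1.1-1.3x
    "precompressed": 1.0,  # gzip/zstd archives, binary/random: no benefit
    "parquet": 1.75,       # Parquet / columnar: 1.5-2x
}

def estimate_block_storage_gb(raw_gb_by_type: dict) -> float:
    """Return estimated post-compression block storage in GB."""
    return sum(gb / RATIOS[kind] for kind, gb in raw_gb_by_type.items())

# Example: 500 GB of Parquet plus 200 GB of already-compressed archives.
print(estimate_block_storage_gb({"parquet": 500, "precompressed": 200}))
```

Measure the actual ratio on a sample of your data before committing to an estimate; compression varies widely even within a data type.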

When files are deleted or modified, old blocks are retired but may be retained for time-travel access during the retention period. Factor this into your storage estimates:

Total block storage = active blocks + retired blocks (within retention window)

Monitor retired blocks via the flexfs_meta_volume_blocks_retired metric.
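The formula above can be sketched as a simple estimate. The churn rate and retention window here are hypothetical inputs you would replace with your own workload figures:

```python
def total_block_storage_gb(active_gb: float,
                           daily_churn_gb: float,
                           retention_days: int) -> float:
    """Active blocks plus retired blocks kept for time-travel.

    Retired blocks accumulate at roughly the rate data is deleted or
    overwritten (daily_churn_gb) for the length of the retention window.
    """
    retired_gb = daily_churn_gb * retention_days
    return active_gb + retired_gb

# Example: 1 TB active data, 20 GB/day rewritten, 7-day retention.
print(total_block_storage_gb(1000, 20, 7))  # 1140.0
```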

Metadata storage scales with the number of filesystem objects, not with file data size.

| Filesystem objects | Approximate metadata size |
|---|---|
| 1 million files | 1-2 GB |
| 10 million files | 10-20 GB |
| 100 million files | 100-200 GB |
| 1 billion files | 1-2 TB |

Factors that increase metadata size per file:

  • Extended ACLs
  • Extended attributes
  • Long file names
  • Deep directory nesting (more directory entries)
  • Time-travel retention (historical versions of metadata)

Provision metadata server disk space at 2-3x the estimated metadata size to allow for:

  • Database compaction overhead
  • WAL (write-ahead log) files
  • Growth headroom

Monitor disk usage with the flexfs_meta_db_disk_usage_bytes and flexfs_meta_db_folder_disk_capacity_bytes metrics.
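Combining the ~1-2 GB per million files figure from the table with the 2-3x provisioning rule gives a quick sizing sketch. The per-file byte count and provisioning factor below are midpoints assumed from the guidance above, not exact values:

```python
def metadata_disk_gb(n_files: int,
                     bytes_per_file: float = 1500,    # midpoint of 1-2 KB/file
                     provision_factor: float = 2.5) -> float:  # midpoint of 2-3x
    """Recommended metadata server disk in GB for n_files filesystem objects."""
    metadata_gb = n_files * bytes_per_file / 1e9
    return metadata_gb * provision_factor

# Example: 100 million files -> ~150 GB metadata, ~375 GB of provisioned disk.
print(round(metadata_disk_gb(100_000_000)))  # 375
```

Raise `bytes_per_file` if your files carry extended ACLs, extended attributes, or long retention windows.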

The disk cache holds recently accessed blocks on local disk. Size it based on your working set:

| Working set | Recommended `--diskQuota` |
|---|---|
| Small (< 10 GB active data) | 20-50 GB |
| Medium (10-100 GB active data) | 100-500 GB |
| Large (> 100 GB active data) | 500 GB - 1 TB+ |

Use NVMe SSDs for the cache folder when low latency is important.
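A small helper can map a working-set estimate onto the tiers in the table. The tier boundaries are taken directly from the table; the helper itself is just a sketch:

```python
def recommend_disk_quota_gb(working_set_gb: float) -> tuple:
    """Return the (low, high) --diskQuota range in GB for a working set."""
    if working_set_gb < 10:
        return (20, 50)      # small working set
    if working_set_gb <= 100:
        return (100, 500)    # medium working set
    return (500, 1000)       # large working set: 500 GB - 1 TB+

print(recommend_disk_quota_gb(60))  # (100, 500)
```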

Proxy server caches are shared across all mount clients in the proxy group. Size them based on the combined working set of all clients:

Proxy cache size >= working set of all clients / number of proxy servers
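The inequality above translates directly into a per-proxy sizing calculation. The client working sets here are illustrative numbers:

```python
import math

def proxy_cache_gb(client_working_sets_gb: list, n_proxies: int) -> int:
    """Minimum per-proxy cache (GB) so the proxy group covers the
    combined working set of all mount clients."""
    return math.ceil(sum(client_working_sets_gb) / n_proxies)

# Example: four clients with 80 GB working sets each, two proxy servers.
print(proxy_cache_gb([80, 80, 80, 80], 2))  # 160
```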

Enterprise volumes support quotas to limit resource usage:

| Quota type | Description |
|---|---|
| `maxBlocks` | Maximum number of active blocks. Limits total data volume. |
| `maxInodes` | Maximum number of inodes (files + directories). Limits total file count. |
| `maxProxied` | Maximum number of blocks that can be proxied. Limits proxy cache consumption. |

Quotas are set during volume creation or update via configure.flexfs. When a quota is reached, write operations that would exceed the limit will fail.

Monitor quota usage via the flexfs_meta_volume_blocks, flexfs_meta_volume_inodes, and flexfs_meta_volume_size_bytes metrics.
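To alert before writes start failing, track quota utilization as a fraction. The metric values here would come from your monitoring system; the calculation itself is trivial:

```python
def quota_utilization(current: int, limit: int) -> float:
    """Fraction of a quota consumed, e.g. the flexfs_meta_volume_inodes
    metric value against the volume's maxInodes setting."""
    return current / limit

# Example: 8.5 M inodes used against a 10 M maxInodes quota.
u = quota_utilization(8_500_000, 10_000_000)
print(f"{u:.0%}")  # 85%
```

Alerting at around 80-90% utilization leaves time to raise the quota or clean up before writes are rejected.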

Enterprise flexFS usage is metered on a GB-month basis:

Monthly cost = sum(volume_size_bytes * hours_active) / (1 GB * hours_in_month)

Key points:

  • Billing is based on the logical storage size of the volume.
  • Proxy cache and mount client cache do not count toward billed storage.
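The metering formula above can be computed directly. This sketch assumes 1 GB = 10^9 bytes and a 730-hour month; multiply the result by your contract's price per GB-month to get cost:

```python
GB = 10**9  # assumed decimal gigabyte

def gb_months(samples, hours_in_month: float = 730) -> float:
    """samples: iterable of (volume_size_bytes, hours_active) pairs.
    Returns metered GB-months per the billing formula above."""
    return sum(size * hours for size, hours in samples) / (GB * hours_in_month)

# Example: a 500 GB volume active all month plus a 2 TB volume
# active for one week (168 hours).
usage = gb_months([(500 * GB, 730), (2000 * GB, 168)])
print(round(usage, 1), "GB-months")
```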
When planning capacity, work through these questions:

| Question | How to answer |
|---|---|
| How much data will I store? | Estimate total file sizes, apply compression ratio. |
| How many files will I have? | Count files and directories for metadata sizing. |
| What is my working set? | Identify the subset of data accessed frequently for cache sizing. |
| How long do I need time-travel? | Set the retention period; longer retention means more metadata and retired-block storage. |
| What are my write patterns? | High write rates need more dirty-cache capacity and writeback tuning. |