Performance

Aggregate Read Throughput

Aggregate read throughput is arguably the most important performance metric for I/O-bound data analysis clusters. The following chart compares flexFS against AWS EFS and FSx for Lustre across varying cluster sizes.

Benchmark configuration:

Tool: DFS-Perf (similar results observed with alternative tools)
Workload: 64 threads per host, each thread reading 2 GiB files concurrently
FSx for Lustre: 12,000 GiB provisioned, 12,000 MiB/s throughput ($7,200/mo)
Instance type: AWS c6gn.16xlarge
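
The workload shape above (many threads per host, each reading large files sequentially) can be approximated with a short script. The following is a minimal sketch only, not DFS-Perf and not the harness used for the published results. The names `read_file`, `measure_read_throughput`, and `CHUNK_SIZE` are hypothetical, and the 8 MiB chunk size is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption: 8 MiB reads; the original benchmark's I/O size is not stated.
CHUNK_SIZE = 8 * 1024 * 1024


def read_file(path):
    """Read one file sequentially and return the number of bytes read."""
    total = 0
    # buffering=0 opens the file unbuffered at the Python level;
    # the OS page cache may still serve repeated reads.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            total += len(chunk)
    return total


def measure_read_throughput(paths, threads=64):
    """Read all paths concurrently; return (bytes_read, seconds, MiB/s)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        bytes_read = sum(pool.map(read_file, paths))
    elapsed = time.perf_counter() - start
    mib_per_s = bytes_read / (1024 * 1024) / elapsed if elapsed > 0 else float("inf")
    return bytes_read, elapsed, mib_per_s
```

Calling `measure_read_throughput(paths, threads=64)` on one host gives that host's read rate. An aggregate figure for a cluster would be the total bytes read by all hosts divided by the shared wall-clock window. Reads served from the local page cache will inflate the result, so files should be larger than memory or the cache dropped between runs.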

Real-World Task: Bioinformatics

To illustrate real-world performance, the following benchmark processes UK Biobank chromosome data, filtering and converting genetic data files with PLINK, a standard bioinformatics tool.

While S3 offers economical storage, tasks that process data directly from S3 take substantially longer to run, and therefore cost more in compute, than the same tasks on a POSIX filesystem. Optimizing for storage cost alone can dramatically increase compute costs; the total cost of ownership must account for both.
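
The trade-off between storage cost and compute cost can be made concrete with a small calculation. This is a sketch under stated assumptions: the function name `total_monthly_cost` is hypothetical, and every price and hour count in the example is an illustrative placeholder, not a measured result or a quoted AWS price.

```python
def total_monthly_cost(storage_gib, storage_price_per_gib_month,
                       compute_hours, compute_price_per_hour):
    """Return storage cost plus compute cost for one month, in dollars."""
    storage = storage_gib * storage_price_per_gib_month
    compute = compute_hours * compute_price_per_hour
    return storage + compute


# Placeholder inputs (assumptions for illustration only).
# Option A: cheaper storage, but jobs run longer.
cheap_storage = total_monthly_cost(
    storage_gib=10_000, storage_price_per_gib_month=0.02,
    compute_hours=400, compute_price_per_hour=3.0,
)
# Option B: pricier storage, but jobs finish sooner.
fast_storage = total_monthly_cost(
    storage_gib=10_000, storage_price_per_gib_month=0.06,
    compute_hours=100, compute_price_per_hour=3.0,
)

print(f"cheap storage total: ${cheap_storage:,.2f}")
print(f"fast storage total:  ${fast_storage:,.2f}")
```

With these placeholder inputs the option with cheaper storage has the higher total, because the extra compute hours outweigh the storage savings. Substituting measured job durations and actual prices shows whether the same holds for a given workload.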