Performance

Aggregate read throughput is arguably the most important performance metric for I/O-bound data analysis clusters. The following chart compares flexFS against AWS EFS and FSx for Lustre across varying cluster sizes.

[Figure: Aggregate read throughput comparison: flexFS vs EFS vs FSx for Lustre]

Benchmark configuration:

  • Tool: DFS-Perf (similar results observed with alternative tools)
  • Workload: 64 threads per host, each concurrently reading 2 GiB files
  • FSx for Lustre: 12,000 GiB provisioned, 12,000 MiB/s throughput ($7,200/mo)
  • Instance type: AWS c6gn.16xlarge
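The read pattern above (many threads per host, each streaming a large file sequentially) can be approximated with a short script. This is an illustrative sketch, not a substitute for DFS-Perf; the thread count, block size, and file paths are assumptions you would adapt to your mount point.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def read_file(path, block_size=1 << 20):
    """Stream a file sequentially in 1 MiB chunks; return bytes read."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(block_size):
            total += len(chunk)
    return total

def aggregate_read_throughput(paths, threads=64):
    """Read all files concurrently; return (total_bytes, MiB/s)."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        total = sum(pool.map(read_file, paths))
    elapsed = time.monotonic() - start
    return total, total / (1 << 20) / elapsed
```

In the benchmark configuration above, `paths` would be 64 distinct 2 GiB files on the filesystem under test, with one worker thread per file.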

To illustrate real-world performance, the following benchmark processes UK Biobank chromosome data — filtering and converting genetic data files using PLINK, a standard bioinformatics tool.
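As a rough sketch of what such a task looks like, the snippet below assembles a PLINK invocation that filters variants by minor-allele frequency and rewrites the dataset as a new binary fileset. The input prefix, MAF threshold, and output prefix are placeholders for illustration, not the actual UK Biobank pipeline used in the benchmark.

```python
import subprocess

def build_plink_filter_cmd(bfile, out, maf=0.01):
    """Construct a PLINK command that filters by minor-allele
    frequency and writes a filtered binary fileset (.bed/.bim/.fam)."""
    return [
        "plink",
        "--bfile", bfile,   # input prefix: <bfile>.bed/.bim/.fam
        "--maf", str(maf),  # drop variants below this frequency
        "--make-bed",       # emit a new binary fileset
        "--out", out,       # output prefix
    ]

def run_filter(bfile, out, maf=0.01):
    """Run PLINK, raising if it exits nonzero."""
    subprocess.run(build_plink_filter_cmd(bfile, out, maf), check=True)
```

Because PLINK reads and writes whole filesets, its wall-clock time is dominated by filesystem throughput, which is what makes it a useful real-world probe here.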

[Figure: Wall clock time and computing cost for a bioinformatics task]

While S3 offers economical storage, processing tasks directly against S3 data takes substantially longer, and therefore costs substantially more in compute, than running the same tasks against a POSIX filesystem. Optimizing for storage cost alone can dramatically increase computing costs; total cost of ownership must account for both.
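The trade-off can be made concrete with a back-of-the-envelope model: monthly cost is storage plus compute, and compute scales with how long each run takes. The figures below are hypothetical placeholders, not numbers from the chart above.

```python
def total_cost(storage_per_month, instance_hourly, hosts,
               task_hours, runs_per_month):
    """Monthly TCO: storage plus compute for repeated runs of a task."""
    compute = instance_hourly * hosts * task_hours * runs_per_month
    return storage_per_month + compute

# Hypothetical comparison: cheap storage with a slow task vs.
# pricier storage that lets the same task finish 5x faster.
slow = total_cost(storage_per_month=100, instance_hourly=2.0,
                  hosts=8, task_hours=10, runs_per_month=20)  # 100 + 3200
fast = total_cost(storage_per_month=600, instance_hourly=2.0,
                  hosts=8, task_hours=2, runs_per_month=20)   # 600 + 640
```

Under these assumed numbers the "expensive" storage is the cheaper system overall, which is the point of the chart: storage price and total cost are not the same thing.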