Performance
Aggregate Read Throughput
Aggregate read throughput is arguably the most important performance metric for I/O-bound data analysis clusters. The following chart compares flexFS against AWS EFS and FSx for Lustre across varying cluster sizes.
Benchmark configuration:
- Tool: DFS-Perf (similar results observed with alternative tools)
- Workload: 64 threads per host, each concurrently reading 2 GiB files
- FSx for Lustre: 12,000 GiB provisioned, 12,000 MiB/s throughput ($7,200/mo)
- Instance type: AWS c6gn.16xlarge
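The shape of this workload can be sketched in Python. This is a minimal illustration of what "N threads per host, each streaming a large file, reported as aggregate MiB/s" means, not the DFS-Perf implementation; thread count and file size are scaled down here so the sketch runs quickly (the benchmark above used 64 threads and 2 GiB files).

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

# Scaled-down stand-ins for the benchmark parameters:
# the real run used 64 threads and 2 GiB files per thread.
THREADS = 4
FILE_SIZE = 4 * 1024 * 1024   # 4 MiB here; 2 GiB in the benchmark
CHUNK = 1024 * 1024           # 1 MiB read buffer

def read_file(path: str) -> int:
    """Stream a file sequentially; return the number of bytes read."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            total += len(chunk)
    return total

def measure_throughput(paths: list[str]) -> float:
    """Read all files concurrently; return aggregate throughput in MiB/s."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=THREADS) as pool:
        total_bytes = sum(pool.map(read_file, paths))
    elapsed = time.monotonic() - start
    return total_bytes / (1024 * 1024) / elapsed

if __name__ == "__main__":
    # Create one scratch file per thread, then measure the concurrent read.
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for i in range(THREADS):
            p = os.path.join(d, f"file{i}.bin")
            with open(p, "wb") as f:
                f.write(os.urandom(FILE_SIZE))
            paths.append(p)
        print(f"aggregate read throughput: {measure_throughput(paths):.1f} MiB/s")
```

In the actual benchmark the scratch files live on the filesystem under test (flexFS, EFS, or FSx for Lustre mount point) rather than in a local temporary directory, and the per-host figures are summed across the cluster.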
Real-World Task: Bioinformatics
To illustrate real-world performance, the following benchmark processes UK Biobank chromosome data — filtering and converting genetic data files using PLINK, a standard bioinformatics tool.
While S3 offers economical storage, processing tasks that read data directly from S3 take substantially longer to run, and therefore cost more in compute, than the same tasks run against a POSIX filesystem. Optimizing for storage cost alone can dramatically increase compute costs; the total cost of ownership must account for both.
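The tradeoff can be made concrete with a toy total-cost-of-ownership calculation. Every number below is a hypothetical placeholder, not a measured or quoted price; the point is only that when I/O-bound jobs run longer, the extra instance-hours can swamp a saving on storage.

```python
def monthly_tco(storage_gb: float, storage_rate: float,
                compute_hours: float, compute_rate: float) -> float:
    """Monthly total cost of ownership = storage cost + compute cost."""
    return storage_gb * storage_rate + compute_hours * compute_rate

# Hypothetical scenario: object storage is cheaper per GB, but the
# I/O-bound jobs run 3x longer against it.  All rates are made up.
cheap_storage = monthly_tco(storage_gb=10_000, storage_rate=0.023,
                            compute_hours=300, compute_rate=4.0)
posix_fs      = monthly_tco(storage_gb=10_000, storage_rate=0.08,
                            compute_hours=100, compute_rate=4.0)

print(f"cheap storage TCO: ${cheap_storage:,.0f}/mo")
print(f"POSIX filesystem TCO: ${posix_fs:,.0f}/mo")
```

With these illustrative inputs the cheaper storage tier ends up with the higher total bill, because the tripled compute hours cost more than the storage saving.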