Performance
Aggregate read throughput
Aggregate read throughput is arguably the most important performance metric for I/O-bound data analysis clusters. In the following chart, we observe that flexFS easily outpaces EFS and FSx for Lustre even at relatively small cluster sizes. flexFS was designed to support loaded clusters of more than 1,000 mounts.
Benchmark Specifications:
- Running DFS-Perf (other tools tested with similar results)
- 64 threads per host, each concurrently reading 2 GiB files
- FSx for Lustre: 12,000 GiB provisioned with 12,000 MiB/s of throughput ($7,200/mo)
- AWS Instance Type: c6gn.16xlarge
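For reference, the sketch below shows what a comparable per-host measurement looks like outside of DFS-Perf: a minimal Python script that fans reads of 64 pre-created 2 GiB files across 64 threads and reports aggregate throughput. The mount point and file names are assumptions for illustration, not part of the benchmark harness used above.

```python
# Minimal read-throughput sketch (not DFS-Perf itself): N worker threads each
# stream one 2 GiB file from the mount, and we report aggregate GiB/s.
import os
import time
from concurrent.futures import ThreadPoolExecutor

MOUNT = "/mnt/flexfs"     # hypothetical mount point
THREADS = 64              # matches the 64-threads-per-host setup above
BLOCK = 4 * 1024**2       # 4 MiB per read() call

def read_file(path: str) -> int:
    """Sequentially read one file and return the number of bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(BLOCK)
            if not chunk:
                break
            total += len(chunk)
    return total

def main() -> None:
    # Assumes 2 GiB test files named file_000 .. file_063 already exist on the mount.
    paths = [os.path.join(MOUNT, f"file_{i:03d}") for i in range(THREADS)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=THREADS) as pool:
        total_bytes = sum(pool.map(read_file, paths))
    elapsed = time.perf_counter() - start
    print(f"aggregate read throughput: {total_bytes / 1024**3 / elapsed:.2f} GiB/s")

if __name__ == "__main__":
    main()
```

Because large reads release the interpreter lock during I/O, the threads overlap on the wire and the printed figure approximates the per-host aggregate throughput plotted in the chart.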
Real-world task (bioinformatics)
In the following chart, we can see the wall clock time and associated compute cost needed to perform a relatively mundane real-world task: filtering and converting the data for a single chromosome of UK Biobank data. This is a classic example of an I/O-bound task against a non-trivial reference data file.
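As an illustration only, a filter-and-convert step of this kind might look like the following minimal sketch, which assumes the chromosome is available as a BCF file on the flexFS mount and that bcftools is installed. The paths and filter expression are placeholders, not the exact UK Biobank pipeline.

```python
# Hypothetical filter-and-convert step, timed end to end.
import subprocess
import time

SRC = "/mnt/flexfs/ukb/chr1.bcf"                 # hypothetical reference data file
DST = "/mnt/flexfs/ukb/chr1.filtered.vcf.gz"     # filtered, converted output

start = time.perf_counter()
subprocess.run(
    [
        "bcftools", "view",
        "-i", "INFO/AF>0.01",   # illustrative filter: keep common variants
        "-Oz",                  # convert output to bgzip-compressed VCF
        "-o", DST,
        SRC,
    ],
    check=True,
)
wall_seconds = time.perf_counter() - start
print(f"wall clock time: {wall_seconds / 60:.1f} min")
```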
Note that, while it is cheap to store the reference data file in S3, a simple task against that file takes far longer than it does on a POSIX file system, and that extra wall clock time translates into much higher compute costs. For tasks such as this one, there is a much larger hidden compute cost behind the savings on storage.
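That hidden cost is straightforward to estimate: the task differs between backends mainly in wall clock time, and compute spend scales linearly with that time. The sketch below makes the arithmetic explicit; the hourly rate and wall clock values are placeholders to be replaced with your own instance price and measured run times.

```python
# Back-of-the-envelope compute-cost comparison for one I/O-bound task.
def compute_cost(wall_hours: float, hourly_rate: float, nodes: int = 1) -> float:
    """Compute spend for one task: billed instance hours times the hourly rate."""
    return wall_hours * hourly_rate * nodes

HOURLY_RATE = 2.75  # placeholder on-demand rate; substitute your instance's price

# Placeholder wall clock times: substitute your own measurements per backend.
for backend, wall_hours in {"object storage (staged)": 3.0, "POSIX file system": 0.5}.items():
    print(f"{backend}: ${compute_cost(wall_hours, HOURLY_RATE):.2f}")
```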