Build, test, and compare workflow systems on realistic data

WfCommons is an open-source ecosystem of workflow execution instances, synthetic workflow generators, and benchmark specifications. It helps the community study scheduling, performance, resilience, and emerging AI-driven workflow automation on modern distributed and HPC platforms.

   
  • Real instances: Workflow executions curated in a common JSON format (WfFormat).
  • Synthetic realism: Generate realistic workflows from real traces.
  • Benchmarks: Produce executable specs for repeatable experiments and fair comparisons.

WfGen

Synthetic Workflow Traces

WfGen generates realistic synthetic workflow traces with a variety of characteristics. The workflow generator, provided as part of the WfCommons Python package, uses workflow recipes to create different synthetic workflows based on distributions of workflow job runtimes and of input and output file sizes. The resulting workflows are represented in WfFormat, which is already supported by simulation frameworks such as WRENCH.

The WfCommons Python package provides a number of workflow recipes for generating realistic synthetic workflow traces. Each recipe provides different methods for generating synthetic, yet realistic, workflow traces depending on the properties that define the structure of the actual workflow.
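To make the idea of a recipe more concrete, the sketch below shows one way a generator could combine a fixed workflow structure with per-job-type distributions of runtime and file size. It is a toy illustration only and does not use the WfCommons API: the names `TOY_RECIPE`, `sample_positive`, `make_task`, and `generate_fork_join`, the fork-join structure, the normal distributions, and the output field names are all assumptions made for this example, and the output is not valid WfFormat.

```python
import json
import random

# Hypothetical per-job-type distributions as (mean, stddev) pairs.
# The numbers are illustrative and not fitted to any real trace.
TOY_RECIPE = {
    "split": {"runtime": (5.0, 1.0), "output_bytes": (2_000_000, 200_000)},
    "process": {"runtime": (60.0, 15.0), "output_bytes": (500_000, 100_000)},
    "merge": {"runtime": (10.0, 2.0), "output_bytes": (1_000_000, 100_000)},
}


def sample_positive(mean, stddev, rng):
    """Draw from a normal distribution, clamped to stay strictly positive."""
    return max(rng.gauss(mean, stddev), 0.01 * mean)


def make_task(name, kind, parents, rng):
    """Create one task whose runtime and output size are sampled from the recipe."""
    spec = TOY_RECIPE[kind]
    return {
        "name": name,
        "type": kind,
        "parents": list(parents),
        "runtime_seconds": round(sample_positive(*spec["runtime"], rng), 3),
        "output_bytes": int(sample_positive(*spec["output_bytes"], rng)),
    }


def generate_fork_join(num_tasks, seed=None):
    """Build a split -> N parallel process jobs -> merge workflow."""
    if num_tasks < 3:
        raise ValueError("a fork-join workflow needs at least 3 tasks")
    rng = random.Random(seed)

    workers = [f"process_{i}" for i in range(num_tasks - 2)]
    tasks = [make_task("split_0", "split", [], rng)]
    tasks += [make_task(w, "process", ["split_0"], rng) for w in workers]
    tasks.append(make_task("merge_0", "merge", workers, rng))

    # A task's input size is taken to be the total output of its parents.
    by_name = {t["name"]: t for t in tasks}
    for t in tasks:
        t["input_bytes"] = sum(by_name[p]["output_bytes"] for p in t["parents"])

    return {"name": f"toy-fork-join-{num_tasks}", "tasks": tasks}


if __name__ == "__main__":
    workflow = generate_fork_join(num_tasks=10, seed=42)
    print(json.dumps(workflow["tasks"][-1], indent=2))
```

Passing a fixed `seed` makes the generated workflow reproducible, which is convenient when the same synthetic instance has to be reused across experiments. A real recipe would presumably replace the hard-coded structure and distributions with ones derived from actual execution traces.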

The current list of available workflow recipes includes the following workflow applications:

  • 1000Genome: A high-throughput data-intensive bioinformatics workflow.
  • BLAST: A high-throughput compute-intensive bioinformatics workflow.
  • BWA: A high-throughput data-intensive bioinformatics workflow.
  • Cycles: A high-throughput compute-intensive scientific workflow for agroecosystems modeling.
  • Epigenomics: A high-throughput data-intensive bioinformatics workflow.
  • Montage: A high-throughput compute-intensive astronomy workflow.
  • Seismology: A high-throughput data-intensive seismology workflow.
  • SoyKB: A high-throughput data-intensive bioinformatics workflow.
  • SRASearch: A high-throughput data-intensive bioinformatics workflow.

WfGen is constantly evolving, and additional workflow recipes will be added in new releases of the Python package.