WfGen

Synthetic Workflow Traces

WfGen generates realistic synthetic workflow traces with a variety of characteristics. The workflow generator, provided as part of the WfCommons Python package, uses workflow recipes to create synthetic workflows based on distributions of workflow job runtimes and of input and output file sizes. The resulting workflows are represented in WfFormat, which is already supported by simulation frameworks such as WRENCH.

The WfCommons Python package provides a number of workflow recipes for generating realistic synthetic workflow traces. Each recipe offers its own generation methods, which depend on the properties that define the structure of the actual workflow.
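The recipe idea described above can be illustrated with a small, self-contained sketch. It does not use the WfCommons API and its output is not WfFormat: the names TOY_RECIPE, sample_positive, make_job, and generate_toy_workflow, the fork-join structure, and all distribution parameters are hypothetical, chosen only to show how job runtimes and file sizes might be sampled from per-job-type distributions.

```python
import json
import random

# Hypothetical recipe: for each job type, (mean, std) of the runtime in seconds
# and of the input/output file sizes in bytes. All numbers are invented.
TOY_RECIPE = {
    "split":   {"runtime": (5.0, 1.0),   "input": (2e6, 2e5), "output": (1e6, 1e5)},
    "process": {"runtime": (60.0, 15.0), "input": (1e6, 1e5), "output": (5e5, 5e4)},
    "merge":   {"runtime": (10.0, 2.0),  "input": (5e5, 5e4), "output": (3e6, 3e5)},
}


def sample_positive(rng, mean, std):
    """Draw from a normal distribution, clamped to at least 1% of the mean."""
    return max(rng.gauss(mean, std), 0.01 * mean)


def make_job(rng, name, job_type, parents):
    """Build one synthetic job with sampled runtime and file sizes."""
    params = TOY_RECIPE[job_type]
    return {
        "name": name,
        "type": job_type,
        "parents": parents,
        "runtime": round(sample_positive(rng, *params["runtime"]), 3),
        "input_bytes": int(sample_positive(rng, *params["input"])),
        "output_bytes": int(sample_positive(rng, *params["output"])),
    }


def generate_toy_workflow(num_process_jobs, seed=0):
    """Generate a fork-join workflow: one split, N process jobs, one merge."""
    rng = random.Random(seed)  # seeded so the same inputs give the same trace
    jobs = [make_job(rng, "split_0", "split", [])]
    process_names = []
    for i in range(num_process_jobs):
        name = f"process_{i}"
        jobs.append(make_job(rng, name, "process", ["split_0"]))
        process_names.append(name)
    jobs.append(make_job(rng, "merge_0", "merge", process_names))
    return {"name": "toy-fork-join", "jobs": jobs}


if __name__ == "__main__":
    workflow = generate_toy_workflow(num_process_jobs=4, seed=42)
    print(json.dumps(workflow, indent=2))
```

In this sketch the recipe fixes the structure (how jobs depend on each other) while the distributions supply the per-job values, so changing num_process_jobs yields a larger or smaller trace with the same statistical profile. The actual recipes in the WfCommons package are derived from real workflow executions and may work quite differently.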

The current list of available workflow recipes includes the following workflow applications:

  • 1000Genome: A high-throughput data-intensive bioinformatics workflow.
  • BLAST: A high-throughput compute-intensive bioinformatics workflow.
  • BWA: A high-throughput data-intensive bioinformatics workflow.
  • Cycles: A high-throughput compute-intensive scientific workflow for agroecosystems modeling.
  • Epigenomics: A high-throughput data-intensive bioinformatics workflow.
  • Montage: A high-throughput compute-intensive astronomy workflow.
  • Seismology: A high-throughput data-intensive seismology workflow.
  • SoyKB: A high-throughput data-intensive bioinformatics workflow.
  • SRASearch: A high-throughput data-intensive bioinformatics workflow.

WfGen is constantly evolving, and additional workflow recipes will be added in future releases of the Python package.