Cloud303 selected AWS Batch as the ideal solution for Solugen's HPC workload, since the customer would pay only for the resources actually used.
To start, a templatized Rosetta environment was created using Docker containers so jobs could scale seamlessly. Multiple compute environments were then deployed (for testing as well as production jobs). To optimize costs, S3 buckets were used to house data, and VPC endpoints were created to give faster access to storage and to ensure data never left Solugen's VPC.
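As a rough illustration of that provisioning step, the sketch below uses boto3 to create one managed Batch compute environment and an S3 gateway endpoint. Every name, ID, and sizing value is a placeholder, not Solugen's actual configuration.

```python
"""Hypothetical provisioning sketch: one Batch compute environment plus
an S3 gateway endpoint. All names, IDs, and sizing are illustrative."""
import boto3

batch = boto3.client("batch")
ec2 = boto3.client("ec2")

# Managed compute environment for production Rosetta jobs; Batch scales
# instances between min and max vCPUs, so idle cost drops to zero.
batch.create_compute_environment(
    computeEnvironmentName="rosetta-prod",           # hypothetical name
    type="MANAGED",
    computeResources={
        "type": "EC2",
        "minvCpus": 0,                               # scale to zero when idle
        "maxvCpus": 256,
        "instanceTypes": ["c5"],                     # compute-optimized family
        "subnets": ["subnet-0123456789abcdef0"],     # placeholder IDs
        "securityGroupIds": ["sg-0123456789abcdef0"],
        "instanceRole": "ecsInstanceRole",
    },
    serviceRole="arn:aws:iam::123456789012:role/AWSBatchServiceRole",
)

# Gateway endpoint so S3 traffic stays inside the VPC and off the public
# internet (faster storage access, and data never leaves the VPC).
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],
)
```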
One goal was to simplify Solugen's experience as much as possible, so the data pipeline starts with nothing more than the upload of an input file to S3. That file contains all of the instructions for the job: the runtime downloads it, reads the instructions, and starts the job accordingly. The environment can also spin up a single server that picks up the job from S3, runs it, then uploads the output artifact back to S3.
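A minimal sketch of that pipeline might look like the following. The bucket name, key layout, and instruction fields are assumptions made for illustration, not the actual schema Cloud303 used.

```python
"""Minimal sketch of the S3-driven pipeline. Bucket, key layout, and
instruction fields are assumptions for illustration."""
import json
import subprocess
import boto3

s3 = boto3.client("s3")
BUCKET = "solugen-rosetta-jobs"  # hypothetical bucket name

def submit(job_id: str, instructions: dict) -> None:
    """Client side: uploading the input file is the entire user workflow."""
    s3.put_object(
        Bucket=BUCKET,
        Key=f"input/{job_id}.json",
        Body=json.dumps(instructions).encode(),
    )

def run(job_id: str) -> None:
    """Runtime side: download the instruction file and launch the job."""
    obj = s3.get_object(Bucket=BUCKET, Key=f"input/{job_id}.json")
    instructions = json.loads(obj["Body"].read())
    # Assume the instructions carry the Rosetta executable and its flags.
    subprocess.run(
        [instructions["executable"], *instructions["flags"]],
        check=True,
    )
    # Single-server mode: push the output artifact straight back to S3.
    s3.upload_file(instructions["output_path"], BUCKET, f"output/{job_id}.tar.gz")
```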
The more elegant solution, however, and the one that truly changed Solugen's workflow, was the parallel computing design. By leveraging the OpenMPI framework in the runtime environment, AWS Batch could spin up multiple nodes to process a single job: one instance was assigned as the master and the rest acted as worker nodes. Each worker reported its unique ID to the master, and once the master had enough nodes to run the submitted job, it launched the Rosetta script while OpenMPI distributed the computation across the worker nodes.
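The coordination step could resemble the sketch below, which uses the environment variables AWS Batch injects into each node of a multi-node parallel job. The check-in mechanics (a simple TCP handshake) and the mpirun arguments are assumptions standing in for the real implementation.

```python
"""Sketch of node coordination in an AWS Batch multi-node parallel job.
The AWS_BATCH_* variables are injected by Batch; the check-in protocol,
port, hostfile path, and Rosetta invocation are assumptions."""
import os
import socket
import subprocess
import time

NUM_NODES = int(os.environ["AWS_BATCH_JOB_NUM_NODES"])
NODE_INDEX = int(os.environ["AWS_BATCH_JOB_NODE_INDEX"])
MAIN_INDEX = int(os.environ["AWS_BATCH_JOB_MAIN_NODE_INDEX"])
MAIN_IP = os.environ.get("AWS_BATCH_JOB_MAIN_NODE_PRIVATE_IPV4_ADDRESS", "")
PORT = 5000  # hypothetical check-in port

if NODE_INDEX == MAIN_INDEX:
    # Master: wait until every worker has reported its ID, then launch
    # Rosetta under mpirun and let OpenMPI distribute the computation.
    seen = set()
    with socket.socket() as srv:
        srv.bind(("0.0.0.0", PORT))
        srv.listen()
        while len(seen) < NUM_NODES - 1:
            conn, _ = srv.accept()
            with conn:
                seen.add(conn.recv(64).decode())
    subprocess.run(
        ["mpirun", "-np", str(NUM_NODES), "--hostfile", "/etc/hostfile",
         "rosetta_scripts.mpi.linuxgccrelease", "@flags"],  # illustrative args
        check=True,
    )
else:
    # Worker: report this node's unique index to the master. Retry until
    # the master's listener is up, since nodes boot at different times.
    while True:
        try:
            with socket.socket() as c:
                c.connect((MAIN_IP, PORT))
                c.sendall(str(NODE_INDEX).encode())
            break
        except ConnectionRefusedError:
            time.sleep(2)
    # In practice the worker would run an ssh daemon so mpirun can launch
    # ranks on it; here we simply block so the container stays alive.
    subprocess.run(["sleep", "infinity"])
```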
Building an ephemeral, distributed workload like this does present one significant challenge compared to a single server: storage. To solve it, Cloud303 incorporated an Amazon EFS file system as a common storage layer. The EFS share was mounted on every node as a local drive, so all artifacts produced by the cluster ended up in the same place when the nodes finished processing. The master node then compiled the artifacts into a deliverable and uploaded it to an S3 bucket.
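Assuming the share is mounted at /mnt/efs on every node, the master's final collection step could be as simple as the sketch below; the paths and bucket name are placeholders.

```python
"""Sketch of the final collection step, assuming the EFS share is
mounted at /mnt/efs on every node. Paths and bucket are placeholders."""
import shutil
import boto3

s3 = boto3.client("s3")

def collect_and_upload(job_id: str) -> None:
    # Every node wrote its artifacts under the shared EFS mount, so the
    # master sees one consolidated directory when the workers finish.
    archive = shutil.make_archive(f"/tmp/{job_id}", "gztar", f"/mnt/efs/{job_id}")
    # Package the artifacts into a single deliverable and push it to S3.
    s3.upload_file(archive, "solugen-rosetta-jobs", f"output/{job_id}.tar.gz")
```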