To help Interline overcome these challenges, Cloud303, an AWS Advanced Consulting Partner, was brought in to assist with the deployment of their HPC pipeline on AWS. Cloud303 delivered three accounts to Interline: a Pipeline Account to process all of the data required for their project, a Storage Account to store all of the files and metadata needed for further studies, and an Audit Account to centralize logging for auditing purposes. Additionally, a Root Account was set up as the main AWS Organizations account for users to log in to all other accounts. This account also served as the entry and exit point for all inbound and outbound internet traffic.
To facilitate the compute workload, Cloud303 deployed AWS Batch and used AWS Lambda functions to invoke the pipelines and process data. All of their files were stored in an S3 bucket with Cross-Region Replication to ensure a disaster recovery plan was in place. Amazon Redshift was used to store all of the metadata necessary for Interline, and Amazon FSx for Lustre was used as scratch storage for the pipelines, storing temporary files and enabling parallel computing if required in the future.
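The Lambda-to-Batch handoff described above can be sketched with boto3. This is a minimal illustration, not Interline's actual function: the job queue, job definition, and environment variable names are assumptions, and the event shape assumes an S3 object-created notification.

```python
def build_submit_args(bucket: str, key: str) -> dict:
    """Build the kwargs for batch.submit_job from an S3 object reference.

    The queue and job definition names below are hypothetical placeholders
    for whatever the real deployment registered.
    """
    return {
        "jobName": "pipeline-" + key.replace("/", "-"),
        "jobQueue": "pipeline-job-queue",
        "jobDefinition": "hpc-pipeline-job",
        "containerOverrides": {
            "environment": [
                {"name": "INPUT_BUCKET", "value": bucket},
                {"name": "INPUT_KEY", "value": key},
            ]
        },
    }


def handler(event, context):
    """Lambda entry point: submit one AWS Batch job per S3 record."""
    import boto3  # always available in the Lambda runtime

    batch = boto3.client("batch")
    job_ids = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        args = build_submit_args(s3["bucket"]["name"], s3["object"]["key"])
        job_ids.append(batch.submit_job(**args)["jobId"])
    return {"jobIds": job_ids}
```

Keeping the argument-building logic in a pure function makes the trigger easy to unit-test without AWS credentials.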
To better support Interline's drug discovery process and handle the complexities of life sciences data, Cloud303 integrated proteomics tools such as Rosetta and Alphafold into the AWS-based HPC solution. This allowed Interline to systematically process and analyze large amounts of genomics and proteomics data, providing valuable insights into protein communities and their interactions.
By building a robust HPC pipeline on AWS, Interline was able to efficiently process and analyze life sciences data, accelerating their drug discovery efforts and enabling them to quickly develop new therapeutic candidates.
For the Benchling pipeline, Cloud303 set up an Event Bus to push events from Benchling and trigger AWS Batch for the ingestion stage. This process used the transaction ID to query the Benchling APIs, retrieving the uploaded metadata for the new DNA sequence, the file itself, and any selected data, and then stored this data in an S3 bucket. Additionally, any recent metadata events were configured to trigger the AWS Lambda function that launched the AWS Batch pipeline for the processing stage. During this stage, all necessary data was pulled from S3, the scripts provided by Interline ran the computation, and the results were stored in a folder named "result" within the same S3 bucket.
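The processing stage described above (pull input from S3, run one of Interline's scripts, write results back under "result") might look roughly like the following inside the Batch container. The script interface, local paths, and key layout are assumptions for illustration only.

```python
import os
import subprocess


def result_key(input_key: str) -> str:
    """Map an input object key to its output key under the "result" folder
    of the same bucket, per the pipeline's convention."""
    return "result/" + os.path.basename(input_key)


def process(bucket: str, input_key: str, script: str) -> str:
    """Batch container entry point (sketch): download the input object, run
    one of Interline's scripts on it, and upload the output to "result/".

    The script is assumed to take an input path and an output path; the
    real invocation contract is Interline-specific.
    """
    import boto3  # provided in the container image; imported lazily here

    s3 = boto3.client("s3")
    local_in = "/tmp/" + os.path.basename(input_key)
    local_out = local_in + ".out"

    s3.download_file(bucket, input_key, local_in)
    subprocess.run([script, local_in, local_out], check=True)

    out_key = result_key(input_key)
    s3.upload_file(local_out, bucket, out_key)
    return out_key
```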
For any other manual data ingestion, Cloud303 provided Interline with an SFTP server using AWS Transfer Family. All connections to the SFTP server were made through a VPN server deployed by Cloud303. Users could be added or removed using CloudFormation templates. All credentials were stored in Secrets Manager, encrypted with custom Key Management Service (KMS) keys provided by Cloud303. These keys were configured with a rotation policy to rotate every 60 days. All logs produced by the SFTP server were pushed into an application logging bucket in the Audit account and retained to satisfy HIPAA compliance.
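Storing an SFTP user's credentials as a KMS-encrypted secret could be sketched as below. The secret naming convention shown is the one commonly used with Transfer Family custom identity providers, and the payload shape is an assumption; the real templates and key IDs belong to the deployment.

```python
import json


def secret_name(server_id: str, username: str) -> str:
    """Secret name convention often used with Transfer Family custom
    identity providers (an assumption here; adjust to your own layout)."""
    return f"aws/transfer/{server_id}/{username}"


def store_sftp_credentials(server_id: str, username: str,
                           password: str, kms_key_id: str) -> str:
    """Create a secret holding one SFTP user's credentials, encrypted with
    a customer-managed KMS key rather than the AWS-managed default."""
    import boto3  # lazy import so the pure helper above is testable offline

    sm = boto3.client("secretsmanager")
    name = secret_name(server_id, username)
    sm.create_secret(
        Name=name,
        KmsKeyId=kms_key_id,  # customer-managed key provided by Cloud303
        SecretString=json.dumps({"Password": password}),
    )
    return name
```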