HLA-on-aws

Upload your FASTQ files and we will run the HLA pipeline for you on AWS.

Github: https://github.com/linnil1/hla-on-aws

Architecture

API

API logic

  1. I deploy my Nuxt app on Cloudflare
  2. API Gateway + Lambda serve as the API server
  3. The Lambda gives the user an ID and a temporary S3 upload URL
  4. The user uploads the FASTQ files to S3
  5. A Lambda triggers the Step Functions state machine
  6. A Lambda retrieves the status for a specific ID from S3 (I store the status in S3 instead of a database XD)
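Steps 3 and 6 can be sketched as a minimal Lambda handler. This is a hypothetical illustration, not the repo's actual code: the key layout, the 8-character ID, and the `status.json` name are my assumptions (only the `hla-bucket` name comes from the S3 section below).

```python
import json
import uuid

BUCKET = "hla-bucket"  # bucket created in the s3 section below

def status_key(job_id: str) -> str:
    # Status is a small JSON object stored in S3 next to the data, not in a database.
    return f"{job_id}/status.json"

def handler(event, context, s3=None):
    """Hand the user a job ID plus a presigned S3 upload URL.

    `s3` is an injected boto3 S3 client (so the logic is testable offline);
    inside Lambda you would pass boto3.client("s3").
    """
    job_id = uuid.uuid4().hex[:8]
    upload_url = None
    if s3 is not None:
        upload_url = s3.generate_presigned_url(
            "put_object",
            Params={"Bucket": BUCKET, "Key": f"{job_id}/input.R1.fq.gz"},
            ExpiresIn=3600,  # URL valid for one hour
        )
    return {
        "statusCode": 200,
        "body": json.dumps({"id": job_id, "upload_url": upload_url}),
    }
```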

step function

AWS Step Functions = the data pipeline

  1. A Lambda copies the S3 object to EFS
  2. Batch runs hisat2 or bwakit
  3. A Lambda parses the result and uploads it to S3
  4. A Lambda sets the running status
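The four steps above map onto a state machine definition roughly like the following Amazon States Language sketch. State names and parameters are illustrative only; the real definition lives in step_hla.json (the function and queue names do appear later in this document):

```json
{
  "StartAt": "CopyToEFS",
  "States": {
    "CopyToEFS": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {"FunctionName": "hla_init", "Payload.$": "$"},
      "Next": "RunHisat2"
    },
    "RunHisat2": {
      "Type": "Task",
      "Resource": "arn:aws:states:::batch:submitJob.sync",
      "Parameters": {
        "JobName": "hla",
        "JobQueue": "hla_queue",
        "JobDefinition": "hla_hisat2:1"
      },
      "Next": "ParseResult"
    },
    "ParseResult": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {"FunctionName": "hla_hisat2_result", "Payload.$": "$"},
      "Next": "SetStatus"
    },
    "SetStatus": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {"FunctionName": "hla_set_method_status", "Payload.$": "$"},
      "End": true
    }
  }
}
```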

Set up IAM as the root user (in console)

I create an HLA user/group to run the AWS commands below.

The user can create, list, update, and write the settings of these services, so the permissions are not minimal (IAM is excluded; the IAM part is the only one we need to set up manually).

Here is the policy

  • AmazonEC2FullAccess
  • AmazonEC2ContainerRegistryFullAccess
  • AmazonS3FullAccess
  • AmazonAPIGatewayAdministrator
  • AWSBatchFullAccess
  • AmazonVPCFullAccess
  • AmazonElasticFileSystemFullAccess
  • AWSStepFunctionsFullAccess
  • AWSLambda_FullAccess
  • Permission_for_assigning_role_to_service(Inline Policy)
    ​​{
    ​​  "Version": "2012-10-17",
    ​​  "Statement": [
    ​​      {
    ​​          "Sid": "VisualEditor0",
    ​​          "Effect": "Allow",
    ​​          "Action": "iam:PassRole",
    ​​          "Resource": "*"
    ​​      }
    ​​  ]
    ​​}
    

AWS cli

After the user is set up, you can get the access key and secret in Security credentials.

Then you can configure awscli (the AWS command-line tool).

I recommend setting the profile as default; otherwise you will have to add --profile=awshla to every command.

pip install awscli
aws configure --profile=awshla

Set up IAM roles (in console)

Here are our roles

Lambda

Name: hla-lambda

  • AWSLambdaVPCAccessExecutionRole
  • Permission_for_trigger_stepfunctions(Inline Policy)
    ​​{
    ​​  "Version": "2012-10-17",
    ​​  "Statement": [
    ​​      {
    ​​          "Sid": "VisualEditor0",
    ​​          "Effect": "Allow",
    ​​          "Action": "states:StartExecution",
    ​​          "Resource": "arn:aws:states:us-east-2:493445452763:stateMachine:hla"
    ​​      }
    ​​  ]
    ​​}
    
  • Permission_for_readwrite_s3(Inline Policy)
    ​​{
    ​​  "Version": "2012-10-17",
    ​​  "Statement": [
    ​​      {
    ​​          "Sid": "VisualEditor0",
    ​​          "Effect": "Allow",
    ​​          "Action": [
    ​​              "s3:PutObject",
    ​​              "s3:GetObject"
    ​​          ],
    ​​          "Resource": "arn:aws:s3:::hla-bucket/*"
    ​​      }
    ​​  ]
    ​​}
    

Step Functions

Name: hla-step

  • AWSBatchServiceRole
  • AWSLambdaRole
  • Permission_for_run_batch(Inline Policy) (this one is required)
    ​​{
    ​​  "Version": "2012-10-17",
    ​​  "Statement": [
    ​​      {
    ​​          "Effect": "Allow",
    ​​          "Action": [
    ​​              "batch:SubmitJob",
    ​​              "batch:DescribeJobs",
    ​​              "batch:TerminateJob"
    ​​          ],
    ​​          "Resource": "*"
    ​​      },
    ​​      {
    ​​          "Effect": "Allow",
    ​​          "Action": [
    ​​              "events:PutTargets",
    ​​              "events:PutRule",
    ​​              "events:DescribeRule"
    ​​          ],
    ​​          "Resource": [
    ​​              "*"
    ​​          ]
    ​​      }
    ​​  ]
    ​​}
    

API gateway

Name: hla-api

  • AWSLambdaRole

Batch

Name: hla

  • AmazonECSTaskExecutionRolePolicy

Index preparation

The index data will be saved to EFS.

bwakit

Follow https://github.com/lh3/bwa/tree/master/bwakit#introduction

wget http://sourceforge.net/projects/bio-bwa/files/bwakit/bwakit-0.7.12_x64-linux.tar.bz2/download -O bwakit-0.7.12_x64-linux.tar.bz2
tar xf bwakit-0.7.12_x64-linux.tar.bz2
cd bwa.kit
dk quay.io/biocontainers/bwakit:0.7.17.dev1--0 run-gen-ref hs38DH
dk quay.io/biocontainers/bwakit:0.7.17.dev1--0 bwa index hs38DH.fa
mkdir bwakit_index
mv hs38* bwakit_index
tar zcf bwakit.tar.gz bwakit_index
cd ..

hisat2

cd hisat2

mkdir hisat2_index_1
cd hisat2_index_1
wget ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat-genotype/data/genotype_genome_20180128.tar.gz
tar xf genotype_genome_20180128.tar.gz
cd ..

git clone https://github.com/DaehwanKimLab/hisat-genotype.git
echo "2.2.1" >> hisat-genotype/hisat2/VERSION

docker build . -f Dockerfile_hisat2 -t linnil1/hisat2-conda
dk -e PYTHONPATH=hisat-genotype/hisatgenotype_modules  linnil1/hisat2-conda hisat-genotype/hisatgenotype -z hisat2_index_1/ --base hla -v --keep-alignment --keep-extract -1 hla-a.R1.fq.gz -2 hla-a.R2.fq.gz --out-dir result --threads 16

mkdir hisat2_index
mv hisat2_index_1/hla* hisat2_index
mv hisat2_index_1/geno* hisat2_index
mv hisat2_index/genotype_genome_20180128.tar.gz hisat2_index_1
mkdir hisat2_index/grch38 hisat2_index/hisatgenotype_db
tar zcf hisat2.tar.gz hisat2_index
cd ..

s3

Create an S3 bucket to store

  • fastq
  • HLA result
  • status
aws s3 mb s3://hla-bucket --region us-east-2
aws s3 ls

EFS

Create two EFS file systems

  • hla_index for saving index
  • hla_tmp for saving temporary data
aws efs create-file-system --tags Key=Name,Value=hla_index --encrypted
aws efs create-file-system --tags Key=Name,Value=hla_tmp --encrypted
aws efs describe-file-systems

ECR

Because samtools is not in the hisat2 container, we need to build a new Docker image.
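The Dockerfile_hisat2 used to build linnil1/hisat2-conda earlier is not reproduced here. A rough sketch of what such an image needs (the base image and the exact conda packages are my assumptions, not the repo's actual file):

```dockerfile
# Hypothetical sketch of Dockerfile_hisat2: hisat2 plus samtools in one conda image
FROM continuumio/miniconda3
RUN conda install -y -c bioconda -c conda-forge \
    hisat2=2.2.1 samtools python=3.9 \
 && conda clean -afy
```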

(Change 493445452763.dkr.ecr.us-east-2.amazonaws.com to your URL)

# create repo
aws ecr create-repository --repository-name linnil1/hisat2_conda
aws ecr describe-repositories

# upload image
aws ecr get-login-password --region us-east-2 | docker login --username AWS --password-stdin 493445452763.dkr.ecr.us-east-2.amazonaws.com
docker tag linnil1/hisat2-conda 493445452763.dkr.ecr.us-east-2.amazonaws.com/linnil1/hisat2_conda:2.2.1
docker push 493445452763.dkr.ecr.us-east-2.amazonaws.com/linnil1/hisat2_conda:2.2.1

# check
aws ecr list-images --repository-name linnil1/hisat2_conda

Network

VPC

Create a private network (most of the AWS services here need it) and subnets.

aws ec2 create-vpc --cidr-block 10.0.0.0/16  --tag-specifications ResourceType=vpc,Tags='[{Key=Name,Value="hla"}]'
aws ec2 describe-vpcs

aws ec2  modify-vpc-attribute --vpc-id vpc-0ffe39707fa04e0e6 --enable-dns-hostnames "{\"Value\": true}"
aws ec2  modify-vpc-attribute --vpc-id vpc-0ffe39707fa04e0e6 --enable-dns-support "{\"Value\": true}"

DNS support is important when mounting EFS.

subnet

Create two subnets under the VPC (change the vpc-id to your own VPC)

aws ec2 describe-availability-zones

aws ec2 create-subnet --cidr-block 10.0.0.0/18 \
    --tag-specifications ResourceType=subnet,Tags='[{Key=Name,Value="hla-1"}]'  \
    --vpc-id vpc-0ffe39707fa04e0e6 \
    --availability-zone us-east-2b
aws ec2 create-subnet --cidr-block 10.0.64.0/18 \
    --tag-specifications ResourceType=subnet,Tags='[{Key=Name,Value="hla-2"}]'  \
    --vpc-id vpc-0ffe39707fa04e0e6 \
    --availability-zone us-east-2a

aws ec2 describe-subnets

Security group

Once you create the VPC, AWS creates a default security group for you

aws ec2 describe-security-groups \
        --filters Name=vpc-id,Values=vpc-0ffe39707fa04e0e6

Routing

This allows instances in the VPC to access the internet.

internet-gateway

aws ec2 create-internet-gateway \
    --tag-specifications "ResourceType=internet-gateway,Tags=[{Key=Name,Value=hla_admin_internet}]"
aws ec2 attach-internet-gateway \
    --internet-gateway-id igw-0f479995f0feaeae9 \
    --vpc-id vpc-0ffe39707fa04e0e6
aws ec2 describe-internet-gateways

Routing table (add another route to the default route table)

aws ec2 describe-route-tables
aws ec2 create-route \
    --route-table-id rtb-077a46c1b42cd1f8c \
    --destination-cidr-block 0.0.0.0/0 \
    --gateway-id igw-0f479995f0feaeae9
aws ec2 describe-route-tables

Mounting EFS

Allow NFS (port 2049)

Add a security group allowing NFS on port 2049

aws ec2  create-security-group \
    --group-name hla-efs \
    --description "EFS group" \
    --vpc-id vpc-0ffe39707fa04e0e6
aws ec2 authorize-security-group-ingress \
    --group-id sg-01b3f3fbfc5be9118 \
    --cidr 10.0.0.0/16 --port 2049 --protocol tcp

Accessible in the VPC

To allow EC2 or ECS (containers) to access EFS, we need to create mount targets for the EFS in the same VPC and subnets

aws efs create-mount-target \
    --file-system-id fs-0b6fcc539fde3326d \
    --subnet-id subnet-0d2af03055f6c8198 \
    --security-groups sg-01b3f3fbfc5be9118
aws efs create-mount-target \
    --file-system-id fs-0b6fcc539fde3326d \
    --subnet-id subnet-08822fdba8b2a6572 \
    --security-groups sg-01b3f3fbfc5be9118

aws efs describe-mount-targets \
    --file-system-id fs-0b6fcc539fde3326d

aws efs create-mount-target \
    --file-system-id fs-02b3281e00a6df32a \
    --subnet-id subnet-0d2af03055f6c8198 \
    --security-groups sg-01b3f3fbfc5be9118
aws efs create-mount-target \
    --file-system-id fs-02b3281e00a6df32a \
    --subnet-id subnet-08822fdba8b2a6572 \
    --security-groups sg-01b3f3fbfc5be9118

Accessible from Lambda

To give Lambda permission to access EFS, we need to create an access point on the hla_tmp EFS (Lambda does not need to read hla_index)

aws efs create-access-point \
    --file-system-id fs-0b6fcc539fde3326d \
    --posix-user Uid=0,Gid=0
aws efs describe-access-points

lambda

Create the functions used by API Gateway and Step Functions.

zip -jr hla_lambda.zip lambda

aws lambda create-function \
    --function-name hla_init \
    --role arn:aws:iam::493445452763:role/hla-lambda \
    --runtime python3.9 --architectures arm64 \
    --zip-file  fileb://hla_lambda.zip \
    --memory-size 1024 \
    --timeout 60 \
    --handler hla_init.main \
    --file-system-configs Arn=arn:aws:elasticfilesystem:us-east-2:493445452763:access-point/fsap-037456da3db417cbc,LocalMountPath=/mnt/data \
    --vpc-config SubnetIds=subnet-0d2af03055f6c8198,subnet-08822fdba8b2a6572,SecurityGroupIds=sg-0851d5b74a506b8e7
    
aws lambda create-function \
    --function-name hla_bwakit_result \
    --role arn:aws:iam::493445452763:role/hla-lambda \
    --runtime python3.9 --architectures arm64 \
    --zip-file  fileb://hla_lambda.zip \
    --file-system-configs Arn=arn:aws:elasticfilesystem:us-east-2:493445452763:access-point/fsap-037456da3db417cbc,LocalMountPath=/mnt/data \
    --vpc-config SubnetIds=subnet-0d2af03055f6c8198,subnet-08822fdba8b2a6572,SecurityGroupIds=sg-0851d5b74a506b8e7 \
    --handler hla_bwakit_result.main

aws lambda create-function \
    --function-name hla_hisat2_result \
    --role arn:aws:iam::493445452763:role/hla-lambda \
    --runtime python3.9 --architectures arm64 \
    --zip-file  fileb://hla_lambda.zip \
    --file-system-configs Arn=arn:aws:elasticfilesystem:us-east-2:493445452763:access-point/fsap-037456da3db417cbc,LocalMountPath=/mnt/data \
    --vpc-config SubnetIds=subnet-0d2af03055f6c8198,subnet-08822fdba8b2a6572,SecurityGroupIds=sg-0851d5b74a506b8e7 \
    --handler hla_hisat2_result.main

aws lambda create-function \
    --function-name hla_final \
    --role arn:aws:iam::493445452763:role/hla-lambda \
    --runtime python3.9 --architectures arm64 \
    --zip-file  fileb://hla_lambda.zip \
    --file-system-configs Arn=arn:aws:elasticfilesystem:us-east-2:493445452763:access-point/fsap-037456da3db417cbc,LocalMountPath=/mnt/data \
    --vpc-config SubnetIds=subnet-0d2af03055f6c8198,subnet-08822fdba8b2a6572,SecurityGroupIds=sg-0851d5b74a506b8e7 \
    --handler hla_final.main \
    --timeout 5

aws lambda create-function \
    --function-name hla_api \
    --role arn:aws:iam::493445452763:role/hla-lambda \
    --runtime python3.9 --architectures arm64 \
    --zip-file  fileb://hla_lambda.zip \
    --vpc-config SubnetIds=subnet-0d2af03055f6c8198,subnet-08822fdba8b2a6572,SecurityGroupIds=sg-0851d5b74a506b8e7 \
    --handler hla_api.main
    
aws lambda create-function \
    --function-name hla_set_method_status \
    --role arn:aws:iam::493445452763:role/hla-lambda \
    --runtime python3.9 --architectures arm64 \
    --zip-file  fileb://hla_lambda.zip \
    --vpc-config SubnetIds=subnet-0d2af03055f6c8198,subnet-08822fdba8b2a6572,SecurityGroupIds=sg-0851d5b74a506b8e7 \
    --handler hla_set_method_status.main

Access Step Functions and S3 from Lambda

s3

aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0ffe39707fa04e0e6 \
    --service-name com.amazonaws.us-east-2.s3 \
    --vpc-endpoint-type Gateway \
    --route-table-ids rtb-077a46c1b42cd1f8c \
    --tag-specifications "ResourceType=vpc-endpoint,Tags=[{Key=Name,Value=hla_lambda_s3_gateway}]"

step function

aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0ffe39707fa04e0e6 \
    --service-name com.amazonaws.us-east-2.states \
    --vpc-endpoint-type Interface \
    --subnet-ids subnet-0d2af03055f6c8198 subnet-08822fdba8b2a6572 \
    --security-group-ids sg-0851d5b74a506b8e7 \
    --tag-specifications "ResourceType=vpc-endpoint,Tags=[{Key=Name,Value=hla_lambda_stepfunction_gateway}]"

Developing lambda

You can change the Lambda code, re-upload it, and test it

zip -jr hla_lambda.zip lambda
aws lambda update-function-code \
    --function-name hla_init \
    --zip-file  fileb://hla_lambda.zip
aws lambda invoke --function-name hla_init  --payload '{ "name": "test1" }' test.json && cat test.json | jq

EC2

Move our index data to EFS via an EC2 instance.

Add an SSH key and a security group for port 22

aws ec2 create-key-pair --key-name hlakey | jq ".KeyMaterial" -r > hlakey.pem
aws ec2 create-security-group --group-name hla-admin --description "admin for HLA ec2"   --vpc-id vpc-0ffe39707fa04e0e6
aws ec2 authorize-security-group-ingress --group-id sg-01308e7f097c05a5c --cidr 0.0.0.0/0 --port 22 --protocol tcp 

Create a t2.nano instance to write the data

aws ec2 run-instances \
    --image-id ami-0b614a5d911900a9b \
    --instance-type t2.nano \
    --key-name hlakey \
    --network-interfaces "AssociatePublicIpAddress=true,DeviceIndex=0,SubnetId=subnet-08822fdba8b2a6572,Groups=sg-01308e7f097c05a5c"

aws ec2 describe-instances

You can run anything on the EC2 instance

# init
ssh -i hlakey.pem ec2-user@18.221.199.110
sudo yum install -y amazon-efs-utils
mkdir index
sudo mount -t efs -o tls fs-02b3281e00a6df32a:/ index
sudo chown ec2-user:ec2-user index
exit

# copy
scp -i hlakey.pem bwakit/bwa.kit/bwakit.tar.gz ec2-user@18.221.199.110:~/index/
scp -i hlakey.pem hisat2/hisat2.tar.gz  ec2-user@18.221.199.110:~/index/
scp -i hlakey.pem run_bwakit.sh ec2-user@18.221.199.110:~/index/


ssh -i hlakey.pem ec2-user@18.221.199.110
cd index
git clone https://github.com/DaehwanKimLab/hisat-genotype.git
echo "2.2.1" >> hisat-genotype/hisat2/VERSION
tar xf bwakit.tar.gz
rm bwakit.tar.gz
tar xf hisat2.tar.gz
rm hisat2.tar.gz
exit

# remember to stop it
# it's costly
aws ec2 stop-instances --instance-ids i-0dc46050bf7812889

Batch

Batch is the service that queues our jobs and runs them in containers

# set up the compute environment with FARGATE_SPOT (the cheapest option)
aws batch create-compute-environment \
    --compute-environment-name hla_env \
    --type MANAGED \
    --compute-resources type=FARGATE_SPOT,maxvCpus=32,subnets=subnet-0d2af03055f6c8198,subnet-08822fdba8b2a6572,securityGroupIds=sg-0851d5b74a506b8e7
aws batch describe-compute-environments


# setup queue
aws batch create-job-queue  --job-queue-name hla_queue --priority 1 --compute-environment-order order=1,computeEnvironment=arn:aws:batch:us-east-2:493445452763:compute-environment/hla_env
aws batch describe-job-queues

Set up the job definitions (hisat2 and bwakit)

aws batch register-job-definition \
    --cli-input-json file://job_bwakit.json
aws batch register-job-definition \
    --cli-input-json file://job_hisat2.json
aws batch describe-job-definitions --status ACTIVE

Developing batch

# a definition cannot be edited in place; registering again automatically adds a new revision number
aws batch register-job-definition \
    --cli-input-json file://job_bwakit.json
# remove previous revision
aws batch deregister-job-definition \
    --job-definition hla-bwakit:2

aws batch submit-job \
    --job-name hla_test2 \
    --job-queue hla_queue \
    --job-definition hla-bwakit:2 \
    --parameters read1=/mnt/data/test1/test1.R1.fq.gz,read2=/mnt/data/test1/test1.R2.fq.gz,outputname=/mnt/data/test1/bwakit/test1
aws batch submit-job \
    --job-name hla_test4 \
    --job-queue hla_queue \
    --job-definition hla_hisat2:1 \
    --parameters read1=/mnt/data/test1/test1.R1.fq.gz,read2=/mnt/data/test1/test1.R2.fq.gz,output_folder=/mnt/data/test1/hisat2_1
aws batch list-jobs \
    --job-queue hla_queue \
    --job-status FAILED

Step Function

Create the pipeline; the state machine definition (Amazon States Language) is written in step_hla.json

aws stepfunctions create-state-machine \
    --name hla --role-arn "arn:aws:iam::493445452763:role/hla-step" \
    --definition "$(cat step_hla.json)"

aws stepfunctions list-state-machines
aws stepfunctions describe-state-machine \
    --state-machine-arn "arn:aws:states:us-east-2:493445452763:stateMachine:hla"

Developing stepfunction

I recommend reading the results in the console and using Workflow Studio to write the definition.

aws stepfunctions update-state-machine \
    --state-machine-arn "arn:aws:states:us-east-2:493445452763:stateMachine:hla" \
    --definition "$(cat step_hla.json)"
aws stepfunctions start-execution \
    --state-machine-arn "arn:aws:states:us-east-2:493445452763:stateMachine:hla" \
    --input '{"name": "test1"}'
aws stepfunctions list-executions \
    --state-machine-arn "arn:aws:states:us-east-2:493445452763:stateMachine:hla"

APIGateway

The API Gateway can

  • Associate a path and method with a Lambda function
  • Add a stage: in this project it is /hla
  • Limit the API call rate

https://docs.aws.amazon.com/apigateway/latest/developerguide/set-up-lambda-proxy-integrations.html

# create API
aws apigateway create-rest-api --name hla_api
aws apigateway get-rest-apis
aws apigateway get-resources \
    --rest-api-id oy1431r9p1

# create path and method
aws apigateway create-resource \
    --rest-api-id oy1431r9p1 \
    --parent-id cinl4m8ph3 \
    --path-part "{proxy+}"
aws apigateway get-resources \
    --rest-api-id oy1431r9p1
aws apigateway put-method \
    --rest-api-id oy1431r9p1 \
    --resource-id n1qb8d \
    --http-method ANY \
    --authorization-type None

# lambda
aws apigateway put-integration \
    --rest-api-id oy1431r9p1 \
    --resource-id n1qb8d \
    --http-method ANY \
    --type AWS_PROXY \
    --uri "arn:aws:apigateway:us-east-2:lambda:path/2015-03-31/functions/arn:aws:lambda:us-east-2:493445452763:function:hla_api/invocations" \
    --integration-http-method POST \
    --credentials "arn:aws:iam::493445452763:role/hla-api"
aws apigateway test-invoke-method \
    --rest-api-id oy1431r9p1 \
    --resource-id n1qb8d \
    --http-method POST \
    --path-with-query-string "/create"

## deploy: The url will become `https://oy1431r9p1.execute-api.us-east-2.amazonaws.com/hla`
aws apigateway create-deployment \
    --rest-api-id oy1431r9p1 \
    --stage-name hla
aws apigateway get-deployments \
    --rest-api-id oy1431r9p1
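Once deployed, a client talks to the stage URL. A minimal Python sketch of calling it (the /create path comes from the test-invoke above; the response shape is my assumption):

```python
import json
import urllib.request

# Stage URL from the deployment above
API_BASE = "https://oy1431r9p1.execute-api.us-east-2.amazonaws.com/hla"

def url_for(path: str) -> str:
    # Join the stage URL and a resource path without doubling slashes.
    return API_BASE.rstrip("/") + "/" + path.lstrip("/")

def create_job():
    # Network call; only works once the API is actually deployed.
    req = urllib.request.Request(url_for("create"), method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```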

Deploy Frontend

I wrote the web interface with Nuxt in web/

Edit wrangler.toml and nuxt.config.ts to change hla.linnil1.me and AWS_API

dk -p 2002:3000 -p 2003:24678 node:17-alpine sh

yarn install
yarn global add @cloudflare/wrangler
wrangler publish

see https://hla.linnil1.me/

Adding a new tool (Kourami)

Kourami

Build the index locally

wget https://github.com/Kingsford-Group/kourami/releases/download/v0.9.6/kourami-0.9.6_bin.zip
unzip kourami-0.9.6_bin.zip
cd kourami-0.9.6
wget https://github.com/Kingsford-Group/kourami/releases/download/v0.9/kouramiDB_3.24.0.tar.gz
tar xf kouramiDB_3.24.0.tar.gz
dk quay.io/biocontainers/bwakit:0.7.17.dev1--0 bwa index db/All_FINAL_with_Decoy.fa.gz
bash ./scripts/download_grch38.sh hs38NoAltDH
dk quay.io/biocontainers/bwakit:0.7.17.dev1--0 bwa index ./resources/hs38NoAltDH.fa

mkdir kourami_index
mv db/* kourami_index/
mv resources/hs38NoAltDH.fa* kourami_index/
mv build/Kourami.jar kourami_index
tar czf kourami_index.tar.gz kourami_index
cd ..

aws

# copy index
scp -i hlakey.pem run_bwakit.sh ec2-user@18.221.199.110:~/index/
scp -i hlakey.pem kourami-0.9.6/kourami_index.tar.gz ec2-user@18.221.199.110:~/index

# on the EC2 instance
ssh -i hlakey.pem ec2-user@18.221.199.110
cd index
tar xf kourami_index.tar.gz
rm kourami_index.tar.gz
exit


# lambda
zip -jr hla_lambda.zip lambda
aws lambda create-function \
    --function-name hla_kourami_result \
    --role arn:aws:iam::493445452763:role/hla-lambda \
    --runtime python3.9 --architectures arm64 \
    --zip-file  fileb://hla_lambda.zip \
    --file-system-configs Arn=arn:aws:elasticfilesystem:us-east-2:493445452763:access-point/fsap-037456da3db417cbc,LocalMountPath=/mnt/data \
    --vpc-config SubnetIds=subnet-0d2af03055f6c8198,subnet-08822fdba8b2a6572,SecurityGroupIds=sg-0851d5b74a506b8e7 \
    --handler hla_kourami_result.main \
    --timeout 5

aws batch register-job-definition \
    --cli-input-json file://job_kourami_preprocess.json
aws batch register-job-definition \
    --cli-input-json file://job_kourami_main.json

testing

aws s3 cp hisat2/hla-a.R1.fq.gz s3://hla-bucket/test1.R1.fq.gz
aws s3 cp hisat2/hla-a.R2.fq.gz s3://hla-bucket/test1.R2.fq.gz


aws batch submit-job \
    --job-name hla_test6 \
    --job-queue hla_queue \
    --job-definition hla_kourami_preprocess:1 \
    --parameters bam=/mnt/data/test1/bwakit/test1.aln.bam,output_folder=/mnt/data/test1/kourami,kourami_panel=/mnt/index/kourami_index/All_FINAL_with_Decoy.fa.gz,kourami_hs38=/mnt/index/kourami_index/hs38NoAltDH.fa
    
    
aws batch submit-job \
    --job-name hla_test8 \
    --job-queue hla_queue \
    --job-definition hla_kourami:1 \
    --parameters bam=/mnt/data/test1/kourami/test1.aln.panel.bam,outputname=/mnt/data/test1/kourami/test1.aln.panel.kourami,kourami_db=/mnt/index/kourami_index,kourami_jar=/mnt/index/kourami_index/Kourami.jar
    
aws lambda invoke --function-name hla_kourami_result  --payload '{ "name": "test1" }' test.json && cat test.json | jq

tmp: (scratch notes on how the test FASTQ was generated)

dk -e PYTHONPATH=hisat-genotype/hisatgenotype_modules  linnil1/hisat2-conda hisat-genotype/hisatgenotype -z hisat2_index_1/ --base hla -v --keep-alignment --keep-extract -1 ERR194147_1.fastq.gz -2 ERR194147_2.fastq.gz --out-dir result --threads 16
samtools view ERR194147_1_fastq_gz-hla-extracted-1_fq.bam "A*BACKBONE" -o hla-a.bam
samtools sort -n hla-a.bam -o hla-a.sort.bam
samtools fastq hla-a.sort.bam -1 hla-a.R1.fq.gz -2 hla-a.R2.fq.gz -0 /dev/null -s /dev/null
aws s3 cp hisat2/hla-a.R1.fq.gz s3://hla-bucket/test1.R1.fq.gz
aws s3 cp hisat2/hla-a.R2.fq.gz s3://hla-bucket/test1.R2.fq.gz

(Figure: step function pipeline)