Last active: June 1, 2020 10:15
Script to run the jobs for the benchmark
import graph;
usepackage("kpfonts");

size(10cm, 7cm, IgnoreAspect);
scale(Linear, Log);

// Elapsed times (seconds) per total-core count for each cluster template.
real[] x1 = {32, 48, 64, 80, 96, 112, 128};
real[] y1 = {1451, 532, 504, 343, 288, 271, 250};
real[] x2 = {16, 32, 64, 96};
real[] y2 = {1304, 797, 543, 343};
real[] x12 = {16, 32, 48, 64, 80, 96, 112, 128};

marker mark1 = marker(scale(1mm)*polygon(3), black);
marker mark2 = marker(scale(1mm)*polygon(4), black);

draw(legend="$\text{Cluster template }C_1: n \text{ nodes } \times 16 \text{ cores per node } = c$", graph(x1, y1), mark1, L="$C_1$");
draw(legend="$\text{Cluster template }C_2: 1 \text{ node } \times m \text{ cores per node } = c$", graph(x2, y2), mark2, L="$C_2$");

xaxis("Number of total cores $c$ per run", BottomTop, LeftTicks(x12));
yaxis("Elapsed time $t$ in seconds", LeftRight, RightTicks);

label(shift(5mm*N)*"Computational resources used vs time taken", point(NW), E);
add(legend(), (point(S).x, truepoint(S).y), 20S);
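The plotted timings can be reduced to speedup figures with a short sketch. The snippet below is illustrative and not part of the gist; it hard-codes the $C_1$ data points from the plot above and reports speedup relative to the 32-core baseline:

```shell
#!/bin/sh
# Compute speedup for the C_1 curve, relative to the smallest (32-core) run.
awk 'BEGIN {
  split("32 48 64 80 96 112 128", cores, " ")
  split("1451 532 504 343 288 271 250", secs, " ")
  for (i = 1; i in cores; i++)
    printf "%3d cores: %4ds elapsed, %.2fx speedup vs 32 cores\n",
           cores[i], secs[i], secs[1] / secs[i]
}'
```

Note the jump from 32 to 48 cores alone cuts the elapsed time from 1451s to 532s, after which the curve flattens, which is why the log-scaled y-axis is used above.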
#!/bin/bash
# Template C_2 from the plot: one single-node cluster per run, sweeping
# the number of cores on that node.
worker_list="16 32 64 96"
for work_run in $worker_list; do
  machine_cores=$work_run
  workers=0
  job_prefix=$(uuidgen -r)
  cluster_name="etl-openfda-faers-test-${machine_cores}-${workers}"
  job_name="scaling-test-${job_prefix}-${machine_cores}-${workers}"
  job_conf="scaling-test-${machine_cores}-${workers}.conf"
  job_jar=gs://ot-snapshots/jarrod/io-opentargets-etl-backend-assembly-openfda-0.1.0.jar

  # Per-run job configuration, picked up by Spark via -Dconfig.file below.
  cat <<EOF > "${job_conf}"
common.output = "gs://ot-snapshots/carmona/etl-openfda-faers-test-${machine_cores}-${workers}"
fda.outputs = ["json"]
EOF

  echo "scaling test: machine type ${machine_cores} with ${workers} workers"

  # Create the cluster, then submit the job; each (create && submit) pair is
  # backgrounded so all cluster sizes are benchmarked in parallel.
  (gcloud beta dataproc clusters create \
    "$cluster_name" \
    --image-version=1.5-debian10 \
    --properties=yarn:yarn.nodemanager.vmem-check-enabled=false,spark:spark.debug.maxToStringFields=1024,spark:spark.master=yarn \
    --single-node \
    --master-machine-type=n1-highmem-$machine_cores \
    --master-boot-disk-size=1000 \
    --zone=europe-west1-d \
    --project=open-targets-eu-dev \
    --region=europe-west1 \
    --initialization-action-timeout=20m \
    --max-idle=10m && \
  gcloud dataproc jobs submit spark \
    --id="$job_name" \
    --cluster="$cluster_name" \
    --project=open-targets-eu-dev \
    --region=europe-west1 \
    --async \
    --files="$job_conf" \
    --properties=spark.executor.extraJavaOptions=-Dconfig.file=$job_conf,spark.driver.extraJavaOptions=-Dconfig.file=$job_conf \
    --jar="$job_jar") &
done
wait
#!/bin/bash
# Template C_1 from the plot: clusters of n1-highmem-16 workers, sweeping
# the worker count. Only the 2-worker run is active here; the full sweep
# is the commented list.
# worker_list="2 3 4 5 6 7 8 10"
worker_list="2"
for work_run in $worker_list; do
  machine_cores=16
  workers=$work_run
  job_prefix=$(uuidgen -r)
  cluster_name="etl-openfda-faers-test-${machine_cores}-${workers}"
  job_name="scaling-test-${job_prefix}-${machine_cores}-${workers}"
  job_conf="scaling-test-${machine_cores}-${workers}.conf"
  job_jar=gs://ot-snapshots/jarrod/io-opentargets-etl-backend-assembly-openfda-0.1.0.jar

  # Per-run job configuration, picked up by Spark via -Dconfig.file below.
  cat <<EOF > "${job_conf}"
common.output = "gs://ot-snapshots/carmona/etl-openfda-faers-test-${machine_cores}-${workers}"
fda.outputs = ["json"]
EOF

  echo "scaling test: machine type ${machine_cores} with ${workers} workers"

  # Create the cluster, then submit the job; each (create && submit) pair is
  # backgrounded so all worker counts are benchmarked in parallel.
  (gcloud beta dataproc clusters create \
    "$cluster_name" \
    --image-version=1.5-debian10 \
    --properties=yarn:yarn.nodemanager.vmem-check-enabled=false,spark:spark.debug.maxToStringFields=1024,spark:spark.master=yarn \
    --master-machine-type=n1-highmem-$machine_cores \
    --master-boot-disk-size=500 \
    --num-secondary-workers=0 \
    --worker-machine-type=n1-highmem-$machine_cores \
    --num-workers=$workers \
    --worker-boot-disk-size=500 \
    --zone=europe-west1-d \
    --project=open-targets-eu-dev \
    --region=europe-west1 \
    --initialization-action-timeout=20m \
    --max-idle=10m && \
  gcloud dataproc jobs submit spark \
    --id="$job_name" \
    --cluster="$cluster_name" \
    --project=open-targets-eu-dev \
    --region=europe-west1 \
    --async \
    --files="$job_conf" \
    --properties=spark.executor.extraJavaOptions=-Dconfig.file=$job_conf,spark.driver.extraJavaOptions=-Dconfig.file=$job_conf \
    --jar="$job_jar") &
done
wait
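As a sanity check relating the two scripts to the plot's x-axis, here is a small sketch (not part of the gist) that prints the total cores swept by each template, using the full worker list from the commented line in the second script and the core list from the first:

```shell
#!/bin/sh
# Total cores per run for each cluster template, matching the plot's x-axes.
echo "C_1: n workers x 16 cores per node"
for w in 2 3 4 5 6 7 8; do
  echo "  workers=$w -> $((w * 16)) total cores"
done
echo "C_2: single node with m cores"
for m in 16 32 64 96; do
  echo "  n1-highmem-$m -> $m total cores"
done
```

This reproduces the x1 tick values {32, 48, 64, 80, 96, 112, 128} for $C_1$ and {16, 32, 64, 96} for $C_2$ used in the plot.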