Scripts to run the benchmark jobs and plot the results
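// Asymptote plot: elapsed time vs. total cores for the two cluster templates benchmarked by the scripts below.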
import graph;
usepackage("kpfonts");
size(10cm, 7cm, IgnoreAspect);
scale(Linear, Log);
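// Measured points: x = total cores per run, y = elapsed time in seconds; x12 lists the x-axis tick positions.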
real [] x1={32,48,64,80,96, 112, 128};
real [] y1={1451,532,504,343,288,271,250};
real [] x2={16,32,64,96};
real [] y2={1304,797,543,343};
real [] x12={16,32,48,64,80,96, 112, 128};
marker mark1=marker(scale(1mm)*polygon(3),black);
marker mark2=marker(scale(1mm)*polygon(4),black);
draw(legend="$\text{Cluster template }C_1: n \text{ nodes } \times 16 \text{ cores per node } = c$",graph(x1,y1), mark1,L="$C_1$");
draw(legend="$\text{Cluster template }C_2: 1 \text{ node } \times m \text{ cores per node } = c$",graph(x2,y2), mark2, L="$C_2$");
xaxis("Total number of cores $c$ per run",BottomTop, LeftTicks(x12));
yaxis("Elapsed time $t$ in seconds", LeftRight, RightTicks);
label(shift(5mm*N)*"Computational resources used vs time taken",point(NW),E);
add(legend(),(point(S).x,truepoint(S).y),20S);
#!/bin/bash
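# Cluster template C_2: a single n1-highmem node, sweeping the number of cores per node.
# Each entry in worker_list is a core count for the single machine (workers stays 0).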
worker_list="16 32 64 96"
for work_run in $worker_list; do
machine_cores=$work_run
workers=0
job_prefix=$(uuidgen -r)
cluster_name="etl-openfda-faers-test-${machine_cores}-${workers}"
job_name="scaling-test-${job_prefix}-${machine_cores}-${workers}"
job_conf="scaling-test-${machine_cores}-${workers}.conf"
job_jar=gs://ot-snapshots/jarrod/io-opentargets-etl-backend-assembly-openfda-0.1.0.jar
cat <<EOF > "${job_conf}"
common.output = "gs://ot-snapshots/carmona/etl-openfda-faers-test-${machine_cores}-${workers}"
fda.outputs = ["json"]
EOF
echo scaling test machine type $machine_cores with $workers workers
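# Create the single-node cluster and, once it is up, submit the Spark job asynchronously;
# the create-and-submit pair runs in a background subshell so every core count is launched in parallel.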
(gcloud beta dataproc clusters create \
$cluster_name \
--image-version=1.5-debian10 \
--properties=yarn:yarn.nodemanager.vmem-check-enabled=false,spark:spark.debug.maxToStringFields=1024,spark:spark.master=yarn \
--single-node \
--master-machine-type=n1-highmem-$machine_cores \
--master-boot-disk-size=1000 \
--zone=europe-west1-d \
--project=open-targets-eu-dev \
--region=europe-west1 \
--initialization-action-timeout=20m \
--max-idle=10m && \
gcloud dataproc jobs submit spark \
--id=$job_name \
--cluster=$cluster_name \
--project=open-targets-eu-dev \
--region=europe-west1 \
--async \
--files=$job_conf \
--properties=spark.executor.extraJavaOptions=-Dconfig.file=$job_conf,spark.driver.extraJavaOptions=-Dconfig.file=$job_conf \
--jar=$job_jar) &
done
wait
#!/bin/bash
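# Cluster template C_1: clusters of n1-highmem-16 workers, sweeping the number of workers per cluster.
# Each entry in worker_list is a worker count; the machine type stays fixed at 16 cores.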
# worker_list="2 3 4 5 6 7 8 10"
worker_list="2"
for work_run in $worker_list; do
machine_cores=16
workers=$work_run
job_prefix=$(uuidgen -r)
cluster_name="etl-openfda-faers-test-${machine_cores}-${workers}"
job_name="scaling-test-${job_prefix}-${machine_cores}-${workers}"
job_conf="scaling-test-${machine_cores}-${workers}.conf"
job_jar=gs://ot-snapshots/jarrod/io-opentargets-etl-backend-assembly-openfda-0.1.0.jar
cat <<EOF > "${job_conf}"
common.output = "gs://ot-snapshots/carmona/etl-openfda-faers-test-${machine_cores}-${workers}"
fda.outputs = ["json"]
EOF
echo scaling test machine type $machine_cores with $workers workers
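# Create a cluster with $workers primary workers (no secondary workers), then submit the Spark job asynchronously;
# as above, each run is launched from a background subshell so the worker counts run in parallel.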
(gcloud beta dataproc clusters create \
$cluster_name \
--image-version=1.5-debian10 \
--properties=yarn:yarn.nodemanager.vmem-check-enabled=false,spark:spark.debug.maxToStringFields=1024,spark:spark.master=yarn \
--master-machine-type=n1-highmem-$machine_cores \
--master-boot-disk-size=500 \
--num-secondary-workers=0 \
--worker-machine-type=n1-highmem-$machine_cores \
--num-workers=$workers \
--worker-boot-disk-size=500 \
--zone=europe-west1-d \
--project=open-targets-eu-dev \
--region=europe-west1 \
--initialization-action-timeout=20m \
--max-idle=10m && \
gcloud dataproc jobs submit spark \
--id=$job_name \
--cluster=$cluster_name \
--project=open-targets-eu-dev \
--region=europe-west1 \
--async \
--files=$job_conf \
--properties=spark.executor.extraJavaOptions=-Dconfig.file=$job_conf,spark.driver.extraJavaOptions=-Dconfig.file=$job_conf \
--jar=$job_jar) &
done
wait
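#!/bin/bash
# The jobs above are submitted with --async and each create-and-submit pair runs in a
# background subshell, so the final `wait` only waits for the gcloud commands to return,
# not for the Spark jobs themselves to finish. A minimal sketch of blocking on one
# submitted job until it completes, assuming the same project and region as the loops
# above; the job id value here is a hypothetical example of the ids they generate.
job_name="scaling-test-<uuid>-16-2"
# Stream the driver output and return once the Dataproc job reaches a terminal state.
gcloud dataproc jobs wait "$job_name" \
--project=open-targets-eu-dev \
--region=europe-west1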