@parsapoorsh
Created December 25, 2025 14:18
STREAM Memory Benchmark's Download, Build, and Execution script
#!/bin/bash
# STREAM Memory Benchmark's Download, Build, and Execution script
# Usage: ./run-stream-benchmark.sh
set -e
SOURCE_URL="https://raw.githubusercontent.com/jeffhammond/STREAM/refs/heads/master/stream.c"
SOURCE_FILE="./stream.c"
OUTPUT_PATH="./stream_native"
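if ! command -v awk >/dev/null; then
    echo "awk not found!"
    exit 1
fi
# The STREAM arrays are sized as a multiple of the L3 cache so they cannot fit in cache.
L3_CACHE_MULTIPLIER=8
# sysfs reports each CPU's L3 size in KiB (e.g. "16384K"); average over all CPUs and
# convert to bytes. Printing with %.0f keeps the value an integer for the -eq test below.
AVG_L3_CACHE_SIZE=$(awk '{sum += $1} END {printf "%.0f\n", (NR ? sum / NR * 1024 : 0)}' /sys/devices/system/cpu/cpu*/cache/index3/size 2>/dev/null || true)
AVG_L3_CACHE_SIZE=${AVG_L3_CACHE_SIZE:-0}
hline() {
    awk 'BEGIN{for(i=1;i<=61;i++)printf "-";print""}'
}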
hline
echo "STREAM Memory Benchmark D.B.E script"
if [ "$AVG_L3_CACHE_SIZE" -eq 0 ]; then
    hline
    echo "WARNING -- Failed to read average L3 cache size, defaulting to 16 MiB."
    AVG_L3_CACHE_SIZE=$((16 * 1024 * 1024))
fi
if [ ! -f "$SOURCE_FILE" ]; then
    hline
    echo "WARNING -- File $SOURCE_FILE is missing, downloading it..."
    if ! command -v curl >/dev/null; then
        echo "curl not found!"
        exit 1
    fi
    if ! curl -fSLo "$SOURCE_FILE" "$SOURCE_URL"; then
        echo "Download failed!"
        # Remove any partial download without prompting.
        rm -f "$SOURCE_FILE"
        exit 1
    fi
    if [ ! -f "$SOURCE_FILE" ]; then
        echo "Download succeeded but file $SOURCE_FILE is missing."
        exit 1
    fi
    echo "Download successful!"
fi
hline
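# NOTE: STREAM_ARRAY_SIZE counts array elements, not bytes. With STREAM's default
# 8-byte elements, each array occupies 8x the MiB figure echoed below.
ARRAY_SIZE=$(awk -v size="$AVG_L3_CACHE_SIZE" -v mult="$L3_CACHE_MULTIPLIER" 'BEGIN {printf "%.0f\n", size * mult}')
AVG_L3_CACHE_SIZE_IN_MIB=$(awk -v size="$AVG_L3_CACHE_SIZE" 'BEGIN {print size / 1024 / 1024}')
echo "Average L3 cache size is $AVG_L3_CACHE_SIZE_IN_MIB MiB."
ARRAY_SIZE_IN_MIB=$(awk -v size="$AVG_L3_CACHE_SIZE" -v mult="$L3_CACHE_MULTIPLIER" 'BEGIN {print size * mult / 1024 / 1024}')
echo "Set STREAM_ARRAY_SIZE to $ARRAY_SIZE_IN_MIB MiB (x$L3_CACHE_MULTIPLIER of L3 cache)."
echo " Array size should be at least x4 of L3 cache size!"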
hline
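COMPILE_CMD=(
    gcc
    "$SOURCE_FILE"
    -O3
    -fopenmp
    -march=native
    # On x86 GCC, -mcpu is a deprecated synonym for -mtune.
    -mcpu=native
    -mtune=native
    # The medium code model lets the static arrays exceed 2 GiB on x86-64.
    -mcmodel=medium
    "-DSTREAM_ARRAY_SIZE=$ARRAY_SIZE"
    -o "$OUTPUT_PATH"
)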
echo "Compiler: $("${COMPILE_CMD[0]}" --version | awk 'NR==1')"
echo "Compiling command:"
echo "${COMPILE_CMD[@]}"
# Check the compiler's exit status directly: under `set -e` a failed compile would
# abort silently, and a stale binary from an earlier run would pass a file check.
time ( "${COMPILE_CMD[@]}" && echo -n "Compiled in:" ) || {
    echo "Compiling failed!"
    exit 1
}
hline
echo "Executing $OUTPUT_PATH"
time ( "$OUTPUT_PATH" && echo -n "Benchmark time:" )
hline
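
A note on sizing: the script passes a byte count (L3 size x multiplier) as STREAM_ARRAY_SIZE, but STREAM treats that value as a number of elements. The run output below reports 8 bytes per element and 1024.0 MiB per array, so each array ends up 8x larger than the "128 MiB" the script announces. If the intent is for each array to occupy exactly L3 x multiplier bytes, the conversion could look roughly like the sketch below. `stream_elements` and `ELEMENT_BYTES` are hypothetical names, not part of the script or of STREAM, and the 8-byte element size assumes the default element type.

```shell
#!/bin/bash
# Hypothetical helper: convert a per-array byte budget into a STREAM element count.
# Assumption: elements are 8 bytes wide, as the run output reports.
ELEMENT_BYTES=8

stream_elements() {
    local l3_bytes=$1 multiplier=$2
    # Bytes wanted per array, divided by the size of one element.
    echo $(( l3_bytes * multiplier / ELEMENT_BYTES ))
}

# 16 MiB of L3 with the x8 multiplier gives 128 MiB per array.
stream_elements $((16 * 1024 * 1024)) 8   # prints 16777216
```

The oversized arrays in the existing run still satisfy the "at least x4 of L3" guideline, so the measured bandwidth is not invalidated; only the printed size is misleading.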

parsapoorsh commented Dec 25, 2025

$ ./run-stream-benchmark.sh 
-------------------------------------------------------------
STREAM Memory Benchmark D.B.E script
-------------------------------------------------------------
WARNING -- File ./stream.c is missing, downloading it...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  19968 100  19968   0      0  24236      0  --:--:-- --:--:-- --:--:--      0
Download successful!
-------------------------------------------------------------
Average L3 cache size is 16 MiB.
Set STREAM_ARRAY_SIZE to 128 MiB (x8 of L3 cache).
 Array size should be at least x4 of L3 cache size!
-------------------------------------------------------------
Compiler: gcc (Debian 15.2.0-11) 15.2.0
Compiling command:
gcc ./stream.c -O3 -fopenmp -march=native -mcpu=native -mtune=native -mcmodel=medium -DSTREAM_ARRAY_SIZE=134217728 -o ./stream_native
Compiled in:
real    0m0.112s
user    0m0.105s
sys     0m0.007s
-------------------------------------------------------------
Executing ./stream_native
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 134217728 (elements), Offset = 0 (elements)
Memory per array = 1024.0 MiB (= 1.0 GiB).
Total memory required = 3072.0 MiB (= 3.0 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 20
Number of Threads counted = 20
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 27568 microseconds.
   (= 27568 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           77770.3     0.027930     0.027613     0.029921
Scale:          67121.2     0.033634     0.031994     0.035443
Add:            69318.1     0.048086     0.046470     0.051488
Triad:          70218.8     0.047316     0.045874     0.048374
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
Benchmark time:
real    0m1.871s
user    0m28.026s
sys     0m1.232s
-------------------------------------------------------------
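
As a sanity check on the table above, the reported rates can be recomputed by hand. STREAM counts two arrays of traffic for Copy and Scale and three for Add and Triad, and divides the bytes moved by the minimum time, using 10^6 bytes per MB. The sketch below redoes that arithmetic with the array size and min times from this run; `stream_rate` is a hypothetical helper name, not something STREAM provides.

```shell
#!/bin/bash
# Hypothetical helper: recompute a STREAM "Best Rate MB/s" value.
#   $1 = arrays touched by the kernel (2 for Copy/Scale, 3 for Add/Triad)
#   $2 = STREAM_ARRAY_SIZE in elements
#   $3 = bytes per element
#   $4 = minimum time in seconds
stream_rate() {
    local arrays=$1 elements=$2 elem_bytes=$3 min_time=$4
    awk -v a="$arrays" -v n="$elements" -v b="$elem_bytes" -v t="$min_time" \
        'BEGIN {printf "%.1f\n", 1.0e-6 * a * n * b / t}'
}

stream_rate 2 134217728 8 0.027613   # Copy
stream_rate 3 134217728 8 0.045874   # Triad
```

The results should land within about 1 MB/s of the Copy and Triad rows; the small gap comes from the min time being printed to only six decimal places.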
