
@powderluv
powderluv / gist:bca1df6ad09a6946568c34894db92541
Created December 20, 2025 06:16
hipBLASLt benchmark scripts
hipblaslt-bench --api_method c \
  -m 768 -n 77 -k 768 \
  --lda 768 --ldb 768 --ldc 768 --ldd 768 \
  --stride_a 0 --stride_b 0 --stride_c 0 --stride_d 0 \
  --alpha 1.000000 --beta 0.000000 \
  --transA T --transB N \
  --batch_count 1 \
  --scaleA 0 --scaleB 0 \
  --bias_vector --bias_source d \
  --a_type f32_r --b_type f32_r --c_type f32_r --d_type f32_r \
  --scale_type f32_r --bias_type f32_r --compute_type f32_r \
  --algo_method index --solution_index 1270 \
  --activation_type none \
  --any_stride \
  --rotating 0 --cold_iters 0 --iters 0
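For checking results, it can help to have a host-side reference for what this benchmark configuration computes. The sketch below is a math-level NumPy reference under stated assumptions, not hipBLASLt's implementation: it assumes the usual GEMM-with-epilogue form D = alpha * op(A) * op(B) + beta * C + bias, reads `--transA T` as "A is stored K x M and transposed", and assumes the bias vector has length M and is added to every column of D (our reading of `--bias_vector --bias_source d`). The helper names `reference_gemm` and `gemm_flops` are hypothetical, and the arrays use NumPy's logical shapes rather than hipBLASLt's column-major storage.

```python
import numpy as np

# Problem size and scalars copied from the hipblaslt-bench command above.
M, N, K = 768, 77, 768
ALPHA, BETA = 1.0, 0.0


def reference_gemm(a, b, c, bias, alpha=ALPHA, beta=BETA):
    """D = alpha * op(A) @ op(B) + beta * C + bias (assumed epilogue form).

    a:    shape (K, M); --transA T, so op(A) = a.T has shape (M, K)
    b:    shape (K, N); --transB N, so op(B) = b
    c:    shape (M, N); contributes nothing when beta == 0
    bias: shape (M,); assumed to be added to every column of D
    """
    return alpha * (a.T @ b) + beta * c + bias[:, np.newaxis]


def gemm_flops(m, n, k):
    """Conventional GEMM op count: one multiply and one add per (i, j, l) term,
    ignoring the alpha scaling and the bias add."""
    return 2 * m * n * k


rng = np.random.default_rng(0)
a = rng.standard_normal((K, M), dtype=np.float32)
b = rng.standard_normal((K, N), dtype=np.float32)
c = np.zeros((M, N), dtype=np.float32)
bias = rng.standard_normal(M, dtype=np.float32)

d = reference_gemm(a, b, c, bias)
print(d.shape, d.dtype)     # (768, 77) float32
print(gemm_flops(M, N, K))  # 90832896
```

Dividing `gemm_flops(M, N, K)` by a measured kernel time gives a FLOP/s figure that can be compared against the benchmark's own throughput report.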
powderluv / Microsoft.PowerShell_profile.ps1
Created December 18, 2025 09:52
PowerShell profile (.ps1)
# Stick this in C:\Users\<login>\OneDrive\Documents\PowerShell\
# Test-Path -Path $PROFILE
# New-Item -ItemType File -Path $PROFILE -Force
# code $PROFILE
$vswhere = "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe"
# Check whether vswhere exists; if not, adjust the path or install the Visual Studio Installer
if (-not (Test-Path $vswhere)) {
    Write-Host "vswhere.exe not found at the expected location."
    # Provide a link to the official vswhere documentation if needed
}
powderluv / gist:f7e62938209dc3479ca86553df608bc1
Created February 20, 2025 03:11
BERT fine-tuning on a Radeon RX 7900 XTX
# Training data consists of X=(str, str), y=float:
from sklearn.model_selection import train_test_split
X = [
    ["Hello World!", "Good morning!"],
    ["It is raining", "It is cold"],
    ["Beautiful city beside mountain", "Quiet street in downtown area"],
    ["AI is the future", "AI is just a tool"],
    ["This application is great", "software is the problem"],
    ["Hello World!", "Good morning!"],
    ["It is raining", "It is cold"],
    # ... (remaining pairs truncated in this preview)
]
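In the preview, the last two pairs repeat the first two. If repeated pairs land on both sides of a train/test split, the test score partly measures memorization. A minimal sketch, assuming duplicates should be dropped before splitting: keep the first occurrence of each (text_a, text_b) pair, then split the surviving indices so X and the matching y can both be indexed with them. The helpers `unique_pair_indices` and `split_indices` are hypothetical, and the seeded stdlib shuffle stands in for sklearn's splitter to keep the example dependency-free.

```python
import random
from typing import List, Sequence, Tuple


def unique_pair_indices(pairs: Sequence[Sequence[str]]) -> List[int]:
    """Indices of the first occurrence of each distinct (text_a, text_b) pair, in order."""
    seen = set()
    keep = []
    for i, (text_a, text_b) in enumerate(pairs):
        key = (text_a, text_b)
        if key not in seen:
            seen.add(key)
            keep.append(i)
    return keep


def split_indices(indices: Sequence[int], test_size: float = 0.2,
                  seed: int = 42) -> Tuple[List[int], List[int]]:
    """Seeded shuffle, then hold out round(len * test_size) indices (at least one) for testing."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


# The pairs visible in the preview above.
pairs = [
    ["Hello World!", "Good morning!"],
    ["It is raining", "It is cold"],
    ["Beautiful city beside mountain", "Quiet street in downtown area"],
    ["AI is the future", "AI is just a tool"],
    ["This application is great", "software is the problem"],
    ["Hello World!", "Good morning!"],
    ["It is raining", "It is cold"],
]

keep = unique_pair_indices(pairs)
print(keep)  # [0, 1, 2, 3, 4]
train_idx, test_idx = split_indices(keep)
print(len(train_idx), len(test_idx))  # 4 1
```

With the full dataset, `[X[i] for i in train_idx]` and `[y[i] for i in train_idx]` give aligned training data. The split step can also be done with sklearn's `train_test_split(keep, test_size=0.2, random_state=42)`, which produces a different but equally valid shuffle.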
powderluv / MobileNet-v3-large-visualization.md
Created January 17, 2021 17:44 — forked from bjacob/MobileNet-v3-large-visualization.md
Matmul shapes in MobileNet-v3-large, EfficientNet-Lite2 and EfficientNet-B4, 8bit quantized, by decreasing CPU time % on Pixel4

Visualization of the MobileNet-v3-large shapes (ordered similarly by decreasing time percentage, so the most important shapes come first).

[Figure: mobilenet-v3-large-matmuls-ordered]
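Ordering shapes by decreasing time share also shows how few shapes need optimizing to cover most of the runtime. A minimal sketch, assuming each profiled matmul is recorded as (rows, depth, cols) plus its percentage of CPU time: sort by time share, attach a running cumulative percentage, and count the top shapes needed to reach a target. The `MatmulShape` record and both helper functions are hypothetical and not part of the gist; the sketch contains no measured data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MatmulShape:
    """One profiled matmul shape and its share of total CPU time (hypothetical record)."""
    rows: int
    depth: int
    cols: int
    time_pct: float  # percentage of total CPU time spent in this shape


def rank_by_time(shapes: List[MatmulShape]) -> List[Tuple[MatmulShape, float]]:
    """Sort by decreasing time share and pair each shape with the running cumulative %."""
    ordered = sorted(shapes, key=lambda s: s.time_pct, reverse=True)
    ranked = []
    running = 0.0
    for shape in ordered:
        running += shape.time_pct
        ranked.append((shape, running))
    return ranked


def shapes_covering(shapes: List[MatmulShape], target_pct: float) -> int:
    """Smallest number of top-ranked shapes whose combined time share reaches target_pct."""
    for count, (_, cumulative) in enumerate(rank_by_time(shapes), start=1):
        if cumulative >= target_pct:
            return count
    return len(shapes)
```

Calling `shapes_covering(profile, 80.0)` on a model's profile answers "how many kernels cover 80% of matmul time", which is the practical reason for listing the most important shapes first.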

powderluv / tfcompile.ipynb
Created January 15, 2020 23:05 — forked from carlthome/tfcompile.ipynb
Example of how to use XLA AOT via tfcompile to build a Keras model into a shared library.