Leonard lhl

@lhl
lhl / JA-MT-Harness-vs-Swallow-Evaluation.md
Last active December 19, 2025 10:30
Analysis of difference between JA MT-Bench Tests

This is a comparison between https://github.com/shisa-ai/ja-mt-bench-harness, which aims to be faithful to the original JA MT-Bench, and the version used in Swallow Evaluation Instruct v202510: https://github.com/swallow-llm/swallow-evaluation-instruct/releases/tag/v202510

JA-MT-Harness vs Swallow-Evaluation (JA MT-Bench)

High-level summary

Both frameworks use an OpenAI-compatible API, but they run and score JA MT‑Bench in materially different ways. The FastChat-based harness is closer to the original MT‑Bench pipeline (question file layout, judge prompts, and single-sample judging), while Swallow’s lighteval task intentionally modifies the evaluation: Japanese-enforced judge prompts, a Japanese system prompt for model generation, multi-sample averaging (N=5), output truncation by character length, a different judge model, and additional metrics. These differences alone can easily move scores by multiple points.
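
To make two of these differences concrete, here is a minimal Python sketch of multi-sample averaging and character-length truncation. It is not code from either framework: the function names, the truncation limit, and the scores are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def truncate_by_chars(text: str, max_chars: int) -> str:
    """Cut a model response at a character (not byte or token) limit."""
    return text[:max_chars]


def averaged_score(judge_scores: list[float]) -> float:
    """Average the judge scores collected for one question."""
    return mean(judge_scores)


if __name__ == "__main__":
    # Illustrative numbers only, not real benchmark results.
    single_sample = [7.0]                       # one judged sample
    five_samples = [7.0, 8.0, 6.0, 7.0, 9.0]    # N=5 judged samples

    print("single-sample score:", averaged_score(single_sample))
    print("N=5 averaged score:", averaged_score(five_samples))

    # Slicing a str counts characters, so Japanese text is cut per
    # character rather than per byte.
    print(truncate_by_chars("こんにちは、世界", 5))
```

With a single sample the reported score is whatever that one judgment happened to be, whereas averaging several samples smooths out sampling variance, so the two procedures can disagree even for the same model and judge.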

Key takeaways:

  • Prompting and judging are different (language constraint
@lhl
lhl / ANALYSIS-gpus.md
Created October 24, 2025 06:56
Codex 5 High analysis of GGML CUDA paths

GGML CUDA/HIP Inference Paths and Precision by Architecture

This document summarizes how ggml’s CUDA/HIP backend executes inference on different GPU families, which code paths are used, and at what numeric precision the major compute happens. It also provides rough workload composition percentages to relate paths to each architecture’s FLOPS/TOPS.

References are to files under ggml/src/ggml-cuda unless noted.

  • Matmul (quantized): mmq.cu, mmq.cuh, vecdotq.cuh, quantize.cu/.cuh
  • Matmul (float): mmf.cu, mmvf.cu, cuBLAS/hipBLAS calls in ggml-cuda.cu
  • FlashAttention: fattn*.cu/.cuh
  • Softmax: softmax.cu
@lhl
lhl / power-usage.py
Created January 13, 2025 05:58
2025-01 vLLM/Llama 3.3 70B FP8 tokens/joule
# Power Usage Calculator for AI Workloads
'''
# Serving
$ vllm serve meta-llama/Llama-3.3-70B-Instruct --tensor-parallel-size 4 --num-scheduler-steps 20 --quantization=fp8 --gpu-memory-utilization=0.97
INFO 01-13 04:59:05 api_server.py:712] vLLM API server version 0.6.6.post2.dev5+g5ce4627a
# Benchmark - we do bs=64 to emulate https://arxiv.org/pdf/2310.03003
cmd = [
"python", os.path.expanduser("~/vllm/benchmarks/benchmark_serving.py"),
@lhl
lhl / HOWTO.md
Last active July 19, 2018 12:22
How to configure NGINX with LetsEncrypt using the simp_le client

How to configure NGINX with LetsEncrypt using the simp_le client.

This includes the nginx configs as well as the auto-renewal steps. I took a bunch of these steps from this blog and adapted them to my liking.

simp_le issues three return codes depending on the status of the request.

  • 0 if certificate data was created or updated;
  • 1 if renewal not necessary;
  • 2 in case of errors.
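
As a rough sketch of how a renewal wrapper might branch on these three return codes, the Python below maps each code to a follow-up action. The function name and the action labels ("reload", "skip", "error") are hypothetical, and treating any unexpected code as an error is an assumption rather than documented simp_le behaviour.

```python
def handle_simp_le_exit(returncode: int) -> str:
    """Map a simp_le return code to a follow-up action.

    The action labels are hypothetical and exist only for this example.
    """
    if returncode == 0:
        # Certificate data was created or updated, so the web server
        # needs to pick up the new files.
        return "reload"
    if returncode == 1:
        # Renewal was not necessary; nothing to do.
        return "skip"
    # 2 means an error; anything else is treated the same way here
    # (an assumption).
    return "error"


if __name__ == "__main__":
    for code in (0, 1, 2):
        print(code, handle_simp_le_exit(code))
```

In a scheduled job, the value passed in would be the `returncode` of a `subprocess.run(...)` call that invokes simp_le, and only the "reload" case would need to touch nginx.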
! http://crunchbang.org/forums/viewtopic.php?id=5618
! Xft.dpi: 110
Xft.dpi: 96
Xft.autohint: 0
Xft.lcdfilter: lcddefault
Xft.hintstyle: hintfull
Xft.hinting: 1
Xft.antialias: 1
Xft.rgba: rgb
@lhl
lhl / DAMP.ahk
Last active August 29, 2015 14:10
AutoHotKey for DAI MP
/*
Dragon Age Inquisition Multiplayer Key Bindings
---
You should remap movement from WQSE to WASD.
These tweaks should make DAI easier to control.
What the script does:
* MB4 toggles RMB down/up (freelook)
* Caps lock toggles sprint

Keybase proof

I hereby claim:

  • I am lhl on github.
  • I am lhl (https://keybase.io/lhl) on keybase.
  • I have a public key whose fingerprint is 4DAB 5922 AD2C B6F2 780C CC2A CE9A 69D9 663F C373

To claim this, I am signing this object:

@lhl
lhl / script.rpy
Created August 6, 2013 00:14
You can simply copy this file into the Save the Date http://paperdino.com/games/save-the-date/ game folder and it'll overwrite the script.rpyc on the next open. The edited option still requires the I_AM_A_HACKER boolean but incorporates that as a valid choice instead of an FU.
init:
    define f = Character('Felicia', color="#c8ffc8", show_two_window=False, image="felicia")
    $ narrator = Character(None, color="#c8ffc8")
init:
    image felicia happy = Image("art/f_happy.png")
    image felicia sad = Image("art/f_sad.png")
    image felicia angry = Image("art/f_angry.png")
    image felicia pensive = Image("art/f_pensive.png")
    image felicia surprised = Image("art/f_surprised.png")
    image felicia suspicious = Image("art/f_suspicious.png")
@lhl
lhl / gist:4368260
Created December 24, 2012 07:48
Open a new window and load a page in Google Chrome w/ Applescript
tell application "Google Chrome"
    set myWindow to make new window
    set myTab to active tab of myWindow
    set URL of myTab to "http://randomfoo.net/"
    activate
end tell
@lhl
lhl / gist:4238942
Created December 8, 2012 06:39
Get Recent Tweets for WordPress
function get_tweets($num=3) {
    // Cached
    if($tweets = get_transient('tweets')) {
        return $tweets;
    }
    $url = 'http://api.twitter.com/1/statuses/user_timeline.json?include_entities=true&include_rts=true&screen_name=lhl&count=20&exclude_replies=true';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);