When tested properly with the same GLM simple evals reference implementation provided by Z.ai, the evaluations produced the following scores:
{
"chars": 970.0044191919192,
"chars:std": 153.57443776558713,
"Chemistry": 72.1774193548387,
"Chemistry:std": 44.81252136132964,