Text2Image shootout: Prompt Adherence.


ComparisonGrid_03

Wan2.1 T2V 14B :: Stable Diffusion 3.5 Large :: Flux.1 Dev :: Chroma .27 :: HiDream I1 full

May 2025

Who delivers the best prompt adherence for various challenges?

Campfire (Cowboy & Daughter)

WINNER: Wan2.1

CampfireWan-dpmpp_2m-beta-20-52_BEST

+ grizzled old cowboy sits next to a campfire crocheting a pink (-mitten). 
+ his daughter sitting across reaches out to add a log to the fire. 
+ embers and smoke dance above. 
+ shot with 50mm lens mirrorless body.

Some issues with hands at 20 steps. 2nd place: Chroma & HiDream

Chinchilla in Diner (16:9)

WINNERS (TIE): SD, Chroma & Wan

SD3.5L

ChinchillaSD-dpmpp_2m-beta-40-50_BEST

+ 3d animated film. sad anthropomorphic teen chinchilla sitting in a booth, 
+ looking at her smart phone in a sparkling 1950's diner.

HiDream would not give me a sad-looking character, even though its character design was possibly the best.

Hound on Car (In Town Ruins) 1:1

WINNERS (TIE): Chroma & SD

Chroma

HoundChrm-euler-beta-20-53_BEST

+ Gritty comic book graphic novel style. feral hound sitting on the hood of a 
+ rusted-out muscle car in post-apocalyptic ruins of a small town.

SD3.5L

Hound-dpmpp_2m-sgm_uniform-30-52_BEST

+ Gritty comic book graphic novel style. 
+ feral hound sitting on the hood of a rusted-out muscle car in post-apocalyptic ruins of a small town.

Invite (Typography) 2:3

WINNERS (TIE): HiDream & SD3.5

HiDream

InvitationHiDrm-dpmpp_2m-beta-40-50_BEST

+ Invitation for a baby shower. Generic clipart image of a stork carrying a baby. 
+ Headline "We are expecting you!" and smaller text that says "Feb 24 at the Grizwalt Hotel" and "Games and Prizes!"
+ Graphic design in pastel colors. 

Design is usable and got most of the letters right!

SD3.5L

Invitation-dpmpp_2m-sgm_uniform-30-52_BEST

+ Invitation for a baby shower. Generic clipart image of a stork carrying a baby. 
+ Headline "We are expecting you!" and smaller text that says "Feb 24 at the Grizwalt Hotel" and "Games and Prizes!"
+ Graphic design in pastel colors. 

Ugly type, but I didn't ask for beautiful type.

Magazine (Lipstick Triangle) 3:4

WINNER: Wan2.1

MagazineWan-dpmpp_2m-beta-20-50_00002_

+ Magazine advertisement. Kenyan supermodel closeup glitter eye shadow 
+ small triangle of blue lipstick on lower lip. Haute fashion. Dramatic lighting. 

Queen (Torn Dress) 9:16

WINNER: Chroma

InvitationHiDrm-dpmpp_2m-beta-20-50_BEST

+ Ancient Egyptian Queen in her royal dressing room  dismayed to discover her dress is torn. 

2nd place: Wan

StarPort (Unconscious, Broken Glass) 3:2

WINNER: HiDream

StarPortHiD-dpmpp_2m-beta-40-53_BEST

+ a young alien man lies unconscious on the black polished flor in a star port. 
+ broken shards of glass all around. His helmet visor  broken off. 
- an alien 
+ woman stands holding a (smoking) blaster pistol. 

Runners-up: Wan & Chroma

Witch (Watermelon Carry) 16:9

WINNER: Flux

witchHD-dpmpp_2m-beta-30-53_BEST-SOFT

+ movie still of a young witch in swirling black robe and black hat walks along the surface of a 
+ bioluminescent lake carrying a large watermelon through fog under moonlight. 

Soft focus at this resolution. Runner-up: Wan

Notes

Relative inference speeds (same resolution and steps):

  • SD 3.5 L 43 secs
  • Flux.1 64 secs
  • Chroma 108 secs
  • Wan 2.1 221 secs
  • HiDream 243 secs
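
To put the timings above on a common scale, here is a minimal sketch that expresses each one as a slowdown factor relative to the fastest model. The timings are copied from the list; the variable names are just illustrative.

```python
# Timings in seconds, copied from the list above.
timings = {
    "SD 3.5 L": 43,
    "Flux.1": 64,
    "Chroma": 108,
    "Wan 2.1": 221,
    "HiDream": 243,
}

fastest = min(timings.values())

# Slowdown factor relative to the fastest model, rounded to one decimal.
relative = {name: round(secs / fastest, 1) for name, secs in timings.items()}

for name, factor in relative.items():
    print(f"{name}: {timings[name]} s ({factor}x)")
```

By this measure, Flux.1 is about 1.5x, Chroma about 2.5x, Wan 2.1 about 5.1x, and HiDream about 5.7x the time of SD 3.5 L.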

Overall findings

Wan 2.1

  • Good adherence (3/8 wins)
  • Very high resolution capability
  • Very slow
  • Issues with anatomy

Stable Diffusion 3.5L

  • Good adherence (3/8 wins)
  • OK resolution capability
  • Very fast
  • Great text accuracy

Flux.1 Dev

  • OK adherence (1/8 wins)
  • OK resolution capability
  • Good speed
  • Issues with softness
  • Good composition and aesthetics

Chroma .27

  • Good adherence (2-3/8 wins; the Egyptian queen one should probably count)
  • High resolution capability
  • OK speed
  • Good text accuracy

HiDream I1 full (Q6)

  • Very Good adherence (3/8 wins)
  • Very high resolution capability
  • Worst speed
  • Great text accuracy
  • Impressive photographic quality
  • Favors photography
  • Not sure about art mediums; needs testing.

I did four iterations of each prompt, using seeds 50-53. I tried the recommended UniPC/simple for Wan, but ended up using DPM++ 2M/beta for better results. I did my best to write prompts that didn't advantage any particular model.
I focused on challenging concepts, trying to find the edges where things break, to reveal where models excel at comprehension and prompt adherence. The sampler/scheduler combination had a large impact on the nature of some of the results. It seems possible that UniPC works better for Wan video and worse for Wan images. These are all rendered at large resolutions because I often need higher resolutions. I've learned that the resolution ceiling is a soft ceiling, where more iterations can sometimes make higher resolutions work. I found that CFG 4.5 (on the models that support CFG) gave better adherence than higher values. HiDream full was too slow on my hardware. I will try out HiDream mid, which I expect will run much faster.
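
The image labels in this write-up (for example `CampfireWan-dpmpp_2m-beta-20-52_BEST`) look like they encode sampler, scheduler, step count, and seed, in that order. That reading is an assumption inferred from the labels and the seed range 50-53; it is not stated explicitly anywhere. Under that assumption, a hypothetical `parse_label` helper could recover the settings behind each image:

```python
import re

# Assumed label layout (inferred, not documented):
#   <name>-<sampler>-<scheduler>-<steps>-<seed>[_suffix]
LABEL_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<sampler>[^-]+)-(?P<scheduler>[^-]+)"
    r"-(?P<steps>\d+)-(?P<seed>\d+)"
)


def parse_label(label):
    """Return the settings encoded in a label, or None if it doesn't match."""
    match = LABEL_RE.match(label)
    if match is None:
        return None
    info = match.groupdict()
    info["steps"] = int(info["steps"])
    info["seed"] = int(info["seed"])
    return info


print(parse_label("CampfireWan-dpmpp_2m-beta-20-52_BEST"))
print(parse_label("Hound-dpmpp_2m-sgm_uniform-30-52_BEST"))
```

Labels that don't follow the pattern, such as `ComparisonGrid_03`, return `None` rather than a partial guess.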
