I tried to build the core plugin functionality for the Responses API by combining gpt-5.1-codex-max with Gemini 3, and its specification does differ significantly from the Chat Completions API. When I use the Vision features, GPT responses become noticeably slower, to the point of being extremely sluggish.
When no images are involved, everything appears to work without issues. Since gpt-5.1-codex is difficult to use in Azure AI Foundry's Chat Playground, making it usable from Dify would be extremely helpful. I may need to limit the plugin's feature set in some cases, but it's also possible that I simply lack the technical expertise. I'd like someone more experienced to review this area and see whether it can be improved further.
Using this setup, I've created a plugin for my self-hosted Dify-CE tenant. It currently supports only GPT-5.x models. (I've excluded embedding, TTS, and STT models from this implementation.)
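
For anyone reviewing, here is a minimal sketch of the kind of message conversion the Responses API needs compared with Chat Completions, including image parts. It is not the plugin's actual code: the function names `to_responses_input` and `convert_part` are hypothetical, and it only covers text and image parts. It assumes the documented shapes, where Chat Completions uses `text` / `image_url` parts (with the URL nested in an object) and the Responses API uses `input_text` / `input_image` parts (with the URL as a plain string, and `output_text` for assistant turns).

```python
from typing import Any


def convert_part(part: dict[str, Any], role: str) -> dict[str, Any]:
    """Convert one Chat Completions content part to a Responses API part."""
    if part["type"] == "text":
        # Assistant turns use "output_text"; other roles use "input_text".
        text_type = "output_text" if role == "assistant" else "input_text"
        return {"type": text_type, "text": part["text"]}
    if part["type"] == "image_url":
        image = part["image_url"]
        # The Responses API takes the URL (or data URL) as a plain string.
        item: dict[str, Any] = {"type": "input_image", "image_url": image["url"]}
        if "detail" in image:
            # Passed through unchanged, e.g. "low", "high", or "auto".
            item["detail"] = image["detail"]
        return item
    raise ValueError(f"unsupported content part type: {part['type']!r}")


def to_responses_input(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Convert Chat Completions style messages into Responses API input items."""
    converted = []
    for message in messages:
        role = message["role"]
        content = message["content"]
        if isinstance(content, str):
            # A bare string is treated as a single text part.
            content = [{"type": "text", "text": content}]
        parts = [convert_part(part, role) for part in content]
        converted.append({"role": role, "content": parts})
    return converted
```

The `detail` field is passed through so that a lower setting can be tried when investigating the Vision slowdown. Whether that actually helps in this setup is untested here.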