@cgalo5758
Created January 14, 2024 08:36

HTML to Markdown ✨using AI✨ (example of premature pessimization)

This is a prompt I built before finding out about LangChain. I didn't know how to pass documentation I found on a website as context to the LLM, so I first tried passing it as raw HTML. That didn't work: HTML on the modern web is often bloated, and some pages easily exceeded even the largest context window I could find in any available chat model.

Second step? Attempt to convert the pages to Markdown through conventional means. I tried html2markdown, markdownify, and pandoc, but I wasn't satisfied with any of the results. So I came up with a solution:

The world's most expensive HTML-to-Markdown utility: split the page into chunks and hand each one to GPT-4 Turbo as a conversion task.

                messages=[
                    # System prompt: convert only the newest chunk; earlier messages are context.
                    {"role": "system", "content": "The user provides HTML content in chunks. Convert only the new chunk to Markdown. The previous messages are provided as context to maintain consistency."},
                    # One worked example (few-shot): sample HTML in, sample Markdown out.
                    {"role": "user", "content": "Convert this HTML to Markdown: " + additional_contents['sample-input']},
                    {"role": "assistant", "content": additional_contents['sample-output']},
                    # Markdown produced from earlier chunks, so formatting stays consistent.
                    {"role": "user", "content": "Here's the previous context in Markdown: " + context_message},
                    # The chunk to convert on this call.
                    {"role": "user", "content": "Convert this new HTML chunk to Markdown: " + chunk}
                ],
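
The fragment above is only the `messages` argument, so here is a rough sketch of the loop that might sit around it. None of this is the original script: `split_into_chunks`, `build_messages`, `convert_html`, and the `max_chars` / `context_chars` limits are hypothetical names and values, and the model call is passed in as a plain function (`complete`) so the sketch runs without an API key. The sample output is sent as an assistant turn here, which is the usual shape for a few-shot example.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

SYSTEM_PROMPT = (
    "The user provides HTML content in chunks. Convert only the new chunk to "
    "Markdown. The previous messages are provided as context to maintain consistency."
)


def split_into_chunks(html: str, max_chars: int) -> List[str]:
    """Naively cut the HTML into fixed-size pieces (hypothetical helper).

    A fixed character count can split a tag in half; a more careful
    splitter would break on element boundaries instead.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [html[i:i + max_chars] for i in range(0, len(html), max_chars)]


def build_messages(chunk: str, context_message: str,
                   sample_input: str, sample_output: str) -> List[Message]:
    """Assemble the message list for one chunk, mirroring the prompt above."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Convert this HTML to Markdown: " + sample_input},
        {"role": "assistant", "content": sample_output},
        {"role": "user", "content": "Here's the previous context in Markdown: " + context_message},
        {"role": "user", "content": "Convert this new HTML chunk to Markdown: " + chunk},
    ]


def convert_html(html: str,
                 complete: Callable[[List[Message]], str],
                 sample_input: str,
                 sample_output: str,
                 max_chars: int = 8000,       # assumed chunk size, not from the gist
                 context_chars: int = 2000    # assumed context budget, not from the gist
                 ) -> str:
    """Convert HTML chunk by chunk; `complete` sends messages to a chat model."""
    parts: List[str] = []
    context_message = ""
    for chunk in split_into_chunks(html, max_chars):
        messages = build_messages(chunk, context_message, sample_input, sample_output)
        markdown = complete(messages)
        parts.append(markdown)
        # Carry only the tail of the latest output forward as context.
        context_message = markdown[-context_chars:] if context_chars > 0 else ""
    return "".join(parts)
```

With the OpenAI Python client (version 1.x), `complete` could be a small wrapper around `client.chat.completions.create(model=..., messages=messages)` that returns `response.choices[0].message.content`. Every chunk resends the system prompt, the sample pair, and the carried-over context, which is a large part of why the bill grows so quickly.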

Watch out! This tomfoolery contraption allows you to burn your OpenAI API credits faster than any other translation method.
