This is a prompt I built before finding out about LangChain. I didn't know how to pass documentation I found on a website as context to the LLM, so my first attempt was to pass it as raw HTML. That didn't work: modern web pages tend to be bloated, and some easily exceeded even the largest context window I could find in any available chat model.
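To put numbers on that, you can count tokens before sending anything. A quick sketch, assuming requests and tiktoken are installed and using a placeholder URL (for reference, GPT-4 Turbo's window is 128K tokens, and older models were far smaller):

import requests
import tiktoken

# Placeholder URL; substitute the docs page you actually want to convert.
html = requests.get("https://example.com/docs/page").text

# cl100k_base is the tokenizer used by the GPT-4-era chat models.
encoding = tiktoken.get_encoding("cl100k_base")
print(f"raw HTML is {len(encoding.encode(html))} tokens")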
Second step? Convert the pages to Markdown through conventional means. I tried html2markdown, markdownify, and pandoc, but I wasn't satisfied with any of the results.
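For reference, the conventional route is roughly the sketch below: a minimal example using markdownify on a stand-in snippet (pandoc's command-line equivalent would be something like pandoc -f html -t gfm page.html).

from markdownify import markdownify as md

# Stand-in for a real page's HTML; real pages carry far more wrapper markup.
html = "<div class='content'><h1>Install</h1><div><p>Run <code>pip install foo</code>.</p></div></div>"
print(md(html))

So I came up with a solution: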
The world's most expensive HTML-to-Markdown utility: hand the conversion off to an LLM, in this case GPT-4 Turbo.
messages=[
    # System instruction: convert only the newest chunk; earlier messages are context.
    {"role": "system", "content": "The user provides HTML content in chunks. Convert only the new chunk to Markdown. The previous messages are provided as context to maintain consistency."},
    # One-shot example: a sample HTML input and its expected Markdown output.
    {"role": "user", "content": "Convert this HTML to Markdown: " + additional_contents['sample-input']},
    {"role": "assistant", "content": additional_contents['sample-output']},
    # The Markdown produced so far, so headings and link styles stay consistent across chunks.
    {"role": "user", "content": "Here's the previous context in Markdown: " + context_message},
    # The new HTML chunk to convert in this call.
    {"role": "user", "content": "Convert this new HTML chunk to Markdown: " + chunk}
],
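Wired into a loop, the approach looks roughly like this. A sketch, not a drop-in implementation: it assumes the openai v1 Python client, a naive fixed-size character split for chunking, and placeholder contents for the one-shot sample pair.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder one-shot sample pair; swap in a real before/after example.
additional_contents = {
    "sample-input": "<h1>Hello</h1><p>world</p>",
    "sample-output": "# Hello\n\nworld",
}

def html_to_markdown(html, chunk_size=8000):
    # Convert one chunk at a time, feeding the accumulated Markdown back in.
    context_message = ""
    for start in range(0, len(html), chunk_size):
        chunk = html[start:start + chunk_size]  # naive split; may cut through a tag
        response = client.chat.completions.create(
            model="gpt-4-turbo",
            messages=[
                {"role": "system", "content": "The user provides HTML content in chunks. Convert only the new chunk to Markdown. The previous messages are provided as context to maintain consistency."},
                {"role": "user", "content": "Convert this HTML to Markdown: " + additional_contents["sample-input"]},
                {"role": "assistant", "content": additional_contents["sample-output"]},
                {"role": "user", "content": "Here's the previous context in Markdown: " + context_message},
                {"role": "user", "content": "Convert this new HTML chunk to Markdown: " + chunk},
            ],
        )
        context_message += response.choices[0].message.content
    return context_message

Feeding the accumulated Markdown back in is what keeps heading levels and link styles consistent across chunks; it also means every call resends everything produced so far.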
Watch out! This contraption of tomfoolery will burn through your OpenAI API credits faster than any other conversion method.
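If you want a rough idea of the damage up front, you can estimate it from token counts. A back-of-the-envelope sketch, assuming the launch-era GPT-4 Turbo pricing of $10 per million input tokens and $30 per million output tokens (check the current pricing page before trusting it), and assuming the Markdown output comes out about as long as the HTML input:

import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

def estimate_cost(html, chunk_size=8000):
    # Rough worst case: each call resends the growing Markdown context.
    input_tokens = output_tokens = context_tokens = 0
    for start in range(0, len(html), chunk_size):
        chunk_tokens = len(encoding.encode(html[start:start + chunk_size]))
        input_tokens += chunk_tokens + context_tokens
        output_tokens += chunk_tokens  # assume output roughly matches chunk size
        context_tokens += chunk_tokens
    # Assumed launch-era GPT-4 Turbo rates: $10 / 1M input, $30 / 1M output.
    return input_tokens / 1e6 * 10 + output_tokens / 1e6 * 30

print(f"about ${estimate_cost('<p>docs</p>' * 20_000):.2f} for this page")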