This is a prompt I built before finding out about LangChain. I didn't know how to pass documentation I found on a website as context to the LLM, so my first attempt was to pass it as raw HTML. That didn't work: modern web pages tend to be bloated, and some easily exceeded even the largest context window I could find in any available chat model.
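To put numbers on that, you can count tokens before sending anything. A quick sketch, assuming requests and tiktoken are installed and using a placeholder URL (for reference, GPT-4 Turbo's window is 128K tokens, and older models were far smaller):

import requests
import tiktoken

# Placeholder URL; substitute the docs page you actually want to convert.
html = requests.get("https://example.com/docs/page").text

# cl100k_base is the tokenizer used by the GPT-4-era chat models.
encoding = tiktoken.get_encoding("cl100k_base")
print(f"raw HTML is {len(encoding.encode(html))} tokens")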
Second step? Convert the pages to Markdown through conventional means. I tried html2markdown, markdownify, and pandoc, but I wasn't satisfied with any of the results.
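For reference, the conventional route is roughly the sketch below: a minimal example using markdownify on a stand-in snippet (pandoc's command-line equivalent would be something like pandoc -f html -t gfm page.html).

from markdownify import markdownify as md

# Stand-in for a real page's HTML; real pages carry far more wrapper markup.
html = "<div class='content'><h1>Install</h1><div><p>Run <code>pip install foo</code>.</p></div></div>"
print(md(html))

So I came up with a solution: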
The world's most expensive HTML-to-Markdown utility: hand the conversion off to an LLM, in this case GPT-4 Turbo.
messages=[
    # System instruction: convert only the newest chunk; earlier messages are context.
    {"role": "system", "content": "The user provides HTML content in chunks. Convert only the new chunk to Markdown. The previous messages are provided as context to maintain consistency."},
    # One-shot example: a sample HTML input and its expected Markdown output.
    {"role": "user", "content": "Convert this HTML to Markdown: " + additional_contents['sample-input']},
    {"role": "assistant", "content": additional_contents['sample-output']},
    # The Markdown produced so far, so headings and link styles stay consistent across chunks.
    {"role": "user", "content": "Here's the previous context in Markdown: " + context_message},
    # The new HTML chunk to convert in this call.
    {"role": "user", "content": "Convert this new HTML chunk to Markdown: " + chunk}
],
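Wired into a loop, the approach looks roughly like this. A sketch, not a drop-in implementation: it assumes the openai v1 Python client, a naive fixed-size character split for chunking, and placeholder contents for the one-shot sample pair.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder one-shot sample pair; swap in a real before/after example.
additional_contents = {
    "sample-input": "<h1>Hello</h1><p>world</p>",
    "sample-output": "# Hello\n\nworld",
}

def html_to_markdown(html, chunk_size=8000):
    # Convert one chunk at a time, feeding the accumulated Markdown back in.
    context_message = ""
    for start in range(0, len(html), chunk_size):
        chunk = html[start:start + chunk_size]  # naive split; may cut through a tag
        response = client.chat.completions.create(
            model="gpt-4-turbo",
            messages=[
                {"role": "system", "content": "The user provides HTML content in chunks. Convert only the new chunk to Markdown. The previous messages are provided as context to maintain consistency."},
                {"role": "user", "content": "Convert this HTML to Markdown: " + additional_contents["sample-input"]},
                {"role": "assistant", "content": additional_contents["sample-output"]},
                {"role": "user", "content": "Here's the previous context in Markdown: " + context_message},
                {"role": "user", "content": "Convert this new HTML chunk to Markdown: " + chunk},
            ],
        )
        context_message += response.choices[0].message.content
    return context_message

Feeding the accumulated Markdown back in is what keeps heading levels and link styles consistent across chunks; it also means every call resends everything produced so far.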
Watch out! This contraption of tomfoolery will burn through your OpenAI API credits faster than any other conversion method.
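If you want a rough idea of the damage up front, you can estimate it from token counts. A back-of-the-envelope sketch, assuming the launch-era GPT-4 Turbo pricing of $10 per million input tokens and $30 per million output tokens (check the current pricing page before trusting it), and assuming the Markdown output comes out about as long as the HTML input:

import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

def estimate_cost(html, chunk_size=8000):
    # Rough worst case: each call resends the growing Markdown context.
    input_tokens = output_tokens = context_tokens = 0
    for start in range(0, len(html), chunk_size):
        chunk_tokens = len(encoding.encode(html[start:start + chunk_size]))
        input_tokens += chunk_tokens + context_tokens
        output_tokens += chunk_tokens  # assume output roughly matches chunk size
        context_tokens += chunk_tokens
    # Assumed launch-era GPT-4 Turbo rates: $10 / 1M input, $30 / 1M output.
    return input_tokens / 1e6 * 10 + output_tokens / 1e6 * 30

print(f"about ${estimate_cost('<p>docs</p>' * 20_000):.2f} for this page")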