Minifying HTML for GPT-4o: Remove all the HTML tags
tl;dr: if you want to pass HTML data to GPT-4o, just strip out all the HTML and pass the raw text. It’s cheaper, and there is little to no performance degradation. Source code and demo available.
To test how much the HTML structure affects extraction quality, I asked GPT-4o two types of questions. On unstructured questions, GPT-4o and GPT-4o mini perform similarly, and the pre-processing makes no difference. Because the price gap is large, I recommend using GPT-4o mini for unstructured questions, with all the HTML removed, to maximize savings.
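For illustration, here is a minimal sketch of the "remove all the HTML" step, using only Python's standard library. This is not necessarily how the linked source code does it; the names `strip_html` and `_TextExtractor` are hypothetical, and dropping the contents of script and style elements is my own assumption about what counts as noise.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags (hypothetical helper)."""

    # Assumption: script/style contents are noise and should not reach the model.
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join text nodes with a space, then collapse whitespace runs.
        # Trade-off: a word split by an inline tag ("wor<b>ld</b>") gains a space.
        return " ".join(" ".join(self._chunks).split())


def strip_html(html: str) -> str:
    """Return the visible text of an HTML string with all tags removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = "<html><body><h1>Title</h1><p>Hello &amp; <b>world</b></p></body></html>"
    print(strip_html(page))
```

The resulting string is what you would send as the prompt content in place of the original markup. Collapsing whitespace also trims tokens that carry no information.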