
Minifying HTML for GPT-4o: Remove all the HTML tags


tl;dr: if you want to pass HTML data to GPT-4o, just strip out all the HTML and pass the raw text. It’s cheaper, and there is little to no performance degradation. Source code and demo available.

To test how much the HTML structure affects extraction quality, I asked GPT-4o two types of questions. For unstructured questions, GPT-4o and GPT-4o mini perform similarly, and the pre-processing makes no difference. Because the price gap between the two models is large, I recommend using GPT-4o mini with all the HTML removed for unstructured questions to maximize savings.
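As an illustration of the "strip out all the HTML" step, here is a minimal sketch that uses only Python's standard-library `html.parser`. It is not the author's code: the names `TextExtractor` and `strip_html` are hypothetical, and skipping `<script>` and `<style>` contents is an assumption about what counts as "raw text".

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and skips <script>/<style> contents (an assumption)."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def strip_html(html: str) -> str:
    """Return the text of an HTML string with all tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so the resulting prompt stays compact.
    return " ".join(" ".join(parser.chunks).split())


if __name__ == "__main__":
    page = "<html><body><h1>Title</h1><p>Hello &amp; <b>world</b></p></body></html>"
    print(strip_html(page))
```

The returned string can then be sent to the model in place of the original markup. Note that joining text nodes with a space will split a word if a tag falls in the middle of it, which may or may not matter for a given page.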



