GPT-OSS vs. Qwen3 and a detailed look how things evolved since GPT-2
And How They Stack Up Against Qwen3
This grouping reduces the total number of key and value projections, which lowers memory usage and improves efficiency, and according to ablation studies it does so without noticeably degrading modeling performance.

Concretely, the gpt-oss models accept a "Reasoning effort: low/medium/high" instruction as part of their system prompt, which directly affects response length and accuracy, as shown in Figure 21.

Tool integration in open-source LLMs is still in its early stages, but as it matures, I expect we will increasingly let models consult external sources (such as search engines) when answering factual or knowledge-based queries.
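The grouping idea above (grouped-query attention) can be sketched in a few lines of NumPy: several query heads share one key/value head, so the K/V projection matrices (and the KV cache) shrink by the grouping factor. This is a minimal single-sequence sketch for illustration; the function and weight names are hypothetical, and real implementations add batching, masking, and learned per-head layouts.

```python
import numpy as np

def grouped_query_attention(x, Wq, Wk, Wv, n_heads, n_kv_heads):
    """Toy grouped-query attention: n_kv_heads < n_heads.

    x:  (seq, d_model) input activations
    Wq: (d_model, d_model) query projection (full head count)
    Wk, Wv: (d_model, n_kv_heads * head_dim) smaller K/V projections
    """
    seq, d_model = x.shape
    head_dim = d_model // n_heads
    group = n_heads // n_kv_heads  # query heads sharing one K/V head

    q = (x @ Wq).reshape(seq, n_heads, head_dim)
    k = (x @ Wk).reshape(seq, n_kv_heads, head_dim)
    v = (x @ Wv).reshape(seq, n_kv_heads, head_dim)

    # Broadcast each K/V head to its group of query heads
    k = np.repeat(k, group, axis=1)  # (seq, n_heads, head_dim)
    v = np.repeat(v, group, axis=1)

    # Scaled dot-product attention per head, then softmax over keys
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(head_dim)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    out = np.einsum("hqk,khd->qhd", weights, v)
    return out.reshape(seq, d_model)
```

With `n_heads=4` and `n_kv_heads=2`, the K and V projections (and the cached K/V tensors during generation) are half the size of standard multi-head attention, which is where the memory savings come from.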
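The reasoning-effort instruction mentioned above is just text placed in the system prompt. A minimal sketch, assuming a standard chat-message list; the helper name and exact prompt wording here are illustrative (taken from the article's "Reasoning effort: low/medium/high" phrasing), not a verified API:

```python
# Hypothetical helper: inject a reasoning-effort setting into the
# system prompt of a chat-style message list for a gpt-oss model.
def build_messages(user_query: str, effort: str = "medium") -> list[dict]:
    if effort not in {"low", "medium", "high"}:
        raise ValueError(f"unknown reasoning effort: {effort!r}")
    return [
        {"role": "system", "content": f"Reasoning effort: {effort}"},
        {"role": "user", "content": user_query},
    ]
```

Raising the effort level trades longer (and slower) responses for higher accuracy, per Figure 21.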