Get the latest tech news
Can modern LLMs count the number of b's in "blueberry"?
It’s an adversarial question for LLMs, but it’s not unfair.
Bluesky — whose community is skeptical at-best of generative AI in all its forms — began putting the model through its paces: Michael Paulauski asked GPT-5 through the ChatGPT app interface “how many b’s are there in blueberry?”. Additionally, it’s been a year since the strawberry test and hundreds of millions of dollars have been invested into improving RLHF regimens and creating more annotated training data: it’s hard for me to believe that modern LLMs have made zero progress on these types of trivial tasks. In order to ensure the results are most representative of what a normal user would encounter when querying these LLMs, I will not add any generation parameters besides the original question: no prompt engineering and no temperature adjustments.
Or read this on Hacker News