Is Gemini 2.5 good at bounding boxes?
Can Gemini 2.5 replace CNN for object detection?
MS-COCO is a classic in the object detection world. Sure, it's a bit dated and its masks and bounding boxes aren't super tight, but it has a long history, so it should be easy to place Gemini among the historical results. Unfortunately, including the mask field caused the model to spiral into infinite loops, spewing out meaningless tokens and burning my budget. It happened maybe 5% of the time, but that was enough that I didn't complete the testing. CNNs remain faster, cheaper, and easier to reason about, especially with good training data, but Gemini's versatility across open-set tasks feels almost magical.
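To make "placing Gemini among the historical results" a little more concrete, here is a minimal Python sketch of the box-level comparison that a COCO-style evaluation rests on. It is not the author's code. COCO ground-truth boxes are stored as `[x, y, width, height]` in pixels. The sketch assumes the model returns boxes as `[ymin, xmin, ymax, xmax]` on a 0 to 1000 grid, which the article does not state, and the function names `model_box_to_coco` and `iou_xywh` are hypothetical.

```python
def model_box_to_coco(box_2d, img_w, img_h, scale=1000.0):
    """Convert an ASSUMED [ymin, xmin, ymax, xmax] box on a 0..scale grid
    into a COCO-style [x, y, width, height] box in pixels."""
    ymin, xmin, ymax, xmax = box_2d
    x = xmin / scale * img_w
    y = ymin / scale * img_h
    w = (xmax - xmin) / scale * img_w
    h = (ymax - ymin) / scale * img_h
    return [x, y, w, h]


def iou_xywh(a, b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap along each axis, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = model_box_to_coco([100, 200, 500, 600], img_w=640, img_h=480)
    ground_truth = [130.0, 50.0, 250.0, 190.0]  # made-up annotation for illustration
    print(predicted, iou_xywh(predicted, ground_truth))
```

A full COCO score averages precision over several IoU thresholds and over all categories, so this covers only the innermost step. For real numbers, an established evaluator such as pycocotools is the safer choice.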