Moondream 3 Preview: Frontier-level reasoning at a blazing speed
Think deeper, detect smarter, run just as fast.
Sorting produce, detecting missing herd animals from a drone, recognizing security incidents: none of these can be built without fast vision inference.

We do not use a separate context-length extension phase during training; instead, we interleave long-context samples into pretraining, which otherwise runs at a default context length of 4096 tokens. The model was trained with load-balancing and router-orthogonality losses to help similar tokens specialize together early on, and load balancing was then disabled in post-training to avoid catastrophic forgetting from the distribution shift.
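To make the interleaving concrete, here is a minimal sketch of a pretraining data stream that mixes occasional long-context samples into a 4096-token default stream instead of running a separate extension phase. The sample sources, the 32k long-context length, and the mix rate are illustrative assumptions, not the actual pipeline.

```python
# Sketch: interleave long-context samples into the default pretraining stream
# rather than adding a separate context-extension phase.
# Sample sources, LONG_CTX, and LONG_SAMPLE_RATE are illustrative assumptions.
import random
from typing import Iterator, List

DEFAULT_CTX = 4096        # default pretraining context length
LONG_CTX = 32768          # assumed length for interleaved long-context samples
LONG_SAMPLE_RATE = 0.02   # assumed fraction of samples drawn from the long pool


def short_samples() -> Iterator[List[int]]:
    """Stand-in for the regular stream, packed to DEFAULT_CTX tokens."""
    while True:
        yield [random.randrange(50_000) for _ in range(DEFAULT_CTX)]


def long_samples() -> Iterator[List[int]]:
    """Stand-in for a pool of long documents packed to LONG_CTX tokens."""
    while True:
        yield [random.randrange(50_000) for _ in range(LONG_CTX)]


def interleaved_stream(seed: int = 0) -> Iterator[List[int]]:
    """Mix long-context samples into the default stream at a fixed rate."""
    rng = random.Random(seed)
    short, long = short_samples(), long_samples()
    while True:
        yield next(long) if rng.random() < LONG_SAMPLE_RATE else next(short)


if __name__ == "__main__":
    stream = interleaved_stream()
    print([len(next(stream)) for _ in range(10)])  # mostly 4096, occasionally 32768
```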
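And here is a hedged sketch of what those two auxiliary router losses can look like: a Switch-style load-balancing term plus a router-orthogonality penalty, with a flag that turns load balancing off for post-training. The layer sizes, loss weighting, and the `balance` flag are assumptions for illustration, not Moondream's implementation.

```python
# Sketch: MoE router with load-balancing and router-orthogonality aux losses.
# Dimensions, expert count, and the `balance` switch are illustrative assumptions.
import torch
import torch.nn.functional as F
from torch import nn


class SketchRouter(nn.Module):
    def __init__(self, d_model: int = 512, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor, balance: bool = True):
        """x: [tokens, d_model]. Returns routing weights, expert indices, aux loss."""
        logits = self.gate(x)                          # [tokens, n_experts]
        probs = logits.softmax(dim=-1)
        weights, indices = probs.topk(self.top_k, dim=-1)

        aux = x.new_zeros(())
        if balance:
            # Switch-style load balancing: fraction of tokens dispatched to each
            # expert (top-1 choice, for simplicity) times its mean routing prob.
            n_experts = probs.size(-1)
            dispatch = F.one_hot(indices[:, 0], n_experts).float().mean(0)
            importance = probs.mean(0)
            aux = aux + n_experts * (dispatch * importance).sum()

        # Router orthogonality: push expert gate vectors apart so similar tokens
        # concentrate on the same experts.
        w = F.normalize(self.gate.weight, dim=-1)      # [n_experts, d_model]
        gram = w @ w.t()
        off_diag = gram - torch.eye(gram.size(0))
        aux = aux + off_diag.pow(2).mean()

        return weights, indices, aux


if __name__ == "__main__":
    router = SketchRouter()
    tokens = torch.randn(16, 512)
    _, _, aux_pre = router(tokens, balance=True)    # pretraining: both losses
    _, _, aux_post = router(tokens, balance=False)  # post-training: balancing off
    print(float(aux_pre), float(aux_post))
```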