Show HN: Wordllama – Things you can do with the token embeddings of an LLM
Things you can do with the token embeddings of an LLM - dleemiller/WordLlama
WordLlama is a fast, lightweight NLP toolkit for tasks such as fuzzy deduplication, similarity, and ranking. It has minimal inference-time dependencies and is optimized for CPU hardware.
Low resource requirements: embedding a text is a simple token lookup followed by average pooling, which lets it run fast on a CPU.
Binarization: models trained with the straight-through estimator can be packed into small integer arrays, enabling even faster Hamming-distance calculations.
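To make the two mechanisms above concrete, here is a minimal, stdlib-only Python sketch of "token lookup plus average pooling" and of comparing sign-binarized embeddings by Hamming distance. Everything in it is hypothetical: the toy `VOCAB`, the whitespace `tokenize`, the random `EMBEDDINGS` table, and the helper names `embed`, `cosine`, `binarize`, and `hamming` are for illustration only and are not WordLlama's API. Binarization here is plain sign thresholding at inference time. The straight-through-estimator training that makes such binary codes accurate is not shown, and a single Python int stands in for the packed small-integer arrays.

```python
import math
import random

# Hypothetical toy setup: a tiny vocabulary and a random embedding table.
# WordLlama's real tokenizer and trained embeddings are NOT reproduced here.
DIM = 64
VOCAB = {w: i for i, w in enumerate(
    "the cat sat on mat a dog lay rug fast cpu".split())}
rng = random.Random(0)
EMBEDDINGS = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in VOCAB]


def tokenize(text):
    """Hypothetical tokenizer: lowercase whitespace split, unknown words dropped."""
    return [VOCAB[w] for w in text.lower().split() if w in VOCAB]


def embed(text):
    """Look up each token's embedding row, then average-pool over the sequence."""
    ids = tokenize(text)
    if not ids:
        return [0.0] * DIM
    pooled = [0.0] * DIM
    for i in ids:
        row = EMBEDDINGS[i]
        for d in range(DIM):
            pooled[d] += row[d]
    return [x / len(ids) for x in pooled]


def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def binarize(vec):
    """Pack the sign of each dimension into one integer (bit d set if vec[d] > 0)."""
    bits = 0
    for d, x in enumerate(vec):
        if x > 0:
            bits |= 1 << d
    return bits


def hamming(a, b):
    """Count differing bits between two packed codes via XOR plus popcount."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "a dog lay on the rug", "fast cpu"]
    q = embed("the cat sat on a mat")
    qb = binarize(q)
    for doc in docs:
        e = embed(doc)
        print(f"{doc!r}: cosine={cosine(q, e):.3f} "
              f"hamming={hamming(qb, binarize(e))}")
```

The design choice behind binarization: for codes in {-1, +1}^D, the dot product equals D - 2 * (Hamming distance), so ranking by Hamming distance matches ranking by the dot product of the binarized vectors, while XOR plus popcount on packed integers is far cheaper than floating-point arithmetic.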