
Show HN: Mutable.ai Codebase chat that uses a Wiki for RAG


Dive into high-quality, AI-generated documentation for your repositories, powered by Mutable.ai Auto Wiki.

The Flash Attention implementation leverages techniques like split-KV, paged KV cache, and rotary positional embeddings to achieve high efficiency on modern CUDA-enabled GPUs, particularly the Ampere and Hopper architectures.

The flash-attention library provides custom implementations of various activation functions, including the Gaussian Error Linear Unit (GELU), ReLU, and SwiGLU, along with their corresponding backward passes.

The library's utility functions focus on remapping state dictionaries and converting configurations between different pre-trained language model formats, such as BERT, GPT, GPT-NeoX, GPT-J, LLaMA, OPT, and Falcon.
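For readers unfamiliar with SwiGLU, here is a minimal reference sketch in plain PyTorch. It is illustrative only: the `SwiGLU` module name and layer sizes are made up for this example, and it is not the fused CUDA kernel the flash-attention library actually ships.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLU(nn.Module):
    """Reference SwiGLU feed-forward block: silu(x W1) * (x W2), then a projection.

    Plain-PyTorch sketch for illustration; flash-attention provides a fused
    implementation of the same math, including its backward pass.
    """

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate branch
        self.w2 = nn.Linear(dim, hidden_dim, bias=False)  # value branch
        self.w3 = nn.Linear(hidden_dim, dim, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU (swish) gating of one branch, multiplied elementwise by the other.
        return self.w3(F.silu(self.w1(x)) * self.w2(x))


if __name__ == "__main__":
    x = torch.randn(2, 16, 512)            # (batch, seq_len, dim)
    ffn = SwiGLU(dim=512, hidden_dim=1376)  # hypothetical sizes
    print(ffn(x).shape)                     # torch.Size([2, 16, 512])
```

Autograd handles the backward pass in this sketch; the point of the library's hand-written backward kernels is to fuse these elementwise ops and save memory traffic.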

Read more on:

RAG

wiki

Related news:

How LlamaIndex is ushering in the future of RAG for enterprises

Writer drops mind-blowing AI update: RAG on steroids, 10M word capacity, and AI ‘thought process’ revealed

HuggingFace releases support for tool-use and RAG models