Speculative Sampling is the technique of running a small draft model alongside a large target model, so the large model's output is produced faster and at a reduced cost. https://buff.ly/3WAGKNq Author: https://buff.ly/3JVGHEn
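For anyone curious about the mechanics, here is a minimal sketch of the accept/reject loop speculative sampling is built on. The `draft_probs` and `target_probs` callables are hypothetical stand-ins for the small and large models; this is an illustration of the idea, not the article's code.

```python
# Minimal sketch of speculative sampling: a small draft model proposes k tokens,
# the large target model verifies them, and rejected tokens are resampled so the
# final output still follows the target model's distribution.
import numpy as np

def speculative_step(prefix, draft_probs, target_probs, k=4, rng=None):
    """One round: draft k tokens cheaply, then accept/reject against the target."""
    if rng is None:
        rng = np.random.default_rng()

    # 1) Draft model proposes k tokens autoregressively (cheap).
    drafted, q, ctx = [], [], list(prefix)
    for _ in range(k):
        dist = draft_probs(ctx)                  # next-token distribution (sums to 1)
        tok = rng.choice(len(dist), p=dist)
        drafted.append(tok)
        q.append(dist)
        ctx.append(tok)

    # 2) Target model scores the drafted positions (one parallel pass in practice;
    #    emulated with a loop here for clarity).
    accepted = []
    for i, tok in enumerate(drafted):
        p = target_probs(list(prefix) + accepted)
        # Accept with probability min(1, p(tok) / q(tok)).
        if rng.random() < min(1.0, p[tok] / max(q[i][tok], 1e-12)):
            accepted.append(tok)
        else:
            # On rejection, resample from the residual distribution max(p - q, 0).
            residual = np.maximum(p - q[i], 0.0)
            residual /= residual.sum()
            accepted.append(rng.choice(len(residual), p=residual))
            break
    # (The full algorithm also samples one bonus token from the target when all
    #  k drafts are accepted; omitted here for brevity.)
    return accepted
```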
Machine Learning - IAEE’s Post
More Relevant Posts
-
It's weird that LLMs immediately forget old conversations, or even parts of a conversation that are simply too far back, once the context window runs out. Yes, there are some huge context windows out there. But still, computers can easily store, word-for-word, the exact history of a practically unlimited conversation. It feels like LLMs ought to remember things they heard years ago, just as a person (with a good memory) might.

As a fun thought experiment: I type about 90 words per minute. If I typed non-stop for 100 years (no lazy things like sleeping or eating allowed), I'd produce less than 30 gigs of data. In other words, the entire lifetime text output of a single human can easily fit in the memory of a modern laptop.

Are researchers just going to sit around and let LLMs have poor memories??? Obviously not, otherwise I wouldn't be writing this. The progress to report here is a new paper from Google about an idea called Infini-attention. I have to give the authors credit for not flippantly using the word "infini": this work really does provide a modification to attention that, in theory, has no time limit on what it can remember.

Well ... let's get somewhat technical. The "infini" memory of the model is a matrix that can effectively capture many key-value data points from all past conversation history. One very cool fact about high-dimensional vectors is that most of them (chosen at random) are nearly orthogonal to each other, which means that, in some sense, you can pack more information into a matrix than its size alone would suggest. Having said that, the rank of the matrix (bounded by the smaller of its number of rows or columns) is still an upper bound on the truly independent directions of information a single matrix can hold. The capacity is _not_ infinite, so it's inevitable that such a matrix must gradually forget things over time. In a way, if this is a model for the human brain, it may help us understand why most people tend to forget, or experience a fading away of memories, as time passes.

This week's Learn & Burn summary explains the clever way the authors incrementally update the memory matrix and incorporate non-linearity into the key-value lookups: https://lnkd.in/g-Gjr5aP
LLMs that never forget
learnandburn.ai
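To make the idea concrete, here is a toy sketch of this kind of incrementally updated memory matrix, written in the linear-attention style the paper builds on. The class name and the numpy framing are mine, and the ELU+1 non-linearity and the accumulation of key-value outer products follow my reading of the paper; treat it as an illustration, not the authors' implementation.

```python
# Toy sketch of a fixed-size "compressive" memory: fold each segment's keys and
# values into a single matrix, then answer queries against everything stored so far.
import numpy as np

def elu1(x):
    # ELU + 1 keeps the transformed keys/queries positive (the non-linearity
    # used for the key-value lookups).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0)))

class CompressiveMemory:
    def __init__(self, d_key, d_value):
        self.M = np.zeros((d_key, d_value))   # the fixed-size memory matrix
        self.z = np.zeros(d_key)              # running normalisation term

    def update(self, K, V):
        """Incrementally fold a new segment's keys (n, d_key) and values (n, d_value) in."""
        sK = elu1(K)
        self.M += sK.T @ V                    # rank-limited accumulation of key-value pairs
        self.z += sK.sum(axis=0)

    def retrieve(self, Q):
        """Look up values for queries (m, d_key) against the accumulated memory."""
        sQ = elu1(Q)
        return (sQ @ self.M) / (sQ @ self.z + 1e-6)[:, None]
```

Because `M` has fixed size, its rank bounds how many independent directions it can store, which is exactly the gradual-forgetting behaviour described above.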
-
CLIP bridges text and image, but literally nobody used it for text retrieval—𝙪𝙣𝙩𝙞𝙡 𝙣𝙤𝙬. We're excited to introduce 𝐉𝐢𝐧𝐚 𝐂𝐋𝐈𝐏: a CLIP-like model that's great at text-text, text-image, image-text, and image-image retrieval. From now on, your Jina CLIP model 𝐢𝐬 𝐚𝐥𝐬𝐨 your text retriever. No need to switch between different embedding models when building MuRAG (Multimodal RAG) - one model, two modalities, four search directions. Not to mention it also handles an 8K context length. So how did we do it? Read more: https://lnkd.in/e8mYHNYJ
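To illustrate what "one model, four search directions" means in practice, here is a small sketch. The `embed_text` and `embed_image` functions below are random stand-ins for the model's two encoders (not the actual Jina API); the point is that every retrieval direction reduces to the same cosine-similarity search in one shared embedding space.

```python
# Sketch: one shared embedding space, four retrieval directions.
import numpy as np

rng = np.random.default_rng(0)

def embed_text(texts, dim=64):
    # Stand-in encoder: in practice this would be the multimodal model's text tower.
    return rng.normal(size=(len(texts), dim))

def embed_image(paths, dim=64):
    # Stand-in encoder: in practice this would be the model's image tower.
    return rng.normal(size=(len(paths), dim))

def cosine_sim(a, b):
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

doc_vecs = embed_text(["a long document", "another document"])   # text corpus
img_vecs = embed_image(["cat.jpg", "dog.jpg"])                   # image corpus

query_t = embed_text(["query about cats"])
query_i = embed_image(["query.jpg"])

text_to_text   = cosine_sim(query_t, doc_vecs)
text_to_image  = cosine_sim(query_t, img_vecs)
image_to_text  = cosine_sim(query_i, doc_vecs)
image_to_image = cosine_sim(query_i, img_vecs)
```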
-
Generating cat videos is fun and all, but this research is important: https://lnkd.in/gyyRwVSa
V-JEPA trains a visual encoder by predicting masked spatio-temporal regions in a learned latent space.
ai.meta.com
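As a rough illustration of that objective (not Meta's code), here is a toy sketch: encode the visible patches, encode the masked patches with a separate target encoder, and regress the predicted latents against the target latents. Module sizes, the crude pooling, and the name `predictor` are all illustrative.

```python
# Toy sketch of a JEPA-style objective: predict the *latent* representation of
# masked spatio-temporal patches, rather than reconstructing their pixels.
import torch
import torch.nn as nn

d = 128                                              # latent width (illustrative)
encoder        = nn.Linear(3 * 16 * 16 * 2, d)       # context encoder over patch "tubes"
target_encoder = nn.Linear(3 * 16 * 16 * 2, d)       # EMA copy in the real recipe; frozen here
predictor      = nn.Linear(d, d)

patches = torch.randn(64, 3 * 16 * 16 * 2)           # 64 flattened spatio-temporal patches
mask = torch.rand(64) < 0.5                          # which patches are hidden

with torch.no_grad():
    targets = target_encoder(patches[mask])          # latents the model must predict

context = encoder(patches[~mask]).mean(0, keepdim=True)  # crude pooling of visible context
pred = predictor(context).expand_as(targets)

loss = nn.functional.l1_loss(pred, targets)          # regress predicted vs. target latents
loss.backward()
```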
-
Jens Fiederer Thanks for the link. I remember you talking about his method, and perhaps even reading his paper. Perhaps stop by tomorrow's CoSy TuesZoom, https://lnkd.in/g9tBW-X5 , and describe it in more detail.
Learn about the only language which spans from the chip in Forth to array mathematics (e.g., accounting, AI, voxel modeling), simplified from APL/K.
cosy.com
-
Support your answers with course material concepts, principles, and theories fro
https://professionalwriters.blog
-
Here's a beautiful interactive map referencing all the cognitive biases that affect your thinking. Buster Benson's codex: https://lnkd.in/g7TTTjUq
-
In this episode, we discuss Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can't Answer? by Nishant Balepur, Feng Gu, Abhilasha Ravichander, Shi Feng, Jordan Boyd-Graber, and Rachel Rudinger. The paper investigates the reverse question answering (RQA) task, in which a question is generated from a given answer, and examines how 16 large language models (LLMs) perform on it compared to traditional question answering (QA). The study reveals that LLMs are less accurate at RQA for numerical answers than for textual ones, and that they can often correctly answer the very questions they generated incorrectly, indicating the errors are not due solely to knowledge gaps. The findings also show that RQA errors correlate with question difficulty and are inversely related to how frequently the answer appears in the data corpus, highlight the challenge of generating valid multi-hop questions, and suggest areas for improving LLM reasoning in RQA.
Arxiv Paper - Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can’t Answer?
podbean.com
-
The Llama 3.1 paper is one of the most open and transparent model reports so far, covering essentially every aspect of building the new model; highly recommended reading.
The Llama 3 Herd of Models
ai.meta.com
-
For a clear and accessible introduction to LLM fine-tuning with Low Rank Adaptation (LoRA), don't miss Matthew Gunton's latest paper walkthrough.
Understanding Low Rank Adaptation (LoRA) in Fine Tuning LLMs
towardsdatascience.com
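As a companion to the walkthrough, here is a minimal sketch of the core LoRA trick: freeze the pretrained weight W and train only a low-rank update B·A, so the effective weight becomes W + (alpha/r)·B·A. The class name and hyperparameters below are illustrative, not taken from the article.

```python
# Minimal LoRA-style wrapper around an existing nn.Linear layer.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)       # frozen pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Low-rank factors: only these r*(d_in + d_out) parameters are trained.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        # Frozen path plus scaled low-rank update.
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

# Usage: wrap an existing projection layer and fine-tune only A and B.
layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(4, 512))
```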