This is the way to unlock the next trillion high-quality tokens, currently frozen in textbook pixels that are not LLM-ready.

Nougat: an open-source OCR model that accurately scans books with heavy math/scientific notations. It's ages ahead of other open OCR options. Meta is doing extraordinary open-source AI, sometimes without as much fanfare as Llama.

My first serious AI research project (back at Columbia, 2012) was to convert chemical engineering PDFs into an NLP-ready corpus. I still remember the immense pain of Tesseract, a much older OCR system (https://lnkd.in/gq4G47hx). Now Nougat runs a powerful Swin Transformer backbone and blows the benchmarks out of the water. We're talking about double-digit improvements across all metrics.

Now, textbooks are all we need for the next GPT!

Website: https://lnkd.in/gtCF2QZ4
Open-source code: https://lnkd.in/ggTgzX3N
Paper "Nougat: Neural Optical Understanding for Academic Documents": https://lnkd.in/gBXZDyyk
the first equation is already off 👀
Damn - I admit it. I am definitely getting future shock. One thing is for sure - neuromorphic/ANN/connectionist computing will be deeply interconnected, with each system interacting with others to feed them or receive from them. What we have no clue about is how this will form neuromorphic computing "societies" and what it will do to knowledge as a social construct. Impressed by individual accomplishments? Think about the collective ones. The shock just keeps getting worse. Prediction: Waiting for AGI is like Waiting for Godot - pointless. The real deal will be the time when human talents, individual and societal, are collectively exceeded on all scales by the collective actions of neuromorphic computing, without any of us being able to tell for sure when it happens. Regulate that, EU.
Accurate? In the very first formula at the top of the page, the n has turned into a w on the right-hand side, and into an asterisk (*) in the exponent. And then feed countless typos into the next ChatGPT? Kudos to the original authors for not messing up the chain rule.
This is great! But this is very expensive computationally. For example, running this with GPUs on 1M PDFs will cost $80k.
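For scale, the figures quoted in this comment work out to roughly $0.08 per PDF. The short sketch below just redoes that arithmetic and, under an assumed GPU price, backs out how much GPU time per PDF the estimate would imply. The hourly rate is a hypothetical placeholder, not a number from the thread, and the function names are made up for illustration.

```python
def cost_per_pdf(total_cost_usd: float, num_pdfs: int) -> float:
    """Average cost of processing one PDF, in US dollars."""
    if num_pdfs <= 0:
        raise ValueError("num_pdfs must be positive")
    return total_cost_usd / num_pdfs


def implied_gpu_seconds_per_pdf(per_pdf_usd: float, gpu_usd_per_hour: float) -> float:
    """GPU seconds per PDF implied by a per-PDF cost and an hourly GPU price."""
    if gpu_usd_per_hour <= 0:
        raise ValueError("gpu_usd_per_hour must be positive")
    return per_pdf_usd / gpu_usd_per_hour * 3600


# Figures quoted in the comment.
QUOTED_TOTAL_USD = 80_000
QUOTED_NUM_PDFS = 1_000_000

# ASSUMPTION: hypothetical cloud GPU price, not taken from the thread.
ASSUMED_GPU_USD_PER_HOUR = 2.0

per_pdf = cost_per_pdf(QUOTED_TOTAL_USD, QUOTED_NUM_PDFS)
seconds = implied_gpu_seconds_per_pdf(per_pdf, ASSUMED_GPU_USD_PER_HOUR)
print(f"{per_pdf:.2f} USD per PDF, about {seconds:.0f} GPU-seconds per PDF")
```

Changing the assumed hourly rate changes the implied GPU time proportionally, so the per-PDF cost is the only figure here that follows directly from the comment.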
Just tried it, this is so much better and more powerful than Pytesseract. It handles equations far better and isn't bothered by noise or low brightness. This is simply amazing.
Tim Wedde, for AIDA?
Next-gen OCR to unlock insights hidden in agricultural grey literature and paper documents
I wonder how well LLMs handle changes in language. As you go further back in time, words had different meanings. While the underlying structure of language is static (because our brains haven't changed for hundreds of thousands of years), the words themselves have changed. Has there been any research into whether the higher-layer latent space shifts or stays static?
I too have had the misfortune of using Tesseract! Thank you for sharing about Nougat. I was thinking about going for a PhD at Stanford, would you recommend it if I want to go further into AI?
Did you buy the texts? Why do you feel entitled to appropriate the content, then crow about your method?