Nougat: an open-source OCR model for books | Jim Fan posted on the topic

Jim Fan

NVIDIA Senior Research Manager & Lead of Embodied AI (GEAR Group). Stanford Ph.D. Building Humanoid robot and gaming foundation models. OpenAI's first intern. Sharing insights on the bleeding edge of AI.

This is the way to unlock the next trillion high-quality tokens, currently frozen in textbook pixels that are not LLM-ready.

Nougat: an open-source OCR model that accurately scans books with heavy math/scientific notations. It's ages ahead of other open OCR options. Meta is doing extraordinary open-source AI, sometimes without as much fanfare as Llama.

My first serious AI research project (back at Columbia, 2012) was to convert chemical engineering PDFs into an NLP-ready corpus. I still remember the immense pain of Tesseract, a much older OCR system (https://lnkd.in/gq4G47hx). Now Nougat runs a powerful Swin Transformer backbone and blows the benchmarks out of the water. We're talking about double-digit improvements across all metrics.

Now, textbooks are all we need for the next GPT!

Website: https://lnkd.in/gtCF2QZ4
Open-source code: https://lnkd.in/ggTgzX3N
Paper "Nougat: Neural Optical Understanding for Academic Documents": https://lnkd.in/gBXZDyyk

236 Comments

Garrick Villaume

Sustaining Innovation: People, Purpose, Process

Did you buy the texts? Why do you feel entitled to appropriate the content, then crow about your method?

20 Reactions

Joshua Hemmerich

Co-Founder at Lessmore

the first equation is already off 👀

7 Reactions

Robert DuWors

Digital Substrate Architect with insight from being there (Retired)

Damn - I admit it. I am definitely getting future shock. One thing is for sure - neuromorphic/ANN/connectionist computing will be deeply interconnected, with each system feeding or receiving from others. What we have no clue about is how this will form neuromorphic computing "societies" and what it will do to knowledge as a social construct. Impressed by individual accomplishments? Think about the collective ones. The shock just keeps getting worse. Prediction: waiting for AGI is like Waiting for Godot - pointless. The real deal will be the time when human talents, individual and societal, are collectively exceeded on all scales by the collective actions of neuromorphic computing, without any of us able to tell for sure when it happens. Regulate that, EU.

13 Reactions

Juhani Koivisto

Senior Risk Expert, PhD

Accurate? In the very first formula at the top of the page, the n has turned into a w on the right-hand side, and into an asterisk (*) in the exponent. And then feed countless typos into the next ChatGPT? Kudos to the original authors for not messing up the chain rule.

29 Reactions

Shaurya Rohatgi

LLM and IR Researcher | ex. Ai2 @UChicago @PennState

This is great, but it is very expensive computationally. For example, running it with GPUs on 1M PDFs will cost $80k.

7 Reactions

Yanis Labeyrie

Data Scientist | Ingénieur Centralien | Master IAAA

Just tried it, and this is so much better and more powerful than Pytesseract. It handles equations far better and isn't bothered by noise or low brightness. This is simply amazing.

5 Reactions

Roy Lenders

Entrepreneur, Artificial Intelligence, Quant Trading, eCommerce, Supply Chain

Tim Wedde, for AIDA?

2 Reactions

Neil Hausmann

Senior Program Officer at Bill & Melinda Gates Foundation

Next-gen OCR to unlock insights hidden in agricultural grey literature and paper documents

5 Reactions

Maxim Khailo

Technical Trouble Maker

I wonder how well LLMs handle changes in language. As you go further back in time, words had different meanings. While the underlying structure of language is static (because our brains haven't changed for hundreds of thousands of years), the words themselves have changed. Has there been any research into whether the higher-layer latent space shifts or stays static?

3 Reactions

Tyler Suard

Senior AI Researcher & Developer at Parker-Hannifin. Ex-Apple, Ex-Meta. Contributor to Autogen, Tensorflow, PyTorch, Huggingface Transformers. Stanford affiliate. Interested in longevity, AI +Bio.

I too have had the misfortune of using Tesseract! Thank you for sharing about Nougat. I was thinking about going for a PhD at Stanford, would you recommend it if I want to go further into AI?

4 Reactions
