Jim Fan’s Post

Jim Fan

NVIDIA Senior Research Manager & Lead of Embodied AI (GEAR Group). Stanford Ph.D. Building Humanoid robot and gaming foundation models. OpenAI's first intern. Sharing insights on the bleeding edge of AI.

This is the way to unlock the next trillion high-quality tokens, currently frozen in textbook pixels that are not LLM-ready.

Nougat: an open-source OCR model that accurately scans books with heavy math/scientific notation. It's ages ahead of other open OCR options. Meta is doing extraordinary open-source AI, sometimes without as much fanfare as Llama.

My first serious AI research project (back at Columbia, 2012) was to convert chemical engineering PDFs into an NLP-ready corpus. I still remember the immense pain of Tesseract, a much older OCR system (https://lnkd.in/gq4G47hx). Now Nougat runs a powerful Swin Transformer backbone and blows the benchmarks out of the water. We're talking about double-digit improvements across all metrics.

Now, textbooks are all we need for the next GPT!

Website: https://lnkd.in/gtCF2QZ4
Open-source code: https://lnkd.in/ggTgzX3N
Paper "Nougat: Neural Optical Understanding for Academic Documents": https://lnkd.in/gBXZDyyk

Garrick Villaume

Sustaining Innovation: People, Purpose, Process

1y

Did you buy the texts? Why do you feel entitled to appropriate the content, then crow about your method?

the first equation is already off 👀

Robert DuWors

Digital Substrate Architect with insight from being there (Retired)

1y

Damn - I admit it. I am definitely getting future shock. One thing for sure - neuromorphic/ANN/connectionist computing will be deeply interconnected, with each system interacting with others to feed them or receive from them. What we have no clue about is how this will form neuromorphic computing "societies" and what it will do to knowledge as a social construct. Impressed by individual accomplishments - think about the collective ones. The shock just keeps getting worse. Prediction: Waiting for AGI is like Waiting for Godot - pointless. The real deal will be the time when human talents, individual and societal, are collectively exceeded on all scales by the collective actions of neuromorphic computing, without any of us being able to tell for sure when it happens. Regulate that, EU.

Juhani Koivisto

Senior Risk Expert, PhD

1y

Accurate? In the very first formula at the top of the page, the n has turned into a w on the right-hand side, and in the exponent into an asterisk (*). And then feed countless typos into the next ChatGPT? Kudos to the original authors for not messing up the chain rule.

Shaurya Rohatgi

LLM and IR Researcher | ex. Ai2 @UChicago @PennState

1y

This is great! But this is very expensive computationally. For example, to run this with GPUs on 1M PDFs will cost $80k.
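
As a rough back-of-the-envelope check on that figure, the sketch below derives the per-PDF cost implied by the comment and scales it to other corpus sizes. Only the $80k total and the 1M PDF count come from the comment; the function names are hypothetical, and the linear scaling is an assumption, since the comment does not give GPU type, throughput, or pages per PDF.

```python
# Back-of-the-envelope cost arithmetic for the figure quoted in the comment.
# Only the two constants below come from the comment; everything else is
# an illustrative assumption (notably: cost scales linearly with PDF count).

QUOTED_TOTAL_USD = 80_000      # "$80k" from the comment
QUOTED_NUM_PDFS = 1_000_000    # "1M pdfs" from the comment


def implied_cost_per_pdf(total_usd: float, num_pdfs: int) -> float:
    """Return the average cost per PDF implied by a total and a count."""
    if num_pdfs <= 0:
        raise ValueError("num_pdfs must be positive")
    return total_usd / num_pdfs


def estimate_total_cost(num_pdfs: int, cost_per_pdf: float) -> float:
    """Scale a per-PDF cost to a corpus of num_pdfs, assuming linear cost."""
    if num_pdfs < 0:
        raise ValueError("num_pdfs must be non-negative")
    return num_pdfs * cost_per_pdf


per_pdf = implied_cost_per_pdf(QUOTED_TOTAL_USD, QUOTED_NUM_PDFS)
quarter_corpus = estimate_total_cost(250_000, per_pdf)

print(f"Implied cost per PDF: ${per_pdf:.2f}")
print(f"Estimated cost for 250,000 PDFs: ${quarter_corpus:,.0f}")
```

At the quoted numbers this works out to about eight cents per PDF, so the total is driven almost entirely by corpus size.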

Yanis Labeyrie

Data Scientist | Ingénieur Centralien | Master IAAA

1y

Just tried it, this is so much better and more powerful than Pytesseract. It handles equations far better and isn't bothered by noise or low brightness. This is simply amazing.

Roy Lenders

Entrepreneur, Artificial Intelligence, Quant Trading, eCommerce, Supply Chain

1y

Tim Wedde, for AIDA?

Neil Hausmann

Senior Program Officer at Bill & Melinda Gates Foundation

1y

Next-gen OCR to unlock insights hidden in agricultural grey literature and paper documents

Maxim Khailo

Technical Trouble Maker

1y

I wonder how well LLMs handle changes in language. As you go further back in time, words had different meanings. While the underlying structure of language is static (because our brains haven't changed for hundreds of thousands of years), the words themselves have changed. Has there been any research into whether the higher-layer latent space shifts or stays static?

Tyler Suard

Senior AI Researcher & Developer at Parker-Hannifin. Ex-Apple, Ex-Meta. Contributor to Autogen, Tensorflow, PyTorch, Huggingface Transformers. Stanford affiliate. Interested in longevity, AI +Bio.

1y

I too have had the misfortune of using Tesseract! Thank you for sharing about Nougat. I was thinking about going for a PhD at Stanford, would you recommend it if I want to go further into AI?
