Come on over to our booth to grab some delicious Data fortune cookies and pick up a fun DatologyAI-branded fidget cube! You can find us at booth 303, right next to the entrance. We can't wait to see you!! #NeurIPS2024
DatologyAI
Technology, Information and Internet
Redwood City, California 2,082 followers
better data, better models, better business
About us
DatologyAI builds tools to automatically select the best data on which to train deep learning models. Our tools leverage cutting-edge research—much of which we perform ourselves—to identify redundant, noisy, or otherwise harmful data points. The algorithms that power our tools are modality-agnostic—they’re not limited to text or images—and don’t require labels, making them ideal for realizing the next generation of large deep learning models. Our products allow customers in nearly any vertical to train better models at lower cost.
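To make the idea of label-free, modality-agnostic curation concrete, here is a minimal sketch of one generic ingredient of such pipelines: embedding-based redundancy filtering. It is purely illustrative and is not DatologyAI's actual algorithm; the function name, the 0.95 threshold, and the toy vectors are invented for the example, and the only assumption is that each training example has already been embedded into a vector.

```python
# Hypothetical sketch of embedding-based redundancy filtering -- not
# DatologyAI's actual algorithm. It drops examples whose embeddings are
# nearly identical to an example already kept, which works for any
# modality that can be embedded and needs no labels.
import numpy as np

def filter_redundant(embeddings: np.ndarray, threshold: float = 0.95) -> list[int]:
    """Return indices of examples to keep, greedily dropping near-duplicates.

    embeddings: (n, d) array of L2-normalized example embeddings.
    threshold: cosine similarity above which two examples count as redundant.
    """
    kept: list[int] = []
    for i, emb in enumerate(embeddings):
        # Compare against everything we have already decided to keep.
        if kept and np.max(embeddings[kept] @ emb) > threshold:
            continue  # too similar to a kept example -> treat as redundant
        kept.append(i)
    return kept

# Toy usage: three vectors, the second is a near-copy of the first.
vecs = np.array([[1.0, 0.0], [0.999, 0.045], [0.0, 1.0]])
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
print(filter_redundant(vecs))  # -> [0, 2]
```

Because a filter like this operates on embeddings rather than raw text or pixels, the same logic can be applied to any modality and never needs labels, which is the property the description above emphasizes.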
- Website: www.datologyai.com
- Industry: Technology, Information and Internet
- Company size: 2-10 employees
- Headquarters: Redwood City, California
- Type: Privately Held
- Founded: 2023
Locations
- Primary: 1001 Main St, Redwood City, California 94063, US
Updates
-
We’re thrilled to announce that DatologyAI will be at NeurIPS 2024 in Vancouver, Canada! 🎉 Visit us at Booth 303 from December 10-12 to learn more about how we’re revolutionizing data curation. If you’re passionate about unlocking the power of your data or just curious about what we do, we’d love to meet you! Let’s talk about how data curation can transform your AI initiatives. See you! #NeurIPS2024 #DataCuration #DatologyAI
-
DatologyAI reposted this
Nothing matters more for a startup's success than its ability to ship quickly. And DatologyAI has been shipping amazingly quickly. Congratulations to the team on another incredible release! 🚀 https://lnkd.in/ggvNwAxd
-
Starting your Thanksgiving holiday with some fresh-out-of-the-oven DatologyAI Data:
Two weeks ago, we at DatologyAI released our first results demonstrating massive gains from data curation on contrastive image-text models. Today, I'm incredibly excited to share our newest results, applying our curation pipeline to LLMs. It's absolutely astonishing to see what a small, incredibly talented group of individuals can accomplish, and boy have we cooked! Starting with an exact-deduplicated version of Red Pajama v1 as our baseline and manipulating only the data the model is trained on:

Train Faster -- Training on our curated data reached the same baseline performance 7.7x faster, meaning results cost dramatically less to obtain and iteration speed improves drastically.

Train Better -- Push the frontier of what's possible with a given budget, improving performance by 8.5 absolute percentage points (60.5% Datology vs. 52.0% RPJv1). This isn't just an artifact of the Red Pajama baseline: compared to the strongest publicly curated datasets, DataComp-LM and FineWeb-Edu, our curated data improves performance by 4.4% and 6.1%, respectively.

Train Smaller -- Better data enables you to train smaller models. Reduce cost per query at inference by 2.1x while simultaneously increasing performance over the baseline by 5.7%.

As with our image-text results, we present these results both at a high level (https://lnkd.in/g_hMR5Tx) and in an extremely meaty technical deep dive for all of you who want the nitty-gritty details (https://lnkd.in/gY5tpq3s). We are just getting started on our journey and are so excited about what's in store.

Are you training or customizing your own text models and want to improve performance, training efficiency, and inference efficiency through better data? Get in touch (https://lnkd.in/gSGckr6s)! Are you a data-obsessed researcher, engineer, or somewhere in between who wants to push the bounds of what's possible with better data? We're hiring Members of Technical Staff across a number of roles (https://lnkd.in/gHCwPk8e).
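For readers unfamiliar with the baseline: "exact deduplication" simply means keeping one copy of each literally identical document before training. The sketch below illustrates this generic technique with a content hash; it is not DatologyAI's pipeline, and the exact_dedupe function name and toy corpus are invented for the example.

```python
# Illustrative sketch of exact deduplication of a text corpus: keep one
# copy of each distinct document, detected here via a SHA-256 content hash.
# A generic baseline technique, not DatologyAI's curation pipeline.
import hashlib

def exact_dedupe(documents):
    """Yield each document the first time its exact content is seen."""
    seen = set()
    for doc in documents:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield doc

corpus = ["the cat sat", "a dog barked", "the cat sat"]
print(list(exact_dedupe(corpus)))  # -> ['the cat sat', 'a dog barked']
```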
-
Train better, train faster, train smaller with DatologyAI!
Models are what they eat: high-quality data leads to high-quality models, enabling faster training of better models with fewer parameters. However, identifying and curating high-quality data at scale, automatically, is an incredibly challenging problem that requires deep expertise. Our goal at DatologyAI is to make state-of-the-art data curation accessible to anyone who wants to train a model, and we've been hard at work realizing this vision over the last year. On a personal note, I am so proud of the incredible work our small but mighty team has accomplished, and today I'm incredibly excited to share our first set of results at DatologyAI! We focused on contrastive models (à la CLIP) trained on the large-scale DataComp dataset, and the results we've been able to achieve have exceeded our already high expectations!

Train Faster - Training on DatologyAI's optimized dataset, we were able to reach the same performance with up to ~98% less compute, meaning that models cost dramatically less to train and train dramatically faster!

Train Better - Models trained on our optimized data for the same compute budget achieve up to 13 absolute percentage points better performance than models trained on raw data.

Train Smaller - Train models with >60% fewer parameters to better performance by training on our curated data.

Check out our high-level blog post here (https://shorturl.at/jkYqk), and if you're interested in all the nitty-gritty details, check out our technical deep dive here (https://shorturl.at/Mt0k9). We are so excited about these results, and we are just getting started! Stay tuned for more exciting results on text models coming very soon!
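For context on what "contrastive models (à la CLIP)" refers to: such models are trained to pull matched image-text pairs together and push mismatched pairs apart under a symmetric cross-entropy objective. Below is a minimal NumPy sketch of that objective, purely for illustration; the function name, the fixed 0.07 temperature, and the random embeddings are placeholders, and this is not DatologyAI's training code.

```python
# Minimal sketch of a CLIP-style symmetric contrastive objective in plain
# NumPy. Real training uses learned image/text encoders and a learnable
# temperature; the embeddings here are random placeholders.
import numpy as np

def clip_contrastive_loss(img_emb: np.ndarray, txt_emb: np.ndarray,
                          temperature: float = 0.07) -> float:
    """Symmetric cross-entropy over cosine similarities of matched pairs."""
    # Normalize so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (batch, batch) similarity matrix
    labels = np.arange(len(logits))         # the i-th image matches the i-th text

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)                       # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()                   # diagonal = correct pairs

    # Average the image->text and text->image directions.
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2

rng = np.random.default_rng(0)
img, txt = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(clip_contrastive_loss(img, txt))
```

The relevance to data curation is that noisy or mismatched image-text pairs directly corrupt this pairwise objective, which is why better-curated pairs can yield the compute and performance gains described in the post.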
-
DatologyAI reposted this
Incredible results and exciting announcement today from DatologyAI! If you are training or fine-tuning AI models, you can't afford not to use Datology. The gains in speed, cost, performance and efficiency that Datology's platform unlocks are astounding.
DatologyAI’s Image-Text Data Curation: Train Better, Faster, Smaller
datologyai.com
-
DatologyAI reposted this
Check out what I've been working on for the past 6 months! tl;dr: we've pretrained the most data-efficient and best-in-class CLIP models! See this X/Twitter thread for more information as well: https://lnkd.in/g9RwS7uG
-
DatologyAI reposted this
High-quality data leads to better models! At DatologyAI, we've made data curation accessible! Our curation pipeline enables training faster (up to 98% less compute), better (up to 13% higher performance), and smaller (>60% fewer parameters). Check out the blog posts below: https://lnkd.in/dAySHZGK