Stephen Li’s Post

Apple Foundation Model: What’s the Big Deal for AI Infra?

Apple dropped some AI heat in June, and iPhone users are hyped. Sure, it’s a bummer that it only works on iPhone 15 Pro and above, but Apple’s paper on their Foundation Model (AFM-on-Device) shows exciting potential!

First, the model is small: just 3 billion parameters, similar in size to Phi-3. Apple squeezed it down with INT4 quantization, so the model weights need only around 1.5GB of memory. With the KV cache and other overheads, the whole thing should stay under 2GB! 🚀 (Quick back-of-envelope math in the first sketch below.)

What’s even smarter? Because Apple controls the whole stack, it introduced adapters: the iPhone loads only a small, task-specific slice of extra weights depending on the app you’re using. If you’re in Mail, it loads just what’s needed for email replies. That keeps the model zippy and sharp. (See the second sketch below for the idea.)

Now, here’s the kicker: Apple trained the big server model on 8,192 TPUv4 chips in Google Cloud (yep, Google’s TPUs!), then distilled it down to the tiny on-device version trained on 2,048 TPUv5p chips. If this works, it might shift more attention to TPU training, which could eat into NVIDIA’s GPU dominance. 🤔

If AFM-on-Device performs well, we might see:
• More edge AI adoption, with small on-device models lightening the memory and compute load.
• More flexibility in data center locations, since edge AI reduces latency concerns. That said, some heavy lifting will still need to happen in the cloud (hello, o1!).
• A rise in TPU training customers.

Before the first iPhone, BlackBerry ruled smartphones (and Android arrived soon after). The iPhone totally changed the world. This time, let’s see if Apple can lead the way for edge AI! 🧠📱

This post is part of my Independent Study in AI Infrastructure at Stanford GSB, where I’m exploring potential challenges and opportunities over the next 2-3 years. If you have any insights or suggestions, feel free to comment or DM me! https://lnkd.in/gP3kKsTS

#AI #Apple #EdgeAI #NVIDIA #TPU #LLM #390IndependentStudy #StanfordGSB #MSx
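A quick sanity check on that memory claim. This is a rough sketch: the 3B/INT4 figures come from the post, while the KV-cache shape is an assumed config, not Apple’s published architecture.

```python
# Back-of-envelope memory math for a ~3B-parameter model quantized to INT4.
params = 3e9                  # ~3 billion weights (AFM-on-Device class)
bits_per_weight = 4           # INT4 quantization
weight_bytes = params * bits_per_weight / 8
print(f"weights:  {weight_bytes / 2**30:.2f} GiB")  # ~1.40 GiB (~1.5 GB)

# KV cache for a *hypothetical* 3B-class transformer (assumed, not Apple's specs):
layers, kv_heads, head_dim = 26, 8, 128    # assumed model shape
context_len, bytes_per_val = 4096, 2       # 4k-token context, FP16 K/V entries
kv_cache = 2 * layers * kv_heads * head_dim * context_len * bytes_per_val
print(f"KV cache: {kv_cache / 2**30:.2f} GiB")      # ~0.41 GiB

print(f"total:    {(weight_bytes + kv_cache) / 2**30:.2f} GiB")  # comfortably < 2 GB
```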
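And here’s the adapter trick as a minimal sketch. Apple’s stack is closed, so everything below (names, shapes, rank) is hypothetical; it just illustrates the LoRA-style idea from the paper: one frozen shared base model, with tiny low-rank weight deltas swapped in per app.

```python
import numpy as np

def lora_linear(x, w_base, a, b, scale=0.1):
    """One linear layer with a LoRA-style adapter:
    y = x @ w_base + scale * (x @ a) @ b.
    w_base stays frozen; only the small a (d x r) and b (r x d) differ per task."""
    return x @ w_base + scale * (x @ a) @ b

d, r = 64, 8                           # hidden size and adapter rank (r << d)
rng = np.random.default_rng(0)
w_base = rng.standard_normal((d, d))   # shared base weight, never swapped out

# Tiny per-app adapters: each holds only 2*d*r params vs d*d for the base layer.
adapters = {
    "mail_reply": (rng.standard_normal((d, r)), rng.standard_normal((r, d))),
    "summarize":  (rng.standard_normal((d, r)), rng.standard_normal((r, d))),
}

x = rng.standard_normal((1, d))
a, b = adapters["mail_reply"]          # "user opened Mail" -> load only this delta
y = lora_linear(x, w_base, a, b)
print(y.shape)                         # (1, 64)
```

The payoff: Apple’s report puts the adapters at tens of megabytes each, so the phone swaps a small file per app instead of reloading a multi-GB model.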

Introducing Apple’s On-Device and Server Foundation Models

machinelearning.apple.com

Godwin Josh

Co-Founder of Altrosyn and Director at CDTECH | Inventor | Manufacturer

2mo

The use of INT4 precision and adapters is ingenious, enabling efficient on-device inference while minimizing memory footprint. This approach could significantly impact the landscape of federated learning by allowing for more privacy-preserving model training directly on user devices. However, how will Apple's AFM-on-Device navigate the complexities of dynamic resource allocation in heterogeneous edge environments, given device heterogeneity and fluctuating network conditions?
