🖼️ Interested in productionizing diffusion models? Our developer advocate Charles Frye is hosting a webinar on Wednesday, Dec 18. Using a live example of building an artsy QR code generator, we'll cover general ML topics (iteration, evaluation, and GPU acceleration) as well as Modal-specific ones (repo structure, job handling, and CI/CD). Link to register below. https://lnkd.in/gH_AxfwW
Modal
Software Development
New York City, New York 6,349 followers
The serverless platform for AI, data and ML teams.
About us
Deploy generative AI models, large-scale batch jobs, job queues, and more on Modal's platform. We help data science and machine learning teams accelerate development, reduce costs, and effortlessly scale workloads across thousands of CPUs and GPUs. Our pay-per-use model ensures you're billed only for actual compute time, down to the CPU cycle. No more wasted resources or idle costs—just efficient, scalable computing power when you need it.
- Website: https://modal.com
- Industry: Software Development
- Company size: 11-50 employees
- Headquarters: New York City, New York
- Type: Privately Held
Locations
- Primary: New York City, New York 10038, US
- Stockholm, SE
Updates
-
NVIDIA L40S GPUs are now available on Modal at $1.95/hr. Dropping just in time for the holiday content freeze 🫡

L40S's are juicier than our current most popular inference-focused accelerator, the A10, and are priced between A10s and A100s on our platform.

When might you use them?

⬆️ To run bigger models (like Flux schnell) that don't fit on an A10. At 48GB, the L40S has 2x the memory of the A10.

⚡ To run faster inference. Compared to A10s, users can get up to a 1.4x speedup for memory-bound jobs like small-batch inference and up to a 2x speedup for compute-bound jobs. We ran a quick benchmark on LLaMA 3.1 8B with vLLM and achieved a 1.2x speedup in requests per second without optimizations.

Give them a try!
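The "bigger models fit" point is easy to check with back-of-envelope arithmetic. A sketch (our own rough numbers, not an official Modal benchmark; Flux schnell's ~12B parameter count is from public model cards):

```python
# Back-of-envelope check: do a model's weights fit in GPU memory at a given
# precision? Parameter counts and VRAM sizes below are approximate public
# figures, not measurements from Modal.

def weights_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GiB (fp16/bf16 = 2 bytes per parameter)."""
    return num_params * bytes_per_param / 2**30

flux_fp16 = weights_gib(12e9)   # Flux schnell: roughly 12B transformer params
a10_vram, l40s_vram = 24, 48    # GiB of VRAM per device

# ~22.4 GiB of weights nearly fills an A10 before activations, KV caches,
# or the text encoder / VAE stages, while an L40S leaves ~26 GiB of headroom.
print(f"weights: {flux_fp16:.1f} GiB; "
      f"comfortable on A10: {flux_fp16 < a10_vram * 0.8}; "
      f"comfortable on L40S: {flux_fp16 < l40s_vram * 0.8}")
```

The 0.8 safety factor is our own rule of thumb for leaving room beyond raw weights; real headroom depends on batch size and activation memory.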
-
Ever wondered what CUDA kernels actually get compiled to? Or tried to figure out just what all the components of the CUDA Toolkit do? Or the difference between "CUDA Cores" & "Tensor Cores"?

We've put together a one-stop shop with all the answers. Introducing: the GPU Glossary

The glossary covers the entire stack, from device hardware like warp schedulers and register files, through device software like PTX and kernels, up to host software like the CUDA Toolkit and Nsight Systems. We've included explanatory diagrams for key concepts, like the paired thread and memory hierarchies of the CUDA programming model and how they map onto the hardware.

Check out the full resource here: https://lnkd.in/gZcpfBy2
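The thread hierarchy the glossary diagrams can be sketched in plain Python: a kernel is logically one function invoked per thread, and each thread finds its data element with the same index formula CUDA code uses. This is a serial emulation for intuition only, not CUDA itself; the names mirror CUDA's built-ins.

```python
# Toy emulation of CUDA's 1D thread hierarchy. A "kernel" runs once per
# thread; each thread computes its global index exactly as CUDA code does:
# blockIdx.x * blockDim.x + threadIdx.x.

def launch_1d(kernel, grid_dim, block_dim, *args):
    """Serially emulate a CUDA-style <<<grid_dim, block_dim>>> launch."""
    for block_idx in range(grid_dim):          # blocks in the grid
        for thread_idx in range(block_dim):    # threads in each block
            kernel(block_idx, block_dim, thread_idx, *args)

def saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y, out):
    i = block_idx * block_dim + thread_idx     # global thread index
    if i < len(x):                             # bounds check, as in real kernels
        out[i] = a * x[i] + y[i]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0] * 5
out = [0.0] * 5
# 2 blocks x 3 threads = 6 threads covering 5 elements (one masked out).
launch_1d(saxpy_kernel, 2, 3, 2.0, x, y, out)
print(out)
```

On a real GPU those loop iterations run in parallel across warps and SMs, which is exactly the hardware mapping the glossary's diagrams walk through.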
-
Modal reposted this
A Modal x Cometeer collab was not on the 2024 bingo card but *stoked* about this gift! Thank you Modal, Erik Bernhardsson, and Akshat Bubna for the serverless cloud brew! 🙏 ☕ 💚
-
⚽️ New case study ft. one of the top football teams in the world right now ⚽️

Really cool to get an insider's look at how ML is applied to sports and entertainment. The best teams are leveraging computer vision data to get an edge when making decisions both on and off the field.

Before Modal, this team tried using a GPU cluster from a hyperscaler for their data processing, but inflexible instance types and long cold starts made scaling and developer iteration slow.

By switching to Modal, this team was able to:
💰 Cut their costs in half with Modal's usage-based serverless pricing
⏳ Reduce data processing time from hours to minutes with Modal's fast autoscaling

Excited to be a small part of this team's quest to be top of the table!
I've followed and played ⚽ for decades. So naturally, when I saw a chance to help one of the best teams in the world, I jumped at the opportunity!

Computer vision and data analytics have revolutionized the world of sports in recent years. Using Modal, one of the world's best teams collects 3.5 million spatio-temporal data points on players and the ball for each game. This data is turned into high-dimensional embeddings and used to answer questions like:
- Was it the right time to take a shot?
- How effective was the positioning of the players?
- How do other teams handle such situations?

Read more at https://lnkd.in/g-awZYt8
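As a sanity check on that 3.5M figure, the arithmetic works out if you assume a typical optical tracking setup. The sampling rate and match length below are our assumptions, not numbers from the case study:

```python
# Rough sanity check (assumed figures, not from the case study): optical
# tracking systems commonly sample every player plus the ball at ~25 Hz.
entities = 22 + 1        # players on the pitch + the ball
hz = 25                  # assumed tracking frame rate
minutes = 100            # 90' plus stoppage time
points_per_match = entities * hz * minutes * 60
print(points_per_match)  # ~3.45 million, in line with the quoted 3.5M
```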
-
🧬 Biotech is one of the fastest growing industry segments on Modal. The team at Chai Discovery uses Modal to build state-of-the-art computational bio models. Last week, they updated the license of their Chai-1 molecular structure prediction model so that it can be used commercially. Today, we added a detailed example on how you can run this model yourself on Modal! If you're a biotech company looking to speed up batch data preprocessing or parallelize experimentation on GPUs, we'd love to chat!
-
Dive into our newest blog post, where Eric Zhang breaks down the technical process of implementing static IP address capabilities for serverless containers! ✏️
Excited to share a low-level networking project we've been working on at Modal: static IPs for serverless function containers, using WireGuard proxies.

Engineers often need to balance design tradeoffs, and one of the most common hesitations about serverless functions we've heard at Modal is the inability to set IP whitelists on your databases. This is important for security and regulatory compliance! We've engineered a plug-and-play solution that encrypts traffic and transparently masks the source IP, no matter how many containers you run around the world.

The blog post explains how this is done (to my knowledge, it's a first of its kind) and includes some network diagrams :) https://lnkd.in/esrsmAWc
WireGuard at Modal: Static IPs for Serverless Containers
modal.com
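The core trick, many ephemeral containers presenting one stable egress address, is source NAT performed at the WireGuard proxy. A toy model of that idea (our illustration only, not Modal's implementation; the IP addresses are made up):

```python
# Toy model of the egress path: containers come and go with ephemeral
# private IPs, but all their traffic is relayed through one proxy, which
# rewrites the source address. The database only ever sees the static IP,
# so a single-entry IP whitelist works. Illustrative only.

STATIC_EGRESS_IP = "203.0.113.7"   # hypothetical allowlisted address

def proxy_egress(packet: dict) -> dict:
    """Source-NAT: rewrite the ephemeral container IP to the static one."""
    return {**packet, "src": STATIC_EGRESS_IP}

containers = ["10.1.0.4", "10.1.7.92", "10.2.3.15"]  # ephemeral container IPs
seen_by_db = {
    proxy_egress({"src": ip, "dst": "db.internal:5432"})["src"]
    for ip in containers
}
print(seen_by_db)  # one address, regardless of how many containers ran
```

In the real system the container-to-proxy leg is a WireGuard tunnel, so the traffic is encrypted in transit as well as re-addressed; the blog post covers how that tunnel is set up transparently.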
-
Modal reposted this
Last-minute sponsor addition: Modal! Compete to win the coveted "But I'm Not a Wrapper" prize for best self-hosted solution. $5k in credits & a stuffed llama.
Voice/Video AI is transforming our daily digital interactions. We're excited to channel this momentum into CreatorsCorner's third #GenAI Agents Hackathon 🔥 Following previous successes, Amazon Web Services (AWS), Tandem, Unified.to, Retell AI, Marly, Coval (YC S24), Temporal Technologies, Simli, SpeedLegal, Senso, and other pioneers are joining forces again to provide you with cutting-edge tools to build the next generation of AI solutions. 📅 When: Saturday, December 7th, 2024 ⏰ Time: 10 AM - 8 PM 📍 Location: Downtown San Francisco Prize Pool: $20,000+ in Credits & Cash Whether you're a seasoned AI builder or just starting your journey, this is your chance to help shape the future of human-AI interaction through voice and video. Apply now: https://lu.ma/i8bow7sr
-
Yesterday was a fun day for text-to-video models 🍵

On the non-protest-leak side of the house, Genmo released LoRA fine-tuning scripts for its Mochi model. Mochi is open-source and currently the top trending video diffusion model on Hugging Face.

You can get started by running these scripts on Modal—no waiting around to get a GPU or fussing with dependencies. We tried this out ourselves, fine-tuning Mochi to create "explosion" videos of popular logos!
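Why LoRA is the right tool for fine-tuning a large video model: instead of updating a full weight matrix, you train two small low-rank factors, shrinking the trainable parameter count dramatically. A quick sketch of the arithmetic (the layer shape and rank below are illustrative, not Mochi's actual configuration):

```python
# LoRA replaces a full update of a d x k weight matrix W with a low-rank
# update W + B @ A, where B is d x r and A is r x k with r << min(d, k).
# Only B and A are trained. Shapes here are illustrative.

def lora_params(d: int, k: int, r: int) -> tuple[int, int]:
    full = d * k          # trainable params for full fine-tuning of W
    lora = r * (d + k)    # trainable params for the B, A factors
    return full, lora

full, lora = lora_params(d=4096, k=4096, r=16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {full // lora}x fewer")
```

Fewer trainable parameters means less optimizer state and activation memory per GPU, which is what makes fine-tuning feasible on a single rented accelerator instead of a training cluster.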