We’ve started optimizing Claude models to run on Amazon Web Services (AWS) Trainium2, AWS’s most advanced AI chip. It’s already bearing fruit: our first release is a faster version of Claude 3.5 Haiku in Amazon Bedrock. We’re also introducing Amazon Bedrock Model Distillation. In distillation, a “teacher” model (Claude 3.5 Sonnet) transfers knowledge to a “student” model (Claude 3 Haiku), helping the “student” handle more sophisticated tasks at a fraction of the cost. In addition to offering a faster version on Trainium2, we're lowering the base price of Claude 3.5 Haiku across all platforms. The faster Claude 3.5 Haiku and model distillation are available in preview today in Amazon Bedrock: https://lnkd.in/eYbnXEm4
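For readers unfamiliar with the technique, the sketch below illustrates the general teacher-to-student idea behind distillation: the student is trained against the teacher's output distribution (soft labels) as well as the ground-truth labels. This is a minimal numpy illustration under generic assumptions; the function names, temperature, and loss weighting are illustrative and say nothing about how Amazon Bedrock Model Distillation is implemented.

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z -= z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Classic soft-label distillation objective (illustrative only):
    a weighted sum of (a) KL divergence from the teacher's softened
    distribution to the student's, and (b) cross-entropy on hard labels."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-9) - np.log(p_student + 1e-9)), axis=-1)
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-9)
    return alpha * temperature**2 * kl.mean() + (1 - alpha) * ce.mean()

# Toy usage with random logits over a 10-token vocabulary.
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(4, 10))
student_logits = rng.normal(size=(4, 10))
labels = rng.integers(0, 10, size=4)
print(distillation_loss(student_logits, teacher_logits, labels))
```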
Anthropic
Anthropic is an AI safety and research company working to build reliable, interpretable, and steerable AI systems.
About us
We're an AI research company that builds reliable, interpretable, and steerable AI systems. Our first product is Claude, an AI assistant for tasks at any scale. Our research interests span multiple areas including natural language, human feedback, scaling laws, reinforcement learning, code generation, and interpretability.
- Website: https://www.anthropic.com/
- Industry: Research Services
- Company size: 501-1,000 employees
- Type: Privately Held
Updates
-
We’re starting a Fellows program to help engineers and researchers transition into doing frontier AI safety research full-time. Beginning in March 2025, we’ll provide funding, compute, and research mentorship to 10–15 Fellows with strong coding and technical backgrounds.

Fellows will have access to:
- A weekly stipend of $2,100
- ~$10k per month for compute & research costs
- 1:1 mentorship from an Anthropic researcher
- Shared workspaces in the Bay Area and London

Fellows will collaborate with Anthropic researchers for 6 months on projects in areas such as:
- Adversarial robustness & AI control
- Scalable oversight
- Model organisms of misalignment
- Interpretability

Fellows can participate while affiliated with other organizations (e.g., while in a PhD program). At the end of the program, we expect Fellows to be stronger candidates for roles at Anthropic, and we might directly extend some full-time offers.

Apply by January 20 to join our first cohort! Full details are at the following link: https://lnkd.in/dDNzwZCp
-
Read how Intercom's Fin, powered by Claude, helps 25,000+ companies deliver instant, high-quality support: https://lnkd.in/eYSKc__n
How Intercom Achieves 86% Resolution Rates with Claude AI: A Customer Service Revolution
-
With styles, you can now customize how Claude responds. Select from the new preset options: Concise, Explanatory, or Formal. Whether you're a developer writing formal documentation, a marketer crafting clear brand guidelines, or a product team planning extensive project requirements, Claude can adapt to your preferred way of writing.
-
We’re expanding our collaboration with Amazon Web Services (AWS) to develop and deploy next-generation AI systems. This includes a new $4 billion investment from Amazon and establishes AWS as our primary cloud and training partner. This brings Amazon's total investment in Anthropic to $8 billion.

Working closely with AWS, we're developing future generations of Trainium chips. Designing both hardware and software together lets us optimize every aspect of model training. Our engineers work with the AWS chip design team to maximize computational efficiency, writing low-level kernels to directly interface with Trainium silicon and contributing to the AWS Neuron software stack.

Through Amazon Bedrock, Claude has become core infrastructure for tens of thousands of companies seeking reliable and practical AI at scale. Together, we're laying a new technological foundation—from silicon to software—to train and power our most advanced AI models.

Read more: https://lnkd.in/gfbKewvz
Powering the next generation of AI development with AWS
-
Our new research paper: Adding Error Bars to Evals. AI model evaluations don’t usually include statistics or uncertainty. We think they should. Read the blog post: https://lnkd.in/d2jKfpyT

When a new AI model is released, the accompanying model card typically reports a matrix of evaluation scores on a variety of standard evaluations, such as MMLU, GPQA, or the LSAT. But it’s unusual for these scores to include any indication of the uncertainty, or randomness, surrounding them. This omission makes it difficult to compare the evaluation scores of two models in a rigorous way.

“Randomness” in language model evaluations may take a couple of forms. Any stream of output tokens from a model may be nondeterministic, so re-evaluating the same model on the same evaluation may produce slightly different results each time. This randomness is known as measurement error. But there’s another form of randomness that’s no longer visible by the time an evaluation is performed. This is the sampling error: of all possible questions one could ask about a topic, we decide to include some questions in the evaluation, but not others.

In our research paper, we recommend techniques for reducing measurement error and properly quantifying sampling error in model evaluations. With a simple assumption in place—that evaluation questions were randomly drawn from some underlying distribution—we develop an analytic framework for model evaluations using statistical theory. Drawing on the science of experimental design, we make a series of recommendations for performing evaluations and reporting the results in a way that maximizes the amount of information conveyed.

Our paper makes five core recommendations. These recommendations will likely not surprise readers with a background in statistics or experimentation, but they are not standard in the world of model evaluations. Specifically, our paper recommends:
1. Computing standard errors using the Central Limit Theorem
2. Using clustered standard errors when questions are drawn in related groups
3. Reducing variance by resampling answers and by analyzing next-token probabilities
4. Using paired analysis when two models are tested on the same questions
5. Conducting power analysis to determine whether an evaluation can answer a specific hypothesis

For mathematical details on the theory behind each recommendation, read the full research paper here: https://lnkd.in/dBrr9zFi
A statistical approach to model evaluations
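As a rough illustration of recommendations 1 and 4 above, here is a short Python sketch (ours to this page, not the paper's code): it computes a CLT-based standard error for one model's eval score, and a paired-difference standard error for comparing two models graded on the same questions. The per-question 0/1 grades are simulated and purely hypothetical.

```python
import numpy as np

def clt_standard_error(scores: np.ndarray) -> tuple[float, float]:
    """Mean eval score and its standard error via the Central Limit Theorem."""
    return scores.mean(), scores.std(ddof=1) / np.sqrt(len(scores))

def paired_difference(scores_a: np.ndarray, scores_b: np.ndarray) -> tuple[float, float]:
    """Mean score difference between two models on the same questions,
    with a standard error computed on the per-question differences."""
    diffs = scores_a - scores_b
    return diffs.mean(), diffs.std(ddof=1) / np.sqrt(len(diffs))

# Hypothetical per-question 0/1 grades for two models on the same 1,000 questions.
rng = np.random.default_rng(0)
model_a = rng.binomial(1, 0.82, size=1000).astype(float)
model_b = rng.binomial(1, 0.80, size=1000).astype(float)

mean_a, se_a = clt_standard_error(model_a)
diff, se_diff = paired_difference(model_a, model_b)
print(f"Model A score: {mean_a:.3f} ± {1.96 * se_a:.3f} (95% CI)")
print(f"A - B:         {diff:+.3f} ± {1.96 * se_diff:.3f} (95% CI)")
```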
-
We’ve added a new prompt improver to the Anthropic Console. Take an existing prompt and Claude will automatically refine it with prompt engineering techniques like chain-of-thought reasoning. The prompt improver also makes it easy to adapt prompts originally written for other AI models to work better with Claude. Read more: https://lnkd.in/dx-5sp5P.
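The post doesn't show the improver's output, so the snippet below is a purely hypothetical before-and-after of the kind of refinement it describes: adding an explicit chain-of-thought step and a structured answer format to a bare classification prompt.

```python
# Hypothetical example of a chain-of-thought-style prompt refinement.
# The "improved" prompt is illustrative, not the Console's actual output.
original_prompt = (
    "Classify the sentiment of this product review as positive or negative: {review}"
)

improved_prompt = (
    "You will classify the sentiment of a product review.\n"
    "First, reason step by step inside <thinking> tags: list the key phrases in the "
    "review and note whether each reads as positive or negative.\n"
    "Then give your final answer inside <answer> tags, as exactly one word: "
    "'positive' or 'negative'.\n\n"
    "Review: {review}"
)

# The template can then be filled in and sent to Claude as usual.
print(improved_prompt.format(review="Arrived late, but the build quality is fantastic."))
```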
-
Coinbase customers now get faster and more accurate support with Claude powering their chatbot, help center search, and customer service teams across 100+ countries: https://lnkd.in/gWCvNy2u
Coinbase transforms their customer support with Claude