Kamran S.’s Post

EVP @ Klick | Driving AI Adoption and Business Solutions

Some quick observations from trying out o1-preview from OpenAI:

1. These new models reduce the need to master prompt engineering, which will make conversational AI tools more useful for a broader audience.
2. The new models reach a "final" good response quicker, even though they take longer to think, because they avoid the need for longer conversations.
3. The responses sound more convincing, which makes errors in them harder to identify.

There are lots of technical innovations and improvements in these models. If you'd like to see a discussion of the importance and challenges of reasoning, check out this excerpt from a longer podcast with Yann LeCun. https://lnkd.in/gi-6XdFQ

Can LLMs reason? | Yann LeCun and Lex Fridman

https://www.youtube.com/

Kamran S.

3mo

The comparison above was a quick look at GPT-4o vs. o1. Claude Sonnet 3.5 does pretty well compared to o1. It’s faster, and its input tokens are $3/million vs. $15/million for o1. That 5x price difference makes it important to think about the use case, the accuracy improvement and the cost vs. benefit.

I’ve been using Sonnet a lot more for coding assistance in tools like Cursor. Gemini has been much better for recently published material, APIs and long-document summarization. Gemini also tends to have more guardrails on topics it won’t cover. My habit is pulling up GPT otherwise.

There still isn’t “one ring to rule them all”, there is unlikely ever to be one, and that’s not a bad thing. It can feel overwhelming at times, and it’s easy to question whether we’re on the right path and making the right choices. These models are all extremely capable, and for most use cases any of them will be sufficient. Just take time periodically to try others, and make decisions based on your needs, the cost of switching and the benefit. If you’re beyond the learning and prototyping phase, you can also invest in building your infrastructure, including tracing and evaluation pipelines, to be multi-LLM.
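To make the price gap concrete, here is a minimal sketch of the arithmetic using only the two input-token prices quoted in this comment. The monthly token volume and the dictionary keys are hypothetical, chosen for illustration, and output-token pricing is left out because it is not mentioned here.

```python
# Input-token prices quoted in the comment (USD per million tokens).
# The keys are illustrative labels, not official API model identifiers.
PRICE_PER_MILLION_INPUT = {"claude-3.5-sonnet": 3.00, "o1": 15.00}


def input_cost(model: str, input_tokens: int) -> float:
    """Return the input-token cost in USD for a given token volume."""
    return PRICE_PER_MILLION_INPUT[model] * input_tokens / 1_000_000


# Hypothetical workload: 40 million input tokens per month.
monthly_tokens = 40_000_000
sonnet = input_cost("claude-3.5-sonnet", monthly_tokens)
o1 = input_cost("o1", monthly_tokens)
print(f"Sonnet: ${sonnet:.2f}, o1: ${o1:.2f}, ratio: {o1 / sonnet:.1f}x")
```

The ratio stays at 5x whatever the volume, so the real question is whether the accuracy gain on your use case is worth the absolute dollar difference at your scale.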
