LLMs aren't problem solvers; they're eloquent search engines.

I've been leaning hard on AI coding assistants in a personal project for the last few months. In recent weeks I've been working on a web frontend, an area where this lifelong infrastructure engineer is way out of his depth. Overall, I feel 10x as productive with an assistant as without. But it's far from perfect, and I've started to notice a pattern in the good and bad answers I get from the assistant.

- "Make this text bounce" → awesome.
- "Make a 4-column layout for these divs" → very good start.
- "Why is there a scrollbar here?" → terrible.
- "Explain React's component lifecycle" → awesome.
- "Why isn't React picking up the updated value for this state variable?" → terrible.
- "Explain how 'flex' works in CSS" → awesome.
- "What are popular frontend frameworks and what should I know about them?" → awesome.
- "Write a test for this component" → medium.
- "This call generates [compiler error]" → solid.

The theme is that it's really good with "knowledge" and really bad at "thinking." When my question is just "what CSS property controls this?" it nails it. It doesn't just know the answer (which "old tech" like Google or StackOverflow could also supply); it contextualizes that answer to my program and integrates it with no trouble. Brilliant. Similarly, if I need a lecture, like "tell me how React works," it's perfect. It gives accurate info, contextualized to my current scope, and effortlessly handles my follow-up questions so I learn at my own speed.

Where it fails is problem solving. Its contextualization helps it fake a degree of problem solving, but over time it becomes clear that it only really solves problems it has seen before. When things get truly tricky, it absolutely founders. "Make this scroll bar go away" just gives me random stabs in the dark, which either change nothing or make things worse. It feels like it's basically saying, "I found 30 half-relevant threads on StackOverflow, let's just try the same stuff they tried until it works!" It's garbage. Contextualizing is not the same as thinking.

The corollary is that these LLMs are not prepared to make real problem solvers (i.e. good software engineers) obsolete. They can turbocharge a good problem solver with encyclopedic access to contextualized information and much faster typing speed. But you can't really solve hard problems without a professional problem solver in the room.
Yeah, I would love to think so too, but I wonder if there's an element of wishful thinking when we say, "this tech can't replace us as problem solvers." There are a couple of nuances hiding in the claim that "it only really solves problems that it has seen before":
1. Among the problems we solve as software engineers, are there really that many truly unique ones?
2. When we're solving problems, don't we mostly just aggregate what worked for us in the past?
The boundary between "problem solvers" and "solution look-up assistants" is very blurry when the majority of knowledge work is not about solving new problems, but about applying someone else's existing solution to a new context.
I totally agree with this take. I have been experimenting with LLMs to make a game just for fun (link below if curious). What I love the most is the productivity boost and the help with the things I like doing the least:

- Create a basic structure for the website: HTML, JS, and CSS.
- Add CSS animations and make it mobile responsive.
- Compare deployment options for 2024, give me tradeoffs and instructions for deployment.

While none of it was totally perfect, it's such a big head start and removes a ton of activation energy to get moving. Gluing it together, writing a simple backend, and deploying were the fun parts anyway. The subtle bugs across the local and deployed environments required drawing on experience and knowledge to fix. Uploading screenshots might have let the LLM figure it all out with enough iteration, but it certainly would not have been faster than an experienced human.

Lastly, I used the LLM to generate the questions for the game itself. With enough prompt engineering, the questions were good enough for a V1. It's not going to make it into a medical journal, but it's certainly enough for a daily quiz for my mother and sister: https://www.nutrigamed.com/
A tool is only as good as the person using it.
I’m interested to see if the new o1 or o1-pro will shift your opinion at all!
I agree almost completely. The one part I'm not fully on board with is the "explain how X works." Sometimes it's spot on, and sometimes it fails in the same ways that 90% of Medium articles do. The problem is that if I'm asking questions like that, I'm not qualified to discern between the former and the latter.
Yep. Sounding like you can reason is not the same as reasoning. By definition, LLMs are trained for the former. :)
Which AI coding assistants have you tried and have you noticed much difference between them?
What model?
Geek god (or at least a minor deity)
I'm not so sure. The other day, while hacking on a complex prototype, I needed to add a function to my own messy, janky, Python (i.e. dynamically typed) code that I had lost all mental context on. It had about 3 different graphical coordinate systems, organically evolved over months of experimentation, all jumbled together. Two of those systems were captured in the SAME coordinate variables, with the system in use being conditional on the values of some OTHER variables being non-negative. I asked it to write a function to scale between those systems, and it did so correctly 🤯 All I had to do was run the code and things worked as expected. It saved me about an hour of software archaeology for about a minute's worth of prompting. Plus, this was through a free-tier model in Cody, so not even one of the frontier models.

I know a lot of janky code exists in the world, but I highly doubt this precise form of jankiness has been written before and made it into some training dataset. As such, I cannot see how this could be just search and not some form of reasoning. I think you will find more success if you keep the code modular, such that all relevant code fits into a relatively small context window. Functional code FTW!
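To make the scenario above concrete, here is a minimal hypothetical sketch of the kind of function being described: one pair of coordinate variables that holds values from two different systems, with the active system decided by whether some other variables are non-negative. None of the commenter's actual code appears in this thread, so every name and constant here (to_pixel_coords, GRID_CELL_PX, WORLD_SCALE, cell_row, cell_col) is invented purely for illustration.

```python
# Hypothetical reconstruction of the "two systems in the same variables" jankiness.
# Assumption: (x, y) hold GRID coordinates when cell_row and cell_col are both
# non-negative, and WORLD coordinates otherwise; both map into pixel space.
# All constants are made up for illustration.

GRID_CELL_PX = 32        # pixels per grid cell (invented)
WORLD_SCALE = 10.0       # pixels per world unit (invented)
ORIGIN_PX = (400, 300)   # pixel-space origin offset (invented)

def to_pixel_coords(x, y, cell_row, cell_col):
    """Convert (x, y) to pixel coordinates.

    Which source system (x, y) is expressed in depends on whether
    cell_row and cell_col are non-negative, mirroring the conditional
    coordinate-system switch described in the comment above.
    """
    if cell_row >= 0 and cell_col >= 0:
        # (x, y) are grid coordinates: scale by cell size.
        px = ORIGIN_PX[0] + x * GRID_CELL_PX
        py = ORIGIN_PX[1] + y * GRID_CELL_PX
    else:
        # (x, y) are world coordinates: scale by world units.
        px = ORIGIN_PX[0] + x * WORLD_SCALE
        py = ORIGIN_PX[1] + y * WORLD_SCALE
    return px, py

if __name__ == "__main__":
    print(to_pixel_coords(3, 2, 0, 5))        # interpreted as grid coordinates
    print(to_pixel_coords(3.5, -1.2, -1, -1)) # interpreted as world coordinates
```

The point of the sketch is only to show why writing such a function requires tracing which variables gate which coordinate system through the surrounding code, rather than looking up a stock answer.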