New study shows LLMs outperform neuroscience experts at predicting experimental results in advance of experiments (86% vs 63% accuracy). They use a fine-tuned Mistral 7B, but other models worked too. Suggests LLMs can integrate scientific knowledge at scale to support research.
Interesting finding. Any thoughts on whether the learned tendency of researchers/scientists to let the experiment play out and let the evidence decide the result, rather than becoming ego-involved in what they think the results will be and possibly letting that influence their interpretation, affected the experts' overall mean here in this paper?! I have always had the perspective that we need to run the experiment(s) instead of assuming we know what the results will be before they are run. There are often surprising results. I have certainly observed many 'leader-types' resisting the need to set up and run the experiments because they 'already knew what the results would be' - sometimes overriding the recommendations of the researchers/scientists - only for the 'leader' to later find out they were wrong! Thought experiment: if an AI, or many AIs, predict results/findings in the future, will humans decide not to run the experiment(s) because the AI(s) 'already know' what the result will be?! That seems like a scenario that would be highly problematic!
"BrainBench evaluates test-takers' ability to predict neuroscience results. BrainBench’s test cases were sourced from recent Journal of Neuroscience abstracts across five neuroscience domains: behavioural/cognitive, systems/circuits, neurobiology of disease, cellular/molecular and developmental/plasticity/repair. Test-takers chose between the original abstract and one altered to substantially change the result while maintaining coherency. Human experts and LLMs were tasked with selecting the correct (that is, original) version from the two options. Human experts made choices and provided confidence and expertise ratings in an online study. LLMs were scored as choosing the abstract with the lower perplexity (that is, the text passage that was less surprising to the model), and their confidence was proportional to the difference in perplexity between the two options." They are not "predicting experimental results in advance of experiments." They are deciding which of two abstracts of the research is most closely aligned with the research itself. You are again, misrepresenting the results.
Here is the link to the paper (extracted by ChatGPT from the screenshot): https://doi.org/10.1038/s41562-024-02046-9
Very exciting, but it also raises the concern of 'optimising towards the known'. AIs are great at predicting future results based on what's already happened, but many of the biggest discoveries in human history happened by accident. We don't want to optimise all of the randomness out of research, because that's often where the magic happens.
So I guess we can have LLMs tell us what experiments to run. That might be a very good thing, right?
The most shocking aspect is that it's just a 7B model. This study suggests it can predict results better than human experts, and a recent MIT study indicates that AI was basically taking over the idea-generation and conception part. This is perfect, as it means humans can focus on embellishing studies.
How might the observed predictive prowess of these fine-tuned LLMs, especially Mistral 7B, in forecasting neuroscience experiment outcomes be systematically leveraged to enhance experimental design and hypothesis generation in other scientific domains? Given Professor Mollick's expertise in Co-Intelligence, this question probes the implications of technological synergy with human cognition, and how such integration might reshape traditional methodologies across diverse fields.
Impressive findings! What methodology did the study employ to determine the predictive accuracy of LLMs compared to neuroscience experts? https://lnkd.in/ghrd-r3R
I have also found very powerful predictive capability from Gen AI.
It's not hard to outperform humans. Pigeons outperformed human radiologists in 2015. https://www.nytimes.com/2015/11/25/science/pigeons-detect-breast-cancer-tumors.html