Assessing the role of evolutionary information for enhancing protein language model embeddings https://lnkd.in/e8CTvk2h
Xavier BERTHET’s Post
-
Assessing the role of evolutionary information for enhancing protein language model embeddings ABSTRACT Embeddings from protein Language Models (pLMs) are replacing evolutionary information from multiple sequence alignments (MSAs) as the most successful input for protein prediction. Is this because embeddings capture evolutionary information? We tested various approaches to explicitly incorporate evolutionary information into embeddings on various protein prediction tasks. While older pLMs (SeqVec, ProtBert) significantly improved through MSAs, the more recent pLM ProtT5 did not benefit. For most tasks, pLM-based outperformed MSA-based methods, and the combination of both even decreased performance for some (intrinsic disorder). We highlight the effectiveness of pLM-based methods and find limited benefits from integrating MSAs. PAPER: https://lnkd.in/d4TViaKY CODE: https://lnkd.in/dMUWiF5Q
Assessing the role of evolutionary information for enhancing protein language model embeddings - Scientific Reports
nature.com
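The paper's question is whether explicitly adding MSA-derived evolutionary information to pLM embeddings helps. A minimal sketch of one such combination is simple feature concatenation: per-residue embedding vectors joined with a per-residue amino-acid frequency profile from an MSA. The shapes and feature sizes below are illustrative assumptions, not the paper's exact configuration.

```python
def combine_features(plm_emb, msa_profile):
    """Concatenate per-residue features.
    plm_emb: L x D list of embedding vectors (e.g. D=1024 for ProtT5);
    msa_profile: L x 20 amino-acid frequency profile derived from an MSA."""
    assert len(plm_emb) == len(msa_profile), "one feature row per residue"
    return [e + p for e, p in zip(plm_emb, msa_profile)]

# Toy protein of length 3 with D=4 embeddings and 2-state "profiles".
emb = [[0.1] * 4 for _ in range(3)]
prof = [[0.5, 0.5] for _ in range(3)]
combined = combine_features(emb, prof)
# Each residue now carries 4 + 2 = 6 features for the downstream predictor.
```

The paper's finding is that for ProtT5 this kind of enrichment adds little, and for some tasks (intrinsic disorder) it even hurts.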
-
Nathan Lambert wrote an article about Chatbot Arena with a lot of important observations. I am a proponent of side-by-side evaluation methods, which work well for many problems. The article is not about side-by-sides in general but about Chatbot Arena in particular, yet many of its details apply generically to any side-by-side comparison: for example, how to interpret Elo points, and what they mean for understanding differences in model accuracy. One of the reasons I do not like the Elo system for side-by-side ranking is that Elo was designed for particular game tournaments, and model comparisons are very different in nature. Even for tournaments, many better rating models have been invented since Elo. Good reading. There was also a good article from Anthropic about arena evaluation.
ChatBotArena: The peoples’ LLM evaluation, the future of evaluation, the incentives of evaluation, and gpt2chatbot
interconnects.ai
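For readers unfamiliar with the mechanics being criticized: an Elo rating is updated after each pairwise comparison based on the gap between the actual outcome and the outcome the current ratings predicted. A minimal sketch (the K-factor and initial rating are illustrative assumptions, not Chatbot Arena's exact parameters):

```python
def expected_score(r_a, r_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update_elo(r_a, r_b, score_a, k=32.0):
    """Return updated ratings after one comparison.
    score_a is 1.0 if A wins, 0.0 if B wins, 0.5 for a tie."""
    e_a = expected_score(r_a, r_b)
    r_a_new = r_a + k * (score_a - e_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return r_a_new, r_b_new

# Two models start at 1000; model A wins one side-by-side vote.
ra, rb = update_elo(1000.0, 1000.0, 1.0)
print(ra, rb)  # 1016.0 984.0
```

Note that the final ratings depend on the order in which comparisons arrive, which is appropriate for human players whose strength changes over a tournament but is an awkward fit for static model checkpoints.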
-
Genetic algorithm with an LLM twist. Very good for generating ideas. Very bad at coming up with generalizable solutions. https://lnkd.in/dnFYUcDQ
Mathematical discoveries from program search with large language models - Nature
nature.com
-
Protein language models are trained on diverse sequences and materially impact the fields of sequence design, variant effect prediction and structure prediction. 💡 Why proprietary data is key: Access to proprietary data is key to unlocking the full potential of these language models. While public datasets have laid the groundwork, the deep insights into complex molecular interactions and pathways to novel treatments lie in the rich but guarded realm of proprietary data. 🔒 Empower drug discovery with Apheris: Our Compute Gateway (CG) is designed to provide governed, private and secure computational access to proprietary, sensitive drug discovery data. Through our federated learning backbone, data insights from multiple, distributed datasets are your new reality. 🔗 https://hubs.li/Q02lLYFy0 The next pharmaceutical breakthrough is just a collaboration away. #DrugDiscovery #AI #ProteinEngineering #DataPrivacy #PharmaInnovation
Nathan Benaich on LinkedIn: **Designing proteins with language models** Protein language models learn…
linkedin.com
-
The idea of automatically optimising LLM prompts leads to significant performance improvements and inspires researchers to find the best approach. Applying genetic algorithms to prompt optimization is a natural and brilliant next step. https://lnkd.in/dsq7EeNN
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
arxiv.org
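The core loop of this family of methods can be sketched as a standard evolutionary search in which an LLM plays the role of the mutation/crossover operator. In the toy version below the LLM call is stubbed with fixed string edits and the fitness function is a keyword-counting placeholder (the real method scores each prompt by task accuracy on a development set); all names here are illustrative, not the paper's API.

```python
import random

def fitness(prompt):
    # Placeholder score; the real method evaluates task accuracy on a dev set.
    text = prompt.lower()
    return sum(1 for w in ("step", "carefully", "explain") if w in text)

def mutate(prompt):
    # Stub standing in for an LLM-generated rewrite of the prompt.
    extras = ["Think step by step.", "Answer carefully.", "Explain your reasoning."]
    return prompt + " " + random.choice(extras)

def evolve(seed_prompts, generations=5, pop_size=4, rng_seed=0):
    """Mutate, score, and keep the top pop_size prompts each generation."""
    random.seed(rng_seed)
    pop = list(seed_prompts)
    for _ in range(generations):
        children = [mutate(random.choice(pop)) for _ in range(pop_size)]
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return pop[0]

best = evolve(["Solve the problem."])
```

The observation in the post above fits this structure: the LLM operator is excellent at proposing varied candidate prompts, but nothing in the loop guarantees the survivors generalize beyond the evaluation set used as fitness.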
-
💥💥💥 The Protein Language Visualizer: Sequence Similarity Networks for the Era of Language Models Abstract The advent of high-throughput sequencing technologies and the availability of biological “big data” has accelerated the discovery of new protein sequences, making it challenging to keep pace with their functional annotation. To address this annotation challenge, techniques such as Sequence Similarity Networks (SSNs) have been employed to visually group proteins for faster identification. In this paper, we present an alternative visual analysis tool that uses Protein Language Model (PLM) embeddings. Our PLVis pipeline employs dimensionality reduction algorithms to cluster similar sequences, enabling rapid assessment of proteins based on their neighbors. Through analysis using average Jaccard distance and cosine similarity metrics, we found that well-separated clusters (those with silhouette scores above 0.95) captured high-dimensional information better than other regions of the projection. While proteins in poorly defined “fuzzy” regions showed similar embeddings to those in neighboring clusters, we note that distances in these projections should not be directly interpreted. To make this pipeline accessible to a wider research community, we have created a Google Colab Notebook for the comparison of protein datasets. 👉 https://lnkd.in/dcDgkysJ #machinelearning
-
The Protein Language Visualizer: Sequence Similarity Networks for the Era of Language Models ABSTRACT The advent of high-throughput sequencing technologies and the availability of biological "big data" has accelerated the discovery of new protein sequences, making it challenging to keep pace with their functional annotation. To address this annotation challenge, techniques such as Sequence Similarity Networks (SSNs) have been employed to visually group proteins for faster identification. In this paper, we present an alternative visual analysis tool that uses Protein Language Model (PLM) embeddings. Our PLVis pipeline employs dimensionality reduction algorithms to cluster similar sequences, enabling rapid assessment of proteins based on their neighbors. Through analysis using average Jaccard distance and cosine similarity metrics, we found that well-separated clusters (those with silhouette scores above 0.95) captured high-dimensional information better than other regions of the projection. While proteins in poorly defined "fuzzy" regions showed similar embeddings to those in neighboring clusters, we note that distances in these projections should not be directly interpreted. To make this pipeline accessible to a wider research community, we have created a Google Colab Notebook for the comparison of protein datasets. PAPER: https://lnkd.in/d92ZGrJK
The Protein Language Visualizer: Sequence Similarity Networks for the Era of Language Models
biorxiv.org
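The abstract's evaluation idea, comparing neighbourhoods in the high-dimensional embedding space against neighbourhoods in the 2-D projection via Jaccard overlap, can be sketched in a few lines. The toy "embeddings" and the trivial projection (dropping the last coordinate) below are stand-ins for real pLM embeddings and a UMAP/t-SNE projection, and the setup is an assumption about the pipeline, not its exact implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def knn(points, i, k, sim):
    """Indices of the k most similar points to points[i] (excluding itself)."""
    scored = sorted(((sim(points[i], p), j) for j, p in enumerate(points) if j != i),
                    reverse=True)
    return {j for _, j in scored[:k]}

def jaccard(a, b):
    return len(a & b) / len(a | b)

# Four toy "proteins": two near (1,0,0), two near (0,1,0).
embeddings = [(1, 0, 0), (0.9, 0.1, 0), (0, 1, 0), (0, 0.9, 0.2)]
projection = [e[:2] for e in embeddings]  # stand-in for UMAP/t-SNE output

k = 1
overlaps = [jaccard(knn(embeddings, i, k, cosine), knn(projection, i, k, cosine))
            for i in range(len(embeddings))]
avg = sum(overlaps) / len(overlaps)  # 1.0: projection preserves neighbourhoods
```

High average overlap in well-separated clusters, and low overlap in "fuzzy" regions, is exactly the pattern the abstract reports; it also motivates the authors' warning not to interpret raw distances in the projection.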
-
Exploring Algorithmic Processes in Biological Evolution: Beyond Mutation and Natural Selection With GPT4... Hot off the presses. See it here. https://lnkd.in/edEWVWTz
Exploring Algorithmic Processes in Biological Evolution: Beyond Mutation and Natural Selection
https://lfyadda.com
-
With the release of the newest advance in protein language modeling, ESM3, and the staggering $142 million funding round announced by EvolutionaryScale, we're left wondering: does it warrant the hype? In this article, I delve into the details behind protein language models and explain why ESM3 represents a shift to a new paradigm of models that incorporate data across multiple scales of biological complexity. https://lnkd.in/gp_nRyq6
ESM3 and the Future of Protein Language Models
chrishayduk.com
-
Another great survey paper, this one about state-of-the-art agentic systems, which are rapidly evolving. I just got the book Agentic Systems… but, like the Deep Learning book, it seems… > we present this survey to offer an in-depth discussion on the essential aspects of multi-agent systems based on LLMs, as well as the challenges. Our goal is for readers to gain substantial insights on the following questions: What domains and environments do LLM-based multi-agents simulate? How are these agents profiled and how do they communicate? What mechanisms contribute to the growth of agents' capacities?
Large Language Model based Multi-Agents: A Survey of Progress and Challenges
arxiv.org