AI4Bhārat’s Post

AI4Bhārat

13,712 followers

1mo

🚨🚨 New Indic Language LLM Evaluation Benchmark! 🚨🚨

🎉 Excited to share our latest work: MILU - A Multi-task Indic Language Understanding Benchmark, done as a collaboration between AI4Bhārat and IBM Research India, as part of The AI Alliance 🌏🤝.

Key Highlights:
• 85K multiple-choice questions in 11 Indian languages.
• Spanning 8 diverse domains and more than 40 subjects.
• Built with an India-centric approach, evaluating both general and cultural knowledge.

We also evaluated 40+ different LLMs (proprietary, open-source, and Indic language-specific). Our key findings:
• GPT-4o achieved the highest accuracy at 72%.
• Open LLMs (like Llama3.1, Gemma, etc.) outperform Indic-language fine-tuned LLMs.
• Models struggle more with culturally relevant domains than with STEM.

We hope this benchmark helps drive the development of more culturally aware and linguistically competent AI systems for India's 1.4B+ people.

Paper 📄: arxiv.org/abs/2411.02538
Code 💻: https://lnkd.in/gDhnRsf5
Dataset 🤗: https://lnkd.in/gjruEKjP

Work done by: Sshubam Verma Mohammed Safi Ur Rahman Khan Vishwajeet Kumar, PhD Rudra Murthy Jaydeep Sen

#AI4Bharat #IBMResearch #NLP #AI #IndianLanguages #Benchmark #LLMs #Evaluation #LLMevaluation #AIAlliance
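For readers wondering how headline numbers like the 72% accuracy or the "cultural domains vs STEM" gap are typically computed, here is a minimal sketch of per-group accuracy on multiple-choice predictions. The record fields (`language`, `domain`, `answer`, `prediction`) and the toy rows are assumptions made up for illustration; they are not MILU's actual schema, data, or evaluation code.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Return {group: accuracy}, grouping MCQ records by the field `key`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        group = record[key]
        total[group] += 1
        # A prediction counts as correct only if it matches the gold option.
        if record["prediction"] == record["answer"]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Hypothetical toy records; the field names are illustrative, not MILU's schema.
records = [
    {"language": "hi", "domain": "STEM", "answer": "A", "prediction": "A"},
    {"language": "hi", "domain": "Arts & Humanities", "answer": "C", "prediction": "B"},
    {"language": "ta", "domain": "STEM", "answer": "D", "prediction": "D"},
    {"language": "ta", "domain": "Arts & Humanities", "answer": "B", "prediction": "B"},
]

by_domain = accuracy_by_group(records, "domain")
by_language = accuracy_by_group(records, "language")
print(by_domain)
print(by_language)
```

Grouping by `domain` or by `language` with the same function makes it easy to compare, say, STEM against culturally grounded subjects for a single model.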

MILU: A Multi-task Indic Language Understanding Benchmark

arxiv.org

9 Comments

Rushi Tulasi

Research @proshort | finetuning, RAGs, KG, Agentic.

1mo

\usepackage[backend=biber, style=numeric-comp, maxcitenames=1, maxbibnames=2]{biblatex}

Pursottam Sah

1mo

Always there is love and respect for AI4Bhārat 🤩 Congratulations Team

Vishnu Prasad J (ヴィシュヌ)

Artificial Intelligence Engineer at Examroom.ai | Ex Quest Global [Canon Medicals, Japan]

1mo

Sagar Sarkale

Krishnanjaneyulu Payala

Research Scholar- IIIT- Kottayam, Professional Member- ACM, ACM Anveshan Setu Fellow (2024- 25), ISRO (IIRS) Outreach Coordinator, Member (97999340) Software Defined Networks Community, IEEE

1mo

Excellent work

Zuhair hasan Shaik

Modelling (LLMs)

1mo

MSVPJ Sathvik

Rishav Dash

Data Science @ Xelpmoc Design and Tech Ltd || Prev AI @ Squareyards @iNeuron.ai || Kaggle Competition Expert || AI for Good @Omdena|| RAITian

1mo

Wasim Madha Aravind Selvam

Lodem Rakesh

1mo

Congratulations AI4bharat

More Relevant Posts

Jakub Kosterna

Senior Machine Learning Specialist & AI Researcher | Natural Language Processing Engineer and Data Science Expert
2mo Edited
I’m honored to begin work on my thesis for my second degree, in #CognitiveScience, under the supervision of Professor Justyna Grudzinska. Our research will focus on how #LanguageModels leverage world knowledge when interpreting ambiguous sentences, specifically in the context of Quantifier Scope Disambiguation 🧠

The work builds on the article “Scope Ambiguities in Large Language Models”, which proposes that language models may rely on world knowledge to resolve scope ambiguities. We aim to investigate this hypothesis in more detail and analyze what kind of knowledge these models utilize. 📊

We’ll be working with a large dataset of scope-ambiguous sentences, supplemented by human judgment data, and plan to significantly expand this dataset to deepen the #research. 📚

Our #AIResearch could help shed light on how language models can more effectively interpret nuanced linguistic structures, which is a critical challenge in modern #NLP systems. The findings may have implications for improving #AI applications in fields such as automated reasoning, translation, and conversational agents.

Eager to dive into this research and uncover new insights! 🔍

#Linguistics #ResearchProject #LLM #MachineLearning #ComputationalLinguistics #DataScience #NaturalLanguageProcessing

Scope Ambiguities in Large Language Models

direct.mit.edu

5 Comments
Arvind S.

Generative AI Strategist | Sr Director | 📊 Data & Analytics | 🤖 | 📖 Story Teller 🚀✨
1mo Edited
Unlocking the Power of Language: Bridging the Script Gap in LLMs 🗣️🌎

Multilingual LLMs are amazing, but they struggle with non-Latin-script languages like Arabic and Chinese 😕. Why? Because they are trained largely on Latin-script text, ignoring the sounds shared across scripts! The authors of this paper have a solution! 💡 They propose using phonemic transcriptions, the sounds of words, as a complementary signal to create script-invariant representations.

Here's how it works:
☑️ Enhanced Performance: This approach improves performance for both Latin and non-Latin languages! 🎉
☑️ Closing the Gap: It significantly reduces the performance gap between the two script families! 🤝
☑️ Mixed-ICL Retrieval: A novel retrieval strategy combines phonemic and orthographic information to boost performance even further! 🚀

Further research shows that phonemic transcriptions unlock a new level of language understanding for LLMs, leading to more equitable and robust performance across the globe! 🌎

Read the full article here: https://lnkd.in/gF-zAbWR

#phonemes, #multilingual, #llm, #nlp, #research, #language, #script, #ai, #innovation, #technology, #academia, #ltimindtree, #genai, #generativeai, #aiml, #trends

1 Comment
Sai Ruthvik

Sharing my observations | Startup Enthusiast | ML enthusiast | Data Science from IIT Madras |
1mo
I recently came across a fascinating paper on MILU (Multi-task Indic Language Understanding Benchmark), a collaborative effort between AI4Bhārat and IBM Research India as part of The AI Alliance. This benchmark aims to enhance the evaluation of large language models (LLMs) specifically for Indian languages, and I believe it’s a significant step forward in our understanding of AI capabilities in diverse linguistic contexts.

Key Highlights:
- 85K Multiple Choice Questions: Covering 11 Indian languages, this benchmark spans 8 diverse domains and over 40 subjects.
- Culturally Relevant Evaluation: MILU is designed with an India-centric approach, assessing both general knowledge and cultural insights.
- Extensive Model Evaluation: Over 40 different LLMs were evaluated, including proprietary and open-source models. Notably:
  - GPT-4o achieved the highest accuracy at 72%.
  - Open LLMs like Llama3.1 outperformed Indic language fine-tuned models.
  - Models faced more challenges in culturally relevant domains compared to STEM subjects.

This benchmark not only highlights the performance of existing models but also emphasizes the need for AI systems that are culturally aware and linguistically competent for India's diverse population of over 1.4 billion people.

📄 For those interested in diving deeper, check out the full paper here: https://lnkd.in/gwEBeBCt
💻 Explore the code: https://lnkd.in/gDhnRsf5
📊 Access the dataset: https://lnkd.in/gjruEKjP

I’d love to hear your thoughts on how benchmarks like MILU can drive advancements in AI for multilingual contexts! What are your experiences with language models in diverse cultural settings?

#AI #LanguageProcessing #IndicLanguages #Benchmark #Research #IBM #AI4Bharat #NaturalLanguageProcessing

MILU: A Multi-task Indic Language Understanding Benchmark

arxiv.org
Alessandro Scirè

Ph.D. Student in AI @Sapienza NLP | NLP Researcher @Babelscape
4mo
📢#MetricAlert Today we will present FENICE, our factuality-oriented metric for summarization with a strong focus on interpretability. See you at the poster session 12:15 ⌚ Karim Ghonim, Roberto Navigli Sapienza NLP Babelscape #ACL2024 #Bangkok #Summarization #Factuality #NLProc

Sapienza NLP

1,657 followers
4mo

Great News 📢 Our paper "FENICE: Factuality Evaluation of Summarization based on Natural Language Inference and Claim Extraction" has been accepted at ACL 2024!

LLMs have brought exciting strides in text summarization, but verifying the factual accuracy of their summaries remains an open challenge. We introduce FENICE, a summarization factuality metric with a strong focus on interpretability. FENICE leverages NLI-based alignments to match claims extracted from the summary with specific passages in the source document. This allows us to pinpoint the sections that support or contradict each claim 🔍

FENICE achieves state-of-the-art performance on AGGREFACT, and excels in our human-annotated dataset for long-form summarization. 📊✨

👥 Authors: Alessandro Scirè, Karim Ghonim, Roberto Navigli
🤝 Joint work between Babelscape and Sapienza NLP.
📝 Paper: https://lnkd.in/dkrfiXeZ
💻 Code: https://lnkd.in/dYWvCujZ
🤗 Data: https://lnkd.in/dUAZyxFT

#NLP #Summarization #Factuality #SummarizationEvaluation #MachineLearning #SapienzaNLP #Babelscape #Research
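To make the claim-to-passage alignment idea concrete, here is a rough sketch, not FENICE's implementation. It assumes the claims have already been extracted from the summary, and it swaps the NLI model for a toy token-overlap score (`toy_entailment_score`, a hypothetical stand-in) so the example runs without any model download. All function names and example sentences are made up for illustration.

```python
def toy_entailment_score(passage, claim):
    """Stand-in for an NLI model: fraction of claim tokens found in the passage.

    This is NOT natural language inference; it only illustrates the interface
    (passage, claim) -> score that a real NLI scorer would provide.
    """
    passage_tokens = set(passage.lower().split())
    claim_tokens = claim.lower().split()
    if not claim_tokens:
        return 0.0
    return sum(token in passage_tokens for token in claim_tokens) / len(claim_tokens)


def align_claims(claims, passages, score=toy_entailment_score):
    """For each claim, return (claim, index of best passage, score of that passage)."""
    alignments = []
    for claim in claims:
        best_idx = max(range(len(passages)), key=lambda i: score(passages[i], claim))
        alignments.append((claim, best_idx, score(passages[best_idx], claim)))
    return alignments


# Hypothetical source passages and summary claims.
passages = [
    "the company reported record revenue in 2023",
    "its chief executive resigned in march",
]
claims = [
    "the chief executive resigned",
    "revenue fell in 2023",
]

alignments = align_claims(claims, passages)
for claim, idx, value in alignments:
    print(f"{claim!r} -> passage {idx} (score {value:.2f})")
```

Note that the second claim contradicts its best-matching passage yet still scores highly under token overlap. That is exactly the case where a real NLI model, which distinguishes entailment from contradiction, is needed.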

FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction

arxiv.org
Umar Iftikhar

Computer Vision Engineer | Data Scientist | AI Specialist | Revolutionizing Real-Time Analytics and Automation Technologies
1w
Multilingual Datasets Are Evolving: Hugging Face's FineWeb2

Hugging Face has unveiled FineWeb2, an advanced 8TB dataset featuring over 3 trillion non-English words spanning 1,893 languages. This release marks a significant leap in the availability and quality of multilingual datasets for machine learning and natural language processing.

Key Highlights:
- Extensive Language Support: FineWeb2 includes 1,893 languages, with 486 languages surpassing 1MB of data and 80 languages exceeding 1GB.
- Performance Excellence: Outperforms existing multilingual datasets like CC-100, mC4, CulturaX, and HPLT across a wide range of languages.
- Robust Construction: The dataset is built from 96 CommonCrawl snapshots (2013–2024) using advanced filtering and cleaning mechanisms.
- Ablation Studies: Hundreds of experiments were conducted on 1.45B parameter models trained with 30B tokens, ensuring the dataset's efficacy.
- Innovative Techniques: Employs deduplication via "re-hydration," language-specific filtering, and PII anonymization while providing opt-out options.
- Specialized Evaluation: Assessed on nine diverse languages using the FineTasks benchmark, demonstrating superior performance.

This release represents a major step forward for multilingual and low-resource language research. Researchers and practitioners can explore FineWeb2's data and filtering configurations, which are openly available.

Learn more and access the dataset here: https://lnkd.in/eSpRCSJE

#AI #MachineLearning #NLP #MultilingualAI #FineWeb2 #NaturalLanguageProcessing #DataScience #CommonCrawl #MultilingualDatasets #HuggingFace #LanguageModeling #OpenSourceAI #DataEngineering #DeepLearning #LowResourceLanguages #GlobalAI #AIDevelopment
Asif Razzaq

AI Research Editor | CEO @ Marktechpost | 1 Million Monthly Readers and 56k+ ML Subreddit
2mo
OpenAI Releases Multilingual Massive Multitask Language Understanding (MMMLU) Dataset on Hugging Face to Easily Evaluate Multilingual LLMs

OpenAI released the Multilingual Massive Multitask Language Understanding (MMMLU) dataset on Hugging Face. As language models grow increasingly powerful, the necessity of evaluating their capabilities across diverse linguistic, cognitive, and cultural contexts has become a pressing concern. OpenAI’s decision to introduce the MMMLU dataset addresses this challenge by offering a robust, multilingual, and multitask dataset designed to assess the performance of large language models (LLMs) on various tasks.

This dataset comprises a comprehensive collection of questions covering various topics, subject areas, and languages. It is structured to evaluate a model’s performance on tasks that require general knowledge, reasoning, problem-solving, and comprehension across different fields of study. The creation of MMMLU reflects OpenAI’s focus on measuring models’ real-world proficiency, especially in languages that are underrepresented in NLP research. Including diverse languages helps verify that models are effective not only in English but also in other languages spoken globally...

Read the full article here: https://lnkd.in/gnTvTVb8
Dataset: https://lnkd.in/g5RzKd_k

OpenAI #dataset #opensource
2 Comments
Ganesh Jagadeesan

Enterprise Data Science Specialist @Mastech Digital | NLP | NER | Deep Learning | Gen AI | MLops
3mo
🌍 Low-resource NER is definitely a tough nut to crack! 🤖 You’ve perfectly captured the core challenges of Named Entity Recognition for low-resource languages—from data scarcity to the complexities of linguistic diversity. The struggle is real, especially when dealing with languages that lack standardized resources or pre-trained models like BERT for English. The fact that each language comes with its own unique orthographic quirks and syntactic rules just adds to the complexity. 1️⃣ Data scarcity is the biggest roadblock—getting annotated data for languages like Quechua or Yoruba is nearly impossible. And when you factor in the need for linguistically diverse data, it’s clear that one-size-fits-all approaches used for English just don’t work for agglutinative or tonal languages. 2️⃣ Pre-trained models for these languages? Forget about it! Starting from scratch can feel like reinventing the wheel, but luckily, transfer learning from related languages can give you a head start. 3️⃣ Active learning is another promising approach. With limited data, every annotation counts, and active learning can help you make the most of what little you have. 4️⃣ Unsupervised methods like clustering for pseudo-labeling can be a game-changer when annotating is costly or impossible. 5️⃣ While multilingual models aren’t perfect, they offer a starting point for low-resource languages. Models like mBERT or XLM-R can provide some support, even if they don’t match the performance of language-specific models. 6️⃣ Finally, community involvement is key—crowdsourcing annotations from native speakers can help build a stronger dataset. The more we can engage communities to help annotate and refine NER tasks, the closer we get to solving this challenge. It’s a tough space to work in, but with techniques like transfer learning, active learning, and community engagement, there’s a lot of potential to make meaningful progress. 
I’d love to hear how others are tackling these challenges—any creative solutions out there? #NLP #NamedEntityRecognition #LowResourceLanguages #AIchallenges #TransferLearning #ActiveLearning #UnsupervisedLearning #mBERT #Crowdsourcing #Linguistics

Pallavi Saxena

NLP Visionary | AI Powerhouse | Automating Tasks with Predictive Modeling & NLP Solutions
3mo

🌍 NER for Low-Resource Languages: The Struggle is Real

Knee-deep in the low-resource NER trenches, I've hit some serious roadblocks. Here's the unvarnished truth:
• Data scarcity is a killer. Good luck finding large annotated datasets for Quechua or Yoruba!
• Linguistic diversity = headache. What works for English often fails for agglutinative or tonal languages.
• Lack of pre-trained models. No BERT for Inuktitut? You're starting from scratch, pal.
• Orthographic nightmares. Non-standardized spellings? Welcome to the wild west of NER.
• Limited language expertise. Finding annotators who speak the language AND understand NLP? Unicorn hunt.

But there's hope! Some tricks up my sleeve:
• Transfer learning from related languages. It's not perfect, but it's a start.
• Active learning to make the most of limited data.
• Unsupervised techniques like clustering for pseudo-labeling.
• Leveraging multilingual models. They're not ideal, but beggars can't be choosers.
• Community involvement. Crowdsourcing can work wonders.

What's your experience with low-resource NER? Any clever hacks to share?

#NLP #NamedEntityRecognition #LowResourceLanguages #AIchallenges
Amit Singh

Senior Data Scientist | LLM | GenAI | Artificial intelligence | Deep learning | Computer vision | Forecasting
2mo
OpenAI Releases Multilingual Massive Multitask Language Understanding (MMMLU) Dataset on Hugging Face to Easily Evaluate Multilingual LLMs

As language models grow increasingly powerful, the necessity of evaluating their capabilities across diverse linguistic, cognitive, and cultural contexts has become a pressing concern. OpenAI’s decision to introduce the MMMLU dataset addresses this challenge by offering a robust, multilingual, and multitask dataset designed to assess the performance of large language models (LLMs) on various tasks.

This dataset comprises a comprehensive collection of questions covering various topics, subject areas, and languages. It is structured to evaluate a model’s performance on tasks that require general knowledge, reasoning, problem-solving, and comprehension across different fields of study. The creation of MMMLU reflects OpenAI’s focus on measuring models’ real-world proficiency, especially in languages that are underrepresented in NLP research. Including diverse languages helps verify that models are effective not only in English but also in other languages spoken globally...

Dataset: https://lnkd.in/gCGgkXPN
The Year of the Graph

3,420 followers
4mo
Do LLMs Really Adapt to Domains? An Ontology Learning Perspective

Large Language Models (LLMs) have demonstrated unprecedented prowess across natural language processing tasks in a wide range of application domains. Recent studies show that LLMs can be leveraged to perform lexical semantic tasks, such as Knowledge Base Completion (KBC) or Ontology Learning (OL). However, it has not been effectively verified whether their success is due to their ability to reason over unstructured or semi-structured data, or to their effective learning of linguistic patterns and senses alone. This unresolved question is particularly crucial when dealing with domain-specific data, where the lexical senses and their meaning can completely differ from what an LLM has learned during its training stage.

New research investigates the following question: Do LLMs really adapt to domains and remain consistent in the extraction of structured knowledge, or do they only learn lexical senses instead of reasoning? To answer this question, a controlled experiment was set up using WordNet to synthesize parallel corpora, with English and gibberish terms. Differences in the outputs of LLMs for each corpus were examined in two OL tasks: relation extraction and taxonomy discovery.

Empirical results show that, while adapting to the gibberish corpora, off-the-shelf LLMs do not consistently reason over semantic relationships between concepts, and instead leverage senses and their frame. However, fine-tuning improves the performance of LLMs on lexical semantic tasks even when the domain-specific terms are arbitrary and unseen during pre-training, hinting at the applicability of pre-trained LLMs for OL.

The material is available on GitHub. Link in comments.

#LLM #AI #KnowledgeGraph #DataModeling #Research #EmergingTech #OpenSource
8 Comments
Amirhossein Abaskohi

Computer Science Masters Student at the University of British Columbia
8mo
Thrilled to announce that our paper, "Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT" is now available! 📝

Our paper, accepted at #LRECCOLING2024, delves into the efficacy of large language models (LLMs) for Persian, shedding light on their performance and potential in low-resource language settings. In this comprehensive study, we thoroughly evaluate LLMs such as GPT-3.5-turbo, GPT-4, and OpenChat-3.5 across various tasks in the Persian language domain. From classic to reasoning to knowledge-based tasks, we provide insights into their strengths and limitations, comparing them against task-specific fine-tuned models. Additionally, we introduce new benchmarks for reasoning tasks in Persian, contributing to the advancement of NLP research in this language.

📄 Paper: https://lnkd.in/gihR4UjM
💻 Code: https://lnkd.in/gvRP8TyD

#NLP #PersianLanguage #ChatGPT #LREC #COLING #LRECCOLING

Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT

arxiv.org

1 Comment
