Khaled Elleithy’s Post

Dean, College of Engineering, Business, and Education at University of Bridgeport

5mo

Hot off the press: our new MDPI journal paper on distinguishing between human-generated text and AI-generated text, co-authored with my Ph.D. student Hamed Alshammari. Current AI detection systems often struggle to distinguish between Arabic human-written text (HWT) and AI-generated text (AIGT) because of diacritics, the small marks written above and below Arabic letters. This study introduces robust Arabic text detection models using Transformer-based pre-trained models, specifically AraELECTRA, AraBERT, XLM-R, and mBERT. While our focus was on the Arabic language due to its writing challenges, the model architecture is adaptable to any language.

Toward Robust Arabic AI-Generated Text Detection: Tackling Diacritics Challenges

mdpi.com
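
To make the diacritics issue concrete, here is a minimal Python sketch that strips Arabic diacritics by dropping Unicode nonspacing marks. It is only an illustration and is not taken from the paper: the function name strip_diacritics and the sample word are assumptions, and the authors' own preprocessing may differ.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop Unicode nonspacing marks (category Mn), which include Arabic tashkeel.

    No NFD normalization is applied on purpose: decomposing would also split
    letters such as alef-with-hamza, and the hamza mark would then be removed.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# Illustrative sample: the word "kataba" written with three fatha marks.
diacritized = "\u0643\u064E\u062A\u064E\u0628\u064E"
plain = strip_diacritics(diacritized)

# The same word is 6 code points with diacritics and 3 without.
print(len(diacritized), len(plain))
```

The two strings read as the same word to a person but are different code-point sequences, which is one plausible reason a detector trained mostly on undiacritized text handles diacritized input poorly.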

13 Comments

Reima Almajdoub

Sabha University

5mo

Good luck 🍀

Dr. Jamil Bakhashwain

Associate Professor at PMU

5mo

Many congratulations, Dr. Khaled, with my best wishes for your continued success.


Laiali Almazaydeh

Professor at The American University in the Emirates

5mo

Prof. Khaled Elleithy, I firmly believe that the best research I have conducted was achieved under your supervision.

Brahim BELAHCENE

Doctor of Science & Industrial sector & Materials Science, Energy, IA, Peer Review **********

5mo

Congratulations to you, and wishing you even more success.

Tim Raynor DBA

Strategic Leader, Builder of Innovative Programs and Collaborative Teams

5mo

Congratulations!!!

Mike Barlow

Award-winning author, editor and ghostwriter

5mo

Excellent! Congratulations!


More Relevant Posts

Benjamin Naderi

Building and Deploying Machine Learning Solutions at Scale
1mo
I often get questions from colleagues and friends about why large language models differ so significantly from traditional approaches in handling natural language tasks. In simple terms, large language models can better understand language by focusing on the most important parts of a sentence, leading to more efficient and accurate models 🎯 Traditional models, on the other hand, usually process language word by word or in fixed chunks, without paying attention to the relationships between words in a flexible way. This often leads to less accurate understanding of context. This article (from 2017!) does a refreshing deep dive into the concepts that have shaped modern NLP. It’s fascinating how this groundbreaking paper has laid the foundation for the language models we use today. arxiv.org/abs/1706.03762 #MachineLearning #LanguageModels

Attention Is All You Need

arxiv.org

Antonio Montano 🪄

Delivering perpetual agility via technology ✨
6mo
"You Need to Pay Better Attention: Rethinking the Mathematics of Attention Mechanism" by Mehran Hosseini and Peyman Hosseini.
Abstract: Scaled Dot Product Attention (SDPA) is the backbone of many modern deep-learning models. It is so versatile that it has been used in natural language, vision, and multi-modal domains with very little change compared to its original formulation. This paper discusses why the current formulation is inefficient by delving into the mathematical details of the attention mechanism. We propose three improvements to mitigate these inefficiencies, thereby introducing three enhanced attention mechanisms: Optimised, Efficient, and Super Attention. Optimised and Efficient Attention have one and two matrix multiplications fewer per head, respectively, and 25% and 50% fewer parameters, respectively, than standard SDPA, but perform similarly to standard SDPA in both vision and natural language tasks. They can be used in all applications where SDPA is used while offering smaller model sizes and faster training and inference without noticeable loss in performance. Super Attention introduces a new linear transformation on the values, transforming them from the left. It outperforms standard SDPA on vision and natural language tasks by up to 17% while having one fewer matrix multiplication per head and 25% fewer parameters than standard SDPA. Consequently, it is also faster than standard SDPA. Super Attention is ideal in applications where the attention layer's context length is fixed, such as Vision Transformers. In addition to providing mathematical reasoning, we evaluate the presented attention mechanisms on several datasets including MNIST, CIFAR100, ImageNet, IMDB Movie Reviews, and Amazon Reviews datasets, as well as combined Europarl and Anki English-Spanish datasets for neural machine translation.
👉 https://lnkd.in/d7AmyT7s #machinelearning
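
For readers who want the baseline in front of them: standard SDPA, as defined in "Attention Is All You Need", computes softmax(Q K^T / sqrt(d_k)) V. The pure-Python sketch below shows only that baseline, not the Optimised, Efficient, or Super Attention variants the abstract proposes. The function names are illustrative assumptions, and the learned projections that produce Q, K, and V are left out.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Standard SDPA for one head; Q, K, V are lists of row vectors."""
    d_k = len(K[0])
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = [
        [sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in K]
        for q_row in Q
    ]
    # Each row of weights is a probability distribution over the keys.
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted average of the value vectors.
    output = [
        [sum(w * v_row[j] for w, v_row in zip(w_row, V)) for j in range(len(V[0]))]
        for w_row in weights
    ]
    return output, weights


# Both keys are identical, so each query attends to them equally.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
output, weights = scaled_dot_product_attention(Q, K, V)
print(weights, output)
```

The matrix products in this baseline (forming the scores and applying the weights to V, plus the omitted projections) are the operations whose count the abstract says the proposed variants reduce.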
Alex Donkers

Postdoc FireBIM at TU/e
3mo
🔥 FireBIM update! The FireBIM project aims to digitize fire regulations across European borders. Ekaterina Petrova and I ran some first tests in which we used NLP techniques to automatically convert regulations written in natural language into SHACL shapes. The work also contains some first insights into the FireBIM ontology stack. We aim to have a European ontology with all country-specific fire safety terms and insights that bridges terms in regulations with information in BIM models. Please comment, share ideas and critiques, and reuse if you work on similar topics! 📄 Converting Fire Safety Regulations to SHACL Shapes Using Natural Language Processing 🔗 https://lnkd.in/e3UjrAYb I presented this at the 3rd International Workshop on Natural Language Processing for Knowledge Graph Creation, held at the SEMANTiCS 2024 conference. Thanks to Associate Professor Edlira Kalemi Vakaj and others for organising this! #NLP4KGC #SEMANTiCS2024 #SEMANTiCS #FireBIM

Ferhat Ozgur Catak

Assoc. Professor at the University of Stavanger | IEEE Norway ComSoc Chair
5mo
Our latest research paper on arXiv: "Uncertainty Quantification in Large Language Models Through Convex Hull Analysis". In this study, we introduce a novel geometric approach to quantify uncertainty in Large Language Models (LLMs) by leveraging convex hull analysis. Our method provides deeper insights into the reliability of LLM outputs, particularly crucial for high-risk applications.
🔍 Key Highlights:
- Geometric Approach: Utilizes convex hull analysis to measure dispersion and variability of model outputs.
- Prompt Categorization: Analyzes responses to easy, moderate, and confusing prompts.
- Temperature Settings: Explores the impact of different temperature settings on model uncertainty.
📄 Check out the full preprint here: https://lnkd.in/d9UYqJin
TL;DR: Introducing a novel method for uncertainty quantification in LLMs using convex hull analysis. This approach helps measure the reliability of LLM outputs across different prompt complexities and temperature settings.
#AI #MachineLearning #Research #LanguageModels #UncertaintyQuantification #arXiv #NLP #ConvexHull

Uncertainty Quantification in Large Language Models Through Convex Hull Analysis

arxiv.org
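
As a rough illustration of the geometric idea, the sketch below measures how widely a set of responses is spread by taking the area of their convex hull. It assumes each response has already been embedded and projected to a 2D point. That step is not shown, and the sample coordinates, function names, and use of hull area as the score are all assumptions for illustration, not the preprint's exact procedure.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_area(points):
    """Area of the convex hull (shoelace formula); 0.0 for degenerate sets."""
    hull = convex_hull(points)
    n = len(hull)
    if n < 3:
        return 0.0
    twice_area = sum(
        hull[i][0] * hull[(i + 1) % n][1] - hull[(i + 1) % n][0] * hull[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


# Hypothetical 2D projections of responses to the same prompt.
consistent = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
scattered = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]
print(hull_area(consistent), hull_area(scattered))
```

Under these assumptions, a larger hull area means the responses are more dispersed, which is the kind of signal the post describes using as an uncertainty measure.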
Pavan Belagatti

GenAI Evangelist | Developer Advocate | Tech Content Creator | 30k Newsletter Subscribers | Empowering AI/ML/Data Startups
1mo
It is time to migrate from traditional #RAG to advanced RAG systems. Traditional RAG systems often struggle with context retention, accuracy, and the ability to handle complex queries. Advanced techniques enhance the robustness and efficiency of these systems, enabling them to generate more relevant and contextually aware responses. The integration of advanced techniques such as query transformation, semantic chunking, multi-stage retrieval, reranking, knowledge graph integration, and others into the RAG framework significantly enhances its capabilities. By employing these techniques, we can achieve more accurate, relevant, and contextually aware responses, paving the way for more sophisticated applications in natural language processing. As the field continues to evolve, ongoing research and development will further refine these techniques, ensuring that RAG remains at the forefront of AI-driven communication. Here are some advanced RAG techniques you should know: https://lnkd.in/gdyVqgaY Create advanced RAG applications with confidence: https://lnkd.in/gvWN5Wrk
Terence Nero

Transformational Cloud Architect | Application Architecture & Solution Design Expert | Thought Leader | Microservices & DevSecOps | Fullstack | Expert in Scalable Cloud-Native Solutions
1mo
Crisp and clear details of the various layers in an Advanced RAG #RAG #GenAI
Lingua Custodia

3,757 followers
5mo
🎉 Our second research paper for 2024! We are on a roll! Well done Gaëtan Caillaut, Mariam N., Jingshu Liu, and Raheel Qader 😍 The paper « Améliorer la traduction au niveau du document grâce au sur-échantillonnage négatif et au masquage ciblé » (Improve document-level machine translation using negative sampling and focused masking), published at the French conference JEP-#TALN, describes strategies we explored at Lingua Custodia to make the most of our sentence-level data. 👨‍🔬 Our research proposes alternative strategies for teaching neural networks to extract contextual information, which helps optimise the quality of the translated text. 👩‍🔬 We explored two methods: 1) negative sampling and 2) masking highly contextual words in the source sentence. Both improved the contextual awareness of the models. Our future research will explore the capacity of Large Language Models (LLMs) to handle large contexts, not only for translation but also for RAG systems. 🙌 We will fine-tune the best LLM on high-quality data and also leverage the European datacentre "Leonardo" (thank you AI BOOST 😊) to train, from scratch, a multilingual LLM based on the Mamba architecture. 💪 #rag #llm #research #ai #foundationmodel
Kirthika Rajaganesh

Software Engineer 2 at Microsoft IDC | ex-Honeywell | Masters of Data Science (Global) at Deakin University
5mo Edited
Hi connections, I am excited to share that I've successfully completed my capstone project in the AIML course, focusing on Machine Translation!
Domain: Machine Translation
Context: Machine Translation automates translating source material into another language, enabling communication and idea exchange across countries. The dataset for this project is sourced from the ACL2014 Ninth Workshop on Statistical Machine Translation.
Project Objective: Designing a Machine Translation model to translate sentences between German and English.
Models Built:
1. LSTM (Long Short-Term Memory)
2. LSTM with GloVe Embedding using Seq2Seq Architecture
3. Bidirectional RNN (Recurrent Neural Network) and LSTM
4. Bidirectional RNN and LSTM using Seq2Seq Architecture
The model that performed best in my tests was the Bidirectional RNN and LSTM using Seq2Seq Architecture.
Advanced Models: Utilized Hugging Face models: TheBloke/Llama-2 and DunnBC22/mbart-large-50-English_German_Translation, achieving notable results.
This project has been an incredible journey, enhancing my understanding and skills in machine translation and deep learning. Special thanks to the AIML course instructors and my peers for their guidance and support.
#MachineTranslation #DeepLearning #AIML #NaturalLanguageProcessing #LSTM #RNN #Seq2Seq #HuggingFace #AI #CapstoneProject

Kirthika Rajaganesh has successfully completed a project on Capstone Project - AIML as a part of PGP-AIML-Online at Great Learning

olympus.mygreatlearning.com

Sameera Horawalavithana

Staff Data Scientist at Pacific Northwest National Laboratory, Ph.D.
9mo
📣 ATLANTIC: Structure-Aware Retrieval-Augmented Language Model for Interdisciplinary Science
LLMs have shown impressive performance on many NLP tasks. However, their knowledge capacity is still limited to what was present in their pretraining corpus. Retrieval-Augmented Generation (RAG) has emerged as an effective solution by retrieving relevant context from external knowledge to complement the LLMs. But existing techniques ignore the structural relationships between documents in the corpus. To address these issues, we proposed a novel structure-aware retrieval-augmented LLM that incorporates document structure. First, we constructed a heterogeneous document graph capturing multiple relationship types (e.g., citation, co-authorship) between scientific documents across 15+ scientific disciplines (e.g., Physics, Medicine, Chemistry). Then a GNN was trained to encode structural information about relevant retrieved passages. Along with passage text embeddings, structural embeddings of documents/passages are obtained and fused before feeding to the LLM. Extensive evaluation was done on scientific question answering and document classification tasks. Results show that modeling structure enables retrieving more coherent, faithful, and contextually relevant passages, while maintaining overall accuracy. The key innovations are leveraging knowledge structure with document graphs and GNNs, advancing retrieval augmentation for scientific applications, and demonstrating improved coherence and faithfulness of retrieved passages. This research moves towards more robust and reliable augmented language models for complex scientific tasks.
Paper: https://lnkd.in/eyTpQ7eF Poster: https://lnkd.in/eXAPqMkq Published and featured at 3rd Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE) https://lnkd.in/eTSFXfKS Association for the Advancement of Artificial Intelligence (AAAI) Incredibly proud of the work done by Sai Munikoti and the team Anurag Acharya Sridevi Wagle #AI #LLM #retrieval #science
Sudatta Jana

ASE at Tech Mahindra • KIIT University, Bhubaneswar - 24'
4mo Edited
I'm thrilled to announce that our paper, "Language Detection Using Machine Learning", has been published by IEEE! In this research, my co-authors Amlan Nayak, Debapam Pal, Amiya Ranjan Panda, Manoj Kumar Mishra and I explored the effective integration of machine learning models (K-nearest neighbour, Decision Tree, Random Forest, Multinomial NB, Logistic Regression, Extra Trees Classifier, Support Vector Machines, Ridge Classifier, SGD Classifier) for language detection. We observed that Multinomial NB gives the highest accuracy, 0.981. This approach harnesses the strengths of natural language processing to accurately identify the language of a given text, leveraging contextual understanding and linguistic patterns. Our advanced model showcases significant improvements in accuracy and efficiency over existing methods, making strides in the field of language detection. Thank you to the IEEE and the 2023 OITS International Conference on Information Technology (OCIT) for this incredible opportunity! Link to the paper: https://lnkd.in/gjShRJDn #MachineLearning #LanguageDetection #K_nearestneighbour #DecisionTree #ExtraTreesClassifier #SupportVectorMachines #RidgeClassifier #SGDClassifier #AI #Research #IEEE #Publication #IEEEConference
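
To give a feel for how a Multinomial Naive Bayes language detector works, here is a toy pure-Python sketch using character bigram counts with add-one smoothing. It is not the paper's pipeline: the class name TinyMultinomialNB, the bigram features, and the four training sentences are all assumptions made for illustration, and the 0.981 accuracy above has nothing to do with this toy.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Lowercase the text and return its overlapping character bigrams."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]


class TinyMultinomialNB:
    """Hypothetical minimal Multinomial NB over character bigram counts."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)   # documents per language (for priors)
        self.gram_counts = {}               # bigram counts per language
        for text, label in zip(texts, labels):
            self.gram_counts.setdefault(label, Counter()).update(char_bigrams(text))
        self.vocab = set().union(*self.gram_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counts in self.gram_counts.items():
            n_grams = sum(counts.values())
            # Log prior plus add-one smoothed log likelihood of each bigram.
            score = math.log(self.doc_counts[label] / total_docs)
            for gram in char_bigrams(text):
                score += math.log((counts[gram] + 1) / (n_grams + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Illustrative training data only; a real detector needs far more text.
texts = [
    "the cat sat on the mat",
    "where is the train station",
    "die katze sitzt auf der matte",
    "wo ist der bahnhof",
]
labels = ["en", "en", "de", "de"]
model = TinyMultinomialNB().fit(texts, labels)
print(model.predict("the station"), model.predict("der bahnhof"))
```

Character n-grams are a common feature choice for this task because short letter sequences such as "th" or "der" are strongly language-specific. Whether the paper used them is not stated in the post.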
