OCR* has been solved for Western/Latin script for over 20 years, yet it is still too inaccurate to use for handwritten Devanagari script. This keeps a huge digital divide in place, denying Indian farmers the benefits Western farmers enjoy, like yield prediction and loan calculation. This is a huge problem! So we're launching a new Challenge with Heifer International:

🇮🇳 AI for Indian Farmers - Handwritten Devanagari ✍️

🎯 The goal: develop Optical Character Recognition for handwritten Devanagari script to close the digital divide for Indian farmers.

💡 So why is this old machine learning problem still unsolved? While the Latin script has 26 letters and roughly 70 characters in total, including numbers and punctuation, Devanagari has thousands of combinations due to conjunct consonants, diacritics, and modifiers. As a cherry on top, letters are also joined together within a word.

Heifer Labs & Heifer India have provided us with large datasets of handwritten financial records from farmers to start solving this problem. We're looking for experts to help solve this Challenge. Apply & find more info here: https://lnkd.in/exEpF9q4

❣️ Big plus if you can read Devanagari ❣️

Technical objectives:
🖼️ Image preprocessing: improve the quality of the images so that OCR results are standardised and improved, and identify which techniques raise OCR quality (see the preprocessing sketch below).
#️⃣ OCR for Devanagari script: develop an OCR solution to extract both handwritten and typeset text from loan documents. The OCR system must handle the complexities of the Devanagari script effectively.
✅ Post-processing / quality control: assess the quality of the OCR outputs, make a best guess on any uninterpretable characters, and determine whether words are spelled correctly.

Topics that will be covered: #ComputerVision (e.g. OpenCV, PyTorch, TensorFlow), #DeepLearning methodologies, #CNN, #GAN (Generative Adversarial Networks), #RNN (Recurrent Neural Networks), #FeatureExtraction techniques, and #NLP

*OCR (Optical Character Recognition): a computer vision technique that turns images of letters and numbers into their digital equivalents

Vess Antoinette Buster Kishore Tanishka Juhee Reena Deepa
#devanagari #LLMs #digitaldivide #agritech
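To make the image-preprocessing objective concrete, here is a minimal OpenCV sketch of the kind of cleanup commonly applied before OCR on scanned handwriting. The file names and parameter values are illustrative assumptions, not part of the Challenge materials.

```python
# Illustrative preprocessing sketch for scanned handwritten records.
# File names and parameter values are placeholder assumptions.
import cv2
import numpy as np

def preprocess_for_ocr(path: str) -> np.ndarray:
    """Clean up a scanned document image before running OCR on it."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Remove scanner noise while preserving pen strokes.
    img = cv2.fastNlMeansDenoising(img, h=10)
    # Adaptive thresholding copes with uneven lighting across the page.
    img = cv2.adaptiveThreshold(
        img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY, blockSize=31, C=15)
    return img

binary = preprocess_for_ocr("loan_record.jpg")  # hypothetical input file
cv2.imwrite("loan_record_clean.png", binary)
```

The binarized output can then be fed to any OCR model; which denoising strength and threshold window work best is exactly the kind of question the preprocessing objective asks participants to answer empirically.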
FruitPunch AI’s Post
More Relevant Posts
-
Image quality enhancement (super resolution): A study and implementation guide by our AI experts. AI is transforming the field of computer vision and changing the way we see the world. Our AI experts have written a guide on image quality enhancement (super resolution) that delves into the latest research and offers practical implementation tips. If you're interested in learning more about this fascinating topic, check out our guide!
SRGAN is a deep learning model designed for image super-resolution: it generates high-resolution images from low-resolution inputs. #DeepLearning #GAN #GenerativeModels #NeuralNetworks #AI #PyTorch #ImageResolution https://lnkd.in/g2shPKN6
Revolutionizing Image Enhancement: A Deep Dive into SRGAN
clavrit.com
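As a pointer to what sits behind the article, here is a minimal PyTorch sketch of the SRGAN generator idea: residual blocks for feature extraction followed by sub-pixel (PixelShuffle) upsampling. Block counts are trimmed for brevity (the original paper uses 16 residual blocks), so treat this as an architectural illustration, not the full model.

```python
# Minimal sketch of the SRGAN generator: residual blocks, then two
# PixelShuffle stages for 4x upscaling. Depth reduced for brevity.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.PReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels))

    def forward(self, x):
        return x + self.body(x)  # identity skip connection

class Generator(nn.Module):
    def __init__(self, n_blocks: int = 4):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(3, 64, 9, padding=4), nn.PReLU())
        self.blocks = nn.Sequential(*[ResidualBlock() for _ in range(n_blocks)])
        # Each PixelShuffle(2) stage trades channels for 2x spatial size.
        self.upsample = nn.Sequential(
            nn.Conv2d(64, 256, 3, padding=1), nn.PixelShuffle(2), nn.PReLU(),
            nn.Conv2d(64, 256, 3, padding=1), nn.PixelShuffle(2), nn.PReLU(),
            nn.Conv2d(64, 3, 9, padding=4))

    def forward(self, lr):
        x = self.head(lr)
        return self.upsample(self.blocks(x) + x)

sr = Generator()(torch.randn(1, 3, 24, 24))  # -> shape (1, 3, 96, 96)
```

In the full SRGAN this generator is trained adversarially against a discriminator, with a perceptual (VGG feature) loss added to the pixel loss; the article linked above covers those training details.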
-
📢 Hi folks, today I explored image captioning with LLaVA-1.5, a prompt-based multimodal chat model and one of the strongest vision-language techniques I've tried.
❓️ WHAT & WHY
1. Multimodal instruct data: The authors present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data.
2. LLaVA model: The authors introduce LLaVA (Large Language-and-Vision Assistant), an end-to-end trained large multimodal model that connects a vision encoder and an LLM for general-purpose visual and language understanding.
3. Performance: Early experiments show that LLaVA demonstrates impressive multimodal chat abilities, sometimes exhibiting the behavior of multimodal GPT-4 on unseen images/instructions, and yields an 85.1% relative score compared with GPT-4 on a synthetic multimodal instruction-following dataset. When fine-tuned on Science QA, the synergy of LLaVA and GPT-4 achieves a new state-of-the-art accuracy of 92.53%.
4. Open source: The authors make the GPT-4-generated visual instruction tuning data, the model, and the code base publicly available.
👨💻 Code: https://lnkd.in/eyQ4W8CN
📃 Paper: https://llava-vl.github.io
💡 In my experience, LLaVA models produce accurate, efficient outcomes along with thorough descriptions of the input photos. Another key benefit is the paper's context-aware language-and-image design, built on a CLIP vision encoder. I also found the output far more detailed than the OpenAI Vision and Gemini models.
🤔 If you want to learn more about computer vision, check out my articles.
💡 My computer vision articles: https://lnkd.in/gQSswkEU
📃 I also love working on 3D creations with AI. Connect with me if you're into 3D AI, computer vision, time series, or NLP.
👉 Thanks for reading, everyone. If you have any questions, please leave a comment.
#python #pythondeveloper #deeplearning #machinelearning #aicommunity #datascientist #opencv #computervision #3dmodelling #mesh #pythonprogramming #nvidia #3d #threejs #3dimaging #imageprocessing #kagglecompetition #googledeveloperstudentclubs #artificialintelligence #animation #texture #ai #ml #dl #depthmap #sam #segmentation #meta #yolov8
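For anyone who wants to try the same prompt-based captioning, here is a minimal sketch using the Hugging Face transformers port of LLaVA-1.5. The model id, prompt template, and image path are assumptions based on the llava-hf conversions; check the code link above for the authors' own pipeline.

```python
# Hedged sketch: prompt-based image captioning with LLaVA-1.5 via the
# llava-hf conversion on the Hugging Face Hub. Paths are placeholders.
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed community conversion
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")

image = Image.open("sample.jpg")  # placeholder: any local test image
# The <image> token marks where visual features are injected.
prompt = "USER: <image>\nDescribe this photo in detail. ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(out[0], skip_special_tokens=True))
```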
-
> What is RAS? RAS (Robotics Application Stack) is an #opensource platform that makes it easy to create, test, and deploy service robots in non-linear environments. RAS operates at the application layer: you can call Behavior Modules in natural language, and it will automatically generate the behavior tree (with suitable fallbacks), test trajectories in simulation, and execute the action on the real robot. It thus abstracts away all the low-level details of interfacing with ROS2. > Why use RAS? RAS is highly modular, so adding new Behavior Modules to the stack is easy. Behavior Modules can recursively call other Behavior Modules, which further increases the modularity and reusability of the code you develop. (For example, you might define a module "Transfer contents" that calls the modules "Pick", "Pour" and "Place" in sequence with appropriate substitutions; see the sketch after this post.) With its NLP and automatic simulation-testing capabilities, RAS lets you rapidly test your code in different environments, simply by changing the config files and the natural-language input! > How to use RAS? https://lnkd.in/gnsg-2cv Clone the repository and follow the setup instructions inside oss_docker. Then you're set. If your application of choice involves simple chemical experiments, you can start using RAS right away! If not, you can add new Behavior Modules by modifying oss_bt_framework/behaviors/module. A detailed tutorial video for adding new modules and primitives will be added to the GitHub soon. We plan to roll out monthly updates. Stay tuned for more information! #ROSCON #ROS2 #NLP #Robotics #AI #OpenSourceSoftware #automation #nocode
RAS: Automating an Experiment
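To make the recursive-composition idea above concrete, here is a minimal Python sketch of a sequence-style behavior module. All class and method names are hypothetical illustrations, not RAS's actual API; see the linked repository for the real interface.

```python
# Hypothetical illustration of recursive behavior composition, as in
# the "Transfer contents" example above. NOT RAS's real API.
class BehaviorModule:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def execute(self) -> bool:
        print(f"executing: {self.name}")
        # A sequence node: run children in order and fail fast if any
        # child fails (a common behavior-tree fallback pattern).
        return all(child.execute() for child in self.children)

pick = BehaviorModule("Pick")
pour = BehaviorModule("Pour")
place = BehaviorModule("Place")
transfer = BehaviorModule("Transfer contents", [pick, pour, place])
transfer.execute()
```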
-
"🚀Unleashing AI for Spam Detection: Introducing Deep Convolutional Forest (DCF) 🔍 Harnessing cutting-edge Machine Learning (ML) and Deep Learning (DL) techniques, I’ve built a sophisticated spam detection system capable of accurately filtering out unwanted messages Here’s a quick overview: Dataset Preparation: Processed the processed_spam.csv file with labeled messages. Applied data cleaning, normalization, and feature extraction. Model Training: Algorithms Used: Integrated traditional ML models (SVM, NB, KNN, RF) with DL models (CNN, LSTM). Ensemble Approach: Created a Deep Convolutional Forest (DCF) to enhance accuracy and resilience. Hyperparameter Tuning: Optimized performance through meticulous tuning. Model Saving: Saved the trained model as a pickle file for easy deployment and real-time use. Web Application Development: Built a user-friendly Flask website for instant spam detection. User Experience: Designed for simplicity and effectiveness. Real-World Application: Ready for integration into email filters and messaging apps. Scalability: Adapts and improves over time with more data. Future Enhancements: Continuous Learning: Plans for periodic retraining. Broader Application: Exploring detection of other malicious content. This project merges advanced AI techniques with practical application, offering a robust tool for spam detection. Try the live demo and see AI in action! #ArtificialIntelligence #MachineLearning #DeepLearning #DataScience #SpamDetection #NLP #Flask #WebDevelopment #Innovation #Technology #Python #EnsembleLearning"
-
𝗗𝗶𝘃𝗶𝗻𝗴 𝗗𝗲𝗲𝗽 𝗶𝗻𝘁𝗼 𝗟𝗟𝗠 𝗙𝗶𝗻𝗲-𝗧𝘂𝗻𝗶𝗻𝗴 𝘄𝗶𝘁𝗵 𝗣𝗵𝗶-𝟯.𝟱
Just wrapped up an exciting project fine-tuning Phi-3.5-mini-instruct using QLoRA and the OpenAssistant dataset. Here's a snapshot of my experience. You can check it out here: https://lnkd.in/dZj7Vr-5
𝙏𝙝𝙚 𝘼𝙥𝙥𝙧𝙤𝙖𝙘𝙝
• Model: Phi-3.5-mini-instruct
• Dataset: OpenAssistant
• Technique: QLoRA (Quantized Low-Rank Adaptation; a code sketch follows below)
𝙆𝙚𝙮 𝙏𝙖𝙠𝙚𝙖𝙬𝙖𝙮𝙨
• Dataset selection is crucial; data curation can make or break your model
• Proper formatting saves hours of debugging
• Hardware matters: GPUs with bitsandbytes or OpenVINO
• Quantization is your friend for speedy inference
• PEFT parameter tuning is an art and a science
𝙁𝙪𝙩𝙪𝙧𝙚 𝙀𝙭𝙥𝙡𝙤𝙧𝙖𝙩𝙞𝙤𝙣𝙨
• Domain-specific datasets for targeted applications, specifically healthcare
𝙎𝙥𝙤𝙩𝙡𝙞𝙜𝙝𝙩 𝙤𝙣 𝙋𝙝𝙞-3.5
Phi-3.5 has been turning heads in the AI community, and for good reason:
• Benchmark brilliance: outperforms many larger models on various NLP tasks
• Efficiency champion: achieves high performance with a relatively small parameter count
• Fine-tuning potential: its architecture makes it exceptionally receptive to domain-specific fine-tuning
• Versatility: excels in both general language understanding and specialized applications
While OpenAssistant gave standard LLM results, the real magic happens when you tailor the approach to your specific use case (which is what I will be focusing on!). My experience fine-tuning Phi-3.5 has shown its strong potential for customization: the model's solid baseline performance provides an excellent foundation for task-specific optimization.
#Phi35 #FineTuning #NLP #MachineLearning #HuggingFace
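For context on what a QLoRA setup roughly looks like in code, here is a hedged sketch using transformers, peft, and bitsandbytes. The LoRA target-module names are an assumption for Phi-style models and should be verified against the actual layer names; dataset preparation and the training loop are omitted.

```python
# Hedged QLoRA setup sketch: 4-bit quantized base model + LoRA adapters.
# Target module names are assumptions for Phi-style models; verify them.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct",
    quantization_config=bnb, device_map="auto")
model = prepare_model_for_kbit_training(model)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["qkv_proj", "o_proj"],  # assumption: check layer names
    task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```

The point of the setup: the frozen base weights sit in 4-bit precision, so memory stays small, while only the low-rank adapter matrices receive gradients during fine-tuning.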
-
🚀 Excited to share my latest project: a Retrieval-Augmented Generation (RAG) application with OpenAI's GPT-3.5 Turbo! A quick overview:
1. Environment setup: Secured OpenAI API keys with `dotenv`.
2. Data extraction: Loaded PDF data using `pypdf`.
3. Data processing: Split data into chunks with recursive text splitting.
4. Embeddings: Converted chunks to vectors using `OpenAIEmbeddings`.
5. Vector database: Created a FAISS database for efficient storage.
6. Model integration: Used `ChatOpenAI` with GPT-3.5 Turbo.
7. Prompt design: Developed a `ChatPromptTemplate`.
8. Retrieval: Implemented `as_retriever` for context-based data fetching.
9. Document chain: Combined the LLM and prompt into a cohesive system.
10. Query handling: Set up a retrieval chain for responsive query answers (a condensed sketch of these steps follows below).
GitHub profile: https://lnkd.in/duaSZE3t
This project showcases the integration of NLP tools and models for effective information retrieval and generation.
#AI #NLP #OpenAI #deeplearning #MachineLearning #FAISS #Innovation Google Google DeepMind Microsoft OpenAI NVIDIA
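Here is a condensed, hedged sketch of steps 1-10 with LangChain. Package layout moves between LangChain releases, so the import paths may need adjusting for your version; the PDF path and prompt wording are placeholders, not the project's actual code.

```python
# Condensed RAG pipeline sketch (LangChain 0.1+ style APIs). The PDF
# path and prompt are placeholders; import paths vary by version.
from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

load_dotenv()                                                  # 1. API key from .env
docs = PyPDFLoader("document.pdf").load()                      # 2. extract PDF text
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100).split_documents(docs)  # 3. split into chunks
db = FAISS.from_documents(chunks, OpenAIEmbeddings())          # 4-5. embed + store

prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\nQuestion: {input}")  # 7. prompt
doc_chain = create_stuff_documents_chain(
    ChatOpenAI(model="gpt-3.5-turbo"), prompt)                 # 6, 9. model + chain
rag = create_retrieval_chain(db.as_retriever(), doc_chain)     # 8, 10. retrieval

print(rag.invoke({"input": "What is this document about?"})["answer"])
```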
-
🚀 Exciting news in AI! Mistral has just launched its first multimodal model, Pixtral 12B, integrating both language and vision processing capabilities. This marks a big step for Mistral as it takes on established players like Meta in the vision-language model space.
🔑 What's new? Pixtral 12B builds on Mistral's text-based Nemo 12B model and adds a 400M-parameter vision adapter. Key architecture specs include:
- a 131,072-token context window for nuanced language understanding
- 1024x1024 image processing, breaking images into 16x16 patches
- GeLU activation & 2D Rotary Position Embedding (RoPE)
🎉 Accessible via the Hugging Face Hub. Model weights are available here: [Pixtral 12B on Hugging Face](https://lnkd.in/deVw2yfG)
Developers can integrate via the mistral_common Python package: `pip install --upgrade mistral_common`
📊 Stay tuned for benchmarks! While performance metrics are not yet available, Pixtral 12B promises to unlock new possibilities across industries in content creation, visual question answering, and data analysis.
#AI #Multimodal #DeepLearning #LanguageModels #ComputerVision #Mistral #HuggingFace #AIInnovation #NLP
-
🤔 𝐂𝐚𝐧 𝐅𝐢𝐧𝐞-𝐓𝐮𝐧𝐢𝐧𝐠 𝐎𝐩𝐞𝐧𝐀𝐈 𝐌𝐨𝐝𝐞𝐥𝐬 𝐃𝐞𝐥𝐢𝐯𝐞𝐫 𝐑𝐞𝐬𝐮𝐥𝐭𝐬? 𝐍𝐨𝐭 𝐐𝐮𝐢𝐭𝐞 𝐘𝐞𝐭... A recent study by Stanford University using the 𝐅𝐢𝐧𝐞𝐓𝐮𝐧𝐞𝐁𝐞𝐧𝐜𝐡 framework highlights serious limitations in fine-tuning APIs for commercial models like GPT-4o and Gemini. Here's the breakdown: 🔍 𝐌𝐞𝐭𝐡𝐨𝐝𝐨𝐥𝐨𝐠𝐲 • Datasets: QA pairs on 𝐍𝐞𝐰𝐬, 𝐅𝐢𝐜𝐭𝐢𝐨𝐧𝐚𝐥 𝐏𝐞𝐨𝐩𝐥𝐞, 𝐌𝐞𝐝𝐢𝐜𝐚𝐥 𝐆𝐮𝐢𝐝𝐞𝐥𝐢𝐧𝐞𝐬, 𝐚𝐧𝐝 𝐂𝐨𝐝𝐞. • Focus: Teaching 𝐧𝐞𝐰 𝐢𝐧𝐟𝐨𝐫𝐦𝐚𝐭𝐢𝐨𝐧 and 𝐮𝐩𝐝𝐚𝐭𝐢𝐧𝐠 𝐤𝐧𝐨𝐰𝐥𝐞𝐝𝐠𝐞. • Evaluations: Rephrased questions, derivative queries, or questions asked at later dates. 𝐊𝐞𝐲 𝐈𝐧𝐬𝐢𝐠𝐡𝐭𝐬 1️⃣ 𝐒𝐭𝐫𝐮𝐠𝐠𝐥𝐢𝐧𝐠 𝐆𝐞𝐧𝐞𝐫𝐚𝐥𝐢𝐳𝐚𝐭𝐢𝐨𝐧: Models could memorize but failed to generalize effectively. 2️⃣ 𝐀𝐜𝐜𝐮𝐫𝐚𝐜𝐲 𝐂𝐨𝐧𝐜𝐞𝐫𝐧𝐬: • New information: 37% average accuracy. • Updated knowledge: Only 19%! 3️⃣ 𝐓𝐨𝐩 𝐏𝐞𝐫𝐟𝐨𝐫𝐦𝐞𝐫𝐬: GPT-4o mini showed the best (but still subpar) results. 4️⃣ 𝐔𝐧𝐝𝐞𝐫𝐰𝐡𝐞𝐥𝐦𝐢𝐧𝐠 𝐑𝐞𝐬𝐮𝐥𝐭𝐬: Gemini models lagged, achieving only 5% accuracy for new knowledge and 2% for updates. 𝐓𝐡𝐞 𝐕𝐞𝐫𝐝𝐢𝐜𝐭 Fine-tuning APIs aren’t production-ready for business-critical tasks. This highlights the urgency for 𝐦𝐨𝐫𝐞 𝐫𝐨𝐛𝐮𝐬𝐭 𝐭𝐮𝐧𝐢𝐧𝐠 𝐦𝐞𝐭𝐡𝐨𝐝𝐬 on open models. 🔗 𝐏𝐚𝐩𝐞𝐫: https://lnkd.in/gA48UUsh 🔗 𝐂𝐨𝐝𝐞: GitHub link is in the comments!!! #FineTuning #LLMs #AIResearch #ProductionReadyAI #GenerativeAI #AI #llm #LLM #NLP #openai #gpt #gpt4 #machinelearning #models
CEO of FruitPunch AI | Building the global AI for Good community to solve humanity's greatest challenges! | AI, community building, education
2mo · The fact that OCR, the translation of handwritten script into digital documents, is still too inaccurate to use for Devanagari script, while it has long been solved for Latin script, is widening the gap between rich Western countries and developing nations. We must give Indian farmers the same advantages Western entrepreneurs have had for the last 30 years!