🚀 Excited to share some early progress on my **Image Search using Natural Language Queries** project! This work is inspired by OpenAI's CLIP (Contrastive Language-Image Pre-Training) and aims to create a more intuitive and powerful way to search for images using natural language descriptions. 🛠️ I'm still in the early stages, and while the results aren't perfect yet, it's a promising start. If you're curious about the technical details or want to see the code in action, you can check out my Jupyter Notebook here https://lnkd.in/eUDZr8yy 🔍 The journey has just begun, and I'm looking forward to refining the model further and exploring new possibilities. #AI #MachineLearning #NLP #ComputerVision #DeepLearning #Innovation #ImageSearch
Abhijeet Lokhande, M.Sc.’s Post
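For anyone curious how this kind of natural-language image search works in principle, here is a minimal sketch of CLIP-style retrieval using the Hugging Face transformers implementation. The model name and image paths are illustrative placeholders, not necessarily what the notebook above uses.

```python
# Minimal sketch of CLIP-style text-to-image retrieval, assuming the
# Hugging Face `transformers` CLIP implementation. Model name and image
# paths are illustrative, not taken from the original notebook.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image_paths = ["beach.jpg", "city.jpg", "forest.jpg"]  # hypothetical files
images = [Image.open(p) for p in image_paths]
query = "a sunny beach with palm trees"

inputs = processor(text=[query], images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the similarity of each image to the text query
scores = outputs.logits_per_image.squeeze(1)
best = scores.argmax().item()
print(f"Best match for '{query}': {image_paths[best]}")
```

In a real search index you would precompute and cache the image embeddings once and only encode the text query at search time.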
-
🚀 Day 2 Update: Mastering the Art of Prompting with LLMs! 🚀 On Day 2, I explored the art of communicating effectively with Large Language Models (LLMs). It's like learning how to talk to a super-smart assistant! I delved into the technique called "prompting": basically, how we ask LLMs questions or give them tasks. The way we prompt them greatly affects their responses. It's like giving clear instructions to get the best results. Along the way, I learned different prompting techniques, like using specific and detailed instructions. Think of it as talking to a friend: the clearer you are, the better they understand and help you. I also learned that giving feedback is important. Just as we learn from our mistakes, LLMs improve when we refine our prompts based on their responses. By the end of the day, I felt more confident in my ability to guide LLMs to give the answers or create the content we need. Check out the learning content at https://lnkd.in/gfzYt44M Stay tuned for Day 3, where we'll dive into more exciting ways to make LLMs even smarter! 📈 #Prompting #LearningJourney #LLM #NaturalLanguageProcessing #AI #ArtificialIntelligence #NLP #LanguageModels #DeepLearning #MachineLearning #Transformers #Education #Foundation #Basics #TechEducation #GenerativeAI #TechCommunity #DataScience #TechSkills #KnowledgeSharing #AICommunity
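As a small illustration of the idea, here is a hedged sketch of how a vague prompt and a specific prompt can be sent to a chat LLM. It assumes the OpenAI Python SDK and an illustrative model name, which may differ from what the course materials use.

```python
# Sketch of how prompt specificity changes an LLM's output, using the
# OpenAI Python SDK as one example client (an assumption; any chat LLM works).
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

vague_prompt = "Tell me about dogs."
specific_prompt = (
    "You are a veterinary assistant. In exactly 3 bullet points, explain "
    "how much daily exercise an adult Labrador Retriever needs, giving "
    "typical time ranges. Keep each bullet under 20 words."
)

for prompt in (vague_prompt, specific_prompt):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"PROMPT: {prompt[:50]}...\n{response.choices[0].message.content}\n")
```

The second prompt pins down the role, the format, and the length, which is exactly the "specific and detailed instructions" point above.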
Mastering-LLMs/Day 2/Prompting.md at main · SrimanTetali/Mastering-LLMs
-
Electronics & Telecommunication Engineering student | Data enthusiast | Machine learning explorer | Outdoor enthusiast
"Sentiment Cinematographer," where I dove into the world of movie reviews to analyze and predict audience sentiment using the Bag of Words model. In this project, I utilized Python and its powerful libraries to preprocess a large dataset of movie reviews, transforming them into a format suitable for machine learning models. By applying the Bag of Words approach, I converted textual data into numerical features, which were then used to train a sentiment analysis model. Key Highlights: Data Preprocessing: Cleaned and tokenized textual data to enhance model accuracy. Feature Extraction: Implemented the Bag of Words technique to convert reviews into numerical vectors. Model Training: Trained multiple machine learning models to classify reviews as positive or negative. Evaluation: Assessed model performance using metrics such as accuracy, precision, and recall. This project not only deepened my understanding of natural language processing but also showcased the power of machine learning in deriving insights from textual data. Looking forward to applying these skills in real-world applications and exploring more advanced NLP techniques! Feel free to check out the detailed analysis and results here. GitHub repo-- https://lnkd.in/gwkiaKWi #MachineLearning #NLP #SentimentAnalysis #DataScience #Python #MovieReviews #BagOfWords
Natural-language-processing/Bag_of_words(BOW)/Movie_review_analysis(bag_of_words).ipynb at main · SachinBiswas7/Natural-language-processing
-
🎉 Excited to share my recent project on sentiment analysis and visualization using social media data! 🚀 Over the past few days, I delved into the world of Natural Language Processing (NLP) to understand public opinion and attitudes toward specific topics or brands.
Task 4: Analyze and visualize sentiment patterns in social media data to understand public opinion and attitudes toward specific topics or brands.
Here's a quick overview of what I accomplished and learned:
1. Data Generation: Created a synthetic dataset simulating social media posts with various sentiments (positive, negative, neutral).
2. Sentiment Analysis: Utilized the VADER (Valence Aware Dictionary and Sentiment Reasoner) sentiment analyzer from the nltk library to evaluate sentiment scores for each post.
3. Visualization: Employed matplotlib to visualize sentiment distribution and sentiment score patterns, making it easier to interpret the overall public opinion.
This project was a great opportunity to apply NLP techniques and improve my data analysis skills. You can find the complete code and detailed instructions in my GitHub repository: https://lnkd.in/g_h-bfSd #DataScience #NLP #SentimentAnalysis #Python #MachineLearning #Visualization #AI #DataAnalysis #SocialMedia #VADER #Matplotlib #GitHub #ProdigyInfoTech
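Here is a small sketch of the VADER + matplotlib workflow, with a few synthetic posts standing in for the generated dataset; the exact plots in the repository may differ.

```python
# Sketch of the VADER + matplotlib workflow. The sample posts are
# synthetic placeholders, mirroring the generated dataset described above.
import matplotlib.pyplot as plt
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

posts = [
    "I absolutely love this brand, great service!",
    "Worst purchase ever, totally disappointed.",
    "The package arrived on Tuesday.",
]

def label(compound):
    # VADER's conventional thresholds for the compound score
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

sentiments = [label(sia.polarity_scores(p)["compound"]) for p in posts]

counts = {s: sentiments.count(s) for s in ("positive", "neutral", "negative")}
plt.bar(counts.keys(), counts.values(), color=["green", "grey", "red"])
plt.title("Sentiment distribution of sample posts")
plt.ylabel("Number of posts")
plt.show()
```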
PRODIGY_DS/PRODIGY_DS_Task_04.ipynb at main · shubhangi0001/PRODIGY_DS
-
Helping clients in their Digital Transformation journey with the focus on ROI benefits. I am aiming to decode the hype around AI to implement value-driven solutions.
Ever feel overwhelmed by mountains of data? Here's how Retrieval-Augmented Generation (RAG) and vector databases like FAISS can cut through the noise! 🧠📚 Revolutionize data access with RAG and FAISS 👇👇👇 https://lnkd.in/gauHJijF, a notebook showcasing a basic RAG pipeline.
🌱 THE BASICS
→ RAG combines document retrieval and natural language generation.
→ Sentence Transformers encode documents into embeddings for semantic understanding.
→ FAISS, a vector database, indexes these embeddings for rapid and precise retrieval.
🤓 THE ADVANCED
→ Embeddings Generation: Sentence Transformers (like 'all-MiniLM-L6-v2') convert documents into dense vectors representing their semantic content.
→ Creating the FAISS Index: FAISS (Facebook AI Similarity Search), a vector database, indexes these embeddings, enabling efficient similarity searches.
→ Document Retrieval: Query embeddings are matched against indexed document embeddings to find the most relevant documents.
🥷 THE SURPRISING
→ RAG can turn any data repository into an interactive Q&A system with real-time responses.
→ This method scales seamlessly with larger datasets, providing precise answers even from vast information pools.
→ By grounding responses in actual data, RAG reduces the chances of generating inaccurate information and enhances the reliability of language models.
👇 ACT NOW! 👇
Integrate RAG and FAISS into your systems to leverage AI's power in delivering precise, data-backed answers instantly. Start building your retrieval-augmented system today and revolutionize how you access information. And don't forget to like this post ❤️ if you found it useful. I do appreciate it! 🥐 #AI #MachineLearning #DataScience #NLP #ArtificialIntelligence #FAISS #InformationRetrieval #RAG #SentenceTransformers #TechInnovation #HealthcareAI #DataDriven #DeepLearning
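As a rough illustration of the retrieval side described above, here is a sketch using Sentence Transformers and FAISS. The documents, query, and the IndexFlatIP choice are placeholders rather than the notebook's exact code.

```python
# Sketch of the retrieval half of a RAG pipeline: Sentence-Transformer
# embeddings indexed in FAISS. Documents and query are placeholders.
import faiss
from sentence_transformers import SentenceTransformer

docs = [
    "FAISS is a library for efficient similarity search over dense vectors.",
    "Sentence Transformers map sentences to semantically meaningful embeddings.",
    "Retrieval-Augmented Generation grounds LLM answers in retrieved documents.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(docs).astype("float32")
faiss.normalize_L2(embeddings)               # normalize so inner product = cosine

index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)

query = "How do I ground a language model's answers in my own data?"
q = model.encode([query]).astype("float32")
faiss.normalize_L2(q)

scores, ids = index.search(q, 1)
print("Top document:", docs[ids[0][0]], "| score:", float(scores[0][0]))
# In a full RAG pipeline, the retrieved text would be inserted into the
# LLM prompt as context before generating the final answer.
```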
RAGPipelines/1stPipeline.ipynb at main · baghwalas/RAGPipelines
-
Hello connections!! I've successfully completed a machine learning project focused on fake article detection. The model is designed using both machine learning and deep learning techniques. It leverages TensorFlow to train and evaluate algorithms on a dataset of articles. The machine learning component compares content features to identify discrepancies, while the deep learning model learns complex patterns to detect fake news. This dual approach enhances accuracy and reliability. The result is a robust system capable of effectively distinguishing between genuine and deceptive articles. I developed this model:
1. Using Logistic Regression (a machine learning algorithm)
2. Using neural networks and frameworks like TensorFlow and Keras
This project was a great opportunity to apply advanced techniques in AI and enhance my skills in TensorFlow. I'm grateful for the support and resources provided by my mentors and peers throughout this journey. A big thank you to my mentors Nagendra Kishore Girajala sir, Aravind Pappala sir for their valuable guidance and feedback. I'm eager to continue exploring new challenges and innovations in the AI field! A special thank you to Babji Neelam Sir, the CEO of TECHNICAL HUB, for the incredible opportunity to work on AI. I am excited to contribute my skills and insights to advance our AI capabilities and make a meaningful impact with this innovative work. #machinelearning #deeplearning #generativeai #projectshowcase #ai #Innovation https://lnkd.in/gj3kD-Ba
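To give a flavor of the two approaches, here is a hedged sketch combining a Logistic Regression baseline and a small Keras network on TF-IDF features. The inline articles are placeholders, and the actual project's preprocessing and architecture may well differ.

```python
# Sketch of the two approaches: a Logistic Regression baseline and a small
# Keras neural network, both on TF-IDF features. The inline articles are
# placeholders for the real dataset used in the project.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from tensorflow import keras

texts = [
    "Government announces new infrastructure budget for 2025",
    "Scientists confirm water samples meet safety standards",
    "Shocking!! Celebrity clone spotted, doctors hate this trick",
    "You won't believe this miracle cure banned by elites",
]
labels = np.array([0, 0, 1, 1])  # 0 = genuine, 1 = fake

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts).toarray().astype("float32")

# 1) Machine-learning baseline
lr = LogisticRegression().fit(X, labels)
print("Logistic Regression train accuracy:", lr.score(X, labels))

# 2) Deep-learning model with Keras
nn = keras.Sequential([
    keras.layers.Input(shape=(X.shape[1],)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
nn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
nn.fit(X, labels, epochs=20, verbose=0)
print("Neural network train accuracy:", nn.evaluate(X, labels, verbose=0)[1])
```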
Fake-News-Prediction/Fake_News_Prediction.ipynb at main · kamalsai369/Fake-News-Prediction
-
🎒 Exploring Bag of Words (BoW) 🎒
Bag of Words (BoW) is a cornerstone technique in Natural Language Processing (NLP), especially prevalent in text classification tasks. Its elegance lies in its simplicity: it represents text as a collection of words, disregarding their order and context.
BoW maps each word in the vocabulary V to a unique integer ID between 1 and |V|. Each document in the corpus is then converted into a |V|-dimensional vector whose ith component is simply the number of times the word with ID i occurs in that document, i.e., we score each word in V by its occurrence count in the document.
Suppose
D1: Dog bites man
D2: Man bites dog
D3: Dog eats meat
D4: Man eats food
With word IDs dog = 1, bites = 2, man = 3, meat = 4, food = 5, eats = 6, D1 becomes [1 1 1 0 0 0], because the first three words in the vocabulary each appear exactly once in D1 and the last three do not appear at all. D4 becomes [0 0 1 0 1 1].
👍 Pros:
• Like one-hot encoding, BoW is fairly simple to understand and implement.
• With this representation, documents containing the same words have vector representations that are closer to each other in Euclidean space than documents with completely different words. The distance between D1 and D2 is 0, compared to the distance between D1 and D4, which is 2. Thus, the vector space resulting from the BoW scheme captures the semantic similarity of documents: if two documents have similar vocabulary, they'll be closer to each other in the vector space, and vice versa.
• We have a fixed-length encoding for any sentence of arbitrary length.
👎 Cons:
• The size of the vector increases with the size of the vocabulary, so sparsity continues to be a problem. One way to control it is by limiting the vocabulary to the n most frequent words.
• It does not capture the similarity between different words that mean the same thing. Say we have three documents: "I run", "I ran", and "I ate". The BoW vectors of all three documents will be equally far apart.
• This representation has no way to handle out-of-vocabulary words (i.e., new words that were not seen in the corpus used to build the vectorizer).
• As the name indicates, it is a "bag" of words: word order information is lost in this representation. Both D1 and D2 will have the same representation in this scheme.
GitHub link: https://lnkd.in/gZpqkWc8 💬 #NLP #TextAnalytics #BagOfWords
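The toy corpus above can be reproduced in a few lines with scikit-learn's CountVectorizer. Note that the vectorizer orders its vocabulary alphabetically, so the column order differs from the hand-assigned IDs, while the counts themselves match.

```python
# Reproducing the D1-D4 toy example with scikit-learn's CountVectorizer.
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["Dog bites man", "Man bites dog", "Dog eats meat", "Man eats food"]

vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())  # ['bites' 'dog' 'eats' 'food' 'man' 'meat']
print(bow.toarray())
# D1 and D2 get identical vectors (word order is lost);
# D1 and D4 share only one word ('man'), so their vectors are further apart.
```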
Practical_Natural_Language_Processing/02_Bag_of_Words.ipynb at main · AnushaNarayananP/Practical_Natural_Language_Processing
-
Helping clients in their Digital Transformation journey with the focus on ROI benefits. I am aiming to decode the hype around AI to implement value-driven solutions.
🚀 Understanding the Difference Between Word Vectors and Sentence Vectors: Why Context Matters 🧠
In the world of natural language processing (NLP), embeddings are crucial for understanding and processing text. But not all embeddings are created equal. Here's a quick dive into the difference between word vectors and sentence vectors and why context-aware embeddings make all the difference.
📝 Word Vectors:
What they are: Word vectors represent individual words in a numerical form that captures semantic relationships.
Limitation: When generating sentence embeddings by averaging word vectors, we lose word order and context. For instance, sentences like "The kids play in the park." and "The play was for kids in the park." would produce nearly identical sentence embeddings despite their different meanings.
📝 Sentence Vectors:
What they are: Sentence vectors are embeddings that represent entire sentences, taking into account not just the individual words but also their order and the overall context.
Advantage: Unlike averaged word vectors, sentence vectors generated by models like those in Vertex AI preserve the context and meaning. This means that even if two sentences have similar words, their sentence embeddings will differ if their meanings differ.
🧠 Why Context-Aware Embeddings Matter:
Context-aware embeddings allow models to understand nuances in language, ensuring that similar-looking sentences with different meanings are treated differently. This is especially important in applications like sentiment analysis, translation, and more, where context is key to accurate interpretation.
By leveraging advanced models in Google Cloud Platform (GCP), specifically Vertex AI, we can ensure that our NLP solutions are both robust and contextually aware, leading to more accurate and meaningful results.
For a detailed explanation of this concept, you can refer to the following document: https://lnkd.in/g8pZp5en, which explains the notebook at https://lnkd.in/gkD56nPy.
This insight was gained from my learning journey through the Google Cloud Vertex AI course on DeepLearning.AI. Highly recommended for anyone looking to deepen their understanding of embeddings and their applications in AI! #NLP #AI #MachineLearning #Embeddings #VertexAI #DataScience #DeepLearningAI #GCP
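A small sketch of this contrast, assuming spaCy's en_core_web_md model (whose Doc.vector is an average of static word vectors) and a Sentence Transformers model as the context-aware encoder. The post's own examples used Vertex AI embeddings, so this is an illustration of the idea rather than the course code, and it assumes `python -m spacy download en_core_web_md` has been run.

```python
# Contrast averaged word vectors (spaCy md model) with a context-aware
# sentence encoder on two sentences that share most of their words.
import spacy
from sentence_transformers import SentenceTransformer, util

s1 = "The kids play in the park."
s2 = "The play was for kids in the park."

# Averaged word vectors: typically very high similarity, since the key words overlap
nlp = spacy.load("en_core_web_md")
print("averaged word-vector similarity:", nlp(s1).similarity(nlp(s2)))

# Context-aware sentence vectors: typically lower, reflecting the different meanings
model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode([s1, s2])
print("sentence-vector similarity:", float(util.cos_sim(emb[0], emb[1])))
```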
GenAI_learning/Embedding/README_Word_vs_Sentence_Vectors.md at main · baghwalas/GenAI_learning
-
Next Word Prediction Using LSTM
I recently completed a mini-project on next-word prediction with LSTM, inspired by the 100 Days of Deep Learning playlist by the incredible Nitish Sir. For my dataset, I generated 100 sentences with ChatGPT and then added 100 more sentences, primarily quotes, also from ChatGPT. I performed tokenization and padding, and trained the model on this data. Following Nitish Sir's instructions, I made a few tweaks to the code for reusability and adjusted a hyperparameter to fit my data. To my surprise, the model turned out to be quite coherent! While it's not perfect yet, it's a fantastic start, and I'm excited about the progress. A huge shoutout to Nitish Sir for his clear explanations and hands-on coding approach; his guidance has been a source of confidence for many of us. Thank you, Sir! 🔗 Link to Nitish Sir's Video: https://lnkd.in/dys4JkRU 🔗 Link to My GitHub Project: https://lnkd.in/dY2TXasG #DeepLearning #MachineLearning #LSTM #AI #DataScience #100DaysOfDeepLearning
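For anyone following along, here is a compact sketch of the same recipe (tokenize, build n-gram sequences, pad, train an Embedding + LSTM model). It uses the legacy tf.keras preprocessing utilities as the tutorial series does (newer Keras versions prefer keras.layers.TextVectorization), and a three-sentence placeholder corpus instead of the 200 generated sentences.

```python
# Minimal sketch of next-word prediction with an Embedding + LSTM model,
# assuming the legacy tf.keras Tokenizer/pad_sequences preprocessing API.
import numpy as np
from tensorflow import keras
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

corpus = [
    "practice makes a model better",
    "small steps lead to big progress",
    "learning every day makes you better",
]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
vocab_size = len(tokenizer.word_index) + 1

# Build n-gram input sequences: for "a b c d" -> "a b", "a b c", "a b c d"
sequences = []
for line in corpus:
    tokens = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(tokens)):
        sequences.append(tokens[: i + 1])

max_len = max(len(s) for s in sequences)
padded = pad_sequences(sequences, maxlen=max_len, padding="pre")
X, y = padded[:, :-1], padded[:, -1]  # last token is the prediction target

model = keras.Sequential([
    keras.layers.Embedding(vocab_size, 32),
    keras.layers.LSTM(64),
    keras.layers.Dense(vocab_size, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X, y, epochs=100, verbose=0)

# Predict the next word after a seed phrase
seed = pad_sequences(tokenizer.texts_to_sequences(["learning every day"]),
                     maxlen=max_len - 1, padding="pre")
next_id = int(np.argmax(model.predict(seed, verbose=0)))
print("predicted next word:", tokenizer.index_word.get(next_id, "?"))
```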
100-days-of-deep-learning/guided-projects/next_word_predictor.ipynb at main · sanhiitaa/100-days-of-deep-learning
-
Excited to share my latest project series on Predicting-Product-Ratings-From-Reviews! 🚀🔍 In this classic NLP challenge, I delved into data from an e-commerce store, focusing on women's clothing. Each dataset record comprises a customer review with a review title, text description, and product rating (ranging from 1 to 5). 🛍️📊 I transformed this into a binary classification task, where a rating > 3 indicates a customer recommends the product (label 1), while ≤ 3 means they do not recommend it (label 0). 🔄🔢 The primary goal? Leveraging the review text attributes to predict recommendation ratings through classification. 📈🔍 Key resources used in this project: Jupyter Notebook, Python, and natural language processing libraries 🖥️🐍📚 Curious to explore further? Check out the project on GitHub for details: https://lnkd.in/eH5bd9Bg 🌐💡 #NLP #DataScience #Classification #PredictiveAnalytics
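A minimal sketch of the labeling and classification setup described above. The column names and tiny example reviews are assumptions for illustration, not necessarily the dataset's actual schema.

```python
# Sketch of turning ratings into binary labels and classifying review text.
# Column names ("Title", "Review Text", "Rating") are assumed for illustration.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "Title": ["Love it", "Runs small", "Perfect dress", "Poor quality"],
    "Review Text": [
        "Fits perfectly and the fabric is lovely",
        "Way too tight around the shoulders, returning it",
        "Exactly as pictured, wore it to a wedding",
        "Seams came apart after one wash",
    ],
    "Rating": [5, 2, 4, 1],
})

# Binary target: rating > 3 -> recommended (1), otherwise not recommended (0)
df["Recommended"] = (df["Rating"] > 3).astype(int)
df["text"] = df["Title"].fillna("") + " " + df["Review Text"].fillna("")

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["Recommended"], test_size=0.5, random_state=0,
    stratify=df["Recommended"])

vectorizer = TfidfVectorizer()
clf = LogisticRegression()
clf.fit(vectorizer.fit_transform(X_train), y_train)
print("test accuracy:", clf.score(vectorizer.transform(X_test), y_test))
```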
Predicting-Product-Ratings-From-Reviews/PredictProductRating.ipynb at main · Wiclif/Predicting-Product-Ratings-From-Reviews
-
💡 Learn Late Chunking! 💡 I recently had the opportunity to look under the hood of Jina AI's innovative "Late Chunking" method by reimplementing it myself in a notebook. (The best way to learn, in my opinion!) This approach leverages long-context embedding models by preserving important contextual information that is often lost in traditional chunking methods. By processing the entire text before chunking, it significantly improves retrieval tasks and preserves long-range dependencies. The improvement I found was quite surprising! Check it out here on GitHub: https://lnkd.in/e5F9Ug_c #datascience #nlp #embeddings #llms #machinelearning #llama3 #rag
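For the curious, here is a minimal sketch of the late-chunking idea: embed the whole text once, then mean-pool the token embeddings per chunk so each chunk vector carries surrounding context. The encoder name below is a short-context placeholder for illustration; Jina's long-context embedding models are the intended fit, and this is not their exact implementation.

```python
# Minimal sketch of "late chunking": run the encoder over the WHOLE text,
# then mean-pool its token embeddings per chunk span, so each chunk vector
# keeps context from the rest of the document. Model name is a placeholder.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "sentence-transformers/all-MiniLM-L6-v2"  # placeholder encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

sentences = [
    "Berlin is the capital of Germany.",
    "The city has a population of about 3.8 million.",
    "It is known for its museums and startup scene.",
]
text = " ".join(sentences)

# Naive sentence chunks, expressed as character spans into the full text
spans, cursor = [], 0
for s in sentences:
    spans.append((cursor, cursor + len(s)))
    cursor += len(s) + 1  # +1 for the joining space

enc = tokenizer(text, return_tensors="pt", return_offsets_mapping=True)
offsets = enc.pop("offset_mapping")[0]              # (num_tokens, 2) char spans
with torch.no_grad():
    token_embs = model(**enc).last_hidden_state[0]  # (num_tokens, dim)

chunk_embeddings = []
for start, end in spans:
    # Select tokens whose character span falls inside this chunk
    mask = (offsets[:, 0] >= start) & (offsets[:, 1] <= end) & (offsets[:, 1] > offsets[:, 0])
    chunk_embeddings.append(token_embs[mask].mean(dim=0))

print(len(chunk_embeddings), "chunk vectors of dim", chunk_embeddings[0].shape[0])
```

The key difference from naive chunking is the order of operations: encode first, chunk second, so pronouns like "It" in the last sentence are embedded with knowledge of "Berlin" earlier in the text.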
gen_ai_utils/jinaai_late_chunking.ipynb at main · jlonge4/gen_ai_utils
Product at Built AI · 2mo
This is epic!