Fine-tune a small language model (Gemma-2B) for English-Hindi translation on Kaggle! The Kaggle notebook covers data gathering, prompt-template development, fine-tuning, and evaluation. Get started here: https://lnkd.in/gWNEpvm2
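As a rough illustration of the prompt-template step, here is a minimal sketch in the Gemma chat format. The exact template used in the notebook may differ; the function name and the idea of omitting the Hindi side at inference time are assumptions for illustration.

```python
from typing import Optional

def build_translation_prompt(english: str, hindi: Optional[str] = None) -> str:
    """Format one English-Hindi pair for supervised fine-tuning.

    At training time, pass both sides; at inference time, omit `hindi`
    and let the model complete after the final turn marker.
    (Hypothetical helper -- the notebook's actual template may differ.)
    """
    prompt = (
        "<start_of_turn>user\n"
        f"Translate the following English text to Hindi:\n{english}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )
    if hindi is not None:
        prompt += f"{hindi}<end_of_turn>\n"
    return prompt

train_example = build_translation_prompt("Good morning", "सुप्रभात")
infer_example = build_translation_prompt("Good morning")
```

The same function then serves both dataset construction and evaluation, which keeps the training and inference formats from drifting apart.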
Praneeth Paikray’s Post
-
#Word2vec was a beautiful paper where simple ideas came together to produce a surprising outcome, whether it is using an energy-based model to pull similar things together and push different things apart, or negative sampling to avoid tedious probability calculations. One of the unintended side effects was solving analogy problems with simple vector translations, e.g. the man - woman vector ≈ the king - queen vector. The same idea inspired the #transE paper, which applied translation (head + relation ≈ tail) and energy-based modelling to embed entities and relationships in a low-dimensional space. I had fun revisiting this rather dated but interesting paper here: Notebook: https://lnkd.in/dnAMNygj Dataset: https://lnkd.in/dMvnNUWm Original paper: https://lnkd.in/dGXV7cY3
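The core of TransE fits in a few lines: a true triple (head, relation, tail) should satisfy head + relation ≈ tail, so its energy is the distance between the two sides. A toy pure-Python sketch (the vectors here are made up; the real model learns them by gradient descent):

```python
import math

def energy(h, r, t):
    """TransE energy: L2 distance ||h + r - t||, low for plausible triples."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def margin_loss(pos, neg, margin=1.0):
    """Margin ranking loss over one true triple and one corrupted triple:
    push the true triple's energy at least `margin` below the corrupted one's."""
    return max(0.0, margin + energy(*pos) - energy(*neg))

# Toy 2-d embeddings where the relation vector translates head onto tail.
h, r, t = [1.0, 0.0], [0.5, 0.5], [1.5, 0.5]   # h + r == t exactly
corrupted = (h, r, [0.0, 2.0])                  # same head/relation, wrong tail
```

Negative sampling in TransE works exactly like the corrupted triple above: swap in a random head or tail and train the margin loss, avoiding any full-vocabulary probability computation.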
Knowledge Graph embedding- transE
-
Some categorical-variable encoding techniques work well in research settings or Kaggle competitions but may not work in a production environment, and frequency encoding is one of them. Frequency encoding is used for high-cardinality categorical variables: it maps each category to a number based on how often that category appears. But it won't work properly if the production environment serves online inference. Reasons: 1) Unseen categories that never appeared in training must be dealt with. 2) The category-frequency mapping keeps being updated, varies over time, and may drift away from the training data. Solutions: 1) Broadly define different layers of categories and reduce them to a single-digit number, depending on the business problem. 2) Use ordinal encoding, reserve a specific number for categories not present in the training data, and keep the ordinal-encoding mapping in a separate database table that the production environment can use for online inference. #categoricalEncoding #MachineLearning #FrequencyEncoding
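Solution (2) can be sketched in a few lines. This is a minimal illustration, not a production implementation: in production the `mapping` dict would be persisted to a database table and loaded at inference time, and the reserved code for unseen categories is an assumed convention.

```python
UNSEEN = 0  # reserved code for categories absent from the training data

def fit_ordinal_mapping(train_values):
    """Assign each training category a stable integer code, starting at 1."""
    return {cat: i for i, cat in enumerate(sorted(set(train_values)), start=1)}

def encode(value, mapping):
    """Look up the ordinal code; unseen categories fall back to UNSEEN."""
    return mapping.get(value, UNSEEN)

mapping = fit_ordinal_mapping(["red", "blue", "red", "green"])
codes = [encode(v, mapping) for v in ["blue", "red", "purple"]]  # "purple" unseen
```

Because the mapping is frozen at training time and the fallback is explicit, online inference never crashes on a new category and never silently shifts codes between requests.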
-
I've trained a word2vec model on 3 million Burmese articles; it can be applied to semantic-similarity tasks, and my goal is to evaluate how well my dataset covers the Burmese language. The model still doesn't recognize some words, so I will upload an updated model trained on more data. You can test the model on a Hugging Face Space: https://lnkd.in/eAi8_2KW
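The similarity lookup such a model enables boils down to cosine similarity between word vectors. A toy sketch with made-up English vectors (the real Burmese model would supply learned embeddings, e.g. via gensim's `KeyedVectors.most_similar`):

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def most_similar(word, vectors, topn=2):
    """Rank other vocabulary words by cosine similarity to `word`."""
    scores = [(w, cosine(vectors[word], v)) for w, v in vectors.items() if w != word]
    return sorted(scores, key=lambda p: p[1], reverse=True)[:topn]

# Made-up 2-d vectors standing in for learned embeddings.
vectors = {"river": [0.9, 0.1], "water": [0.8, 0.2], "car": [0.1, 0.9]}
```

Coverage evaluation then amounts to checking that in-vocabulary probes return semantically close neighbors and counting how many probe words are missing from the vocabulary entirely.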
-
I’ve just published a blog post exploring how LlamaParse and multimodal LLMs let us extract insights from complex PDFs containing both text and images. 📄✨ In the post, I walk through: 🔍 Parsing documents with both text and images 🤖 Using GPT-4V to interpret and query the parsed content 📊 An example: analyzing U.S. election results from 2016 and 2020. Intelligent PDF parsing + multimodal LLMs are a powerful combination for document processing, letting us handle far more than plain text: images, charts, equations, and more. 🚀 See the full post! 💻👇
Using LlamaParse and Multimodal LLMs for Extracting and Interpreting Text and Images from PDFs
-
An underrated capability of sonnet-3.5 is that it’s really good at chart understanding 📊 - compared to gpt-4o it’s much better at turning chart values into a structured table. Thanks to our brand-new LlamaParse release 💫 you can easily use SOTA multimodal models like sonnet-3.5 for document parsing and structuring, with added validation/scalability/reliability benefits from our infrastructure. Check out our full tutorial below and an example from the Llama 2 paper. Huge shoutout to Pierre-Loic Doulcet and Sacha Bron for the exciting new features. Additional releases: - Fast Mode: run LlamaParse with our core text-layout capabilities, without OCR/models, for 0.1c a page. - Improved Table Reconstruction: fewer hallucinations in reconstructing complex tables. Results coming soon. Notebook: https://lnkd.in/dBzRNYYc LlamaParse: https://lnkd.in/g3UmUkcD
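For readers who want to try the multimodal mode, a configuration sketch follows. The parameter names (`use_vendor_multimodal_model`, `vendor_multimodal_model_name`, `result_type`) are based on LlamaParse's documented multimodal mode at the time of this release and may change; check the linked docs before relying on them. The function name and file path are illustrative.

```python
# Configuration sketch for LlamaParse's multimodal parsing mode.
# Parameter names follow the docs linked in the post and may change
# between releases -- treat them as assumptions to verify.
from llama_parse import LlamaParse

def parse_with_sonnet(path: str):
    parser = LlamaParse(
        use_vendor_multimodal_model=True,                     # enable multimodal parsing
        vendor_multimodal_model_name="anthropic-sonnet-3.5",  # Claude 3.5 Sonnet backend
        result_type="markdown",                               # tables returned as markdown
    )
    # Requires a LLAMA_CLOUD_API_KEY in the environment.
    return parser.load_data(path)
```

With `result_type="markdown"`, chart values that Sonnet extracts come back as markdown tables, which drop straight into a RAG index or a spreadsheet export.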
-
This is a major release on our side! I believe the future of document parsing is multimodal models. While not perfect today (speed, accuracy, and cost are still pain points for multimodal models), LlamaParse can run in multimodal mode, letting you try this new parsing paradigm on more than 80 document formats (from .pdf to .pptx). We also added Anthropic Sonnet 3.5, and it's way better than GPT-4o at understanding charts, turning a curve into Excel data! We also packed this release with a dozen improvements to LlamaParse, including a much better table-reconstruction model. You can try the multimodal parsing mode in the API now (docs here: https://lnkd.in/dGsyFyVJ )
-
Sonnet 3.5 has been making waves. Personally, I have found that it perhaps lacks breadth of knowledge (it can't answer as many questions as comprehensively as OpenAI's models), but it's definitely faster, mostly gives better code, and offers the Projects and Artifacts features, which are reason enough to switch for techies using it for programming tasks.
-
Lychee: Instant Data #Visualization in 0.32 Seconds! Tired of complex tools? Say goodbye to unnecessary features and #coding headaches. Lychee transforms your #spreadsheets into graphs effortlessly. Try it now! 👉 https://lnkd.in/d5C4w96e #Victrays
-
Today I earned my "Fundamentals of Text Analysis with the Language Service" badge! I’m so proud to be celebrating this achievement and hope this inspires you to start your own Microsoft Learn journey! #continuouslearning #microsoftazure #azureai #microsoftlearn
Fundamentals of Text Analysis with the Language Service
-
RAG is a good solution for finding answers. Sometimes accuracy sits at the top of your requirements, and knowledge graphs are one of the best methods to enhance it. I have used them in personal projects in HR, failure analysis, and project management, where there is little room for error. This short course will give you a nice overview.
In Knowledge Graphs for RAG, our latest short course, you’ll explore how knowledge graphs work, how to build with them, and create better retrieval augmented generation applications. You’ll build a knowledge graph of text documents from scratch, write advanced Cypher queries to retrieve relevant information from the graph and format it for inclusion in your prompt to a large language model, and much more. Start today: https://hubs.la/Q02px2N10
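The retrieval pattern the course teaches can be sketched with an in-memory triple store. The facts below are made up, and the dict stands in for the Neo4j database you would actually query with Cypher in the course:

```python
# Toy knowledge graph as (head, relation, tail) triples; in the course,
# these live in Neo4j and are fetched with Cypher queries instead.
triples = [
    ("Form-10K", "FILED_BY", "AcmeCorp"),      # made-up example facts
    ("Form-10K", "MENTIONS", "supply risk"),
    ("AcmeCorp", "HAS_CEO", "J. Doe"),
]

def retrieve(entity, graph):
    """Return every triple where the entity appears as head or tail."""
    return [t for t in graph if entity in (t[0], t[2])]

def format_context(facts):
    """Render retrieved triples as lines to paste into an LLM prompt."""
    return "\n".join(f"{h} -[{r}]-> {t}" for h, r, t in facts)

context = format_context(retrieve("Form-10K", triples))
prompt = f"Answer using only these facts:\n{context}\n\nQuestion: Who filed the Form-10K?"
```

The accuracy benefit mentioned above comes from this grounding step: the LLM answers over explicit, retrieved relationships rather than free-floating text chunks.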