Fine-tune a small language model (Gemma-2B) for English-Hindi translation on Kaggle! The Kaggle notebook covers data gathering, prompt-template development, fine-tuning, and evaluation. Get started here: https://lnkd.in/gWNEpvm2
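As a rough illustration of the prompt-template step, here is a minimal sketch in the Gemma chat format. The exact template used in the notebook may differ; the function name and the idea of omitting the Hindi side at inference time are assumptions for illustration.

```python
from typing import Optional

def build_translation_prompt(english: str, hindi: Optional[str] = None) -> str:
    """Format one English-Hindi pair for supervised fine-tuning.

    At training time, pass both sides; at inference time, omit `hindi`
    and let the model complete after the final turn marker.
    (Hypothetical helper -- the notebook's actual template may differ.)
    """
    prompt = (
        "<start_of_turn>user\n"
        f"Translate the following English text to Hindi:\n{english}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )
    if hindi is not None:
        prompt += f"{hindi}<end_of_turn>\n"
    return prompt

train_example = build_translation_prompt("Good morning", "सुप्रभात")
infer_example = build_translation_prompt("Good morning")
```

The same function then serves both dataset construction and evaluation, which keeps the training and inference formats from drifting apart.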
Praneeth Paikray’s Post
-
#Word2vec was a beautiful paper where simple ideas came together to produce a surprising outcome, whether it is using an energy-based model to pull similar things together and push different things apart, or negative sampling to avoid tedious probability calculations. One of the unintended side effects was solving analogy problems with simple vector translations, e.g. the man - woman vector ≈ the king - queen vector. The same idea inspired the #transE paper, which applied translation (head + relation ≈ tail) and energy-based modelling to embed entities and relationships in a low-dimensional space. I had fun revisiting this rather dated but interesting paper here: Notebook: https://lnkd.in/dnAMNygj Dataset: https://lnkd.in/dMvnNUWm Original paper: https://lnkd.in/dGXV7cY3
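The core of TransE fits in a few lines: a true triple (head, relation, tail) should satisfy head + relation ≈ tail, so its energy is the distance between the two sides. A toy pure-Python sketch (the vectors here are made up; the real model learns them by gradient descent):

```python
import math

def energy(h, r, t):
    """TransE energy: L2 distance ||h + r - t||, low for plausible triples."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def margin_loss(pos, neg, margin=1.0):
    """Margin ranking loss over one true triple and one corrupted triple:
    push the true triple's energy at least `margin` below the corrupted one's."""
    return max(0.0, margin + energy(*pos) - energy(*neg))

# Toy 2-d embeddings where the relation vector translates head onto tail.
h, r, t = [1.0, 0.0], [0.5, 0.5], [1.5, 0.5]   # h + r == t exactly
corrupted = (h, r, [0.0, 2.0])                  # same head/relation, wrong tail
```

Negative sampling in TransE works exactly like the corrupted triple above: swap in a random head or tail and train the margin loss, avoiding any full-vocabulary probability computation.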
Knowledge Graph embedding- transE
-
Some categorical-variable encoding techniques work well in research settings or Kaggle competitions but may not work in a production environment, and frequency encoding is one of them. Frequency encoding is used for high-cardinality categorical variables: it maps each category to a number based on how often that category appears. But it won't work properly if the production environment serves online inference. Reasons: 1) Unseen categories that never appeared in training must be dealt with. 2) The category-frequency mapping keeps being updated, varies over time, and may drift away from the training data. Solutions: 1) Broadly define different layers of categories and reduce them to a single-digit number, depending on the business problem. 2) Use ordinal encoding, reserve a specific number for categories not present in the training data, and keep the ordinal-encoding mapping in a separate database table that the production environment can use for online inference. #categoricalEncoding #MachineLearning #FrequencyEncoding
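Solution (2) can be sketched in a few lines. This is a minimal illustration, not a production implementation: in production the `mapping` dict would be persisted to a database table and loaded at inference time, and the reserved code for unseen categories is an assumed convention.

```python
UNSEEN = 0  # reserved code for categories absent from the training data

def fit_ordinal_mapping(train_values):
    """Assign each training category a stable integer code, starting at 1."""
    return {cat: i for i, cat in enumerate(sorted(set(train_values)), start=1)}

def encode(value, mapping):
    """Look up the ordinal code; unseen categories fall back to UNSEEN."""
    return mapping.get(value, UNSEEN)

mapping = fit_ordinal_mapping(["red", "blue", "red", "green"])
codes = [encode(v, mapping) for v in ["blue", "red", "purple"]]  # "purple" unseen
```

Because the mapping is frozen at training time and the fallback is explicit, online inference never crashes on a new category and never silently shifts codes between requests.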
-
I've trained a word2vec model on 3 million Burmese articles; it can be applied to semantic-similarity tasks, and my goal is to evaluate how well my dataset covers the Burmese language. The model still doesn't recognize some words, so I will upload an updated model trained on more data. You can test the model on a Hugging Face Space: https://lnkd.in/eAi8_2KW
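The similarity lookup such a model enables boils down to cosine similarity between word vectors. A toy sketch with made-up English vectors (the real Burmese model would supply learned embeddings, e.g. via gensim's `KeyedVectors.most_similar`):

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def most_similar(word, vectors, topn=2):
    """Rank other vocabulary words by cosine similarity to `word`."""
    scores = [(w, cosine(vectors[word], v)) for w, v in vectors.items() if w != word]
    return sorted(scores, key=lambda p: p[1], reverse=True)[:topn]

# Made-up 2-d vectors standing in for learned embeddings.
vectors = {"river": [0.9, 0.1], "water": [0.8, 0.2], "car": [0.1, 0.9]}
```

Coverage evaluation then amounts to checking that in-vocabulary probes return semantically close neighbors and counting how many probe words are missing from the vocabulary entirely.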
-
I’ve just published a blog post exploring how LlamaParse and multimodal LLMs let us extract insights from complex PDFs containing both text and images. 📄✨ In the post, I walk through: 🔍 Parsing documents with both text and images 🤖 Using GPT-4V to interpret and query the parsed content 📊 An example: analyzing U.S. election results from 2016 and 2020. Intelligent PDF parsing + multimodal LLMs are a powerful combination for document processing, letting us handle far more than plain text: images, charts, equations, and more. 🚀 See the full post! 💻👇
Using LlamaParse and Multimodal LLMs for Extracting and Interpreting Text and Images from PDFs
-
An underrated capability of sonnet-3.5 is that it’s really good at chart understanding 📊 - compared to gpt-4o it’s much better at turning chart values into a structured table. Thanks to our brand-new LlamaParse release 💫 you can easily use SOTA multimodal models like sonnet-3.5 for document parsing and structuring, with added validation/scalability/reliability benefits from our infrastructure. Check out our full tutorial below and an example from the Llama 2 paper. Huge shoutout to Pierre-Loic Doulcet and Sacha Bron for the exciting new features. Additional releases: - Fast Mode: run LlamaParse with our core text-layout capabilities, without OCR/models, for 0.1c a page. - Improved Table Reconstruction: fewer hallucinations in reconstructing complex tables. Results coming soon. Notebook: https://lnkd.in/dBzRNYYc LlamaParse: https://lnkd.in/g3UmUkcD
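For readers who want to try the multimodal mode, a configuration sketch follows. The parameter names (`use_vendor_multimodal_model`, `vendor_multimodal_model_name`, `result_type`) are based on LlamaParse's documented multimodal mode at the time of this release and may change; check the linked docs before relying on them. The function name and file path are illustrative.

```python
# Configuration sketch for LlamaParse's multimodal parsing mode.
# Parameter names follow the docs linked in the post and may change
# between releases -- treat them as assumptions to verify.
from llama_parse import LlamaParse

def parse_with_sonnet(path: str):
    parser = LlamaParse(
        use_vendor_multimodal_model=True,                     # enable multimodal parsing
        vendor_multimodal_model_name="anthropic-sonnet-3.5",  # Claude 3.5 Sonnet backend
        result_type="markdown",                               # tables returned as markdown
    )
    # Requires a LLAMA_CLOUD_API_KEY in the environment.
    return parser.load_data(path)
```

With `result_type="markdown"`, chart values that Sonnet extracts come back as markdown tables, which drop straight into a RAG index or a spreadsheet export.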
-
This is a major release on our side! I believe the future of document parsing is multimodal models. While not perfect today (speed, accuracy, and cost are still pain points for multimodal models), LlamaParse can run in multimodal mode, letting you try this new parsing paradigm on more than 80 document formats (from .pdf to .pptx). We also added Anthropic Sonnet 3.5, and it's way better than GPT-4o at understanding charts, turning a curve into Excel data! We also packed this release with a dozen improvements to LlamaParse, including a much better table-reconstruction model. You can try the multimodal parsing mode in the API now (docs here: https://lnkd.in/dGsyFyVJ )
-
Sonnet 3.5 has been making waves. Personally, I have found that it perhaps lacks breadth of knowledge (it can't answer as many questions as comprehensively as OpenAI's models), but it's definitely faster, mostly gives better code, and offers the Projects and Artifacts features, which are reason enough to switch for techies using it for programming tasks.
-
Lychee: Instant Data #Visualization in 0.32 Seconds! Tired of complex tools? Say goodbye to unnecessary features and #coding headaches. Lychee transforms your #spreadsheets into graphs effortlessly. Try it now! 👉 https://lnkd.in/d5C4w96e #Victrays
-
Today I earned my "Fundamentals of Text Analysis with the Language Service" badge! I’m so proud to be celebrating this achievement and hope this inspires you to start your own Microsoft Learn journey! #continuouslearning #microsoftazure #azureai #microsoftlearn
Fundamentals of Text Analysis with the Language Service
-
RAG is a good solution for finding answers. Sometimes accuracy sits at the top of your requirements, and knowledge graphs are one of the best methods to enhance it. I have used them in personal projects in HR, failure analysis, and project management, where there is little room for error. This short course will give you a nice overview.
In Knowledge Graphs for RAG, our latest short course, you’ll explore how knowledge graphs work, how to build with them, and create better retrieval augmented generation applications. You’ll build a knowledge graph of text documents from scratch, write advanced Cypher queries to retrieve relevant information from the graph and format it for inclusion in your prompt to a large language model, and much more. Start today: https://hubs.la/Q02px2N10
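The retrieval pattern the course teaches can be sketched with an in-memory triple store. The facts below are made up, and the dict stands in for the Neo4j database you would actually query with Cypher in the course:

```python
# Toy knowledge graph as (head, relation, tail) triples; in the course,
# these live in Neo4j and are fetched with Cypher queries instead.
triples = [
    ("Form-10K", "FILED_BY", "AcmeCorp"),      # made-up example facts
    ("Form-10K", "MENTIONS", "supply risk"),
    ("AcmeCorp", "HAS_CEO", "J. Doe"),
]

def retrieve(entity, graph):
    """Return every triple where the entity appears as head or tail."""
    return [t for t in graph if entity in (t[0], t[2])]

def format_context(facts):
    """Render retrieved triples as lines to paste into an LLM prompt."""
    return "\n".join(f"{h} -[{r}]-> {t}" for h, r, t in facts)

context = format_context(retrieve("Form-10K", triples))
prompt = f"Answer using only these facts:\n{context}\n\nQuestion: Who filed the Form-10K?"
```

The accuracy benefit mentioned above comes from this grounding step: the LLM answers over explicit, retrieved relationships rather than free-floating text chunks.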