𝗟𝗼𝗼𝗸𝗶𝗻𝗴 𝘁𝗼 𝗶𝗺𝗽𝗿𝗼𝘃𝗲 𝘆𝗼𝘂𝗿 𝗥𝗔𝗚 𝘀𝘆𝘀𝘁𝗲𝗺'𝘀 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲? 𝗧𝗿𝘆 𝗺𝗲𝘁𝗮𝗱𝗮𝘁𝗮 𝗳𝗶𝗹𝘁𝗲𝗿𝗶𝗻𝗴 𝗳𝗼𝗿 𝗮𝗻 𝗶𝗻𝘀𝘁𝗮𝗻𝘁 𝗯𝗼𝗼𝘀𝘁!

Metadata is data about data: titles, descriptions, keywords, and so on. 💡 Search engines like Google use metadata to understand webpage content without reading the entire text, which improves search accuracy 🎯. In RAG, you can tag documents with metadata during pre-processing to make them easy to identify. For example, tagging books with author, title, version, and genre metadata makes those items searchable 🔎 and more readily identifiable within large datasets.

𝗛𝗼𝘄 𝗱𝗼𝗲𝘀 𝗺𝗲𝘁𝗮𝗱𝗮𝘁𝗮 𝗳𝗶𝗹𝘁𝗲𝗿𝗶𝗻𝗴 𝘄𝗼𝗿𝗸 𝗶𝗻 𝗥𝗔𝗚?

Metadata filtering narrows the search space by applying structured filters before (or alongside) semantic retrieval. Say you're building a job listing website where jobs are constantly posted with unstructured descriptions. Each listing also carries structured attributes, such as location, pay, and employment duration, that you can use to isolate the jobs relevant to a user's query. 🤗 A user searching for jobs in Los Angeles paying at least $35 per hour will find listings far more accurately with metadata filtering than with semantic search alone (a toy code sketch of this example follows at the end of this post).

𝗦𝗼, 𝘄𝗵𝗮𝘁 𝗮𝗿𝗲 𝘁𝗵𝗲 𝗯𝗲𝗻𝗲𝗳𝗶𝘁𝘀 𝗼𝗳 𝗺𝗲𝘁𝗮𝗱𝗮𝘁𝗮 𝗳𝗶𝗹𝘁𝗲𝗿𝗶𝗻𝗴?

✅ Enhanced precision: search results are narrowed to documents that meet specific metadata criteria.
✅ Faster retrieval: filtering reduces the surface area of the query, which matters especially in large datasets.

𝗧𝗶𝗽𝘀 𝘄𝗵𝗲𝗻 𝘂𝘀𝗶𝗻𝗴 𝗺𝗲𝘁𝗮𝗱𝗮𝘁𝗮 𝗳𝗶𝗹𝘁𝗲𝗿𝗶𝗻𝗴!

1️⃣ Choose relevant metadata. Your filtering is only as good as your tagging, so select the metadata attributes that provide the most value for your specific use case. For technical documentation, relevant metadata might include document type, programming language, and last-updated date.
2️⃣ Experiment with automating metadata tagging. Libraries like spaCy and NLTK can extract metadata fields automatically.
3️⃣ Store metadata efficiently. Use trufflepig to tag metadata on upload and rely on our managed service for scalability. 😁

To help with building RAG, we added metadata filtering as a feature this week. With trufflepig, you can leverage filtering for greater customization and better data organization, so try it out with our updated docs.

👀 Check out trufflepig's metadata filtering: (link in first comment)
🤩 Follow trufflepig for more RAG tips and updates! Thanks for the support!
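To make the job-listing example concrete, here is a minimal, library-free Python sketch of the pattern: filter candidates on structured metadata first, then semantically rank only the survivors. The listings, the filter format, and the toy similarity() function are illustrative stand-ins, not trufflepig's API.

```python
# Illustrative only: metadata filter first, then semantic ranking.
# Listings, field names, and similarity() are made-up stand-ins.

listings = [
    {"text": "Experienced barista for specialty coffee bar in downtown LA.",
     "meta": {"location": "Los Angeles", "pay": 38.0}},
    {"text": "Line cook needed for a busy Los Angeles diner.",
     "meta": {"location": "Los Angeles", "pay": 22.5}},
    {"text": "Warehouse associate, night shifts, forklift certified.",
     "meta": {"location": "San Diego", "pay": 36.0}},
]

def matches(meta, filters):
    """Check one listing's metadata against exact and ('gte', n) conditions."""
    for key, cond in filters.items():
        value = meta.get(key)
        if isinstance(cond, tuple) and cond[0] == "gte":
            if value is None or value < cond[1]:
                return False
        elif value != cond:
            return False
    return True

def similarity(query, text):
    """Toy lexical-overlap score standing in for embedding similarity."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q)

def retrieve(query, docs, filters, top_k=2):
    # 1) Metadata filter: cheaply shrink the candidate set.
    candidates = [d for d in docs if matches(d["meta"], filters)]
    # 2) Semantic ranking over the survivors only.
    return sorted(candidates,
                  key=lambda d: similarity(query, d["text"]),
                  reverse=True)[:top_k]

hits = retrieve("barista jobs",
                listings,
                filters={"location": "Los Angeles", "pay": ("gte", 35.0)})
for h in hits:
    print(h["meta"], "->", h["text"])
```

Most vector databases expose the same idea natively, typically as a filter argument on the query, so in practice you would push these conditions down to the store rather than filtering in application code.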
-
Message Crawler 6.6.1

- New Tool "Create Searches": If you have a super large import that you cannot load into the grid due to size, you can now create multiple searches to export it out in smaller sections.
- Axiom XML: Major updates to how the conversation field is generated. Different types of exports from Axiom require different parsing logic and different fields to generate conversations. Using very diverse exports, I was able to fine-tune the logic to where I feel it works very well.
- Main UI: Added a button to sort documents with one click.
- Teams HTML: Sometimes images would be downloaded as GZ files. Added a function to unzip them to their native PNG format.
- Teams HTML Convert: Added an option to specify name delimiters for the Email_to column. It seems different people use different delimiters, with no good indication why.
- Teams HTML Convert v2: More metadata fields will be used, creating an even better RSMF file (reactions, deleted, edited, friendly conversation).
- Slack: Added a button to save the log. Now you don't have to copy and paste it.
- Oxygen XML: Now checking for the Created field, which will be used if no Timestamp field is present.
- Export EML: The local time zone will not show up in the header. It will be reset to 0.
- Export RSMF: Relativity will now be able to deduplicate RSMF files. Changed how participant IDs are generated. Relativity's documentation states that Name and Email are used for deduplication. This is not correct; a support staff member stated that IDs are taken into account as well. Due to this discrepancy, deduplication on RSMF files was not effective. The new way of generating participant IDs should resolve this problem.
- RSMF Export: Added more items to drop for Messaging Platform.
- Cellebrite: Bug fix. Date deleted could get offset to the wrong time zone when using a SQLite database.
- Slack Convert: If a channel name is over 50 characters, it will be hashed for purposes of the document prefix. Long channel names cause problems in Relativity Server 2022 (2023 is fine).
- Database Creation: Added an artificial delay to allow Azure to create the database before tables are created.
- Bloomberg IM: When a user joins or leaves, the message body will be populated with appropriate text. Nice for EML/PDF export. Also, I removed the word "Day" from the file name as it was confusing to many; now just using the _0001, _0002 suffix.

Download here: https://lnkd.in/eNJxG9jk

#ediscovery #legaltech #legaltechnology #litigationsupport #legaloperations #legalservices #litigation #legalsupport #forensics #digitalforensics
-
𝐒𝐢𝐦𝐩𝐥𝐢𝐟𝐲 𝐂𝐨𝐦𝐩𝐥𝐞𝐱 𝐐𝐮𝐞𝐫𝐢𝐞𝐬 𝐰𝐢𝐭𝐡 𝐂𝐨𝐦𝐦𝐨𝐧 𝐓𝐚𝐛𝐥𝐞 𝐄𝐱𝐩𝐫𝐞𝐬𝐬𝐢𝐨𝐧𝐬 (𝐂𝐓𝐄𝐬)

SQL queries can often get complicated, especially when you're working with large datasets or multiple subqueries. This is where Common Table Expressions (CTEs) come to the rescue!

🔴 𝑾𝒉𝒂𝒕 𝒊𝒔 𝒂 𝑪𝑻𝑬?
A Common Table Expression (CTE) is a temporary result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. Think of it as a named, reusable subquery that improves the structure and clarity of your SQL code.

✨ 𝐊𝐞𝐲 𝐁𝐞𝐧𝐞𝐟𝐢𝐭𝐬 𝐨𝐟 𝐔𝐬𝐢𝐧𝐠 𝐂𝐓𝐄𝐬:
🔹 Clarity & Simplicity
🔹 Eliminate Repeated Code
🔹 Debugging Made Easy

📚 𝐇𝐨𝐰 𝐭𝐨 𝐖𝐫𝐢𝐭𝐞 𝐚 𝐒𝐢𝐦𝐩𝐥𝐞 𝐂𝐓𝐄:
Here's an example of how you can use a CTE to find the total revenue by region:

"""
WITH RevenueByRegion AS (
    SELECT Region, SUM(Sales) AS TotalRevenue
    FROM SalesData
    GROUP BY Region
)
SELECT * FROM RevenueByRegion;
"""

This CTE calculates the total revenue for each region, making the final query more readable and reusable.

🔄 𝐑𝐞𝐜𝐮𝐫𝐬𝐢𝐯𝐞 𝐂𝐓𝐄 𝐄𝐱𝐚𝐦𝐩𝐥𝐞: Navigating Hierarchical Data
CTEs shine when dealing with hierarchical or recursive data. Take this recursive query, which finds all employees under a specific manager:

"""
WITH RecursiveEmployees AS (
    SELECT EmployeeID, ManagerID, EmployeeName
    FROM Employees
    WHERE ManagerID = 1 -- Start from manager with ID 1

    UNION ALL

    SELECT e.EmployeeID, e.ManagerID, e.EmployeeName
    FROM Employees e
    INNER JOIN RecursiveEmployees re ON e.ManagerID = re.EmployeeID
)
SELECT * FROM RecursiveEmployees;
"""

This query starts with a specific manager and recursively fetches all employees reporting to them. (Note: some databases, such as PostgreSQL and MySQL, require WITH RECURSIVE for this form; SQL Server accepts plain WITH.)

⚡𝐂𝐓𝐄𝐬 𝐢𝐧 𝐀𝐜𝐭𝐢𝐨𝐧:
🔸 𝑺𝒊𝒎𝒑𝒍𝒊𝒇𝒚 𝑸𝒖𝒆𝒓𝒊𝒆𝒔: When your query has multiple layers of logic, break them down with CTEs to improve readability and maintainability.
🔸 𝑯𝒂𝒏𝒅𝒍𝒆 𝑹𝒆𝒄𝒖𝒓𝒔𝒊𝒗𝒆 𝑫𝒂𝒕𝒂: Easily query hierarchical relationships such as company structure, product categories, or organizational charts.
🔸 𝑴𝒂𝒊𝒏𝒕𝒂𝒊𝒏𝒂𝒃𝒊𝒍𝒊𝒕𝒚: If you need to adjust the logic, CTEs make your code easier to modify without disrupting the entire query.
-
Reporting is not analytics. Quote: "An organization that is very mature in software development could be quite primitive in terms of its analytics deployment and usage." Within business, the term "analytics team" can really mean a reporting team; the true analytic skills are missing. The tools they use are the giveaway: Excel or a BI reporting tool, rather than Python, R, or mathematical insight tools. Financial reporting areas are typical of "reporting", not analytics. Bring your true analysts and data scientists together, guided by a trained specialist in advanced analytics.
-
𝗘𝘅𝗽𝗹𝗼𝗿𝗶𝗻𝗴 𝗘𝗦𝟭𝟱: 𝗡𝗲𝘄 𝗝𝗮𝘃𝗮𝗦𝗰𝗿𝗶𝗽𝘁 𝗙𝗲𝗮𝘁𝘂𝗿𝗲𝘀

This weekend, I spent some time exploring the new features introduced in ES15 (ECMAScript 2024). When I tried a couple of them, Promise.withResolvers and Object.groupBy(), I found them not only interesting but also genuinely useful for writing cleaner, more efficient code.

1️⃣ 𝗣𝗿𝗼𝗺𝗶𝘀𝗲.𝘄𝗶𝘁𝗵𝗥𝗲𝘀𝗼𝗹𝘃𝗲𝗿𝘀

Creating custom promises just became so much simpler! Instead of manually capturing resolve and reject, this new feature gives you everything in one go.

𝗕𝗲𝗳𝗼𝗿𝗲:

let resolve, reject;
const promise = new Promise((res, rej) => {
  resolve = res;
  reject = rej;
});
resolve('Success');

𝗪𝗶𝘁𝗵 𝗣𝗿𝗼𝗺𝗶𝘀𝗲.𝘄𝗶𝘁𝗵𝗥𝗲𝘀𝗼𝗹𝘃𝗲𝗿𝘀:

const { promise, resolve, reject } = Promise.withResolvers();
// Later in the code
resolve('Success'); // or reject(new Error('Failure'));

✅ Cleaner Syntax
✅ Less Boilerplate
✅ Safer and Easier to Use

This small improvement made my async code much more manageable and clear.

2️⃣ 𝗢𝗯𝗷𝗲𝗰𝘁.𝗴𝗿𝗼𝘂𝗽𝗕𝘆()

Working with data is now even more intuitive. Object.groupBy() lets you categorize array elements effortlessly. (Note: ES2024 ships grouping as the static methods Object.groupBy() and Map.groupBy(); there is no Array.prototype.groupBy.)

Example:

const items = [
  { name: 'apple', type: 'fruit' },
  { name: 'carrot', type: 'vegetable' },
  { name: 'banana', type: 'fruit' },
];

const grouped = Object.groupBy(items, item => item.type);
console.log(grouped);

Output:

{
  fruit: [ { name: 'apple', type: 'fruit' }, { name: 'banana', type: 'fruit' } ],
  vegetable: [ { name: 'carrot', type: 'vegetable' } ]
}

✅ Easy to Understand
✅ Compact and Clean
✅ No More Manual reduce Logic

Grouping data has never been this simple! Whether it's categorizing objects, simplifying data transformations, or improving readability, Object.groupBy() gets the job done with minimal effort.

These features are perfect for writing cleaner, clearer, and more efficient JavaScript code. I'm definitely planning to use them in my projects. 💻🔥

#JavaScript #Coding #Programming
-
🚀 Why JSON is the Preferred Format for Structured Data 🚀 ➡ JSON (JavaScript Object Notation) has become the go-to format for structuring and exchanging data, and here’s why: ✅ Human-Readable & Machine-Friendly: Its clean key-value pair structure makes JSON easy to understand and extract data from, perfect for document processing. ✅ Universally Supported: JSON works seamlessly with most programming languages, ensuring compatibility across a wide range of systems and applications. ✅ Lightweight & Fast: With minimal overhead, JSON is efficient for data transmission and storage, especially in environments where speed and bandwidth matter. ✅ Scalable & Flexible: From simple data to complex, nested structures, JSON adapts to all, making it ideal for handling complex PDFs with tables, forms, and more. 💡 Why Convert PDFs to JSON? Converting PDFs to JSON unlocks structured, machine-readable data, enabling automation, analysis, and easy integration into business workflows. Read this blog by Tarun Kr. Singh to know how to convert PDFs to JSON: https://lnkd.in/gwxU77TX #JSON #DataProcessing #Automation #AI #PDF
Convert Unstructured PDF Documents to Structured JSON using Unstract.
https://unstract.com
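To make the "structured, machine-readable" point concrete, here is a small Python sketch of what one PDF invoice might look like after extraction into JSON. The field names and values are hypothetical, not a schema from the linked blog.

```python
import json

# Hypothetical structured output for one invoice pulled from a PDF.
invoice = {
    "invoice_number": "INV-1042",
    "vendor": {"name": "Acme Corp", "country": "US"},
    "line_items": [
        {"description": "Widget A", "qty": 3, "unit_price": 9.99},
        {"description": "Widget B", "qty": 1, "unit_price": 24.50},
    ],
    "total": 54.47,
}

# Serialize: a compact, machine-readable string for storage or transmission.
payload = json.dumps(invoice)

# Deserialize: the same nested structure round-trips into native objects,
# ready for automation or analysis downstream.
parsed = json.loads(payload)
assert parsed["line_items"][0]["qty"] == 3
print(json.dumps(parsed, indent=2))  # the human-readable view
```

The nesting is the point: tables become lists of objects and forms become key-value pairs, so downstream code can address any field directly instead of re-parsing text.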
-
Claude just launched the AI data analyst! An analysis tool that allows users to write and run JavaScript code, enabling real-time data processing and analysis. This feature, available in preview mode, turns Claude into a powerful data analyst that can clean, explore, and analyze data from CSV files. It’s particularly beneficial for teams across marketing, sales, product management, engineering, and finance, as they can now upload data for precise, actionable insights. The analysis tool enhances Claude’s capabilities by making answers mathematically accurate and reproducible, helping teams make data-driven decisions effortlessly. Access it via the feature preview in your Claude.ai account settings.
Introducing the analysis tool in Claude.ai
anthropic.com
-
Excited to share my contribution to the Dataverse project! 🎉

I was primarily responsible for the Data Visualization module, including designing and developing the following components:
• Charts page
• Saved Chart page
• Tables component

Additionally, I implemented functionalities such as:
• Creating data visualizations from database queries
• Editing chart types
• Saving and viewing visualizations
• Customizing visualizations
• Validating data
Dataverse - Natural Language Processing-Based Data Visualization Tool 💻💬📊

Thrilled to announce the successful completion of our Level 2 industry-based software development project at the University of Moratuwa! Our team built Dataverse, a web app that simplifies database management and data visualization, making it accessible to users through natural language queries.

A huge thank you to Mr. Malith Jayasinghe for his invaluable mentorship, and for an insightful company visit to WSO2! It was a great experience to learn and grow with such amazing guidance. Special gratitude to Dr. Supunmali Ahangama, our supervisor, whose constant support was instrumental in this project's success!

Key Functionalities:
- 💬 Natural language prompts to SQL query generation
- 🔒 User authentication & authorization (JWT and social accounts)
- 👤 User management (search, view profiles, manage roles)
- ✏️ Profile customization (edit profiles, reset passwords)
- 📊 Dashboard (add and manage databases)
- 📈 Data visualization (generate, customize, save, delete, and share visualizations)
- 🗄️ Database management (add databases, view tables and data)
- 🤝 Collaboration management (add, remove, view collaborators)
- 💬 Chat (interact with the DB, view past interactions, save, delete, update chats)
- 🎤 Voice input for prompts
- ⚙️ Feedback and admin dashboard

Technologies Used:
- Frontend: React.js
- Authentication and Authorization: OAuth, JWT
- LLM Integration: OpenAI (via LangChain)
- Backend: Python, Django REST Framework
- Database: SQLite
- Visualization: Chart.js
- API: Django REST Framework (DRF), Postman

My Contributions:
• UI design in Figma
• Front-end components: Home, Login, Signup, dashboard collaborations, view databases, view collaborations, user profile, edit profile, search users
• Functionalities:
- Natural language to SQL query conversion (OpenAI × LangChain)
- SQL query result generation
- Authentication and authorization
- Profile management
- User management (search, view, and assign roles)
- Collaboration management
- Extract tables and data from connected databases
- View tables and data of databases
- Filter and display charts separately per request
- Integrated all system modules

Other Contributors: Neeshan Ismath, Fathima Sahla, Lakshi Wijetunge, Deshan Kavishka

This project provided invaluable hands-on experience, leading to significant insights and the development of practical skills in software development.
-
Understanding Deep Copy in C#

In C#, managing object instances and their data effectively is crucial for building robust applications. One common concept developers encounter is the distinction between shallow copy and deep copy. In this article, we will explore what deep copy is, how it differs from shallow copy, and how to implement it in C#.

What is Shallow Copy?

Before diving into deep copy, it's essential to understand shallow copy. A shallow copy creates a new object that is a copy of the original object, but it copies only the references to the nested objects. Consequently, changes made to the nested objects through the copied instance will be reflected in the original instance, since both references point to the same memory location.

Example of Shallow Copy

public class Address
{
    public string Street { get; set; }
}

public class Person
{
    public string Name { get; set; }
    public Address Address { get; set; }

    // MemberwiseClone copies the fields one-to-one:
    // the Address reference is shared, not duplicated.
    public Person ShallowCopy()
    {
        return (Person)this.MemberwiseClone();
    }
}

Person original = new Person { Name = "John", Address = new Address { Street = "123 Main St" } };
Person shallowCopy = original.ShallowCopy();

shallowCopy.Address.Street = "456 Elm St"; // Changes the street in both instances
Console.WriteLine(original.Address.Street); // Output: 456 Elm St

(Note: plain assignment, Person copy = original;, is not a copy at all; it only copies the reference, so both variables point to the very same object.)

What is Deep Copy?

Deep copy, on the other hand, creates a new object along with copies of all the nested objects. This means that modifications to the nested objects in the copied instance will not affect the original instance, as they reside in different memory locations.

Example of Deep Copy

To implement deep copy, you can use several methods, such as serialization or manually cloning objects. Here's a simple manual implementation:

public class Person
{
    public string Name { get; set; }
    public Address Address { get; set; }

    // Method to create a deep copy
    public Person DeepCopy()
    {
        Person copy = (Person)this.MemberwiseClone(); // Shallow copy of the top level
        copy.Address = new Address { Street = this.Address.Street }; // Deep copy of Address
        return copy;
    }
}

Person original = new Person { Name = "John", Address = new Address { Street = "123 Main St" } };
Person deepCopy = original.DeepCopy();

deepCopy.Address.Street = "456 Elm St"; // Changes only the copied instance
Console.WriteLine(original.Address.Street); // Output: 123 Main St

Using Serialization for Deep Copy

An alternative method is to perform the deep copy through serialization, which is particularly useful for complex object graphs: serialize the object (for example, to JSON with System.Text.Json, or to a memory stream) and then deserialize it back to create a fully independent instance.
-
🔎 𝗨𝗻𝗱𝗲𝗿𝘀𝘁𝗮𝗻𝗱𝗶𝗻𝗴 𝗦𝗤𝗟 𝗤𝘂𝗲𝗿𝘆 𝗘𝘅𝗲𝗰𝘂𝘁𝗶𝗼𝗻 𝗢𝗿𝗱𝗲𝗿 ⚙

SQL queries may appear straightforward, but behind the scenes the database engine carries out a series of orchestrated steps to execute them. In this post, we'll go through the key clauses of a SQL query and explore how the engine processes them step by step 🕵️♀️ (a toy walk-through in code follows at the end of this post).

1. FROM and JOINs: The engine first combines the tables specified in the FROM clause.
2. ON: Join conditions in the ON clause are applied to the tables being joined, determining which rows from each table are included in the join.
3. WHERE: Next, filters are applied based on the conditions in the WHERE clause. Rows that don't meet the criteria are discarded.
4. GROUP BY: If a GROUP BY clause is used, the remaining rows are grouped based on the specified columns.
5. HAVING: The HAVING clause then filters the grouped rows, similar to WHERE but for grouped records.
6. SELECT: With the joined, filtered, and grouped rows in place, the fields for the final output are selected.
7. ORDER BY: If an order is specified with ORDER BY, the rows are sorted accordingly.
8. LIMIT: Finally, the LIMIT clause restricts the number of rows returned.

Note that this is the logical order: the query optimizer may physically execute steps differently, but the result must match these semantics. Understanding the execution order of all the SQL clauses is key to writing optimized queries 🦾.

➡ Stay tuned and make sure to follow us at Data and AI Central 🚀 for more FREE resources!

#sql #data #datascience #machinelearning #ai #artificialintelligence #programming #dataanalysis #analytics #productivity #innovation #coding #tech #softwareengineer #developer #technology
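Here is that toy walk-through: a pure-Python sketch that mirrors each clause on a small in-memory table. The sales data, the 2024 filter, and the >100 threshold are invented for illustration; real engines optimize heavily, but the observable semantics follow this pipeline.

```python
# Toy model of SQL's logical clause order on in-memory rows.
# Mirrors: FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> LIMIT.
from collections import defaultdict

sales = [  # FROM SalesData
    {"region": "East", "amount": 100, "year": 2024},
    {"region": "West", "amount": 250, "year": 2024},
    {"region": "East", "amount": 300, "year": 2023},
    {"region": "West", "amount": 50,  "year": 2024},
]

# WHERE year = 2024: row-level filter, applied before any grouping
rows = [r for r in sales if r["year"] == 2024]

# GROUP BY region
groups = defaultdict(list)
for r in rows:
    groups[r["region"]].append(r)

# Aggregate SUM(amount) per group
totals = {region: sum(r["amount"] for r in rs) for region, rs in groups.items()}

# HAVING SUM(amount) > 100: filter on the grouped result
totals = {region: t for region, t in totals.items() if t > 100}

# SELECT region, total
result = [{"region": region, "total": t} for region, t in totals.items()]

# ORDER BY total DESC, then LIMIT 1
result.sort(key=lambda r: r["total"], reverse=True)
print(result[:1])  # [{'region': 'West', 'total': 300}]
```

Notice why WHERE comes before HAVING here: the East 2023 row is gone before grouping, so East's total is 100 and fails the HAVING condition, exactly as it would in the SQL engine.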