A Postgres data dump of over 3 million news articles from Indian news providers. I aggregated these to build a semantic search website with analytics such as title tone analysis, and to track how news evolved around a given context. Sadly, life got busier; I will come back to this later, but feel free to use this dataset to make something of your own. A lot more can be done with it if you are more data-savvy :) Dataset link: https://lnkd.in/gxAjRbXi I also have a slightly chonkier dump of 3.3 million records from more news providers on my VPS; send a DM to request it. #datadump #datascience #data
Suvarna Narayanan Baratharaj’s Post
More Relevant Posts
-
5 SQL Tricks to clean your data
Title: Shivan kumar || Senior Data ...
google.com
-
Following our Notion clone tutorial? Check out the second post from Valeri Karpov (the creator of Mongoose) as he demonstrates the power of RAG and Astra DB when building an open-source clone of Notion. #DataStax #VectorDB #RAGApplications https://ow.ly/SwCJ50TBqAI
-
Populate method of Mongoose 😍 Scenario: whenever I clicked the add-to-cart button, I passed the respective product _id. Initially I was thinking of accessing the product details for that specific product _id using the fetch method, but later I got to know the populate method in Mongoose. Here I try to explain how we can implement it.
-
Overusing dictionaries in monitoring can lead to complex, cluttered configs! A recent blog post from Ravi shows how to clone dictionary row entries for objects from import sources in Icinga Director. This guide will help you improve your configurations: https://lnkd.in/eVbZWmaD #IcingaDirector
Icinga Director: Cloning dictionary row entries for objects from import sources
https://icinga.com
-
On the value of semantics: https://lnkd.in/dsRTMerT
PostgreSQL subtransactions, savepoints, and exception blocks
franckpachot.medium.com
-
Excited to Embark on My Data Analytics Journey! After completing my MSc in Information Technology from Vidyalankar School of Information Technology, I am thrilled to start my career as a Data Analyst. During my studies, I developed a deep passion for data—how it can drive business decisions, uncover hidden insights, and solve real-world problems. Through hands-on projects, I’ve sharpened my skills in Python, SQL, and data visualization, and I’ve been applying machine learning techniques to solve analytical challenges. You can explore my projects on GitHub. ✨ A few highlights from my portfolio: - Developed a customer churn prediction model using machine learning to improve retention strategies. - Built an interactive sales dashboard in Power BI, delivering actionable insights for business decisions. - Conducted exploratory data analysis on a retail dataset, identifying key sales trends to optimize future strategies. Though I am early in my career, I am continuously expanding my knowledge and adding more projects to my portfolio. Feel free to check out my work here: github.com/hirenparkar I am excited to connect with industry experts, exchange knowledge, and explore new opportunities in the world of data analytics. If you're seeking a motivated data enthusiast ready to dive into meaningful analytics, I’d love to connect! #DataAnalytics #MScIT #DataScience #Python #SQL #PowerBI #MachineLearning #DataVisualization
hirenparkar - Overview
github.com
-
Recently, I tried to figure out how to work with the Finnish Statistics API. It turned out not to be difficult; however, I noticed that it is impossible to get the types of aggregation (for example, district divisions such as areas, municipalities, city/countryside and others) without using the visual interface. This is not a big problem in itself, as I could fetch them from the website and then hardcode them. Still, I wrote to the developers about my finding and got the answer that the problem really exists and cannot be fixed with the current architecture of the API. They are going to release API v2 with better functionality in early 2025. Well, this is great news, I'll wait! In the meantime, I wrote a piece of code with examples of requests to the current API that work well. Here it is: https://lnkd.in/drrncQxc
GitHub - slava-zagriichuk/StatFin: Studies over Finnish statistics database (StatFin)
github.com
-
Check out the latest blog from Venkat Rajaji: what went down in the news this week and what it can mean for your org. https://lnkd.in/dEvTaUgE
Databricks Follows Cloudera by Adopting Iceberg, While Snowflake Mulls Open Source Approach - Cloudera Blog
https://blog.cloudera.com
-
https://lnkd.in/gGPS78Af K-means clustering is a popular algorithm used for grouping similar data points together. It's often used in data analysis and machine learning. With K-means clustering, you can partition a dataset into K distinct clusters based on their similarities. The algorithm iteratively assigns data points to the nearest cluster centroid and updates the centroids until convergence. It's a powerful technique for finding patterns and structures in data. Let me know if you'd like more details or have any specific questions! 😊📊
GitHub - Smanjupriya/prodigy_ML_02: K-means clustering
github.com
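The assign-then-update loop described in the post can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from the linked repository: the function names (`kmeans`, `assign`, `sq_dist`), the random initialisation and the `iterations` and `seed` parameters are all assumptions made for this example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        for p in points
    ]


def kmeans(points, k, iterations=100, seed=0):
    """Hypothetical helper: cluster a list of numeric tuples into k groups."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k random points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster in place
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)
```

Calling `kmeans(points, 2)` on a list of 2-D tuples returns the final centroids and one cluster label per point. For real workloads a library implementation with smarter initialisation is usually a better choice than this sketch.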
-
3 Forms of Query Rewriting for RAG ✍️ Good RAG requires good retrieval, and good retrieval requires a good query understanding layer. This is a comprehensive resource by `zhaozhiming` showing 3 key patterns for adding a query rewriting layer that improves question handling in your RAG pipelines 🔥 1. Sub-question decomposition: Break a complex question into sub-questions. Unlike pure chain of thought, you can break a question down into parallelizable sub-questions that you can try answering all at once. 2. HyDE: Rewrite the question to hallucinate an answer that better aligns with the embedding semantics. 3. Step-back prompting (from scratch ✨): To answer a complex question, take a “step back” and answer a more generic question to better answer the specific one. Blog: https://lnkd.in/gj4Va86w
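The three patterns can be roughed out as thin wrappers around a language model call. This is only a sketch under stated assumptions and not the implementation from the linked blog: `LLM` stands for any hypothetical function that takes a prompt and returns text, and the prompt wording and the function names `decompose`, `hyde` and `step_back` are made up for illustration.

```python
from typing import Callable, List

# Assumption: any function mapping a prompt string to completion text.
LLM = Callable[[str], str]


def decompose(question: str, llm: LLM) -> List[str]:
    """Sub-question decomposition: one independent sub-question per line."""
    prompt = (
        "Break the question below into independent sub-questions, "
        "one per line.\n"
        f"Question: {question}"
    )
    lines = llm(prompt).splitlines()
    # Drop blank lines and strip simple bullet markers.
    return [ln.strip(" -*\t") for ln in lines if ln.strip(" -*\t")]


def hyde(question: str, llm: LLM) -> str:
    """HyDE: generate a hypothetical answer and embed it instead of the question."""
    prompt = f"Write a short passage that answers the question.\nQuestion: {question}"
    return llm(prompt).strip()


def step_back(question: str, llm: LLM) -> str:
    """Step-back prompting: ask for a more generic version of the question."""
    prompt = (
        "Rewrite the question below as a more general question "
        "about the underlying concept.\n"
        f"Question: {question}"
    )
    return llm(prompt).strip()
```

In a pipeline, each sub-question from `decompose` could be retrieved and answered in parallel, the text from `hyde` would be passed to the embedding model in place of the raw question, and the output of `step_back` would be retrieved alongside the original question to supply broader context.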