Joydeep Sen Sarma's Post
I started working on Hadoop in 2007 at Facebook and realized that supporting complex technical products is hard. At ClearFeed we are now helping many of the new-generation Data Platform firms support and onboard customers more efficiently on #Slack. Yet I am not a classic Support leader and am still learning, so I am very excited to host this panel of Support and Solutions leaders from Snowplow, StarTree, and PlanetScale. Looking forward to learning about their unique challenges and how they are solving them. Got a question for these folks (or for me)? Registration link here: https://lnkd.in/gf88Zsnx - the more audience questions the better!
More Relevant Posts
-
Tools like Slack and Teams have become ubiquitous modes of collaboration, both internally and externally. At StarTree we use these tools to provide rapid support to our amazing customers. The big question has always been "how scalable is that approach?" The answer: ClearFeed is what makes it possible! I am looking forward to learning from other Customer Solutions/Support leaders on this panel session. Thanks for inviting me to participate, Joydeep Sen Sarma!
-
Calling all Data Engineers and Spark Enthusiasts! How well do you know Apache Spark? 💡 It's time to put your knowledge to the test! Whether you're a seasoned data pro or just getting started, this quick poll will spark (pun intended 😄) some fun conversation and learning. Vote now, share your thoughts, and let's see how the community stacks up! 🧑‍💻
#DataEngineer #BigDataEngineer #DataScienceCommunity #CareerInTech #DataCareers #DataInnovation #AnalyticsEngineer #MLPipeline #EngineeringExcellence #CloudEngineer #DataEngineering #BigData #ETL #DataPipelines #DataAnalytics #DataProcessing #CloudComputing #ApacheSpark #DataOps #DistributedComputing #DataInfrastructure #DataIngestion
For reach: Karthik K. Ajay Kadiyala Suraz G. Sumit Mittal Darshil Parmar Sagar Prajapati Vaishnavi MURALIDHAR
-
Day 5/365 - Hive vs. Iceberg table formats
In the world of data management, choosing the right table format is more than a technical decision; it sets the foundation for how your organization handles, processes, and extracts value from its data. Two popular contenders in this space are Hive and Apache Iceberg, and while both have their merits, their architectural differences can significantly impact your workflows. Keeping things practical and grounded, let us understand further.
>>> Hive Tables
Hive has been the backbone of big data platforms for years, with its Hive Metastore acting as the gatekeeper for managing metadata and interacting with file formats like Parquet or ORC. It's robust, well integrated into the Hadoop ecosystem, and supported by a wide array of processing engines like Spark and Presto. But here's the catch: Hive was designed for an era when batch processing was king, and it began to strain as data volumes exploded and needs shifted to real-time, high-concurrency environments.
>>> Iceberg Tables
Enter Apache Iceberg, a table format designed to tackle the exact pain points Hive couldn't handle. Iceberg's approach is modern, scalable, and downright elegant: snapshot-based metadata gives you ACID transactions, schema evolution, and time travel without rewriting the table.
What it means for the Data Team
>>> Hive is perfect for environments where data doesn't change often and transactional operations are minimal. It's reliable for legacy big data pipelines that don't demand high flexibility or scalability.
>>> Iceberg is the go-to for teams that need flexibility, transactional integrity, and high performance. If you're operating in a cloud-native setup with ever-evolving data needs, Iceberg gives you the tools to scale, iterate, and innovate.
Hive walked so Iceberg could run. While Hive played a monumental role in shaping the early days of big data, the future clearly belongs to formats like Iceberg that are built for the complexities of today's data-driven world.
#Hive #Iceberg #data #management #BigData #DataEngineering
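To make the contrast tangible, here is a minimal, hedged PySpark sketch of what Iceberg enables; the catalog name, warehouse path, and table are invented for the example, and it assumes an iceberg-spark-runtime JAR matching your Spark version is on the classpath:

```python
from pyspark.sql import SparkSession

# Hypothetical local Iceberg catalog; names and paths are placeholders.
spark = (
    SparkSession.builder.appName("iceberg-sketch")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

# Iceberg tables are created with plain SQL; snapshots and ACID commits come built in.
spark.sql("CREATE TABLE IF NOT EXISTS local.db.events "
          "(id BIGINT, ts TIMESTAMP, payload STRING) USING iceberg")
spark.sql("INSERT INTO local.db.events VALUES (1, current_timestamp(), 'hello')")

# Row-level mutation, a classic pain point for plain Hive tables:
spark.sql("DELETE FROM local.db.events WHERE id = 1")

# Every commit is a snapshot you can inspect (and time-travel back to):
spark.sql("SELECT snapshot_id, committed_at FROM local.db.events.snapshots").show()
```

The row-level DELETE and the snapshot history are exactly the "transactional integrity" the post refers to.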
-
🚀 Why Learning Lower-Level APIs in Apache Spark is Key for Data Engineers 🚀
In the world of big data, it's easy to rely solely on structured APIs like DataFrames and Datasets. They're powerful, efficient, and simplify complex tasks. But if you want to become a truly skilled Data Engineer, there's immense value in diving deeper into Spark's lower-level APIs: RDDs and the partition-level primitives beneath the structured layer. Understanding and utilizing these tools gives you the ability to solve unique data problems and optimize performance in ways that structured APIs alone can't. It's not just about getting the job done: it's about doing it with precision and expertise.
#DataEngineering #ApacheSpark #BigData #CareerGrowth #TechSkills #DataScience #DataAnalyst #BusinessAnalyst #DataOps #TechCommunity #LearningAndDevelopment #MachineLearning #Analytics
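As a hedged, minimal illustration (the dataset and function names are invented), here is the kind of partition-level control the RDD API exposes that DataFrames deliberately hide:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-sketch").getOrCreate()
sc = spark.sparkContext

# A toy pair RDD; in practice this might come from parsing raw records.
pairs = sc.parallelize([("user1", 3), ("user2", 5), ("user1", 7)])

# Explicitly choose how keys map to partitions - a decision the
# DataFrame API makes for you.
by_key = pairs.partitionBy(4, partitionFunc=lambda key: hash(key))

# Run custom logic once per partition (useful to amortize expensive
# setup, like opening a connection) instead of once per row.
def partition_totals(rows):
    total = 0
    for _, value in rows:
        total += value
    yield total

print(by_key.mapPartitions(partition_totals).collect())
```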
-
🔹 Unlocking the Secrets of Apache Spark! 🔹
Just completed Week 9 of the Ultimate Big Data Masters' program! 🎓
Did you know❓ Efficient partitioning in Apache Spark can reduce job execution time by over 50%! 🚀
Here's how I explored this and more in my recent deep dive into Spark optimization:
🔸 Apache Spark Internals 🧠
🔸 DataFrame Writer API 📝
🔸 PartitionBy Clause 📊
🔸 Partitioning's Performance Benefits ⚡
🔸 Understanding Bucketing and its Performance Gains 🗃️
🔸 Disabling Dynamic Executor Allocation 🔧
🔸 Spark-Submit at a High Level 🚀
🔸 Evaluating Initial Partitions in a DataFrame 🔍
🔸 Calculating the Initial Number of Partitions for a Single Non-Splittable File 🗂️
🔸 Calculating the Initial Number of Partitions for Multiple Files 📂
🌟 Spark's ability to handle massive datasets with agility never ceases to amaze me. Looking forward to pushing the boundaries of what we can achieve with big data! 📈
A big thank you to Sumit Mittal Sir and the incredible TrendyTech team for an enlightening Week 9 learning journey! 🙏🚀
#ApacheSpark #BigData #DataEngineering #LearningJourney #TrendyTech
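For the Writer API, partitionBy, and bucketing items above, a small hedged sketch (columns, paths, and table names are made up for the example):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partitioning-sketch").getOrCreate()

df = spark.createDataFrame(
    [("2024-01-01", "IN", 10), ("2024-01-01", "US", 20), ("2024-01-02", "IN", 5)],
    ["dt", "country", "sales"],
)

# partitionBy lays files out as dt=.../country=... directories, so filters
# on those columns prune whole directories instead of scanning every file.
(df.write.mode("overwrite")
   .partitionBy("dt", "country")
   .parquet("/tmp/sales_partitioned"))

spark.read.parquet("/tmp/sales_partitioned").where("dt = '2024-01-01'").show()

# Bucketing hashes rows into a fixed number of files per key, which can let
# later joins on the bucket column avoid a shuffle. It requires saveAsTable.
(df.write.mode("overwrite")
   .bucketBy(8, "country")
   .sortBy("country")
   .saveAsTable("sales_bucketed"))
```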
-
🎯 Ever wondered why one of tech's most powerful frameworks is named after a toy elephant? 🐘
I recently dove deep into the fascinating origin story of Hadoop - from a toddler's beloved toy to revolutionizing how we handle Big Data!
🔍 In my latest article, I explore:
- The untold story behind Hadoop's quirky name
- How Google's 2003 challenge sparked a revolution
- Yahoo's game-changing implementation
- The journey to becoming an open-source powerhouse
💡 Whether you're a data enthusiast, tech professional, or just love a good origin story, this piece offers something unique.
🌟 Fun fact: the person who named this groundbreaking technology wasn't even old enough to speak properly!
Curious? Check out my full article here: https://lnkd.in/etHYKi66
#BigData #Hadoop #TechHistory #DataEngineering #Innovation #TechStories #SoftwareEngineering #Programming #bid_challenge #bid_de Break Into Data Meri Nova
-
Just wrapped up week 15 of my Data Engineering course, and this one was all about Apache Hive! It's been an eye-opening experience exploring how to manage and analyze massive datasets in a data warehouse environment.
Apache Hive Deep Dive:
➡ Mastered querying large datasets using HiveQL, a familiar SQL-like language.
➡ Learned how to design and manage the various types of Hive tables (managed, external).
➡ Uncovered optimization techniques like partitioning and bucketing to boost query performance.
➡ Dived into Hive join optimization strategies for efficient data manipulation.
➡ Explored the world of transactional tables (ACID) for data consistency and reliability.
➡ Studied Spark-Hive integration.
A heartfelt thank you to Sumit Mittal Sir and Team TrendyTech for invaluable guidance and expertise throughout this week. Stay tuned for more updates as I continue this exciting journey through the Ultimate Big Data Course.
#dataengineering #apachehive #hive #datawarehouse #bigdata #alwayslearning
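Here is a hedged HiveQL sketch of the managed vs. external distinction, issued through Spark so it stays self-contained; it assumes Hive support is enabled, and the table names and path are invented:

```python
from pyspark.sql import SparkSession

# Assumes a reachable Hive Metastore (e.g., hive-site.xml on the classpath).
spark = (SparkSession.builder.appName("hive-tables-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Managed table: Hive owns data AND metadata; DROP TABLE deletes the files.
spark.sql("""
    CREATE TABLE IF NOT EXISTS orders_managed (order_id BIGINT, amount DOUBLE)
    PARTITIONED BY (order_date STRING)
    STORED AS PARQUET
""")

# External table: Hive owns only the metadata; DROP TABLE leaves the files
# intact, which makes it the safer choice for data shared across engines.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS orders_external (order_id BIGINT, amount DOUBLE)
    STORED AS PARQUET
    LOCATION '/data/orders'
""")
```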
-
Did you know that Spark doesn't manage persistent SQL tables on its own?
In the world of big data, data is stored in file formats, commonly Parquet. To process SQL queries over those files, Spark relies on two essential components:
1. An execution engine
2. A metastore
The metastore plays a critical role by maintaining metadata about the data files, such as:
1. Location of the data files (e.g., S3 paths)
2. File format
3. Schema
This is where the legacy Hive Metastore steps in, enabling Spark to serve SQL over shared, persistent tables. When a user submits a SQL query, Spark asks the Hive Metastore for the necessary details: where the data files are located (e.g., the S3 location), what the file format is, and what the table schema looks like. With this metadata in hand, Spark can process SQL queries efficiently and effectively, bridging the gap between distributed file systems and SQL-based analytics.
What are your thoughts on the evolution of metadata management in modern data platforms?
#Spark #HDFS #Hive #distributedComputing #Apache
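A minimal sketch of that flow, assuming a reachable Hive Metastore (e.g., via hive-site.xml on the classpath) and a hypothetical table named sales:

```python
from pyspark.sql import SparkSession

# enableHiveSupport points Spark's catalog at the Hive Metastore.
spark = (SparkSession.builder.appName("metastore-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Spark resolves "sales" through the metastore: file locations, format, schema.
spark.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region").show()

# The same metadata the query planner used is visible to you as well:
spark.sql("DESCRIBE FORMATTED sales").show(truncate=False)
```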
-
🚀 Excited to Share My Latest Achievement! 🌟
I'm thrilled to announce that I've just completed the Cloudera Data Engineering: Developing Applications with Apache Spark course! 🏆 This journey has been a deep dive into the world of big data and advanced data engineering concepts, equipping me with essential skills to design, develop, and optimize scalable data solutions.
Course Highlights:
✅ Understanding the Big Picture: data engineering's role in the modern data lifecycle; introduction to HDFS, YARN, and distributed processing
✅ Apache Spark Mastery: working with RDDs, DataFrames, and Datasets (including Scala); Spark SQL and Hive integration for seamless data processing
✅ Hands-on Tools: Zeppelin for interactive analytics; Airflow for pipeline orchestration; Workload Manager for performance monitoring
✅ Key Takeaways: writing and deploying Spark applications on the Cloudera Data Engineering service; leveraging Hive and Spark for data transformation and analysis
I'm excited to apply this knowledge to tackle complex data challenges and contribute to scalable, high-performance data systems. A big thanks to Cloudera for providing such a comprehensive program!
#DataEngineering #ApacheSpark #BigData #Cloudera #Hive
-
In today's digital landscape, understanding Big Data technologies like Hive and Sqoop is more crucial than ever. With vast amounts of data being generated every second, efficiently managing and analyzing it is a skill that opens doors to invaluable insights.
I'm thrilled to announce the launch of our latest project, "Optimizing E-commerce Insights: Building a HIVE Data Warehouse." This exclusive tutorial takes you from the fundamentals of Big Data to practical application, guiding you through setting up AWS instances and conducting analysis using HQL queries.
Click here to watch: https://lnkd.in/gW84ZDHm
Join us on this journey to unlock the potential of Big Data and elevate your analytical capabilities. Don't miss out - this project is available exclusively for members of our YouTube community!
#bigdata #hive #sqoop #dataanalysis #youtubeexclusive #datainsights
-
Learning (I am a WIP)
4w · Ok, learning to be a CTO in the future... good to see my Bay Area brother in that picture...