Looking to get some #data into BigQuery quickly? This tutorial may help. KDnuggets Iván Palomares Carrascosa #SQL #analytics https://lnkd.in/eUWQDdNU
-
🌟 Today we're starting a new Google BigQuery 101 week at OWOX BI. We'll share 5 articles full of education, examples & use cases about one of the best data warehouses on the market today.

🔍 Ever wondered how to make the most of your data using Google BigQuery? Our first article takes you deeper into the world of BigQuery, covering everything you need to know to get started and become proficient in SQL queries.

📚 Content Highlights:
• What SQL is and which dialects BigQuery supports
• Standard vs Legacy SQL
• Where to start with BigQuery
• Basic BigQuery functions

💡 Whether you’re new to data analytics or looking to refine your skills, this comprehensive guide will help you navigate BigQuery’s functionality and leverage its powerful features for your business needs.

📝 Read the full article here: https://www.owox.com/c/5n6

🚀 Dive into the world of data with us and transform your data management practices!

#BigQuery #SQL #DataAnalytics #OWOXBI #standardsql #legacysql #bigqueryfunctions #gbq
Google BigQuery 101: Beginner’s Queries to Data Insights
owox.com
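A quick sketch of the Standard vs Legacy SQL difference highlighted above: the two dialects reference the same table differently (project, dataset and table names here are placeholders).

-- Legacy SQL: square brackets, with a colon between project and dataset
SELECT COUNT(*) FROM [my-project:my_dataset.events];

-- Standard SQL (the recommended dialect): backticks and dots throughout
SELECT COUNT(*) FROM `my-project.my_dataset.events`;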
-
⌛ I spent 4 hours figuring out how BigQuery executes the SQL query internally. Here’s what I found.

🖋️ Author: Vu Trinh
🔗 Read the article here: https://lnkd.in/ewD3rSf6
-----------------------------------------------
✅ Follow Data Engineer Things for more insights and updates.
💬 Hit the 'Like' button if you enjoyed the article.
-----------------------------------------------
#dataengineering #dataanalytics #bigquery #sql
I spent 4 hours figuring out how BigQuery executes the SQL query internally. Here’s what I found.
blog.det.life
-
🎯 SQL Challenge of the Day

Question: You’re working in a retail system hosted on BigQuery with the following tables:
salesOrders: contains the details of customer sales orders.
recordTypes: contains the types of products sold.
orderCancellationReasons: contains the reasons why certain orders were canceled.

Your task is to write a SQL query that returns the product type, cancellation reason, and the count of canceled orders for each combination, sorted by the highest count first. 🛒❌

🛒 Tables Overview:
gcp-datawarehouse-dev.salesData.salesOrders (id, recordtypeid, order_status)
gcp-datawarehouse-dev.salesData.recordTypes (id, name, subjectType)
gcp-datawarehouse-dev.salesData.orderCancellationReasons (id, cancellation_reason)

🧠 SQL Solution:
SELECT
  rt.name AS product_type,
  cr.cancellation_reason,
  COUNT(*) AS order_count
FROM `gcp-datawarehouse-dev.salesData.salesOrders` so
JOIN `gcp-datawarehouse-dev.salesData.recordTypes` rt
  ON so.recordtypeid = rt.id AND rt.subjectType = 'SalesOrder'
JOIN `gcp-datawarehouse-dev.salesData.orderCancellationReasons` cr
  ON so.id = cr.id
GROUP BY rt.name, cr.cancellation_reason
ORDER BY order_count DESC;

🧩 Key Concept: The output of the first join becomes the input for the second join, allowing SQL to gradually combine data from different tables to form the final result. This is a powerful way to link multiple datasets in SQL queries! 🚀💡

🤔 Are you new to BigQuery? This example uses a simple salesData dataset to help you practice joins and aggregation in SQL. Give it a try and let me know how it goes in the comments! 💬👇 The output is shown in the comment section below. 📉👇

#BigQuery #SQLChallenge #DataScience #CloudComputing #DataAnalytics #SQLJoins #GoogleCloud #DataEngineering #TechLearning #DataCommunity #RetailData #SQLQuery #CodingChallenge
-
👀 Are Analytical Functions Dead in Databases? Think Again! 💡

In a world of real-time analytics and data insights, it’s time to rethink how we use SQL to aggregate data. The debate: LISTAGG vs. STRING_AGG, two titans of string aggregation. Which one truly deserves your attention?
---
🔥 The Old Guard: LISTAGG
Found in: Oracle, Snowflake, Redshift.
Syntax:
SELECT product_id,
       LISTAGG(review_text, ', ') WITHIN GROUP (ORDER BY review_date) AS combined_reviews
FROM reviews
GROUP BY product_id;
Pros: works well for ordered aggregation.
Cons: can crash with long strings. Feels a bit... 2010s?
---
⚡ The New Star: STRING_AGG
Found in: PostgreSQL, SQL Server, BigQuery, Redshift.
Syntax:
SELECT product_id,
       STRING_AGG(review_text, ', ' ORDER BY review_date) AS combined_reviews
FROM reviews
GROUP BY product_id;
Pros: handles long strings like a champ, cleaner syntax, and inline ordering.
Cons: not available everywhere yet.
---
💡 Why This Matters:
Picture this: your database has rows of reviews: "Easy to use", "Affordable", "Fast delivery". You want them in a single column: "Easy to use, Affordable, Fast delivery". Using the right function can transform messy rows into actionable insights, saving time and headaches.
---
Where Do They Shine?
PostgreSQL, SQL Server, BigQuery: STRING_AGG is the future.
Oracle, Snowflake: stick with LISTAGG (for now).
Databricks: use ARRAY_AGG or collect_list() as alternatives.
---
🚀 So, Are Analytical Functions Dead? Not at all. They’ve evolved, just like your data needs. Whether you’re a fan of the tried-and-true LISTAGG or the flexible STRING_AGG, mastering these functions keeps your skills relevant and sharp.

What’s your go-to SQL trick for aggregating data? Share your thoughts below! 👇

#SQL #DataAnalytics #PostgreSQL #Snowflake #BigData #Redshift #CloudData #DataEngineering #Bigquery #Interview
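Where STRING_AGG isn't available, the same result can usually be built from ARRAY_AGG. A minimal BigQuery-flavoured sketch, assuming the same reviews(product_id, review_text, review_date) table from above:

-- Collect the reviews into an ordered array, then flatten it into one string
SELECT
  product_id,
  ARRAY_TO_STRING(ARRAY_AGG(review_text ORDER BY review_date), ', ') AS combined_reviews
FROM reviews
GROUP BY product_id;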
-
How to grant access to specific BigQuery tables using Dataform https://lnkd.in/gcHb3-Q4
How to grant access to specific BigQuery tables using Dataform
medium.com
-
🌍 Ever wondered what the "ST_" prefix stands for in BigQuery and Snowflake SQL? 🤔

As you know, many modern databases and warehouses like BigQuery and Snowflake support map geometry operations. All these functions, like ST_ASGEOJSON, are prefixed with ST_. Recently a colleague asked me what ST_ means and (after using these functions for many years) I didn't know. Quick research uncovered a quirky story from SQL standardisation.

A popular answer is that ST stands for Spatial Type, but this is not true ❌

BigQuery and the like adopted this prefix from the SQL/MM Spatial standard, and this is the story behind it 📖: early in the development of the SQL/MM standard, the vision was to integrate spatial and temporal data due to their frequent association (Spatial and Temporal = ST). Later, however, it was decided that temporal data deserved its own SQL/Temporal standard 🗺️ ⏳ 🙈

It's hard to say why the prefix was never updated, but it looks like ST was copied and pasted from SQL/MM into other standards, including OGC's, and it was easier to keep it even though the spatial functions ended up with no temporal component.

👉 Now you know the truth: ST = Spatial and Temporal

Did you know you can use BigQuery GIS functions to work with OpenStreetMap data? Check out my already classic article below: https://lnkd.in/euX8inBB
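For the curious, a tiny illustration of those ST_ functions in BigQuery (the coordinates are just an example): ST_GEOGPOINT takes a longitude and latitude and builds a GEOGRAPHY value, and ST_ASGEOJSON renders it as a GeoJSON string.

SELECT ST_ASGEOJSON(ST_GEOGPOINT(13.4050, 52.5200)) AS berlin_point_geojson;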
-
Hello friends, today we'll discuss our favourite data warehouse: BigQuery ❤️

Google BigQuery can make you spend more money than you need to when running SQL queries. Here's how it happens and how you can avoid it. 👀

When you query a table in BigQuery, it builds a query like this:

SELECT FROM dataset.table_name LIMIT 1000;

It doesn't list the columns in the SELECT statement, tempting you to use a `*` to select all columns. You might think that adding a `LIMIT` will keep the cost down, right? WRONG 🚨

In BigQuery, `LIMIT` doesn't reduce query cost. It just discards data you’ve already paid to retrieve.

🚀 Key point: BigQuery charges you based on the amount of data you read, not the amount you display.

To reduce data retrieval costs, you can:
🏖️ Select fewer columns: this saves money but isn't always practical if you need all columns.
🏖️ Use `TABLESAMPLE`: this lets you retrieve fewer rows, lowering your costs while still allowing you to select all necessary columns.
🏖️ Partition your table: by querying only specific partitions with a `WHERE` clause, you only pay for the data in those partitions.

Here are examples to help you understand:

🎈 Selecting fewer columns:
SELECT column1, column2 FROM dataset.table_name LIMIT 1000;

🎈 Using `TABLESAMPLE`:
SELECT * FROM dataset.table_name TABLESAMPLE SYSTEM (10 PERCENT);

🎈 Querying a partitioned table:
SELECT * FROM dataset.table_name WHERE _PARTITIONTIME = '2023-07-01';

These methods help you control costs while still getting the data you need.

Batch starting from Monday, only one seat left 🚨. Interested candidates, DM me and I'll send you the link 🥳 We offer tailored services like course training and interview preparation 🤝

Sample pic provided below for easy understanding 👇🏻
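If your table isn't partitioned yet, here's a minimal sketch of creating a date-partitioned copy (the event_date column is a placeholder for whatever date column you filter on):

-- Create a partitioned copy of the table, partitioned on a DATE column
CREATE TABLE dataset.table_name_partitioned
PARTITION BY event_date
AS SELECT * FROM dataset.table_name;

-- Filtering on the partition column means only the matching partitions are scanned
SELECT * FROM dataset.table_name_partitioned
WHERE event_date = '2023-07-01';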
-
This SQL feature seems pretty useful for anyone working in Snowflake OR BigQuery: GROUP BY extensions.

Let's start with GROUP BY ROLLUP.

GROUP BY ROLLUP is an extension of the GROUP BY clause that produces sub-total rows in addition to the grouped rows. Sub-total rows further aggregate the grouped rows, using the same aggregate functions that produced them.

You can think of ROLLUP as generating multiple result sets, each of which (after the first) is an aggregate of the previous result set.

So, for example, if you own a chain of retail stores, you might want to see the profit for:
- Each store.
- Each city (large cities might have multiple stores).
- Each state.
- Everything (all stores in all states).

You could create separate reports to get that information, but it is more efficient to scan the data once. See the sketch after this post.
---
CRUCIAL POINT HERE. This is great for quickly doing segmentation analysis.

If you are familiar with the concept of grouping sets (GROUP BY GROUPING SETS), you can think of a ROLLUP grouping as equivalent to a series of grouping sets, just with a shorter specification: the N elements of a ROLLUP specification correspond to N+1 GROUPING SETS.

Check out GROUP BY CUBE and GROUP BY GROUPING SETS too.

Thanks David Freitag for the inspiration for this one. More people need to know!

#dataengineering #sql #bigquery #snowflake
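A minimal sketch of that retail example, assuming a hypothetical sales table with state, city, store_id and profit columns (the syntax works in both BigQuery and Snowflake):

-- One scan returns per-store rows plus city, state and grand-total sub-totals
SELECT
  state,
  city,
  store_id,
  SUM(profit) AS total_profit
FROM sales
GROUP BY ROLLUP (state, city, store_id)
ORDER BY state, city, store_id;

The three ROLLUP elements expand to four grouping sets: (state, city, store_id), (state, city), (state), and the empty set, which is the grand total.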
-
Tablesample it is!

I've noticed that many people get anxious when they forget to use the LIMIT clause in their queries. I've often explained how LIMIT works and why it doesn’t contribute to cost optimization. However, this new TABLESAMPLE feature is a real game-changer for controlling costs efficiently!

You can even use TABLESAMPLE in joins:
--------------------------------------------------
SELECT col1, col2
FROM dataset.table1 T1 TABLESAMPLE SYSTEM (10 PERCENT)
JOIN dataset.table2 T2 TABLESAMPLE SYSTEM (20 PERCENT)
USING (customer_id)
--------------------------------------------------
To learn more about TABLESAMPLE: https://lnkd.in/gX93f76R

#sql #dataengineering #bigquery #gcp
Google BigQuery is designed to trick you into overpaying when you run SQL queries. Here's how they do it, and how to stop overpaying.

When you open a table to query it, BigQuery builds you a query that looks like this:

SELECT FROM dataset.table_name LIMIT 1000;

They leave the column list for the SELECT statement blank, but the temptation is to throw a * in there. There's a LIMIT on the query, so the cost won't be crazy, right?

Wrong. LIMIT does nothing to reduce query cost in BigQuery! It just throws away a lot of data you've already paid for.

Key point: BigQuery charges you on the data you retrieve, not the data it displays.

One way to retrieve less data: select fewer columns. This defeats the purpose for a lot of queries. I need all the columns I'm picking!

What should you do instead?

1. Use TABLESAMPLE to retrieve fewer rows. This reduces your query cost while allowing you to still select all the columns you want.

2. Partition your table. Use the partition column in your WHERE clause, and you only pay for querying the data in that partition.

Check out the code sample below to see these in action. 👇

#sql #dataengineering #bigquery
——————————
📌📌 Liked this post? You’ll love my free SQL cheat sheet. It's a complete guide with real examples. Get a copy here: 👉 https://lnkd.in/eeQB9SnC
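A hedged sketch of both approaches (table, column and partition values here are placeholders, not the author's original sample):

-- 1. TABLESAMPLE: scan only ~10% of the table's storage blocks
SELECT *
FROM dataset.table_name TABLESAMPLE SYSTEM (10 PERCENT);

-- 2. Partition filter: scan only the partition you actually need
SELECT *
FROM dataset.table_name
WHERE _PARTITIONTIME = TIMESTAMP('2023-07-01');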