From the course: End-to-End Real-World Data Engineering Project with Databricks
Loading the sample data
- [Instructor] Now it's time to load the data. In the real world, we assume that an upstream system will place the data in these folders. If you were using Azure or AWS, you would keep this data in the respective cloud storage locations, but to keep things simple for this project, we are keeping our data within the Databricks File System itself. So let's assume we have an upstream system pushing the data into these folders, and let's upload the data to the bronze layer. Imagine that we have the customer data available as a CSV. To upload it here, click the Upload button and select the customer CSV file; you can find this file in the exercise files attached to the course. I'm selecting it from my machine, customer.csv, and clicking Done. So I have uploaded my customer CSV. Similarly, let's upload the product data: find the product JSON file and upload it into the product catalog folder. And for the transaction log, we have a Parquet file, so upload the transaction Parquet file into its folder and click Done. Our initial setup is now ready: we have created the workspace, created the folders, and uploaded the sample data into these specific folders. From the next video onwards, we're going to start writing our Databricks code. Let's move there.
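Once the files are uploaded through the UI, it can help to confirm from a notebook that they landed where expected. Below is a minimal sketch of that check; the DBFS folder and file paths are assumptions standing in for whatever bronze-layer folders you created in the previous step, so adjust them to match your workspace.

```python
# Hypothetical DBFS paths -- replace with the bronze-layer folders you created.
customer_path = "dbfs:/FileStore/bronze/customer_data/customer.csv"
product_path = "dbfs:/FileStore/bronze/product_catalog/product.json"
transaction_path = "dbfs:/FileStore/bronze/transaction_logs/"

# List the bronze folder to confirm the three uploads are present.
display(dbutils.fs.ls("dbfs:/FileStore/bronze/"))

# Quick sanity-read of each file format.
customers = spark.read.option("header", True).csv(customer_path)
products = spark.read.json(product_path)
transactions = spark.read.parquet(transaction_path)

customers.show(5)
products.show(5)
transactions.show(5)
```

If the listing shows the expected folders and each read returns rows, the sample data is in place for the code we start writing in the next video.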