From the course: Data Engineering Foundations
Solution: Transforming data
(upbeat music) - [Tutor] So here's the solution to the challenge. The first function you use to group all the ratings is group by, and you pass it the column you want to group on, which is movie_id. To calculate the average rating, you use the mean function on the rating column. Then, to merge the two data frames, you have to provide the common column, which is id in the movies_data data frame and movie_id in the average rating data frame. Finally, you print the final data frame with df.show. And that is it: you have the transformed data frame, and this is the final data frame that you would see.
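The transcript describes the steps but does not include the code itself, so the following is only a minimal sketch of the same group, average, and merge sequence. It makes several assumptions: the mention of df.show suggests a Spark-style data frame, but pandas is swapped in here so the example runs without a Spark session, and `print(df)` stands in for `df.show`. The sample rows, the `title` column, and the names `ratings_data`, `avg_rating`, and the `avg_rating` column are hypothetical; only `movie_id`, `rating`, `id`, and `movies_data` come from the narration.

```python
import pandas as pd

# Hypothetical sample data; the course's real tables are not shown in the transcript.
movies_data = pd.DataFrame({
    "id": [1, 2, 3],
    "title": ["Movie A", "Movie B", "Movie C"],
})
ratings_data = pd.DataFrame({
    "movie_id": [1, 1, 2, 2, 2],
    "rating": [4.0, 5.0, 3.0, 4.0, 2.0],
})

# Step 1 and 2: group the ratings by movie_id and take the mean of the rating column.
avg_rating = (
    ratings_data.groupby("movie_id", as_index=False)["rating"]
    .mean()
    .rename(columns={"rating": "avg_rating"})  # hypothetical column name
)

# Step 3: merge the two data frames on the common column,
# which is id in movies_data and movie_id in the average rating data frame.
df = movies_data.merge(avg_rating, left_on="id", right_on="movie_id", how="inner")

# Step 4: display the transformed data frame (pandas stand-in for df.show).
print(df)
```

An inner merge is used here as one reasonable choice, so a movie with no ratings (the third sample row) drops out of the result; a left merge would keep it with a missing average instead.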