Lab2-2 - More SQL
1. Construct more complex SQL SELECT statements to generate results tables that satisfy a
range of given information requests
3. Create SELECT statements that join columns from two or more tables
Preamble
In the first SQL tutorial you gained experience in using the four main DML commands: SELECT,
UPDATE, INSERT, DELETE. If you have not successfully completed this tutorial, go back and do so
now. In this tutorial you will gain experience in creating more complex SELECT statements.
Specifically, in the first exercise you will use aggregate functions to create summary results tables. In
the second exercise you will write statements that retrieve data from more than one source table,
an operation known as a join. Finally, in the third exercise you will apply all that you have learnt so
far by writing statements to satisfy a range of different information requests. Note that some of
these final questions will be quite testing, so be prepared to take some time over these.
As before, you should be using the SQL Lab interface along with the StayHome database to perform
these exercises. The questions that require a value answer assume the contents held in the original
version (instance) of the database. Unless you have fully reversed any changes you have made (e.g.
UPDATEs, DELETEs), replace your current copy with the original file from Blackboard.
Exercise 1: Aggregate Functions
The SQL aggregate functions (COUNT, SUM, AVG, MIN and MAX) operate in a very similar way to the
descriptive statistics functions in SPSS: they take a column of data as input and output a single
value, such as the average value or the number of records satisfying a given condition. For instance,
this simple example returns the total number of records in the Staff table:
SELECT COUNT(*)
FROM Staff;
Result: 6
CS1703(1805): Data and Information (2015/16)
Dr Timothy Cribbin, Brunel University London
Check that this is correct by selecting the Staff table and counting the rows for yourself. What
happens if you replace the ‘*’ with a field name like ‘StaffNo’?
You should find that the result is the same. However, using a field name instead of the asterisk
'wildcard' means that records containing a null value in the specified field are ignored. For
instance, suppose (for a moment) two of the staff were unpaid volunteers who did not receive a
salary and thus had a null (no) value in the salary field. In that case, the statement
SELECT COUNT(salary)
FROM Staff;
would return a count of four, rather than six. Equally, we could have specified a WHERE condition as
follows:
SELECT COUNT(staffNo)
FROM Staff
WHERE salary > 0;
This statement instructs the DBMS to count all records that have a non-null value for staffNo (which
is the primary key so should not be null) but ignore all records that have either a null or zero value in
the salary field. Now try this one:
Q1: Complete the following query so that the result lists the total number of staff who earn more
than $40000:
Note: You can place the DISTINCT keyword inside the parentheses of an aggregate function, before the
field name (e.g. COUNT(DISTINCT dCenterNo)), if you want the DBMS to disregard duplicate values during
the calculation. DISTINCT may have an effect when computing the count, average or sum of a column but
has no effect on MIN or MAX calculations.
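To see these counting rules side by side, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The six-row Staff table is hypothetical (invented staff numbers, centres and salaries, with two NULL salaries standing in for the unpaid volunteers), and SQLite may differ in details from the SQL Lab tool; for example, Microsoft Access does not accept COUNT(DISTINCT ...).

```python
import sqlite3

# Hypothetical Staff table (invented values, not the StayHome data).
# S5 and S6 are the "unpaid volunteers": their salary is NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, dCenterNo TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [
        ("S1", "B001", 30000),
        ("S2", "B001", 30000),
        ("S3", "B002", 45000),
        ("S4", "B002", 52000),
        ("S5", "B003", None),
        ("S6", "B003", None),
    ],
)


def scalar(sql):
    """Run a query that returns one value and return that value."""
    return conn.execute(sql).fetchone()[0]


count_all = scalar("SELECT COUNT(*) FROM Staff")           # every row is counted
count_salary = scalar("SELECT COUNT(salary) FROM Staff")   # NULL salaries are skipped
count_paid = scalar("SELECT COUNT(staffNo) FROM Staff WHERE salary > 0")  # NULL > 0 is not true
count_centres = scalar("SELECT COUNT(DISTINCT dCenterNo) FROM Staff")     # B001, B002, B003

# DISTINCT changes SUM (the duplicate 30000 is added only once) but not MAX.
sum_all = scalar("SELECT SUM(salary) FROM Staff")
sum_distinct = scalar("SELECT SUM(DISTINCT salary) FROM Staff")
max_all = scalar("SELECT MAX(salary) FROM Staff")
max_distinct = scalar("SELECT MAX(DISTINCT salary) FROM Staff")

print(count_all, count_salary, count_paid, count_centres)  # 6 4 4 3
print(sum_all, sum_distinct, max_all, max_distinct)        # 157000 127000 52000 52000
```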
The AVG, SUM, MIN and MAX functions all work in the same way. Check the lecture slides/textbook
if you need some reminders and then complete the following queries:
Q2: Complete the following query so that the result lists the minimum, maximum, average and
total salary for all staff working for the business:
SELECT [?](salary) AS minSal, MAX([?]) AS maxSal, [?] AS avgSal, SUM(salary) AS totSal FROM Staff;
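As a hedged illustration of the point that these functions skip NULLs just as COUNT(field) does, the sketch below (again Python's sqlite3 with an invented Staff table) compares AVG(salary) with two manual calculations. Only the one that divides by COUNT(salary), rather than COUNT(*), agrees with AVG.

```python
import sqlite3

# Hypothetical Staff salaries (invented values); two rows hold NULL,
# as in the "unpaid volunteer" scenario described earlier.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, salary INTEGER)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?)",
    [("S1", 30000), ("S2", 30000), ("S3", 45000), ("S4", 52000), ("S5", None), ("S6", None)],
)

avg_sal, manual_avg, naive_avg = conn.execute(
    "SELECT AVG(salary), "                  # NULLs ignored: 157000 / 4
    "SUM(salary) * 1.0 / COUNT(salary), "   # the same calculation done by hand
    "SUM(salary) * 1.0 / COUNT(*) "         # wrongly divides by all 6 rows
    "FROM Staff"
).fetchone()

print(avg_sal, manual_avg)  # 39250.0 39250.0
print(naive_avg)
```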
Often it is desirable to compute aggregate functions across multiple categories or groups. For
instance we might want to find out the total number of staff working at each grade or the number of
DVD copies of a movie held at each distribution centre. To create a table of results by category we
use the GROUP BY clause.
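A minimal sketch of GROUP BY, assuming a hypothetical Staff table with a position column (the rows are invented, and Python's sqlite3 runs the SQL). Each distinct position becomes one row of the result, and COUNT and SUM are evaluated separately within each group.

```python
import sqlite3

# Hypothetical Staff rows (invented for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, position TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [
        ("S1", "Manager", 56000),
        ("S2", "Assistant", 30000),
        ("S3", "Assistant", 32000),
        ("S4", "Deputy", 45000),
        ("S5", "Assistant", 31000),
    ],
)

# One output row per distinct position; the aggregates are computed per group.
rows = conn.execute(
    "SELECT position, COUNT(staffNo) AS numStaff, SUM(salary) AS totSal "
    "FROM Staff "
    "GROUP BY position "
    "ORDER BY position"
).fetchall()

for row in rows:
    print(row)
```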
Q4: Complete the following query so that the result lists the total number of DVD copies stored at
each distribution centre:
A grouped result table will have one row for each distinct value (or combination of values) of the
fields specified in the GROUP BY clause. Often, it will only be necessary to see results for a subset
of these groups. For instance, you might want to compute the average salary for staff by distribution
centre, but only for the largest centres. Constraints on the groups to be displayed are set using the
HAVING clause, which follows immediately after GROUP BY:
Q5: Complete the following query so that the result lists the number of staff and average salary at
each distribution centre, BUT ONLY for distribution centres that comprise more than one staff
member:
FROM Staff
GROUP BY [?]
Note: Any field name used within HAVING must either appear in the GROUP BY list or be used inside
an aggregate function; that aggregate function does not also need to appear in the SELECT list.
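The sketch below, on invented Staff data run through Python's sqlite3, shows HAVING discarding whole groups after they have been formed. Note that COUNT(*) is used in HAVING without appearing in the SELECT list, which SQLite (like standard SQL) permits.

```python
import sqlite3

# Hypothetical Staff rows (invented for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, position TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [
        ("S1", "Manager", 56000),
        ("S2", "Assistant", 30000),
        ("S3", "Assistant", 32000),
        ("S4", "Deputy", 45000),
        ("S5", "Assistant", 31000),
    ],
)

# GROUP BY forms one group per position; HAVING then keeps only the groups
# containing more than one staff member. COUNT(*) appears only in HAVING.
rows = conn.execute(
    "SELECT position, AVG(salary) AS avgSal "
    "FROM Staff "
    "GROUP BY position "
    "HAVING COUNT(*) > 1"
).fetchall()

print(rows)  # [('Assistant', 31000.0)]
```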
Exercise 2: Joins
An SQL join operation combines data from two or more tables by pairing up records that have the
same values in some common field. Recall from the lecture that table names are specified within the
FROM clause, with names separated by commas. When stating field names that are common to both
tables, the source table must be specified to avoid any ambiguity. For instance, the following query
joins the Actor and Role tables to display movie characters and the actors that play them:
Note that because the ActorNo field is common to both tables (refer back to the table descriptions
shown in Appendix A of Laboratory Tutorial 2-1), it is important to be clear about which table is
being referenced at any given point in the statement. This is particularly critical in the WHERE
clause, which specifies the join condition. For instance, "WHERE Actor.ActorNo = Actor.ActorNo"
would not make any sense: it compares a field with itself (a circular reference), so it is true for every
row and would pair every actor with every role, whereas we wish to join only those rows across the
two tables that share the same actor number. If you have entered the query correctly, the results
table should look like this:
Remember also, you have the option to declare an alias to make the disambiguation process more
concise. Aliases are declared in the FROM clause. The following query produces identical results to
the first, using the table aliases A for Actor and R for Role:
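For a self-contained illustration of these ideas (not the StayHome query itself), the following sketch uses Python's sqlite3 module with invented mini Actor and Role tables; the column names actorNo, actorName, catalogNo and roleName are assumptions. It runs the join with full table names, the same join with the aliases A and R, and the circular condition, to show that the latter pairs every actor with every role.

```python
import sqlite3

# Hypothetical mini versions of the Actor and Role tables. The column names
# (actorNo, actorName, catalogNo, roleName) are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Actor (actorNo TEXT PRIMARY KEY, actorName TEXT);
    CREATE TABLE Role (actorNo TEXT, catalogNo TEXT, roleName TEXT);
    INSERT INTO Actor VALUES ('A1', 'Actor One'), ('A2', 'Actor Two');
    INSERT INTO Role VALUES
        ('A1', 'C100', 'Hero'),
        ('A2', 'C100', 'Villain'),
        ('A1', 'C200', 'Detective');
    """
)

# Full table names qualify the shared actorNo field.
full_names = conn.execute(
    "SELECT Role.roleName, Actor.actorName "
    "FROM Actor, Role "
    "WHERE Actor.actorNo = Role.actorNo "
    "ORDER BY Role.roleName"
).fetchall()

# The same join, with aliases A and R declared in the FROM clause.
aliased = conn.execute(
    "SELECT R.roleName, A.actorName "
    "FROM Actor AS A, Role AS R "
    "WHERE A.actorNo = R.actorNo "
    "ORDER BY R.roleName"
).fetchall()

# The circular condition compares a field with itself, so every actor is
# paired with every role: 2 actors x 3 roles = 6 rows.
circular = conn.execute(
    "SELECT COUNT(*) FROM Actor, Role WHERE Actor.actorNo = Actor.actorNo"
).fetchone()[0]

print(full_names)  # [('Detective', 'Actor One'), ('Hero', 'Actor One'), ('Villain', 'Actor Two')]
print(full_names == aliased, circular)  # True 6
```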
Now let’s try another query, of similar difficulty, but joining different tables this time:
Q6: Complete the following query so that the result lists the catalogue number and video number
of all copies of “Casino Royale” held by the business:
The examples so far have been simple two-way joins. It is possible to join together data from three
or even more separate tables where appropriate relations exist. For instance, try the following
example of a three-way join:
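As a hedged sketch of the three-way pattern (not the StayHome example itself), the code below adds a hypothetical DVD table (catalogNo, title) to invented Actor and Role tables and joins all three using Python's sqlite3. Because Role links Actor to DVD in a chain, two join conditions are needed, combined with AND.

```python
import sqlite3

# Hypothetical mini tables; names, columns and rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Actor (actorNo TEXT PRIMARY KEY, actorName TEXT);
    CREATE TABLE DVD (catalogNo TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE Role (actorNo TEXT, catalogNo TEXT, roleName TEXT);
    INSERT INTO Actor VALUES ('A1', 'Actor One'), ('A2', 'Actor Two');
    INSERT INTO DVD VALUES ('C100', 'Movie X'), ('C200', 'Movie Y');
    INSERT INTO Role VALUES
        ('A1', 'C100', 'Hero'),
        ('A2', 'C100', 'Villain'),
        ('A1', 'C200', 'Detective');
    """
)

# Role sits between Actor and DVD, so the query needs two join conditions.
rows = conn.execute(
    """
    SELECT A.actorName, D.title, R.roleName
    FROM Actor AS A, Role AS R, DVD AS D
    WHERE A.actorNo = R.actorNo      -- Actor joined to Role
      AND R.catalogNo = D.catalogNo  -- Role joined to DVD
    ORDER BY D.title, A.actorName
    """
).fetchall()

for row in rows:
    print(row)
```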
Q7: Complete the following query so that the result lists all of the DVD copies (identified by video
number) supplied by 20th Century Fox Home Videos along with their titles and current availability
for rental:
Q8: From the results of the last query, you should see that only one copy of “War of the Worlds” is
available. What is its video number?
Exercise 3: Sub-selects
Let's say we wanted to retrieve just the names and dCenterNo of workers at the distribution centre
at 8 Jefferson Way. We could do a join as follows:
Alternatively, we could use a sub-select (a query nested inside another query):
SELECT name, dCenterNo FROM [?] WHERE dCenterNo = (SELECT [?] FROM [?] WHERE dStreet = '8 Jefferson Way');
Some database experts argue that where such alternatives exist, the join option generally completes
more quickly than the equivalent sub-select option. Others argue it depends on the nature of the
query and it’s often worth trying both to see which is more efficient. The small size of the StayHome
database means that any difference here is likely to be negligible, but if you are working with tables
containing thousands or millions of rows, then choosing the right option might result in faster
execution times.
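To make the comparison concrete, here is a minimal sketch (Python's sqlite3, invented Actor and Role rows with assumed column names) that answers one question, which roles a named actor plays, both as a join and as a sub-select. On data this small the two forms return identical rows, and any timing difference would be meaningless.

```python
import sqlite3

# Hypothetical mini Actor and Role tables (invented rows and column names).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Actor (actorNo TEXT PRIMARY KEY, actorName TEXT);
    CREATE TABLE Role (actorNo TEXT, catalogNo TEXT, roleName TEXT);
    INSERT INTO Actor VALUES ('A1', 'Actor One'), ('A2', 'Actor Two');
    INSERT INTO Role VALUES
        ('A1', 'C100', 'Hero'),
        ('A2', 'C100', 'Villain'),
        ('A1', 'C200', 'Detective');
    """
)

# Version 1: a join, filtering on the actor's name in the WHERE clause.
via_join = conn.execute(
    "SELECT R.roleName FROM Role AS R, Actor AS A "
    "WHERE R.actorNo = A.actorNo AND A.actorName = 'Actor One' "
    "ORDER BY R.roleName"
).fetchall()

# Version 2: a sub-select that first looks up the actor number.
# '=' assumes the inner query returns a single value; IN would be needed
# if it could return several.
via_subselect = conn.execute(
    "SELECT roleName FROM Role "
    "WHERE actorNo = (SELECT actorNo FROM Actor WHERE actorName = 'Actor One') "
    "ORDER BY roleName"
).fetchall()

print(via_join)                   # [('Detective',), ('Hero',)]
print(via_join == via_subselect)  # True
```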
Q10: Complete the following query so that the result lists full details of all staff members working
in distribution centre ‘B002’:
Q11: Complete the following query so that the result lists the total number of DVDs rented by
member ‘M284354’:
Q12: Complete the following query so that the result lists the number and name of all staff who do
not have a designated supervisor:
Q13: Complete the following query so that the result lists the total number of DVD copies
available for rental at each of the distribution centres. Rows should be presented in descending
order of number of copies available:
SELECT [?](videoNo) AS totalDVD, branchNo FROM [?] WHERE [?] = true GROUP [?] branchNo
ORDER BY COUNT(videoNo) [?];
Q14: Complete the following query so that the result lists all distribution centres and their total
staff salary costs but only where such costs are greater than $50,000:
SELECT dCenterNo, SUM(salary) as totalSalary FROM Staff [?] BY dCenterNo [?] [?](salary) > [?];
Q15: Complete the following query so that the result lists the name of all of the managers and the
address of the branch that they work in:
SELECT name, dStreet, dCity, dState, dZipCode FROM Staff, [?] WHERE [?] =
DistributionCenter.dCenterNo AND [?] = 'Manager';
Remember, in your actual query you must surround the position field name with square brackets, as
it is also an SQL keyword. However, you must omit these brackets in your answer, as Blackboard
treats them as special characters.
Have fun.
Further Reading
Connolly, T., Begg, C. & Holowczak, R. (2008) Business Database Systems (Chapter 3).