
CS1703(1805): Data and Information (2015/16)

Dr Timothy Cribbin, Brunel University London

Laboratory Tutorial 2-2: More SQL


In this laboratory tutorial you will:

1. Construct more complex SQL SELECT statements to generate results tables that satisfy a
range of given information requests

2. Use aggregate functions to create summary results tables

3. Create SELECT statements that join columns from two or more tables

4. Create a SELECT statement containing a sub-query

Preamble
In the first SQL tutorial you gained experience in using the four main DML commands: SELECT,
UPDATE, INSERT, DELETE. If you have not successfully completed that tutorial, go back and do so
now. In this tutorial you will gain experience in creating more complex SELECT statements.
Specifically, in the first exercise you will use aggregate functions to create summary results tables. In
the second exercise you will write statements that retrieve data from more than one source table,
an operation known as a join. In the third exercise you will write a query that contains a sub-query
(sub-select). Finally, in the fourth exercise you will apply all that you have learnt so far by writing
statements that satisfy a range of different information requests. Note that some of these final
questions are quite testing, so be prepared to spend some time on them.

As before, you should use the SQL Lab interface along with the StayHome database to perform
these exercises. The questions that require a value as the answer assume the contents held in the
original version (instance) of the database. Unless you have fully reversed any changes you have made
(e.g. UPDATEs, DELETEs), replace your current copy with the original file from Blackboard.

Exercise 1: Aggregate functions


The aggregate functions that are built into SQL make it possible to create summary tables from data
in the source tables. Remember there are five main aggregate functions: COUNT, SUM, AVG, MIN
and MAX.

The operation of these functions is very similar to the descriptive statistics functions in SPSS in that
they take a column of data as input and output a single value, such as the average value or the
number of records satisfying a given condition. For instance, this simple query returns the total
number of records in the Staff table. Its result is shown beneath it:

SELECT COUNT(*) AS allRows
FROM Staff;

Result:
allRows
6


Check that this is correct by selecting the Staff table and counting the rows for yourself. What
happens if you replace the ‘*’ with a field name like ‘StaffNo’?

You should find that the result is the same. However, using a field name instead of the asterisk
‘wildcard’ means that records containing a null value in the specified field are ignored. For
instance, suppose (for a moment) that two of the staff were unpaid volunteers who did not receive a
salary and thus had a null (empty) value in the salary field:

SELECT COUNT(Salary) AS paidStaff FROM Staff;

This statement would return a count of four rather than six. Equally, we could have obtained the
same count by specifying a WHERE condition:

SELECT COUNT(staffNo) AS paidStaff
FROM Staff
WHERE salary > 0;

This statement instructs the DBMS to count all records that have a non-null value for staffNo (which
is the primary key, so cannot be null) but to ignore all records that have a null, zero or negative value
in the salary field. Now try this one:

Q1: Complete the following query so that the result lists the total number of staff who earn more
than $40,000:

SELECT [?](staffNo) AS totalStaff FROM STAFF [?] salary > [?];
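The effect of nulls on COUNT can also be checked outside the lab with a minimal sketch. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in for the SQL Lab DBMS, and a tiny in-memory Staff table whose rows are made up, including two hypothetical unpaid volunteers with a null salary. It is not the StayHome data.

```python
import sqlite3

# Hypothetical stand-in for the Staff table: made-up rows, NOT the StayHome data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [
        ("S1", "Ann", 30000),
        ("S2", "Bob", 45000),
        ("S3", "Cat", None),  # hypothetical unpaid volunteer: salary is NULL
        ("S4", "Dan", None),  # hypothetical unpaid volunteer: salary is NULL
    ],
)

# COUNT(*) counts every row, whatever its column values.
all_rows = conn.execute("SELECT COUNT(*) FROM Staff").fetchone()[0]

# COUNT(salary) skips rows whose salary is NULL.
paid_staff = conn.execute("SELECT COUNT(salary) FROM Staff").fetchone()[0]

# COUNT(staffNo) with a WHERE filter: NULL salaries fail 'salary > 0' and are dropped.
paid_by_where = conn.execute(
    "SELECT COUNT(staffNo) FROM Staff WHERE salary > 0"
).fetchone()[0]

print(all_rows, paid_staff, paid_by_where)  # 4 2 2
```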

Note: You can place the DISTINCT keyword inside an aggregate function, before the field name (e.g.
COUNT(DISTINCT salary)), if you want the DBMS to disregard duplicate values in the calculation.
DISTINCT may therefore change the result of COUNT, SUM or AVG, but has no effect on MIN or MAX.
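A small sketch of the note above. It again assumes SQLite via Python's sqlite3 and uses three made-up Staff rows in which two salaries are identical. Each aggregate is computed with and without DISTINCT so that the results can be compared side by side.

```python
import sqlite3

# Hypothetical Staff rows (made up): two people share the salary 30000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, dCenterNo TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [("S1", "D1", 30000), ("S2", "D1", 30000), ("S3", "D2", 60000)],
)

results = {}
# The function names come from this fixed tuple, so building the SQL string is safe here.
for fn in ("COUNT", "SUM", "AVG", "MIN", "MAX"):
    plain, distinct = conn.execute(
        f"SELECT {fn}(salary), {fn}(DISTINCT salary) FROM Staff"
    ).fetchone()
    results[fn] = (plain, distinct)
    print(f"{fn}: {plain} vs DISTINCT {distinct}")
```

With these rows, COUNT, SUM and AVG change when DISTINCT is added (the duplicate 30000 is used only once), while MIN and MAX stay the same.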

The AVG, SUM, MIN and MAX functions all work in the same way. Check the lecture slides/textbook
if you need some reminders and then complete the following queries:

Q2: Complete the following query so that the result lists the minimum, maximum, average and
total salary for all staff working for the business:

SELECT [?](salary) AS minSal, MAX([?]) AS maxSal, [?] AS avgSal, SUM(salary) AS totSal FROM Staff;

Q3: What is the total annual cost of salaries at StayHome?

Often it is desirable to compute aggregate functions across multiple categories or groups. For
instance, we might want to find out the total number of staff working at each grade or the number of
DVD copies of a movie held at each distribution centre. To create a table of results by category we
use the GROUP BY clause.
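Here is a minimal GROUP BY sketch. It assumes SQLite via Python's sqlite3 and a few made-up Staff rows, with the position field standing in for a staff grade. The field is written as [position] to mirror the square-bracket convention used later in this tutorial, and SQLite also accepts that quoting style. The ORDER BY clause is only there to make the output order predictable.

```python
import sqlite3

# Hypothetical Staff rows (made up); [position] stands in for a staff grade.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, [position] TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [
        ("S1", "Manager", 50000),
        ("S2", "Assistant", 20000),
        ("S3", "Assistant", 22000),
        ("S4", "Deputy", 35000),
    ],
)

# One output row per distinct value of the GROUP BY field.
rows = conn.execute(
    """
    SELECT [position], COUNT(staffNo) AS noStaff
    FROM Staff
    GROUP BY [position]
    ORDER BY [position]
    """
).fetchall()
for position, no_staff in rows:
    print(position, no_staff)
```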


Q4: Complete the following query so that the result lists the total number of DVD copies stored at
each distribution centre:

SELECT branchNo, [?] FROM DVDCopy [?] [?];

A grouped results table will have one row for each distinct value (or combination of values) of the
fields listed in the GROUP BY clause. Often, you will only need to see results for a subset of these
groups. For instance, you might want to compute the average salary of staff by distribution centre,
but only for the largest centres. Conditions on which groups are displayed are set using the HAVING
clause, which follows immediately after GROUP BY:

Q5: Complete the following query so that the result lists the number of staff and average salary at
each distribution centre, BUT ONLY for distribution centres that comprise more than one staff
member:

SELECT dCenterNo, COUNT(staffNo) AS noStaff, [?] AS avgSal
FROM Staff
GROUP BY [?]
HAVING [?](staffNo) [?] 1;

Note: Any field name used within HAVING must either appear in the GROUP BY list or be used inside
an aggregate function.
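The sketch below works under the same assumptions (SQLite through Python's sqlite3, made-up rows) and shows HAVING discarding a whole group. The aggregate in the HAVING condition, COUNT(staffNo), does not itself need to appear in the SELECT list.

```python
import sqlite3

# Hypothetical Staff rows (made up): three staff at D1, one at D2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, dCenterNo TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [
        ("S1", "D1", 30000),
        ("S2", "D1", 40000),
        ("S3", "D1", 50000),
        ("S4", "D2", 45000),
    ],
)

# GROUP BY forms one group per centre; HAVING then keeps only groups with 2+ staff.
rows = conn.execute(
    """
    SELECT dCenterNo, SUM(salary) AS totSal
    FROM Staff
    GROUP BY dCenterNo
    HAVING COUNT(staffNo) >= 2
    """
).fetchall()
print(rows)  # [('D1', 120000)]
```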

Exercise 2: Joining data from multiple tables


Recall from the first DB lecture that relationships between tables are represented as common
(shared) fields, where the candidate (usually primary) key in one table is a foreign key in the other
table. For instance, in the StayHome DB the field dCenterNo relates records in the DistributionCenter
table (where it is the primary key) to records in the Staff table (where it is a foreign key). This makes
it possible, for instance, to look up a member of staff’s work address using just their name or staff number.

An SQL join operation combines data from two or more tables by pairing up records that have the
same value in a common field. Recall from the lecture that the table names are listed in the FROM
clause, separated by commas. When a field name is common to both tables, you must specify its
source table to avoid any ambiguity. For instance, the following query joins the Actor and Role
tables to display movie characters and the actors who play them:

SELECT Actor.ActorNo, ActorName, character
FROM Actor, Role
WHERE Actor.ActorNo = Role.ActorNo;


Note that because the ActorNo field is common to both tables (refer back to the table descriptions
shown in Appendix A of Laboratory Tutorial 2-1), it is important to be clear about which table is
being referenced at any given point in the statement. This is particularly critical in the WHERE
clause, which specifies the join condition. For instance, “WHERE Actor.ActorNo = Actor.ActorNo”
would not make sense: it compares a column with itself, so it is true for every row and fails to match
rows across the two tables that share the same actor number. If you have entered the query
correctly, the results table should look like this:

[Figure: results table for the Actor/Role join]

Remember also that you can declare an alias to make the disambiguation more concise. Aliases are
declared in the FROM clause. The following query produces identical results to the first, using the
table aliases A for Actor and R for Role:

SELECT A.ActorNo, ActorName, character
FROM Actor A, Role R
WHERE A.ActorNo = R.ActorNo;
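To see why the join condition matters, the following sketch runs the aliased join on made-up Actor and Role rows. Using SQLite via Python's sqlite3 is an assumption, and the rows are hypothetical. The sketch also runs the mistaken condition A.ActorNo = A.ActorNo, which pairs every Actor row with every Role row.

```python
import sqlite3

# Hypothetical Actor and Role rows (made up, not the StayHome data).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Actor (ActorNo TEXT PRIMARY KEY, ActorName TEXT);
    CREATE TABLE Role (ActorNo TEXT, catalogNo TEXT, character TEXT);
    INSERT INTO Actor VALUES ('A1', 'Ann Actor'), ('A2', 'Bob Actor');
    INSERT INTO Role VALUES ('A1', 'C1', 'Hero'), ('A2', 'C1', 'Villain'), ('A1', 'C2', 'Spy');
    """
)

# Correct join: pair each Role row with the Actor row that has the same ActorNo.
joined = conn.execute(
    """
    SELECT A.ActorNo, ActorName, character
    FROM Actor A, Role R
    WHERE A.ActorNo = R.ActorNo
    ORDER BY R.catalogNo, A.ActorNo
    """
).fetchall()

# Mistaken 'circular' condition: it compares A.ActorNo with itself, which is true
# for every row here, so every Actor row is paired with every Role row.
cross = conn.execute(
    """
    SELECT COUNT(*)
    FROM Actor A, Role R
    WHERE A.ActorNo = A.ActorNo
    """
).fetchone()[0]

print(len(joined), cross)  # 3 6
```

With two actors and three roles, the correct join returns three rows (one per role), while the circular condition returns 2 × 3 = 6 rows.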

Now let’s try another query, of similar difficulty, but joining different tables this time:

Q6: Complete the following query so that the result lists the catalogue number and video number
of all copies of “Casino Royale” held by the business:

SELECT DVD.[?], videoNo
FROM [?], DVDCopy
[?] [?] = DVDCopy.catalogNo AND [?] = 'Casino Royale';

The examples so far have been simple two-way joins. It is possible to join together data from three
or even more separate tables where appropriate relations exist. For instance, try the following
example of a three-way join:

Q7: Complete the following query so that the result lists all of the DVD copies (identified by video
number) supplied by 20th Century Fox Home Videos along with their titles and current availability
for rental:


SELECT title, videoNo, [?]
FROM DVD, DVDCopy, [?]
WHERE Supplier.[?] = DVD.supplierNo AND DVD.catalogNo = DVDCopy.catalogNo AND [?] = '20th Century Fox Home Videos';

Q8: From the results of the last query, you should see that only one copy of “War of the Worlds” is
available. What is its video number?
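The three-way join pattern behind Q7 can be sketched on made-up data as follows. This assumes SQLite via Python's sqlite3. The column names supplierName and available are hypothetical, chosen only for illustration, so check the actual StayHome field names in Appendix A of Laboratory Tutorial 2-1. Joining three tables in a chain like this needs two join conditions, one for each pair of related tables.

```python
import sqlite3

# Hypothetical tables and rows (made up). supplierName and available are assumed
# column names for illustration, not necessarily the StayHome field names.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Supplier (supplierNo TEXT PRIMARY KEY, supplierName TEXT);
    CREATE TABLE DVD (catalogNo TEXT PRIMARY KEY, title TEXT, supplierNo TEXT);
    CREATE TABLE DVDCopy (videoNo TEXT PRIMARY KEY, catalogNo TEXT, available INTEGER);
    INSERT INTO Supplier VALUES ('SU1', 'Acme Video'), ('SU2', 'Other Films');
    INSERT INTO DVD VALUES ('C1', 'Film One', 'SU1'), ('C2', 'Film Two', 'SU2');
    INSERT INTO DVDCopy VALUES ('V1', 'C1', 1), ('V2', 'C1', 0), ('V3', 'C2', 1);
    """
)

# Two join conditions link three tables (Supplier-DVD and DVD-DVDCopy);
# the third condition filters on the supplier's name.
rows = conn.execute(
    """
    SELECT title, videoNo, available
    FROM Supplier S, DVD D, DVDCopy C
    WHERE S.supplierNo = D.supplierNo
      AND D.catalogNo = C.catalogNo
      AND S.supplierName = 'Acme Video'
    ORDER BY videoNo
    """
).fetchall()
print(rows)  # [('Film One', 'V1', 1), ('Film One', 'V2', 0)]
```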

Exercise 3: Sub-selects
Let’s say we wanted to retrieve just the names and dCenterNo of staff at the distribution centre
at 8 Jefferson Way. We could use a join as follows:

SELECT name, d.dCenterNo FROM Staff s, DistributionCenter d
WHERE d.dCenterNo = s.dCenterNo AND d.dStreet = '8 Jefferson Way';

The same result could, however, be achieved using a sub-select (sub-query).

Q9: Complete the following “sub-select” version of this query:

SELECT name, dCenterNo FROM [?]
WHERE dCenterNo = (SELECT [?] FROM [?] WHERE dStreet = '8 Jefferson Way');

Some database experts argue that, where such alternatives exist, a join generally completes more
quickly than the equivalent sub-select. Others argue that it depends on the nature of the query and
that it is often worth trying both to see which is more efficient. The small size of the StayHome
database means that any difference here is likely to be negligible, but if you are working with tables
containing thousands or millions of rows, then choosing the right option might result in faster
execution times.
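A quick way to check that a join and a sub-select can be equivalent is to run both against the same data. This sketch assumes SQLite via Python's sqlite3 and uses made-up DVD and DVDCopy rows. It lists the copies of one title both ways. Using = with a sub-select assumes that the inner query returns a single value.

```python
import sqlite3

# Hypothetical DVD/DVDCopy rows (made up); only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DVD (catalogNo TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE DVDCopy (videoNo TEXT PRIMARY KEY, catalogNo TEXT);
    INSERT INTO DVD VALUES ('C1', 'Film One'), ('C2', 'Film Two');
    INSERT INTO DVDCopy VALUES ('V1', 'C1'), ('V2', 'C2'), ('V3', 'C1');
    """
)

# Join version: pair rows from both tables, then filter on the title.
join_rows = conn.execute(
    """
    SELECT videoNo FROM DVD d, DVDCopy c
    WHERE d.catalogNo = c.catalogNo AND d.title = 'Film One'
    ORDER BY videoNo
    """
).fetchall()

# Sub-select version: conceptually, the inner query yields one catalogNo,
# which the outer query then compares against like a constant.
sub_rows = conn.execute(
    """
    SELECT videoNo FROM DVDCopy
    WHERE catalogNo = (SELECT catalogNo FROM DVD WHERE title = 'Film One')
    ORDER BY videoNo
    """
).fetchall()

print(join_rows == sub_rows, sub_rows)  # True [('V1',), ('V3',)]
```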

Exercise 4: More SELECT practice


What follows is a series of information requests that require you to ‘mix and match’ the different
SQL clauses and keywords covered in the previous exercises to create the appropriate
statement/command. Some are quite tricky so you might need to spend some time working out the
solution. If you get really stuck, please ask a tutor during the laboratory session.

Q10: Complete the following query so that the result lists full details of all staff members working
in distribution centre ‘B002’:

SELECT * FROM Staff [?] [?] = [?];

Q11: Complete the following query so that the result lists the total number of DVDs rented by
member ‘M284354’:

SELECT COUNT(*) as rentalCount FROM [?] [?] memberNo = [?];

Q12: Complete the following query so that the result lists the number and name of all staff who do
not have a designated supervisor:

SELECT [?], name FROM Staff WHERE [?] IS [?];

Q13: Complete the following query so that the result lists the total number of DVD copies
available for rental at each of the distribution centres. Rows should be presented in descending
order of number of copies available:

SELECT [?](videoNo) AS totalDVD, branchNo FROM [?] WHERE [?] = true GROUP [?] branchNo
ORDER BY COUNT(videoNo) [?];

Q14: Complete the following query so that the result lists all distribution centres and their total
staff salary costs but only where such costs are greater than $50,000:

SELECT dCenterNo, SUM(salary) as totalSalary FROM Staff [?] BY dCenterNo [?] [?](salary) > [?];

Q15: Complete the following query so that the result lists the names of all of the managers and the
address of the branch that they work in:

SELECT name, dStreet, dCity, dState, dZipCode FROM Staff, [?] WHERE [?] =
DistributionCenter.dCenterNo AND [?] = 'Manager';

Remember that in your actual query you must surround the position field name with square
brackets, because position is also an SQL keyword. However, omit these brackets in your answer, as
Blackboard treats them as special characters.

Summary and further work


In this tutorial you have learnt how to perform more sophisticated SELECT queries. Specifically, you
have learnt how to use aggregate functions, how to perform join operations that link the data
contained in related tables, and how to write sub-selects. You should continue to practise your SQL
by writing your own queries and action commands (i.e. UPDATEs, INSERTs, DELETEs). For instance,
try combining a join or sub-select with an aggregate function.
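As one possible starting point for that suggestion, here is a sketch that combines a join with an aggregate function and GROUP BY to count the copies held of each title. It assumes SQLite via Python's sqlite3 and uses made-up DVD and DVDCopy rows.

```python
import sqlite3

# Hypothetical DVD/DVDCopy rows (made up): Film Three has no copies at all.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DVD (catalogNo TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE DVDCopy (videoNo TEXT PRIMARY KEY, catalogNo TEXT);
    INSERT INTO DVD VALUES ('C1', 'Film One'), ('C2', 'Film Two'), ('C3', 'Film Three');
    INSERT INTO DVDCopy VALUES ('V1', 'C1'), ('V2', 'C1'), ('V3', 'C2'), ('V4', 'C1');
    """
)

# Join first, then group the joined rows by title and count the copies in each group.
rows = conn.execute(
    """
    SELECT d.title, COUNT(c.videoNo) AS noCopies
    FROM DVD d, DVDCopy c
    WHERE d.catalogNo = c.catalogNo
    GROUP BY d.title
    ORDER BY noCopies DESC
    """
).fetchall()
print(rows)  # [('Film One', 3), ('Film Two', 1)]
```

Film Three does not appear in the result. Because it has no matching DVDCopy rows, the join produces no rows for it, so there is nothing to group.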

Have fun.

Further Reading
Connolly, T., Begg, C. & Holowczak, R. (2008) Business Database Systems (Chapter 3).
