CasJobs Guide

Introduction

Welcome to the Catalog Archive Server Jobs System, or CasJobs. This guide assumes you have experience with basic SQL syntax. If you'd like to learn more about SQL, you can follow the SkyServer link above which has a SQL Tutorial in its Help section, but there are also plenty of other tutorials on the web. This guide will cover the slightly simpler varient of SQL that CasJobs uses.

Registration

Before you can query this system you'll need to register and then log in. Your queries and their resulting data are associated with your account, so don't forget your login! But if lightning strikes, skies fall and you've consequently forgot your password it can be emailed to you from here. Your email may be used to contact you about your account, as well as optionally notifing about query completion. Your UserID will be used to identify you to the group world; should you choose to participate in this please pick a 'nice' UserID. A personal database, or MyDB is also created and assigned to your account on registration.

Running a Query in CasJobs

The Query page is the first page you will see once you log into CasJobs. However, before you run a query, you need to select the 'Context', i.e., the database on which your query will be run. Please read the following few sections including 'Context and Queues' before you attempt to run a query for the first time, else you may get unexpected errors (e.g. "Invalid object name"). Also remember that only queries on MyDB tables should generally be run in the MyDB context, because the MyDB context has very limited resources for running queries.

MyDB

Your MyDB is a personal database created just for you upon registration. You have full privileges in this context; you can create/alter/drop tables, functions and stored procedures. It is designed to be a sort of staging area where you refine your requested data before eventually extracting it to a local copy in CSV, FITS, etc. format. Please note that you can only extract data from MyDB, so any data you will eventually want a local copy of must first be put into MyDB.

Getting Data Into MyDB

Getting data into MyDB is easy. Just write your query, then click submit and it will automatically create a table using the name from the 'Table' field (directly above the query box) in MyDB containing the results of your query. Alternatively, you use an 'into' statement in your query; an example of this is below.

Select top 1000 objid,ra,dec into mydb.MyFirstTable from photoobj

This will create a table called 'MyFirstTable' in MyDB containing the columns specified by the select clause. 'Into' statements are generally placed directly before the 'from' keyword of a query. If your query doesn't have a 'from', then 'into' just goes at the end. An into statement in your query always takes precedence over whatever is in the 'Table' field, so if you have both, only the one in the query is used. Please note that the 'Table' field is only used when the submit button is clicked; queries submitted using the quick method will display results by default instead of automatically creating a table.

Importing Data Into MyDB

You can also import your own, custom data into MyDB. This is done from the 'Import' page, link located at the top. More info on importing tables is on that page.

Downloading Data From MyDB

Once you've got a table in MyDB, you can then extract and download that table in CSV, VOTABLE or FITS file format. This is done from the MyDB page. To do this, click MyDB (at the top), then click the table on the left, choose your output type then click extract. You can only extract tables in MyDB.

Context and Queues

'Context' is the database on which your query will be run. It is determined by the Context drop down list on the query page. So if something like 'DR10' is selected, then your query will be run on dr10, if 'DR7' then DR7, etc. You should always run your query in the context where most of the data resides, not the MyDB context. Only queries on MyDB tables should generally be run in the MyDB context, because the MyDB context has very limited resources for running queries. The 'Queue' selection, also a drop down list, will determine the maximum time a query will be permitted to run. Every context has at least one queue. If your chosen context has more than one queue it is best to choose the queue that is closest to the amount of time you think your query will take. Queues are optimized for queries of their time limit, so chosing an appropriate queue is generally faster.

Referencing Objects Outside of Context

For those situations where you'd like to reference data outside of your selected context you can prefix the foreign object. If you've been following this guide, you've already done this. This query uses a 'MyDB' prefix:

Select top 1000 objid,ra,dec into mydb.MyFirstTable from photoobj

So assuming the context of this query was 'DR1', this means that 'MyFirstTable' is to be created at 'MyDB', not 'DR1'. Using prefixes, you could rewrite this query so it runs in 'MyDB' context, like:

Select top 1000 objid,ra,dec into MyFirstTable from dr1.photoobj

Since the context has changed to 'MyDB', this time the prefix 'dr1' indicates that the photoobj table is not 'mydb', but 'dr1'. Context prefix can be used on tables and stored procedures, but not functions (see function limitations). Valid context prefixes are any that appear in the context drop down list in the query page.

Quick Execution

If you head over to the query page, you'll notice there are two buttons at the bottom of the query space, one labelled 'submit', the other 'quick'. These are different types of query execution. 'Quick' is a synchronous execution action; it will run your query right away at the currently selected context and then return the results right below the query window. It is the quickest way to run something and is ideal for quick things like creating functions, procedures, top 10's, etc. However there are restrictions on this form of execution; quick queries are limited to one minute of execution and can only return about one meg of data before they will be cancelled.

Submit Execution

For queries that exceed 'quick' limitations use the 'Submit' button. 'Submit' is an asynchronous execution action; it does not return the results right away. It will submit your query to the selected context and queue and will run it as soon as a slot in that queue is available. You can check the status of your queries from the History page, link located at the top of the page. (must be logged in)

Sharing Data, Using Groups

Groups are a mechanism for allowing other people to view specified tables in your mydb. They are managed from the 'Groups' page, link located at the top. From there you can join a group, create a group, or invite other users to join your group. Users within a group can see and reference tables that other group members have published. You can publish any table in your mydb from the mydb page by clicking on the desired table and then clicking publish. You can remove a table from publication at any time, but you cannot drop or rename that table until it has been depublished. Group tables that you can see are listed at the mydb page under tables. You can reference that table in your own query by using the same name you see at the mydb page, specifically group.[user].[tablename]. Other group members have read only access to your published tables; they cannot change or drop them.

CasJobs Limitations due to Distributed Execution

One of the main purposes of the CasJobs service is to provide load-balancing, i.e. distribute the queries among different servers that serve the SDSS data. This is accomplished using the SQL Server linked server functionality. However, this does introduce certain limitations and differences between standard SQL capabilities and CasJobs which may be confusing for users. The most prominent example of this is the inability to run DROP TABLE and CREATE TABLE commands outside of the MyDB context. For example, the following commands will not work in the DR4 context:

DROP TABLE mydb.mytable_21
CREATE TABLE mydb.mytable_21 (objid int, ra float, dec float)

CasJobs will return an error message indicating that there are too many prefixes in front of the table name. This error is in fact coming from SQL Server which does not allow tables to be dropped or created across linked servers (remotely). The difference between INSERT and SELECT ... INTO commands and DROP/CREATE TABLE commands is that SQL Server actually permits inserting into a table remotely, but not dropping or creating it.

The same applies to calling functions remotely - SQL server does not allow calling functions remotely. Hence you cannot call functions outside your current context, e.g., you cannot run the command

SELECT DR4.fGetNearestObjidEq(180,-0.5,1.0)

in the MyDB context.

CasJobs Syntax vs. SkyServer/sdssQA Syntax

If you're used to joining against the target db through the skyserver interface with something like TARGDR2..PhotoObj, CasJobs works very much the same way. The difference is the prefix you use for CasJobs must be a valid context (listed in the context drop down menu). So, if 'dr1' was your current context and you were interested in joining with a context called 'targdr2', you could say 'targdr2.photobj', assuming you were running from within 'dr1' context. CasJobs will also accept the '..' construct. See 'Referencing Objects Outside of Context' for more info.

CasJobs vs. SQL

CasJobs uses a virtual prefix system rather than the three part prefix system used by sql server. A virtual prefix is simply a context name, such as "DR1" or "MyDB". So instead of using a multi-part prefix, like:

Select x,y,z from dr1.dbo.sometable

Select x,y,z from dr1..sometable

You would simply use the context as the prefix, like this:

Select x,y,z from dr1.sometable

Just like SQL, you would use prefixes whenever you are referencing something that is outside of your current selected context.