Andrew H. Karp, Sierra Information Services, Sonoma, CA USA: Getting in To The Picture (Format)

Download as pdf or txt
Download as pdf or txt
You are on page 1of 16

SUGI 31 Tutorials

Paper 243-31

Getting in to the Picture (Format)


Andrew H. Karp, Sierra Information Services, Sonoma, CA USA

ABSTRACT
Although part of the SAS Format facility for well over two decades, PICTURE Formats are often not well understood,
and consequently tend to be underutilized by even experienced SAS users. Yet, they provide a wealth of tools to
effectively portray the values of numeric variables and often avoid the need for either tedious data step coding or to
create new variables in your data sets. This tutorial describes how PICTURE Formats “work,” how to write them in
PROC FORMAT statements and how to apply them in both Data and Procedure Steps. By mastering the concepts
and techniques shown in this paper you will be able to apply the power of PICTURE formats in your SAS programs
and in so doing often complete your data management, reporting and analysis tasks faster and easier than resorting
to other potentially more tedious and time-intensive methods in the SAS System.

INTRODUCTION
A Format contains the instructions used by the SAS System to display, or portray the values of variables. More
formally, formats control the “external representation” of a variable’s values. There are two broad classes of SAS
Formats: VALUE Formats and PICTURE Formats. VALUE Formats are either “supplied” or “internal” to the SAS
System (that is, they ‘come with’ our installation of SAS System Software) or you can create your own using PROC
FORMAT and the VALUE Statement. VALUE Formats can be used to externally represent the values of either
character or numeric variables.

Experienced SAS users are already aware of the broad array of tools, as well as the flexibility offered by appropriate
use of both SAS-supplied and user-created VALUE Formats. Using them enhances the appearance of our output,
reduces ambiguity about the definitions of values of variables appearing in our report, and can be used as a resource-
saving alternative to aggregating (that is, “rolling up”) data from one unit of analysis to another. They are also
exceptionally well suited for tasks such as “recoding” or “bucketing” the values of a variable in to a smaller number of
discrete groups. For more information about Value Formats, please see my paper, “My Friend the SAS Format,” in
the SUGI 30 Proceedings.

Picture Formats are different from Value Formats in two critical respects. First, Picture Formats can be used ONLY
with numeric variables. Second, a Picture Format creates a template that is used to display the values of numeric
variables. (Note: A “Picture Format Template” is NOT the same thing as a Style or Table Template in the Output
Delivery System.). Also, there are no “SAS-supplied” Picture Formats; rather, you create them using the PICTURE
Statement in PROC Format. Picture Formats are stored in a FORMATS Catalog in a SAS Library, just like Value
Formats. And, you can use PROC FORMAT tools such as the FMTLIB, CNTLIN, CNTLOUT and MULTILABEL
Options with Picture Formats as well as Value Formats.

So, what is a Picture Format Template? It is a series of commands that control how the values of numeric variables
are displayed in your SAS-generated output. Once you master the core concepts and functionalities of Picture
Formats you will find them a powerful and flexible tool with which to enhance the quality of your reports and analyses.

GETTING STARTED WITH PICTURE FORMATS: A BASIC EXAMPLE AND ESSENTIAL CONCEPTS

PICTURE FORMAT VS. THE DATA STEP…THE WINNER IS?


To fix ideas, suppose we have a numeric variable in a data set representing a series of telephone numbers in the
United States. The first three digits are the area code, the next three are the “exchange” and the last four are the
number itself. What we want to do is insert a hyphen (dash) between the third and fourth digit and between the sixth
and seventh digit. Here’s an example data set, where the “raw data” are included within the Data Step creating the
data set using a “Datalines” statement.

* example 1;
data phones1;
input phone_number;
datalines;
2022933923
4154410702
9083038859
7079351413
;

1
SUGI 31 Tutorials

run;

options nonumber nocenter nodate;


proc print data=phones1;
title 'Getting in to the Picture Format';
title2 'Phone Number Data Set 1';
run;

The PROC PRINT-generated output is:

Now, what the boss really wants is to have the output displayed with hyphens between the area code and exchange
and between the exchange and the remaining four digits of the telephone number. Since we don’t have a SAS-
supplied “telephone number format,” there’s no help there. And, it’s pretty clear that a VALUE format isn’t going to
help either, since it will control the display (that is, the representation) of the values of the variable.

One approach might be to turn the telephone number in to a character, rather than a numeric, variable and then use a
combination of the SUBSTR (substring) SAS Programming Language Function and the concatenation operator in a
data step to “break apart” the “pieces” of the telephone number and then essentially reassemble them in a new
variable with the required hyphens. A Data Step implementing this approach might look like this (with a PROC PRINT
to display the results):

The output is shown on the next page:

2
SUGI 31 Tutorials

Well, we got what we wanted, but at the expense of doing a lot of work
in the Data Step (i.e., numeric-to-character conversion of the value of
the original telephone number variable, then a fairly complex
assignment statement to display the values of telephone number in
the desired way. While there are other ways this Data Step could
have been written, the “take-home message” here is that because of
the tools available in the Picture Format facility, there’s no need to
apply a Data Step approach to this problem in the first place.

EXAMPLE 1: USING A SIMPLE PICTURE FORMAT


Here’s how a Picture Format is created, and then applied to the
telephone number data set. After seeing what it does, we’ll go over
the syntax and options of this initial, and very basic Picture Format,
and use it to set the stage for identifying more of the Picture Format’s functionalities.

Name of
Picture Format

The template, showing a


series of digit selectors

Range of values to which the


Picture Format will be applied

Associating the Picture


Format to a Variable

Let’s go over the syntax step by step. First, we’re calling, or starting the Format Procedure with the PROC FORMAT
statement, just like we would to create a Value Format. But, the PICTURE Statement tells the PROC that we’re about
to create a Picture Format, the name of which is given immediately to the right of “PICTURE.” So, far, we know we’re
creating a Picture Format called “phone_a.” Next, we’re supplying the range of values to which the Picture is going to
be applied. In this example, we’re using the ‘low’ and ‘high’ keywords, separated by a dash, to instruct PROC
FORMAT that the Picture Format we’re creating will be used to display all (non-missing) values of the variable to
which it will be associated in either a subsequent Data or Procedure Step. To the right of the equals sign is the
template, or instructions as to how the value of the variable is to be displayed. The strings of “9’s” are called digit
selectors, and will be explained in more detail later in this paper.

Once the Phone_a Picture Format is created, we can associate it to the values of a variable in, say a PROC PRINT
Step, and see the results. Here they are:

Just from this basic example we can see that the Picture Format gave us exactly what we needed without a lot of
tedious, potentially error-prone and inefficient Data Step programming. With the Picture Format, we did NOT need to
do anything other than create the format and then associate it to a variable to obtain the data display we need.

3
SUGI 31 Tutorials

Now, let’s identify some other basic, but very useful aspects of the Picture Format facility before delving its details and
advanced capabilities. Suppose, having seen the fine job you’ve done on the previous task, the boss changes her
mind and asks that the area code be enclosed in parentheses, with one space between the right parenthesis and the
exchange, and she still wants a hyphen between the exchange and the rest of the telephone number. Most of you
know this as a typical “boss question” that usually starts with something like “Well, this is nice, but how hard would it
be to make a couple of tiny changes…”

Fortunately, the Picture Format’s PREFIX option will give us just what we want. While we’ll get in to the details of digit
selectors and other details of the Picture Format facility shortly, one core rule for Picture Formats is that if you use
digit selectors as the template for displaying your data, the first position of the template must be a digit selector. That
means we can’t make the right parentheses we need part of the template itself, since it is obviously not a digit.
Instead, we will instruct PROC FORMAT to make the left parentheses the prefix to the template, which will start with
the (required) digit selector. Here we go:

This PROC FORMAT task shows you two important options to the PICTURE statement. First, we’re making the
default length of the Picture Format 16 characters, which is wide enough to accommodate BOTH the template AND
the specified prefix. The prefix we want is given in the PREFIX option, which, like the DEFAULT option is enclosed in
parentheses. Be careful: The PREFIX option is in the parentheses, and the value of the option is a left parentheses
symbol enclosed in single quotes. The PROC PRINT Output is:

So, with just a small amount of additional work in the PROC FORMAT step, we have exactly what the boss wants
(until, of course, she changes her mind again) without resort to complex and tedious Data Step coding.

One more fairly basic example and then we’ll take a more detailed look at
PICTURE Statement syntax, options, rules and more advanced capabilities.
Suppose we have a data set that looks like the one shown to the left.

In the United States, some of the values of Number in this data set correspond
to local numbers (seven digits, three for exchange and four for the number) and
others to long distance number (ten digits, with the first three representing the
area code. We also have three “bad” observations in the data set. A valid local
number (again, in the USA), has to be at least seven digits long and must start

4
SUGI 31 Tutorials

with the number two. Valid area codes in long distance numbers start with the number two. Under these “rules,”
observations 5, 6 and 10 have invalid values of a telephone number. What we now need to do is create a picture
format that has three “rules” or value ranges to it: 1) if the length of the telephone number is ten digits and starts with
the number two, then we want the area code enclosed in parentheses and a hyphen separating the exchange from
the number or, 2) if the length of the telephone number is seven digits and the first digit is a two, then we want a
hyphen separating the exchange from the number or, 3) first digit of a ten or seven digit number is a 1, or the length
of the phone number is LESS than seven digits then we want “Invalid Number” displayed in our output. Needless to
say, coding these rules in a Data Step, especially when we are “starting” with a numeric variable, could be very
tedious.

Fortunately, the Picture Format facility lets avoid a lot of unnecessary programming and still get what we need, quickly
and easily. Here’s the solution:

In this example, we can see how a series of value ranges were supplied, each with a different picture template. So,
the variable’s display is controlled by its internal value. The results shown below and to the left:

The three examples of Picture Formats we’ve seen so far should be enough to convince you that they offer a power
range of tools to display or portray the values of numeric variables without extensive, tedious, data step coding. So,
having seen these examples, I hope you’ll want to continue reading this paper to see even more tools and capabilities
of Picture Formats and how you can apply them in your work.

5
SUGI 31 Tutorials

PICTURE FORMAT DETAILS AND SYNTAX

RULES FOR PICTURE FORMATS


Picture Format names can be up to 32 characters in length starting with the release of SAS 9.1 Software. The name
you give to a Picture Format cannot be the name of a SAS-Supplied format, nor may it end in a number. Picture
Formats are used to display the values of numeric variables. Like Value Formats, Picture Formats can be stored in
either a temporary or permanent Formats Catalog. Although not discussed in this paper, the new (to SAS 8, and
enhanced in SAS 9) MULTILABEL option can be used to create a Picture Format with overlapping value ranges. (For
more information, please see the PROC FORMAT documentation and/or my paper “Using Multilabel Formats,
available for download at www.SierraInformation.com)

A Picture Format may be up to 40 characters in length. A ‘picture’ or ‘template’ is a series of characters in single
quote marks. The characters forming the ‘template’ or ‘picture’ can be one of three types:
o Digit selectors, which numeric characters ranging from zero to nine defining positions in which the
values of numbers in the variable will be displayed. As we will soon see, there is a critical
difference in results when you apply a “zero digit selector” versus a “non-zero digit selector.”
o Message characters, which are non-numeric characters that will be printed in the picture. For
example of a message character, see the Phone_C Picture format above where some values of
telephone number were displayed as either “Invalid Long Distance Number” or “Invalid Local
Number.”
o Directives, which control the display of date, time or datetime variables. These special characters
require specification of the DATATYPE= option in the PICTURE Statement, and will be discussed
in detail below.

UNDERSTANDING DIGIT SELECTORS


Perhaps one of the most confusing aspects of Picture Formats is the “digit selector.” But, by working through a few
examples, we’ll see how digit selectors work and how to specify them correctly for your particular data presentation
needs. First, let’s review some core concepts: 1) a Picture Format is creating a template (or ‘picture’) that will display
the values of a numeric variable in your SAS-generated output; 2) if you are using a Picture Format to display numeric
values (as opposed to message characters, which we will discuss next), you’ll need to tell PROC FORMAT how many
“slots” or “spaces” in the Picture Format are needed to display the values of the numeric variable and what to display
if there’s no value to display in a “slot.” That’s that the digit selector does.

A digit selector is either a zero (0) or the numbers one (1) through nine (9). Most SAS users, and the PROC
FORMAT documentation, use either a zero or a nine as digit selectors, so that’s the same convention I’ll apply in this
paper. When you specify zero as the digit selector, any leading zeros in the number to be displayed are shown as
blanks. When nine is specified as the digit selector, the leading zeros are displayed in the output. Perhaps the
easiest way to remember how digit selectors work is the saying taught to me by Pete Lund of Looking Glass Analytics,
who has also written extensively about the SAS Formats (see below.) The saying is: “Nines print zeros, and zeros
print blanks.”

Let’s take a look at how digit selectors are used in a Picture Format, and what happens when you use either a zero or
a nine for a digit selector. The example data set below shows some made-up values for sales of parts in an
automobile store.

First, a Picture Format with a string of nines as digit selectors is created and applied the variable SALES.

6
SUGI 31 Tutorials

The output looks like this:

Using a series of nines as the digit selectors results in having zeros displayed in the output for every “position” in the
picture template for which there was no value of the variable to which it was applied. Remembering that “nines print
zeros and zeros print blanks,” one potential approach to a better-looking result might be to replace all the nines with
zeros. What happens when we do that?

The output now looks like this:

7
SUGI 31 Tutorials

Well, by using nines as our digit selectors we’ve managed to address the problem with leading zeros. But, remember,
since “nines print blanks,” if the value of the variable to which the Picture Format is applied is a zero, then the
formatted value that appears in our output is a blank! In many situations, that can lead to some confusion…in our
example, having a ‘blank’ value of sales for distributor caps is misleading. Does it mean we sold no distributor caps,
or does it mean we are missing data in the source data set for this value of the parts variable? Since we don’t want to
give our clients/customers confusing reports, one way to solve this problem is to use both zeros and nines as digit
selectors in the same Picture template. Here’s an example that will give us the solution we need:

In this Picture Format I’ve combined both zero and nine digit selectors in one template. The result, shown below,
gives us exactly what we need. If you’re creating a picture format to display numeric variables that may have values of
zeros, I’d recommend your using an appropriate combination of zero and nine digit selectors so that you have zero
values displayed in your output.

EVEN MORE PICTURE FORMAT TOOLS: THE MULTIPLIER AND ROUND OPTIONS

In my opinion, two of the most useful and powerful tools in the Picture Format “arsenal” are the MULTIPLIER (or
MULT) and ROUND options. With these, we can usually avoid “pre-processing” observations via a Data Step before
obtaining the output we need without a potentially tedious, time-consuming or resource-intensive Data Step.

Let’s first take a look at the MULT option, and then the ROUND Option, and then we’ll use both in one Picture Format.

THE MULT OPTION


This option allows you to provide a constant by which the values of the number is to be multiplied before it is
formatted. With it, you can easy carry out tasks such as “round up” financial data to the nearest thousand (or some
other appropriate value) convert values from one unit of measurement to another (e.g., from inches to centimeters or
from US dollars to another currency, Using the MULT option is not only easy, but it avoids unnecessary Data Step
processing and allows you to easily change the value of the multiplier, if, for example you are using it to calculate
currency exchange rates that change between each “run” of a report.

Here is a PROC REPORT task that generates a report from some (simulated) credit card transaction data. This very

8
SUGI 31 Tutorials

powerful PROC is used to group and sum credit card charges by year and credit card used. This data set has over
265,000 observations in it, so processing time may be something to keep in mind as we consider requests to change
the report.

The output is:

So far, so good. But, what do we do when the boss asks her typical “how hard would it be” question: can we display
the credit card charges rounded to the nearest thousand dollars? Some might want to create a new data set, and
create a new variable in that data set, where each of the more than quarter-million individual records had their values
of the variable charge_amount rounded to the nearest thousand, and then have PROC REPORT re-generate the
analysis we need.

USING THE ROUND OPTION


We can avoid the Data Step with the MULT Option, and avoid any potential truncation problems by also specifying the
ROUND Option. Without the ROUND Option, PROC FORMAT would automatically truncate any decimal portion of
the variables value and then display the result according to the defined template. When you specify the ROUND
Option and the MULT Option, PROC FORMAT first multiples the variable’s value by the supplied multiplier, and then
rounds the results to the nearest integer and then formats the value according to the template. According to the
PROC FORMAT documentation, a value of exactly .5 is rounded up to the next highest integer.

9
SUGI 31 Tutorials

For complete details on how the steps PROC FORMAT follows to build Picture Formats, see Chapter 23 of the BASE
SAS Procedures Documentation, and in particular, Table 23.1 (“Building a Picture Format”).

The code sample below shows how the MULT Option is added to the Picture Format, and then how the Format is
applied in a PROC REPORT Define Statement. Then, the output generated by the PROC REPORT task is
displayed.

The resulting output looks like this:

10
SUGI 31 Tutorials

USING PICTURE FORMATS TO CONVERT VALUES OF VARIABLES

In my opinion, the most powerful application of Picture Formats is to carry out conversions of variable values from one
unit of measurement to another without having to use a Data Step to operate on every observation in the source data
set. Instead, a Picture Format can gives us what we want and affords greater flexibility in our programming.

Perhaps the best example of how Picture Formats can help us to convert values of one variable to another is a
conversion of numeric variables from one currency to another. This is a fairly common requirement for SAS users
working with data that are drawn from financial/accounting systems in many countries, each with its own currency.
And, since the rate at which currencies are exchanged changes frequently, a SAS program that takes data stored in
one currency and displays it in another currency needs to be updated with the latest exchange rate prior its execution.

Here is an example of how a Picture Format is used to display values that are stored in US dollars in the credit card
transaction file seen above as the equivalent value in Japanese Yen. The Picture Format code below shows how the
value supplied to the MULT option is the exchange rate on the date I first generated this example. Obviously, if this is
something you need to do at your job on an ongoing basis, its very easy to update the value in the MULT option with
the latest exchange rate just before you run your program again. Here are the PROC FORMAT and PROC REPORT
steps that generate the desired report: Notice that I’ve modified the PROC REPORT task so that the output it
generates clearly indicates that the values displayed are the Japanese Yen.

The resulting output is shown on the next page:

11
SUGI 31 Tutorials

USING PICTURE FORMATS IN VARIABLE ASSIGNMENT STATEMENTS

Picture Formats can also be used to assign the values of new variables in Data Steps. While I’ve generally want to
avoid creating “extra” variables in data sets, there may be times when we need to assign values to a new variable
based on the commands placed in a Picture Format He is an example: suppose the boss asks: “How hard would it
be to give me an Excel™ spreadsheet that shows the amount of credit card transactions by credit card and year, with
separate columns showing total charge amounts in US Dollars, Japanese Yen and British Pounds?” Here is what we
will do:
1) Create Picture Formats for Great Britain Pounds (GBP) and Japanese Yen (JPY). The variable
charge_amount already “holds” the data in US Dollars.
2) Use the PUT Function in a Data Step to create two new variables, each of which applies the previous-
created Picture Format to the values of charge_amount.
3) Before exporting the results an Excel workbook, we will look at the results of steps 1 and 2 using PROC
PRINT. If we are happy with them, then a tool such as PROC EXPORT or the Export Wizard can be
employed to create an Excel workbook from the data set.

First, let’s create the appropriate Picture Formats:

Now, well, use these Picture Formats to assign values to new variables in a Data Step and then look at the results:

12
SUGI 31 Tutorials

The results are:

13
SUGI 31 Tutorials

USING DIRECTIVES WITH DATE, TIME OR DATETIME VARIABLES

The last section of this paper discusses the use of directives used to format the values of date, time or datetime
variables in Picture Formats. Even though we have a wide range of useful SAS-supplied Value Formats to associate
to variables representing dates and times, these Picture Format directives given us an even broader array of ways to
display this type of data. In order to apply directives in a Picture Format, you need to include the DATATYPE= option
in the Picture Statement. The valid values of this option are DATE, TIME or DATETIME.

This table shows some of the directives you can use in a Picture Format to customize the display of values of your
date, time or datetime variables. Please consult the PROC FORMAT documentation for a complete list.

To fix ideas about directives might be used to create a customized Picture Format, below is an excerpt from the credit
card transaction file that’s been used for several other examples in this paper. Notice that the variable trans_date is a
SAS Date variable (that is, an integer representing the number of days from January 1, 1960.)

Here is an example of a PROC FORMAT task where a Picture Format is created using directives to display the
values of the SAS date variable trans_date.

14
SUGI 31 Tutorials

Next, the date_b_fmt Picture Format is associated to the variable trans_date in the following PROC PRINT step:

The output is:

15
SUGI 31 Tutorials

CONCLUSION
The Picture Format facility contains a wealth of tools you can use to work effectively—and efficiently---with your data.
Taking the time to master the concepts and capabilities of this aspect of the SAS System can provide you with a
range of enhanced capabilities to portray, manage and analyze your data.

REFERENCES
Karp, Andrew, My Friend the SAS Format, Proceedings of the Thirtieth Annual SAS Users Group International
Conference, Cary: NC SAS Institute, Inc., 2005
URL: http://www2.sas.com/proceedings/sugi30/253-30.pdf
Lund, Peter, More Than Just VALUE: A Look Into the Depths of PROC FORMAT, Proceedings of the Twenty-
Seventh Annual SAS Users Group International Conference, Cary, NC: SAS Institute, Inc, 2002
URL: http://www2.sas.com/proceedings/sugi27/p004-27.pdf
SAS Institute, Inc., BASE SAS 9.1.3 Procedures Guide, Cary, NC: SAS Institute, Inc, 2004

ACKNOWLEDGMENTS
Thanks to Rick Langston of SAS Institute’s Research and Development unit for his insights in to the SAS Format
facility and for answering several questions I had while preparing this paper. I’d also like to thank Pete Lund of
Looking Glass Analytics, Olympia, Washington, for his advice and insights during the development of this paper.

CONTACT INFORMATION
Your comments and questions are valued and encouraged. Please contact me at:
Andrew H. Karp
Sierra Information Services
19229 Sonoma Hwy PMB 264
Sonoma, CA 95476 USA
707 996 7380 voice
[email protected]
www.sierrainformation.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.

16

You might also like