Teradata Analytic Functions 1206-151K


What would you do if you knew?

Teradata Analytic Functions


Release 15.10.06
B035-1206-151K
November 2017
The product or products described in this book are licensed products of Teradata Corporation or its affiliates.

Copyright © 2017 by Teradata. All Rights Reserved.
Table of Contents

Preface
    Purpose
    Audience
    Supported Releases
    Revision History
    Additional Information
    Teradata Support
    Product Safety Information

Chapter 1: Introduction to the Teradata Analytic Functions
    Teradata Analytic Functions
    Terminology

Chapter 2: The Teradata Analytic Functions
    nPath®
    SESSIONIZE

Appendix A: How to Read Syntax Diagrams
    Syntax Diagram Conventions


Preface

Purpose
This book describes the Teradata analytic functions.

Audience
This book is intended for data analysts, data scientists and other technical personnel who use Teradata
Database.

Supported Releases
This book supports Teradata® Database 15.10.06.
Teradata Database 15.10.06 is supported on:
• SUSE Linux Enterprise Server (SLES) 10 SP3
• SUSE Linux Enterprise Server (SLES) 11 SP1
• SUSE Linux Enterprise Server (SLES) 11 SP3
Teradata Database client applications support other operating systems.

Revision History
Date            Release     Description
November 2017   15.10.06    Added syntax diagrams and additional explanations for the functions.
August 2017     15.10.06    Initial release

Additional Information
Related Links

http://www.info.teradata.com
    Use the Teradata Information Products Publishing Library site to:
    • View or download a manual:
      1. Under Online Publications, select General Search.
      2. Enter your search criteria and click Search.
    • Download a documentation CD-ROM:
      1. Under Online Publications, select General Search.
      2. In the Title or Keyword field, enter CD-ROM, and click Search.

http://www.teradata.com
    The Teradata home page provides links to numerous sources of information about Teradata. Links include:
    • Executive reports, white papers, case studies of customer experiences with Teradata, and thought leadership
    • Technical information, solutions, and expert advice
    • Press releases, mentions and media resources

https://access.teradata.com
    Use Teradata @ Your Service to access Orange Books, technical alerts, and knowledge repositories, view and join
    forums, and download software patches.

http://developer.teradata.com/
    Teradata Developer Exchange provides articles on using Teradata products, technical discussion forums, and code
    downloads.

To maintain the quality of our products and services, we would like your comments on the accuracy, clarity,
organization, and value of this document. Please email [email protected].

Customer Education
Teradata Customer Education delivers training for your global workforce, including scheduled public
courses, customized on-site training, and web-based training. For information about the classes, schedules,
and the Teradata Certification Program, go to www.teradata.com/TEN/.

Teradata Support
Teradata customer support is located at https://access.teradata.com.

Product Safety Information


This document may contain information addressing product safety practices related to data or property
damage, identified by the word Notice. A notice indicates a situation which, if not avoided, could result in
damage to property, such as equipment or data, but not related to personal injury.

Example

Notice:
Improper use of the Reconfiguration utility can result in data loss.


CHAPTER 1
Introduction to the Teradata Analytic Functions

Teradata Analytic Functions


Use Teradata analytic functions to analyze ordered data. Ordered data is time series data, such as
clickstreams, financial transactions, and online user interactions. Teradata provides the following functions
specifically for analyzing ordered data.
Teradata System Analytic Functions

Function Name   Description
nPath           Performs regular pattern matching over a sequence of rows from one or more inputs.
SESSIONIZE      Maps each click in a clickstream to a unique session identifier.

To make these functions accessible, contact your Teradata Support representative.


Online help is available for each of the functions. For more detailed information, type HELP
'function_name', for example, HELP 'SQL NPATH'.

Terminology
This document refers to the following terms.
Term           Description
Path           An ordered, start-to-finish series of actions (for example, page views) for which sequences and
               sub-sequences can be generated.
Sequence       The path prefixed with a caret (^), which indicates the start of a path. For example, if a user
               visited page a, page b, and page c, in that order, the session sequence is ^,a,b,c.
Sub-sequence   For a given sequence of actions, a sub-sequence is one possible subset of the steps that begins
               with the initial action. For example, the path a,b,c generates three sub-sequences: ^,a; ^,a,b;
               and ^,a,b,c.
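
The relationship between a path, its sequence, and its sub-sequences can be sketched in a few lines of Python. This is only an illustration of the terminology; the helper names sequence and sub_sequences are hypothetical and are not part of any Teradata interface.

```python
def sequence(path):
    """Prefix the path with '^' (start of path) to form its sequence."""
    return ["^"] + list(path)


def sub_sequences(path):
    """Return every prefix of the sequence that contains at least one action."""
    seq = sequence(path)
    return [seq[:end] for end in range(2, len(seq) + 1)]


for sub in sub_sequences(["a", "b", "c"]):
    print(",".join(sub))
# prints:
# ^,a
# ^,a,b
# ^,a,b,c
```

A path of n actions therefore yields n sub-sequences, each beginning with the start marker and the initial action.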


CHAPTER 2
The Teradata Analytic Functions

nPath®

Purpose
The nPath® function matches specified patterns in a sequence of rows from one or more input tables and
extracts information from the matched rows.

Typical nPath uses are:


• Categorizing entities based on observed patterns; for example, distinguishing “loyal customers” from
“price-sensitive shoppers.”
• Selecting relevant data from a data set and then inputting it to another function or a third-party data
graph generator, such as the application that produced the following figure.
[Figure: Sankey Diagram of Teradata nPath Output]

Syntax

Syntax Elements
query_expression
SELECT statement, including a table operator SELECT call.
For information about the syntax of the SELECT clause, ON clause, PARTITION BY clause, DIMENSION
clause, and ORDER BY clause, see the chapter on the SELECT statement in SQL Data Manipulation
Language, B035-1146.
For information about the supported symbol predicates, see the chapter on Logical Predicates in SQL
Functions, Operators, Expressions, and Predicates, B035-1145.

ON Clause
The function requires at least one partitioned input table, and can have additional input tables that are either
partitioned or DIMENSION tables.

Note:
If the input path to nPath is nondeterministic, then the results are nondeterministic.

ON table_name
Name of input table.
ON view_name
Name of input view.
ON query_expression
Expression that resolves to an input table.
AS alias_name
Alias name for the input table.
PARTITION BY column_name
Name of the column by which every partitioned input table is partitioned. The function
requires at least one partitioned input table, and can have additional partitioned input
tables.
PARTITION BY column_position
Position of the column by which every partitioned input table is partitioned. The function
requires at least one partitioned input table, and you can specify additional partitioned input
tables.
DIMENSION
You can specify additional DIMENSION input tables.
ORDER BY column_name
Name of the column by which every input table is ordered.
ORDER BY column_position
Position of the column to use for ordering the results.
ORDER BY sort_expression
Expression to use for ordering the results.
ASC
Sort results in ascending order.
DESC
Sort results in descending order.
NULLS FIRST
Sort nulls first in results.
NULLS LAST
Sort nulls last in results.

MODE
Specifies the pattern-matching mode.

OVERLAPPING
The function finds every occurrence of the pattern in the partition, regardless of whether the
instance is part of a previously found match. Therefore, one row can match multiple
symbols in a given matched pattern.
NONOVERLAPPING
The function begins the next pattern search at the row that follows the last pattern match.
This is the default behavior of commonly used pattern matching utilities, including the
UNIX grep utility.
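
The difference between the two modes can be sketched in Python. The sketch below is a simplification, not the nPath implementation: it assumes each row matches exactly one single-letter symbol, represents the ordered partition as a string of those symbols, and writes the nPath pattern 'A.A' as the Python regular expression "AA". The function name find_matches is hypothetical.

```python
import re


def find_matches(symbols, regex, overlapping):
    """Return (first_row, last_row) index pairs for each match of regex."""
    pattern = re.compile(regex)
    matches = []
    start = 0
    while start < len(symbols):
        m = pattern.match(symbols, start)
        if m is None or m.end() == m.start():
            start += 1              # no non-empty match begins at this row
            continue
        matches.append((m.start(), m.end() - 1))
        # OVERLAPPING retries from the very next row;
        # NONOVERLAPPING resumes at the row after the last matched row.
        start = start + 1 if overlapping else m.end()
    return matches


rows = "AAAA"   # each character is the symbol matched by one row
print(find_matches(rows, "AA", overlapping=False))  # [(0, 1), (2, 3)]
print(find_matches(rows, "AA", overlapping=True))   # [(0, 1), (1, 2), (2, 3)]
```

In the overlapping run, the row at index 1 takes part in two matches, which is the behavior that NONOVERLAPPING mode rules out.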

PATTERN
Specifies the pattern for which the function searches.
symbolic_search_pattern
You compose symbolic_search_pattern with the symbols that you define in the SYMBOLS
argument, operators, and parentheses.
To specify that a subpattern must appear a specific number of times, use the Range-
Matching Feature defined later in this section. For pattern matching details, refer to Pattern
Matching.
The following table describes the simplest patterns, which you can combine to form more
complex patterns. When patterns have multiple operators, the function applies them in
order of precedence, and applies operators of equal precedence from left to right. The
following table also shows operator precedence. To force the function to evaluate a
subpattern first, enclose the subpattern in parentheses. In the following table, A and B are
symbols defined in the SYMBOLS argument.
Simple nPath Patterns and Operator Precedence

Pattern   Operator Precedence   Description
A         1 (highest)           The function returns the rows that contain exactly one occurrence of A.
A.        1                     The function returns the rows that contain exactly one occurrence of A.
A?        1                     The function returns the rows that contain at most one occurrence of A. The ?
                                operator is nongreedy.
A*        1                     The function returns the rows that contain zero or more occurrences of A. The *
                                operator is nongreedy.
A+        1                     The function returns the rows that contain at least one occurrence of A. The +
                                operator is nongreedy.
A.B       2                     Cascade operator. The function returns the rows that contain A followed
                                immediately by B.
A|B       3                     Alternative (or) operator. The function returns the rows that contain either A
                                or B.
^A                              Startanchor operator. The function returns the rows that start with A.
A$                              Endanchor operator. The function returns the rows that end with A.

SYMBOLS
Defines the symbols that appear in the values of the PATTERN and RESULT arguments.
For example, following is a SYMBOLS argument for analyzing web site visits:
    SYMBOLS (
        pagetype = 'homepage' AS H,
        pagetype <> 'homepage' AND pagetype <> 'checkout' AS PP,
        pagetype = 'checkout' AS CO
    )
For more information about symbols that appear in the PATTERN argument value, refer to Symbols. For
more information about symbols that appear in the RESULT argument value, refer to Result: Applying
Aggregate Functions.
logical_predicate
An expression whose value is a column name. If col_expr represents a column that appears
in multiple input tables, then you must qualify the ambiguous column name with its table
name. For example:
    Symbols (
        weblog.pagetype = 'homepage' AS H,
        weblog.pagetype = 'thankyou' AS T,
        ads.adname = 'xmaspromo' AS X,
        ads.adname = 'realtorpromo' AS R
    )
AS symbol
Any valid identifier. The symbol is case-insensitive. However, a symbol of one or two
uppercase letters is easier to identify in patterns.
symbol_predicate
SQL predicate, often a column name.

RESULT
Defines the output columns.
aggregate_function
The function to apply. For details, see Result: Applying Aggregate Functions. The function
evaluates this argument once for every matched pattern in the partition. That is, the function
outputs one row for each pattern match.
expression
An expression whose value is a column name. The expression specifies the values to retrieve
from the matched rows.

symbol_list
Each symbol represents all the rows that matched the predicate of that symbol in this
particular matched PATTERN.
The list can include a single symbol, or can be a comma separated list of more than one. For
a single symbol list, ANY and the parentheses are optional. For example, OF A, OF ANY
(A), and OF ANY (A,B) are valid for symbol_list, but OF (A,B) and OF (A) are not.
AS column_name
Determined by the RESULT argument. See Result: Applying Aggregate Functions.

Contains data to search for patterns.

Range-Matching Feature
You use the range-matching feature to specify the number of times that a subpattern must appear in a
match. You can specify the count as one of the following, enclosed in braces, { }:
• Exact number of times that the subpattern appears in a match.
• Minimum number of times that the subpattern appears in a match.
• Minimum and maximum number of times that the subpattern appears in a match.
The format is as follows:
(subpattern){n[,[m]]}
where n is the minimum and m is the maximum.
(subpattern){n}
Subpattern must appear exactly n times.
For example, the following pattern specifies that subpattern (A.B|C) must appear exactly 3
times:
'X.(Y.Z).(A.B|C){3}'
The preceding pattern is equivalent to the following pattern:
'X.(Y.Z).(A.B|C).(A.B|C).(A.B|C)'
(subpattern){n,}
Subpattern must appear at least n times. For example, the following pattern specifies that
subpattern (A.B|C) must appear at least 4 times:
'X.(Y.Z).(A.B|C){4,}'
The preceding pattern is equivalent to the following pattern:
'X.(Y.Z).(A.B|C).(A.B|C).(A.B|C).(A.B|C).(A.B|C)*'

(subpattern){n,m}
Specifies that subpattern must appear at least n times and at most m times. For example, the
following pattern specifies that subpattern (A.B|C) must appear at least 2 times and at most
4 times:
'X.(Y.Z).(A.B|C){2,4}'
The preceding pattern is equivalent to the following pattern:
'X.(Y.Z).(A.B|C).(A.B|C).(A.B|C)?.(A.B|C)?'
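
The three range forms can be expanded mechanically into explicit patterns. The following Python sketch shows one way to do so, assuming the subpattern is passed already enclosed in parentheses. The function expand_range and its open_ended parameter are hypothetical names used only for this illustration.

```python
def expand_range(subpattern, n, m=None, open_ended=False):
    """Expand (subpattern){n}, {n,} or {n,m} into the equivalent explicit pattern."""
    parts = [subpattern] * n                  # n mandatory occurrences
    if open_ended:                            # {n,}: any number of further occurrences
        parts.append(subpattern + "*")
    elif m is not None:                       # {n,m}: up to m - n optional occurrences
        if m < n:
            raise ValueError("m must not be smaller than n")
        parts.extend([subpattern + "?"] * (m - n))
    return ".".join(parts)


sub = "(A.B|C)"
print(expand_range(sub, 3))                   # (A.B|C).(A.B|C).(A.B|C)
print(expand_range(sub, 4, open_ended=True))  # (A.B|C).(A.B|C).(A.B|C).(A.B|C).(A.B|C)*
print(expand_range(sub, 2, 4))                # (A.B|C).(A.B|C).(A.B|C)?.(A.B|C)?
```

Note that "at least n times" expands to n mandatory copies followed by one starred copy, and "between n and m times" expands to n mandatory copies followed by m - n optional copies.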

Output
nPath Output Table Schema

Column Name        Description
partition_column   Column by which partitioned input tables are partitioned.
order_column       Column by which input tables are ordered.
input_column       Determined by RESULT argument. See Result: Applying Aggregate Functions.

Usage Notes

Pattern Matching
Conceptually, nPath pattern matching proceeds like this: Starting from a row in a partition, the function
tries to match the given pattern along the row sequence in the partition (ordered as specified in the ORDER
BY clause).
If the function cannot match the pattern, it outputs nothing; otherwise, it continues to the next row. When
the function finds a sequence of rows that match the pattern, it selects the largest set of rows that constitute
the match and outputs a row based on this match.
For example, suppose that the pattern is 'A.B+' and the rows that constitute the match start at row t1 and
end at row t4. Suppose that t1 matches A and each of t2, t3, and t4 matches B. When the matching is
complete, A represents t1 and B represents t2, t3, and t4. Using the rows represented by A and B, the
function evaluates the Result argument (typically applying an aggregate function to each symbol in the
pattern), outputs one row with the result values, and proceeds to search for the next pattern match.
Before running nPath on a large data set, create a small data set that includes the pattern that you want to
find. Test your pattern on the small data set, refine the pattern until nPath gives the desired output, and then
use the refined pattern for the large data set.
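
The 'A.B+' example above can be imitated in Python to see how matched rows are assigned to symbols. This is a conceptual sketch, not the nPath implementation: the rows, the pagetype and pageid values, and the symbol predicates are all hypothetical, and each row is assumed to match exactly one symbol.

```python
import re

# Hypothetical ordered partition: (row name, pagetype, pageid).
rows = [
    ("t1", "home", 10),
    ("t2", "product", 21),
    ("t3", "product", 22),
    ("t4", "product", 23),
]


def symbol_for(row):
    """Hypothetical symbol predicates: 'home' rows are A, all other rows are B."""
    return "A" if row[1] == "home" else "B"


encoded = "".join(symbol_for(r) for r in rows)   # "ABBB"

# The nPath pattern 'A.B+' written as a Python regular expression.
match = re.match(r"(?P<A>A)(?P<B>B+)", encoded)

a_rows = rows[match.start("A"):match.end("A")]
b_rows = rows[match.start("B"):match.end("B")]

print([r[0] for r in a_rows])        # ['t1']
print([r[0] for r in b_rows])        # ['t2', 't3', 't4']
print(b_rows[0][2], b_rows[-1][2])   # 21 23
```

The last line corresponds to what aggregates such as FIRST (pageid OF B) and LAST (pageid OF B) would return for this single match.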

Greedy Pattern Matching


The nPath function uses greedy pattern matching, finding the longest available match despite any nongreedy
operators in the pattern.
For example, consider the input table link2:

nPath Greedy Pattern Matching Example Input Table link2

userid   jobtitle             startdate    enddate
21       Chief Exec Officer   1994-10-01   2005-02-28
21       Software Engineer    1996-10-01   2001-06-30
21       Software Engineer    1998-10-01   2001-06-30
21       Chief Exec Officer   2005-03-01   2007-03-31
21       Chief Exec Officer   2007-06-01   null

The following query returns the following table:


    SELECT dt.job_transition_path, count(*) AS count FROM NPATH (
        ON link2 PARTITION BY userid ORDER BY startdate
        USING
        MODE (NONOVERLAPPING)
        Pattern ('CEO.ENGR.OTHER*')
        Symbols (jobtitle like 'software eng%' AS ENGR,
                 true AS OTHER,
                 jobtitle like 'Chief Exec Officer' AS CEO)
        Result (accumulate(jobtitle OF ANY(ENGR,OTHER,CEO))
                AS job_transition_path)
    ) as dt GROUP BY 1 ORDER BY 2 DESC;

nPath Greedy Pattern Matching Example 1 Output Table

job_transition_path                                                                                  count
[Chief Exec Officer, Software Engineer, Software Engineer, Chief Exec Officer, Chief Exec Officer]   1

In the pattern, CEO matches the first row, ENGR matches the second row, and OTHER* matches the
remaining rows.

The following query returns the following table:


    SELECT dt.job_transition_path, count(*) AS count FROM NPATH (
        ON link2 PARTITION BY userid ORDER BY startdate
        USING
        MODE (NONOVERLAPPING)
        Pattern ('CEO.ENGR.OTHER*.CEO')
        Symbols (jobtitle like 'software eng%' AS ENGR,
                 true AS OTHER,
                 jobtitle like 'Chief Exec Officer' AS CEO)
        Result (accumulate(jobtitle of ANY(ENGR,OTHER,CEO))
                AS job_transition_path)
    ) as dt GROUP BY 1 ORDER BY 2 DESC;

nPath Greedy Pattern Matching Example 2 Output Table

job_transition_path                                                                                  count
[Chief Exec Officer, Software Engineer, Software Engineer, Chief Exec Officer, Chief Exec Officer]   1

In the pattern, CEO matches the first row, ENGR matches the second row, OTHER* matches the next two
rows, and CEO matches the last row.
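
The row-to-symbol assignments described for the two queries can be reproduced with a small greedy backtracking matcher in Python. This is a conceptual sketch under stated assumptions, not the nPath algorithm: it supports only cascades of single symbols with an optional * quantifier, it starts matching at the first row, and the LIKE predicates are approximated with case-insensitive Python string tests. The function match_from is a hypothetical name.

```python
def match_from(rows, pattern, preds, start=0):
    """Greedy backtracking match of a cascade pattern beginning at row `start`.

    `pattern` is a list of (symbol, quantifier) pairs, where quantifier is
    "1" (exactly one row) or "*" (zero or more rows). Returns a list of
    (symbol, row_index) assignments, or None if no match exists.
    """
    if not pattern:
        return []
    (symbol, quant), rest = pattern[0], pattern[1:]
    pred = preds[symbol]
    if quant == "1":
        if start < len(rows) and pred(rows[start]):
            tail = match_from(rows, rest, preds, start + 1)
            if tail is not None:
                return [(symbol, start)] + tail
        return None
    # quant == "*": find how far the symbol can extend, then try longest first.
    end = start
    while end < len(rows) and pred(rows[end]):
        end += 1
    for stop in range(end, start - 1, -1):
        tail = match_from(rows, rest, preds, stop)
        if tail is not None:
            return [(symbol, i) for i in range(start, stop)] + tail
    return None


# jobtitle values of table link2, ordered by startdate.
jobs = ["Chief Exec Officer", "Software Engineer", "Software Engineer",
        "Chief Exec Officer", "Chief Exec Officer"]
preds = {
    "CEO": lambda t: t.lower() == "chief exec officer",
    "ENGR": lambda t: t.lower().startswith("software eng"),
    "OTHER": lambda t: True,
}

first = [("CEO", "1"), ("ENGR", "1"), ("OTHER", "*")]
second = first + [("CEO", "1")]
print(match_from(jobs, first, preds))
# [('CEO', 0), ('ENGR', 1), ('OTHER', 2), ('OTHER', 3), ('OTHER', 4)]
print(match_from(jobs, second, preds))
# [('CEO', 0), ('ENGR', 1), ('OTHER', 2), ('OTHER', 3), ('CEO', 4)]
```

In the second call, OTHER* first claims the last three rows, the trailing CEO then has no row left, and the matcher gives one row back. Both patterns therefore cover the same five rows and produce the same accumulated path.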

Symbols
This section applies only to symbols that appear in the Pattern argument, described in Syntax Elements. For
information about symbols that appear in the Result argument, refer to Result: Applying Aggregate
Functions.
For each symbol definition, col_expr = symbol_predicate AS symbol, the function returns the rows for
which col_expr equals symbol_predicate. For example, for pagetype = 'home' AS H, the function returns
the first and fourth rows of the following table.
nPath Sample Input Table

sessionid  clicktime  userid  productname  pagetype  referrer          productprice
1          07:00:10   333                  home      www.company2.com
1          07:00:12   333     product1     checkout  www.company2.com  200.2
1          07:01:00   333     product2     checkout                    340
13         15:35:08   67403                home      www.company1.com

The function does not return any row that contains a NULL value. For example, for pagetype =
'checkout' AS C, the function returns the second row of the preceding table, but not the third.

The predicate TRUE matches every row.
If symbols have overlapping predicates, multiple symbols might match the same row.
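
How overlapping predicates let one row match several symbols can be sketched in Python. The sketch uses the sessionid, clicktime, and pagetype values of the sample input table, treats each symbol as a function from a row to a Boolean, and ignores NULL handling. The symbol name ANYPAGE and the helper matching_symbols are hypothetical.

```python
rows = [
    {"sessionid": 1, "clicktime": "07:00:10", "pagetype": "home"},
    {"sessionid": 1, "clicktime": "07:00:12", "pagetype": "checkout"},
    {"sessionid": 1, "clicktime": "07:01:00", "pagetype": "checkout"},
    {"sessionid": 13, "clicktime": "15:35:08", "pagetype": "home"},
]

# Two symbols with overlapping predicates.
symbols = {
    "H": lambda row: row["pagetype"] == "home",   # pagetype = 'home' AS H
    "ANYPAGE": lambda row: True,                  # TRUE AS ANYPAGE
}


def matching_symbols(row):
    """Return the names of all symbols whose predicate holds for the row."""
    return [name for name, pred in symbols.items() if pred(row)]


for row in rows:
    print(row["clicktime"], matching_symbols(row))
# 07:00:10 ['H', 'ANYPAGE']
# 07:00:12 ['ANYPAGE']
# 07:01:00 ['ANYPAGE']
# 15:35:08 ['H', 'ANYPAGE']
```

The first and fourth rows satisfy both predicates, so either symbol can represent them in a pattern match.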

Boolean Expressions in a Symbol


• When you use Boolean expressions in a symbol, TRUE, NOT TRUE, and integers are not allowed
within parentheses or quotes.
• Teradata Boolean expressions used with AS symbol_name are supported with logical operators as well as
parentheses.

Multiple Inputs and Symbols


When using multiple inputs, each symbol must come from one and only one input stream. If the input
tables contain columns of the same name, then the column reference in the SYMBOLS clause can be
qualified with the table name.

Aggregate Functions and nPath


• Some aggregate functions, such as FIRST, LAST, COUNT, and SUM, are computed over more than one
symbol. In such cases, the symbols must be enclosed in parentheses, along with the ANY keyword. The
symbols in an expression must belong to the same input stream.
• You can use the Teradata aggregate functions in the RESULT clause, in addition to the nPath-specific
aggregate functions.

ON Clause Usage Notes


• If a query is provided as part of the ON clause, it must be enclosed in parentheses.
• PARTITION BY ANY, HASH BY, and LOCAL ORDER BY clauses are not supported.
• There must be at least one PARTITION BY or PARTITION BY ORDER BY clause provided for one
input stream. The remaining input streams can have the following combinations of PARTITION BY,
ORDER BY, and DIMENSION clauses:
∘ PARTITION BY ORDER BY
∘ PARTITION BY
∘ DIMENSION
∘ DIMENSION ORDER BY
• If there are multiple PARTITION BY clauses, the number and type of the partitioning columns must
match and the data types should be compatible.
• You can only use DIMENSION input when there are multiple inputs. There must be at least one other
PARTITION BY clause or PARTITION BY ORDER BY clause used with DIMENSION.
• The Teradata PARTITION BY clause allows you to specify the column position instead of the column
name, which determines how the rows are partitioned.
• The maximum number of input fields in the ON clause SELECT list is 2048. NPATH includes the symbol
predicates, RESULT clauses, and ORDER BY clauses in the input SELECT list, so the total count of all
these input fields should not exceed 2048.
• The ON clause input table can have an optional alias associated with it using the AS name clause.

Usage Notes for the PARTITION BY, DIMENSION, and ORDER BY Clauses
• The PARTITION BY expression defines the scope of a partition of input rows over which nPath searches
for pattern matches.
• The Teradata PARTITION BY clause allows the column position to be specified instead of the column
name, which determines how the rows are partitioned.
• The DIMENSION expression duplicates the input table on all the AMPs and can only be used when there
are multiple inputs.
• When multiple PARTITION BY and DIMENSION inputs are specified for nPath, the inputs are handled
as follows: An input buffer is constructed for multiple inputs in each partition. For partition by key input,
the buffer consists of the rows whose partition number matches the partition number of this input buffer.
For DIMENSION input, all the rows of the table are inserted into the buffer. The rows are then sorted by
the corresponding ORDER BY clause (if any) of each input. The nPath function then processes the rows
in the input buffer sequentially. Note that nPath is designed to find the pattern within one partition only.
• The ORDER BY expression specifies the sort-order of the input rows.

Result: Applying Aggregate Functions


The Result argument defines the output columns, specifying the values to retrieve from the matched rows
and the aggregate function to apply to these values.
For each pattern match, the nPath function can apply one or more specified aggregate functions to the matched
rows and output the aggregate results. The supported aggregate functions are:
• SQL aggregate functions AVG, COUNT, MAX, MIN, and SUM
• Teradata nPath sequence aggregate functions described in the following table
In the following table, col_expr is an expression whose value is a column name, symbol is defined by the
Symbols argument, and symbol_list has this syntax:
{ symbol | ANY (symbol[,...]) }

Function
    Description

COUNT ( { * | col_expr } OF symbol_list )
    Returns either the total number of matched rows (*) or the number (or distinct number) of col_expr
    values in the matched rows.

FIRST ( col_expr OF symbol_list )
    Returns the col_expr value of the first matched row. For the example in Pattern Matching,
    FIRST (pageid OF B) returns the pageid of row t2.

LAST ( col_expr OF symbol_list )
    Returns the col_expr value of the last matched row. For the example in Pattern Matching,
    LAST (pageid OF B) returns the pageid of row t4.

FIRST_NOTNULL ( col_expr OF symbol_list )
    Returns the first non-null col_expr value in the matched rows.

LAST_NOTNULL ( col_expr OF symbol_list )
    Returns the last non-null col_expr value in the matched rows.


MAX_CHOOSE ( quantifying_col_expr, descriptive_col_expr OF symbol_list )
    Returns the descriptive_col_expr value of the matched row with the highest-sorted
    quantifying_col_expr value. For example, MAX_CHOOSE (product_price, product_name OF B) returns
    the product_name of the most expensive product in the rows that map to B.
    The descriptive_col_expr can have any data type. The quantifying_col_expr must have a sortable
    data type (SMALLINT, INTEGER, BIGINT, DOUBLE PRECISION, DATE, TIME, TIMESTAMP, VARCHAR, or
    CHARACTER).

MIN_CHOOSE ( quantifying_col_expr, descriptive_col_expr OF symbol_list )
    Returns the descriptive_col_expr value of the matched row with the lowest-sorted
    quantifying_col_expr value. For example, MIN_CHOOSE (product_price, product_name OF B) returns
    the product_name of the least expensive product in the rows that map to B.
    The descriptive_col_expr can have any data type. The quantifying_col_expr must have a sortable
    data type (SMALLINT, INTEGER, BIGINT, DOUBLE PRECISION, DATE, TIME, TIMESTAMP, VARCHAR, or
    CHARACTER).

DUPCOUNT ( col_expr OF symbol_list )
    Returns the duplicate count for col_expr in the matched rows. That is, for each matched row, the
    function returns the number of occurrences of the current value of col_expr in the immediately
    preceding matched row.
    When col_expr is also the ORDER BY col_expr, this function returns the equivalent of
    ROW_NUMBER()-RANK().

DUPCOUNTCUM ( col_expr OF symbol_list )
    Returns the cumulative duplicate count for col_expr in the matched rows. That is, for each matched
    row, the function returns the number of occurrences of the current value of col_expr in all
    preceding matched rows.
    When col_expr is also the ORDER BY col_expr, this function returns the equivalent of
    ROW_NUMBER()-DENSE_RANK().

ACCUMULATE ( col_expr OF symbol_list [ DELIMITER 'delimiter' ] )
    Returns, for each matched row, the concatenated values in col_expr, separated by delimiter. The
    default delimiter is ', ' (a comma followed by a space).

You can compute an aggregate over more than one symbol. For example, SUM (val OF ANY (A,B))
computes the sum of the values of the attribute val across all rows in the matched segment that map to A or
B.
For an example, see Example 1: Use FIRST, LAST_NOTNULL, MAX_CHOOSE, and MIN_CHOOSE.
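
The table above states that DUPCOUNT and DUPCOUNTCUM are equivalent to ROW_NUMBER()-RANK() and ROW_NUMBER()-DENSE_RANK() when col_expr is also the ORDER BY column. The following Python sketch is not Teradata code; it only illustrates those two differences on a small, made-up sorted list. The function name dup_counts and the sample values are hypothetical.

```python
def dup_counts(sorted_values):
    """Return (ROW_NUMBER - RANK, ROW_NUMBER - DENSE_RANK) for each row of a sorted list."""
    dupcount, dupcountcum = [], []
    rank = dense_rank = 0
    previous = object()  # sentinel that compares unequal to every real value
    for row_number, value in enumerate(sorted_values, start=1):
        if value != previous:
            # A new value starts a new rank group.
            rank = row_number
            dense_rank += 1
            previous = value
        dupcount.append(row_number - rank)           # duplicates within the current group
        dupcountcum.append(row_number - dense_rank)  # duplicates seen so far in all groups
    return dupcount, dupcountcum


values = [10, 10, 10, 20, 20, 30]  # hypothetical col_expr values, already ordered
dupcount, dupcountcum = dup_counts(values)
print(dupcount)     # [0, 1, 2, 0, 1, 0]
print(dupcountcum)  # [0, 1, 2, 2, 3, 3]
```

In this sketch the first difference resets to zero whenever the value changes, while the second keeps growing across groups.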

Example 1: Use FIRST, LAST_NOTNULL, MAX_CHOOSE, and MIN_CHOOSE

Input
nPath Aggregate Functions Example 1 Input Table trans1

userid  gender  ts                   productname  productamt
1       M       2012-01-01 00:00:00  shoes        100
1       M       2012-02-01 00:00:00  books        300
1       M       2012-03-01 00:00:00  television   500
1       M       2012-04-01 00:00:00  envelopes    10
2               2012-01-01 00:00:00  bookcases    150
2               2012-02-01 00:00:00  tables       250
2       F       2012-03-01 00:00:00  appliances   1500
3       F       2012-01-01 00:00:00  chairs       400
3       F       2012-02-01 00:00:00  cellphones   600
3       F       2012-03-01 00:00:00  dvds         50

SQL-MapReduce Call
SELECT * FROM NPATH (
ON trans1
PARTITION BY userid ORDER BY ts
USING
MODE (nonoverlapping)
PATTERN ('A+')
SYMBOLS(TRUE AS A)
RESULT (FIRST(userid OF A) AS Userid,
LAST_NOTNULL (gender OF A) AS Gender,
MAX_CHOOSE (productamt, productname OF A) AS Max_prod,
MIN_CHOOSE (productamt, productname OF A) AS Min_prod)
) as dt ORDER BY 1;

Output
nPath Aggregate Functions Example 1 Output Table

userid gender max_prod min_prod


1 M television envelopes
2 F appliances bookcases
3 F cellphones dvds
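
To make the four aggregates in this example concrete, the following Python sketch emulates them over the trans1 rows. It is an illustration, not how nPath is implemented: it assumes that PATTERN ('A+') with TRUE AS A matches every row of a partition as one match, and it represents the empty gender cells as None. The function name npath_example1 is hypothetical.

```python
# Rows of trans1 as (userid, gender, ts, productname, productamt).
# None stands for the empty (NULL) gender cells in the input table.
trans1 = [
    (1, "M", "2012-01-01 00:00:00", "shoes", 100),
    (1, "M", "2012-02-01 00:00:00", "books", 300),
    (1, "M", "2012-03-01 00:00:00", "television", 500),
    (1, "M", "2012-04-01 00:00:00", "envelopes", 10),
    (2, None, "2012-01-01 00:00:00", "bookcases", 150),
    (2, None, "2012-02-01 00:00:00", "tables", 250),
    (2, "F", "2012-03-01 00:00:00", "appliances", 1500),
    (3, "F", "2012-01-01 00:00:00", "chairs", 400),
    (3, "F", "2012-02-01 00:00:00", "cellphones", 600),
    (3, "F", "2012-03-01 00:00:00", "dvds", 50),
]


def npath_example1(rows):
    # PARTITION BY userid ORDER BY ts
    partitions = {}
    for row in sorted(rows, key=lambda r: (r[0], r[2])):
        partitions.setdefault(row[0], []).append(row)

    output = []
    for userid in sorted(partitions):  # outer ORDER BY 1
        matched = partitions[userid]   # assumed: 'A+' matches the whole partition
        genders = [r[1] for r in matched if r[1] is not None]
        output.append((
            matched[0][0],                         # FIRST (userid OF A)
            genders[-1] if genders else None,      # LAST_NOTNULL (gender OF A)
            max(matched, key=lambda r: r[4])[3],   # MAX_CHOOSE (productamt, productname OF A)
            min(matched, key=lambda r: r[4])[3],   # MIN_CHOOSE (productamt, productname OF A)
        ))
    return output


result = npath_example1(trans1)
for row in result:
    print(row)
```

Under these assumptions the sketch yields the same three rows as the output table above.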

nPath Examples

ClickStream Data Examples

Input, Symbols, and Symbol Predicates


This statement creates the input table of clickstream data that the examples use:
CREATE MULTISET TABLE clicks1 (
userid INTEGER,
sessionid INTEGER,
pageid INTEGER,
category INTEGER,
ts TIMESTAMP FORMAT 'YYYY-MM-DDbHH:MI:SS',
referrer VARCHAR (256),
val FLOAT
) PRIMARY INDEX ( userid );
The following table summarizes the symbols and symbol predicates that the examples use.
nPath Clickstream Data Examples Symbols and Symbol Predicates

Symbol Symbol Predicate


A pageid IN (10, 25)
B category = 10 OR (category = 20 AND pageid <> 33)
C category IN (SELECT pageid FROM clicks1 GROUP BY userid HAVING COUNT(*) > 10)
D referrer LIKE '%Amazon%'
X true

This invocation gets the pageid for each row and the pageid for the next row in sequence:
SELECT dt.sessionid, dt.pageid, dt.next_pageid FROM NPATH (
ON clicks1
PARTITION BY sessionid
ORDER BY ts
USING
MODE (OVERLAPPING)
PATTERN ('A.B')
SYMBOLS (true AS A, true AS B)
RESULT (FIRST(sessionid OF A) AS sessionid,
FIRST (pageid OF A) AS pageid,
FIRST (pageid OF B) AS next_pageid
)
) as dt;

Counting Preceding Rows in a Sequence


For each row, this invocation counts the number of preceding rows in a given sequence (including the
current row). The ORDER BY clause specifies DESC because the pattern must be matched over the rows
preceding the start row, while the semantics dictate that the pattern be matched over the rows following the
start row.
SELECT dt.sessionid, dt.pageid, dt.countrank FROM NPATH (
ON clicks1
PARTITION BY sessionid
ORDER BY ts DESC
USING
MODE (OVERLAPPING)
PATTERN ('A*')
SYMBOLS (true AS A)
RESULT (FIRST(sessionid OF A) AS sessionid,
FIRST (pageid OF A) AS pageid,
COUNT (* OF A) AS countrank)
) as dt;
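
The effect of combining ORDER BY ts DESC with OVERLAPPING mode and PATTERN ('A*') can be sketched in Python. This is an illustration under stated assumptions, not the nPath implementation: it assumes that a match starts at every row and greedily extends to the end of the partition, and it ignores any empty match. The function name count_preceding and the sample page IDs are hypothetical.

```python
def count_preceding(pageids_by_time):
    """pageids_by_time: page IDs of one session, in ascending ts order."""
    descending = list(reversed(pageids_by_time))  # ORDER BY ts DESC
    n = len(descending)
    # A match that starts at position i runs to the end of the descending list,
    # so it covers the start row plus every row that came before it in time.
    return [(descending[i], n - i) for i in range(n)]


session = [50, 80, 9, 10]  # hypothetical pageid values, oldest first
ranks = count_preceding(session)
print(ranks)  # [(10, 4), (9, 3), (80, 2), (50, 1)]
```

The latest click gets the largest count because every earlier click in the session follows it in the descending order.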

Complex Path Query


This query finds the user click-paths that start at pageid 50 and proceed either to pageid 80 or to pages in
category 9 or category 10. It finds the pageid of the last page in each path, counts the visits to page 80, and
returns the maximum count for each last page, by which it sorts the output. The query ignores paths of fewer
than five pages and pages for which category is less than zero.
SELECT dt.last_pageid, MAX(dt.count_page80) FROM NPATH (
ON (SELECT * FROM clicks1 WHERE category >= 0)
PARTITION BY sessionid ORDER BY ts
USING
PATTERN ('A.(B|C)*')
MODE (OVERLAPPING)
SYMBOLS (pageid = 50 AS A,
pageid = 80 AS B,
pageid <> 80 AND category IN (9,10) AS C)
RESULT (LAST(pageid OF ANY (A,B,C)) AS last_pageid,
COUNT (* OF B) AS count_page80,
COUNT (* OF ANY (A,B,C)) AS count_any)
) as dt WHERE dt.count_any >= 5
GROUP BY dt.last_pageid
ORDER BY MAX(dt.count_page80);

Range-Matching Examples
Whenever a user visits the home page and then visits checkout pages and buys increasingly expensive
products, the nPath query returns the first purchase and the most expensive purchase.
nPath Example Input Table: aggregate_clicks

userid  sessionid  productname  pagetype  clicktime            referrer     productprice
1039    1          sneakers     home      2009-07-29 20:17:59  Nike         100
1039    2          books        home      2009-04-21 13:17:59  BarnesNoble  300
1039    3          television   home      2009-05-23 13:17:59  Bestbuy      500
1039    4          envelopes    home      2009-07-16 11:17:59  Staples      10
1039    4          envelopes    home1     2009-07-16 11:18:16  Staples      10
1039    4          envelopes    page1     2009-07-16 11:18:18  Staples      10
1039    5          bookcases    home      2009-08-19 22:17:59  Ikea         150
1039    5          bookcases    home1     2009-08-19 22:18:02  Ikea         150
1039    5          bookcases    page1     2009-08-19 22:18:05  Ikea         150
1039    5          bookcases    page2     2009-08-22 04:20:05  Ikea         150
1039    5          bookcases    checkout  2009-08-24 14:30:05  Ikea         150
1039    5          bookcases    page2     2009-08-27 23:03:05  Ikea         150
1040    1          tables       home      2009-07-29 20:17:59  Ikea         250
1040    2          Appliance    home      2009-04-21 13:17:59  GE           1500
1040    3          laptops      home      2009-05-23 13:17:59  Dell         800
1040    4          chairs       home      2009-07-16 11:17:59  Staples      400
1040    4          chairs       home1     2009-07-16 11:18:16  Staples      400
1040    4          chairs       page1     2009-07-16 11:18:18  Staples      400
1040    5          cellphones   home      2009-08-19 22:17:59  Samsung      600
1040    5          cellphones   home1     2009-08-19 22:18:02  Samsung      600
1040    5          cellphones   page1     2009-08-19 22:18:05  Samsung      600
1040    5          cellphones   page2     2009-08-22 04:20:05  Samsung      600
1040    5          cellphones   checkout  2009-08-24 14:30:05  Samsung      600
1040    5          cellphones   page2     2009-08-27 23:03:05  Samsung      600
...     ...        ...          ...       ...                  ...          ...

Example 1: Accumulate Pages Visited in Each Session

SQL-MapReduce Call
SELECT * FROM NPATH (
ON aggregate_clicks
PARTITION BY sessionid
ORDER BY clicktime
USING
MODE (nonoverlapping)
PATTERN ('A*')
SYMBOLS (TRUE AS A)
RESULT (FIRST (sessionid OF A) AS sessionid,
ACCUMULATE (pagetype OF A) AS path)
) as dt ORDER BY dt.sessionid;

Output
nPath Range-Matching Example 1 Output Table

sessionid path
1 [home, home1, page1, home, home1, page1, home, home, home, home1, page1,
checkout, home, home, home, home, home, home, home, home, home]
2 [home, home, home, home, home, home, home, home, home, home1, page1, checkout,
checkout, home, home]
3 [home, home, home, home, home, home, home, home, home1, page1, home, home1,
page1, home]
4 [home, home, home, home, home, home, home1, home1, home1, page1, page1, page1]
5 [home, home, home, home, home1, home1, home1, page1, page1, page1, page2, page2,
page2, checkout, checkout, checkout, page2, page2, page2]

Example 2: Find Sessions That Start at Home Page and Visit Page1

SQL-MapReduce Call
SELECT * FROM NPATH (
ON aggregate_clicks
PARTITION BY sessionid
ORDER BY clicktime
USING
MODE (nonoverlapping)
PATTERN ('^H.A*.P1.A*')
SYMBOLS (pagetype='home' AS H, pagetype='page1' AS P1, TRUE AS A)
RESULT (FIRST(sessionid OF A) AS sessionid,
ACCUMULATE (pagetype OF ANY(H,P1,A)) AS path)
) as dt ORDER BY dt.sessionid;

Output
nPath Range-Matching Example 2 Output Table

sessionid path
1 [home, home1, page1, home, home1, page1, home, home, home, home1, page1,
checkout, home, home, home, home, home, home, home, home, home]
2 [home, home, home, home, home, home, home, home, home, home1, page1, checkout,
checkout, home, home]
3 [home, home, home, home, home, home, home, home, home1, page1, home, home1,
page1, home]
4 [home, home, home, home, home, home, home1, home1, home1, page1, page1, page1]
5 [home, home, home, home, home1, home1, home1, page1, page1, page1, page2, page2,
page2, checkout, checkout, checkout, page2, page2, page2]

Example 3: Find Paths to Checkout Page for Purchases Over $200

SQL-MapReduce Call
SELECT * FROM NPATH (
ON aggregate_clicks
PARTITION BY sessionid
ORDER BY clicktime
USING
MODE (nonoverlapping)
PATTERN ('A*.C+.A*')
SYMBOLS (productprice > 200 AND
pagetype='checkout' AS C, true AS A)
RESULT (FIRST(sessionid OF A) AS sessionid,
ACCUMULATE (pagetype OF ANY(A,C)) AS path,
AVG (productprice OF ANY(A,C)) AS totalsum)
) as dt ORDER BY dt.sessionid;

Output
nPath Range-Matching Example 3 Output Table

sessionid  path  totalsum

1  [home, home1, page1, home, home1, page1, home, home, home, home1, page1, checkout, home, home, home, home, home, home, home, home, home]  602.857142857143
5  [home, home, home, home, home1, home1, home1, page1, page1, page1, page2, page2, page2, checkout, checkout, checkout, page2, page2, page2]  363.157894736842

Example 4: Use OVERLAPPING Mode

SQL-MapReduce Call
SELECT * FROM NPATH (
ON aggregate_clicks
PARTITION BY sessionid
ORDER BY clicktime
USING
MODE (overlapping)
PATTERN ('A.A')
SYMBOLS (TRUE AS A)
RESULT (FIRST(sessionid OF A) AS sessionid,
ACCUMULATE (pagetype OF A) AS path)
) as dt ORDER BY dt.sessionid;

Output
nPath Range-Matching Example 4 Output Table

sessionid path
1 [home, home]
1 [home, home]
1 [home, home]
1 [home, home]
1 [home, home]
1 [home, home]
1 [home, home]
1 [home, home]
1 [checkout, home]
1 [page1, checkout]
1 [home1, page1]
1 [home, home1]
1 [home, home]
1 [home, home]
1 [page1, home]
1 [home1, page1]
1 [home, home1]
1 [page1, home]
1 [home1, page1]
1 [home, home1]
2 [home, home]
2 [checkout, home]
2 [checkout, checkout]
... ...
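
The difference between the two modes for PATTERN ('A.A') with TRUE AS A can be sketched in Python. This is an illustration, not the nPath implementation: it assumes that OVERLAPPING mode tries every row as the start of a match, while NONOVERLAPPING mode resumes the search at the row after the previous match. The function name match_pairs and the sample page types are hypothetical.

```python
def match_pairs(values, overlapping):
    """Emulate PATTERN ('A.A') with TRUE AS A over one ordered partition."""
    matches = []
    i = 0
    while i + 1 < len(values):
        matches.append([values[i], values[i + 1]])
        # OVERLAPPING: the next row may start a new match.
        # NONOVERLAPPING: resume after the last row of the current match.
        i += 1 if overlapping else 2
    return matches


pages = ["home", "home1", "page1", "checkout", "home"]  # hypothetical session
overlapping_matches = match_pairs(pages, overlapping=True)
nonoverlapping_matches = match_pairs(pages, overlapping=False)
print(len(overlapping_matches))     # 4
print(len(nonoverlapping_matches))  # 2
```

Under these assumptions, a partition of n rows produces n - 1 two-row paths in OVERLAPPING mode, which is consistent with the many short paths per session in the output above.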

Example 5: Find First Product with Multiple Referrers in Any Session

SQL-MapReduce Call
SELECT * FROM NPATH (
ON aggregate_clicks
PARTITION BY sessionid
ORDER BY clicktime
USING
MODE (nonoverlapping)
PATTERN ('REFERRER{2,}')
SYMBOLS (referrer IS NOT NULL AS REFERRER)
RESULT (FIRST(sessionid OF REFERRER) AS sessionid,
FIRST(productname OF REFERRER) AS product)
) as dt ORDER BY dt.sessionid;

Output
nPath Range-Matching Example 5 Output Table

sessionid product
1 envelopes
2 tables
3 bookcases
4 tables
5 Appliances

Example 6: Find Data for Sessions That Checked Out 3-6 Products

SQL-MapReduce Call
SELECT * FROM NPATH (
ON aggregate_clicks
PARTITION BY sessionid
ORDER BY clicktime
USING
MODE (nonoverlapping)
PATTERN ('H+.D*.C{3,6}.D')
SYMBOLS (pagetype = 'home' AS H, pagetype='checkout' AS C,
pagetype<>'home' AND pagetype<>'checkout' AS D)
RESULT (FIRST(sessionid OF C) AS sessionid,
max_choose(productprice, productname OF C) AS
most_expensive_product,
MAX(productprice OF C) AS max_price,
min_choose(productprice, productname of C) AS
least_expensive_product,
MIN(productprice OF C) AS min_price)
) as dt ORDER BY dt.sessionid;

Output
nPath Range-Matching Example 6 Output Table

sessionid most_expensive_product max_price least_expensive_product min_price


5 cellphones 600 bookcases 150

Example 7: Find Data for Sessions That Checked Out at Least 3 Products
Modify the previous query call in Example 6 to find sessions where the user checked out at least three
products by changing the Pattern argument to:
PATTERN('H+.D*.C{3,}.D')

SQL-MapReduce Call
SELECT * FROM NPATH (
ON aggregate_clicks
PARTITION BY sessionid
ORDER BY clicktime
USING
MODE (nonoverlapping)
PATTERN('H+.D*.C{3,}.D')
SYMBOLS(pagetype = 'home' AS H, pagetype='checkout' AS C,
pagetype<>'home' AND pagetype<>'checkout' AS D)
RESULT (FIRST(sessionid OF C) AS sessionid,
max_choose(productprice, productname OF C) AS
most_expensive_product,
MAX (productprice OF C) AS max_price,
min_choose (productprice, productname OF C) AS
least_expensive_product,
MIN (productprice OF C) AS min_price)
) as dt ORDER BY dt.sessionid;

Output
nPath Range-Matching Example 7 Output Table

sessionid most_expensive_product max_price least_expensive_product min_price


5 cellphones 600 bookcases 150

Example 8: Multiple Partitioned Input Tables and Dimension Input Table
An e-commerce store wants to count the advertising impressions that lead to a user clicking an online
advertisement. The example counts the online advertisements that the user viewed and the television
advertisements that the user might have viewed.

Input
nPath Multiple-Input Example 2 Input Table impressions

userid ts imp
1 2012-01-01 ad1
1 2012-01-02 ad1
1 2012-01-03 ad1
1 2012-01-04 ad1
1 2012-01-05 ad1
1 2012-01-06 ad1
1 2012-01-07 ad1
2 2012-01-08 ad2
2 2012-01-09 ad2
2 2012-01-10 ad2
2 2012-01-11 ad2
... ... ...

nPath Multiple-Input Example 2 Input Table clicks2

userid ts click
1 2012-01-01 ad1
2 2012-01-08 ad2
3 2012-01-16 ad3
4 2012-01-23 ad4
5 2012-02-01 ad5
6 2012-02-08 ad6
7 2012-02-14 ad7
8 2012-02-24 ad8
9 2012-03-02 ad9
10 2012-03-10 ad10
11 2012-03-18 ad11
12 2012-03-25 ad12
13 2012-03-30 ad13
14 2012-04-02 ad14
15 2012-04-06 ad15

nPath Multiple-Input Example 2 Input Table tv_spots

ts tv_imp
2012-01-01 ad2
2012-01-02 ad2
2012-01-03 ad3
2012-01-04 ad4
2012-01-05 ad5
2012-01-06 ad6
2012-01-07 ad7
2012-01-08 ad8
2012-01-09 ad9
2012-01-10 ad10
2012-01-11 ad11
2012-01-12 ad12
2012-01-13 ad13
2012-01-14 ad14
2012-01-15 ad15

SQL-MapReduce Call
The tables impressions and clicks2 have a userid column, but the table tv_spots is only a record of television
advertisements shown, which any user might have seen. Therefore, tv_spots must be a dimension table.
SELECT * FROM npath (
ON impressions PARTITION BY userid ORDER BY ts
ON clicks2 PARTITION BY userid ORDER BY ts
ON tv_spots DIMENSION ORDER BY ts
USING
MODE (nonoverlapping)
SYMBOLS (true as imp, true as click, true as tv_imp)
PATTERN ('(imp|tv_imp)*.click')
RESULT (COUNT(* of imp) as imp_cnt,
COUNT (* of tv_imp) as tv_imp_cnt)
) as dt ORDER BY dt.imp_cnt;

Output
nPath Multiple-Input Example 2 Output Table

imp_cnt tv_imp_cnt
18
19 0
19 0
20 0
21 0
22 0
22 0
22 0
22 0
22 0
23 0
23 0
23 0
24 0
25 0

Related Topics
For more information about the PARTITION BY, ORDER BY, and DIMENSION clauses, see the chapter on
the SELECT statement in SQL Data Manipulation Language, B035-1146.

SESSIONIZE

Purpose
The Sessionize function maps each click in a session to a unique session identifier. A session is defined as a
sequence of clicks by one user that are separated by at most n seconds.
The function is useful both for sessionization and for detecting web crawler (“bot”) activity. It is typically
used to understand user browsing behavior on a web site.

Syntax

Syntax Elements
ON Clause
The function supports one ON clause.
The input table must have a timestamp column and columns by which to partition and order the data. Input
data must be partitioned such that each partition contains all rows of an entity. No input column can have
the name 'sessionid' or 'clicklag', because these are output column names.

Note:
If the input to SESSIONIZE is nondeterministic, then the results are nondeterministic.

ON table_name
Name of input table.

ON view_name
Name of input view.
ON query_expression
SELECT statement. See
AS alias_name
Alias name for the input table.
PARTITION BY column_name
Name of the column by which the input table is partitioned.
PARTITION BY column_position
Position of the column by which the input table is partitioned.
ORDER BY column_name
Name of the column by which the input table is ordered.
ORDER BY column_position
Position of the column to use for ordering the results.
ORDER BY sort_expression
Expression to use for ordering the results.
ASC
Sort results in ascending order.
DESC
Sort results in descending order.
NULLS FIRST
Sort nulls first in results.
NULLS LAST
Sort nulls last in results.

TIMECOLUMN
timestamp_column
Name of the input column that contains the click times. If the data type is INTEGER,
BIGINT, or SMALLINT, then the function treats the values as milliseconds.

Note:
The timestamp_column must also be an order_column.

The timestamp_column can have any of the following data types:


• TIME
• TIMESTAMP
• INTEGER
• BIGINT
• SMALLINT

TIMEOUT
session_timeout_value
Specifies the number of seconds at which the session times out. If session_timeout_value
seconds elapse after a click, then the next click starts a new session. The data type of
session_timeout_value is DOUBLE PRECISION.

CLICKLAG
min_human_click_lag
Specifies the minimum number of seconds between clicks for the session user to be
considered human. If clicks are more frequent, indicating that the user is a “bot,” the
function ignores the session. The min_human_click_lag must be less than
session_timeout_value. The data type of min_human_click_lag is DOUBLE PRECISION.

EMITNULL
Specifies whether to output rows whose timestamp_column value is NULL. Such rows have NULL values in
their sessionid and clicklag columns.
true_value
Output rows even if their timestamp_column value is NULL. These rows have NULL values in
their sessionid and clicklag columns.
Valid values for true_value are: true, t, yes, y, and 1, which must be enclosed in single
quotation marks.
false_value
Do not output any rows with NULL values for the timestamp_column. This is the default.
Valid values for false_value are: false, f, no, n, and 0.

AS alias_name
Name of output table.

other_select_conditions
You can specify other SELECT statement options. See SQL Data Manipulation Language, B035-1146

Output
Sessionize Output Table Schema

Column Name
    Data Type and Description

input_column
    Data type: TIME, TIMESTAMP, INTEGER, BIGINT, or SMALLINT
    Column copied from input table. The function copies every input table column to the output table.
sessionid
    Data type: INTEGER or BOOLEAN
    Contains the identifiers that the function assigned to the sessions.
clicklag
    Data type: BOOLEAN
    Contains 't' if the session exceeded min_human_click_lag, 'f' otherwise.

Usage Notes
ON Clause Usage Notes
• SESSIONIZE can have only one ON clause.
• The following combinations are allowed in the ON clause:
∘ PARTITION BY ORDER BY
∘ PARTITION BY
• PARTITION BY ANY, DIMENSION, HASH BY and LOCAL ORDER BY are not allowed in the ON
Clause.
• You can associate an optional alias with the ON Clause input table by using the AS name clause.

SESSIONIZE Function Output


The output of the SESSIONIZE function includes all columns from the input and an additional column
named sessionid that contains the assigned session identifiers. If CLICKLAG is also specified in the call, then
an additional column named CLICKLAG is generated. The data type of the CLICKLAG column is CHAR
and can contain a t or f value, which corresponds to TRUE or FALSE. Since the output columns of
SESSIONIZE are called sessionid and CLICKLAG, the input columns cannot use these names.

Usage Example
Input Table: sessionize_table

partition_id  clicktime  userid  productname  pagetype  referrer        productprice
1             1110000    333                  Home      www.yahoo.com
1             1112000    333     Ipod         Checkout  www.yahoo.com   200.2
1             1160000    333     Bose         Checkout                  340
1             1200000    333                  Home      www.google.com
1             1203000    67403                Home      www.google.com
1             1300000    67403                Home      www.google.com
1             1301000    67403                Home
1             1302000    67403                Home
1             1340000    67403   Iphone       Checkout                  650
1             1450000    67403   Bose         Checkout                  750
1             1450200    80000                Home      godaddy.com
1             1450600    80000   Bose         Checkout                  340
1             1450800    80000   Itrip        Checkout                  450
1             1452000    880000  Iphone       Checkout                  650

Example: SESSIONIZE Call


SELECT *
FROM SESSIONIZE
(
ON sessionize_table PARTITION BY partition_id ORDER BY clicktime
USING
TIMECOLUMN('clicktime')
TIMEOUT(60)
CLICKLAG(0.2)
)ORDER BY partition_id, clicktime;

Output

partition_id  clicktime  userid  productname  pagetype  referrer        productprice  SESSIONID  CLICKLAG
1             1110000    333                  Home      www.yahoo.com                 0          f
1             1112000    333     Ipod         Checkout  www.yahoo.com   200.2         0          f
1             1160000    333     Bose         Checkout                  340           0          f
1             1200000    333                  Home      www.google.com                0          f
1             1203000    67403                Home      www.google.com                0          f
1             1300000    67403                Home      www.google.com                1          f
1             1301000    67403                Home                                    1          f
1             1302000    67403                Home                                    1          f
1             1340000    67403   Iphone       Checkout                  650           1          f
1             1450000    67403   Bose         Checkout                  750           2          f
1             1450200    80000                Home      godaddy.com                   2          t
1             1450600    80000   Bose         Checkout                  340           2          f
1             1450800    80000   Itrip        Checkout                  450           2          t
1             1452000    880000  Iphone       Checkout                  650           2          f
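
The session numbering in this output can be reproduced with a short Python sketch. It is an illustration of the definition in the Purpose section (clicks separated by at most n seconds belong to one session), not the SESSIONIZE implementation. It assumes one partition, integer click times in milliseconds, and session identifiers that start at 0 as in the output above; it does not model the CLICKLAG column. The function name assign_session_ids is hypothetical.

```python
def assign_session_ids(click_times_ms, timeout_seconds):
    """click_times_ms: click times of one partition, ascending, in milliseconds."""
    session_ids = []
    session_id = 0
    previous = None
    for t in click_times_ms:
        # A gap longer than the timeout starts a new session.
        if previous is not None and (t - previous) > timeout_seconds * 1000:
            session_id += 1
        session_ids.append(session_id)
        previous = t
    return session_ids


# clicktime values from sessionize_table, ordered as in the example call
clicktimes = [1110000, 1112000, 1160000, 1200000, 1203000, 1300000, 1301000,
              1302000, 1340000, 1450000, 1450200, 1450600, 1450800, 1452000]
session_ids = assign_session_ids(clicktimes, 60)  # TIMEOUT(60)
print(session_ids)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2]
```

Because the example partitions by partition_id rather than userid, clicks from different users fall into the same session whenever they are close enough in time.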

Related Topics
For more information about the ON clause, PARTITION BY clause, and ORDER BY clause, see the chapter
on the SELECT statement in SQL Data Manipulation Language, B035-1146.

APPENDIX A
How to Read Syntax Diagrams

Syntax Diagram Conventions


Notation Conventions

Item
    Definition and Comments

Letter
    An uppercase or lowercase alphabetic character ranging from A through Z.
Number
    A digit ranging from 0 through 9.
    Do not use commas when typing a number with more than 3 digits.
Word
    Keywords and variables.
    • UPPERCASE LETTERS represent a keyword.
      Syntax diagrams show all keywords in uppercase, unless operating system restrictions require
      them to be in lowercase.
    • lowercase letters represent a keyword that you must type in lowercase, such as a Linux command.
    • Mixed Case letters represent exceptions to uppercase and lowercase rules. The exceptions are
      noted in the syntax explanation.
    • lowercase italic letters represent a variable such as a column or table name.
      Substitute the variable with a proper value.
    • lowercase bold letters represent an excerpt from the diagram.
      The excerpt is defined immediately following the diagram that contains it.
    • UNDERLINED LETTERS represent the default value.
      This applies to both uppercase and lowercase words.
Spaces
    Use one space between items such as keywords or variables.
Punctuation
    Type all punctuation exactly as it appears in the diagram.

Paths
The main path along the syntax diagram begins at the left with a keyword, and proceeds, left to right, to the
vertical bar, which marks the end of the diagram. Paths that do not have an arrow or a vertical bar only show
portions of the syntax.
The only part of a path that reads from right to left is a loop.

Continuation Links
Paths that are too long for one line use continuation links. Continuation links are circled letters indicating
the beginning and end of a link:

[Diagram: continuation link marked with the circled letter A]

When you see a circled letter in a syntax diagram, go to the corresponding circled letter and continue
reading.

Required Entries
Required entries appear on the main path:
SHOW

If you can choose from more than one entry, the choices appear vertically, in a stack. The first entry appears
on the main path:
SHOW CONTROLS
VERSIONS

Optional Entries
You may choose to include or disregard optional entries. Optional entries appear below the main path:
SHOW
CONTROLS

If you can optionally choose from more than one entry, all the choices appear below the main path:

READ
SHARE
ACCESS

Some commands and statements treat one of the optional choices as a default value. This value is
UNDERLINED. It is presumed to be selected if you type the command or statement without specifying one
of the options.

Strings
String literals appear in apostrophes:
'msgtext '

Abbreviations
If a keyword or a reserved word has a valid abbreviation, the unabbreviated form always appears on the
main path. The shortest valid abbreviation appears beneath.
SHOW CONTROLS
CONTROL

In the above syntax, the following formats are valid:


SHOW CONTROLS
SHOW CONTROL

Loops
A loop is an entry or a group of entries that you can repeat one or more times. Syntax diagrams show loops
as a return path above the main path, over the item or items that you can repeat:
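[Syntax diagram: ( cname ) on the main path, with a return path above it that carries a comma, the
number 3 in a square, and the number 4 in a circle]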

Read loops from right to left.


The following conventions apply to loops:
Item
    Description and Example

maximum number of entries allowed
    The number appears in a circle on the return path.
    In the example, you may type cname a maximum of four times.
minimum number of entries allowed
    The number appears in a square on the return path.
    In the example, you must type at least three groups of column names.
separator character required between entries
    The character appears on the return path. If the diagram does not show a separator character,
    use one blank space.
    In the example, the separator character is a comma.
delimiter character required around entries
    The beginning and end characters appear outside the return path. Generally, a space is not
    needed between delimiter characters and entries.
    In the example, the delimiter characters are the left and right parentheses.

Excerpts
Sometimes a piece of a syntax phrase is too large to fit into the diagram. Such a phrase is indicated by a break
in the path, marked by (|) terminators on each side of the break. The name for the excerpted piece appears
between the terminators in boldface type.
The boldface excerpt name and the excerpted phrase appear immediately after the main diagram. The
excerpted phrase starts and ends with a plain horizontal line:
LOCKING excerpt
HAVING con

excerpt
where_cond
,
cname
,
col_pos

Multiple Legitimate Phrases
In a syntax diagram, it is possible for any number of phrases to be legitimate:
dbname
DATABASE
tname
TABLE
vname
VIEW

In this example, any of the following phrases are legitimate:


dbname
DATABASE dbname
tname
TABLE tname
vname
VIEW vname

Sample Syntax Diagram


[Syntax diagram: sample CREATE VIEW (abbreviation CV) statement that uses continuation links A, B,
and C and the excerpt qual_cond]
