Teradata Analytic Functions 1206-151K
Teradata, Aster, BYNET, Claraview, DecisionCast, IntelliBase, IntelliCloud, IntelliFlex, QueryGrid, SQL-MapReduce, Teradata Decision Experts,
"Teradata Labs" logo, Teradata ServiceConnect, and Teradata Source Experts are trademarks or registered trademarks of Teradata Corporation or its
affiliates in the United States and other countries.
Adaptec and SCSISelect are trademarks or registered trademarks of Adaptec, Inc.
Amazon Web Services, AWS, Amazon Elastic Compute Cloud, Amazon EC2, Amazon Simple Storage Service, Amazon S3, AWS CloudFormation, and
AWS Marketplace are trademarks of Amazon.com, Inc. or its affiliates in the United States and/or other countries.
AMD Opteron and Opteron are trademarks of Advanced Micro Devices, Inc.
Apache, Apache Avro, Apache Hadoop, Apache Hive, Hadoop, and the yellow elephant logo are either registered trademarks or trademarks of the
Apache Software Foundation in the United States and/or other countries.
Apple, Mac, and OS X all are registered trademarks of Apple Inc.
Axeda is a registered trademark of Axeda Corporation. Axeda Agents, Axeda Applications, Axeda Policy Manager, Axeda Enterprise, Axeda Access,
Axeda Software Management, Axeda Service, Axeda ServiceLink, and Firewall-Friendly are trademarks and Maximum Results and Maximum Support
are servicemarks of Axeda Corporation.
CENTOS is a trademark of Red Hat, Inc., registered in the U.S. and other countries.
Cloudera and CDH are trademarks or registered trademarks of Cloudera Inc. in the United States, and in jurisdictions throughout the world.
Data Domain, EMC, PowerPath, SRDF, and Symmetrix are either registered trademarks or trademarks of EMC Corporation in the United States and/or
other countries.
GoldenGate is a trademark of Oracle.
Hewlett-Packard and HP are registered trademarks of Hewlett-Packard Company.
Hortonworks, the Hortonworks logo and other Hortonworks trademarks are trademarks of Hortonworks Inc. in the United States and other countries.
Intel, Pentium, and XEON are registered trademarks of Intel Corporation.
IBM, CICS, RACF, Tivoli, IBM Spectrum Protect, and z/OS are trademarks or registered trademarks of International Business Machines Corporation.
Linux is a registered trademark of Linus Torvalds.
LSI is a registered trademark of LSI Corporation.
Microsoft, Active Directory, Windows, Windows NT, and Windows Server are registered trademarks of Microsoft Corporation in the United States and
other countries.
NetVault is a trademark of Quest Software, Inc.
Novell and SUSE are registered trademarks of Novell, Inc., in the United States and other countries.
Oracle, Java, and Solaris are registered trademarks of Oracle and/or its affiliates.
QLogic and SANbox are trademarks or registered trademarks of QLogic Corporation.
Quantum and the Quantum logo are trademarks of Quantum Corporation, registered in the U.S.A. and other countries.
Red Hat is a trademark of Red Hat, Inc., registered in the U.S. and other countries. Used under license.
SAP is the trademark or registered trademark of SAP AG in Germany and in several other countries.
SAS and SAS/C are trademarks or registered trademarks of SAS Institute Inc.
Sentinel® is a registered trademark of SafeNet, Inc.
Simba, the Simba logo, SimbaEngine, SimbaEngine C/S, SimbaExpress and SimbaLib are registered trademarks of Simba Technologies Inc.
SPARC is a registered trademark of SPARC International, Inc.
Unicode is a registered trademark of Unicode, Inc. in the United States and other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Veritas, the Veritas Logo and NetBackup are trademarks or registered trademarks of Veritas Technologies LLC or its affiliates in the U.S. and other
countries.
Other product and company names mentioned herein may be the trademarks of their respective owners.
The information contained in this document is provided on an "as-is" basis, without warranty of any kind, either express or
implied, including the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. Some
jurisdictions do not allow the exclusion of implied warranties, so the above exclusion may not apply to you. In no event will
Teradata Corporation be liable for any indirect, direct, special, incidental, or consequential damages, including lost profits or
lost savings, even if expressly advised of the possibility of such damages.
The information contained in this document may contain references or cross-references to features, functions, products, or services that are not
announced or available in your country. Such references do not imply that Teradata Corporation intends to announce such features, functions,
products, or services in your country. Please consult your local Teradata Corporation representative for those features, functions, products, or services
available in your country.
Information contained in this document may contain technical inaccuracies or typographical errors. Information may be changed or updated without
notice. Teradata Corporation may also make improvements or changes in the products or services described in this information at any time without
notice.
To maintain the quality of our products and services, we would like your comments on the accuracy, clarity, organization, and value of this document.
Please e-mail: [email protected]
Any comments or materials (collectively referred to as "Feedback") sent to Teradata Corporation will be deemed non-confidential. Teradata Corporation
will have no obligation of any kind with respect to Feedback and will be free to use, reproduce, disclose, exhibit, display, transform, create derivative
works of, and distribute the Feedback and derivative works thereof without limitation on a royalty-free basis. Further, Teradata Corporation will be free
to use any ideas, concepts, know-how, or techniques contained in such Feedback for any purpose whatsoever, including developing, manufacturing, or
marketing products or services incorporating Feedback.
Copyright © 2017 by Teradata. All Rights Reserved.
Table of Contents
Preface
    Purpose
    Audience
    Supported Releases
    Revision History
    Additional Information
    Teradata Support
    Product Safety Information
Chapter 1: Introduction to the Teradata Analytic Functions
    Teradata Analytic Functions
    Terminology
Chapter 2: The Teradata Analytic Functions
    nPath®
    SESSIONIZE
Appendix A: How to Read Syntax Diagrams
    Syntax Diagram Conventions
Purpose
This book describes the Teradata analytic functions.
Audience
This book is intended for data analysts, data scientists and other technical personnel who use Teradata
Database.
Supported Releases
This book supports Teradata® Database 15.10.06.
Teradata Database 15.10.06 is supported on:
• SUSE Linux Enterprise Server (SLES) 10 SP3
• SUSE Linux Enterprise Server (SLES) 11 SP1
• SUSE Linux Enterprise Server (SLES) 11 SP3
Teradata Database client applications support other operating systems.
Revision History
Date             Release     Description
November 2017    15.10.06    Added syntax diagrams and additional explanations for the functions.
August 2017      15.10.06    Initial release
Additional Information
Related Links
URL                               Description
http://www.info.teradata.com      Use the Teradata Information Products Publishing Library site to:
                                  • View or download a manual:
https://access.teradata.com       Use Teradata @ Your Service to access Orange Books, technical alerts,
                                  and knowledge repositories, view and join forums, and download
                                  software patches.
http://developer.teradata.com/    Teradata Developer Exchange provides articles on using Teradata
                                  products, technical discussion forums, and code downloads.
Customer Education
Teradata Customer Education delivers training for your global workforce, including scheduled public
courses, customized on-site training, and web-based training. For information about the classes, schedules,
and the Teradata Certification Program, go to www.teradata.com/TEN/.
Teradata Support
Teradata customer support is located at https://access.teradata.com.
Example
Notice:
Improper use of the Reconfiguration utility can result in data loss.
Terminology
This document refers to the following terms.
Term            Description
Path            An ordered, start-to-finish series of actions, for example, page views, for which
                sequences and sub-sequences can be generated.
Sequence        The path prefixed with a caret (^), which indicates the start of a path. For
                example, if a user visited page a, page b, and page c, in that order, the
                session sequence is ^,a,b,c.
Sub-sequence    For a given sequence of actions, one possible subset of the steps that begins
                with the initial action. For example, the path a,b,c generates three
                sub-sequences: ^,a; ^,a,b; and ^,a,b,c.
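The relationship between a path, its sequence, and its sub-sequences can be sketched in a few lines of Python. This is only an illustration of the definitions above, not part of Teradata Database. The function names sequence and sub_sequences are hypothetical.

```python
def sequence(path):
    """Prefix the path with the caret (^) start-of-path marker."""
    return ["^"] + list(path)


def sub_sequences(path):
    """Return every sub-sequence that begins with the initial action."""
    seq = sequence(path)
    # Slice lengths 2..len(seq): the marker plus the first 1, 2, ... actions.
    return [seq[:end] for end in range(2, len(seq) + 1)]


if __name__ == "__main__":
    for sub in sub_sequences(["a", "b", "c"]):
        print(",".join(sub))
```

For the path a,b,c the sketch yields the same three sub-sequences listed in the Terminology table.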
nPath®
Purpose
The nPath® function matches specified patterns in a sequence of rows from one or more input tables and
extracts information from the matched rows.
Syntax Elements
query_expression
SELECT statement, including a table operator SELECT call.
For information about the syntax of the SELECT clause, ON clause, PARTITION BY clause, DIMENSION
clause, and ORDER BY clause, see the chapter on the SELECT statement in SQL Data Manipulation
Language, B035-1146.
For information about the supported symbol predicates, see the chapter on Logical Predicates in SQL
Functions, Operators, Expressions, and Predicates, B035-1145.
Note:
If the input path to nPath is nondeterministic, then the results are nondeterministic.
ON table_name
Name of input table.
ON view_name
Name of input view.
ON query_expression
Expression that resolves to an input table.
AS alias_name
Alias name for the input table.
PARTITION BY column_name
Name of the column by which every partitioned input table is partitioned. The function
requires at least one partitioned input table, and can have additional partitioned input
tables.
PARTITION BY column_position
Position of the column by which every partitioned input table is partitioned. The function
requires at least one partitioned input table, and you can specify additional partitioned input
tables.
DIMENSION
You can specify additional DIMENSION input tables.
ORDER BY column_name
Name of the column by which every input table is ordered.
ORDER BY column_position
Position of the column to use for ordering the results.
ORDER BY sort_expression
Expression to use for ordering the results.
ASC
Sort results in ascending order.
DESC
Sort results in descending order.
NULLS FIRST
Sort nulls first in results.
NULLS LAST
Sort nulls last in results.
MODE
Specifies the pattern-matching mode.
PATTERN
Specifies the pattern for which the function searches.
symbolic_search_pattern
You compose symbolic_search_pattern with the symbols that you define in the SYMBOLS
argument, operators, and parentheses.
To specify that a subpattern must appear a specific number of times, use the Range-
Matching Feature defined later in this section. For pattern matching details, refer to Pattern
Matching.
The following table describes the simplest patterns, which you can combine to form more
complex patterns. When patterns have multiple operators, the function applies them in
order of precedence, and applies operators of equal precedence from left to right. The
following table also shows operator precedence. To force the function to evaluate a
subpattern first, enclose the subpattern in parentheses. In the following table, A and B are
symbols defined in the SYMBOLS argument.
Simple nPath Patterns and Operator Precedence
SYMBOLS
Defines the symbols that appear in the values of the PATTERN and RESULT arguments.
For example, the following is a SYMBOLS argument for analyzing web site visits:
SYMBOLS (
    pagetype = 'homepage' AS H,
    pagetype <> 'homepage' AND pagetype <> 'checkout' AS PP,
    pagetype = 'checkout' AS CO
)
For more information about symbols that appear in the PATTERN argument value, refer to Symbols. For
more information about symbols that appear in the RESULT argument value, refer to Result: Applying
Aggregate Functions.
logical_predicate
An expression whose value is a column name. If col_expr represents a column that appears
in multiple input tables, then you must qualify the ambiguous column name with its table
name. For example:
Symbols (
weblog.pagetype = 'homepage' AS H,
weblog.pagetype = 'thankyou' AS T,
ads.adname = 'xmaspromo' AS X,
ads.adname = 'realtorpromo' AS R
)
AS symbol
Any valid identifier. The symbol is case-insensitive. However, a symbol of one or two
uppercase letters is easier to identify in patterns.
symbol_predicate
SQL predicate, often a column name.
RESULT
Defines the output columns.
aggregate_function
The function to apply. For details, see Result: Applying Aggregate Functions. The function
evaluates this argument once for every matched pattern in the partition. That is, the function
outputs one row for each pattern match.
expression
An expression whose value is a column name. The expression specifies the values to retrieve
from the matched rows.
Range-Matching Feature
You use the range-matching feature to specify the number of times that a subpattern must appear in a
match. You can specify the count as one of the following, enclosed in braces, { }:
• Exact number of times that the subpattern appears in a match.
• Minimum number of times that the subpattern appears in a match.
• Minimum and maximum number of times that the subpattern appears in a match.
The format is as follows:
(subpattern){n[,[m]]}
where n is the minimum and m is the maximum.
(subpattern){n}
Subpattern must appear exactly n times.
For example, the following pattern specifies that subpattern (A.B|C) must appear exactly 3
times:
'X.(Y.Z).(A.B|C){3}'
The preceding pattern is equivalent to the following pattern:
'X.(Y.Z).(A.B|C).(A.B|C).(A.B|C)'
(subpattern){n,}
Subpattern must appear at least n times. For example, the following pattern specifies that
subpattern (A.B|C) must appear at least 4 times:
'X.(Y.Z).(A.B|C){4,}'
The preceding pattern is equivalent to the following pattern:
'X.(Y.Z).(A.B|C).(A.B|C).(A.B|C).(A.B|C).(A.B|C)*'
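The equivalences described for {n} and {n,} can be checked mechanically. The following Python sketch is an illustration only: expand_range is a hypothetical helper, not an nPath feature. It rewrites a range-matched subpattern into the longer form without braces. It assumes n is at least 1 and covers only the two forms whose expansions this section spells out.

```python
def expand_range(subpattern, n, at_least=False):
    """Expand (subpattern){n} or (subpattern){n,} into a brace-free pattern.

    subpattern is expected to be parenthesized already, e.g. "(A.B|C)".
    """
    if n < 1:
        raise ValueError("n must be at least 1 in this sketch")
    # n required copies of the subpattern, joined by the '.' operator.
    copies = [subpattern] * n
    if at_least:
        # {n,}: after the n required copies, allow zero or more extra copies.
        copies.append(subpattern + "*")
    return ".".join(copies)


if __name__ == "__main__":
    print("X.(Y.Z)." + expand_range("(A.B|C)", 3))
    print("X.(Y.Z)." + expand_range("(A.B|C)", 4, at_least=True))
```

An "at least n" range therefore expands to n required copies followed by one starred copy.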
Output
nPath Output Table Schema
Usage Notes
Pattern Matching
Conceptually, nPath pattern matching proceeds like this: Starting from a row in a partition, the function
tries to match the given pattern along the row sequence in the partition (ordered as specified in the ORDER
BY clause).
If the function cannot match the pattern, it outputs nothing; otherwise, it continues to the next row. When
the function finds a sequence of rows that match the pattern, it selects the largest set of rows that constitute
the match and outputs a row based on this match.
For example, suppose that the pattern is 'A.B+' and the rows that constitute the match start at a row t1 and
end at row t4. Suppose that t1 matches A and each of t2, t3, and t4 matches B. When the matching is
complete, A represents t1 and B represents t2, t3, and t4. Using the rows represented by A and B, the
function evaluates the Result argument (typically applying an aggregate function to each symbol in the
pattern), outputs one row with the result values, and proceeds to search for the next pattern match.
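As a rough analogy, the 'A.B+' example behaves like a greedy regular expression applied to one partition's ordered rows. The Python sketch below is not how nPath is implemented. It assumes each row maps to exactly one symbol, whereas nPath symbol predicates may overlap, and the row contents and the pagetype rule are invented for illustration.

```python
import re

# One partition, already ordered as the ORDER BY clause would order it.
rows = [
    {"id": "t1", "pagetype": "home"},
    {"id": "t2", "pagetype": "product"},
    {"id": "t3", "pagetype": "product"},
    {"id": "t4", "pagetype": "product"},
    {"id": "t5", "pagetype": "home"},
]


def symbol_for(row):
    """Hypothetical symbol definitions: home pages are A, all others are B."""
    return "A" if row["pagetype"] == "home" else "B"


def match_a_then_bs(ordered_rows):
    """Find non-overlapping, greedy matches of the pattern A.B+."""
    encoded = "".join(symbol_for(r) for r in ordered_rows)
    matches = []
    for m in re.finditer(r"AB+", encoded):
        span = ordered_rows[m.start():m.end()]
        # Record which rows each symbol represents in this match.
        matches.append({
            "A": [r["id"] for r in span if symbol_for(r) == "A"],
            "B": [r["id"] for r in span if symbol_for(r) == "B"],
        })
    return matches


if __name__ == "__main__":
    print(match_a_then_bs(rows))
```

Here A represents t1 and B represents t2, t3, and t4, as in the example above. Row t5 starts no match because no B row follows it.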
Before running nPath on a large data set, create a small data set that includes the pattern that you want to
find. Test your pattern on the small data set, refine the pattern until nPath gives the desired output, and then
use the refined pattern on the large data set.
job_transition_path    count
[Chief Exec Officer, Software Engineer, Software Engineer, Chief Exec Officer, Chief Exec Officer]    1
In the pattern, CEO matches the first row, ENGR matches the second row, and OTHER* matches the
remaining rows:
job_transition_path    count
[Chief Exec Officer, Software Engineer, Software Engineer, Chief Exec Officer, Chief Exec Officer]    1
In the pattern, CEO matches the first row, ENGR matches the second row, OTHER* matches the next two
rows, and CEO matches the last row:
Symbols
This section applies only to symbols that appear in the Pattern argument, described in Syntax Elements. For
information about symbols that appear in the Result argument, refer to Result: Applying Aggregate
Functions.
For each symbol definition, col_expr = symbol_predicate AS symbol, the function returns the rows for
which col_expr equals symbol_predicate. For example, for pagetype = 'home' AS H, the function returns
the first and fourth rows of the following table.
nPath Sample Input Table
The function does not return any row that contains a NULL value. For example, for pagetype =
'checkout' AS C, the function returns the second row of the preceding table, but not the third.
Function
    Description
COUNT ( { * | [DISTINCT] col_expr } OF symbol_list )
    Returns either the total number of matched rows (*) or the number (or distinct number) of
    col_expr values in the matched rows.
FIRST ( col_expr OF symbol_list )
    Returns the col_expr value of the first matched row. For the example in Pattern Matching,
    FIRST (pageid OF B) returns the pageid of row t2.
LAST ( col_expr OF symbol_list )
    Returns the col_expr value of the last matched row. For the example in Pattern Matching,
    LAST (pageid OF B) returns the pageid of row t4.
FIRST_NOTNULL ( col_expr OF symbol_list )
    Returns the first non-null col_expr value in the matched rows.
LAST_NOTNULL ( col_expr OF symbol_list )
    Returns the last non-null col_expr value in the matched rows.
MAX_CHOOSE ( quantifying_col_expr, descriptive_col_expr OF symbol_list )
    Returns the descriptive_col_expr value of the matched row with the highest-sorted
    quantifying_col_expr value. For example, MAX_CHOOSE (product_price, product_name OF B)
    returns the product_name of the most expensive product in the rows that map to B.
    The descriptive_col_expr can have any data type. The quantifying_col_expr must have a
    sortable data type (SMALLINT, INTEGER, BIGINT, DOUBLE PRECISION, DATE, TIME, TIMESTAMP,
    VARCHAR, or CHARACTER).
MIN_CHOOSE ( quantifying_col_expr, descriptive_col_expr OF symbol_list )
    Returns the descriptive_col_expr value of the matched row with the lowest-sorted
    quantifying_col_expr value. For example, MIN_CHOOSE (product_price, product_name OF B)
    returns the product_name of the least expensive product in the rows that map to B.
    The descriptive_col_expr can have any data type. The quantifying_col_expr must have a
    sortable data type (SMALLINT, INTEGER, BIGINT, DOUBLE PRECISION, DATE, TIME, TIMESTAMP,
    VARCHAR, or CHARACTER).
DUPCOUNT ( col_expr OF symbol_list )
    Returns the duplicate count for col_expr in the matched rows. That is, for each matched
    row, the function returns the number of occurrences of the current value of col_expr in
    the immediately preceding matched row.
    When col_expr is also the ORDER BY col_expr, this function returns the equivalent of
    ROW_NUMBER()-RANK().
DUPCOUNTCUM ( col_expr OF symbol_list )
    Returns the cumulative duplicate count for col_expr in the matched rows. That is, for
    each matched row, the function returns the number of occurrences of the current value of
    col_expr in all preceding matched rows.
    When col_expr is also the ORDER BY col_expr, this function returns the equivalent of
    ROW_NUMBER()-DENSE_RANK().
ACCUMULATE ( col_expr OF symbol_list [ DELIMITER 'delimiter' ] )
    Returns, for each matched row, the concatenated values in col_expr, separated by
    delimiter. The default delimiter is ', ' (a comma followed by a space).
You can compute an aggregate over more than one symbol. For example, SUM (val OF ANY (A,B))
computes the sum of the values of the attribute val across all rows in the matched segment that map to A or
B.
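To make the aggregate semantics concrete, the following Python sketch applies a few of them to the rows of a single pattern match. It is a simplified model, not the Teradata implementation. The helper names, the sample rows, and the bracketed ACCUMULATE format (copied from the sample output tables in this chapter) are assumptions. Quantifying values are assumed to be non-null.

```python
# Rows of one hypothetical pattern match, in matched order, tagged with the symbol they map to.
matched = [
    {"symbol": "A", "userid": 1, "gender": None, "productname": "shoes", "productamt": 100},
    {"symbol": "B", "userid": 1, "gender": "M", "productname": "books", "productamt": 300},
    {"symbol": "B", "userid": 1, "gender": None, "productname": "television", "productamt": 200},
]


def rows_of(match, *symbols):
    """Rows that map to any of the given symbols, like OF ANY (A, B)."""
    return [r for r in match if r["symbol"] in symbols]


def first(rows, col):
    return rows[0][col]


def last_notnull(rows, col):
    for row in reversed(rows):
        if row[col] is not None:
            return row[col]
    return None


def max_choose(rows, quant_col, desc_col):
    """Descriptive value of the row with the highest quantifying value."""
    return max(rows, key=lambda r: r[quant_col])[desc_col]


def min_choose(rows, quant_col, desc_col):
    """Descriptive value of the row with the lowest quantifying value."""
    return min(rows, key=lambda r: r[quant_col])[desc_col]


def accumulate(rows, col, delimiter=", "):
    return "[" + delimiter.join(str(r[col]) for r in rows) + "]"


if __name__ == "__main__":
    any_ab = rows_of(matched, "A", "B")
    print(first(any_ab, "userid"), last_notnull(any_ab, "gender"))
    print(max_choose(any_ab, "productamt", "productname"))
    print(min_choose(any_ab, "productamt", "productname"))
    print(sum(r["productamt"] for r in any_ab))  # like SUM (productamt OF ANY (A, B))
    print(accumulate(rows_of(matched, "B"), "productname"))
```

Restricting rows_of to one symbol models an aggregate over that symbol alone, and passing several symbols models OF ANY.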
For an example, see Example 1: Use FIRST, LAST_NOTNULL, MAX_CHOOSE, and MIN_CHOOSE.
Input
nPath Aggregate Functions Example 1 Input Table trans1
SQL-MapReduce Call
SELECT * FROM NPATH (
ON trans1
PARTITION BY userid ORDER BY ts
USING
MODE (nonoverlapping)
PATTERN ('A+')
SYMBOLS(TRUE AS A)
RESULT (FIRST(userid OF A) AS Userid,
LAST_NOTNULL (gender OF A) AS Gender,
MAX_CHOOSE (productamt, productname OF A) AS Max_prod,
MIN_CHOOSE (productamt, productname OF A) AS Min_prod)
) as dt ORDER BY 1;
Output
nPath Aggregate Functions Example 1 Output Table
The following table summarizes the symbols and symbol predicates that the examples use.
nPath Clickstream Data Examples Symbols and Symbol Predicates
This invocation gets the pageid for each row and the pageid for the next row in sequence:
SELECT dt.sessionid, dt.pageid, dt.next_pageid FROM NPATH (
ON clicks1
PARTITION BY sessionid
ORDER BY ts
USING
MODE (OVERLAPPING)
PATTERN ('A.B')
SYMBOLS (true AS A, true AS B)
RESULT (FIRST(sessionid OF A) AS sessionid,
FIRST (pageid OF A) AS pageid,
FIRST (pageid OF B) AS next_pageid
)
) as dt;
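The effect of MODE (OVERLAPPING) on the pattern 'A.B' can be modeled outside the database. The Python sketch below is illustrative only. The click tuples are invented, and next_page_pairs is a hypothetical helper. In overlapping mode a row that matched B can also begin the next match as A. In nonoverlapping mode the search resumes after the matched rows.

```python
from itertools import groupby


def next_page_pairs(clicks, overlapping=True):
    """Pair each page with the following page within a session.

    clicks is a list of (sessionid, ts, pageid) tuples.
    """
    ordered = sorted(clicks, key=lambda c: (c[0], c[1]))  # PARTITION BY, ORDER BY
    result = []
    for sessionid, group in groupby(ordered, key=lambda c: c[0]):
        pages = [c[2] for c in group]
        if overlapping:
            # Every row except the last starts a match.
            pairs = zip(pages, pages[1:])
        else:
            # Each row belongs to at most one match.
            pairs = zip(pages[0::2], pages[1::2])
        result.extend((sessionid, cur, nxt) for cur, nxt in pairs)
    return result


clicks = [
    (1, 1, 10), (1, 2, 20), (1, 3, 30), (1, 4, 40),
    (2, 1, 50), (2, 2, 60),
]

if __name__ == "__main__":
    print(next_page_pairs(clicks))
    print(next_page_pairs(clicks, overlapping=False))
```

With four clicks in session 1, overlapping mode produces three pairs and nonoverlapping mode produces two.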
Range-Matching Examples
Whenever a user visits the home page and then visits checkout pages and buys increasingly expensive
products, the nPath query returns the first purchase and the most expensive purchase.
nPath Example Input Table: aggregate_clicks
SQL-MapReduce Call
SELECT * FROM NPATH (
ON aggregate_clicks
PARTITION BY sessionid
ORDER BY clicktime
USING
MODE (nonoverlapping)
PATTERN ('A*')
SYMBOLS (TRUE AS A)
RESULT (FIRST (sessionid OF A) AS sessionid,
ACCUMULATE (pagetype OF A) AS path)
) as dt ORDER BY dt.sessionid;
Output
nPath Range-Matching Example 1 Output Table
sessionid path
1 [home, home1, page1, home, home1, page1, home, home, home, home1, page1,
checkout, home, home, home, home, home, home, home, home, home]
2 [home, home, home, home, home, home, home, home, home, home1, page1, checkout,
checkout, home, home]
3 [home, home, home, home, home, home, home, home, home1, page1, home, home1,
page1, home]
4 [home, home, home, home, home, home, home1, home1, home1, page1, page1, page1]
5 [home, home, home, home, home1, home1, home1, page1, page1, page1, page2, page2,
page2, checkout, checkout, checkout, page2, page2, page2]
Example 2: Find Sessions That Start at Home Page and Visit Page1
SQL-MapReduce Call
SELECT * FROM NPATH (
ON aggregate_clicks
PARTITION BY sessionid
ORDER BY clicktime
USING
Output
nPath Range-Matching Example 2 Output Table
sessionid path
1 [home, home1, page1, home, home1, page1, home, home, home, home1, page1,
checkout, home, home, home, home, home, home, home, home, home]
2 [home, home, home, home, home, home, home, home, home, home1, page1, checkout,
checkout, home, home]
3 [home, home, home, home, home, home, home, home, home1, page1, home, home1,
page1, home]
4 [home, home, home, home, home, home, home1, home1, home1, page1, page1, page1]
5 [home, home, home, home, home1, home1, home1, page1, page1, page1, page2, page2,
page2, checkout, checkout, checkout, page2, page2, page2]
SQL-MapReduce Call
SELECT * FROM NPATH (
ON aggregate_clicks
PARTITION BY sessionid
ORDER BY clicktime
USING
MODE (nonoverlapping)
PATTERN ('A*.C+.A*')
SYMBOLS (productprice > 200 AND
pagetype='checkout' AS C, true AS A)
RESULT (FIRST(sessionid OF A) AS sessionid,
ACCUMULATE (pagetype OF ANY(A,C)) AS path,
AVG (productprice OF ANY(A,C)) AS totalsum)
) as dt ORDER BY dt.sessionid;
Output
nPath Range-Matching Example 3 Output Table
SQL-MapReduce Call
SELECT * FROM NPATH (
ON aggregate_clicks
PARTITION BY sessionid
ORDER BY clicktime
USING
MODE (overlapping)
PATTERN ('A.A')
SYMBOLS (TRUE AS A)
RESULT (FIRST(sessionid OF A) AS sessionid,
ACCUMULATE (pagetype OF A) AS path)
) as dt ORDER BY dt.sessionid;
Output
nPath Range-Matching Example 4 Output Table
sessionid path
1 [home, home]
1 [home, home]
1 [home, home]
1 [home, home]
1 [home, home]
1 [home, home]
1 [home, home]
1 [home, home]
1 [checkout, home]
1 [page1, checkout]
1 [home1, page1]
1 [home, home1]
1 [home, home]
1 [home, home]
1 [page1, home]
1 [home1, page1]
SQL-MapReduce Call
SELECT * FROM NPATH (
ON aggregate_clicks
PARTITION BY sessionid
ORDER BY clicktime
USING
MODE (nonoverlapping)
PATTERN ('REFERRER{2,}')
SYMBOLS (referrer IS NOT NULL AS REFERRER)
RESULT (FIRST(sessionid OF REFERRER) AS sessionid,
FIRST(productname OF REFERRER) AS product)
) as dt ORDER BY dt.sessionid;
Output
nPath Range-Matching Example 5 Output Table
sessionid product
1 envelopes
2 tables
3 bookcases
4 tables
5 Appliances
Example 6: Find Data for Sessions That Checked Out 3-6 Products
SQL-MapReduce Call
SELECT * FROM NPATH (
ON aggregate_clicks
PARTITION BY sessionid
ORDER BY clicktime
USING
Output
nPath Range-Matching Example 6 Output Table
Example 7: Find Data for Sessions That Checked Out at Least 3 Products
Modify the previous query call in Example 6 to find sessions where the user checked out at least three
products by changing the Pattern argument to:
PATTERN('H+.D*.C{3,}.D')
SQL-MapReduce Call
SELECT * FROM NPATH (
ON aggregate_clicks
PARTITION BY sessionid
ORDER BY clicktime
USING
MODE (nonoverlapping)
PATTERN('H+.D*.C{3,}.D')
SYMBOLS(pagetype = 'home' AS H, pagetype='checkout' AS C,
pagetype<>'home' AND pagetype<>'checkout' AS D)
RESULT (FIRST(sessionid OF C) AS sessionid,
max_choose(productprice, productname OF C) AS
most_expensive_product,
MAX (productprice OF C) AS max_price,
min_choose (productprice, productname OF C) AS
least_expensive_product,
MIN (productprice OF C) AS min_price)
) as dt ORDER BY dt.sessionid;
Output
nPath Range-Matching Example 7 Output Table
Input
nPath Multiple-Input Example 2 Input Table impressions
userid ts imp
1 2012-01-01 ad1
1 2012-01-02 ad1
1 2012-01-03 ad1
1 2012-01-04 ad1
1 2012-01-05 ad1
1 2012-01-06 ad1
1 2012-01-07 ad1
2 2012-01-08 ad2
2 2012-01-09 ad2
2 2012-01-10 ad2
2 2012-01-11 ad2
... ... ...
nPath Multiple-Input Example 2 Input Table clicks2
userid ts click
1 2012-01-01 ad1
2 2012-01-08 ad2
3 2012-01-16 ad3
4 2012-01-23 ad4
5 2012-02-01 ad5
6 2012-02-08 ad6
7 2012-02-14 ad7
8 2012-02-24 ad8
9 2012-03-02 ad9
10 2012-03-10 ad10
11 2012-03-18 ad11
12 2012-03-25 ad12
nPath Multiple-Input Example 2 Input Table tv_spots
ts tv_imp
2012-01-01 ad2
2012-01-02 ad2
2012-01-03 ad3
2012-01-04 ad4
2012-01-05 ad5
2012-01-06 ad6
2012-01-07 ad7
2012-01-08 ad8
2012-01-09 ad9
2012-01-10 ad10
2012-01-11 ad11
2012-01-12 ad12
2012-01-13 ad13
2012-01-14 ad14
2012-01-15 ad15
SQL-MapReduce Call
The tables impressions and clicks2 have a userid column, but the table tv_spots is only a record of television
advertisements shown, which any user might have seen. Therefore, tv_spots must be a dimension table.
SELECT * FROM npath (
ON impressions PARTITION BY userid ORDER BY ts
ON clicks2 PARTITION BY userid ORDER BY ts
ON tv_spots DIMENSION ORDER BY ts
USING
MODE (nonoverlapping)
SYMBOLS (true as imp, true as click, true as tv_imp)
PATTERN ('(imp|tv_imp)*.click')
RESULT (COUNT(* of imp) as imp_cnt,
COUNT (* of tv_imp) as tv_imp_cnt)
) as dt ORDER BY dt.imp_cnt;
dt.imp_cnt tv_imp_cnt
18
19 0
19 0
20 0
21 0
22 0
22 0
22 0
22 0
22 0
23 0
23 0
23 0
24 0
25 0
Related Topics
For more information about the PARTITION BY, ORDER BY, and DIMENSION clauses, see the chapter on
the SELECT statement in SQL Data Manipulation Language, B035-1146.
SESSIONIZE
Purpose
The Sessionize function maps each click in a session to a unique session identifier. A session is defined as a
sequence of clicks by one user that are separated by at most n seconds.
The function is useful both for sessionization and for detecting web crawler (“bot”) activity. It is typically
used to understand user browsing behavior on a web site.
Syntax Elements
ON Clause
The function supports one ON clause.
The input table must have a timestamp column and columns by which to partition and order the data. Input
data must be partitioned such that each partition contains all rows of an entity. No input column can have
the name 'sessionid' or 'clicklag', because these are output column names.
Note:
If the input to SESSIONIZE is nondeterministic, then the results are nondeterministic.
ON table_name
Name of input table.
Note:
The timestamp_column must also be an order_column.
TIMEOUT
session_timeout_value
Specifies the number of seconds at which the session times out. If session_timeout_value
seconds elapse after a click, then the next click starts a new session. The data type of
session_timeout_value is DOUBLE PRECISION.
CLICKLAG
min_human_click_lag
Specifies the minimum number of seconds between clicks for the session user to be
considered human. If clicks are more frequent, indicating that the user is a “bot,” the
function ignores the session. The min_human_click_lag must be less than
session_timeout_value. The data type of min_human_click_lag is DOUBLE PRECISION.
EMITNULL
Specifies whether to output rows that have NULL values in their sessionid and clicklag columns, even if
their timestamp_column has a NULL value.
true_value
Output rows that have NULL values in their sessionid and clicklag columns, even if
their timestamp_column has a NULL value.
Valid values for true_value are: true, t, yes, y, and 1, which must be enclosed in single
quotation marks.
false_value
Do not output any rows whose timestamp_column has a NULL value. This is the default.
Valid values for false_value are: false, f, no, n, and 0.
AS alias_name
Name of output table.
other_select_conditions
You can specify other SELECT statement options. See SQL Data Manipulation Language, B035-1146.
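The interaction of TIMEOUT and CLICKLAG can be sketched in Python as follows. This is a simplified model under stated assumptions, not the SESSIONIZE implementation. Timestamps are plain numbers of seconds. Session identifiers start at 0 for each user, which is an arbitrary choice. A gap exactly equal to the timeout keeps the click in the same session, following the "at most n seconds" definition. The sketch only flags clicks that arrive faster than the minimum human click lag, because this section does not specify how the function represents such clicks in its clicklag output column.

```python
from itertools import groupby


def sessionize(clicks, timeout, min_click_lag=None):
    """Assign session ids to (user, ts_seconds) clicks.

    Returns a list of (user, ts, sessionid, too_fast) tuples.
    """
    if min_click_lag is not None and min_click_lag >= timeout:
        raise ValueError("min_click_lag must be less than timeout")
    ordered = sorted(clicks)  # partition by user, order by timestamp
    result = []
    for user, group in groupby(ordered, key=lambda c: c[0]):
        session_id = 0
        prev_ts = None
        for _, ts in group:
            gap = None if prev_ts is None else ts - prev_ts
            if gap is not None and gap > timeout:
                # More than `timeout` seconds elapsed: start a new session.
                session_id += 1
            too_fast = (
                min_click_lag is not None
                and gap is not None
                and gap < min_click_lag
            )
            result.append((user, ts, session_id, too_fast))
            prev_ts = ts
    return result


clicks = [("u1", 0), ("u1", 10), ("u1", 11), ("u1", 100), ("u2", 5)]

if __name__ == "__main__":
    for row in sessionize(clicks, timeout=60, min_click_lag=2):
        print(row)
```

In the sample data, the click at 100 seconds starts a second session for u1, and the click at 11 seconds is flagged because it follows the previous click by less than 2 seconds.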
Usage Notes
ON Clause Usage Notes
• SESSIONIZE can have only one ON clause.
• The following combinations are allowed in the ON clause:
∘ PARTITION BY ORDER BY
∘ PARTITION BY
• PARTITION BY ANY, DIMENSION, HASH BY and LOCAL ORDER BY are not allowed in the ON
Clause.
• You can associate an optional alias with the ON Clause input table by using the AS name clause.
Usage Example
Input Table: sessionize_table
Output
Related Topics
For more information about the ON clause, PARTITION BY clause, and ORDER BY clause, see the chapter
on the SELECT statement in SQL Data Manipulation Language, B035-1146.
Paths
The main path along the syntax diagram begins at the left with a keyword, and proceeds, left to right, to the
vertical bar, which marks the end of the diagram. Paths that do not have an arrow or a vertical bar only show
portions of the syntax.
The only part of a path that reads from right to left is a loop.
Continuation Links
Paths that are too long for one line use continuation links. Continuation links are circled letters indicating
the beginning and end of a link:
When you see a circled letter in a syntax diagram, go to the corresponding circled letter and continue
reading.
Required Entries
Required entries appear on the main path:
SHOW
If you can choose from more than one entry, the choices appear vertically, in a stack. The first entry appears
on the main path:
SHOW CONTROLS
VERSIONS
Optional Entries
You may choose to include or disregard optional entries. Optional entries appear below the main path:
SHOW
CONTROLS
If you can optionally choose from more than one entry, all the choices appear below the main path:
READ
SHARE
ACCESS
Some commands and statements treat one of the optional choices as a default value. This value is
UNDERLINED. It is presumed to be selected if you type the command or statement without specifying one
of the options.
Strings
String literals appear in apostrophes:
'msgtext '
Abbreviations
If a keyword or a reserved word has a valid abbreviation, the unabbreviated form always appears on the
main path. The shortest valid abbreviation appears beneath.
SHOW CONTROLS
CONTROL
Excerpts
Sometimes a piece of a syntax phrase is too large to fit into the diagram. Such a phrase is indicated by a break
in the path, marked by (|) terminators on each side of the break. The name for the excerpted piece appears
between the terminators in boldface type.
The boldface excerpt name and the excerpted phrase appear immediately after the main diagram. The
excerpted phrase starts and ends with a plain horizontal line:
[Syntax diagrams not reproduced: an excerpt example (LOCKING ... HAVING with the excerpt where_cond) and a
multi-line LOCKING ... SEL ... FROM ... HAVING statement drawn with continuation links A, B, and C and the
excerpt qual_cond (WHERE cond, GROUP BY cname or col_pos).]