SAP HANA File Loader Guide PDF
10 Security Information
10.1 Authorizations
10.2 Roles
10.3 Authentication
10.4 Integration in Application Authorizations
10.5 Users
11 Best Practices
Note
After reading this document, we recommend that you read the Enabling Search section in the SAP HANA
Developer Guide. That section explains how to build SQL queries that search the loaded file contents.
The file loader is a set of HTTP services that you can use to develop your own applications to search in file
contents. The file loader package also contains a basic example application with monitoring and statistical
information about the current file loader schedule.
Note
The file loader supports the loading of file contents for search. To also enable properties and metadata of
files for search, you can extend the node table with additional columns and follow the steps described in the
Enabling Search section in the SAP HANA Developer Guide.
Note
The file loader does not support the loading of files that are only accessible using HTTPS or that require
HTTP authentication.
Related Information
SAP HANA Developer Guide: This guide explains how to build applications using SAP HANA, including how to
model data, how to write procedures, and how to build application logic in SAP HANA Extended Application
Services (SAP HANA XS).
The following diagram shows the architecture of the file loader component in SAP HANA.
Technically, the file loader is an SAP HANA XS application that is shipped as a delivery unit.
The binary file content from the HTTP server is converted and stored as a textual representation in the node
table in the SAP HANA database. The loading can be asynchronous and in parallel. The queue table is used to
track the file loading process.
The file loader exposes an HTTP service API in REST format that can be accessed from any HTTP client.
Context
Before you can use the file loader example application, you have to install and configure the component and
set up the user management with minimal authorizations. Then, based on this configuration, you can try out
the example UI that comes with the file loader component.
Note
To perform the set-up steps, you need a user with system administrator permissions and access to the SAP
HANA XS Administration Tool.
Procedure
1. Import the file loader delivery unit.
The file loader component comes as an SAP HANA XS delivery unit with SAP HANA.
Import the delivery unit HCO_INA_FILELOAD.tgz using SAP HANA Application Lifecycle Management
(sap/hana/xs/lm) or SAP HANA studio. After the import, the component is available and activated in the
sap.bc.ina.fileloader.job package.
The installation creates a new schema called SAP_INA_FILELOADER with the tables
sap.bc.ina.fileloader.db::INA_FILELOADER_JOB_SCHEDULES_T6 and
sap.bc.ina.fileloader.db::INA_FILELOADER_JOB_SCHEDULES_V1.
2. Enable job scheduling in the SAP HANA XS.
Note
Job scheduling in the SAP HANA XS is not enabled by default. It has to be enabled by the system
administrator.
For information about how to enable job scheduling in SAP HANA XS, see the SAP HANA
documentation.
To enable job scheduling, create a new section named scheduler in the xsengine.ini file. In this
section, add a property named enabled and set its value to true.
3. Set up users and tables for the example scenario.
Note
Our example uses users with minimal authorizations.
○ The access user (INA_FL_TEST_ACCESS), who calls the HTTP services to schedule the file loading
process.
○ The job access user with minimal authorizations (INA_FL_TEST_JOB_ACCESS) who is used in the
background job.
○ The job admin user (INA_FL_TEST_JOB_ADMIN) who is used in the SQL connection (xssqlcc).
○ Example tables.
4. Configure the file loader job.
Note
For this step, you need a user with the sap.bc.ina.fileloader.roles::Access role.
Start the SAP HANA XS Administration Tool and activate the file loader job in the Application Objects tree
under sap > bc > ina > fileloader > job > job01.xsjob (direct link:
/sap/hana/xs/admin/?package=sap.bc.ina.fileloader.job&object_name=job01&object_type=xsjob).
To activate the job, mark it as <Active> and enter the job access user (INA_FL_TEST_JOB_ACCESS) in
the <User> field. Save the changes.
5. Configure the SQL connection for the file loader job.
Note
For this step, you need a user with the sap.bc.ina.fileloader.roles::Access and
sap.hana.xs.admin.roles::JobAdministrator roles.
You are still in the SAP HANA XS Administration Tool. In the Application Objects tree, navigate to the file
loader SQL connection located under sap > bc > ina > fileloader > job > fileloader.xssqlcc (direct link:
http(s)://<SAP HANA host>:<SAP HANA port>/sap/hana/xs/admin/?package=sap.bc.ina.fileloader.job&object_name=fileloader&object_type=xssqlcc)
and enter the job admin user (INA_FL_TEST_JOB_ADMIN) in the <User> field. Save the changes.
6. Start the file loader example application.
Start your browser and open the file loader example application by entering the following address:
http(s)://<SAP HANA host>:<SAP HANA port>/sap/bc/ina/fileloader/app/example/webfileloader.html
The node table contains a column of data type BINTEXT. The file loader updates this column with
searchable file content. You can perform a full-text search on this content by executing a SELECT
statement with the CONTAINS() TEXT predicate.
Run the example clean-up SQL script to remove all tutorial data and users generated in the previous steps
and to return to your previous system state.
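The full-text search step above can be sketched as a small helper that assembles the SELECT statement. This is only an illustration: the schema, table, and column names used here (INA_FL_TEST_ACCESS, NODE_TABLE, CONTENT) are assumptions, and the helper is not part of the file loader. It quotes identifiers and escapes the search term so that the resulting statement can be pasted into an SQL console.

```python
def quote_identifier(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote an SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_search_statement(schema: str, table: str, content_column: str, term: str) -> str:
    """Return a SELECT statement that searches the BINTEXT content column
    of the node table with the CONTAINS() predicate."""
    return (
        f"SELECT * FROM {quote_identifier(schema)}.{quote_identifier(table)} "
        f"WHERE CONTAINS({quote_identifier(content_column)}, {quote_literal(term)})"
    )


if __name__ == "__main__":
    # Hypothetical names; replace them with your own node table and column.
    print(build_search_statement("INA_FL_TEST_ACCESS", "NODE_TABLE", "CONTENT", "invoice"))
```

When the statement is executed from an application rather than pasted into a console, binding the search term as a parameter is usually preferable to building a literal.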
Related Information
The setup SQL script creates the users and the example database tables for the file loader example application
tutorial.
-- Set up the users with minimal authorizations for the file loader
-- tutorial
-- The following users are created:
-- 1) The access user calls the HTTP services to schedule the file loading
-- process. This user also owns the example node and queue table in the user's
-- schema. [INA_FL_TEST_ACCESS]
-- 2) The job access user is used within the background job. This user requires
-- the same object authorizations as all other access users that are using
-- the file loader. [INA_FL_TEST_JOB_ACCESS]
-- 3) The job admin user is used in an SQL connection (xssqlcc) that is used for
-- scheduling a job dynamically in the HTTP service.
-- This user requires only one application authorization.
-- [INA_FL_TEST_JOB_ADMIN]
-- Execute this script in the SAP HANA studio as a system administrator.
-- After executing this script, go to the SAP HANA XS Administration Tool and
-- configure the SQL connection and the job.
-- Execute the clean-up script to delete the users and tables.
CONNECT <system administrator user> PASSWORD <password of system administrator>;
-- Create the file loader access user, this is the end user to be used
-- in the example
DROP USER INA_FL_TEST_ACCESS CASCADE;
CREATE USER INA_FL_TEST_ACCESS PASSWORD <password of access user>;
-- Create the job admin user that is used in the SQL connection
-- of the job sap.bc.ina.fileloader.job.xssqlcc
DROP USER INA_FL_TEST_JOB_ADMIN CASCADE;
CREATE USER INA_FL_TEST_JOB_ADMIN PASSWORD <password of job admin user>;
ALTER USER INA_FL_TEST_JOB_ADMIN DISABLE PASSWORD LIFETIME;
CALL GRANT_ACTIVATED_ROLE('sap.hana.xs.admin.roles::JobAdministrator',
'INA_FL_TEST_JOB_ADMIN');
-- Now switch to the access user to create the node and queue table
-- The tables will be used by the example UI
-- /sap/bc/ina/fileloader/app/example/webfileloader.html
CONNECT INA_FL_TEST_ACCESS PASSWORD <password of access user>;
The clean-up SQL script removes the users and tables from the file loader example application tutorial,
connects with the access user, and deletes the tables.
-- Delete the fileloader access user, this is the end user to be used
-- in the example
DROP USER INA_FL_TEST_ACCESS CASCADE;
-- Delete the job admin user that is used in the sql connection of
-- the job sap.bc.ina.fileloader.job.xssqlcc
DROP USER INA_FL_TEST_JOB_ADMIN CASCADE;
The file loader component contains a small browser-based demo UI to show what you can develop and how
you can use the file loader's capabilities.
Context
The file loader example application is an implementation of the file loader functionality using JavaScript with a
web frontend. You can enter a number of URLs for documents that you want to upload into SAP HANA with the
file loader. You can then search for content inside the uploaded documents.
Procedure
Log in with the access user INA_FL_TEST_ACCESS. On the initial screen, there is a text field containing
URLs. Use copy and paste to replace these URLs with the URLs for your own documents.
Note
Clicking the + icon on the initial screen of the user interface shows an options panel where you can change
some basic options: for example, the number of packages, schedules and the frequency, as well as the
names of the queue table and node table.
● Number of Packages: Limits the number of work packages that are used to upload the documents. The
packages are used to divide the workload and so minimize the risk of locking errors on database tables.
● Number of Schedules: Limits the number of job schedules that are used to upload the documents for each
package.
● Schedule Frequency in Minutes: Limits how long a job schedule runs before it is stopped and restarted.
When you choose Start loading, the results screen appears. The top half contains the Queue monitor,
which displays the current status of the files you chose to process.
Note
If you use the delivered example URLs, one URL will fail and show an error message instead. This is
intended.
The file loading process is ready to start as soon as the node table and the queue table are available and
populated with data.
After the files have been processed, the application can use the extracted and converted content of the node
table. The queue table can be used by the application for cleanup processes if errors occur.
The file loader provides HTTP services that return the current status of data processing, which is useful for
monitoring long-running processes.
The state of the file loading process is stored in the queue table column /1ES/_STATUS. The column can
contain the following values:
● NEW
● FILE_LOADING_IN_PROCESS
● FILE_LOADING_FAILED
● FILE_LOADING_SUCCESSFUL
● TEXT_CONVERSION_IN_PROGRESS
● NODE_TABLE_UPDATE_FAILED
● TEXT_CONVERSION_FAILED
● SUCCESS
● TIMED_OUT_WHILE_INDEXING
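As a minimal sketch of how an application might use these values for its own monitoring, the following Python helper counts queue entries per status and rejects values that are not in the list above. The function name and the idea of passing the statuses as a plain list are assumptions made for illustration; how the values are read from the queue table is left open.

```python
from collections import Counter

# Status values as listed for the queue table column /1ES/_STATUS.
KNOWN_STATUSES = frozenset({
    "NEW",
    "FILE_LOADING_IN_PROCESS",
    "FILE_LOADING_FAILED",
    "FILE_LOADING_SUCCESSFUL",
    "TEXT_CONVERSION_IN_PROGRESS",
    "NODE_TABLE_UPDATE_FAILED",
    "TEXT_CONVERSION_FAILED",
    "SUCCESS",
    "TIMED_OUT_WHILE_INDEXING",
})


def tally_statuses(statuses):
    """Count queue entries per status; raise ValueError for unknown values."""
    counts = Counter(statuses)
    unknown = set(counts) - KNOWN_STATUSES
    if unknown:
        raise ValueError(f"Unknown status values: {sorted(unknown)}")
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical status values, as they might be selected from a queue table.
    print(tally_statuses(["NEW", "SUCCESS", "SUCCESS", "FILE_LOADING_FAILED"]))
```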
The file loader supports any structure for the node table, but the table needs one column for the text content.
Note
The content column must have the data type BINTEXT.
The node table can have a language column (NVARCHAR(2)). This column can be used to store the language of
the file.
Example
Related Information
The queue table shows the progress of the file loading so that you can take action if data loading problems occur.
Every entry in the node table has a corresponding entry in the queue table. The queue table controls the data
loading process and is used by the file loader jobs. The primary keys of the queue table are identical to those of
the node table. All other columns are determined by the file loader.
Example
The primary keys must be identical to the primary keys of the node table.
In this example, the column ID is the primary key. All other columns /1ES/_* must have the structure
described.
To populate the queue table, you need to create an entry for every entry of the node table. Fill the required
fields of the queue table entry. All other fields are updated by the file loader during processing.
● Primary Key: Use identical primary keys for the node table and the queue table.
● URL: In the /1ES/_URL column, provide an absolute HTTP URL that points to a file.
● Initial Status: To indicate that the entry has to be processed by the file loader, set the /1ES/_STATUS
column to the value "NEW" and set the time stamp column /1ES/_TS_STATUS_NEW to the current time
stamp.
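The required fields listed above can be sketched as follows. This is a minimal illustration in Python, assuming a single primary key column named ID as in the example; the function name and the dictionary representation of a row are hypothetical, and writing the row to the database is left to your own data access code. Because the file loader does not load files that are only accessible using HTTPS, the sketch accepts only absolute http URLs.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse


def new_queue_entry(primary_key, url, now=None):
    """Build the required fields of a queue table entry for one node table entry."""
    parsed = urlparse(url)
    # Only absolute HTTP URLs are accepted; HTTPS-only files are not supported.
    if parsed.scheme != "http" or not parsed.netloc:
        raise ValueError(f"Not an absolute HTTP URL: {url!r}")
    timestamp = now if now is not None else datetime.now(timezone.utc)
    return {
        "ID": primary_key,                 # same key as in the node table
        "/1ES/_URL": url,                  # file to be loaded
        "/1ES/_STATUS": "NEW",             # marks the entry for processing
        "/1ES/_TS_STATUS_NEW": timestamp,  # time at which the status was set
    }


if __name__ == "__main__":
    # Hypothetical key and URL.
    print(new_queue_entry(1, "http://files.example.com/docs/report.pdf"))
```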
The various operations that can be executed with this service are called commands. A command is described
as a JSON object.
Command JSON: The HTTP service API uses a JSON format to describe commands that should be executed.
The command must have a command property to provide a command name as a string (for example
cmdScheduleJob). The second property is the optional parameter property that describes the data that is
required by the command.
http(s)://<host>:<port>/sap/bc/ina/fileloader/service.xsjs
HTTP GET: Services that provide information and do not change the state can be invoked using the HTTP GET
method. The file loader service supports the command parameter.
http(s)://<host>:<port>/sap/bc/ina/fileloader/service.xsjs?command=<command JSON>
HTTP POST: Services that change the state must be invoked using the HTTP POST method. Information is
passed in the HTTP POST body.
http(s)://<host>:<port>/sap/bc/ina/fileloader/service.xsjs
The HTTP body contains the command in JSON format: <command JSON>
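The two invocation styles can be sketched in Python as follows. The sketch only builds the request URL and body and does not send anything; the base URL used in the example is a placeholder, and the helper names are assumptions. For HTTP GET, the command JSON is percent-encoded so that it can be passed as the value of the command parameter.

```python
import json
from urllib.parse import quote

SERVICE_PATH = "/sap/bc/ina/fileloader/service.xsjs"


def build_get_url(base_url, command):
    """Return the service URL with the command JSON as the 'command' parameter."""
    command_json = json.dumps(command, separators=(",", ":"))
    return f"{base_url}{SERVICE_PATH}?command={quote(command_json, safe='')}"


def build_post_request(base_url, command):
    """Return the service URL and the UTF-8 encoded JSON body for HTTP POST."""
    return f"{base_url}{SERVICE_PATH}", json.dumps(command).encode("utf-8")


if __name__ == "__main__":
    # Placeholder host and port; replace them with your SAP HANA XS address.
    base = "http://hana.example.com:8000"
    print(build_get_url(base, {"command": "cmdGetUser"}))
    print(build_post_request(base, {"command": "cmdGetUser"}))
```

Keeping URL construction separate from the HTTP client makes it easy to use whichever client library and authentication method your landscape requires.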
If you use lowercase letters or special characters in the names of the queue table, the node table, or
attributes, you do not need to enclose them in quotation marks ("). In this regard, the file loader API
deliberately behaves differently from the SAP HANA SQL interface, to make it easier to use.
Related Information
When service.xsjs is invoked using an HTTP GET request, an HTML response is provided. The CSRF token
can also be fetched.
The file loader API supports various commands using HTTP GET or POST.
9.3 cmdScheduleJob
The schedule job command creates multiple schedules for the specified node tables and queue tables that
process the tables in parallel.
The maximum number of schedules limits the number of parallel executions and the timeout limits the
execution time of the complete process. The frequency specifies how long a schedule should run until the
schedule is restarted.
{
"command": "cmdScheduleJob",
"parameter": {
"fileLoaderRequest": {
"table": {
"queueTable": {
"schema": "< Schema name of the queue table >",
"name": "< Name of the queue table >",
"packageId": "< optional unique id of a package >"
},
"nodeTable": {
"schema": "< Schema name of the node table >",
"name": "< Name of the node table >",
"fileContentAttributeName": "< Content attribute name >"
}
},
"schedule": {
"frequencyMinutes": "< Minutes until the schedule restarts >",
"maxNumberOfSchedules": "< Maximum number of parallel job schedules >",
"timeOutMinutes": "< Minutes to complete the loading process >"
}
}
}
}
Return
{
"statusCode": "<code>",
"message": {
"text": "<message text>",
"detailMessages":[
{"message": "<message text>","code": "<code>"}…
]
},
"schedule": {
"jobName" : < id of the job>,
"numberOfSchedules": < number of schedules >,
"scheduleDetails": [
{
"fileloaderScheduleId": "<schedule id fileloader>",
"xsEngineScheduleId": "< schedule id XS >"
}
]
}
}
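As a minimal sketch, the command and return structures shown above can be handled in Python as follows. The function names are hypothetical, and the sketch passes the schedule values through unchanged because the expected value types are not specified here. It only builds and reads dictionaries; sending the command is left to your HTTP client.

```python
def build_schedule_job_command(*, queue_schema, queue_table, node_schema, node_table,
                               content_attribute, frequency_minutes,
                               max_schedules, timeout_minutes, package_id=None):
    """Build the cmdScheduleJob command as a dictionary ready for JSON encoding."""
    queue = {"schema": queue_schema, "name": queue_table}
    if package_id is not None:
        queue["packageId"] = package_id  # optional: restrict to one package
    return {
        "command": "cmdScheduleJob",
        "parameter": {
            "fileLoaderRequest": {
                "table": {
                    "queueTable": queue,
                    "nodeTable": {
                        "schema": node_schema,
                        "name": node_table,
                        "fileContentAttributeName": content_attribute,
                    },
                },
                "schedule": {
                    "frequencyMinutes": frequency_minutes,
                    "maxNumberOfSchedules": max_schedules,
                    "timeOutMinutes": timeout_minutes,
                },
            }
        },
    }


def schedule_ids(response):
    """Return (file loader schedule ID, XS schedule ID) pairs from a response."""
    details = response.get("schedule", {}).get("scheduleDetails", [])
    return [(d["fileloaderScheduleId"], d["xsEngineScheduleId"]) for d in details]


if __name__ == "__main__":
    # Hypothetical schema, table, and attribute names.
    print(build_schedule_job_command(
        queue_schema="INA_FL_TEST_ACCESS", queue_table="QUEUE_TABLE",
        node_schema="INA_FL_TEST_ACCESS", node_table="NODE_TABLE",
        content_attribute="CONTENT", frequency_minutes=5,
        max_schedules=4, timeout_minutes=60))
```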
9.4 cmdGetQueueStatistics
If no package is specified, all data is used for the statistics. The application uses these statistics to
decide on further processing or for monitoring.
Command
{
"command": "cmdGetQueueStatistics",
"parameter": {
"fileLoaderRequest": {
"table": {
"queueTable": {
"schema": "< Schema name of the queue table >",
"name": "< Name of the queue table >",
"packageId": "< optional unique ID of a package >"
}
}
}
}
}
Return
The detailed status information is given for all individual statuses; the summaries are calculated as follows:
{
"statusCode": <code>,
9.5 cmdSetQueueTimedOut
The job schedules are created with an overall timeout. The job schedule automatically stops the processing
when the timeout is reached. However, the application can use this command to stop the processing earlier by
setting unfinished or unprocessed files as timed out. Once you have executed this command, the running job
schedules will not find any unprocessed files and will stop the processing. This service processes a subset of
all node table entries. To update all node table entries, call the service until the number of updated records is
0.
Command
{
"command": "cmdSetQueueTimedOut",
"parameter": {
"fileLoaderRequest": {
"table": {
"queueTable": {
"schema": "< Schema name of the queue table >",
"name": "< Name of the queue table >",
"packageId": "< Optional unique ID of a package >"
}
}
}
}
}
Return
{
"statusCode": "<code>",
"message": {
"text": "<message text>",
"detailMessages":[
{"message": "<message text>","code": "<code>"}…
]
},
"table": {
"numberOfUpdatedRecords" : <Number of updated records>
}
}
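Because the service processes only a subset of entries per call, a client has to repeat the call until the number of updated records is 0. The loop can be sketched in Python as follows. The send_command argument is a hypothetical callable that posts a command to service.xsjs and returns the parsed JSON response; it stands in for your own HTTP client code. The max_calls limit is an assumption added as a safeguard against an endless loop.

```python
def set_queue_timed_out(send_command, schema, name, package_id=None, max_calls=1000):
    """Call cmdSetQueueTimedOut repeatedly until no more records are updated.

    Returns the total number of updated records.
    """
    queue = {"schema": schema, "name": name}
    if package_id is not None:
        queue["packageId"] = package_id
    command = {
        "command": "cmdSetQueueTimedOut",
        "parameter": {"fileLoaderRequest": {"table": {"queueTable": queue}}},
    }
    total = 0
    for _ in range(max_calls):
        response = send_command(command)
        updated = int(response["table"]["numberOfUpdatedRecords"])
        if updated == 0:
            return total  # nothing left to update
        total += updated
    raise RuntimeError(f"Still updating records after {max_calls} calls")


if __name__ == "__main__":
    # Stand-in for a real HTTP call: pretends that two batches are updated.
    replies = iter([5, 3, 0])
    fake_send = lambda command: {"table": {"numberOfUpdatedRecords": next(replies)}}
    print(set_queue_timed_out(fake_send, "INA_FL_TEST_ACCESS", "QUEUE_TABLE"))
```

The same pattern applies to any other command that reports a number of updated records per call.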
9.6 cmdGetQueueSchedules
This command returns the number of schedules and the active schedules of the given queue. It is used for
monitoring purposes.
Command
{
"command": "cmdGetQueueSchedules",
"parameter": {
"fileLoaderRequest": {
"table": {
"queueTable": {
"schema": "< Schema name of the queue table >",
"name": "< Name of the queue table >"
}
}
}
}
}
Return
{
"statusCode": "<code>",
"message": {
"text": "<message text>",
"detailMessages":[
{"message": "<message text>","code": "<code>"}…
]
},
"schedule": {
"jobName" : < id of the job>,
"numberOfSchedules": < number of schedules >,
"scheduleDetails": [
{
"fileloaderScheduleId": "<schedule id fileloader>",
"xsEngineScheduleId": "< schedule id XS >"
}
]
}
}
9.7 cmdKillJobSchedules
This command stops the schedules for all packages and sets the status "timed out" for all unprocessed files.
Note
This command is reserved for use in error or emergency situations only, to stop the processing
immediately.
Command
{
"command": "cmdKillJobSchedules",
"parameter": {
"fileLoaderRequest": {
"table": {
"queueTable": {
"schema": "< Schema name of the queue table >",
"name": "< Name of the queue table >"
}
}
}
}
}
Return
{
"statusCode": "<code>",
"message": {
"text": "<message text>",
"detailMessages":[
{"message": "<message text>","code": "<code>"}…
]
},
"schedule": {
"jobName" : < id of the job>,
"numberOfSchedules": < number of schedules >,
"scheduleDetails": [
{
"fileloaderScheduleId": "<schedule id fileloader>",
"xsEngineScheduleId": "< schedule id XS >"
}
]
}
}
9.8 cmdUpdateNodeLanguage
Use this command when you want to store the file's language in the node table.
The text processor is able to determine the language of the file when it is converted to text. If the node table
has a language column, this command copies the determined language to the node table column.
This service processes a subset of all node table entries. To update all node table entries, call the service until
the number of updated records is 0.
Command
{
"command": "cmdUpdateNodeLanguage",
"parameter": {
"fileLoaderRequest": {
"table": {
"queueTable": {
"schema": "< Schema name of the queue table >",
"name": "< Name of the queue table >",
"packageId": "< optional unique id of a package >"
},
"nodeTable": {
"schema": "< Schema name of the node table >",
"name": "< Name of the node table >",
"fileContentAttributeName": "< Content attribute name >",
"fileLanguageAttributeName": "< Language attribute name >",
"useAbapLanguageFormat": "< optional true/false (default) >"
}
}
}
}
}
Return
If the returned number of updated records is 0, all columns have been updated.
{
"statusCode": "<code>",
"message": {
"text": "<message text>",
"detailMessages":[
{"message": "<message text>","code": "<code>"}…
]
},
"table": {
"numberOfUpdatedRecords" : <number of updated records>
}
}
9.9 cmdGetUser
This command returns the name of the user who is currently logged on.
Command
{"command": "cmdGetUser"}
Return
{
"user": {
"name": "< user name >"
}
}
9.10 cmdGetSystemInformation
Command
{
"command": "cmdGetSystemInformation",
// optional parameter to restrict the output to the desired sections only
"parameter": [
"User", "Time", "XS"…
]
}
Return
The command returns the following information about the current system.
{ "XS": { … },
"Fileloader": { … },
"Time": { … },
"User": { … },
"System": { … },
"Services": { … },
"Memory": { … },
"CPU": { … },
"Disk": { … },
"Statistics": { … }
}
9.11 cmdGetCommands
Command
{
"command": "cmdGetCommands"
}
Return
[
{
"name": "<command name>",
"description": "<description>",
"accessMethods": [ "HTTP POST","HTTP GET","JavaScript" ],
"privilege": "sap.bc.ina.fileloader::<application privilege>"
}
…
]
The file loader HTTP services can be used remotely by applications. The services ensure authentication and
authorization, and prevent cross-site request forgery (CSRF).
Note
Ensure that you assign minimal authorizations to users.
10.1 Authorizations
● Monitoring: The monitoring authorization allows users to access the HTTP service with read-only access.
● Access: The access authorization allows users to influence (start and stop) the file loading process using
the HTTP service.
● JobAccess: This authorization defines the access to the file loader job script. It prevents non-authorized
users from executing the job script.
10.2 Roles
The file loader roles define the minimum authorizations and access types that are required.
● Monitoring (sap.bc.ina.fileloader.roles::Monitoring): Use the monitoring role when a component uses file
loader commands to monitor the current status of the file loading process. The access rights are limited to
read-only access to the job schedules and the file loader queues. The role includes the file loader monitoring
authorization and the object authorization to select on the job schedule table and framework tables.
● Access (sap.bc.ina.fileloader.roles::Access): Use the access role when a file loader client component
needs to schedule or stop jobs. This role also includes all of the access rights of the monitoring role. The
role includes the file loader access application authorization and the object authorization to select, update,
insert, and delete on the job schedule table and to select on framework tables.
● JobAccess (sap.bc.ina.fileloader.roles::JobAccess): This role is used by the technical user (XSSQLCC)
that accesses the file loader job. The technical user sap.bc.ina.fileloader.job.fileloader.xssqlcc needs
this role to operate correctly.
10.3 Authentication
The file loader HTTP services support the authentication methods of SAP HANA XS.
10.4 Integration in Application Authorizations
Client applications have to fulfill certain prerequisites to be used with the file loader component.
The file loader works with the node table and the queue table that are defined by the application.
In addition to the authorizations that are defined in the file loader roles, the file loader client application has to
define the object authorization for the queue table and the node table. The file loader requires SELECT and
UPDATE authorization for the two tables.
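As a minimal sketch of the object authorizations described above, the following Python helper generates the GRANT statements for the queue table and the node table. The schema, table, and grantee names are hypothetical, and whether you grant to a user or to an application-specific role depends on your own authorization concept.

```python
def grant_statements(schema, tables, grantee):
    """Return one GRANT statement per table with SELECT and UPDATE authorization."""
    return [
        f'GRANT SELECT, UPDATE ON "{schema}"."{table}" TO {grantee}'
        for table in tables
    ]


if __name__ == "__main__":
    # Hypothetical names for the application's tables and the job access user.
    for statement in grant_statements(
            "INA_FL_TEST_ACCESS", ["QUEUE_TABLE", "NODE_TABLE"], "INA_FL_TEST_JOB_ACCESS"):
        print(statement + ";")
```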
Example
This example shows how to define two file loader users with application-specific roles that include the file
loader roles.
10.5 Users
There are a number of users that require particular authorizations for the file loader.
● System Administrator: This user needs access to the SAP HANA XS Administration Tool to check the
status of the job scheduler.
● File Loader REST Service User: This user uses the file loader HTTP service. The user needs to be assigned
the file loader Monitoring or Access role and requires application-specific object authorizations.
● File Loader Job User: The job user is a technical user that is used to execute the file loader job. This user
needs to be assigned the file loader JobAccess role and requires application-specific object authorizations.
11 Best Practices
1. The overall execution time of the file loading process should be as short as possible.
2. The load on the SAP HANA system should be minimal while the file loading process is running.
3. All files should be processed successfully.
● Adding parallel schedules to the file loader job reduces the overall execution time of the file loading
process.
● There should not be more parallel schedules than files to process. A high number of schedules can also
lead to locking overhead while updating the queue table.
● Additional schedules also increase the overall SAP HANA system load. The system administrator has to
balance the file loader load with the remaining load of the SAP HANA system.
● As well as the load on the SAP HANA system, the load on the HTTP servers where the remote files are
located is increased. If the remote HTTP servers cannot handle the load, the response times might
increase, or the servers might not provide any response.
● If new schedules are added to the file loading process, consider resizing the remote HTTP servers.
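The recommendation not to use more parallel schedules than files to process can be sketched as a simple cap. The function name is hypothetical, and the sketch deliberately ignores system load and remote server capacity, which the system administrator still has to weigh separately.

```python
def choose_number_of_schedules(files_to_process, requested_schedules):
    """Cap the requested number of parallel schedules at the number of files."""
    if files_to_process < 0:
        raise ValueError("files_to_process must not be negative")
    if requested_schedules < 1:
        raise ValueError("requested_schedules must be at least 1")
    return min(files_to_process, requested_schedules)


if __name__ == "__main__":
    # Hypothetical workload: 3 files, 10 schedules requested.
    print(choose_number_of_schedules(3, 10))
```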