Parallel Processing Technique to Fetch Large Volume of Data Using BI Report In Oracle Cloud
By: Ashish Harbhajanka
Business Requirement
Sometimes we need to fetch a huge volume of data from a Cloud environment, and the BI report fails stating "Report Data Size Exceeds Maximum Limit".
In such cases we can make use of the concept of multithreading. Essentially this breaks the entire job into smaller chunks: if you have, say, 10K records to process, you split them into smaller units of say 1K each and submit the report 10 times. Each run executes on its own without interfering with the others, and you get the desired data (the only drawback being that the results land in different files, so a merge operation may be required; a sketch of such a merge appears near the end of this note). Even so, the approach serves the purpose: you are at least able to get the data out of the system instead of hitting a dead end, as with the single-run approach. The chunk boundaries themselves are easy to compute, as the sketch below shows.
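To make the idea concrete, below is a minimal sketch (in Python; the helper name batch_ranges is mine, not part of any Oracle tooling) that computes the start/end boundaries for each chunk:

def batch_ranges(total_records, chunk_size):
    # Split 1..total_records into (start_seq, end_seq) pairs
    # covering at most chunk_size rows each.
    return [
        (start, min(start + chunk_size - 1, total_records))
        for start in range(1, total_records + 1, chunk_size)
    ]

# The 10K-records-in-1K-chunks example from above yields ten ranges,
# i.e. ten independent report submissions:
print(batch_ranges(10000, 1000))
# [(1, 1000), (1001, 2000), ..., (9001, 10000)]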
I will try to explain this with the help of a worked-out example.
Worked Out Example
The report failed stating "Report Data Size exceeds the maximum limit. Stopped processing".
In order to overcome this problem we need to break the entire data set into smaller chunks. This is done by first identifying the total number of records fetched by the report. The general syntax to determine this is the following SQL:
SELECT COUNT(1)
FROM
(
<original report SQL goes here>
) TAB1
ModifiedSQL.sql (the Worker Report query modified for batching):
-- Newly added outer wrapper: exposes ROWNUM as RECORD_SEQ so that the
-- :START_SEQ/:END_SEQ binds can slice the result set into batches.
SELECT TAB1.*
FROM
(
select ROWNUM as "RECORD_SEQ",
"PER_PERSONS"."PERSON_ID" as "PERSON_ID",
"HR_ALL_ORGANIZATION_UNITS_F_VL"."NAME" as "BUSINESS_GROUP_NAME"
from "PER_PERSONS" "PER_PERSONS",
"HR_ALL_ORGANIZATION_UNITS_F_VL" "HR_ALL_ORGANIZATION_UNITS_F_VL",
"HR_ORG_UNIT_CLASSIFICATIONS_F" "HR_ORG_UNIT_CLASSIFICATIONS_F"
where "HR_ALL_ORGANIZATION_UNITS_F_VL"."NAME" = :p_bg_name
and "HR_ORG_UNIT_CLASSIFICATIONS_F"."ORGANIZATION_ID" =
"HR_ALL_ORGANIZATION_UNITS_F_VL"."ORGANIZATION_ID"
and "HR_ORG_UNIT_CLASSIFICATIONS_F"."CLASSIFICATION_CODE" = 'ENTERPRISE'
and TRUNC(sysdate) BETWEEN
"HR_ORG_UNIT_CLASSIFICATIONS_F"."EFFECTIVE_START_DATE" AND
"HR_ORG_UNIT_CLASSIFICATIONS_F"."EFFECTIVE_END_DATE"
and TRUNC(sysdate) BETWEEN
"HR_ALL_ORGANIZATION_UNITS_F_VL"."EFFECTIVE_START_DATE" AND
"HR_ALL_ORGANIZATION_UNITS_F_VL"."EFFECTIVE_END_DATE"
and "HR_ORG_UNIT_CLASSIFICATIONS_F"."STATUS" = 'A'
and "PER_PERSONS"."PERSON_ID" = nvl(:p_person_id, "PER_PERSONS"."PERSON_ID")
and "PER_PERSONS"."BUSINESS_GROUP_ID" =
"HR_ALL_ORGANIZATION_UNITS_F_VL"."ORGANIZATION_ID"
and trunc(SYSDATE) BETWEEN
"HR_ALL_ORGANIZATION_UNITS_F_VL"."EFFECTIVE_START_DATE" AND
"HR_ALL_ORGANIZATION_UNITS_F_VL"."EFFECTIVE_END_DATE"
and
( exists
(select 1 from "PER_ACTION_OCCURRENCES" "PER_ACTION_OCCURRENCES"
where "PER_ACTION_OCCURRENCES"."PARENT_ENTITY_KEY_ID" =
"PER_PERSONS"."PERSON_ID"
and trunc("PER_ACTION_OCCURRENCES"."ACTION_DATE") between
DECODE(:p_last_run_days,0,TO_DATE('01-01-0001','DD-MM-YYYY'),
(trunc(SYSDATE)-:p_last_run_days)) and
DECODE(:p_last_run_days,0,TO_DATE('31-12-4712','DD-MM-YYYY'),trunc(SYSDATE) )
)
or exists
( select 1 from "PER_PERSON_NAMES_F_V" "PER_PERSON_NAMES_F_V"
where "PER_PERSON_NAMES_F_V"."PERSON_ID" = "PER_PERSONS"."PERSON_ID"
and "PER_PERSON_NAMES_F_V"."BUSINESS_GROUP_ID" =
"PER_PERSONS"."BUSINESS_GROUP_ID"
and trunc("PER_PERSON_NAMES_F_V"."LAST_UPDATE_DATE") BETWEEN
DECODE(:p_last_run_days,0,TO_DATE('01-01-0001','DD-MM-YYYY'),
(trunc(SYSDATE)-:p_last_run_days)) and DECODE(:p_last_run_days,0,TO_DATE('31-12-
4712','DD-MM-YYYY'),trunc(SYSDATE) )
)
or exists
( select 1 FROM "PER_ALL_PEOPLE_F_V"
where "PER_ALL_PEOPLE_F_V"."PERSON_ID" = "PER_PERSONS"."PERSON_ID"
and "PER_ALL_PEOPLE_F_V"."BUSINESS_GROUP_ID" =
"PER_PERSONS"."BUSINESS_GROUP_ID"
and trunc("PER_ALL_PEOPLE_F_V"."LAST_UPDATE_DATE") between
DECODE(:p_last_run_days,0,TO_DATE('01-01-0001','DD-MM-YYYY'),
(trunc(SYSDATE)-:p_last_run_days)) and DECODE(:p_last_run_days,0,TO_DATE('31-12-
4712','DD-MM-YYYY'),trunc(SYSDATE) )
)
)
and exists
( select 1 FROM "PER_PERSON_TYPE_USAGES_F"
where "PER_PERSON_TYPE_USAGES_F"."PERSON_ID" =
"PER_PERSONS"."PERSON_ID"
and DECODE (:p_include_term_recs,'N',SYSTEM_PERSON_TYPE,1) !=
DECODE(:p_include_term_recs,'N','EX_EMP',2)
and DECODE (:p_include_term_recs,'Y',trunc(SYSDATE), trunc(SYSDATE)) BETWEEN
DECODE (:p_include_term_recs,'Y',trunc(SYSDATE),trunc(EFFECTIVE_START_DATE)) AND
DECODE (:p_include_term_recs,'Y',trunc(SYSDATE),trunc(EFFECTIVE_END_DATE))
)
) TAB1
-- Newly added batching filter: return only this batch's slice of rows.
WHERE TAB1.RECORD_SEQ BETWEEN :START_SEQ AND :END_SEQ
For this example, we have used the delivered report “Worker Report”. The report location is
Shared Folders->Human Capital Management->Data Exchange -> Worker Report
The modified report has six parameters, namely (the last two, p_start_seq and p_end_seq, are the ones we add for batching):
1) p_bg_name
This is a mandatory parameter and takes Business Group Name as an input.
2) p_person_id
This is an optional parameter and is used when the intent is to extract data for a specific person
record.
3) p_days_since_last_run
It holds a numeric value, say 5, 10, etc. (bound as :p_last_run_days in the SQL), and restricts the output to records created or last updated within that many days. For example, passing 5 fetches only records touched in the last 5 days; passing 10 fetches the last 10 days, and so on. This is an optional parameter.
4) p_include_term_records
Use this parameter to choose whether to fetch only active records (pass 'N') or all records including terminated ones (pass 'Y'). This is an optional parameter.
5) p_start_seq
This is a mandatory parameter used to provide the starting record sequence you would like to fetch in a specific batch.
6) p_end_seq
This is a mandatory parameter used to indicate the last record sequence you would like to fetch in a specific batch.
For this particular cloud environment I applied the count query above and found that the total number of records is 66,511.
Next we need to decide the chunk size for each run. In my case I decided on the below bifurcation:

Batch Sequence   Record Sequence
Batch1           1-10000
Batch2           10001-20000
Batch3           20001-30000
Batch4           30001-40000
Batch5           40001-50000
Batch6           50001-60000
Batch7           60001-66511
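As a quick sanity check, the batch_ranges helper sketched earlier reproduces exactly this bifurcation:

for i, (start, end) in enumerate(batch_ranges(66511, 10000), start=1):
    print(f"Batch{i}: {start}-{end}")
# Batch1: 1-10000 ... Batch7: 60001-66511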
Now, although we have decided the number of batches and the chunk size, one important step is still left: assigning a RECORD_SEQ to each data record fetched by the query.
In order to do so we make use of SQL's ROWNUM pseudocolumn.
We also need two additional parameters, START_SEQ and END_SEQ, to ensure the correct records are picked in the correct batch.
One caveat worth noting: ROWNUM is assigned in whatever order the database happens to return rows, so for the batches to be stable across separate report runs the query should have a deterministic ordering (for example, an ORDER BY on PERSON_ID in an inner inline view, with ROWNUM applied outside it).
The newly added section (the lines highlighted in red in ModifiedSQL.sql, marked with comments in the listing above) consists of the outer SELECT TAB1.* wrapper, the ROWNUM-based RECORD_SEQ column, and the closing WHERE clause on :START_SEQ and :END_SEQ.
Executing Report
Now that the new report is ready, we execute it once per batch, passing the corresponding start and end sequence for each run (for example, p_start_seq = 1 and p_end_seq = 10000 for Batch1, p_start_seq = 10001 and p_end_seq = 20000 for Batch2, and so on), and collect the results, as sketched below.
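Below is a hedged sketch of what submitting the report once per batch can look like when scripted. The run_bip_report helper is hypothetical: the actual submission mechanism (BI Publisher scheduler, web service call, or manual runs from the catalog) is environment-specific and not shown here. It reuses batch_ranges from the earlier sketch.

from concurrent.futures import ThreadPoolExecutor

def run_bip_report(start_seq, end_seq):
    # Hypothetical placeholder: a real implementation would invoke the
    # report with p_start_seq/p_end_seq bound to this batch's range and
    # save the returned output to a per-batch file.
    print(f"submitting batch {start_seq}-{end_seq}")

# Submit all seven batches concurrently; each run works on its own
# slice of rows and cannot interfere with the others.
with ThreadPoolExecutor(max_workers=7) as pool:
    futures = [pool.submit(run_bip_report, s, e)
               for s, e in batch_ranges(66511, 10000)]
    for f in futures:
        f.result()  # surfaces any per-batch failure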
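Since each batch produces its own output file, one final merge step stitches the pieces back together. A minimal sketch, assuming each batch was saved as a CSV file with a header row (the file names are illustrative):

import csv
import glob

# Merge per-batch outputs (worker_batch1.csv ... worker_batch7.csv)
# into one file, writing the header row only once.
with open("worker_all.csv", "w", newline="") as merged:
    writer = None
    for path in sorted(glob.glob("worker_batch*.csv")):
        with open(path, newline="") as part:
            reader = csv.reader(part)
            header = next(reader, None)
            if header is None:
                continue  # skip empty batch files
            if writer is None:
                writer = csv.writer(merged)
                writer.writerow(header)
            writer.writerows(reader)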
Inference/Summary
We have now seen how a parallel processing technique can be used to overcome the data size limit while fetching data with a SQL data model based BI report in a Cloud environment.
While this example used a specific report, the same method can be applied to any other report (both seeded and custom) and should deliver the desired results. The technique works not only for HCM but for other modules too, be it Financials, SCM, PPM, etc. In a nutshell, anywhere you use a BI SQL data model to fetch data, this trick can be applied.
Do try it at your end and feel free to let us know your observations.
That's all from my side; have a nice day ahead!