3 - Migrating and Data Loading Into ADB
We will be
covering several topics during this module. We will start by describing the options
and considerations for migration to ADB.
We will then cover migrating to ADB using Data Pump, followed by how to migrate or
synchronize ADB with GoldenGate. Once data is migrated into ADB, Oracle Data Sync
can be used for synchronization of ADB with source systems. In the last section, we
will cover how to use DBMS_CLOUD for loading data into ADB.
When you have completed this video, you will be able to describe the options and
considerations for migrating to ADB. You will be able to articulate how to migrate
to Autonomous Database with Data Pump and GoldenGate. Finally, you will be able to
articulate how to load data to ADB with Data Sync and DBMS_CLOUD.
Let's start with the migration options and considerations. The diagram above provides a high-level architectural view of the different ways to load and maintain data in ADB. Data can be loaded directly into ADB through applications such as SQL Developer, which can read data files such as text and XLS files and load them directly into tables in ADB.
A more efficient and preferred method for loading data into ADB is to stage the data in Cloud Object Store, preferably Oracle's, although Amazon S3 and Azure Blob Storage are also supported. Any file type can be staged in Object Store. Once the data is in Object Store, Autonomous Database can access it directly. Tools can be used to facilitate the data movement between Object Store and the database, for example Data Pump export (DMP) files from other databases, or files in Parquet, JSON, or CSV format, among others. Depending on the amount of data to be loaded, either of these strategies would work.
Data Pump must be used for migrating databases of version 10.1 and above to Autonomous Database, as it handles the version upgrade as part of the export/import process. For online migrations, GoldenGate can be used to keep the old and new databases in sync. More on this will be covered later in the module.
When considering the extended set of methods that can be used for migration and
loading, it's important to segregate the methods by functionality and limitations
of use against Autonomous Database. The considerations are as follows. How large is
the dataset to be imported? What is the input file format?
Does the method support non-Oracle database sources? Does the method support using
Oracle and/or third-party object store? Use the chart in this slide to go over the
considerations to decide which method is the most appropriate for any specific
Autonomous Database loading requirement.
You can load data into ADB Cloud using Oracle Database Tools and Oracle and third-
party data integration tools. You can load data from files local to your client
computer or from files stored in cloud-based object store. The Oracle Autonomous
Database has built-in functionality called DBMS_CLOUD, specifically designed so the database can move data back and forth with external sources through a secure and transparent process. DBMS_CLOUD allows data movement from the object store, data from any applications or data sources, export to text, CSV, or JSON, and use of third-party data integration tools. DBMS_CLOUD can also access data stored on object storage from other clouds, such as AWS S3 and Azure Blob Storage. DBMS_CLOUD does not impose any volume limit, so it is the preferred method to use.
SQL Loader can be used for loading data located on local client file systems into
Autonomous Database. There are limits around the operating system and client machines when using SQL Loader. Data Pump is the best way to migrate a full or partial database into ADB, including databases from previous versions. Because Data Pump performs the upgrade as part of the export/import process, this is the simplest way to get to ADB from any existing Oracle database implementation. SQL Developer provides a GUI front end for using Data Pump and can automate the whole export and import process from an existing database to ADB.
SQL Developer also includes an import wizard that can be used to import data from
several file types into ADB. A very common use of the wizard is to import Excel
files into Autonomous Data Warehouse.
Oracle Object Store is directly integrated into Autonomous Database, and is the
best option for staging data that will be consumed by Autonomous Database. Any file
type can be stored in object store, including SQL Loader files, Excel, JSON,
Parquet, and of course dump files. Flat files stored in object store can also be
used as Oracle Database external tables, so they can be queried directly from the database as part of normal DML operations. Object store is separate from the storage allocated to the Autonomous Database for database objects such as tables and indexes. That storage is part of the Exadata system, and is automatically allocated and managed. Users do not have direct access to that storage.
One of the main considerations when loading and updating ADB is the network latency
between the data source and the Autonomous Database. Many ways to measure this
latency exist. One is the website cloudharmony.com, which provides many real-time
metrics for connectivity between the client and Oracle Cloud Services.
It's important to run these tests when determining which Oracle Cloud Service
location will provide the best connectivity. The Oracle Cloud dashboard has an
integrated tool that will provide real time and historic latency information
between your existing location and any specified Oracle data center. When planning for a migration, after performing network and data transfer analysis, the total data transfer time to ADB should be quantified to ensure operational continuity.
The chart in this slide includes a sample of how long it would take to transfer data into object store before then loading it into ADB. The numbers here are representative of a specific location. If the time to transfer the data and import it into the database will be long, and it is important to keep the source and the new ADB systems in sync, contingency plans such as using Oracle GoldenGate to synchronize the databases can be integrated into the migration plan.
Customers that have a large database to migrate to ADB, or that have slow connectivity to an Oracle data center, can leverage the data transfer service available to object store customers. This allows customers to ship their data on hard drives to Oracle Cloud Services, where it will be loaded into Oracle Object Store. For transfer operations that could take days or weeks and risk being interrupted, this provides a safe and multi-use movement method.
Autonomous Database uses hybrid columnar compression for all tables by default.
This gives the best compression ratio and optimal performance for direct path load
operations, like the loads done using the DBMS_CLOUD package. If you perform DML operations like update and merge on your tables, these may cause the compression ratio for the affected rows to decrease, leading to larger table sizes. These operations may also perform slower compared to the same operations on an uncompressed table. For the best compression ratio and optimal performance, Oracle recommends using bulk operations like direct path loads and CREATE TABLE AS SELECT statements. But if your workload requires frequent DML operations like update and merge on large parts of the table, you can create those tables as uncompressed tables to achieve better DML performance.
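As a minimal sketch of that last recommendation (the table name and columns here are only illustrative, not from the course), an uncompressed table can be created as follows:

    -- Create the table without compression so that frequent updates and
    -- merges do not degrade the compression ratio or DML performance.
    CREATE TABLE sales_dml_heavy (
      sale_id    NUMBER,
      product_id NUMBER,
      amount     NUMBER
    ) NOCOMPRESS;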
Let's look at how to perform a migration using Data Pump. Data Pump EXPDP and IMPDP can be used with source databases of version 10.1 and above, and will perform the upgrade steps necessary to convert the database from older versions to the current Autonomous Database version. Data Pump Import lets you import data from Data Pump files residing on Oracle Cloud Infrastructure Object Storage, Oracle Cloud Infrastructure Object Storage Classic, and AWS S3.
You can save your data to your cloud object store and use Oracle Data Pump to load
data to the Autonomous Database. Oracle Data Pump offers very fast bulk data and
metadata movement between Oracle databases and Autonomous Database. This is the
preferred way to move between an existing database implementation and an Autonomous
Database. The primary mechanism that Data Pump uses is data file copy, but it will
also leverage direct path, external tables, or network link imports.
Oracle Data Pump Export provides several export modes. Oracle recommends using the
schema mode for migrating to Autonomous Database. You can list the schemas you want
to export by using a schema parameter. For a faster migration, export your schemas
into multiple Data Pump files and use parallelism. You can specify the dump file name and the format you want to use with the dumpfile parameter. Set the parallel parameter to at least the number of CPUs you have in your Autonomous Database. The exclude and data_options parameters ensure that the object types not required in Autonomous Database are not exported, and that table partitions are grouped together so that they can be imported faster during the import into Autonomous Database. In this example, you see, under the exclude and data_options parameters,
the recommended objects that should not be exported, as they are either not
supported or not recommended for Autonomous Database.
To use Data Pump with ADB, a credential identifying the Object Storage bucket to use must be defined with the DBMS_CLOUD.CREATE_CREDENTIAL procedure. This will allow ADB to access objects that are stored in the object store, including dump files. To export an existing database to prepare for import into ADB, use the expdp command and add the exclude option for database functionality that is not recommended or supported in ADB. This will prevent errors during the import process.
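As a sketch, the credential mentioned above might be created like this from a SQL connection to ADB; the credential name, username, and auth token are placeholders, not values from the course:

    BEGIN
      DBMS_CLOUD.CREATE_CREDENTIAL(
        credential_name => 'DEF_CRED_NAME',      -- any name you choose
        username        => '<oci_user_name>',    -- the cloud user that owns the bucket
        password        => '<auth_token>'        -- the auth token, not the login password
      );
    END;
    /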
In this example, the exclude and data_options parameters ensure that the object types not required in Autonomous Database are not exported, and that table partitions are grouped together so that they can be imported faster during the import of your schemas and data. If you want to migrate your existing indexes, materialized views, and materialized view logs to Autonomous Database and manage them manually, you can remove these objects from the exclude list, which will export those object types too. This example is exporting the SH schema and is creating 16 parallel threads.
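A sketch of what such an export command could look like follows; the connect string, password, and dump file location are placeholders, and the exclude list shown reflects Oracle's documented recommendation at the time, so check the current documentation before relying on it:

    expdp sh/<password>@<source_db> \
      schemas=sh \
      exclude=index,cluster,indextype,materialized_view,materialized_view_log,materialized_zonemap,db_link \
      data_options=group_partition_table_data \
      parallel=16 \
      dumpfile=export%u.dmp \
      directory=data_pump_dir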
To import an exported database into ADB, use the impdp command with the admin user, or a user with permission to import, and the connect string with the ADB wallet. This example shows importing using a defined credential and from a dump file in object store. In the previous export, we had created 16 parallel threads, so we will import with the same parallel parameter. The same exclude options are specified.
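A sketch of the corresponding import, again with placeholder names for the service, credential, and object store URI:

    impdp admin/<password>@<adb_service_high> \
      directory=data_pump_dir \
      credential=DEF_CRED_NAME \
      dumpfile=https://swiftobjectstorage.<region>.oraclecloud.com/v1/<tenant>/<bucket>/export%u.dmp \
      parallel=16 \
      exclude=index,cluster,indextype,materialized_view,materialized_view_log,materialized_zonemap,db_link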
For the best import performance, use the high database service for your import connection and set the parallel parameter to the number of CPUs in your Autonomous Database, as shown in this example. If you are using Data Pump version 12.2 or older, the credential parameter is not supported. If you are using an older version of Data Pump Import, you need to define the default credential property for Autonomous Database and use the default_credential keyword in the dumpfile parameter.
An extra step is necessary in this case. The same process as before would be used
to create the credential, but after, the created credential needs to be made the
default credential. This is done with the alter database property, set
default_credential statement, as seen in this slide. In the impdp command, the keyword default_credential is specified before the location of the dump file in the dumpfile parameter. See the example in this slide.
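As a sketch of that extra step, with a placeholder credential name owned by the ADMIN schema:

    -- Make the previously created credential the database default credential.
    ALTER DATABASE PROPERTY SET DEFAULT_CREDENTIAL = 'ADMIN.DEF_CRED_NAME';

The dumpfile parameter would then reference the file with the default_credential: prefix in front of the object store URL instead of relying on the credential parameter.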
The log files for Data Pump Import operations are stored in the directory
DATA_PUMP_DIR. This is the only directory you can specify for the Data Pump
directory parameter. To access the log file, you need to move the log file to your
cloud object storage using the procedure DBMS_CLOUD.PUT_OBJECT. This process is not automatic, and if the logs are not moved, you will receive a warning when running impdp that the logs are not there. In this example, we are moving the log file import.log to object store with the DBMS_CLOUD.PUT_OBJECT command.
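A sketch of that call, with the object store URI and credential as placeholders:

    BEGIN
      DBMS_CLOUD.PUT_OBJECT(
        credential_name => 'DEF_CRED_NAME',
        object_uri      => 'https://swiftobjectstorage.<region>.oraclecloud.com/v1/<tenant>/<bucket>/import.log',
        directory_name  => 'DATA_PUMP_DIR',
        file_name       => 'import.log'
      );
    END;
    /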
Now let's look at migrations to ADB using Oracle GoldenGate. Good use cases for using GoldenGate to replicate information to Autonomous Database include using Oracle GoldenGate On Premises to replicate data to Autonomous Database for real-time data warehousing, and replicating on-premises data to Autonomous Database to set up a staging environment for downstream ETL or real-time data warehousing. For operational reporting, you can replicate real-time data from multiple on-premises data sources and deliver it to Autonomous Database for creating reports. The Oracle GoldenGate Cloud Service can also be used to migrate data to Autonomous Database.
There are some limitations of using the GoldenGate replicat process with Autonomous
Database. Currently, only non-integrated replicats are supported with Oracle
Autonomous Database. For the best compression ratio in your target tables in
Autonomous Database, Oracle recommends replicating changes, including updates and
deletes, from your source system as inserts into staging tables and using in-
database batch operations to merge the changes into your target table.
You can configure an Autonomous Database instance as a target database for Oracle GoldenGate On Premises. The source for replicating to Autonomous Database can be Oracle GoldenGate On Premises release 12.3.0.1.2 and later, which is certified with Oracle Autonomous Database for remote delivery using non-integrated replicats only.
However, any supported release of the Oracle GoldenGate for any supported database
and operating system combination that can send trail data to Oracle GoldenGate for
Oracle Database Release 12.3.0.1.2 and later can be used as a source system. Oracle
Autonomous Database cannot be a source database. It can only ingest data, in other
words, be a target database.
The following data types are not supported for replicating data to Oracle Autonomous Database: LONG, LONG RAW, XMLTYPE stored as object-relational, XMLTYPE stored as binary, BFILE, MEDIA, and SPATIAL. To configure ADB for GoldenGate replication, use the pre-created ggadmin user in ADB.
Ggadmin has been granted the rights and privileges needed for the Oracle GoldenGate On Premises replicat to work. By default, this user is locked. To unlock the ggadmin user, connect to your Oracle Autonomous Database as the admin user using any SQL client tool and run the ALTER USER ggadmin IDENTIFIED BY password ACCOUNT UNLOCK command.
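For example, connected as ADMIN, the unlock statement looks like this (the password shown is a placeholder):

    ALTER USER ggadmin IDENTIFIED BY "<strong_password>" ACCOUNT UNLOCK;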
For the replication process, a new target user must be created, which is different from the administration user just discussed. This user must be created and granted the privileges needed to perform the appropriate operations on the database. Once the user is created, connect to ADB as that user.
To prepare the on-premises database for synchronization, the following steps must be followed. Log in to your Oracle GoldenGate On Premises Oracle database and create a new Oracle GoldenGate On Premises user. You can do that with a CREATE USER username IDENTIFIED BY password statement. Grant DBA, CONNECT, and RESOURCE to the user just created, create the source tables needed for the process (with DROP TABLE and CREATE TABLE statements), and create your extract using parameters such as EXTRACT, USERID, EXTTRAIL, and TABLE. A rough sketch of those source-side steps is shown below.
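The following sketch illustrates those source-side steps; the user name, trail path, and extract name are hypothetical placeholders rather than values from the course:

    -- In the source database, create and privilege the GoldenGate user.
    CREATE USER ggsrc IDENTIFIED BY "<password>";
    GRANT DBA, CONNECT, RESOURCE TO ggsrc;

    -- GoldenGate extract parameter file (for example, exta.prm).
    EXTRACT exta
    USERID ggsrc, PASSWORD <password>
    EXTTRAIL ./dirdat/ea
    TABLE ggsrc.*;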
Connect to your Oracle GoldenGate On Premises instance using SSH and your private key. Once you are connected to your Oracle GoldenGate On Premises instance, change user to oracle and transfer the credential zip file that you downloaded for Oracle Autonomous Database, which contains the wallet connection information, as described in previous modules. Edit the tnsnames.ora file in the Oracle GoldenGate On Premises instance to include the connection details that are available in the tnsnames.ora file in your key directory, the directory where you unzipped the credential file which you downloaded for Autonomous Database in the Connecting to Autonomous Database module.
Create a user ID alias and start GGSCI. Create a GoldenGate wallet and add the user ID alias to the credential store. Configure Oracle GoldenGate Manager and a classic replicat to deliver to the Oracle Autonomous Database. If you are not already connected to your Oracle GoldenGate On Premises instance, connect using the SSH command. Once you are connected to your Oracle GoldenGate On Premises instance, change to the user oracle, test the connection to your Autonomous Database using SQL*Plus, and then create a new user for replication using DROP USER, CREATE USER, and ALTER USER statements, granting the user the RESOURCE role and the CREATE VIEW, CREATE SESSION, and CREATE TABLE privileges.
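A minimal sketch of creating that target replication user in ADB; the user name and tablespace quota are placeholders:

    CREATE USER ggtarget IDENTIFIED BY "<password>" QUOTA UNLIMITED ON data;
    GRANT CREATE SESSION, CREATE TABLE, CREATE VIEW, RESOURCE TO ggtarget;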
Log in as your new replication user and create the replication tables. Then connect to GGSCI and configure Oracle GoldenGate Manager, opening the Manager parameter file to edit it with the EDIT PARAM MGR command in GGSCI. Ensure that the Manager parameter file has the following information: port number, access rules, purge old extracts, minimum keep files, and autorestart. Add the GGSCHEMA ggadmin entry to your GLOBALS file, and then stop and restart the Manager.
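A sketch of what those two files might contain; the port, retention, and retry values are illustrative only:

    -- Manager parameter file (mgr.prm)
    PORT 7809
    ACCESSRULE, PROG *, IPADDR *, ALLOW
    PURGEOLDEXTRACTS ./dirdat/*, USECHECKPOINTS, MINKEEPFILES 2
    AUTORESTART Replicat *, RETRIES 5, WAITMINUTES 3

    -- GLOBALS file
    GGSCHEMA ggadmin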
You are now ready to configure the replicat file. As you can see in this slide, these are the steps to configure the replicat file. At this point, the replication process should be running. Insert records into your source database and then ensure that the data is replicated into your Oracle Autonomous Database table using the STATS REPDWCS command in GGSCI.
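As a sketch only, a classic (non-integrated) replicat parameter file might look like this; the replicat name REPDWCS follows the slide, while the alias and schema names are placeholders:

    -- Replicat parameter file (repdwcs.prm)
    REPLICAT repdwcs
    USERIDALIAS adb_alias
    MAP ggsrc.*, TARGET ggtarget.*;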
Now let's look at what's entailed in migrating using Data Sync. Use Data Sync to upload and manage data in Autonomous Database, commonly in conjunction with Oracle Analytics Cloud. You can load data files (CSV and Excel), various relational sources (tables, views, SQL statements), OTBI, JDBC data sources, and Oracle Service Cloud. You can load into relational tables or datasets. Data Sync is capable of incremental extracts and loads, supports several update strategies, and is capable of de-duplicating source data. And you can replicate and transform data when the target is Autonomous Database.
Data Sync supports transformations (ELT), surrogate key generation, foreign key lookups, and so on. Data Sync writes data efficiently, parallelizing data loads and using parallel streams for a target table. Data Sync is capable of pre- and post-load processing with SQL and stored procedures. Data Sync job scheduling automates data extraction from sources, and Data Sync can easily integrate with third-party scheduling systems. Supported Data Sync sources include JDBC, MongoDB, Sybase, Hive, salesforce.com, AWS (both Redshift and PostgreSQL), MySQL, MS Access, Spark, and NetSuite. Supported targets include Autonomous Database as well as others.
Once Data Sync is installed, to start using it, the steps to follow are, from the
project menu, click New, and create a new project. In the Connections view, specify
connection details for your data target and your data source if it's a database. In
the Project view, specify loading options for your project. In the Jobs view,
create a new job. Then click Run job.
In the next few slides, we will show screenshots of the process. From the Projects menu, click New and create a new project. In the Connections view, specify connection details for your target and your data source if it's a database.
For the URL, you will specify the service name for your database. Normally, this will be database_high, using the high service. You will also need the path to the directory where you unzipped your wallet file.
Please refer to the previous section where this was covered. Please note the
example you see here on the screenshot. When done, click Test Connection to save
the connection.
From the main screen, you should see the connection set up in the previous slide, called target. Highlight the connection and click on Project so you can create the objects and do the mapping. Use Data From Objects to import the source table definitions. You can manually create the objects, but that takes longer and is more prone to errors. Select the source connection target to connect to the database. Click on Search to bring up a list of objects. If you get an error message, you will need to go back to your connection and click on Test to make sure it's working. If it is working, make sure your schema name is in all uppercase letters. Select the Import Definition check box, in this case, for all three tables on the source database. Then click on Import.
You should now see your source objects listed in the window. Now click on Data From Objects again so that you can add the three flat files that are also sources. From the dropdown, choose File Source. For the file location, specify your file location. The file name should already be populated from selecting the file, and the logical name should already be populated from the file name. Do not select Delete files if unsuccessful load. Click on Next to continue to the Import options.
Under Import options, specify the information about the data in your source and select Next. If you are using an existing target that you defined above, click Select an existing; if you did not define one, click Create New, and then select Next. Select how you want to map source columns to target columns. In this case, the mapping is by position; the column order will be the same in the source and the target databases. Repeat the above steps to map every table in the source to the tables in the target. You should end up with a source and target entry for each table you have in your target database.
Once you have mapped all objects in the source, in this case file.INVENTORY_EXTRACT, click on the Targets tab and you will see a column that says Load Strategy. In this example, we purge all the records before loading so we can reload. Double-click on the file that says Add New, Update Existing. In the pop-up, set the load strategy to Replace data in table and click OK. Repeat this process for every source object to import. Click on the Project Summary tab. A line showing each file loading into the correct target table should be displayed, in this case, three tables.
At the end of each record, the load strategy should be set to Replace data in table. Click on the Jobs button to create a job to run your project. Click on New to create a job and fill in the parameters for your job. Pick your ADB database connection. Once your job completes, it should show how many rows were loaded and a status of completed. At this point, your ADB database should be loaded with the information you selected from the source files. You can repeat this process for any source that Data Sync supports.
For data loading from files in the cloud, Autonomous Database provides a new PL/SQL
package called DBMS_CLOUD. The PL/SQL package DBMS_CLOUD provides support for
loading data files in the cloud to your tables in Autonomous Database. This package
supports loading from files in the following cloud services-- Oracle Cloud
Infrastructure Object Storage, Oracle Cloud Infrastructure Object Storage Classic,
Azure Blob Storage, and Amazon S3. For the fastest data loading experience, Oracle
recommends uploading the source files to a cloud-based object store such as Oracle
Cloud Infrastructure Object Storage, before loading data into your Autonomous
Database. Oracle provides support for loading files that are located locally in your data center, but when using this method of data loading, you should factor in the transmission speeds across the internet, which may be significantly lower.
In Oracle Cloud Infrastructure, on the top right, under the User icon, select User Settings. After selecting User Settings, select Authentication Tokens on the left. Select Generate Token. This will generate a token and display it.
Once the token is generated, click Copy to copy it to the clipboard, as you will need it to create the credentials, and the token won't be displayed again. In a SQL connection to ADB, connected as admin, run the DBMS_CLOUD.CREATE_CREDENTIAL command using the authentication token just created as the password.
In this example, we are creating a credential called OBJ_STORE_CRED with the Oracle Cloud Infrastructure user [INAUDIBLE] @oracle.com, and for the password, we are using the token generated in the previous step. OBJ_STORE_CRED is the name of the credential being created; you can specify any name you want. [INAUDIBLE] @oracle.com is the Oracle Cloud Infrastructure username, which is the user who owns the object store, not the database user that you are connected to. The password contains the token we generated and copied to the clipboard.
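A sketch of that call follows; the username and token shown are placeholders for the values described above:

    BEGIN
      DBMS_CLOUD.CREATE_CREDENTIAL(
        credential_name => 'OBJ_STORE_CRED',
        username        => '<oci_user>@oracle.com',  -- the object store owner, not the DB user
        password        => '<auth_token>'            -- the token copied from the console
      );
    END;
    /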
To access the files in object store, use DBMS_CLOUD.COPY_DATA. In this case, we have a flat file in object store accessed by the credential we created in the previous step. This example takes the file in object store called channels.csv and maps it to an Oracle table called CHANNELS. When accessing a file in object storage, use the file_uri_list identifier to point to the file. There is a specific format the identifier needs to follow to make sure the file can be accessed.
In the next few slides, we break down the components of the identifier. The https:// URL always starts with the keyword swiftobjectstorage, followed by the data center region where the object store is. Next is the tenant name, which is the tenancy specified when logging into the Oracle Cloud, followed by the object store bucket that contains the file, and last, the actual name of the file in object store.
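Putting those pieces together, the identifier has roughly this shape; the region, tenant, and bucket are placeholders:

    https://swiftobjectstorage.<region>.oraclecloud.com/v1/<tenant>/<bucket>/channels.csv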
The DBMS_CLOUD.COPY_DATA procedure loads source files into an existing target table, handles compression, parallelism, and logging automatically, and logs all loads in the table user_load_operations. Several parameters need to be defined to use this functionality. Let's see them in more detail. You will need an existing table in the database; that is the table_name. You will need a defined credential name, as discussed previously. You will need a FILE_URI_LIST, as defined before. You will need the schema name that owns the table to be inserted into. The column list in the source files needs to be identified, and the format of the source files needs to be specified. In this example, a CSV file is being loaded into the CHANNELS table of the database. Note that not all parameters need to be defined.
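A minimal sketch of that call for the channels.csv example; the credential name and URI are placeholders, and the optional schema_name and field_list parameters are omitted:

    BEGIN
      DBMS_CLOUD.COPY_DATA(
        table_name      => 'CHANNELS',
        credential_name => 'OBJ_STORE_CRED',
        file_uri_list   => 'https://swiftobjectstorage.<region>.oraclecloud.com/v1/<tenant>/<bucket>/channels.csv',
        format          => json_object('delimiter' value ',')
      );
    END;
    /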
To define an external table over a file in object store, you will need the external table name to be defined in the database, a defined credential name as discussed previously, and a FILE_URI_LIST, as defined before. You will need to list the columns in the external table (this is the list of columns in the source files) and the format of the source files. In this example, we are using a file called channels.csv residing in object store as an Oracle Database external table, and the database external table is called CHANNELS_EXT.
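This would typically be done with DBMS_CLOUD.CREATE_EXTERNAL_TABLE; a sketch follows, with the column list and URI purely illustrative:

    BEGIN
      DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name      => 'CHANNELS_EXT',
        credential_name => 'OBJ_STORE_CRED',
        file_uri_list   => 'https://swiftobjectstorage.<region>.oraclecloud.com/v1/<tenant>/<bucket>/channels.csv',
        column_list     => 'CHANNEL_ID NUMBER, CHANNEL_DESC VARCHAR2(20), CHANNEL_CLASS VARCHAR2(20)',
        format          => json_object('delimiter' value ',')
      );
    END;
    /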
Once the table is defined, it can be accessed like any other table in the database. Here we are running a select statement against a file in object store defined as an external table. All load operations are logged in the user_load_operations and dba_load_operations tables in the database. For troubleshooting or reporting, these tables can be queried, and they contain information such as how many records were rejected during a load.
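For example, continuing the sketch above (the columns queried are those typically found in this view; adjust to your version):

    -- Query the external table directly, as if it were an ordinary table.
    SELECT count(*) FROM channels_ext;

    -- Review recent load operations and where rejected rows were logged.
    SELECT type, table_name, status, rows_loaded, logfile_table, badfile_table
      FROM user_load_operations;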
SQL Developer provides easy-to-use data loading wizards for loading data into
Autonomous Database. Let's walk through an example of this functionality. As a
reminder, SQL Developer is a free tool included with Autonomous Database. The SQL
Developer data import wizard enables you to import data from files into tables.
To import data into a new table, right click the Tables node in the Connections
navigator of SQL Developer and select Import Data. To import into an existing
table, right-click the table and select Import Data. Beginning with the 18.1 release, you can use the wizard to load data from files in the cloud into tables in Autonomous Database. For loading data from the cloud, Autonomous Database uses the PL/SQL package DBMS_CLOUD just discussed.
In the Connections pane of SQL Developer in the Tables tab of your connected ADB
database, right click and select Import data. In the source selection, select the
Oracle Cloud storage as a source type. Provide the file URI, as described earlier,
as defined in previous sections. In the credential selection list, you should have
already defined a credential as discussed in the previous section.
In the dropdown list, you will see the name of the credentials you created. Select
the one you want to use. From this point on, the loading process is the same as any
other loading process in SQL Developer. The main difference is the source file and the credential process used to load into Autonomous Database.
You can use Oracle SQL Loader to load data from local files in your client machine
into Autonomous Database. Using SQL Loader may be suitable for loading small
amounts of data, as the load performance depends on the network bandwidth between
your client and Autonomous Database. Again, for large amounts of data, Oracle
recommends loading data from the Oracle Cloud Object Storage.
SQL Loader would be the recommended path for loading from earlier releases such as 9.2 or 8i, and so on. Generate a flat file and load it using the recommended parameters from the above documentation. Oracle recommends using the following SQL Loader parameters for the best load performance: a read size of 100 megabytes, a bind size of 100 megabytes, and direct set to no.
ADB gathers optimizer statistics for your tables during the load operations if you
use the recommended parameters. If you do not use the recommended parameters, then
you need to gather optimizer statistics manually. For loading multiple files at the same time, you can invoke a separate SQL Loader session for each file.
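A sketch of a SQL Loader invocation with those recommended settings; the connect string, control file, and data file are placeholders, and the sizes are given in bytes (100 MB):

    sqlldr userid=admin/<password>@<adb_service> \
      control=channels.ctl \
      data=channels1.dat \
      log=channels1.log \
      readsize=104857600 \
      bindsize=104857600 \
      direct=false

For multiple files, you would run one such session per data file, as noted above.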