I used the Oracle Zero Downtime Migration (ZDM) tool to move some databases from an on-prem Exadata to a freshly instantiated OCI Exadata Cloud Service (a Quarter Rack, to be more precise). I found out then that the speed of backup and restore to the OCI Object Store was nothing short of catastrophic. Instead of the advertised ~2.5 hours for a 5 TB database over 8 channels, the restore took… more than 20 hours!
The backup was slower anyway due to the limited connection speed, but I assumed that at least within OCI the restore would run as advertised in the Oracle Cloud Infrastructure Exadata Backup & Restore Best Practices using Cloud Object Storage. I was really astonished when I saw otherwise.
I opened an SR, of course, but after more than 20 days of ping-pong with Oracle I eventually found the reason myself.
You see: ZDM comes with libopc.so in version 19.0.0.1. This is exactly the same version that is included in the lib subdirectory of every ORACLE_HOME. You can easily get the version string with the following command:
[zdm@zdm lib]$ strings libopc.so.original | grep -Po 'DNZ_REL_VER=".*?"' | head -1
DNZ_REL_VER="19.0.0.0.0-Production"
On the ExaCS I found that, even for a 19.7 database home, the libopc.so that is used for the automatic backup and bkup_api is actually an older version, 12.2.0.1, located under /var/opt/oracle/dbaas_acfs//opc/libopc.so:
[oracle@exa1-node1 ~]$ strings /var/opt/oracle/dbaas_acfs//opc/libopc.so | grep -Po 'DNZ_REL_VER=".*?"' | head -1
DNZ_REL_VER="12.2.0.1.0-Production"
I tested both versions with RMAN by running a backup. To monitor the progress, I used a slightly modified SQL script (thanks to Mariami Kupatadze):
select recid,
       output_device_type,
       dbsize_mbytes,
       input_bytes/1024/1024 input_mbytes,
       output_bytes/1024/1024 output_mbytes,
       (output_bytes/input_bytes*100) compression,
       (mbytes_processed/dbsize_mbytes*100) complete,
       to_char(start_time, 'DD-MON-YYYY HH24:MI:SS') started,
       to_char(start_time + (sysdate-start_time)/(mbytes_processed/dbsize_mbytes),
               'DD-MON-YYYY HH24:MI:SS') est_complete
  from v$rman_status rs,
       (select sum(bytes)/1024/1024 dbsize_mbytes from v$datafile)
 where status like 'RUNNING%'
   and output_device_type is not null;
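For completeness, the test backups were plain full database backups over SBT channels, where the SBT_LIBRARY channel parameter decides which libopc.so gets loaded. A minimal sketch of such a run, assuming the OPC_PFILE path /home/oracle/opc_config (point it at your own Object Store configuration file, and repeat the ALLOCATE line up to c8 for 8 channels):

rman target / <<'EOF'
run {
  # SBT_LIBRARY selects the libopc.so version this channel loads;
  # OPC_PFILE (assumed path) holds the Object Store endpoint and credentials
  allocate channel c1 device type sbt parms 'SBT_LIBRARY=/var/opt/oracle/dbaas_acfs//opc/libopc.so, SBT_PARMS=(OPC_PFILE=/home/oracle/opc_config)';
  allocate channel c2 device type sbt parms 'SBT_LIBRARY=/var/opt/oracle/dbaas_acfs//opc/libopc.so, SBT_PARMS=(OPC_PFILE=/home/oracle/opc_config)';
  backup database;
}
EOF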
The results for the 5 TB database were astonishing. First, the 12.2.0.1 library version:
RECID OUTPUT_DEVICE_TYP DBSIZE_MBYTES INPUT_MBYTES OUTPUT_MBYTES COMPRESSION COMPLETE STARTED EST_COMPLETE
---------- ----------------- ------------- ------------ ------------- ----------- ---------- ----------------------------- -----------------------------
8197 SBT_TAPE 4911385.78 162145.547 60951.75 37.5907641 3.30142152 28-AUG-2020 10:08:18 28-AUG-2020 12:37:14
(scroll right and look at STARTED and EST_COMPLETE). I aborted the backup after a few moments, as it was already at 3%.
Then I started the same full database backup again, but this time with the 19.x version of libopc.so. I had to wait several minutes for it to reach 0.7% in order to get at least a somewhat representative estimate. And there it is:
RECID OUTPUT_DEVICE_TYP DBSIZE_MBYTES INPUT_MBYTES OUTPUT_MBYTES COMPRESSION COMPLETE STARTED EST_COMPLETE
---------- ----------------- ------------- ------------ ------------- ----------- ---------- ----------------------------- -----------------------------
8201 SBT_TAPE 4911385.78 35671.0469 13269.5 37.1996371 .726292913 28-AUG-2020 10:17:34 29-AUG-2020 07:17:23
So, there you have it: 2.5 hours versus nearly 21 hours, which for the same 5 TB works out to roughly 550 MB/s against about 65 MB/s. Both over 8 channels, both within OCI, same bucket, same database, same Exadata.
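This is also how the libopc.so.original in the first listing came about: putting the fast 12.2.0.1 library in place of the one ZDM bundles. A sketch of that swap, assuming /u01/app/zdm as the ZDM home (check where your installation actually keeps libopc.so first):

# in the lib subdirectory of the ZDM home (path is an assumption)
cd /u01/app/zdm/lib
# keep the bundled 19.x library around, just in case
mv libopc.so libopc.so.original
# pull the 12.2.0.1 library over from the ExaCS node
scp oracle@exa1-node1:/var/opt/oracle/dbaas_acfs//opc/libopc.so .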
PS: Both libraries seem to consume a lot of CPU, so take this into account when choosing the number of channels. The 19.x version uses 100% of a CPU for each channel thread, while the 12.2 version showed around 70-90% CPU per channel. So make sure you have enough free CPUs on your Exadata node to run the chosen number of channels.
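If you want to sanity-check this on your own node before picking a channel count, something like the following is enough (a minimal sketch):

# CPUs available on this Exadata compute node
nproc
# top CPU consumers while the backup runs; the RMAN channel
# (Oracle server) processes should show up near the top
ps -eo pid,pcpu,args --sort=-pcpu | head -12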