Catastrophically slow backup/restore to Oracle Cloud Infrastructure Object Storage and its fix

I used the Oracle Zero Downtime Migration (ZDM) tool to move some databases from an on-premises Exadata to a freshly provisioned OCI Exadata Cloud Service (a Quarter Rack, to be precise). That is when I found out that backup and restore performance to OCI Object Storage was nothing short of catastrophic. Instead of the advertised ~2.5 hours for a 5 TB database over 8 channels, the restore took more than 20 hours!

The backup was not as fast because of the connection speed limit, but I assumed that at least within OCI the restore would run as advertised in "Oracle Cloud Infrastructure Exadata Backup & Restore Best Practices using Cloud Object Storage". I was really astonished when I saw that it did not.

I opened an SR, of course, but after more than 20 days of ping-pong with Oracle I eventually found the reason myself.

You see, ZDM ships with its own copy of the backup library, and it is exactly the same version that is included in the lib subdirectory of every ORACLE_HOME. You can easily get the version string with the following command:

[zdm@zdm lib]$ strings | grep -Po 'DNZ_REL_VER=".*?"' | head -1

On the ExaCS I found that, even for a 19.7 database home, the library used for automatic backups and bkup_api is actually an older version, and it is located under /var/opt/oracle/dbaas_acfs/<SID>/opc/

[oracle@exa1-node1 ~]$ strings /var/opt/oracle/dbaas_acfs/<SID>/opc/ | grep -Po 'DNZ_REL_VER=".*?"' | head -1
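To compare several copies of the library side by side, the check above can be wrapped in a small helper. This is only a sketch: the function name `lib_version` is my own invention, the library paths are passed in as arguments rather than assumed, and `grep -a` (treat the binary as text) is used in place of `strings`. The `DNZ_REL_VER` marker is the one from the commands above.

```shell
#!/usr/bin/env bash
# Print the first DNZ_REL_VER="..." marker embedded in a library file.
# grep -a treats the binary as text; -o prints only the matching part.
lib_version() {
    grep -a -o 'DNZ_REL_VER="[^"]*"' "$1" | head -1
}

# Print one "path: version" line for every file given on the command line.
for lib in "$@"; do
    printf '%s: %s\n' "$lib" "$(lib_version "$lib")"
done
```

Saved as a script (the file name is up to you), it takes the ZDM copy, the ORACLE_HOME copy and the ExaCS copy as arguments and prints their version strings one below the other.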

I tested both versions by running a backup with RMAN. To monitor the progress I used a slightly modified SQL script (thanks to Mariami Kupatadze):

-- Progress and estimated completion time of running RMAN backup jobs
select recid
     , output_device_type
     , dbsize_mbytes
     , input_bytes/1024/1024 input_mbytes
     , output_bytes/1024/1024 output_mbytes
     , (output_bytes/nullif(input_bytes, 0)*100) compression   -- output as % of input
     , (mbytes_processed/dbsize_mbytes*100) complete           -- % of database size processed
     , to_char(start_time, 'DD-MON-YYYY HH24:MI:SS') started
       -- linear extrapolation: start + elapsed / fraction complete
     , to_char(start_time + (sysdate - start_time)/nullif(mbytes_processed/dbsize_mbytes, 0), 'DD-MON-YYYY HH24:MI:SS') est_complete
  from v$rman_status rs
     , (select sum(bytes)/1024/1024 dbsize_mbytes from v$datafile)
 where status like 'RUNNING%'
   and output_device_type is not null;
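The EST_COMPLETE column is a plain linear extrapolation: the projected total run time is the elapsed time divided by the fraction completed so far. Here is a minimal shell sketch of the same arithmetic. The helper name `projected_seconds` is hypothetical and the input values are made up for illustration.

```shell
# Projected total run time in seconds, given elapsed seconds and percent complete.
# Same linear extrapolation as EST_COMPLETE: total = elapsed / (percent / 100).
projected_seconds() {
    awk -v elapsed="$1" -v pct="$2" 'BEGIN {
        if (pct <= 0) { print "n/a"; exit }   # no progress yet, nothing to extrapolate
        printf "%.0f\n", elapsed * 100 / pct
    }'
}

projected_seconds 300 3    # 300 s elapsed at 3 % complete -> 10000
```

The estimate assumes constant throughput, so it only becomes meaningful once the backup has been running long enough to settle.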

The results for the 5 TB database were astonishing. First, the older library version:

     RECID OUTPUT_DEVICE_TYPE DBSIZE_MBYTES INPUT_MBYTES OUTPUT_MBYTES COMPRESSION   COMPLETE STARTED              EST_COMPLETE
---------- ------------------ ------------- ------------ ------------- ----------- ---------- -------------------- --------------------
      8197 SBT_TAPE              4911385.78   162145.547      60951.75  37.5907641 3.30142152 28-AUG-2020 10:08:18 28-AUG-2020 12:37:14

(Look at the STARTED and EST_COMPLETE columns on the right.) I aborted the backup after a few moments, as it was already at 3%.

Then I started the same full database backup again, but this time with the 19.x version of the library. I had to wait several minutes for it to reach 0.7% to get at least a somewhat representative estimate. And there it is:

     RECID OUTPUT_DEVICE_TYPE DBSIZE_MBYTES INPUT_MBYTES OUTPUT_MBYTES COMPRESSION   COMPLETE STARTED              EST_COMPLETE
---------- ------------------ ------------- ------------ ------------- ----------- ---------- -------------------- --------------------
      8201 SBT_TAPE              4911385.78   35671.0469       13269.5  37.1996371 .726292913 28-AUG-2020 10:17:34 29-AUG-2020 07:17:23

So, there you have it: 2.5 hours versus nearly 21 hours. Both over 8 channels, both within OCI, same bucket, same database, same Exadata. The older library is roughly eight times faster than the 19.x one.

P.S. Both libraries seem to consume a lot of CPU, so take this into account when choosing the number of channels. The 19.x version uses 100% CPU for each channel thread; the 12 version showed around 70-90% CPU per channel. Make sure you have enough free CPUs on your Exadata node to run the given number of channels.
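As a rough sanity check before picking the channel count, the per-channel CPU figures can be turned into a core estimate. This is a back-of-the-envelope sketch, not a sizing tool: `cores_needed` is a hypothetical helper, and it simply assumes that the load scales linearly with the number of channels.

```shell
# Approximate CPU cores consumed: channels * per-channel CPU percentage / 100.
cores_needed() {
    awk -v ch="$1" -v pct="$2" 'BEGIN { printf "%.1f\n", ch * pct / 100 }'
}

cores_needed 8 100                    # 8 channels at 100 % CPU each -> 8.0
echo "cores on this node: $(nproc)"   # compare the estimate against this
```

If the estimate comes close to what `nproc` reports, the backup channels will compete with the database itself for CPU.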
