Thursday, 21 May 2015

Use Delphix to virtualise Oracle database binaries for easier maintenance and improved server utilisation

Delphix is well known for its ability to virtualise and provision databases.  However, Delphix has for some time also been able to virtualise folders and filesystems.

One of the most common use cases for Delphix file virtualisation (vFiles) is to virtualise application binaries.  Delphix customers have found that the Oracle database binaries stored under $ORACLE_HOME are also an excellent candidate for virtualisation, for several reasons.  We call this Virtual Oracle Home (VOH).


Maintaining and Distributing Patched Oracle Database Binaries


Typically this process involves:

  1. Patch the Oracle Home with an Oracle issued PSU
  2. Validate that the Patch works
  3. Patch all of the affected DB Target Servers
(Diagram: Typical Oracle Database Patching Process)



With Delphix vFiles this becomes a much more efficient two-step process.

Step 1:  Virtualise and synchronise your patched Oracle Home directory.



  1. Establish Patch Master and link to Delphix
  2. Install patch on DB Patch Master
  3. Take Delphix Snapshot to ingest patched binaries
  4. Provision/Refresh Oracle Home to DEV/TEST DB server
  5. Use the Refresh, Rewind and Reset data control features to test the patch, remedy errors and validate it (a quick validation check is sketched below).
        If the patch fails, you can “Rewind” the binaries to the previous patch set
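Once a database is running from the provisioned virtual Oracle Home, one quick way to validate the patch from inside that database is to check the patch history recorded in the data dictionary.  This is only a minimal sketch and assumes the PSU post-install step (catbundle) has already been run against the database:

/* confirm the binary version the instance is running */
select banner from v$version;

/* list the most recent patch bundles recorded in the dictionary */
select action_time, action, version, comments
  from dba_registry_history
 order by action_time desc;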



Step 2:  Distribute either virtual or physical copies

Now you can distribute your patched ORACLE_HOME, either as a virtual copy using NFS mounts or as a physical copy using Virtual to Physical (V2P) re-hydration of the patched ORACLE_HOME folder hierarchy on the target servers.







Other major use cases for Virtual Oracle Home (VOH)



DevOps - provision entire versioned stack

Archive - ensure the exact Oracle binary version is archived along with the database

Server Utilisation - make better use of your dev/test server pool by provisioning pre-requisite Oracle binaries on demand

Testing-as-a-Service - fully automated Regression Testing by provisioning all the components for your tests on any compatible server and O/S




How do I Implement Virtual Oracle Home (VOH)?


Please see my upcoming blog post on implementing VOH using Delphix vFiles and hook scripts.




Learn more at Delphix Agile Data Management: www.delphix.com

Monday, 18 May 2015

Accelerate ETL by Leveraging Delphix Virtual Databases

Many downstream database and analytics systems need to be routinely refreshed with processed versions of the latest changes from production.   Typically the new data needs to go through an ETL process to prepare it for downstream use in an Analytics Engine, Warehouse, Data Mart, Reader Farm, etc.

There are several major benefits Delphix offers in this scenario.

Firstly, Delphix maintains its own up-to-date copy of the source data by seamlessly synchronising with the source databases and retaining it in a secure central data hub.

Second, all ETL processes can be run directly from virtual databases (VDBs) provisioned by Delphix, completely offloading production and without the need for physical copies.  You can create as many VDBs as you want to run these processes in parallel.

Third, if you wish to keep pace with new transactions you can use Delphix Live Sources, which uses an Oracle standby database to keep the Delphix dSource copy continuously updated while also enabling reporting via Oracle Active Data Guard.  If you are synchronising with a target Hadoop cluster, you can run Apache Sqoop jobs against a virtual data warehouse that is constantly updated from the physical data warehouse, run your ETL directly on the changed data held in the virtual data warehouse, and import only the processed changes into your Hadoop cluster.

In addition, you can use Delphix to identify the deltas between two versions of a source database by creating two VDBs from two point-in-time snapshots.

If the two VDBs (snap1 and snap2) are based on an Oracle source database, first create a database link on snap2 that connects to snap1.
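As an illustration, the link might be created like this (the link name, credentials and TNS alias are placeholders for your own):

/* run on snap2: create a link pointing at the older VDB, snap1 */
create database link snap1
  connect to app_user identified by app_password
  using 'SNAP1_TNS';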

On snap2 you create tables to store:

1.     INSERTS, rows that have been added since the old version (snap1)
2.     UPDATES, rows that have been modified between the two versions
3.     DELETES, rows that no longer exist in the new version (snap2)

You can identify INSERTS, UPDATES and DELETES between the two versions using the MINUS set operator between the corresponding tables in the two VDBs.

For example, to find and populate my local table “delta_inserts” with new rows from the source table (called my_table in the sketches below), I use something like:
/* create an empty table with the correct structure
   (my_table is a placeholder for your source table name) */

create table delta_inserts as select * from my_table where 1=0;
alter table delta_inserts nologging;

/* populate the table with new rows only */

insert /*+ append */
  into delta_inserts
  with new_rows as
  (
    select id from my_table        /* current version, local on snap2 */
     minus
    select id from my_table@snap1  /* previous version via the db link */
  )
  /* fetch the entire row from the latest version */
  select latest.*
    from my_table latest,
         new_rows delta
   where latest.id = delta.id;

The SQL is similar for UPDATES and DELETES.  However, for UPDATES we need to compare the entire row in the MINUS operation, since any column value might have changed.  For DELETES we only need to store the key or id that we will use to identify the rows to delete during the apply process.
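For completeness, here is a minimal sketch of how the other two delta tables might be built under the same assumptions (my_table and its key column id are placeholders; substitute your own table and key):

/* rows that exist in both versions but have changed:
   compare full rows, then exclude the brand-new rows */
create table delta_updates nologging as
select latest.*
  from my_table latest
 where latest.id in
       ( select id
           from ( select * from my_table
                   minus
                  select * from my_table@snap1 ) )
   and latest.id not in ( select id from delta_inserts );

/* keys of rows that no longer exist in the new version */
create table delta_deletes nologging as
select id from my_table@snap1
 minus
select id from my_table;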

Now we have two tables containing copies of changed rows and one with a list of rows to be deleted.

We can now run our ETL processing on only the changed rows held in delta_inserts and delta_updates, making this process substantially more efficient than running against the full dataset.

The final phase is to implement an apply process to merge the processed deltas with the target database.  The options for applying the changes will depend on the nature of the changes and the type of the target database which might be relational, Hadoop (HDFS), etc.

For Hadoop you can use Apache Sqoop to perform incremental imports and merges.  For relational databases there are more options.

The apply process would usually:

1.     delete from the target table all rows whose keys are saved in delta_deletes
2.     insert or append into the target table the processed rows saved in delta_inserts
3.     merge or update into the target table the processed rows saved in delta_updates

If there are foreign key dependencies then you might need to disable and re-enable them before and after the apply process.
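For an Oracle target, a minimal sketch of that apply pass might look like the following.  Here target_table, the key column id and the columns col1/col2 are placeholders, and the three delta tables are assumed to be reachable from the target schema (for example over a database link or after a bulk copy):

/* optionally disable dependent foreign keys first, e.g.
   alter table child_table disable constraint fk_child_parent; */

/* 1. remove rows deleted at source */
delete from target_table t
 where t.id in ( select id from delta_deletes );

/* 2. add the new rows (a direct-path insert must be committed
      before any further DML on the same table) */
insert /*+ append */ into target_table
select * from delta_inserts;
commit;

/* 3. merge in the modified rows */
merge into target_table t
using delta_updates u
   on ( t.id = u.id )
 when matched then
   update set t.col1 = u.col1,
              t.col2 = u.col2;

commit;

/* re-enable the foreign keys afterwards, e.g.
   alter table child_table enable constraint fk_child_parent; */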

In summary, Delphix is ideal for provisioning data sets for ETL processing.  Delphix Virtual Databases (VDBs) can also underpin the process of identifying changes between two versions of the same database.  ETL jobs can then operate against only the changed rows, and once processed these changes can be used to refresh the target database via a simple apply process.




Learn more at Delphix Agile Data Management