18 Using Oracle GoldenGate Capture for Cassandra

Learn how to use Oracle GoldenGate capture (Extract) to get changes from Apache Cassandra databases.

Topics:

18.1 Overview

Apache Cassandra is a NoSQL Database Management System designed to store large amounts of data. A Cassandra cluster configuration provides horizontal scaling and replication of data across multiple machines. It can provide high availability and eliminate a single point of failure by replicating data to multiple nodes within a Cassandra cluster. Apache Cassandra is open source and designed to run on low-cost commodity hardware.

Cassandra relaxes the axioms of a traditional relational database management systems (RDBMS) regarding atomicity, consistency, isolation, and durability. When considering implementing Cassandra, it is important to understand its differences from a traditional RDBMS and how those differences affect your specific use case.

Cassandra provides eventual consistency. Under the eventual consistency model, accessing the state of data for a specific row eventually returns the latest state of the data for that row as defined by the most recent change. However, there may be a latency period between the creation and modification of the state of a row and what is returned when the state of that row is queried. The benefit of eventual consistency is that the latency period is predicted based on your Cassandra configuration and the level of work load that your Cassandra cluster is currently under, see http://cassandra.apache.org/.

Review the data type support, see About the Cassandra Data Types.

18.2 Setting Up Cassandra Change Data Capture

Prerequisites

  • Apache Cassandra cluster must have at least one node up and running.

  • Read and write access to CDC commit log files on every live node in the cluster is done through SFTP or NFS.

  • Every node in the Cassandra cluster must have the cdc_enabled parameter set to true in the cassandra.yaml configuration file.

  • Virtual nodes must be enabled on every Cassandra node by setting the num_tokens parameter in cassandra.yaml .

  • You must download and provide the third party libraries listed in Cassandra Capture Client Dependencies.

  • New tables can be created with Change Data Capture (CDC) enabled using the WITH CDC=true clause in the CREATE TABLE command. For example:

    CREATE TABLE ks_demo_rep1.mytable (col1 int, col2 text, col3 text, col4 text, PRIMARY KEY (col1)) WITH cdc=true;
    

    You can enable CDC on existing tables as follows:

    ALTER TABLE ks_demo_rep1.mytable WITH cdc=true;
    

18.2.1 Data Types

Supported Cassandra Data Types

The following are the supported data types:

  • ASCII

  • BIGINT

  • BLOB

  • BOOLEAN

  • DATE

  • DECIMAL

  • DOUBLE

  • DURATION

  • FLOAT

  • INET

  • INT

  • SMALLINT

  • TEXT

  • TIME

  • TIMESTAMP

  • TIMEUUID

  • TINYINT

  • UUID

  • VARCHAR

  • VARINT

Unsupported Data Types

The following are the unsupported data types:

  • COUNTER

  • MAP

  • SET

  • LIST

  • UDT (user defined type)

  • TUPLE

  • CUSTOM_TYPE

18.2.2 Cassandra Database Operations

Supported Operations

The following are the supported operations:

  • INSERT

  • UPDATE (Captured as INSERT)

  • DELETE

Unsupported Operations

The TRUNCATE DDL (CREATE, ALTER, and DROP) operation is not supported. Because the Cassandra commit log files do not record any before images for the UPDATE or DELETE operations. The result is that the captured operations can never have a before image. Oracle GoldenGate features that rely on before image records, such as Conflict Detection and Resolution, are not available.

18.3 Deduplication

One of the features of a Cassandra cluster is its high availability. To support high availability, multiple redundant copies of table data are stored on different nodes in the cluster. Oracle GoldenGate for Big Data Cassandra Capture automatically filters out duplicate rows (deduplicate). Deduplication is active by default. Oracle recommends using it if your data is captured and applied to targets where duplicate records are discouraged (for example RDBMS targets).

18.4 Topology Changes

Cassandra nodes can change their status (topology change) and the cluster can still be alive. Oracle GoldenGate for Big Data Cassandra Capture can detect the node status changes and react to these changes when applicable. The Cassandra capture process can detect the following events happening in the cluster:

  • Node shutdown and boot.

  • Node decommission and commission.

  • New keyspace and table created.

Due to topology changes, if the capture process detects that an active producer node goes down, it tries to recover any missing rows from an available replica node. During this process, there is a possibility of data duplication for some rows. This is a transient data duplication due to the topology change. For more details about reacting to changes in topology, see Troubleshooting.

18.5 Data Availability in the CDC Logs

The Cassandra CDC API can only read data from commit log files in the CDC directory. There is a latency for the data in the active commit log directory to be archived (moved) to the CDC commit log directory.

The input data source for the Cassandra capture process is the CDC commit log directory. There could be delays for the data to be captured mainly due to the commit log files not yet visible to the capture process.

On a production cluster with a lot of activity, this latency is very minimal as the data is archived from the active commit log directory to the CDC commit log directory in the order of microseconds.

18.6 Using Extract Initial Load

Cassandra Extract supports the standard initial load capability to extract source table data to Oracle GoldenGate trail files.

Initial load for Cassandra can be performed to synchronize tables, either as a prerequisite step to replicating changes or as a standalone function.

Direct loading from a source Cassandra table to any target table is not supported.

Configuring the Initial Load

You need to add these parameters to your GLOBALS parameter file:

OGGSOURCE CASSANDRA
CLUSTERCONTACTPOINTS nodeadresses

For example, to write to a single trail file:

SOURCEISTABLE
SOURCEDB keyspace1, USERID user1, PASSWORD pass1
EXTFILE ./dirdat/load_data.dat, PURGE
TABLE keyspace1.table1;

Then you would run this command in GGSCI:

EXTRACT PARAMFILE ./dirprm/load.prm REPORTFILE ./dirrpt/load.rpt

If you want to write to multiple files, you could use:

EXTRACT load
SOURCEISTABLE
SOURCEDB keyspace1, USERID user1, PASSWORD pass1
EXTFILE ./dirdat/la, megabytes 2048, MAXFILES 999
TABLE keyspace1.table1;

Note:

Save the file with the name specified in the example (load.prm) into the dirprm directory.

Then you would run these commands in GGSCI:

ADD EXTRACT load, SOURCEISTABLE 
START EXTRACT load 

18.7 Using Change Data Capture Extract

Review the example .prm files from Oracle GoldenGate for Big Datainstallation directory under $HOME/AdapterExamples/big-data/cassandracapture.

  1. When adding the Cassandra Extract trail, you need to use EXTTRAIL to create a local trail file.

    The Cassandra Extract trail file should not be configured with the RMTTRAIL option.

    ggsci> ADD EXTRACT groupname, TRANLOG
    ggsci> ADD EXTTRAIL trailprefix, EXTRACT groupname
    Example:
    ggsci> ADD EXTRACT cass, TRANLOG
    ggsci> ADD EXTTRAIL ./dirdat/z1, EXTRACT cass
    
  2. To configure the Extract, see the example .prm files in the Oracle GoldenGate for Big Data installation directory in $HOME/AdapterExamples/big-data/cassandracapture.

  3. Position the Extract.

    ggsci> ADD EXTRACT groupname, TRANLOG, BEGIN NOW
    ggsci> ADD EXTRACT groupname, TRANLOG, BEGIN ‘yyyy-mm-dd hh:mm:ss’
    ggsci> ALTER EXTRACT groupname, BEGIN ‘yyyy-mm-dd hh:mm:ss’
    
  4. Manage the transaction data logging for the tables.

    ggsci> DBLOGIN SOURCEDB nodeaddress USERID userid PASSWORD password
    ggsci> ADD TRANDATA keyspace.tablename
    ggsci> INFO TRANDATA keyspace.tablename
    ggsci> DELETE TRANDATA keyspace.tablename
    
    

    Examples:

    ggsci> DBLOGIN SOURCEDB 127.0.0.1
    ggsci> INFO TRANDATA ks_demo_rep1.mytable
    ggsci> INFO TRANDATA ks_demo_rep1.*
    ggsci> INFO TRANDATA *.*
    ggsci> INFO TRANDATA ks_demo_rep1.”CamelCaseTab”
    ggsci> ADD TRANDATA ks_demo_rep1.mytable
    ggsci> DELETE TRANDATA ks_demo_rep1.mytable
    
  5. Append the following line in the GLOBALS parameter file:

    JVMBOOTOPTIONS -Dlogback.configurationFile=AdapterExamples/big-data/cassandracapture/logback.xml
    
  6. Configure the Extract and GLOBALS parameter files:

    Apache Cassandra 3.11 SDK, compatible with Apache Cassandra 3.9, 3.10, 3.11

    Extract parameter file:

    EXTRACT groupname
    TRANLOGOPTIONS CDCREADERSDKVERSION 3.11
    TRANLOGOPTIONS CDCLOGDIRTEMPLATE /path/to/data/cdc_raw
    SOURCEDB nodeaddress
    VAM libggbigdata_vam.so
    EXTTRAIL trailprefix
    TABLE *.*;
    

    GLOBALS parameter file:

    OGGSOURCE CASSANDRA
    CLUSTERCONTACTPOINTS nodeadresses
    JVMCLASSPATH ggjava/ggjava.jar:/path/to/cassandra-driver-core/3.3.1/cassandra-driver-core-3.3.1.jar:dirprm:/path/to/apache-cassandra-3.11.0/lib/*:/path/to/gson/2.3/gson-2.3.jar:/path/to/jsch/0.1.54/jsch-0.1.54.jar:/path/to/commons-lang3/3.5/commons-lang3-3.5.jar
    

    Oracle recommends that you use the latest Cassandra 3.11 JAR files (TRANLOGOPTIONS CDCREADERSDKVERSION 3.11 and JVMCLASSPATH configuration) for all supported Cassandra database versions.

    Apache Cassandra 3.9 SDK

    Extract parameter file:

    EXTRACT groupname
    TRANLOGOPTIONS CDCREADERSDKVERSION 3.9
    TRANLOGOPTIONS CDCLOGDIRTEMPLATE /path/to/data/cdc_raw
    SOURCEDB nodeaddress
    VAM libggbigdata_vam.so
    EXTTRAIL trailprefix
    TABLE *.*;
    

    GLOBALS parameter file:

    OGGSOURCE CASSANDRA
    CLUSTERCONTACTPOINTS nodeadresses
    JVMCLASSPATH ggjava/ggjava.jar:/path/to/cassandra-driver-core/3.3.1/cassandra-driver-core-3.3.1.jar:dirprm:/path/to/apache-cassandra-3.9/lib/*:/path/to/gson/2.3/gson-2.3.jar:/path/to/jsch/0.1.54/jsch-0.1.54.jar:/path/to/commons-lang3/3.5/commons-lang3-3.5.jar
    

18.8 Replicating to RDMBS Targets

You must take additional care when replicating source UPDATE operations from Cassandra trail files to RDMBS targets. Any source UPDATE operation appears as an INSERT record in the Oracle GoldenGate trail file. Replicat may abend when a source UPDATE operation is applied as an INSERT operation on the target database.

You have these options:

  • OVERRIDEDUPS: If you expect that the source database is to contain mostly INSERT operations and very few UPDATE operations, then OVERRIDEDUPS is the recommended option. Replicat can recover from duplicate key errors while replicating the small number of the source UPDATE operations. See OVERRIDEDUPS \ NOOVERRIDEDUPS

  • No additional configuration is required if the target table can accept duplicate rows or you want to abend Replicat on duplicate rows.

If you configure Replicat to use BATCHSQL, there may be duplicate row or missing row errors in batch mode. Although there is a reduction in the Replicat throughput due to these errors, Replicat automatically recovers from these errors. If the source operations are mostly INSERTS, then BATCHSQL is a good option.

18.9 Partition Update or Insert of Static Columns

When the source Cassandra table has static columns, the static column values can be modified by skipping any clustering key columns that are in the table.

For example:

create table ks_demo_rep1.nls_staticcol
(
    teamname text,
    manager text static,
    location text static,
    membername text,
    nationality text,
    position text,
    PRIMARY KEY ((teamname), membername)
)
WITH cdc=true;
insert into ks_demo_rep1.nls_staticcol (teamname, manager, location) VALUES ('Red Bull', 'Christian Horner', '<unknown>

The insert CQL is missing the clustering key membername. Such an operation is a partition insert.

Similarly, you could also update a static column with just the partition keys in the WHERE clause of the CQL that is a partition update operation. Cassandra Extract cannot write a INSERT or UPDATE operation into the trail with missing key columns. It abends on detecting a partition INSERT or UPDATE operation.

18.10 Partition Delete

A Cassandra table may have a primary key composed on one or more partition key columns and clustering key columns. When a DELETE operation is performed on a Cassandra table by skipping the clustering key columns from the WHERE clause, it results in a partition delete operation.

For example:

create table ks_demo_rep1.table1
(
 col1 ascii, col2 bigint, col3 boolean, col4 int,
 PRIMARY KEY((col1, col2), col4)
) with cdc=true;

delete from ks_demo_rep1.table1 where col1 = 'asciival' and col2 = 9876543210; /** skipped clustering key column col4 **/

Cassandra Extract cannot write a DELETE operation into the trail with missing key columns and abends on detecting a partition DELETE operation.

18.11 Security and Authentication

  • Cassandra Extract can connect to a Cassandra cluster using username and password based authentication and SSL authentication.

  • Connection to Kerberos enabled Cassandra clusters is not supported in this release.

18.11.1 Configuring SSL

To enable SSL, add the SSL parameter to your GLOBALS file or Extract parameter file. Additionally, a separate configuration is required for the Java and CPP drivers, see CDC Configuration Reference.

SSL configuration for Java driver

JVMBOOTOPTIONS -
Djavax.net.ssl.trustStore=/path/to/SSL/truststore.file -
Djavax.net.ssl.trustStorePassword=password -
Djavax.net.ssl.keyStore=/path/to/SSL/keystore.file -
Djavax.net.ssl.keyStorePassword=password

The keystore and truststore certificates can be generated using these instructions:

https://docs.datastax.com/en/cassandra/2.1/cassandra/security/secureSSLCertificates_t.html

SSL configuration for Cassandra CPP driver

To operate with an SSL configuration, you have to add the following parameter in the Oracle GoldenGate GLOBALS file or Extract parameter file:

CPPDRIVEROPTIONS SSL PEMPUBLICKEYFILE /path/to/PEM/formatted/public/key/file/cassandra.pem CPPDRIVEROPTIONS SSL PEERCERTVERIFICATIONFLAG 0

This configuration is required to connect to a Cassandra cluster with SSL enabled. Additionally, you need to add these settings to your cassandra.yaml file:

client_encryption_options:
    enabled: true
    # If enabled and optional is set to true encrypted and unencrypted connections are handled.
    optional: false
    keystore: /path/to/keystore
    keystore_password: password
    require_client_auth: false

The PEM formatted certificates can be generated using these instructions:

https://docs.datastax.com/en/developer/cpp-driver/2.8/topics/security/ssl/

18.12 Multiple Extract Support

Multiple Extract groups in a single Oracle GoldenGate for Big Data installation can be configured to connect to the same Cassandra cluster.

To run multiple Extract groups:

  1. One (and only one) Extract group can be configured to move the commit log files in the cdc_raw directory on the Cassandra nodes to a staging directory. The movecommitlogstostagingdir parameter is enabled by default and no additional configuration is required for this Extract group.

  2. All the other Extract groups should be configured with the nomovecommitlogstostagingdir parameter in the Extract parameter (.prm) file.

18.13 CDC Configuration Reference

The following properties are used with Cassandra change data capture.

Properties Required/Optional Location Default Explanation

DBOPTIONS ENABLECASSANDRACPPDRIVERTRACE true

Optional

Extract parameter (.prm) file.

false

Use only during initial load process.

When set to true, the Cassandra driver logs all the API calls to a driver.log file. This file is created in the Oracle GoldenGate for Big Data installation directory. This is useful for debugging.

DBOPTIONS FETCHBATCHSIZE number

Optional

Extract parameter (.prm) file.

1000

Minimum is 1

Maximum is 100000

Use only during initial load process.

Specifies the number of rows of data the driver attempts to fetch on each request submitted to the database server.

The parameter value should be lower than the database configuration parameter, tombstone_warn_threshold, in the database configuration file, cassandra.yaml. Otherwise the initial load process might fail.

Oracle recommends that you set this parameter value to 5000 for initial load Extract optimum performance.

TRANLOGOPTIONS CDCLOGDIRTEMPLATE path

Required

Extract parameter (.prm) file.

None

The CDC commit log directory path template. The template can optionally have the $nodeAddress meta field that is resolved to the respective node address.

TRANLOGOPTIONS SFTP options

Optional

Extract parameter (.prm) file.

None

The secure file transfer protocol (SFTP) connection details to pull and transfer the commit log files. You can use one or more of these options:

USER user

The SFTP user name.

PASSWORD password

The SFTP password.

KNOWNHOSTSFILE file

The location of the Secure Shell (SSH)known hosts file.

LANDINGDIR dir

The SFTP landing directory for the commit log files on the local machine.

PRIVATEKEY file

The SSH private key file.

PASSPHRASE password

The SSH private key pass phrase.

PORTNUMBER portnumber

The SSH port number.

CLUSTERCONTACTPOINTS nodes USER dbuser PASSWORD dbpassword

Optional

GLOBALS parameter file

127.0.0.1

A comma separated list of nodes to be used for a connection to the Cassandra cluster. You should provide at least one node address. The parameter options are:

USER dbuser

No default

Optional

The user name to use when connecting to the database.

PASSWORD dbpassword

No default

Required when USER is used.

The user password to use when connecting to the database.

PORT <port number

No default

Optional

The port to use when connecting to the database.

TRANLOGOPTIONS CDCREADERSDKVERSION version

Optional

Extract parameter (.prm) file.

3.11

The SDK Version for the CDC reader capture API.

ABENDONMISSEDRECORD | NOABENDONMISSEDRECORD

Optional

Extract parameter (.prm) file.

true

When set to true and the possibility of a missing record is found, the process stops with the diagnostic information. This is generally detected when a node goes down and the CDC reader doesn't find a replica node with a matching last record from the dead node. You can set this parameter to false to continue processing. A warning message is logged about the scenario.

TRANLOGOPTIONS CLEANUPCDCCOMMITLOGS

Optional

Extract parameter (.prm) file.

false

Purge CDC commit log files post extract processing. When the value is set to false, the CDC commit log files are moved to the cdc_raw_processed directory.

JVMBOOTOPTIONS jvm_options

Optional

GLOBALS parameter file

None

The boot options for the Java Virtual Machine. Multiple options are delimited by a space character.

JVMCLASSPATH classpath

Required

GLOBALS parameter file

None

The classpath for the Java Virtual Machine. You can include an asterisk (*) wildcard to match all JAR files in any directory. Multiple paths should be delimited with a colon (:) character.

OGGSOURCE source

Required

 

None

The source database for CDC capture or database queries. The valid value is CASSANDRA.

SOURCEDB nodeaddress USERID dbuser PASSWORD dbpassword

Required

Extract parameter (.prm) file.

None

A single Cassandra node address that is used for a connection to the Cassandra cluster and to query the metadata for the captured tables.

USER dbuser

No default

Optional

The user name to use when connecting to the database.

PASSWORD dbpassword

No default

Required when USER is used.

The user password to use when connecting to the database.

ABENDONUPDATERECORDWITHMISSINGKEYS | NOABENDONUPDATERECORDWITHMISSINGKEYS

Optional

Extract parameter (.prm) file.

true

If this value is true, anytime an UPDATE operation record with missing key columns is found, the process stops with the diagnostic information. You can set this property to false to continue processing and write this record to the trail file. A warning message is logged about the scenario. This operation is a partition update, see Partition Update or Insert of Static Columns.

ABENDONDELETERECORDWITHMISSINGKEYS | NOABENDONDELETERECORDWITHMISSINGKEYS

Optional

Extract parameter (.prm) file.

true

If this value is true, anytime an DELETE operation record with missing key columns is found, the process stops with the diagnostic information. You can set this property to false to continue processing and write this record to the trail file. A warning message is logged about the scenario. This operation is a partition update, see Partition Delete.

MOVECOMMITLOGSTOSTAGINGDIR | NOMOVECOMMITLOGSTOSTAGINGDIR

Optional

Extract parameter (.prm) file.

true

Enabled by default and this instructs the Extract group to move the commit log files in the cdc_raw directory on the Cassandra nodes to a staging directory for the commit log files. Only one Extract group can have movecommitlogstostagingdir enabled, and all the other Extract groups disable this by specifying nomovecommitlogstostagingdir.

SSL

Optional

GLOBALS or Extract parameter (.prm) file.

false

Use for basic SSL support during connection. Additional JSSE configuration through Java System properties is expected when enabling this.

Note:

The following SSL properties are in CPPDRIVEROPTIONS SSL so this keyword must be added to any other SSL property to work.

CPPDRIVEROPTIONS SSL PEMPUBLICKEYFILE cassadra.pem

Optional

GLOBALS or Extract parameter (.prm) file.

String that indicates the absolute path with fully qualified name. This file is must for the SSL connection.

None, unless the PEMPUBLICKEYFILE property is specified, then you must specify a value.

Indicates that it is PEM formatted public key file used to verify the peer's certificate. This property is needed for one-way handshake or basic SSL connection.

CPPDRIVEROPTIONS SSL ENABLECLIENTAUTH | DISABLECLIENTAUTH

Optional

GLOBALS or Extract parameter (.prm) file.

false

Enabled indicates a two-way SSL encryption between client and server. It is required to authenticate both the client and the server through PEM formatted certificates. This property also needs the pemclientpublickeyfile and pemclientprivatekeyfile properties to be set. The pemclientprivatekeypasswd property must be configured if the client private key is password protected. Setting this property to false disables client authentication for two-way handshake.

CPPDRIVEROPTIONS SSL PEMCLIENTPUBLICKEYFILE public.pem

Optional

GLOBALS or Extract parameter (.prm) file.

String that indicates the absolute path with fully qualified name. This file is must for the SSL connection.

None, unless the PEMCLIENTPUBLICKEYFILE property is specified, then you must specify a value.

Use for a PEM formatted public key file name used to verify the client's certificate. This is must if you are using CPPDRIVEROPTIONS SSL ENABLECLIENTAUTH or for two-way handshake.

CPPDRIVEROPTIONS SSL PEMCLIENTPRIVATEKEYFILE public.pem

Optional

GLOBALS or Extract parameter (.prm) file.

String that indicates the absolute path with fully qualified name. This file is must for the SSL connection.

None, unless the PEMCLIENTPRIVATEKEYFILE property is specified, then you must specify a value.

Use for a PEM formatted private key file name used to verify the client's certificate. This is must if you are using CPPDRIVEROPTIONS SSL ENABLECLIENTAUTH or for two-way handshake.

CPPDRIVEROPTIONS SSL PEMCLIENTPRIVATEKEYPASSWD privateKeyPasswd

Optional

GLOBALS or Extract parameter (.prm) file.

A string

None, unless the PEMCLIENTPRIVATEKEYPASSWD property is specified, then you must specify a value.

Sets the password for the PEM formatted private key file used to verify the client's certificate. This is must if the private key file is protected with the password.

CPPDRIVEROPTIONS SSL PEERCERTVERIFICATIONFLAG value

Optional

GLOBALS or Extract parameter (.prm) file.

An integer

0

Sets the verification required on the peer's certificate. The range is 0–4:

0–Disable certificate identity verification.

1–Verify the peer certificate

2–Verify the peer identity

3– Not used so it is similar to disable certificate identity verification.

4 –Verify the peer identity by its domain name

CPPDRIVEROPTIONS SSL ENABLEREVERSEDNS

Optional

GLOBALS or Extract parameter (.prm) file.

false

Enables retrieving host name for IP addresses using reverse IP lookup.

18.14 Troubleshooting

No data captured by the Cassandra Extract process.

  • The Cassandra database has not flushed the data from the active commit log files to the CDC commit log files. The flush is dependent on the load of the Cassandra cluster.

  • The Cassandra Extract captures data from the CDC commit log files only.

  • Check the CDC property of the source table. The CDC property of the source table should be set to true.

  • Data is not captured if the TRANLOGOPTIONS CDCREADERSDKVERSION 3.9 parameter is in use and the JVMCLASSPATH is configured to point to Cassandra 3.10 or 3.11 JAR files.

Error: OGG-01115 Function getInstance not implemented.

  • The following line is missing from the GLOBALS file.

    OGGSOURCE CASSANDRA

  • The GLOBALS file is missing from the Oracle GoldenGate directory.

Error: Unable to connect to Cassandra cluster, Exception: com.datastax.driver.core.exceptions.NoHostAvailableException

This indicates that the connection to the Cassandra cluster was unsuccessful.

Check the following parameters:

CLUSTERCONTACTPOINTS

Error: Exception in thread "main" java.lang.NoClassDefFoundError: oracle/goldengate/capture/cassandra/CassandraCDCProcessManager

Check the JVMCLASSPATH parameter in the GLOBALS file.

Error: oracle.goldengate.util.Util - Unable to invoke method while constructing object. Unable to create object of class "oracle.goldengate.capture.cassandracapture311.SchemaLoader3DOT11" Caused by: java.lang.NoSuchMethodError: org.apache.cassandra.config.DatabaseDescriptor.clientInitialization()V

There is a mismatch in the Cassandra SDK version configuration. The TRANLOGOPTIONS CDCREADERSDKVERSION 3.11 parameter is in use and the JVMCLASSPATH may have the Cassandra 3.9 JAR file path.

Error: OGG-25171 Trail file '/path/to/trail/gg' is remote. Only local trail allowed for this extract.

A Cassandra Extract should only be configured to write to local trail files. When adding trail files for Cassandra Extract, use the EXTTRAIL option. For example:

ADD EXTTRAIL ./dirdat/z1, EXTRACT cass

Errors: OGG-868 error message or OGG-4510 error message

The cause could be any of the following:

  • Unknown user or invalid password

  • Unknown node address

  • Insufficient memory

Another cause could be that the connection to the Cassandra database is broken. The error message indicates the database error that has occurred.

Error: OGG-251712 Keyspace keyspacename does not exist in the database.

The issue could be due to these conditions:

  • During the Extract initial load process, you may have deleted the KEYSPACE keyspacename from the Cassandra database.

  • The KEYSPACE keyspacename does not exist in the Cassandra database.

Error: OGG-25175 Unexpected error while fetching row.

This can occur if the connection to the Cassandra database is broken during initial load process.

Error: “Server-side warning: Read 915936 live rows and 12823104 tombstone cells for query SELECT * FROM keyspace.table(see tombstone_warn_threshold)”.

When the value of the initial load DBOPTIONS FETCHBATCHSIZE parameter is greater than the Cassandra database configuration parameter,tombstone_warn_threshold, this is likely to occur.

Increase the value of tombstone_warn_threshold or reduce the DBOPTIONS FETCHBATCHSIZE value to get around this issue.

Duplicate records in the Cassandra Extract trail.

Internal tests on a multi-node Cassandra cluster have revealed that there is a possibility of duplicate records in the Cassandra CDC commit log files. The duplication in the Cassandra commit log files is more common when there is heavy write parallelism, write errors on nodes, and multiple retry attempts on the Cassandra nodes. In these cases, it is expected that Cassandra trail file will have duplicate records.

JSchException or SftpException in the Extract Report File

Verify that the SFTP credentials (user, password, and privatekey) are correct. Check that the SFTP user has read and write permissions for the cdc_raw directory on each of the nodes in the Cassandra cluster.

ERROR o.g.c.c.CassandraCDCProcessManager - Exception during creation of CDC staging directory [{}]java.nio.file.AccessDeniedException

The Extract process does not have permission to create CDC commit log staging directory. For example, if the cdc_raw commit log directory is /path/to/cassandra/home/data/cdc_raw, then the staging directory would be /path/to/cassandra/home/data/cdc_raw/../cdc_raw_staged.

Extract report file shows a lot of DEBUG log statements

On production system, you do not need to enable debug logging. To use INFO level logging, make sure that the GLOBALS file includes this parameter:

JVMBOOTOPTIONS -Dlogback.configurationFile=AdapterExamples/big-data/cassandracapture/logback.xml

To enable SSL in Oracle Golden Gate Cassandra Extract you have to enable SSL in the GLOBALS file or in the Extract Parameter file.

If SSL Keyword is missing, then Extract assumes that you wanted to connect without SSL. So if the Cassandra.yaml file has an SSL configuration entry, then the connection fails.

SSL is enabled and it is one-way handshake

You must specify the CPPDRIVEROPTIONS SSL PEMPUBLICKEYFILE /scratch/testcassandra/testssl/ssl/cassandra.pem property.

If this property is missing, then Extract generates this error:.


2018-06-09 01:55:37 ERROR OGG-25180 The PEM formatted public key file used to verify the peer's certificate is missing. If SSL is enabled, then it is must to set PEMPUBLICKEYFILE in your Oracle GoldenGate GLOBALS file or in Extract parameter file

SSL is enabled and it is two-way handshake

You must specify these properties for SSL two-way handshake:

CPPDRIVEROPTIONS SSL ENABLECLIENTAUTH 
CPPDRIVEROPTIONS SSL PEMCLIENTPUBLICKEYFILE /scratch/testcassandra/testssl/ssl/datastax-cppdriver.pem
CPPDRIVEROPTIONS SSL PEMCLIENTPRIVATEKEYFILE /scratch/testcassandra/testssl/ssl/datastax-cppdriver-private.pem
CPPDRIVEROPTIONS SSL PEMCLIENTPRIVATEKEYPASSWD cassandra

Additionally, consider the following:

  • If ENABLECLIENTAUTH is missing then Extract assumes that it is one-way handshake so it ignores PEMCLIENTPRIVATEKEYFILE and PEMCLIENTPRIVATEKEYFILE. The following error occurs because the cassandra.yaml file should have require_client_auth set to true.

    2018-06-09 02:00:35  ERROR   OGG-00868  No hosts available for the control connection.
    
  • If ENABLECLIENTAUTH is used and PEMCLIENTPRIVATEKEYFILE is missing, then this error occurs:

    2018-06-09 02:04:46  ERROR   OGG-25178  The PEM formatted private key file used to verify the client's certificate is missing. For two way handshake or if ENABLECLIENTAUTH is set, then it is mandatory to set PEMCLIENTPRIVATEKEYFILE in your Oracle GoldenGate GLOBALS file or in Extract parameter file.
    
  • If ENABLECLIENTAUTH is use and PEMCLIENTPUBLICKEYFILE is missing, then this error occurs:

    2018-06-09 02:06:20  ERROR   OGG-25179  The PEM formatted public key file used to verify the client's certificate is missing. For two way handshake or if ENABLECLIENTAUTH is set, then it is mandatory to set PEMCLIENTPUBLICKEYFILE in your Oracle GoldenGate GLOBALS file or in Extract parameter file.
    
  • If the password is set while generating the client private key file then you must add PEMCLIENTPRIVATEKEYPASSWD to avoid this error:

    2018-06-09 02:09:48  ERROR   OGG-25177  The SSL certificate: /scratch/jitiwari/testcassandra/testssl/ssl/datastax-cppdriver-private.pem can not be loaded. Unable to load private key.
    
  • If any of the PEM file is missing from the specified absolute path, then this error occurs:

    2018-06-09 02:12:39  ERROR   OGG-25176  Can not open the SSL certificate: /scratch/jitiwari/testcassandra/testssl/ssl/cassandra.pem.
    

General SSL Errors

Consider these general errors:

  • The SSL connection may fail if you have enabled all SSL required parameters in Extract or GLOBALS file and the SSL is not configured in the cassandra.yaml file.

  • The absolute path or the qualified name of the PEM file may not correct. There could be access issue on the PEM file stored location.

  • The password added during generating the client private key file may not be correct or you may not have enabled it in the Extract parameter or GLOBALS file.