Learn how to use the Google BigQuery Handler, which streams change data capture data from source trail files into Google BigQuery.
BigQuery is a RESTful web service that enables interactive analysis of massively large datasets working in conjunction with Google Storage, see https://cloud.google.com/bigquery/.
Topics:
Topics:
The BigQuery Handler supports the standard SQL data types and most of these data types are supported by the BigQuery Handler. A data type conversion from the column value in the trail file to the corresponding Java type representing the BigQuery column type in the BigQuery Handler is required.
The following data types are supported:
STRING BYTES INTEGER FLOAT NUMERIC BOOLEAN TIMESTAMP DATE TIME DATETIME
The BigQuery Handler does not support complex data types, such as ARRAY
and STRUCT
.
Parent topic: Detailing the Functionality
You can configure the BigQuery Handler in one of these two modes:
auditLogMode = true
When the handler is configured to run with Audit log mode, the data is pushed into Google BigQuery without a unique id and primary key. As a result, Google BigQuery is not able to merge different operations on the same row.
Also, the order in which the audit log is displayed in the BigQuery data set is not deterministic.
To overcome these limitations, you need to specify optype
and position
in the meta columns template for the handler. This adds two columns of the same names in the schema for the table in Google BigQuery. For example:
gg.handler.bigquery.metaColumnsTemplate = ${optype}, ${position}
The optype
is important to determine the operation type for the row in the audit log.
To view the audit log in order of the operations processed in the trail file, specify position which can be used in the ORDER BY
clause while querying the table in Google BigQuery. For example:
SELECT * FROM [projectId:datasetId.tableId] ORDER BY position
auditLogMode = false
This causes the handler to write data into Google BigQuery specifying a unique id and primary key for each row. As a result, Google BigQuery is able to merge different operations on the same row.
The trail source needs to have a full image of the records in order to merge correctly.
Google BigQuery processes every operation as an insert
for each row. As a result, there is a deleted
column added to the schema for the table in this mode of operation. When the handler encounters a delete operation on a row, it inserts the row into Google BigQuery and sets the deleted column to true
.
To view data in the BigQuery table like it would ideally be seen in a RDBMS, specify a WHERE deleted = false
clause while querying the table in Google BigQuery.
Parent topic: Detailing the Functionality
The BigQuery Handler pushes operations to Google BigQuery using synchronous API. Insert, update, and delete operations are processed differently in BigQuery than in a traditional RDBMS.
The following explains how insert, update, and delete operations are interpreted by the handler depending on the mode of operation:
auditLogMode = true
insert
– Inserts the record with optype
as an insert operation in the BigQuery table.
update
–Inserts the record with optype
as an update operation in the BigQuery table.
delete
– Inserts the record with optype
as a delete operation in the BigQuery table.
pkUpdate
—When pkUpdateHandling
property is configured as delete-insert
, the handler sends out a delete operation followed by an insert operation. Both these rows have the same position in the BigQuery table, which helps to identify it as a primary key operation and not a separate delete and insert operation.
auditLogMode = false
insert
– If the row does not already exist in Google BigQuery, then an insert operation is processed as an insert
. If the row already exists in Google BigQuery, then an insert operation is processed as an update
. The handler sets the deleted column to false
.
update
–If a row does not exist in Google BigQuery, then an update operation is processed as an insert
. If the row already exists in Google BigQuery, then an update operation is processed as update
. The handler sets the deleted column to false
.
delete
– If the row does not exist in Google BigQuery, then a delete operation has no effect. If the row exists in Google BigQuery, then a delete operation is processed as a delete
. The handler sets the deleted column to true
.
pkUpdate
—When pkUpdateHandling
property is configured as delete-insert
, the handler sets the deleted column to true
for the row whose primary key is updated. It is followed by a separate insert operation with the new primary key and the deleted column set to false
for this row.
Do not toggle the audit log mode because it forces the BigQuery handler to abend as Google BigQuery cannot alter schema of an existing table. The existing table needs to be deleted before switching audit log modes.
Note:
The BigQuery Handler does not support the truncate
operation. It abends when it encounters a truncate
operation.
Parent topic: Detailing the Functionality
The BigQuery Client library does not ship with Oracle GoldenGate for Big Data. You must download the latest version of the Java Client library for BigQuery at:
https://developers.google.com/api-client-library/java/apis/bigquery/v2
You must configure the gg.classpath
configuration property in the Java Adapter properties file to specify the JARs for the Java Client Library for BigQuery. The path to the dependency JARs must include the asterisk (*
) wildcard character to include all of the JAR files in that directory in the associated classpath. Donot use *.jar
. This is an example of the correctly configured classpath:
gg.classpath= /path_to_repository/bigquery/libs/*:/path_to_repository/bigquery/*
Next, download the following JARs from Maven Central, and then include them in the classpath for the BigQuery Handler:
• api-common-1.6.0.jar • gax-1.28.0.jar • gax-httpjson-0.45.0.jar • google-auth-library-credentials-0.9.1.jar • google-auth-library-oauth2-http-0.9.1.jar • google-cloud-bigquery-1.31.0.jar • google-cloud-core-1.35.0.jar • google-cloud-core-http-1.35.0.jar • google-http-client-jackson-1.23.0.jar • guava-25.1-jre.jar • threetenbp-1.3.6.jar
Parent topic: Using the BigQuery Handler
The following are the configurable values for the BigQuery Handler. These properties are located in the Java Adapter properties file (not in the Replicat properties file).
To enable the selection of the BigQuery Handler, you must first configure the handler type by specifying gg.handler.jdbc.type=bigquery
and the other BigQuery properties as follows:
Table 2-1 BigQuery Handler Configuration Properties
Properties | Required/ Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
|
Required |
Any string |
None |
Provides a name for the BigQuery Handler. The BigQuery Handler name then becomes part of the property names listed in this table. |
|
Required |
|
None |
Selects the BigQuery Handler for streaming change data capture into Google BigQuery. |
|
Optional |
Relative or absolute path to the credentials file |
None |
The credentials file downloaded from Google BigQuery for authentication. If you do not specify the path to the credentials file, you need to set it as an environment variable, see Configuring Handler Authentication. |
|
Required |
Any string |
None |
The name of the project in Google BigQuery. The handler needs project ID to connect to Google BigQuery store. |
|
Optional |
Any string |
|
The name of the data set the tables are stored in. If not specified, the handler creates a new data set named |
|
Optional |
Any number |
|
The maximum number of operations to be batched together. This is applicable for all target table batches. |
|
Optional |
Any number |
|
The maximum amount of time in milliseconds to wait before executing the next batch of operations. This is applicable for all target table batches. |
|
Optional |
|
|
Sets whether to insert all valid rows of a request, even if invalid rows exist. If not set, the entire insert request fails if it contains an invalid row. |
gg.handler. name. ignoreUnknownValues |
Optional |
|
|
Sets whether to accept rows that contain values that do not match the schema. If not set, rows with unknown values are considered to be invalid. |
gg.handler. name. connectionTimeout |
Optional |
Positive integer |
|
The maximum amount of time, in milliseconds, to wait for the handler to establish a connection with Google BigQuery. |
gg.handler. name. readTimeout |
Optional |
Positive integer |
|
The maximum amount of time in milliseconds to wait for the handler to read data from an established connection. |
|
Optional |
A legal string |
None |
A legal string specifying the |
gg.handler. name. auditLogMode |
Optional |
|
|
Set to Set to |
gg.handler. name. pkUpdateHandling |
Optional |
|
|
Sets how the handler handles update operations that change a primary key. Primary key operations can be problematic for the BigQuery Handler and require special consideration:
|
Parent topic: Setting Up and Running the BigQuery Handler
The following is a sample configuration for the BigQuery Handler from the Java Adapter properties file:
gg.handlerlist = bigquery #The handler properties gg.handler.bigquery.type = bigquery gg.handler.bigquery.projectId = festive-athlete-201315 gg.handler.bigquery.datasetId = oggbd gg.handler.bigquery.credentialsFile = credentials.json gg.handler.bigquery.auditLogMode = true gg.handler.bigquery.pkUpdateHandling = delete-insert gg.handler.bigquery.metaColumnsTemplate =${optype}, ${position}
Parent topic: Setting Up and Running the BigQuery Handler
To connect to BigQuery using a proxy server, you must configure the proxy host and the proxy port in the properties file as follows:
javawriter.bootoptions= -Dhttps.proxyHost=proxy_host_name -Dhttps.proxyPort=proxy_port_number
Parent topic: Setting Up and Running the BigQuery Handler
You have to configure the BigQuery Handler authentication using the credentials in the JSON file downloaded from Google BigQuery.
Download the credentials file:
Login into your Google account at cloud.google.com.
Click Console, and then to go to the Dashboard where you can select your project.
From the navigation menu, click APIs & Services then select Credentials.
From the Create Credentials menu, choose Service account key.
Choose the JSON key type to download the JSON credentials file for your system.
Once you have the credentials file, you can authenticate the handler in one of these two ways:
Specify the path to the credentials file in the properties file with the gg.handler.name.credentialsFile
configuration property.
The path of the credentials file must contain the path with no wildcard appended. If you include the *
wildcard in the path to the credentials file, the file is not recognized.
Or
Set the GOOGLE_APPLICATION_CREDENTIALS
environment variable on your system. For example:
export GOOGLE_APPLICATION_CREDENTIALS = credentials.json
Then restart the Oracle GoldenGate manager process.
Parent topic: Setting Up and Running the BigQuery Handler