Learn how to use the Flume Handler to stream change capture data to a Flume source.
A Flume source publishes the data to a Flume channel.
A Flume sink retrieves the data out of a Flume channel and streams the data to different targets.
A Flume Agent is a container process that owns and manages a source, channel and sink.
A single Flume installation can host many agent processes. The Flume Handler can stream data from a trail file to Avro or Thrift RPC Flume sources.
Instructions for configuring the Flume Handler components and running the handler are described in this section.
To run the Flume Handler, a Flume Agent configured with an Avro or Thrift Flume source must be up and running. Oracle GoldenGate can be collocated with Flume or located on a different machine. If it is located on a different machine, the host and port of the Flume source must be reachable over the network. For instructions on how to configure and start a Flume Agent process, see the Flume User Guide at https://flume.apache.org/releases/content/1.6.0/FlumeUserGuide.pdf.
For the Flume Handler to connect to the Flume source and run, the Flume Agent configuration file and the Flume client JARs must be configured in the gg.classpath configuration variable. The Flume Handler uses the contents of the Flume Agent configuration file to resolve the host, port, and source type for the connection to the Flume source. The Flume client libraries do not ship with Oracle GoldenGate for Big Data. The Flume client library versions must match the version of Flume to which the Flume Handler is connecting. For a list of the required Flume client JAR files by version, see Flume Handler Client Dependencies.
The Oracle GoldenGate gg.classpath property must be set to include the following default locations:
The default location of the Flume Agent configuration file is Flume_Home/conf.
The default location of the Flume client JARs is Flume_Home/lib/*.
The gg.classpath must be configured exactly as shown here. The path to the Flume Agent configuration file must not have a wildcard appended; including a wildcard in that path makes the file inaccessible. Conversely, the path to the dependency JARs must end with the * wildcard character so that all of the JAR files in that directory are included in the classpath. Do not use *.jar. The following is an example of a correctly configured gg.classpath variable:
gg.classpath=dirprm/:/var/lib/flume/lib/*
If the Flume Handler and Flume are not collocated, then the Flume Agent configuration file and the Flume client libraries must be copied to the machine hosting the Flume Handler process.
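The wildcard rules for gg.classpath are easy to get wrong, so they can be checked before Replicat is started. The following Python sketch is a hypothetical helper (check_classpath is not part of Oracle GoldenGate) that flags the two mistakes described above: a wildcard appended to the directory holding the Flume Agent configuration file, and a *.jar pattern instead of *. It assumes a Unix-style ":" separator.

```python
def check_classpath(classpath, config_dir):
    """Return a list of problems in a gg.classpath value (empty list if none).

    Hypothetical helper: config_dir is the directory that holds the Flume
    Agent configuration file, for example "dirprm/".
    """
    problems = []
    entries = [e for e in classpath.split(":") if e]  # Unix-style separator assumed

    # Dependency JAR directories must use "*", never "*.jar".
    for entry in entries:
        if entry.endswith("*.jar"):
            problems.append(f"{entry}: use '*' instead of '*.jar'")

    # The configuration directory must be present, with no wildcard appended.
    config = config_dir.rstrip("/")
    matches = [e for e in entries if e.rstrip("/*") == config]
    if not matches:
        problems.append(f"{config_dir}: configuration directory is missing")
    for entry in matches:
        if entry.endswith("*"):
            problems.append(f"{entry}: remove the wildcard from the configuration directory")
    return problems
```

For the correctly configured example above, check_classpath("dirprm/:/var/lib/flume/lib/*", "dirprm/") returns an empty list.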
The following are the configurable values for the Flume Handler. These properties are located in the Java Adapter properties file (not in the Replicat properties file).
To select the Flume Handler, you must first configure the handler type by specifying gg.handler.name.type=flume and then set the other Flume properties as follows:
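Property Name | Property Value | Required / Optional | Default | Description |
---|---|---|---|---|
gg.handlerlist | Any handler name, for example flumehandler | Yes | | List of handlers. Only one is allowed with grouping properties. |
gg.handler.name.type | flume | Yes | | Type of handler to use. |
gg.handler.name.format | Formatter class or short code | No | delimitedtext | The formatter to be used, for example delimitedtext or avro_op. You can also write a custom formatter and include the fully qualified class name here. |
gg.handler.name.RpcClientPropertiesFile | Any choice of filename | No | … | The Flume Agent configuration file used to connect to the Flume source. Either the default file or the custom file named here must be on the classpath. |
gg.handler.name.mode | op or tx | No | … | Operation mode (op) or Transaction Mode (tx). Java Adapter grouping options can be used only in tx mode. |
… | A custom implementation fully qualified class name | No | … | Class used to define which header properties are added to a Flume event. |
gg.handler.name.EventMapsTo | op or tx | No | … | Defines whether each Flume event represents an operation or a transaction. If mode is op, each event always represents an operation. |
gg.handler.name.PropagateSchema | true or false | No | … | When set to true, the Flume Handler publishes schema events. Supported only for the Avro Row and Avro Operation formatters. |
gg.handler.name.includeTokens | true or false | No | … | When set to true, token data from the source trail file is included in the output. |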
The following is a sample Flume Handler configuration:
gg.handlerlist = flumehandler
gg.handler.flumehandler.type = flume
gg.handler.flumehandler.RpcClientPropertiesFile = custom-flume-rpc.properties
gg.handler.flumehandler.format = avro_op
gg.handler.flumehandler.mode = tx
gg.handler.flumehandler.EventMapsTo = tx
gg.handler.flumehandler.PropagateSchema = true
gg.handler.flumehandler.includeTokens = false
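Handler properties follow a gg.handler.<name>.<property> naming pattern, where <name> comes from gg.handlerlist. The Python sketch below is a hypothetical helper (not part of Oracle GoldenGate) that reads such key = value lines and groups the properties belonging to each listed handler. It assumes simple one-property-per-line input and a comma-separated gg.handlerlist.

```python
def handler_properties(lines):
    """Group gg.handler.<name>.<property> entries by handler name."""
    props = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, value = (part.strip() for part in line.split("=", 1))
        props[key] = value

    handlers = {}
    for name in props.get("gg.handlerlist", "").split(","):  # comma-separated (assumed)
        name = name.strip()
        if not name:
            continue
        prefix = f"gg.handler.{name}."
        handlers[name] = {k[len(prefix):]: v for k, v in props.items() if k.startswith(prefix)}
    return handlers
```

Applied to the sample above, the result for flumehandler contains type, format, mode, EventMapsTo, and the other suffixes as keys.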
This section explains how operation data from the Oracle GoldenGate trail file is mapped by the Flume Handler into Flume Events based on different configurations. A Flume Event is a unit of data that flows through a Flume agent. The event flows from source to channel to sink and is represented by an implementation of the event interface. An event carries a payload (byte array) that is accompanied by an optional set of headers (string attributes).
In operation mode, the configuration for the Flume Handler in the Oracle GoldenGate Java configuration file is as follows:
gg.handler.flume_handler_name.mode=op
The data for each operation from an Oracle GoldenGate trail file maps into a single Flume Event. Each event is immediately flushed to Flume. Each Flume Event has the following headers:
TABLE_NAME: The table name for the operation
SCHEMA_NAME: The catalog name (if available) and the schema name of the operation
SCHEMA_HASH: The hash code of the Avro schema (only applicable for Avro Row and Avro Operation formatters)
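The operation-mode mapping can be modeled in a few lines. This Python sketch is illustrative only: FlumeEvent, Operation, event_for_operation, and send_op_mode are hypothetical names, not the Flume or Oracle GoldenGate API, and the payload is assumed to be already serialized by the configured formatter. It shows one event per operation, carrying the headers listed above, with each event flushed as soon as it is built.

```python
from dataclasses import dataclass, field


@dataclass
class FlumeEvent:
    """Stand-in for a Flume event: a byte-array body plus string headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


@dataclass
class Operation:
    """Hypothetical, simplified view of one operation read from the trail file."""
    table_name: str
    schema_name: str  # catalog name (if available) and schema name, as one string
    payload: bytes    # already serialized by the configured formatter


def event_for_operation(op, schema_hash=None):
    """Build the Flume event for a single operation."""
    headers = {"TABLE_NAME": op.table_name, "SCHEMA_NAME": op.schema_name}
    if schema_hash is not None:  # only set for the Avro Row / Avro Operation formatters
        headers["SCHEMA_HASH"] = str(schema_hash)
    return FlumeEvent(op.payload, headers)


def send_op_mode(operations, send):
    """mode=op: each operation becomes one event and is flushed immediately."""
    for op in operations:
        send([event_for_operation(op)])  # one single-event flush per operation
```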
In transaction mode with EventMapsTo set to op, the configuration for the Flume Handler in the Oracle GoldenGate Java configuration file is as follows:
gg.handler.flume_handler_name.mode=tx
gg.handler.flume_handler_name.EventMapsTo=op
The data for each operation from the Oracle GoldenGate trail file maps into a single Flume Event. Events are flushed to Flume when the transaction is committed. Each Flume Event has the following headers:
TABLE_NAME: The table name for the operation
SCHEMA_NAME: The catalog name (if available) and the schema name of the operation
SCHEMA_HASH: The hash code of the Avro schema (only applicable for Avro Row and Avro Operation formatters)
We recommend that you use this mode when formatting data as Avro or delimited text. It is important to understand that configuring Replicat batching functionality increases the number of operations that are processed in a transaction.
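The difference from operation mode is only when events are flushed. The Python sketch below uses a hypothetical class, TxModeOpEvents (not part of Oracle GoldenGate), to show one event per operation being buffered and then flushed as a single batch when the transaction commits. Events are modeled as (payload, headers) tuples, and send is an assumed callback standing in for the Flume client.

```python
class TxModeOpEvents:
    """mode=tx, EventMapsTo=op: one event per operation, flushed at commit."""

    def __init__(self, send):
        self.send = send      # hypothetical callback that delivers a batch to Flume
        self.pending = []     # events of the transaction that is still open

    def on_operation(self, table_name, schema_name, payload):
        headers = {"TABLE_NAME": table_name, "SCHEMA_NAME": schema_name}
        self.pending.append((payload, headers))  # buffered, not sent yet

    def on_commit(self):
        if self.pending:
            self.send(self.pending)  # the whole transaction goes out together
        self.pending = []
```

Because each batch holds every operation of a transaction, Replicat batching (which enlarges transactions) directly enlarges each flush.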
In transaction mode with EventMapsTo set to tx, the configuration for the Flume Handler in the Oracle GoldenGate Java configuration file is as follows:
gg.handler.flume_handler_name.mode=tx
gg.handler.flume_handler_name.EventMapsTo=tx
The data for all operations for a transaction from the source trail file are concatenated and mapped into a single Flume Event. The event is flushed when the transaction is committed. Each Flume Event has the following headers:
GG_TRANID: The transaction ID of the transaction
OP_COUNT: The number of operations contained in this Flume payload event
We recommend that you use this mode only when using self-describing formats such as JSON or XML. It is important to understand that configuring Replicat batching functionality increases the number of operations that are processed in a transaction.
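The following Python sketch illustrates why self-describing formats suit this mode: once several records share one event body, a consumer must be able to split them apart again without outside information. The function names are hypothetical, and the newline separator is an assumption made for the sketch; this section does not state how the handler delimits the concatenated operations.

```python
import json


def event_for_transaction(tran_id, payloads, separator=b"\n"):
    """mode=tx, EventMapsTo=tx: concatenate every operation's payload into one event."""
    body = separator.join(payloads)
    headers = {"GG_TRANID": str(tran_id), "OP_COUNT": str(len(payloads))}
    return body, headers


def split_json_records(body, separator=b"\n"):
    """Consumer side: recover the individual JSON records from the concatenated body."""
    return [json.loads(chunk) for chunk in body.split(separator) if chunk]
```

A consumer can compare the number of recovered records with the OP_COUNT header as a basic integrity check.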
Consider the following options for enhanced performance:
Set Replicat-based grouping
Set the transaction mode and map events to transactions with gg.handler.flume_handler_name.mode=tx and gg.handler.flume_handler_name.EventMapsTo=tx
Increase the maximum heap size of the JVM in the Oracle GoldenGate Java properties file (the maximum heap size available to the Flume Handler may affect performance)
The Flume Handler is adaptive to metadata change events. To handle metadata change events, the source trail files must contain metadata. However, this functionality depends on the source replicated database and the upstream Oracle GoldenGate Capture process to capture and replicate DDL events. This feature is not available for all database implementations in Oracle GoldenGate. To determine whether DDL replication is supported, see the Oracle GoldenGate installation and configuration guide for the appropriate database.
Whenever a metadata change occurs at the source, the Flume Handler notifies the associated formatter of the metadata change event. Any cached schema that the formatter is holding for that table will be deleted. The next time that the associated formatter encounters an operation for that table the schema is regenerated.
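The cache behavior described above (evict on metadata change, regenerate lazily on the next operation) can be sketched as follows. SchemaCache and its generate callback are hypothetical names used only for illustration; the Avro formatters' actual internals are not described in this section.

```python
class SchemaCache:
    """One generated schema per table; a metadata change evicts that table's entry."""

    def __init__(self, generate):
        self._generate = generate  # hypothetical callable: table name -> schema
        self._schemas = {}

    def schema_for(self, table):
        if table not in self._schemas:  # first operation, or first after a change
            self._schemas[table] = self._generate(table)
        return self._schemas[table]

    def on_metadata_change(self, table):
        self._schemas.pop(table, None)  # regenerated on the next operation
```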
The following is a sample configuration for an Avro Flume source from the Flume Agent configuration file:
client.type = default
hosts = h1
hosts.h1 = host_ip:host_port
batch-size = 100
connect-timeout = 20000
request-timeout = 20000
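The Flume Handler resolves the host, port, and source type from this file. The Python sketch below shows how the hosts alias list and the hosts.<alias> entries fit together; resolve_targets is a hypothetical helper, not Flume's own parser, and treating a missing client.type as "default" is an assumption of the sketch.

```python
def resolve_targets(text):
    """Return (client_type, [(alias, host, port), ...]) from RPC client properties."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()

    client_type = props.get("client.type", "default")  # fallback is an assumption
    targets = []
    for alias in props.get("hosts", "").split():  # aliases are space-separated
        host, port = props[f"hosts.{alias}"].rsplit(":", 1)
        targets.append((alias, host, int(port)))
    return client_type, targets
```

The same parsing applies to the Thrift, failover, and load-balancing samples that follow, which differ mainly in client.type and the number of host aliases.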
The following is a sample configuration for a Thrift Flume source from the Flume Agent configuration file:
client.type = thrift
hosts = h1
hosts.h1 = host_ip:host_port
You may choose to implement the following advanced features of the Flume Handler: schema propagation, Kerberos authentication, failover, and load balancing.
The Flume Handler can propagate schemas to Flume. This feature is currently only supported for the Avro Row and Operation formatters. To enable this feature, set the following property:
gg.handler.name.propagateSchema=true
The Avro Row and Operation formatters generate Avro schemas on a just-in-time basis. Avro schemas are generated the first time an operation for a table is encountered. A metadata change event results in the schema reference for a table being cleared, and a new schema is generated the next time an operation is encountered for that table.
When schema propagation is enabled, the Flume Handler propagates schemas in an Avro Event when they are encountered.
Default Flume Schema Event headers for Avro include the following information:
SCHEMA_EVENT: true
GENERIC_WRAPPER: true or false
TABLE_NAME: The table name as seen in the trail
SCHEMA_NAME: The catalog name (if available) and the schema name
SCHEMA_HASH: The hash code of the Avro schema
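Because schema events and data events travel through the same channel, a downstream consumer typically separates them by the SCHEMA_EVENT header. The following Python sketch builds the header set listed above and splits a stream of (body, headers) events. Both functions are hypothetical illustrations, and the schema hash is passed in rather than computed, since it is the hash code of the Avro schema object on the Java side.

```python
def schema_event_headers(table_name, schema_name, schema_hash, generic_wrapper):
    """Headers carried by a propagated Avro schema event."""
    return {
        "SCHEMA_EVENT": "true",
        "GENERIC_WRAPPER": str(generic_wrapper).lower(),  # "true" or "false"
        "TABLE_NAME": table_name,
        "SCHEMA_NAME": schema_name,
        "SCHEMA_HASH": str(schema_hash),
    }


def split_events(events):
    """Separate schema events from data events by their SCHEMA_EVENT header."""
    schemas, data = [], []
    for body, headers in events:
        target = schemas if headers.get("SCHEMA_EVENT") == "true" else data
        target.append((body, headers))
    return schemas, data
```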
Kerberos authentication for the Oracle GoldenGate for Big Data Flume Handler connection to the Flume source is possible. This feature is supported only in Flume 1.6.0 and later using the Thrift Flume source. You can enable it by changing the configuration of the Flume source in the Flume Agent configuration file.
The following is an example of the Flume source configuration from the Flume Agent configuration file that shows how to enable Kerberos authentication. You must provide the Kerberos principal names of the client and the server. The path to a Kerberos keytab file must be provided so that the password of the client principal can be resolved at runtime. For information on how to administer Kerberos, on Kerberos principals and their associated passwords, and about the creation of a Kerberos keytab file, see the Kerberos documentation.
client.type = thrift
hosts = h1
hosts.h1 = host_ip:host_port
kerberos = true
client-principal = flumeclient/client.example.org@EXAMPLE.ORG
client-keytab = /tmp/flumeclient.keytab
server-principal = flume/server.example.org@EXAMPLE.ORG
You can configure the Flume Handler so that it fails over when the primary Flume source becomes unavailable. This feature is currently supported only in Flume 1.6.0 and later using the Avro Flume source. It is enabled with the Flume source configuration in the Flume Agent configuration file. The following is a sample configuration for enabling failover functionality:
client.type = default_failover
hosts = h1 h2 h3
hosts.h1 = host_ip1:host_port1
hosts.h2 = host_ip2:host_port2
hosts.h3 = host_ip3:host_port3
max-attempts = 3
batch-size = 100
connect-timeout = 20000
request-timeout = 20000
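The failover idea can be illustrated with a simplified Python model: hosts are tried in their configured order, and the attempt budget corresponds to max-attempts. This is not Flume's failover client implementation; send_with_failover is a hypothetical function, and send is an assumed transport that raises ConnectionError when a source is unavailable.

```python
from itertools import cycle, islice


def send_with_failover(hosts, send, event, max_attempts=3):
    """Try hosts in configured order, up to max_attempts tries in total."""
    last_error = None
    for host in islice(cycle(hosts), max_attempts):
        try:
            send(host, event)
            return host  # the first source that accepts the event wins
        except ConnectionError as err:
            last_error = err  # remember the failure and move on to the next host
    raise RuntimeError("no Flume source accepted the event") from last_error
```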
You can configure the Flume Handler so that produced Flume events are load-balanced across multiple Flume sources. This feature is currently supported only in Flume 1.6.0 and later using the Avro Flume source. You can enable it by changing the Flume source configuration in the Flume Agent configuration file. The following is a sample configuration for enabling load balancing functionality:
client.type = default_loadbalance
hosts = h1 h2 h3
hosts.h1 = host_ip1:host_port1
hosts.h2 = host_ip2:host_port2
hosts.h3 = host_ip3:host_port3
backoff = false
maxBackoff = 0
host-selector = round_robin
batch-size = 100
connect-timeout = 20000
request-timeout = 20000
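With host-selector = round_robin and backoff disabled, successive sends rotate through h1, h2, and h3. The Python sketch below is a simplified model of that rotation (assign_batches is hypothetical, not Flume's load-balancing client). It assumes the selector is consulted once per batch of batch-size events.

```python
from itertools import cycle


def assign_batches(events, hosts, batch_size=100):
    """Split events into batches and assign the batches to hosts round-robin."""
    selector = cycle(hosts)
    assignment = []
    for start in range(0, len(events), batch_size):
        assignment.append((next(selector), events[start:start + batch_size]))
    return assignment
```

For 250 events and three hosts, the batches of 100, 100, and 50 events go to h1, h2, and h3 respectively.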
Issues with the Java classpath are common. A ClassNotFoundException in the Oracle GoldenGate Java log4j log file indicates a classpath problem. You can use the Java log4j log file to troubleshoot this issue. Setting the log level to DEBUG causes each of the JARs referenced in the gg.classpath object to be logged to the log file. This way, you can make sure that all of the required dependency JARs are resolved. For more information, see Classpath Configuration.
In some situations, the Flume Handler may write to the Flume source faster than the Flume sink can dispatch messages. When this happens, the Flume Handler works for a while, but when Flume can no longer accept messages, it abends. The cause logged in the Oracle GoldenGate Java log file is likely to be an EventDeliveryException, indicating that the Flume Handler was unable to send an event. Check the Flume log for the exact cause of the problem. You may be able to reconfigure the Flume channel to increase capacity, or increase the Java heap size if the Flume Agent is experiencing an OutOfMemoryError. This may not solve the problem. If the Flume Handler can push data to the Flume source faster than messages are dispatched by the Flume sink, then any change may only extend the period the Flume Handler can run before failing.
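A short calculation shows why extra capacity only buys time. If events arrive faster than the sink drains them, the channel fills at the difference of the two rates. The Python sketch below uses hypothetical figures (a 10,000-event channel, 1,500 events/s in, 1,000 events/s out); they are illustrative, not measurements.

```python
def seconds_until_full(capacity, in_rate, out_rate, current=0):
    """Time until a channel fills when events arrive faster than they drain."""
    if in_rate <= out_rate:
        return float("inf")  # the backlog never grows, so the channel never fills
    return (capacity - current) / (in_rate - out_rate)


# Hypothetical figures: doubling capacity doubles the run time, but the
# channel still fills as long as in_rate exceeds out_rate.
print(seconds_until_full(10_000, 1_500, 1_000))  # 20.0
print(seconds_until_full(20_000, 1_500, 1_000))  # 40.0
```

Only raising the sink's dispatch rate above the handler's write rate removes the failure; capacity and heap changes move it later.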
If the Flume Agent configuration file is not in the classpath, the Flume Handler abends at startup. The result is usually a ConfigException that reports the issue as an error loading the Flume producer properties. Check the gg.handler.name.RpcClientPropertiesFile property to make sure that the name of the Flume Agent properties file is correct. Check the Oracle GoldenGate gg.classpath property to make sure that the classpath contains the directory containing the Flume Agent properties file. Also, check the classpath to ensure that the path to the Flume Agent properties file does not end with a wildcard (*) character.
The Flume Handler terminates abnormally at startup if it cannot connect to the Flume source. The root cause of this problem is likely to be reported as an IOException in the Oracle GoldenGate Java log4j log file, indicating a problem connecting to Flume at a given host and port. Make sure that both of the following are true:
The Flume Agent process is running
The Flume Agent configuration file that the Flume Handler is accessing contains the correct host and port.
Review the contents of the Oracle GoldenGate Java log4j log file to identify any other issues.