Learn how to use the Elasticsearch Handler, which allows you to store, search, and analyze large volumes of data quickly and in near real time.
Elasticsearch is a highly scalable open-source full-text search and analytics engine. Elasticsearch allows you to store, search, and analyze large volumes of data quickly and in near real time. It is generally used as the underlying engine or technology that drives applications with complex search features.
The Elasticsearch Handler uses the Elasticsearch Java client to connect to an Elasticsearch node and write data into it, see https://www.elastic.co.
The Elasticsearch Handler property gg.handler.name.version should be set according to the version of the Elasticsearch cluster. The Elasticsearch Handler uses a Java Transport client, which must have the same major version (such as 2.x or 5.x) as the nodes in the cluster. The Elasticsearch Handler can connect to clusters that have a different minor version (such as 2.3.x), though new functionality may not be supported.
An Elasticsearch index is a collection of documents with similar characteristics. Index names must be lowercase. An Elasticsearch type is a logical group within an index. All the documents within an index or type should have the same number and type of fields.
The Elasticsearch Handler constructs the index name by concatenating the source trail schema name with the source trail table name. For three-part table names in the source trail, the index name is constructed by concatenating the source catalog, schema, and table name.
The Elasticsearch Handler maps the source table name to the Elasticsearch type. The type name is case-sensitive.
Table 4-1 Elasticsearch Mapping
Source Trail | Elasticsearch Index | Elasticsearch Type |
---|---|---|
If an index does not already exist in the Elasticsearch cluster, a new index is created when the Elasticsearch Handler receives data (an INSERT or UPDATE operation in the source trail).
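The mapping rules above can be sketched in a few lines of Python. This is only an illustration, not the handler's code: the function name is hypothetical, and the underscore separator is an assumption inferred from the sample log message later in this chapter, which shows index [cs2cat_s1sch_n1tab] together with type [N1TAB].

```python
from typing import Tuple


def to_index_and_type(qualified_table_name: str) -> Tuple[str, str]:
    """Map 'schema.table' or 'catalog.schema.table' to (index, type).

    Assumption: name parts are joined with '_' (inferred from a log sample).
    """
    parts = qualified_table_name.split(".")
    if len(parts) not in (2, 3):
        raise ValueError("expected a two-part or three-part table name")
    index = "_".join(parts).lower()  # index names are always lowercase
    doc_type = parts[-1]             # the type keeps the table name's case
    return index, doc_type


# Hypothetical three-part source table name.
print(to_index_and_type("CS2CAT.S1SCH.N1TAB"))
```

Under these assumptions, two source tables whose names differ only in case would share one index but map to different types.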
An Elasticsearch document is a basic unit of information that can be indexed. Within an index or type, you can store as many documents as you want. Each document has a unique identifier based on the _id field.
The Elasticsearch Handler maps the source trail primary key column value to the document identifier.
The Elasticsearch document identifier is created based on the source table's primary key column value. The document identifier cannot be modified. The Elasticsearch Handler processes an update to a source primary key by performing a DELETE followed by an INSERT. While performing the INSERT, there is a possibility that the new document may contain fewer fields than required. For the INSERT operation to contain all the fields in the source table, enable the trail Extract to capture the full before images for update operations, or use GETBEFORECOLS to write the before images of the required columns.
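The following minimal sketch models why the re-inserted document can end up with fewer fields. It uses a plain dictionary in place of an Elasticsearch index. The function and parameter names are hypothetical and do not reflect the handler's internal API.

```python
def apply_primary_key_update(index, old_id, new_id, after_image, before_image=None):
    """Model a primary key update as a DELETE followed by an INSERT.

    index maps document identifiers to documents (dicts of column values).
    """
    index.pop(old_id, None)                  # DELETE the old document
    document = dict(before_image or {})      # columns known from the before image
    document.update(after_image)             # columns carried by the update record
    index[new_id] = document                 # INSERT under the new identifier
    return document


index = {"1": {"id": 1, "name": "a", "city": "x"}}

# Without a before image, only the updated column reaches the new document.
print(apply_primary_key_update(dict(index), "1", "2", {"id": 2}))

# With a full before image, the new document keeps every column.
print(apply_primary_key_update(dict(index), "1", "2", {"id": 2}, index["1"]))
```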
Elasticsearch supports the following data types:
32-bit integer
64-bit integer
Double
Date
String
Binary
The Elasticsearch Handler uses the operation mode for better performance. The gg.handler.name.mode property is not used by the handler.
INSERT
The Elasticsearch Handler creates a new index if the index does not exist, and then inserts a new document.
UPDATE
If an Elasticsearch index or document exists, the document is updated. If an Elasticsearch index or document does not exist, a new index is created and the column values in the UPDATE operation are inserted as a new document.
DELETE
If an Elasticsearch index or document exists, the document is deleted. If an Elasticsearch index or document does not exist, a new index is created with zero fields.
The TRUNCATE operation is not supported.
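The operation semantics described above can be modeled with nested dictionaries. This is a sketch for reasoning about the behavior, not the handler's implementation. The class name is hypothetical, and a plain ValueError stands in for the exception the handler raises on TRUNCATE.

```python
class InMemoryCluster:
    """Toy model: index name -> {document id -> document}."""

    def __init__(self):
        self.indices = {}

    def apply(self, op, index, doc_id, columns=None):
        if op == "TRUNCATE":
            raise ValueError("TRUNCATE is not supported")
        if op not in ("INSERT", "UPDATE", "DELETE"):
            raise ValueError("unknown operation: " + op)
        docs = self.indices.setdefault(index, {})  # the index is created if missing
        if op == "INSERT":
            docs[doc_id] = dict(columns or {})
        elif op == "UPDATE":
            # A missing document is inserted from the UPDATE column values.
            docs.setdefault(doc_id, {}).update(columns or {})
        else:  # DELETE
            docs.pop(doc_id, None)                 # a missing index is left empty


cluster = InMemoryCluster()
cluster.apply("UPDATE", "s1_t1", "1", {"a": 1})  # acts as an insert
cluster.apply("DELETE", "s1_t2", "9")            # creates an empty index
print(cluster.indices)
```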
A cluster is a collection of one or more nodes (servers) that together hold all of the data. It provides federated indexing and search capabilities across all nodes.
A node is a single server that is part of the cluster, stores the data, and participates in the cluster’s indexing and searching.
The Elasticsearch Handler property gg.handler.name.ServerAddressList can be set to point to the nodes available in the cluster.
You must ensure that the Elasticsearch cluster is set up correctly and that the cluster is up and running, see https://www.elastic.co/guide/en/elasticsearch/reference/current/_installation.html. Alternatively, you can use Kibana to verify the setup.
Set the Classpath
The gg.classpath property must include all the JAR files required by the Java transport client. For a listing of the required client JAR files by version, see Elasticsearch Handler Client Dependencies.
Default location of 2.x JARs:
Elasticsearch_Home/lib/*
Elasticsearch_Home/plugins/shield/*
Default location of 5.x JARs:
Elasticsearch_Home/lib/*
Elasticsearch_Home/plugins/x-pack/*
Elasticsearch_Home/modules/transport-netty3/*
Elasticsearch_Home/modules/transport-netty4/*
Elasticsearch_Home/modules/reindex/*
The path can include the * wildcard character in order to include all of the JAR files in that directory in the associated classpath. Do not use *.jar.
The following is an example of the correctly configured classpath:
gg.classpath=Elasticsearch_Home/lib/*
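As an illustration of the wildcard rule, the following small helper assembles a gg.classpath line from a list of directories and rejects the *.jar form. The helper name is hypothetical, and the colon separator is taken from the sample property files in this chapter, so it assumes a UNIX-style classpath.

```python
def build_gg_classpath(directories):
    """Return a gg.classpath line with a trailing * wildcard on each directory."""
    entries = []
    for directory in directories:
        if directory.endswith("*.jar"):
            raise ValueError("use a trailing * wildcard, not *.jar: " + directory)
        if directory.endswith("*"):
            entries.append(directory)                    # wildcard already present
        else:
            entries.append(directory.rstrip("/") + "/*")
    return "gg.classpath=" + ":".join(entries)


print(build_gg_classpath([
    "/path/to/elastic/lib",
    "/path/to/elastic/modules/transport-netty4/",
]))
```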
The following are the configurable values for the Elasticsearch Handler. These properties are located in the Java Adapter properties file (not in the Replicat properties file).
To enable the selection of the Elasticsearch Handler, you must first configure the handler type by specifying gg.handler.name.type=elasticsearch and the other Elasticsearch properties as follows:
Table 4-2 Elasticsearch Handler Configuration Properties
Properties | Required/ Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.handlerlist | Required | Name (any name of your choice) | None | The list of handlers to be used. |
gg.handler.name.type | Required | elasticsearch | None | Type of handler to use. For example, Elasticsearch, Kafka, Flume, or HDFS. |
gg.handler.name.ServerAddressList | Optional | | | Comma-separated list of contact points of the nodes to connect to the Elasticsearch cluster. |
gg.handler.name.clientSettingsFile | Required | Transport client properties file. | None | The filename in classpath that holds Elasticsearch transport client properties used by the Elasticsearch Handler. |
gg.handler.name.version | Optional | | | The version of the transport client used by the Elasticsearch Handler. This should be compatible with the Elasticsearch cluster. |
gg.handler.name.bulkWrite | Optional | | | When this property is |
gg.handler.name.numberAsString | Optional | | | When this property is |
gg.handler.name.upsert | Optional | | | When this property is |
Example 4-1 Sample Handler Properties file:
For 2.x Elasticsearch cluster:
gg.handlerlist=elasticsearch
gg.handler.elasticsearch.type=elasticsearch
gg.handler.elasticsearch.ServerAddressList=localhost:9300
gg.handler.elasticsearch.clientSettingsFile=client.properties
gg.handler.elasticsearch.version=2.x
gg.classpath=/path/to/elastic/lib/*
For 2.x Elasticsearch cluster with Shield:
gg.handlerlist=elasticsearch
gg.handler.elasticsearch.type=elasticsearch
gg.handler.elasticsearch.ServerAddressList=localhost:9300
gg.handler.elasticsearch.clientSettingsFile=client.properties
gg.handler.elasticsearch.version=2.x
gg.classpath=/path/to/elastic/lib/*:/path/to/elastic/plugins/shield/*
For 5.x Elasticsearch cluster:
gg.handlerlist=elasticsearch
gg.handler.elasticsearch.type=elasticsearch
gg.handler.elasticsearch.ServerAddressList=localhost:9300
gg.handler.elasticsearch.clientSettingsFile=client.properties
gg.handler.elasticsearch.version=5.x
gg.classpath=/path/to/elastic/lib/*:/path/to/elastic/modules/transport-netty4/*:/path/to/elastic/modules/reindex/*
For 5.x Elasticsearch cluster with x-pack:
gg.handlerlist=elasticsearch
gg.handler.elasticsearch.type=elasticsearch
gg.handler.elasticsearch.ServerAddressList=localhost:9300
gg.handler.elasticsearch.clientSettingsFile=client.properties
gg.handler.elasticsearch.version=5.x
gg.classpath=/path/to/elastic/lib/*:/path/to/elastic/plugins/x-pack/*:/path/to/elastic/modules/transport-netty4/*:/path/to/elastic/modules/reindex/*
A sample Replicat configuration file and a sample Java Adapter properties file can be found in the following directory:
GoldenGate_install_directory/AdapterExamples/big-data/elasticsearch
The Elasticsearch Handler uses a Java Transport client to interact with the Elasticsearch cluster. The Elasticsearch cluster may have additional plug-ins like Shield or X-Pack, which may require additional configuration.
The gg.handler.name.clientSettingsFile property should point to a file that has additional client settings based on the version of the Elasticsearch cluster. The Elasticsearch Handler attempts to locate and load the client settings file using the Java classpath. The Java classpath must include the directory containing the properties file.
The client properties file for Elasticsearch 2.x (without any plug-in) is:
cluster.name=Elasticsearch_cluster_name
The client properties file for Elasticsearch 2.x with the Shield plug-in is:
cluster.name=Elasticsearch_cluster_name
shield.user=shield_username:shield_password
The Shield plug-in also supports additional capabilities like SSL and IP filtering. The properties can be set in the client.properties file, see https://www.elastic.co/guide/en/elasticsearch/client/java-api/2.4/transport-client.html and https://www.elastic.co/guide/en/shield/current/_using_elasticsearch_java_clients_with_shield.html.
The client.properties file for Elasticsearch 5.x with the X-Pack plug-in is:
cluster.name=Elasticsearch_cluster_name
xpack.security.user=x-pack_username:x-pack-password
The X-Pack plug-in also supports additional capabilities. The properties can be set in the client.properties file, see https://www.elastic.co/guide/en/elasticsearch/client/java-api/5.1/transport-client.html and https://www.elastic.co/guide/en/x-pack/current/java-clients.html.
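Before starting Replicat, a client settings file could be checked against the configured version with a short script like the one below. It is a sketch under stated assumptions: the helper names are hypothetical, the file is assumed to hold simple key=value lines, and treating cluster.name as mandatory is an assumption based only on the samples above. The credential keys per version come from those same samples.

```python
CREDENTIAL_KEY = {"2.x": "shield.user", "5.x": "xpack.security.user"}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def check_client_settings(settings, version):
    """Return a list of problems found for the given handler version."""
    problems = []
    if "cluster.name" not in settings:
        problems.append("cluster.name is missing")
    for other_version, key in CREDENTIAL_KEY.items():
        if other_version != version and key in settings:
            problems.append(key + " belongs to " + other_version + ", not " + version)
    return problems


sample = "cluster.name=Elasticsearch_cluster_name\nshield.user=shield_username:shield_password\n"
print(check_client_settings(parse_properties(sample), "5.x"))
```

A Shield setting combined with version 5.x is flagged here because the troubleshooting section reports an "unknown setting [shield.user]" error for a wrong version configuration.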
The Elasticsearch Handler gg.handler.name.bulkWrite property determines whether the source trail records are pushed to the Elasticsearch cluster one at a time or in bulk using the bulk write API. When this property is true, the source trail operations are pushed to the Elasticsearch cluster in batches whose size can be controlled by the MAXTRANSOPS parameter in the generic Replicat parameter file. Using the bulk write API provides better performance.
Elasticsearch uses different thread pools to improve how the memory consumption of threads is managed within a node. Many of these pools also have queues associated with them, which allow pending requests to be held instead of discarded.
For bulk operations, the default queue size is 50 (in version 5.2) and 200 (in version 5.3).
To avoid bulk API errors, you must set the Replicat MAXTRANSOPS size to match the bulk thread pool queue size at a minimum. The thread_pool.bulk.queue_size configuration property can be modified in the elasticsearch.yml file.
Elasticsearch version 2.x supports a Shield plug-in, which provides basic authentication, SSL, and IP filtering. Similar capabilities exist in the X-Pack plug-in for Elasticsearch 5.x. The additional transport client settings can be configured in the Elasticsearch Handler using the gg.handler.name.clientSettingsFile property.
The Elasticsearch Handler does not react to any DDL records in the source trail. Any data manipulation records for a new source table result in the automatic creation of an index or type in the Elasticsearch cluster.
This section contains information to help you troubleshoot various issues.
The most common initial error is a classpath that does not include all the required client libraries, which results in a ClassNotFound exception in the log4j log file.
The error can also occur when a typographic error in the gg.classpath variable prevents the classpath from being resolved.
The Elasticsearch transport client libraries do not ship with the Oracle GoldenGate for Big Data product. You should properly configure the gg.classpath property in the Java Adapter Properties file to correctly resolve the client libraries, see Setting Up and Running the Elasticsearch Handler.
The Elasticsearch Handler gg.handler.name.version property must be set to 2.x or 5.x to match the major version number of the Elasticsearch cluster.
The following errors may occur when there is a wrong version configuration:
Error: NoNodeAvailableException[None of the configured nodes are available:]
ERROR 2017-01-30 22:35:07,240 [main] Unable to establish connection. Check handler properties and client settings configuration.
java.lang.IllegalArgumentException: unknown setting [shield.user]
Ensure that all required plug-ins are installed and review documentation changes for any removed settings.
To resolve this exception:
ERROR 2017-01-30 22:33:10,058 [main] Unable to establish connection. Check handler properties and client settings configuration.
Verify that the gg.handler.name.clientSettingsFile configuration property is correctly set to the Elasticsearch transport client settings file name. Verify that the gg.classpath variable includes the path to the directory containing that file and that the path to the properties file does not contain an asterisk (*) wildcard at the end.
This error occurs when the Elasticsearch Handler is unable to connect to the Elasticsearch cluster:
Error: NoNodeAvailableException[None of the configured nodes are available:]
Use the following steps to debug the issue:
Ensure that the Elasticsearch server process is running.
Validate the cluster.name property in the client properties configuration file.
Validate the authentication credentials for the X-Pack or Shield plug-in in the client properties file.
Validate the gg.handler.name.ServerAddressList handler property.
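Part of this checklist can be scripted. The sketch below parses a ServerAddressList value and tests whether each contact point accepts a TCP connection. The function names are hypothetical, and the host:port entry format is assumed from the sample property files. A successful TCP connection only shows that something is listening on the port. It does not validate the cluster name or the credentials.

```python
import socket


def parse_server_address_list(value):
    """Split 'host:port, host:port' into a list of (host, port) tuples."""
    addresses = []
    for entry in value.split(","):
        host, _, port = entry.strip().rpartition(":")
        if not host or not port.isdigit():
            raise ValueError("expected host:port, got: " + repr(entry))
        addresses.append((host, int(port)))
    return addresses


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for host, port in parse_server_address_list("localhost:9300"):
        print(host, port, "reachable" if is_reachable(host, port) else "unreachable")
```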
The following error occurs when the Elasticsearch Handler finds a TRUNCATE operation in the source trail:
oracle.goldengate.util.GGException: Elasticsearch Handler does not support the operation: TRUNCATE
This exception message is written to the handler log file before the Replicat process abends. Removing the GETTRUNCATES parameter from the Replicat parameter file resolves this error.
DEBUG [main] (ElasticSearch5DOTX.java:130) - Bulk execute status: failures:[true] buildFailureMessage:[failure in bulk execution: [0]: index [cs2cat_s1sch_n1tab], type [N1TAB], id [83], message [RemoteTransportException[[UOvac8l][127.0.0.1:9300][indices:data/write/bulk[s][p]]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$7@43eddfb2 on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@5ef5f412[Running, pool size = 4, active threads = 4, queued tasks = 50, completed tasks = 84]]];]
This error may occur when Elasticsearch runs out of resources to process the operation. You can limit the Replicat batch size using MAXTRANSOPS to match the value of the thread_pool.bulk.queue_size Elasticsearch configuration parameter.
Note:
Changes to the Elasticsearch parameter thread_pool.bulk.queue_size are effective only after the Elasticsearch node is restarted.
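When diagnosing this rejection, the queue capacity in effect can be read directly from the log message. The following sketch extracts the capacity and the number of queued tasks with a regular expression. The helper name is hypothetical, and the pattern is written against the sample message shown above, so other Elasticsearch versions may format the message differently.

```python
import re

_BULK_QUEUE = re.compile(
    r"EsThreadPoolExecutor\[bulk, queue capacity = (\d+).*?queued tasks = (\d+)"
)


def bulk_queue_usage(log_line):
    """Return (queue_capacity, queued_tasks), or None if the line does not match."""
    match = _BULK_QUEUE.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = (
    "EsRejectedExecutionException[rejected execution of "
    "org.elasticsearch.transport.TransportService$7@43eddfb2 on "
    "EsThreadPoolExecutor[bulk, queue capacity = 50, "
    "org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@5ef5f412"
    "[Running, pool size = 4, active threads = 4, queued tasks = 50, "
    "completed tasks = 84]]]"
)
print(bulk_queue_usage(line))
```

When the two numbers are equal, as in the sample message, the bulk queue was full at the moment the request was rejected.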
The following log messages appear in the handler log file on successful connection:
Connection to 2.x Elasticsearch cluster:
INFO 2017-01-31 01:43:38,814 [main] **BEGIN Elasticsearch client settings**
INFO 2017-01-31 01:43:38,860 [main] key[cluster.name] value[elasticsearch-user1-myhost]
INFO 2017-01-31 01:43:38,860 [main] key[request.headers.X-Found-Cluster] value[elasticsearch-user1-myhost]
INFO 2017-01-31 01:43:38,860 [main] key[shield.user] value[es_admin:user1]
INFO 2017-01-31 01:43:39,715 [main] Connecting to Server[myhost.us.example.com] Port[9300]
INFO 2017-01-31 01:43:39,715 [main] Client node name: Smith
INFO 2017-01-31 01:43:39,715 [main] Connected nodes: [{node-myhost}{vqtHikpGQP-IXieHsgqXjw}{10.196.38.196}{198.51.100.1:9300}]
INFO 2017-01-31 01:43:39,715 [main] Filtered nodes: []
INFO 2017-01-31 01:43:39,715 [main] **END Elasticsearch client settings**
Connection to a 5.x Elasticsearch cluster:
INFO [main] (Elasticsearch5DOTX.java:38) - **BEGIN Elasticsearch client settings**
INFO [main] (Elasticsearch5DOTX.java:39) - {xpack.security.user=user1:user1_kibana, cluster.name=elasticsearch-user1-myhost, request.headers.X-Found-Cluster=elasticsearch-user1-myhost}
INFO [main] (Elasticsearch5DOTX.java:52) - Connecting to Server[myhost.us.example.com] Port[9300]
INFO [main] (Elasticsearch5DOTX.java:64) - Client node name: _client_
INFO [main] (Elasticsearch5DOTX.java:65) - Connected nodes: [{node-myhost}{w9N25BrOSZeGsnUsogFn1A}{bIiIultVRjm0Ze57I3KChg}{myhost}{198.51.100.1:9300}]
INFO [main] (Elasticsearch5DOTX.java:66) - Filtered nodes: []
INFO [main] (Elasticsearch5DOTX.java:68) - **END Elasticsearch client settings**
Elasticsearch: Trying to input very large number
Very large numbers, for example 9223372036854775807 and -9223372036854775808, result in inaccurate values in the Elasticsearch document. This is an issue with the Elasticsearch server and not a limitation of the Elasticsearch Handler.
The workaround for this issue is to ingest all the number values as strings using the gg.handler.name.numberAsString=true property.
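One common way for this kind of inaccuracy to arise is a 64-bit integer passing through a double-precision floating-point value, which cannot represent every integer above 2^53. Whether this is the exact mechanism inside the Elasticsearch server is an assumption. The sketch below only demonstrates the general effect and why a string representation avoids it.

```python
def survives_double(value: int) -> bool:
    """Return True if the integer is unchanged after a round trip through a double."""
    return int(float(value)) == value


largest_long = 9223372036854775807

# The round trip through a double changes the last digits.
print(int(float(largest_long)))
print(survives_double(largest_long))

# Sent as a string, every digit is preserved.
print(str(largest_long))
```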
Elasticsearch: Issue with index
The Elasticsearch Handler cannot write data into the same index when more than one table has similar column names but different column data types.
Index names are always lowercase, though the catalog/schema/tablename in the trail may be case-sensitive.