The role of the parser is to translate JMS text message data and header properties into an appropriate set of transactions and operations to pass into the VAM interface. To do this, the parser must always find certain data:
Other data will be used if the configuration requires it:
The parser can obtain this data from JMS header properties, system generated values, static values, or in some parser-specific way. This depends on the nature of the piece of information.
The Oracle GoldenGate message capture adapter supports three types of parsers:
Fixed – Messages contain data presented as fixed width fields in contiguous text.
Delimited – Messages contain data delimited by field and end of record characters.
XML – Messages contain XML data accessed through XPath expressions.
There are several ways source data definitions can be defined using a combination of properties and external files. The Oracle GoldenGate Gendef utility generates a standard source definitions file based on these data definitions and parser properties. The options vary based on parser type:
Delimited – source definitions or user defined
XML – source definitions or user defined
There are several properties that configure how the selected parser gets data and how the source definitions are converted to target definitions.
The following information is required for the parsers to translate the messages:
The transaction identifier (txid) groups operations into transactions as they are written to the Oracle GoldenGate trail file. The Oracle GoldenGate message capture adapter supports only contiguous, non-interleaved transactions. The transaction identifier can be any unique value that increases for each transaction. A system generated value can generally be used.
The sequence identifier (seqid) identifies each operation internally. This can be used during recovery processing to identify operations that have already been written to the Oracle GoldenGate trail. The sequence identifier can be any unique value that increases for each operation. The length should be fixed.
The JMS Message ID can be used as a sequence identifier if the message identifier for that provider is unique and increasing. However, there are cases (for example, when clustering is used or transactions fail) where JMS does not guarantee message order, or where the ID may be unique but not increasing. The system generated Sequence ID can be used, but it can cause duplicate messages in some recovery situations. The recommended approach is to have the JMS client that adds messages to the queue set the Message ID, a header property, or some data element to an application-generated unique value that is increasing.
The table name is used to identify the logical table to which the column data belongs. The adapter requires a two part table name in the form SCHEMA_NAME.TABLE_NAME. This can either be defined separately (schema and table) or as a combination of schema and table (schemaandtable).
A single field may contain both schema and table name, they may be in separate fields, or the schema may be included in the software code so only the table name is required. How the schema and table names can be specified depends on the parser. In any case the two part logical table name is used to write records in the Oracle GoldenGate trail and to generate the source definitions file that describes the trail.
The operation type (optype) is used to determine whether an operation is an insert, update or delete when written to the Oracle GoldenGate trail. The operation type value for any specific operation is matched against the values defined for each operation type.
The data written to the Oracle GoldenGate trail for each operation type depends on the Extract configuration:
Inserts
The after values of all columns are written to the trail.
Updates
Default – The after values of keys are written. The after values of columns that have changed are written if the before values are present and can be compared. If before values are not present then all columns are written.
NOCOMPRESSUPDATES – The after values of all columns are written to the trail.
GETUPDATEBEFORES – The before and after values of columns that have changed are written to the trail if the before values are present and can be compared. If before values are not present, only after values are written.
If both NOCOMPRESSUPDATES and GETUPDATEBEFORES are included, the before and after values of all columns are written to the trail if before values are present.
Deletes
Default – The before values of all keys are written to the trail.
NOCOMPRESSDELETES – The before values of all columns are written to the trail.
Primary key update operations may also be generated if the before values of keys are present and do not match the after values.
All parsers retrieve column data from the message text and write it to the Oracle GoldenGate trail. In some cases the columns are read in index order as defined by the source definitions, in other cases they are accessed by name.
Depending on the configuration and original message text, both before and after or only after images of the column data may be available. For updates, the data for non-updated columns may or may not be available.
All column data is retrieved as text. It is converted internally into the correct data type for that column based on the source definitions. Any conversion problem will result in an error and the process will abend.
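The conversion step described above can be sketched as follows. This is an illustrative Python sketch, not the adapter's internal code: the type names NUMBER and CHAR stand in for whatever the source definitions declare, and the fatal exit mirrors the documented abend on conversion failure.

```python
# Sketch: every column arrives as text and is converted per the
# source-definitions data type; a failed conversion is fatal.
from decimal import Decimal, InvalidOperation

def convert(text, datatype):
    try:
        if datatype == "NUMBER":
            return Decimal(text)      # exact decimal, no float rounding
        if datatype == "CHAR":
            return text
        raise ValueError("unknown data type: " + datatype)
    except (InvalidOperation, ValueError) as exc:
        # In the adapter this is an abend, not a recoverable error.
        raise SystemExit("conversion failed for %r: %s" % (text, exc))

print(convert("12345.67", "NUMBER"))  # 12345.67
```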
The following data may be included, but is not required.
The relationship of transactions to messages can be:
One transaction per message
This is determined automatically by the scope of the message.
Multiple transactions per message
This is determined by the transaction indicator (txind). If there is no transaction indicator, the XML parser can create transactions based on a matching transaction rule.
Multiple messages per transaction
The transaction indicator (txind) is required to specify whether the operation is the beginning, middle, end or the whole transaction. The transaction indicator value for any specific operation is matched against the values defined for each transaction indicator type. A transaction is started if the indicator value is beginning or whole, continued if it is middle, and ended if it is end or whole.
Fixed width parsing is based on a data definition that defines the position and length of each field, in the format of a COBOL copybook. A set of properties defines rules for mapping the copybook to logical records in the Oracle GoldenGate trail and in the source definitions file.
The incoming data should consist of a standard format header followed by a data segment. Both should contain fixed width fields. The data is parsed based on the PIC definition in the copybook and written to the trail, translated as explained in Header and Record Data Type Translation.
The header must be defined by a copybook 01 level record that includes the following:
A code to indicate the type of operation: insert, update, or delete
The copybook record name to use when parsing the data segment
Any fields in the header record that are not mapped to Oracle GoldenGate header fields are output as columns.
The following example shows a copybook definition containing the required header values.
Example 10-1 Specifying a Header
01 HEADER.
    20 Hdr-Timestamp          PIC X(23)
    20 Hdr-Source-DB-Function PIC X
    20 Hdr-Source-DB-Rec-ID   PIC X(8)
For the above example, you must set the following properties:
fixed.header=HEADER
fixed.timestamp=Hdr-Timestamp
fixed.optype=Hdr-Source-DB-Function
fixed.table=Hdr-Source-DB-Rec-Id
The logical table name output in this case will be the value of Hdr-Source-DB-Rec-Id.
More than one field can be used for a table name. For example, you can define the logical schema name through a static property such as:
fixed.schema=MYSCHEMA
Then you can add a property that defines the data record as multiple fields from the copybook header definition.
Example 10-2 Specifying Compound Table Names
01 HEADER.
    20 Hdr-Source-DB             PIC X(8).
    20 Hdr-Source-DB-Rec-Id      PIC X(8).
    20 Hdr-Source-DB-Rec-Version PIC 9(4).
    20 Hdr-Source-DB-Function    PIC X.
    20 Hdr-Timestamp             PIC X(22).
For the above example, you must set the following properties:
fixed.header=HEADER
fixed.table=Hdr-Source-DB-Rec-Id,Hdr-Source-DB-Rec-Version
fixed.schema=MYSCHEMA
The fields will be concatenated to result in logical schema and table names of the form:
MYSCHEMA.Hdr-Source-DB-Rec-Id+Hdr-Source-DB-Rec-Version
A timestamp is parsed using the default format YYYY-MM-DD HH:MM:SS.FFF, with FFF depending on the size of the field.
Specify different incoming formats by entering a comment before the datetime field as shown in the next example.
Example 10-3 Specifying timestamp formats
01 HEADER.
* DATEFORMAT YYYY-MM-DD-HH.MM.SS.FF
    20 Hdr-Timestamp PIC X(23)
Use properties to map the standard Oracle GoldenGate operation types to the optype values. The following example specifies that the operation type is in the Hdr-Source-DB-Function field and that the value for insert is A, update is U, and delete is D.
Example 10-4 Specifying the Function
fixed.optype=Hdr-Source-DB-Function
fixed.optype.insert=A
fixed.optype.update=U
fixed.optype.delete=D
The data in the header and the record data are written to the trail based on the translated data type.
A field definition preceded by a date format comment is translated to an Oracle GoldenGate datetime field of the specified size. If there is no date format comment, the field will be defined by its underlying data type.
A PIC X field is translated to the CHAR data type of the indicated size.
A PIC 9 field is translated to a NUMBER data type with the defined precision and scale. Numbers that are signed or unsigned, and those with or without decimals, are supported.
The following examples show the translation for various PIC definitions.
| Input | Output |
|---|---|
| PIC XX | CHAR(2) |
| PIC X(16) | CHAR(16) |
| PIC 9(4) | NUMBER(4) |
| * YYMMDD PIC 9(6) | DATE(10) YYYY-MM-DD |
| PIC 99.99 | NUMBER(4,2) |
| PIC 9(5)V99 | NUMBER(7,2) |
In the example an input YYMMDD date of 100522 is translated to 2010-05-22. The number 1234567 with the specified format PIC 9(5)V99 is translated to a seven digit number with two decimal places, or 12345.67.
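The implied-decimal behavior of the V clause can be sketched in a few lines. This is an illustrative Python sketch of the documented translation, not the adapter's implementation; Decimal is used to keep the value exact.

```python
# Sketch: interpret a raw digit string against a PIC 9(n)Vmm clause.
# The V marks an implied decimal point and occupies no character in
# the data, so PIC 9(5)V99 describes exactly 7 digits.
from decimal import Decimal

def pic9_to_number(raw, int_digits, dec_digits):
    if len(raw) != int_digits + dec_digits:
        raise ValueError("field width does not match PIC definition")
    if dec_digits == 0:
        return Decimal(raw)
    return Decimal(raw[:int_digits] + "." + raw[int_digits:])

print(pic9_to_number("1234567", 5, 2))  # 12345.67
```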
A comment is used to identify key columns within the data record. The Gendef utility that generates the source definitions uses the comment to locate a key column.
In the following example Account has been marked as a key column for TABLE1.

01 TABLE1
* KEY
    20 Account     PIC X(19)
    20 PAN_Seq_Num PIC 9(3)
Delimited parsing is based on a preexisting source definitions file and a set of properties. The properties specify the delimiters to use and other rules, such as whether there are column names and before values. The source definitions file determines the valid tables to be processed and the order and data type of the columns in the tables.
The format of the delimited message is:
{METACOLS}n[,{COLNAMES}]m[,{COLBEFOREVALS}]m,{COLVALUES}m\n
Where:
There can be n metadata columns each followed by a field delimiter such as the comma shown in the format statement.
There can be m column values. Each of these are preceded by a field delimiter such as a comma.
The column name and before value are optional.
Each record is terminated by an end of line delimiter, such as \n.
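The record layout above can be sketched as a simple split. This is an illustrative Python sketch under simplifying assumptions: a comma field delimiter, no quoting or escapes (those are covered below), and a known count of metadata columns; the five metadata names and the sample values are hypothetical.

```python
# Sketch: split one delimited record into metadata columns and
# column values per the {METACOLS}n,{COLVALUES}m\n layout.

def split_record(record, num_metacols, fielddelim=",", recorddelim="\n"):
    fields = record.rstrip(recorddelim).split(fielddelim)
    return fields[:num_metacols], fields[num_metacols:]

# Five metadata columns (e.g. optype, timestamp, schemaandtable,
# txind, id) followed by two column values:
rec = "I,2010-05-22 10:11:21.000,MY.TABLE,3,1001,keyval,col1val\n"
meta, cols = split_record(rec, 5)
print(meta)  # ['I', '2010-05-22 10:11:21.000', 'MY.TABLE', '3', '1001']
print(cols)  # ['keyval', 'col1val']
```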
The metadata columns correspond to the header and contain fields that have special meaning. Metadata columns should include the following information.
optype contains values indicating whether the record is an insert, update, or delete. The default values are I, U, and D.
timestamp indicates the value to use as the commit timestamp of the record. The format of the timestamp defaults to YYYY-DD-MM HH:MM:SS.FFF.
schemaandtable is the full table name for the record in the format SCHEMA.TABLE.
schema is the record's schema name.
txind is a value that indicates whether the record is the beginning, middle, end or the only record in the transaction. The default values are 0, 1, 2, 3.
id is the value used as the sequence number (RSN or CSN) of the record. The id of the first record (operation) in the transaction is used for the sequence number of the transaction.
Properties can be set to describe delimiters, values, and date and time formats.
The following properties determine the parsing rules for delimiting the record.
fielddelim specifies one or more ASCII or hexadecimal characters as the value for the field delimiter
recorddelim specifies one or more ASCII or hexadecimal characters as the value for the record delimiter
quote specifies one or more ASCII or hexadecimal characters to use for quoted values
nullindicator specifies one or more ASCII or hexadecimal characters to use for NULL values
You can define escape characters for the delimiters so they will be replaced if the characters are found in the text. For example if a backslash and apostrophe (\') are specified, then the input "They used Mike\'s truck" is translated to "They used Mike's truck". Or if two quotes ("") are specified, "They call him ""Big Al""" is translated to "They call him "Big Al"".
Data values may be present in the record without quotes, but the system only removes escape characters within quoted values. A non-quoted string that matches a null indicator is treated as null.
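The doubled-quote escape described above can be sketched as follows. This is an illustrative Python sketch assuming the quote character is the double quote; as stated above, escapes are only removed within quoted values, so unquoted input passes through unchanged.

```python
# Sketch: strip surrounding quotes and collapse doubled quote
# characters inside a quoted value.

def unquote(value, quote='"'):
    if len(value) >= 2 and value.startswith(quote) and value.endswith(quote):
        inner = value[1:-1]
        return inner.replace(quote * 2, quote)
    return value

print(unquote('"They call him ""Big Al"""'))  # They call him "Big Al"
print(unquote('plain value'))                 # plain value
```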
The following properties provide more information:
hasbefores indicates before values are present for each record
hasnames indicates column names are present for each record
afterfirst indicates column after values come before column before values
isgrouped indicates all column names, before values and after values are grouped together in three blocks, rather than alternately per column
The default format YYYY-DD-MM HH:MM:SS.FFF is used to parse dates. You can use properties to override this at the global, table, or column level. Examples of changing the format are shown below.
delim.dateformat.default=MM/DD/YYYY-HH:MM:SS
delim.dateformat.MY.TABLE=DD/MMM/YYYY
delim.dateformat.MY.TABLE.COL1=MMYYYY
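A pattern in this notation can be translated to a strptime format for parsing. This is an illustrative Python sketch: the token set covered (YYYY, MMM, MM, DD, HH, SS) is an assumption drawn from the examples above, and the notation's reuse of MM for both month and minute is handled by converting the time block first.

```python
# Sketch: translate a dateformat pattern such as MM/DD/YYYY-HH:MM:SS
# into a strptime format string.
from datetime import datetime

def to_strptime(pattern):
    # Handle the time block first so any remaining MM tokens mean month.
    pattern = pattern.replace("HH:MM:SS", "%H:%M:%S")
    for token, directive in (("YYYY", "%Y"), ("MMM", "%b"),
                             ("MM", "%m"), ("DD", "%d")):
        pattern = pattern.replace(token, directive)
    return pattern

print(to_strptime("MM/DD/YYYY-HH:MM:SS"))  # %m/%d/%Y-%H:%M:%S
parsed = datetime.strptime("05/22/2010-10:11:21",
                           to_strptime("MM/DD/YYYY-HH:MM:SS"))
print(parsed.year)  # 2010
```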
The steps in delimited parsing are:
XML parsing is based on a preexisting source definitions file and a set of properties. The properties specify rules to determine XML elements and attributes that correspond to transactions, operations and columns. The source definitions file determines the valid tables to be processed and the ordering and data types of columns in those tables.
The XML message is formatted in either dynamic or static XML. The contents of dynamic XML are data values that cannot be predetermined from a sample XML or XSD document. The contents of static XML that determine tables and column element or attribute names can be predetermined from those sample documents.
The following two examples contain the same data.
Example 10-5 An Example of Static XML
<NewMyTableEntries>
  <NewMyTableEntry>
    <CreateTime>2010-02-05:10:11:21</CreateTime>
    <KeyCol>keyval</KeyCol>
    <Col1>col1val</Col1>
  </NewMyTableEntry>
</NewMyTableEntries>
The NewMyTableEntries element marks the transaction boundaries. The NewMyTableEntry element indicates an insert to MY.TABLE. The timestamp is present in an element text value, and the column names are indicated by element names.
You can define rules in the properties file to parse either of these two styles of XML through a set of XPath-like properties. The goal of the properties is to map the XML to a predefined source definitions file through XPath matches.
Example 10-6 An Example of Dynamic XML
<transaction id="1234" ts="2010-02-05:10:11:21">
  <operation table="MY.TABLE" optype="I">
    <column name="keycol" index="0">
      <aftervalue><![CDATA[keyval]]></aftervalue>
    </column>
    <column name="col1" index="1">
      <aftervalue><![CDATA[col1val]]></aftervalue>
    </column>
  </operation>
</transaction>
Every operation to every table has the same basic message structure consisting of transaction, operation and column elements. The table name, operation type, timestamp, column names, column values, etc. are obtained from attribute or element text values.
Independent of the style of XML, the parsing process needs to determine:
Transaction boundaries
Operation entries and metadata including:
Column entries and metadata including:
Either the column name or index; if both are specified the system will check to see if the column with the specified data has the specified name.
Column before or after values, sometimes both.
This is done through a set of interrelated rules. For each type of XML message that is to be processed you name a rule that will be used to obtain the required data. For each of these named rules you add properties to:
Specify the rule as a transaction, operation, or column rule type. Rules of any type are required to have a specified name and type.
Specify the XPath expression to match to see if the rule is active for the document being processed. This is optional; if not defined the parser will match the node of the parent rule or the whole document if this is the first rule.
List detailed rules (subrules) that are to be processed in the order listed. Which subrules are valid is determined by the rule type. Subrules are optional.
In the following example the top-level rule is defined as genericrule. It is a transaction type rule. Its subrules are defined in oprule and they are of the type operation.
xmlparser.rules=genericrule
xmlparser.rules.genericrule.type=tx
xmlparser.rules.genericrule.subrules=oprule
xmlparser.rules.oprule.type=op
The XML parser supports a subset of XPath expressions necessary to match elements and extract data. An expression can be used to match a particular element or to extract data.
When doing data extraction most of the path is used to match. The tail of the expression is used for extraction.
| Supported Constructs | Description |
|---|---|
| /e | Use the absolute path from the root of the document to match |
| ./e or e | Use the relative path from current node being processed to match |
| ../e | Use a path based on the parent of the current node (can be repeated) to match |
| //e | Match |
| * | Match any element. Note: Partially wild-carded names are not supported. |
| [n] | Match the nth occurrence of an expression. |
| [x=v] | Match when x is equal to some value v where x can be: |
| Supported Expressions | Descriptions |
|---|---|
| /My/Element | Match root element |
| ./Sub/Element | Match sub element to current node |
| /My/*[n] | Match nth element |
| /My/Some[n] | Match nth Some element |
| /My/*[text() ='value'] | Match any text value |
| /My/Some[text() = 'value'] | Match the text in Some element |
| /My/*[@att = 'value'] | Match any attribute |
| /My/Some[@att = 'value'] | Match the attribute in Some element |
In addition to matching paths, the XPath expressions can also be used to obtain data values, either absolutely or relative to the current node being processed. Data value expressions can contain any of the path elements above, but must end with one of the value accessors listed below.
| Value Accessors | Description |
|---|---|
| @att | Some attribute value. |
| text() | The text content (value) of an element. |
| content() | The full content of an element, including any child XML nodes. |
| name() | The name of an element. |
| position() | The position of an element in its parent. |
Example 10-7 Examples of Extracting Data Values
To extract the relative element text value:
/My/Element/text()
To extract the absolute attribute value:
/My/Element/@att
To extract element text value with a match:
/My/Some[@att = 'value']/Sub/text()
Note:
Path accessors, such as ancestor/descendent/self, are not supported.
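The kinds of matches shown above can be tried out with Python's ElementTree, whose find() supports a similarly limited XPath dialect. This sketch only illustrates the style of matching; the adapter uses its own XPath engine with the accessors listed above, and the sample document here is hypothetical.

```python
# Sketch: attribute-predicate matching and value extraction with the
# limited XPath subset that xml.etree supports.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<My><Some att='value'><Sub>inner text</Sub></Some></My>")

# Match an element by attribute value, then extract child text
# (analogous to /My/Some[@att = 'value']/Sub/text()):
elem = doc.find("./Some[@att='value']/Sub")
print(elem.text)                      # inner text

# Extract an attribute value (analogous to /My/Some/@att):
print(doc.find("./Some").get("att"))  # value
```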
The values extracted by the XML parser are either column values or properties of the transaction or operation, such as table or timestamp. These values are either obtained from XML using XPath or through properties of the JMS message, system values, or hard coded values. The XML parser properties specify which of these options are valid for obtaining the values for that property.
The following example specifies that timestamp can be an XPath expression, a JMS property, or the system generated timestamp.
{txrule}.timestamp={xpath-expression}|${jms-property}|*ts
The next example specifies that table can be an XPath expression, a JMS property, or a hard coded value.
{oprule}.table={xpath-expression}|${jms-property}|"value"
The last example specifies that name can be an XPath expression or a hard coded value.
{colrule}.name={xpath-expression}|"value"
The rule that specifies the boundary for a transaction is at the highest level. Messages may contain a single transaction, multiple transactions, or a part of a transaction that spans messages. These are specified as follows:
single - The transaction rule match is not defined.
multiple - Each transaction rule match defines new transaction.
span – No transaction rule is defined; instead a transaction indicator is specified in an operation rule.
For a transaction rule, the following properties of the rule may also be defined through XPath or other expressions:
timestamp – The time at which the transaction occurred.
txid – The identifier for the transaction.
Transaction rules can have multiple subrules, but each must be of type operation.
The following example specifies a transaction that is the whole message and includes a timestamp that comes from the JMS property.
Example 10-8 JMS Timestamp
singletxrule.timestamp=$JMSTimeStamp
The following example matches the root element Transaction and obtains the timestamp from the ts attribute.
Example 10-9 ts Timestamp
dyntxrule.match=/Transaction
dyntxrule.timestamp=@ts
An operation rule can either be a subrule of a transaction rule, or a highest level rule (if the transaction is a property of the operation).
In addition to the standard rule properties, an operation rule should also define the following through XPath or other expressions:
timestamp – The timestamp of the operation. This is optional if the transaction rule is defined.
table – The name of the table on which this is an operation. Use this with schema.
schema – The name of schema for the table.
schemaandtable – Both schema and table name together in the form SCHEMA.TABLE. This can be used in place of the individual table and schema properties.
optype – Specifies whether this is an insert, update or delete operation based on optype values:
optype.insertval – The value indicating an insert. The default is I.
optype.updateval – The value indicating an update. The default is U.
optype.deleteval – The value indicating a delete. The default is D.
seqid – The identifier for the operation. This will be the transaction identifier if txid has not already been defined at the transaction level.
txind – Specifies whether this operation is the beginning of a transaction, in the middle or at the end; or if it is the whole operation. This property is optional and not valid if the operation rule is a subrule of a transaction rule.
Operation rules can have multiple subrules of type operation or column.
The following example dynamically obtains operation information from the /Operation element of a /Transaction.
Example 10-10 Operation
dynoprule.match=./Operation
dynoprule.schemaandtable=@table
dynoprule.optype=@type
The following example statically matches the /NewMyTableEntry element to an insert operation on the MY.TABLE table.
Example 10-11 Operation example
statoprule.match=./NewMyTableEntry
statoprule.schemaandtable="MY.TABLE"
statoprule.optype="I"
statoprule.timestamp=./CreateTime/text()
A column rule must be a subrule of an operation rule. In addition to the standard rule properties, a column rule should also define the following through XPath or other expressions.
name – The name of the column within the table definition.
index – The index of the column within the table definition.
Note:
If only one of name and index is defined, the other will be determined.
before.value – The before value of the column. This is required for deletes, but is optional for updates.
before.isnull – Indicates whether the before value of the column is null.
before.ismissing – Indicates whether the before value of the column is missing.
after.value – The after value of the column. This is required for inserts and updates, but is optional for deletes.
after.isnull – Indicates whether the after value of the column is null.
after.ismissing – Indicates whether the after value of the column is missing.
value – An expression to use for both before.value and after.value unless overridden by specific before or after values. Note that this does not support different before values for updates.
isnull – An expression to use for both before.isnull and after.isnull unless overridden.
ismissing – An expression to use for both before.ismissing and after.ismissing unless overridden.
The following example dynamically obtains column information from the /Column element of an /Operation.
Example 10-12 Dynamic Extraction of Column Information
dyncolrule.match=./Column
dyncolrule.name=@name
dyncolrule.before.value=./beforevalue/text()
dyncolrule.after.value=./aftervalue/text()
The following example statically matches the /KeyCol and /Col1 elements to columns in MY.TABLE.
Example 10-13 Static Matching of Elements to Columns
statkeycolrule.match=/KeyCol
statkeycolrule.name="keycol"
statkeycolrule.value=./text()
statcol1rule.match=/Col1
statcol1rule.name="col1"
statcol1rule.value=./text()
The following example uses the XML samples shown earlier with appropriate rules to generate the same resulting operation on the MY.TABLE table.
Dynamic XML:

<transaction id="1234" ts="2010-02-05:10:11:21">
  <operation table="MY.TABLE" optype="I">
    <column name="keycol" index="0">
      <aftervalue><![CDATA[keyval]]></aftervalue>
    </column>
    <column name="col1" index="1">
      <aftervalue><![CDATA[col1val]]></aftervalue>
    </column>
  </operation>
</transaction>

Static XML:

<NewMyTableEntries>
  <NewMyTableEntry>
    <CreateTime>2010-02-05:10:11:21</CreateTime>
    <KeyCol>keyval</KeyCol>
    <Col1>col1val</Col1>
  </NewMyTableEntry>
</NewMyTableEntries>
Dynamic rules:

dyntxrule.match=/Transaction
dyntxrule.timestamp=@ts
dyntxrule.subrules=dynoprule
dynoprule.match=./Operation
dynoprule.schemaandtable=@table
dynoprule.optype=@type
dynoprule.subrules=dyncolrule
dyncolrule.match=./Column
dyncolrule.name=@name
dyncolrule.before.value=./beforevalue/text()
dyncolrule.after.value=./aftervalue/text()

Static rules:

stattxrule.match=/NewMyTableEntries
stattxrule.subrules=statoprule
statoprule.match=./NewMyTableEntry
statoprule.schemaandtable="MY.TABLE"
statoprule.optype="I"
statoprule.timestamp=./CreateTime/text()
statoprule.subrules=statkeycolrule, statcol1rule
statkeycolrule.match=/KeyCol
statkeycolrule.name="keycol"
statkeycolrule.value=./text()
statcol1rule.match=/Col1
statcol1rule.name="col1"
statcol1rule.value=./text()
INSERT INTO MY.TABLE (KEYCOL, COL1) VALUES ('keyval', 'col1val')
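What the static rules above accomplish can be sketched as a toy parser in Python. This is illustrative only: the constants ("MY.TABLE", "I", and the element-to-column mapping) mirror the statoprule, statkeycolrule, and statcol1rule properties shown above, and the output dictionary is a stand-in for the operation written to the trail.

```python
# Sketch: walk the static sample XML and emit the same logical insert
# that the static rules above produce.
import xml.etree.ElementTree as ET

STATIC_XML = """
<NewMyTableEntries>
  <NewMyTableEntry>
    <CreateTime>2010-02-05:10:11:21</CreateTime>
    <KeyCol>keyval</KeyCol>
    <Col1>col1val</Col1>
  </NewMyTableEntry>
</NewMyTableEntries>
"""

COLUMN_RULES = {"KeyCol": "keycol", "Col1": "col1"}  # element -> column

def parse_static(xml_text):
    ops = []
    root = ET.fromstring(xml_text)                    # transaction boundary
    for entry in root.findall("./NewMyTableEntry"):   # one insert per entry
        ops.append({
            "table": "MY.TABLE",
            "optype": "I",
            "timestamp": entry.findtext("./CreateTime"),
            "columns": {col: entry.findtext("./" + el)
                        for el, col in COLUMN_RULES.items()},
        })
    return ops

print(parse_static(STATIC_XML))
```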
Oracle GoldenGate for Java includes a Gendef utility that generates an Oracle GoldenGate source definitions file from the properties defined in a properties file. It creates a normalized definition of tables based on the property settings and other parser-specific data definition values.
The syntax to run this utility is:
gendef -prop {property_file} [-out {output_file}]
By default the utility sends the source definitions to standard out, but they can be directed to a file using the -out parameter. For example:
gendef -prop dirprm/jmsvam.properties -out dirdef/msgdefs.def
The output source definitions file can then be used in a pump or delivery process to interpret the trail data created through the VAM.