Why use TiDB in Hive as the Metastore database?
TiDB is a distributed SQL database built by PingCAP and its open-source community. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability. It's a one-stop solution for both Online Transactional Processing (OLTP) and Online Analytical Processing (OLAP) workloads.
In scenarios with enormous amounts of data, due to TiDB's distributed architecture, query performance is not limited to the capability of a single machine. When the data volume reaches the bottleneck, you can add nodes to improve TiDB's storage capacity.
Because TiDB is compatible with the MySQL protocol, it's easy to switch Hive's Metastore database to TiDB. You can use TiDB as if you were using MySQL, with almost no changes:
- For the existing Hive cluster, you can use the
mysqldump
tool to replicate all data in MySQL to TiDB. - You can use the metadata initialization tool that comes with Hive to create a new Hive cluster
How to create a Hive cluster with TiDB
Creating a Hive cluster with TiDB involves the following steps:
- Meet component requirements
- Install a Hive cluster
- Deploy a TiDB cluster
- Configure Hive
- Initialize metadata
- Launch Metastore and test
Components required
Component | Version |
Hive | 3.1.2 |
Hadoop | 2.6.0-cdh-5.16.1 |
TiDB | 4.0 |
Java Development Kit (JDK) | 1.8.0_221 |
There are no mandatory requirements for the component versions, as long as the components are compatible with each other. After you confirm that you have successfully installed Hadoop and JDK and can use them directly, you can move on to the next step.
Install a Hive cluster
Step 1: Deploy a TiDB cluster
To set up a TiDB cluster, refer to this document.
Create a Hive user in TiDB and set a password.
Create a database named
hive
and grant privileges to thehive
user.-- Create a database for Hive Metastore. create database hive; -- Create a user and password for Hive Metastore. create user 'hive'@'%' identified by '123456'; -- Grant privileges to the user. grant all privileges on hive.* to 'hive'@'%' identified by '123456'; -- Flush privileges. flush privileges;
Set the configuration item.
set global tidb_skip_isolation_level_check=1;
If you don't set the configuration item, Metastore throws the following exception when it is running:
MetaException(message:The isolation level 'SERIALIZABLE' is not supported. Set tidb_skip_isolation_level_check=1 to skip this error)
Step 2: Configure Hive
Download and decompress Hive. In this example, the decompression directory for Hive is ${HIVE_HOME}.
To edit the
hive-site.xml
configuration file, runvim ${HIVE_HOME}/conf/hive-site.xml
. (The configuration items only use the minimum configuration.)<configuration> <property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://host:port/hive</value> <description>TiDB address</description> </property> <property> <name>javax.jdo.option.ConnectionUserName</name> <value>hive</value> <description>TiDB username</description> </property> <property> <name>javax.jdo.option.ConnectionPassword</name> <value>123456</value> <description>TiDB password</description> </property> <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>com.mysql.jdbc.Driver</value> </property> <property> <name>hive.metastore.uris</name> <value>thrift://localhost:9083</value> </property> <property> <name>hive.metastore.schema.verification</name> <value>false</value> </property> </configuration>
To edit the hive-env.sh configuration file, run vim ${HIVE_HOME}/conf/hive-env.sh.
export HADOOP_HOME=... export JAVA_HOME=...
Copy mysql-connector-java-${version}.jar to the lib directory in Hive.
cp ${MYSQL_JDBC_PATH}/mysql-connector-java-${version}.jar ${HIVE_HOME}/lib
Step 3: Initialize metadata
You're performing this step to create a table for Hive metadata. The SQL script is in ${HIVE_HOME}/scripts/metastore/upgrade/mysql.
To initialize metadata, run the following command.
${HIVE_HOME}/bin/schematool -dbType mysql -initSchema --verbose
When schemaTool completed appears in the last line, it means the metadata is successfully initialized.
Step 4: Launch Metastore and test
Launch Metastore.
${HIVE_HOME}/bin/hive --service metastore
Start the Hive client for testing.
${HIVE_HOME}/bin/hive
Conclusion
If you use MySQL as the Hive Metastore database, as data grows in Hive, MySQL might become the bottleneck for the entire system. In this case, TiDB is a good solution, because it is compatible with the MySQL protocol and has excellent horizontal scalability. Due to its distributed architecture, TiDB far outperforms MySQL on large data sets and large numbers of concurrent queries.
This post showed how to deploy a Hive cluster with TiDB as the Metastore database. We hope TiDB can help you horizontally scale your Hive Metastore to meet your growing business needs.
In addition, if you're interested in our MySQL-to-TiDB migration story, check out this post.
FAQ
Hive Compatibility Version?
Hive Version Status 1.x Not tested 2.0.x Tested 2.1.x Tested and verified in production 2.3.x Tested and verified in production 3.x Tested - Do the schemas in the Hive metastore database need to be changed?For Hive version 2.1.x and 2.3.x,no schema change is needed.
Does the foreign key constraint for tables in Hive the metastore database affecting migrating to TiDB? For the versions tested, foreign key constraints do not impact using TiDB as Hive metastore.
How to handle `MetaException(message:The isolation level 'SERIALIZABLE' is not supported. Set tidb_skip_isolation_level_check=1 to skip this error) ` exception? In TiDB, execute the following command `set global tidb_skip_isolation_level_check=1;` to skip check.