Hive ODBC Driver
These instructions are for the Hive ODBC driver available in Hive for HiveServer1.
There is no ODBC driver available for HiveServer2 as part of Apache Hive. There are third party ODBC drivers available from different vendors, and most of them seem to be free.
HiveServer was removed from Hive releases starting with Hive 1.0.0. See HIVE-6977. Please switch over to HiveServer2.
Introduction
The Hive ODBC Driver is a software library that implements the Open Database Connectivity (ODBC) API standard for the Hive database management system, enabling ODBC compliant applications to interact seamlessly (ideally) with Hive through a standard interface. This driver will NOT be built as a part of the typical Hive build process and will need to be compiled and built separately according to the instructions below.
Suggested Reading
This guide assumes you are already familiar with the following:
Software Requirements
The following software components are needed for the successful compilation and operation of the Hive ODBC driver:
- Hive Server – a service through which clients may remotely issue Hive commands and requests. The Hive ODBC driver depends on Hive Server to perform the core set of database interactions. Hive Server is built as part of the Hive build process. More information regarding Hive Server usage can be found here.
- Apache Thrift – a scalable cross-language software framework that enables the Hive ODBC driver (specifically the Hive client) to communicate with the Hive Server. See this link for the details on Thrift Installation. The Hive ODBC driver was developed with Thrift trunk version r790732, but the latest revision should also be fine. Make sure you note the Thrift install path during the Thrift build process as this information will be needed during the Hive client build process. The Thrift install path will be referred to as THRIFT_HOME.
Driver Architecture
Internally, the Hive ODBC Driver contains two separate components: Hive client, and the unixODBC API wrapper.
- Hive client – provides a set of C-compatible library functions to interact with Hive Server in a pattern similar to those dictated by the ODBC specification. However, Hive client was designed to be independent of unixODBC or any ODBC specific headers, allowing it to be used in any number of generic cases beyond ODBC.
- unixODBC API wrapper – provides a layer on top of Hive client that directly implements the ODBC API standard. The unixODBC API wrapper will be compiled into a shared object library, which will be the final form of the Hive ODBC driver. The wrapper files will remain a file attachment on the associated JIRA until it can be checked into the unixODBC code repository: HIVE-187, HIVE-1101.
Building and Setting Up ODBC Components
NOTE: Hive client needs to be built and installed before the unixODBC API wrapper can compile successfully.
Hive Client Build/Setup
In order to build and install the Hive client:
Checkout and setup the latest version of Apache Hive from the Subversion or Git source code repository. For more details, see Getting Started with Hive. From this point onwards, the path to the Hive root directory will be referred to as HIVE_SRC_ROOT.
Using a tarball source release
If you are compiling against source code contained in the tarball release package then HIVE_SRC_ROOT refers to the 'src' subdirectory.
Build the Hive client by running the following command from HIVE_SRC_ROOT. This will compile and copy the libraries and header files to
HIVE_SRC_ROOT/build/odbc/
. Please keep in mind that all paths should be fully specified (no relative paths). If you encounter an "undefined reference to vtables
" error, make sure that you have specified the absolute path for thrift.home.$ ant compile-cpp -Dthrift.home=<THRIFT_HOME>
MVN:
$ cd odbc $ mvn compile -Podbc,hadoop-1 -Dthrift.home=/usr/local -Dboost.home=/usr/local
You can optionally force Hive client to compile into a non-native bit architecture by specifying the additional parameter (assuming you have the proper compilation libraries):
$ ant compile-cpp -Dthrift.home=<THRIFT_HOME> -Dword.size=<32 or 64>
You can verify the entire Hive compilation by running the Hive test suite from HIVE_SRC_ROOT. Specifying the argument '-Dthrift.home=<THRIFT_HOME>' will enable the tests for the Hive client. If you do NOT specify thrift.home, the Hive client tests will not be run and will just return successful.
$ ant test -Dthrift.home=<THRIFT_HOME>
MVN:
$ cd odbc $ mvn test -Podbc,hadoop-1 -Dthrift.home=/usr/local -Dboost.home=/usr/local
You can specifically execute the Hive client tests by running the above command from
HIVE_SRC_ROOT/odbc/
. NOTE: Hive client tests require that a local Hive Server be operating on port 10000.To install the Hive client libraries onto your machine, run the following command from
HIVE_SRC_ROOT/odbc/
. NOTE: The install path defaults to/usr/local
. While there is no current way to change this default directory from the ant build process, a manual install may be performed by skipping the command below and copying out the contents ofHIVE_SRC_ROOT/build/odbc/lib
andHIVE_SRC_ROOT/build/odbc/include
into their local file system counterparts.$ sudo ant install -Dthrift.home=<THRIFT_HOME>
NOTE: The compiled static library, libhiveclient.a, requires linking with stdc++ as well as thrift libraries to function properly.
NOTE: Currently, there is no way to specify non-system library and header directories to the unixODBC build process. Thus, the Hive client libraries and headers MUST be installed to a default system location in order for the unixODBC build process to detect these files. This issue may be remedied in the future.
unixODBC API Wrapper Build/Setup
After you have built and installed the Hive client, you can now install the unixODBC API wrapper:
In the unixODBC root directory, run the following command:
$ ./configure --enable-gui=no --prefix=<unixODBC_INSTALL_DIR>
If you encounter the the errors: "
redefinition of 'struct _hist_entry'
" or "previous declaration of 'add_history' was here
" then re-execute the configure with the following command:$ ./configure --enable-gui=no --enable-readline=no --prefix=<unixODBC_INSTALL_DIR>
To force the compilation of the unixODBC API wrapper into a non-native bit architecture, modify the CC and CXX environment variables to include the appropriate flags. For example:
$ CC="gcc -m32" CXX="g++ -m32" ./configure --enable-gui=no --enable-readline=no --prefix=<unixODBC_INSTALL_DIR>
Compile the unixODBC API wrapper with the following:
$ make
If you want to completely install unixODBC and all related drivers, run the following from the unixODBC root directory:
$ sudo make install
If your system complains about
undefined symbols
during unixODBC testing (such as withisql
orodbcinst
) after installation, try runningldconfig
to update your dynamic linker's runtime libraries.
If you only want to obtain the Hive ODBC driver shared object library:After compilation, the driver will be located at
<unixODBC_BUILD_DIR>/Drivers/hive/.libs/libodbchive.so.1.0.0
.
This may be copied to any other location as desired. Keep in mind that the Hive ODBC driver has a dependency on the Hive client shared object library:libhiveclient.so
andlibthrift.so.0
.
You can manually install the unixODBC API wrapper by doing the following:$ cp <unixODBC_BUILD_DIR>/Drivers/hive/.libs/libodbchive.so.1.0.0 <SYSTEM_INSTALL_DIR> $ cd <SYSTEM_INSTALL_DIR> $ ln -s libodbchive.so.1.0.0 libodbchive.so $ ldconfig
Connecting the Driver to a Driver Manager
This portion assumes that you have already built and installed both the Hive client and the unixODBC API wrapper shared libraries on the current machine. To connect the Hive ODBC driver to a previously installed Driver Manager (such as the one provided by unixODBC or a separate application):
- Locate the odbc.ini file associated with the Driver Manager (DM).
If you are installing the driver on the system DM, then you can run the following command to print the locations of DM configuration files.
$ odbcinst -j unixODBC 2.2.14 DRIVERS............: /usr/local/etc/odbcinst.ini SYSTEM DATA SOURCES: /usr/local/etc/odbc.ini FILE DATA SOURCES..: /usr/local/etc/ODBCDataSources USER DATA SOURCES..: /home/ehwang/.odbc.ini SQLULEN Size.......: 8 SQLLEN Size........: 8 SQLSETPOSIROW Size.: 8
- If you are installing the driver on an application DM, then you have to help yourself on this one . Hint: try looking in the installation directory of your application.
- Keep in mind that an application's DM can exist simultaneously with the system DM and will likely use its own configuration files, such as odbc.ini.
- Also, note that some applications do not have their own DMs and simply use the system DM.
Add the following section to the DM's corresponding odbc.ini:
[Hive] Driver = <path_to_libodbchive.so> Description = Hive Driver v1 DATABASE = default HOST = <Hive_server_address> PORT = <Hive_server_port> FRAMED = 0
Testing with ISQL
Once you have installed the necessary Hive ODBC libraries and added a Hive entry in your system's default odbc.ini, you will be able to interactively test the driver with isql:
$ isql -v Hive
If your system does not have isql, you can obtain it by installing the entirety of unixODBC. If you encounter an error saying that the shared libraries cannot be opened by isql, use the ldd
tool to ensure that all dynamic library dependencies are resolved and use the file
tool to ensure that isql and all necessary libraries are compiled into the same architecture (32 or 64 bit).
Build libodbchive.so for 3rd Party Driver Manager
If you want to build libodbchive.so for other Driver Manager (for example, MicroStrategy uses DataDirect ODBC libraries which contains its own Driver Manager), you need to configure and build libodbchive.so against that Driver Manager (libodbc.so and libodbcinst.so).
If you have the 3rd party Driver Manager installed, the easiest way to do that is to find the installation directory containing libodbc.so and libodbcinst.so, and set that directory to LD_LIBRARY_PATH. Then you need to run configure and make for the Hive ODBC driver. After you get the libodbchive.so, make sure the 3rd party application can access the dynamic library libodbchive.so, libthrift.so and libhiveclient.so (through LD_LIBRARY_PATH or ldconfig).
If you build libodbchive.so for the 3rd party Driver Manager, isql may not work with the same set of .so files. So you may need to compile a different libodbchive.so for each Driver Manager.
Troubleshooting
- Hive client build process
- "libthrift.a: could not read symbols: Bad value" or "relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object"?
- Try recompiling your Apache Thrift libraries with the -fPIC option for your C++ compiler
- "undefined reference to vtable" ?
- Make sure that your Apache Thrift libraries are being included from the proper Thrift directory and that it has the same architecture (32 or 64 bit) as the Hive client.
- Also, check to make sure you are providing a fully qualified path for the thrift.home parameter.
- In general,
ldd
,file
, andnm
are essential unix tools for debugging problems with shared object libraries. If you don't know what they are, useman
to get more details.
- "libthrift.a: could not read symbols: Bad value" or "relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object"?
Current Status
- Comments: Please keep in mind that this is still an initial version and is still very rough around the edges. However, it provides basic ODBC 3.51 API support for connecting, executing queries, fetching, etc. This driver has been successfully tested on 32-bit and 64-bit linux machines with iSQL. It has also been tested with partial success on enterprise applications such as MicroStrategy. Due to licensing reasons, the unixODBC API wrapper files will be uploaded as a separate JIRA attachment that will not be part of this code repository.
- Limitations:
- Only support for Linux operating systems
- No support for Unicode
- No support for asynchronous execution of queries
- Does not support pattern matching for functions such as SQLColumns and SQLTables; requires exact matches.
- Hive Server is currently not thread safe (see JIRA HIVE-80: https://issues.apache.org/jira/browse/HIVE-80). This will prevent the driver from safely making multiple connections to the same Hive Server. We need to resolve this issue to allow the driver to operate properly.
- Hive Server's getSchema() function seems to have trouble with certain types of queries (such as "SELECT * ..." or "EXPLAIN"), and so the Hive ODBC driver sometimes has difficulties with these queries as well.
ODBC API Function Support (does anyone know how to remove the linking from the function names?):
SQLAllocConnect
supported
SQLAllocEnv
supported
SQLAllocHandle
supported
SQLAllocStmt
supported
SQLBindCol
supported
SQLBindParameter
NOT supported
SQLCancel
NOT supported
SQLColAttribute
supported
SQLColumns
supported
SQLConnect
supported
SQLDescribeCol
supported
SQLDescribeParam
NOT supported
SQLDisconnect
supported
SQLDriverConnect
supported
SQLError
supported
SQLExecDirect
supported
SQLExecute
supported
SQLExtendedFetch
NOT supported
SQLFetch
supported
SQLFetchScroll
NOT supported
SQLFreeConnect
supported
SQLFreeEnv
supported
SQLFreeHandle
supported
SQLFreeStmt
supported
SQLGetConnectAttr
NOT supported
SQLGetData
supported (however, SQLSTATE not returning values)
SQLGetDiagField
NOT supported
SQLGetDiagRec
supported
SQLGetInfo
partially supported; (to get MSTR v9 running)
SQLMoreResults
NOT supported
SQLNumParams
NOT supported
SQLNumResultCols
supported
SQLParamOptions
NOT supported
SQLPrepare
supported; but does not permit parameter markers
SQLRowCount
NOT supported
SQLSetConnectAttr
NOT supported
SQLSetConnectOption
NOT supported
SQLSetEnvAttr
Limited support
SQLSetStmtAttr
NOT supported
SQLSetStmtOption
NOT supported
SQLTables
supported
SQLTransact
NOT supported