How to run Hadoop 3.3 and Hive 3.1 on the same node
The Hadoop ecosystem components, including Hadoop itself, Spark, Kafka, etc., are now at version 3.3, and the latest Hive release is 3.1.
$ hadoop version
Hadoop 3.3.4
$ hive --version
Hive 3.1.3
But Hive 3 still requires Java 8, while the rest of the Hadoop 3 stack runs on Java 11. So how can both be installed on the same node? The approach below works for me.
First, my OS is Ubuntu 22.04, which has Java 11 installed by default.
# echo $JAVA_HOME
/usr
# java -version
openjdk version "11.0.17" 2022-10-18
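Here JAVA_HOME can be simply /usr because the java binary is reachable at /usr/bin/java through the alternatives system. To confirm which JDK that actually resolves to, follow the symlink chain (on Ubuntu 22.04 this typically ends up under /usr/lib/jvm/java-11-openjdk-amd64, though the exact path may differ on your machine):
# readlink -f "$(which java)"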
This Java version doesn't work for Hive, which requires Java 8.
So I have to install Java 8 in a local directory. Switching from root back to the regular user I log in with, I run:
$ sdk install java 8.0.302-open
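(If SDKMAN itself is not installed yet, bootstrap it first with its standard installer, which puts everything under ~/.sdkman:)
$ curl -s "https://get.sdkman.io" | bash
$ source "$HOME/.sdkman/bin/sdkman-init.sh"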
The sdk install command puts Java 8 under the local SDKMAN directory. After installation, check:
$ which java
/home/xxx/.sdkman/candidates/java/current/bin/java
$ java -version
openjdk version "1.8.0_302"
Then run this command to start Hive; putting the JAVA_HOME assignment directly in front of the command scopes it to that single hive process:
$ JAVA_HOME=/home/xxx/.sdkman/candidates/java/current/ hive
This time Hive 3.1 starts correctly, but Hadoop 3.3 won't, since it appears to require Java 11.
So I have to change the configuration for Hadoop by editing the file "hadoop-env.sh" (under $HADOOP_HOME/etc/hadoop/) and setting the following line:
export JAVA_HOME=/usr
Restart Hadoop, and it should be working again.
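For a typical single-node setup with Hadoop's sbin scripts on the PATH, a minimal restart sequence looks like this (adjust it if your daemons are managed by systemd or a cluster manager):
$ stop-yarn.sh && stop-dfs.sh
$ start-dfs.sh && start-yarn.sh
$ jps
jps should then list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager among the running JVMs.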
Now log back into Hive and issue a query:
hive> select count(distinct(name)) from books;
OK
287
Time taken: 4.106 seconds, Fetched: 1 row(s)
As you can see, it works now.
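As a final convenience, the Java 8 override can be wrapped in a tiny launcher script so it doesn't have to be typed every time. The name hive8 and the ~/bin location are only suggestions; any name and any directory on the PATH will do:
$ mkdir -p ~/bin
$ cat > ~/bin/hive8 <<'EOF'
#!/bin/sh
# start Hive under the locally installed Java 8 from SDKMAN
export JAVA_HOME="$HOME/.sdkman/candidates/java/current"
exec hive "$@"
EOF
$ chmod +x ~/bin/hive8
Running hive8 then starts Hive under Java 8, while everything else on the node keeps using the system Java 11.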