How to run Hadoop 3.3 and Hive 3.1 on the same node
The Hadoop ecosystem components, including Hadoop itself, Spark, Kafka, etc., are now at version 3.3, and the latest Hive release is 3.1.
$ hadoop version
Hadoop 3.3.4
$ hive --version
Hive 3.1.3
But Hive 3 still requires Java 8, while the rest of the Hadoop 3 stack runs on Java 11. So how can both be installed on the same node? The approach below works for me.
First, my OS is Ubuntu 22.04, which has Java 11 installed by default.
# echo $JAVA_HOME
/usr
# java -version
openjdk version "11.0.17" 2022-10-18
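Here JAVA_HOME can be simply /usr because the java binary is reachable at /usr/bin/java through the alternatives system. To confirm which JDK that actually resolves to, follow the symlink chain (on Ubuntu 22.04 this typically ends up under /usr/lib/jvm/java-11-openjdk-amd64, though the exact path may differ on your machine):
# readlink -f "$(which java)"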
This Java version doesn't work for Hive, which requires Java 8.
So I have to install Java 8 in a local directory. Switching from root back to the regular user I log in with, I run:
$ sdk install java 8.0.302-open
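(If SDKMAN itself is not installed yet, bootstrap it first with its standard installer, which puts everything under ~/.sdkman:)
$ curl -s "https://get.sdkman.io" | bash
$ source "$HOME/.sdkman/bin/sdkman-init.sh"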
The sdk install command puts Java 8 under the local SDKMAN directory. After installation, check:
$ which java
/home/xxx/.sdkman/candidates/java/current/bin/java
$ java -version
openjdk version "1.8.0_302"
Then run this command to start Hive; putting the JAVA_HOME assignment directly in front of the command scopes it to that single hive process:
$ JAVA_HOME=/home/xxx/.sdkman/candidates/java/current/ hive
This time Hive 3.1 starts correctly, but Hadoop 3.3 won't, since it appears to require Java 11.
So I have to change the configuration for Hadoop by editing the file "hadoop-env.sh" (under $HADOOP_HOME/etc/hadoop/) and setting the following line:
export JAVA_HOME=/usr
Restart Hadoop, and it should be working again.
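For a typical single-node setup with Hadoop's sbin scripts on the PATH, a minimal restart sequence looks like this (adjust it if your daemons are managed by systemd or a cluster manager):
$ stop-yarn.sh && stop-dfs.sh
$ start-dfs.sh && start-yarn.sh
$ jps
jps should then list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager among the running JVMs.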
Now log back into Hive and issue a query:
hive> select count(distinct(name)) from books;
OK
287
Time taken: 4.106 seconds, Fetched: 1 row(s)
As you can see, it works now.
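As a final convenience, the Java 8 override can be wrapped in a tiny launcher script so it doesn't have to be typed every time. The name hive8 and the ~/bin location are only suggestions; any name and any directory on the PATH will do:
$ mkdir -p ~/bin
$ cat > ~/bin/hive8 <<'EOF'
#!/bin/sh
# start Hive under the locally installed Java 8 from SDKMAN
export JAVA_HOME="$HOME/.sdkman/candidates/java/current"
exec hive "$@"
EOF
$ chmod +x ~/bin/hive8
Running hive8 then starts Hive under Java 8, while everything else on the node keeps using the system Java 11.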