Seamoon Pandey

Apache Hadoop on Ubuntu 22.04 (Beginner-Friendly, Java 8 Without Breaking Java 21)

Hadoop is still a core concept in data engineering — especially for understanding HDFS, YARN, and distributed systems.

The biggest blocker for beginners today is Java version conflicts.
Most modern systems ship Java 17 or 21, while Hadoop 3.3 officially supports only Java 8 and Java 11 at runtime.

This guide shows you how to:

  • Install Hadoop from scratch
  • Use Java 8 only for Hadoop
  • Keep your system Java untouched
  • Avoid auto-start, battery drain, and common traps

What you’ll build

  • Hadoop 3.3.6
  • Single-node (pseudo-distributed)
  • Ubuntu 22.04
  • Java 8 for Hadoop only
  • Manual start/stop (laptop-friendly)

Prerequisites

  • Ubuntu 22.04
  • Internet connection
  • sudo access
  • No prior Hadoop knowledge required

Step 1: System preparation

sudo apt update
sudo apt install -y ssh rsync curl

Hadoop relies on SSH — even on a single machine.
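
A quick optional sanity check that the SSH server is actually running (the service is named ssh on Ubuntu):

systemctl is-active ssh   # should print "active"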


Step 2: Create a dedicated Hadoop user (recommended)

sudo adduser hadoop
sudo usermod -aG sudo hadoop
su - hadoop

From here on, everything runs as the hadoop user.
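
If you want to double-check which account you are in before continuing:

whoami   # should print: hadoop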


Step 3: Enable passwordless SSH

Hadoop services communicate over SSH.

ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost

If it logs in without asking for a password → you’re good.
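
If you prefer a non-interactive check (it fails instead of prompting for a password), this works once localhost is in known_hosts from the login above:

ssh -o BatchMode=yes localhost 'echo SSH OK'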


Step 4: Java setup (important)

Hadoop 3.3 runs on Java 8 or Java 11; Java 8 is the most widely tested choice.
You do NOT need to uninstall Java 17/21.

Verify Java 8 exists:

ls /usr/lib/jvm/java-8-openjdk-amd64

If it exists, continue.
(You can install it with sudo apt install openjdk-8-jdk if missing.)
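
Optionally, confirm that your system default Java stays untouched (the versions shown are just examples):

java -version    # still reports your system Java (e.g. 17 or 21)
ls /usr/lib/jvm/ # both JDKs live side by side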


Step 5: Download Hadoop

cd ~
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xvzf hadoop-3.3.6.tar.gz
mv hadoop-3.3.6 hadoop
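
Optionally, verify the download. Apache publishes a .sha512 file next to each release tarball; compare its hash against what you compute locally:

wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz.sha512
sha512sum hadoop-3.3.6.tar.gz   # the hash should match the one inside the .sha512 file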

Step 6: Hadoop environment variables

Edit .bashrc:

nano ~/.bashrc

Add at the bottom:

export HADOOP_HOME=$HOME/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Apply:

source ~/.bashrc
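
Quick check that the variables took effect:

echo $HADOOP_HOME   # should print /home/hadoop/hadoop
which hadoop        # should print /home/hadoop/hadoop/bin/hadoop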

Step 7: Bind Hadoop to Java 8 (critical step)

This avoids breaking your other Java projects.

nano ~/hadoop/etc/hadoop/hadoop-env.sh

Set:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

Verify:

hadoop version

You should see the Hadoop 3.3.6 banner. If the JAVA_HOME you set is wrong, the command fails with an error mentioning JAVA_HOME instead.
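
To confirm which JVM Hadoop actually resolved, Hadoop 3 also ships an envvars subcommand that prints its computed environment (a quick optional check):

hadoop envvars | grep JAVA_HOME   # should show /usr/lib/jvm/java-8-openjdk-amd64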

Step 8: Hadoop configuration

Move into config directory:

cd ~/hadoop/etc/hadoop

core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
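
Once the environment variables from Step 6 are loaded, you can confirm the value Hadoop actually reads from this file:

hdfs getconf -confKey fs.defaultFS   # should print hdfs://localhost:9000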

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoopdata/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoopdata/datanode</value>
  </property>
</configuration>

(dfs.name.dir and dfs.data.dir are the old deprecated names; Hadoop 3 uses dfs.namenode.name.dir and dfs.datanode.data.dir.)

Create directories:

mkdir -p ~/hadoopdata/namenode ~/hadoopdata/datanode

mapred-site.xml

In Hadoop 3.x there is no mapred-site.xml.template; mapred-site.xml already ships in this directory, so just edit it:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
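
The minimal files above are enough for this guide (we only test HDFS below). If you later want to run the MapReduce examples that ship with Hadoop on YARN, the official Hadoop 3.3 single-node guide additionally sets two properties, roughly like this:

<!-- extra property for mapred-site.xml -->
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>

<!-- extra property for yarn-site.xml -->
<property>
  <name>yarn.nodemanager.env-whitelist</name>
  <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
</property>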

Step 9: Format the NameNode (run ONCE)

hdfs namenode -format

⚠️ Running this again wipes HDFS metadata.
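
If you ever do need to start over (for example after experimenting with the config), a minimal reset looks like this, assuming the data directories from Step 8:

stop-dfs.sh && stop-yarn.sh                              # make sure nothing is running
rm -rf ~/hadoopdata/namenode/* ~/hadoopdata/datanode/*   # deletes ALL HDFS data; also avoids a clusterID mismatch on the next start
hdfs namenode -format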


Step 10: Start Hadoop

start-dfs.sh
start-yarn.sh

Verify:

jps

Expected processes:

  • NameNode
  • DataNode
  • SecondaryNameNode
  • ResourceManager
  • NodeManager
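
If one of these is missing, its log usually explains why (log file names include the Unix user and hostname, so the pattern below is approximate):

ls $HADOOP_HOME/logs/
tail -n 50 $HADOOP_HOME/logs/hadoop-hadoop-datanode-*.log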

Step 11: Access Web UIs

Open these in a browser (default ports for Hadoop 3.x):

  • NameNode UI: http://localhost:9870
  • YARN ResourceManager UI: http://localhost:8088

If these load → Hadoop is running correctly.
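
On a headless machine you can check from the terminal instead; a 200 (or a 302 redirect) status means the UI is up:

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9870   # NameNode UI
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088   # ResourceManager UI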


Step 12: Test HDFS

hdfs dfs -mkdir /test
hdfs dfs -put ~/.bashrc /test
hdfs dfs -ls /test

Expected output:

Found 1 items
-rw-r--r-- 1 hadoop supergroup ...
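
Optionally, if you also added the MapReduce/YARN properties mentioned in Step 8, you can exercise YARN end to end with one of the example jobs bundled in the Hadoop distribution:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar pi 2 5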

About the “native-hadoop” warning

You may see:

WARN NativeCodeLoader: Unable to load native-hadoop library

This is normal.

  • Hadoop falls back to pure Java
  • No functionality is lost
  • Safe to ignore for learning & development

Does Hadoop auto-start on boot?

No.

Hadoop:

  • Does NOT register as a system service
  • Does NOT start after reboot
  • Uses zero resources unless manually started

To stop:

stop-dfs.sh
stop-yarn.sh
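
After stopping, only the jps tool itself should remain:

jps   # should now list only "Jps"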

Voilà, you're done.
