Hadoop is still a core concept in data engineering — especially for understanding HDFS, YARN, and distributed systems.
The biggest blocker for beginners today is Java version conflicts.
Most modern systems ship Java 17 or 21, while Hadoop 3.3.x is built and tested against Java 8 (Java 11 also works at runtime, but Java 8 is the safest choice for a first install).
This guide shows you how to:
- Install Hadoop from scratch
- Use Java 8 only for Hadoop
- Keep your system Java untouched
- Avoid auto-start, battery drain, and common traps
What you’ll build
- Hadoop 3.3.6
- Single-node (pseudo-distributed)
- Ubuntu 22.04
- Java 8 for Hadoop only
- Manual start/stop (laptop-friendly)
Prerequisites
- Ubuntu 22.04
- Internet connection
- sudo access
- No prior Hadoop knowledge required
Step 1: System preparation
sudo apt update
sudo apt install -y ssh rsync curl
Hadoop relies on SSH — even on a single machine.
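The install above normally starts the SSH daemon on Ubuntu, but it costs nothing to check (the service name on Ubuntu is ssh, not sshd):
sudo systemctl status ssh --no-pager
If it reports inactive, start it with sudo systemctl start ssh.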
Step 2: Create a dedicated Hadoop user (recommended)
sudo adduser hadoop
sudo usermod -aG sudo hadoop
su - hadoop
From here on, everything runs as the hadoop user.
Step 3: Enable passwordless SSH
Hadoop services communicate over SSH.
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost
Accept the host-key fingerprint on the first connection. If it logs in without asking for a password → you’re good. Type exit to return to your shell.
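For a non-interactive check, BatchMode makes ssh fail instead of prompting for a password:
ssh -o BatchMode=yes localhost 'echo SSH OK'
If this prints SSH OK, key-based login is working.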
Step 4: Java setup (important)
Hadoop works best with Java 8.
You do NOT need to uninstall Java 17/21.
Verify Java 8 exists:
ls /usr/lib/jvm/java-8-openjdk-amd64
If it exists, continue.
(You can install it with sudo apt install openjdk-8-jdk if missing.)
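To see that your system default stays untouched, compare the two side by side (assuming you have a newer system JDK installed; versions shown are illustrative):
java -version    # system default (e.g. openjdk 17 or 21)
/usr/lib/jvm/java-8-openjdk-amd64/bin/java -version    # the JDK Hadoop will use (1.8.0)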
Step 5: Download Hadoop
cd ~
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
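Optionally, verify the tarball before extracting; Apache publishes a .sha512 checksum alongside every release:
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz.sha512
sha512sum hadoop-3.3.6.tar.gz
Compare the digest it prints against the one in the .sha512 file, then continue.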
tar -xvzf hadoop-3.3.6.tar.gz
mv hadoop-3.3.6 hadoop
Step 6: Hadoop environment variables
Edit .bashrc:
nano ~/.bashrc
Add at the bottom:
export HADOOP_HOME=$HOME/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Apply:
source ~/.bashrc
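A quick sanity check that the PATH change took effect:
which hadoop
echo $HADOOP_HOME
The first should print /home/hadoop/hadoop/bin/hadoop, the second /home/hadoop/hadoop.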
Step 7: Bind Hadoop to Java 8 (critical step)
This avoids breaking your other Java projects.
nano ~/hadoop/etc/hadoop/hadoop-env.sh
Set:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
Verify:
hadoop version
The first line of output should read Hadoop 3.3.6. Note that hadoop version does not print the Java version; to confirm the JDK itself, run it directly:
/usr/lib/jvm/java-8-openjdk-amd64/bin/java -version
You should see:
openjdk version "1.8.0_..."
Step 8: Hadoop configuration
Move into config directory:
cd ~/hadoop/etc/hadoop
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
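Once saved, you can confirm Hadoop actually reads the value; hdfs getconf only parses the config files, so it works before anything is running:
hdfs getconf -confKey fs.defaultFS
Expected output: hdfs://localhost:9000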
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/hadoopdata/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/hadoopdata/datanode</value>
</property>
</configuration>
Create directories:
mkdir -p ~/hadoopdata/namenode ~/hadoopdata/datanode
mapred-site.xml
In Hadoop 3.x this file already exists (the .template copy step was a Hadoop 2.x convention), so edit it directly:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
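One extra property is worth knowing about: on Hadoop 3, MapReduce jobs can fail with “Could not find or load main class ... MRAppMaster” unless the MapReduce classpath is declared. The official single-node setup guide adds this inside the same <configuration> block:
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>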
yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
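If MapReduce jobs later complain that environment variables aren’t reaching the containers, the official single-node guide also whitelists them here; add this property inside the same <configuration> block:
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_MAPRED_HOME,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ</value>
</property>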
Step 9: Format the NameNode (run ONCE)
hdfs namenode -format
⚠️ Running this again wipes HDFS metadata.
Step 10: Start Hadoop
start-dfs.sh
start-yarn.sh
Verify:
jps
Expected processes:
- NameNode
- DataNode
- SecondaryNameNode
- ResourceManager
- NodeManager
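If any of the five is missing, the daemon’s log usually says why. Log file names follow the pattern hadoop-<user>-<daemon>-<hostname>.log, so the exact path varies; a glob finds it, for example:
tail -n 50 ~/hadoop/logs/hadoop-hadoop-namenode-*.log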
Step 11: Access Web UIs
- HDFS: http://localhost:9870
- YARN: http://localhost:8088
If these load → Hadoop is running correctly.
Step 12: Test HDFS
hdfs dfs -mkdir /test
hdfs dfs -put ~/.bashrc /test
hdfs dfs -ls /test
Expected output:
Found 1 items
-rw-r--r-- 1 hadoop supergroup ...
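To exercise YARN and MapReduce end to end, run the examples jar that ships with the release (2 maps, 5 samples per map; it estimates pi, and the job appears in the YARN UI on port 8088):
hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar pi 2 5
If it fails with an MRAppMaster classpath error, revisit the mapreduce.application.classpath note in Step 8.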
About the “native-hadoop” warning
You may see:
WARN NativeCodeLoader: Unable to load native-hadoop library
This is normal.
- Hadoop falls back to pure Java
- No functionality is lost
- Safe to ignore for learning & development
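If you’re curious exactly which native components are missing, Hadoop ships a checker:
hadoop checknative -a
It prints one line per component (zlib, snappy, and so on) with true/false for whether the native version loaded.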
Does Hadoop auto-start on boot?
No.
Hadoop:
- Does NOT register as a system service
- Does NOT start after reboot
- Uses zero resources unless manually started
To stop:
stop-dfs.sh
stop-yarn.sh
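If you start and stop often, a pair of aliases in ~/.bashrc saves typing (hstart and hstop are arbitrary names I made up; pick your own):
alias hstart='start-dfs.sh && start-yarn.sh'
alias hstop='stop-yarn.sh && stop-dfs.sh'
Stopping YARN before HDFS mirrors the reverse of the startup order.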
Voilà, you’re done.