Hadoop requires SSH access to manage its nodes. For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost for the hduser user we created in the previous section.
Install the SSH server and client:
sudo apt-get install openssh-client
sudo apt-get install openssh-server
Configuring passwordless SSH:
We need to configure SSH access to localhost for the hduser user. Start by opening the SSH server configuration file:
sudo gedit /etc/ssh/sshd_config
Note: Set PubkeyAuthentication to yes.
Once the changes are made, reload the SSH service:
sudo /etc/init.d/ssh reload
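To confirm that the edit landed in the file, a small check along these lines may help. It is only a sketch: the function name pubkey_auth_setting is made up for this example, and the parsing is simplified (it reads the first active line for the option and ignores Match blocks).

```shell
# Print the PubkeyAuthentication value found in an sshd_config-style file.
# Prints "unset" when no active (uncommented) line sets the option; in that
# case OpenSSH falls back to its default, which is "yes".
# NOTE: pubkey_auth_setting is a hypothetical helper, not part of OpenSSH.
pubkey_auth_setting() {
    config_file="${1:-/etc/ssh/sshd_config}"
    # sshd keywords are case-insensitive and the first value found wins,
    # so stop at the first matching line.
    value=$(awk 'tolower($1) == "pubkeyauthentication" { print tolower($2); exit }' "$config_file")
    echo "${value:-unset}"
}

# Usage: pubkey_auth_setting /etc/ssh/sshd_config
```

A result of yes or unset means public key authentication is enabled as far as this option is concerned; no means the file still needs to be changed.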
First, we have to generate an SSH key for the hduser user.
To generate the SSH key, first switch to the hduser account and enter the hduser password when prompted.
Hadoop uses SSH to access its nodes, which would normally require the user to enter a password. This requirement can be eliminated by creating an SSH key pair and authorizing it with the following commands. If asked for a filename, just leave it blank and press Enter to continue.
ssh-keygen -t rsa -P ""
This command creates an RSA key pair with an empty passphrase. Generally, using an empty passphrase is not recommended, but in this case it is needed to unlock the key without your interaction (you don’t want to enter the passphrase every time Hadoop interacts with its nodes).
I prefer the first option.
To enable SSH access to your local machine with this newly created key:
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
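Running the cat command above more than once appends the same key again each time, and sshd may ignore the file if its permissions are too open (this is what its StrictModes option checks). The sketch below shows one way to guard against both. The function name authorize_key is an assumption made for this example, not a standard tool.

```shell
# Append a public key to an authorized_keys file only if it is not already
# listed, then restrict permissions to the owner.
# NOTE: authorize_key is a hypothetical helper written for this tutorial.
authorize_key() {
    pub_key_file="$1"
    auth_keys_file="$2"
    ssh_dir=$(dirname "$auth_keys_file")

    # Create the directory and file if they are missing.
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$auth_keys_file"

    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -qxF -e "$(cat "$pub_key_file")" "$auth_keys_file"; then
        cat "$pub_key_file" >> "$auth_keys_file"
    fi
    chmod 600 "$auth_keys_file"
}

# Usage (as hduser):
#   authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
```

Calling the function a second time with the same key leaves authorized_keys unchanged.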
The final step is to test the SSH setup by connecting to your local machine with the hduser user. The step is also needed to save your local machine’s host key fingerprint to the hduser user’s known_hosts file. If you have any special SSH configuration for your local machine like a non-standard SSH port, you can define host-specific SSH options in $HOME/.ssh/config (see man ssh_config for more information).
To test the SSH setup:
ssh localhost
If the SSH connection fails, these general tips might help:
- Enable debugging with ssh -vvv localhost and investigate the error in detail.
- Check the SSH server configuration in /etc/ssh/sshd_config, in particular the options PubkeyAuthentication (which should be set to yes) and AllowUsers (if this option is active, add the hduser user to it). If you made any changes to the SSH server configuration file, you can force a configuration reload with sudo /etc/init.d/ssh reload.
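The AllowUsers check from the tips above can be sketched as a small shell function. This is a simplified illustration: the name user_allowed is made up here, only exact user names are compared, and patterns such as hd* or hduser@host that sshd also accepts are not interpreted.

```shell
# Succeed (exit status 0) when the given user is permitted by the AllowUsers
# lines of an sshd_config-style file, or when no AllowUsers line is active.
# NOTE: user_allowed is a hypothetical helper; it compares exact names only.
user_allowed() {
    user="$1"
    config_file="${2:-/etc/ssh/sshd_config}"
    awk -v user="$user" '
        tolower($1) == "allowusers" {
            restricted = 1                      # the option is active
            for (i = 2; i <= NF; i++)
                if ($i == user) found = 1       # user is listed
        }
        END { if (restricted && !found) exit 1 }
    ' "$config_file"
}

# Usage:
#   user_allowed hduser /etc/ssh/sshd_config || echo "add hduser to AllowUsers"
```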
Before setting up the cluster, we need to configure our nodes. Since we are using a single-node cluster, there is only one node to configure; on a multi-node cluster, follow the steps below on every node:
Step 1: Set all the nodes to the same time zone before synchronizing them with NTP (Network Time Protocol).
This will ensure that your server is operating under the correct time zone.
Step 2: Set the localization settings for your server and configure the Network Time Protocol (NTP) synchronization.
This will configure your system to synchronize its system clock to the standard time maintained by a global network of NTP servers.
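Once synchronization is running, one rough way to check it is to compare Unix timestamps taken on different nodes. The sketch below only does the arithmetic. The function name clock_skew_seconds is an assumption for this example, and on a real cluster the second timestamp would have to be fetched from another node, for instance over the SSH access configured earlier.

```shell
# Print the absolute difference, in seconds, between two Unix timestamps.
# NOTE: clock_skew_seconds is a hypothetical helper written for this tutorial.
clock_skew_seconds() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# On a single node there is nothing to compare against, so this prints 0.
now=$(date +%s)
clock_skew_seconds "$now" "$now"
```

A difference of more than a few seconds between nodes suggests that NTP is not yet synchronizing one of them.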