HOW TO CONFIGURE SSH FOR HADOOP

Hadoop requires SSH access to manage its nodes. For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost for the hduser user we created in the previous section.

Install the SSH server and client:

sudo apt-get install openssh-client

sudo apt-get install openssh-server
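After installation you can check that the SSH server is actually listening. The sketch below wraps this in a hypothetical helper, sshd_listening. It assumes the server uses the default port 22 and that the ss tool from iproute2 is available, as it usually is on Ubuntu.

```shell
# Hypothetical helper: succeed if something is listening on TCP port 22,
# the default SSH port (adjust the pattern if you changed Port in sshd_config).
sshd_listening() {
  # ss flags: -t TCP sockets, -l listening sockets only, -n numeric ports
  ss -tln | grep -q ':22 '
}

# Usage:
#   sshd_listening && echo "SSH server is listening" || echo "SSH server is not running"
```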

Configuring passwordless SSH:

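First, make sure the SSH server accepts public-key authentication. Open its configuration file:

sudo gedit /etc/ssh/sshd_config

Set PubkeyAuthentication to yes. OpenSSH enables it by default, so mainly check that the line is not set to no.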
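If you prefer to make this change without opening an editor, the sketch below does it with GNU sed. The function name set_pubkey_auth is hypothetical. It rewrites any existing (or commented-out) PubkeyAuthentication line to yes and appends the directive only if the file has no such line at all.

```shell
# Hypothetical helper: force "PubkeyAuthentication yes" in an sshd_config-style file.
# Requires GNU sed (sed -i -E), as shipped with Ubuntu/Debian.
set_pubkey_auth() {
  config="$1"
  pattern='^[[:space:]]*#?[[:space:]]*PubkeyAuthentication[[:space:]]'
  if grep -Eq "$pattern" "$config"; then
    # Rewrite every active or commented-out PubkeyAuthentication line.
    sed -i -E "s/${pattern}.*/PubkeyAuthentication yes/" "$config"
  else
    # No such line at all: append it. This assumes the file does not end
    # inside a "Match" block, which would limit the setting to matching users.
    printf 'PubkeyAuthentication yes\n' >> "$config"
  fi
}
```

Back up /etc/ssh/sshd_config before using it. Because sudo cannot run a shell function directly, paste the function into a root shell (for example one opened with sudo -i) and call set_pubkey_auth /etc/ssh/sshd_config there.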

Once the changes are saved, reload the SSH server:

sudo /etc/init.d/ssh reload
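A typo in sshd_config can leave the SSH server unable to load its configuration. A safer pattern is to let sshd check the file first with its test mode (-t) and reload only if that check passes. reload_sshd is a hypothetical helper name.

```shell
# Hypothetical helper: validate sshd_config, then reload the SSH server.
reload_sshd() {
  # "sshd -t" only checks the configuration file and host keys; it exits
  # non-zero and reports the problem if something is wrong.
  if sudo sshd -t; then
    sudo /etc/init.d/ssh reload
  else
    echo "sshd_config has errors; not reloading" >&2
    return 1
  fi
}
```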

Next, we have to generate an SSH key for the hduser user.

To generate the key, first switch to the hduser account:

su hduser

Enter the password of hduser when prompted.

Hadoop uses SSH to access its nodes, which would normally require the user to enter a password. This requirement can be eliminated by creating an SSH key pair and authorizing it for login, using the following commands. If asked for a filename, leave it blank and press Enter to accept the default; if asked for a passphrase, also press Enter to leave it empty.

ssh-keygen

(or)

ssh-keygen -t rsa -P ""

The second command creates an RSA key pair with an empty passphrase. Using an empty passphrase is generally not recommended, but here it is needed so that the key can be used without your interaction (you don't want to enter the passphrase every time Hadoop interacts with its nodes).

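I prefer the first option. Note that the next step reads $HOME/.ssh/id_rsa.pub: older OpenSSH releases create an RSA key by default, but recent ones (9.5 and later) create an Ed25519 key in $HOME/.ssh/id_ed25519.pub instead. On such systems, either use the second command or substitute that file name.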
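If you script this step, the sketch below runs the second command without any prompts: -f names the key file so ssh-keygen does not ask for it, -N "" sets an empty passphrase for the new key, and an existing key is left alone so re-running the script does not stop at an overwrite prompt. The helper name generate_hadoop_key is hypothetical; run it as hduser.

```shell
# Hypothetical helper: create an RSA key pair with an empty passphrase for
# the current user, without any interactive prompts.
generate_hadoop_key() {
  key="$HOME/.ssh/id_rsa"
  # Keep ~/.ssh private; sshd (StrictModes) rejects it if others can write to it.
  mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
  if [ -f "$key" ]; then
    # Never overwrite an existing key: other logins may depend on it.
    echo "Keeping existing key $key"
  else
    # -q quiet, -t key type, -N new passphrase (empty), -f output file
    ssh-keygen -q -t rsa -N "" -f "$key"
  fi
}
```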


To enable SSH access to your local machine with this newly created key, append the public key to the hduser user's authorized_keys file:

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
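Running the cat command twice adds the key twice, and sshd ignores authorized_keys if other users can write to it or to ~/.ssh (the StrictModes check, enabled by default). The hypothetical helper below appends the key only once and tightens the permissions. Pass a different public key file, such as $HOME/.ssh/id_ed25519.pub, as its argument if that is what ssh-keygen created.

```shell
# Hypothetical helper: authorize the user's own public key for SSH login.
authorize_own_key() {
  pub="${1:-$HOME/.ssh/id_rsa.pub}"
  auth="$HOME/.ssh/authorized_keys"
  touch "$auth"
  # -q quiet, -x whole-line match, -F fixed string: append only if absent.
  grep -qxF -e "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
  # Owner-only permissions satisfy sshd's StrictModes check.
  chmod 700 "$HOME/.ssh"
  chmod 600 "$auth"
}
```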

The final step is to test the SSH setup by connecting to your local machine as the hduser user. This step also saves your local machine's host key fingerprint to the hduser user's known_hosts file. If you have any special SSH configuration for your local machine, such as a non-standard SSH port, you can define host-specific SSH options in $HOME/.ssh/config (see man ssh_config for more information).
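As an illustration of such host-specific options, the sketch below appends a Host entry for localhost to $HOME/.ssh/config. The port 2222 in the usage line is only an example of a non-standard port, and add_localhost_ssh_config is a hypothetical helper name.

```shell
# Hypothetical helper: tell the ssh client which port to use for localhost.
add_localhost_ssh_config() {
  port="$1"                 # e.g. 2222; must match Port in sshd_config
  cfg="$HOME/.ssh/config"
  mkdir -p "$HOME/.ssh"
  cat >> "$cfg" <<EOF
Host localhost
    Port $port
EOF
  # ssh refuses to use a config file that other users can write to.
  chmod 600 "$cfg"
}

# Usage (hypothetical port):
#   add_localhost_ssh_config 2222
```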

To test the SSH setup:

ssh localhost
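To run the same check from a script, BatchMode makes ssh fail instead of prompting for a password, and StrictHostKeyChecking=accept-new (available since OpenSSH 7.6) records the host key of a host it has not seen before without asking. check_passwordless_ssh is a hypothetical helper name.

```shell
# Hypothetical helper: exit status 0 only if key-based login works.
check_passwordless_ssh() {
  host="${1:-localhost}"
  # BatchMode=yes: never prompt for a password or passphrase.
  # StrictHostKeyChecking=accept-new: save an unknown host key, but still
  # refuse to connect if a known key has changed.
  # "true" is the remote command, so the session ends immediately.
  ssh -o BatchMode=yes -o StrictHostKeyChecking=accept-new "$host" true
}

# Usage:
#   check_passwordless_ssh && echo "passwordless SSH works"
```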


If the SSH connection fails, these general tips might help:

  • Enable debugging with ssh -vvv localhost and investigate the error in detail.
  • Check the SSH server configuration in /etc/ssh/sshd_config, in particular the options PubkeyAuthentication (which should be set to yes) and AllowUsers (if this option is active, add the hduser user to it). If you made any changes to the SSH server configuration file, you can force a configuration reload with sudo /etc/init.d/ssh reload.
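Because sshd_config can pull in other files (for example through an Include directive, which newer Ubuntu releases use), the server's effective settings may differ from what you see in the main file. sshd's extended test mode (-T) prints the effective configuration; the hypothetical helper below filters it down to the options mentioned above.

```shell
# Hypothetical helper: print the effective values of the options that most
# often break key-based login. sshd -T prints keywords in lower case;
# allowusers only appears if AllowUsers is set.
show_sshd_auth_settings() {
  sudo sshd -T | grep -Ei '^(pubkeyauthentication|allowusers|strictmodes) '
}
```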

Before setting up the cluster, we need to configure our nodes. Since we are using a single-node cluster, only one node needs to be configured; on a multi-node cluster, follow the steps below on all nodes:

Step 1: Configure all the nodes to use the same time zone

This will ensure that your server is operating under the correct time zone.

Step 2: Set the localization settings for your server and configure the Network Time Protocol (NTP) synchronization.

This will configure your system to synchronize its system clock to the standard time maintained by a global network of NTP servers.

Configuring the time zone and NTP synchronization helps prevent inconsistent behavior that can arise from out-of-sync clocks.
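On systemd-based Ubuntu systems, both steps can be done with timedatectl. The sketch below is a hypothetical helper, configure_time. The zone name is whatever your cluster should share (Etc/UTC in the usage line is just an example; timedatectl list-timezones shows valid names). set-ntp enables the system's NTP client, which on a default Ubuntu install is systemd-timesyncd.

```shell
# Hypothetical helper: set the time zone and enable NTP synchronization.
configure_time() {
  tz="$1"                             # e.g. Etc/UTC; use the same zone on every node
  sudo timedatectl set-timezone "$tz"
  sudo timedatectl set-ntp true       # start the NTP client and enable it at boot
  # Check the result; the exact field names vary between systemd versions.
  timedatectl status
}

# Usage (example zone):
#   configure_time Etc/UTC
```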
