Hadoop and Ansible

Suyash Garg
6 min read · Dec 1, 2020

Introduction

Hello! In this article I am going to show how to set up a Hadoop cluster using Ansible.

List of tasks we are going to perform today:

  • Copy the JDK and Hadoop packages to the master and slave nodes
  • Install Hadoop and the JDK on the master and slave nodes
  • Configure the hdfs-site.xml and core-site.xml files on the master and slave nodes
  • Format the master node
  • Start the master and slave nodes

Before proceeding further, this is my setup:

5 RHEL (Red Hat Enterprise Linux) 8 machines, all guest OSes running on VirtualBox as the hypervisor: 1 is the controller node (where the Ansible software is installed) and 4 are target nodes, of which 1 is the master and 3 are slaves.
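For reference, the controller reaches the target nodes through the Ansible inventory. A minimal inventory for this setup might look like the sketch below; the master IP matches the one used later in this article, while the three slave IPs are placeholders I made up for illustration, so substitute your own:

[master]
192.168.43.118

[slaves]
192.168.43.119
192.168.43.120
192.168.43.121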

Working

Step 1:- Copying

I have already downloaded the Hadoop and JDK packages into a local directory on my controller node.

Download hadoop-1.2.1-1.x86_64.rpm and jdk-8u171-linux-x64.rpm from here (you can use any version you like).

Now let's create a file, say hadoop_eg.yml (you can give it any name you want), and add the following lines:

- hosts: all
  tasks:
    - name: 'copying hadoop'
      copy:
        src: '/root/ansible_workspace/hadoop-1.2.1-1.x86_64.rpm' # full path for demo purposes; a relative path works too
        dest: '/root/'

    - name: 'copying jdk'
      copy:
        src: '/root/ansible_workspace/jdk-8u171-linux-x64.rpm'
        dest: '/root/'

Run ansible-playbook hadoop_eg.yml -v

Step 2:- Installing Hadoop and jdk

To install Hadoop we have to use the command module, because this Hadoop RPM cannot be installed directly through a package module; it needs rpm -i with the --force flag.

The JDK, on the other hand, can be installed directly with the yum package module. Now open hadoop_eg.yml and append the following:

    - name: 'installing hadoop'
      command:
        cmd: 'rpm -i /root/hadoop-1.2.1-1.x86_64.rpm --force'
        warn: no

    - name: 'installing jdk'
      yum: # module used to manage packages on Red Hat systems
        name: '/root/jdk-8u171-linux-x64.rpm' # pointing at a local file installs that file instead of searching the repositories
        state: present
        disable_gpg_check: yes # disabling the GPG check because we don't have the GPG key right now

Run ansible-playbook hadoop_eg.yml -v

Step 3:- Copying hdfs-site.xml and core-site.xml

Our next task is to create hdfs-site.xml and core-site.xml. It is always good practice to create the files on the controller node and then copy them to the target nodes.

Now create a file named hdfs-site.xml and write the following lines:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.{{node}}.dir</name>
    <value>/{{node}}_dir</value>
  </property>
</configuration>

{{node}} is a Jinja2 placeholder; we are going to replace it dynamically at run time with 'name' or 'data', depending on the type of node.
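For example, on the master node, where we will set node to 'name', the rendered property becomes:

  <property>
    <name>dfs.name.dir</name>
    <value>/name_dir</value>
  </property>

while on the slave nodes it renders as dfs.data.dir with the value /data_dir.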

Now create a file named core-site.xml and write the following lines:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://{{master_ip}}:{{master_port}}</value>
  </property>
</configuration>

In the same way, the placeholders in this file will be substituted at run time, which keeps the playbook dynamic.
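For example, on a slave node the template renders with the values from the vars file we create next, giving:

  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.43.118:9001</value>
  </property>

(On the master itself we will override master_ip to '0.0.0.0' so that the name node listens on all interfaces.)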

Now it's time to create some variables. It's always good practice to keep all variables in a separate file, so let's create a file, say hadoop_eg_vars.yml, and write the following lines:

master_ip: '192.168.43.118' # I am going to make this node the master
ip_for_checking: '192.168.43.118' # needed separately because of an Ansible variable-precedence drawback (explained below)
master_port: '9001' # you can use any port number you like
node: 'data' # default value; the slave nodes keep this

Now open hadoop_eg.yml and add the following line just after hosts: at the top of the file, so the playbook uses the vars file:

  vars_files: 'hadoop_eg_vars.yml' # if the vars file is in another folder, give its full or relative path
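After adding it, the top of hadoop_eg.yml should look like this (the tasks themselves stay exactly as written above):

- hosts: all
  vars_files: 'hadoop_eg_vars.yml' # loaded before any task runs
  tasks:
    - name: 'copying hadoop'
    # ...rest of the tasks unchanged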

Now go to the bottom of the file. To copy hdfs-site.xml and core-site.xml with the correct values to the master node, write the following lines:

    - name: 'copying hdfs and core on master'
      vars:
        node: 'name'
        master_ip: '0.0.0.0'
      block: # groups multiple tasks together
        - template: # this module replaces the placeholders with actual values
            src: '/root/ansible_workspace/hdfs-site.xml'
            dest: '/etc/hadoop/'
        - template:
            src: '/root/ansible_workspace/core-site.xml'
            dest: '/etc/hadoop/'
      when: ansible_facts['default_ipv4']['address'] == ip_for_checking # run these tasks only when the target node is the master

ansible_facts is a predefined variable that stores all the details about the target node. Note that the condition compares against ip_for_checking rather than master_ip: the task-level vars above override master_ip to '0.0.0.0', and task vars take precedence over values from vars_files, so a comparison against master_ip would never match here. That is the Ansible drawback mentioned in the vars file.
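If you want to confirm what this fact holds on each node before relying on it, you can temporarily add a small debug task; this is just an optional sanity check, not part of the final playbook:

    - name: 'show default ipv4 of each node' # optional check; remove after verifying
      debug:
        var: ansible_facts['default_ipv4']['address']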

Before executing the playbook, let's write the code for the slave nodes as well:

    - name: 'copying hdfs and core on slave'
      block:
        - template:
            src: '/root/ansible_workspace/hdfs-site.xml'
            dest: '/etc/hadoop/'
        - template:
            src: '/root/ansible_workspace/core-site.xml'
            dest: '/etc/hadoop/'
      when: ansible_facts['default_ipv4']['address'] != master_ip

Run ansible-playbook hadoop_eg.yml -v

Step 4:- Formatting master node

To format the master node, open hadoop_eg.yml and write the following lines:

    - name: 'formatting master'
      command:
        cmd: 'hadoop namenode -format'
        warn: no
      when: ansible_facts['default_ipv4']['address'] == master_ip

Before running this, let's also write the code to start the nodes.

Step 5:- Starting nodes

To start the name node on the master and the data nodes on the slaves, open hadoop_eg.yml and write the following lines:

    - name: 'starting namenode'
      command:
        cmd: 'hadoop-daemon.sh start namenode'
        warn: no
      when: ansible_facts['default_ipv4']['address'] == master_ip

    - name: 'starting datanode'
      command:
        cmd: 'hadoop-daemon.sh start datanode'
        warn: no
      when: ansible_facts['default_ipv4']['address'] != master_ip

Run ansible-playbook hadoop_eg.yml -v

Now wait for a minute or so, so that all the nodes get connected.

Go to the master node and run the hadoop dfsadmin -report command to check the status of the cluster.

As we can see, the three data nodes are successfully connected to our cluster, and their IPs are listed as well.
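If some data node is missing from the report, a quick ad-hoc check from the controller node shows which Hadoop daemons are actually running on every node; jps ships with the JDK and lists the running Java processes:

ansible all -m command -a 'jps' # expect NameNode on the master and DataNode on each slave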

Conclusion

Find all the code files here ➡️ https://github.com/suyash222/asnible_hadoop_eg/

Thanks to everyone for reading my article till the end. If you have any doubts, please comment; if you have any suggestions, please mail me. All comments, both positive and negative, are more than welcome.

Contact Details

LinkedIn: https://www.linkedin.com/in/suyash-garg-50245b1b7

Additional Tags

#vimaldaga #righteducation #educationredefine #rightmentor #worldrecordholder #linuxworld #makingindiafutureready #righeudcation #arthbylw #ansible #facts #inventory #webserver #hadoop
