Hadoop and Ansible
Introduction
Hello! In this article I am going to show how to set up a Hadoop cluster using Ansible.
Here is the list of tasks we are going to perform today:
- Copy the JDK and Hadoop packages to the master and slave nodes
- Install Hadoop and the JDK on the master and slave nodes
- Configure the hdfs-site.xml and core-site.xml files on the master and slave nodes
- Format the master node
- Start the master and slave nodes
Before proceeding further, here is my setup:
5 RHEL (Red Hat Enterprise Linux) 8 machines, all guest OSes running on VirtualBox as the hypervisor: 1 is the controller node (where the Ansible software is installed) and 4 are target nodes, of which 1 is the master and 3 are slaves.
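Before any playbook can reach the target nodes, they have to be listed in the Ansible inventory. A minimal sketch of what mine looks like (the master IP is the one used later in this article; the slave IPs and group name here are placeholder examples, substitute your own):

```
# /etc/ansible/hosts (or any file passed to ansible-playbook with -i)
[hadoop]
192.168.43.118   # master
192.168.43.119   # slave 1 (example IP)
192.168.43.120   # slave 2 (example IP)
192.168.43.121   # slave 3 (example IP)
```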
Working
Step 1:- Copying
I have already downloaded the Hadoop and JDK packages into a local directory on my controller node.
Download hadoop-1.2.1-1.x86_64.rpm and jdk-8u171-linux-x64.rpm from here (you can use any version you like).
Now let's create a file, say hadoop_eg.yml (you can give it any name you want), and add the following lines:
- hosts: all
  tasks:
    - name: 'copying hadoop'
      copy:
        src: '/root/ansible_workspace/hadoop-1.2.1-1.x86_64.rpm' # full path for demo purposes; a relative path works too
        dest: '/root/'
    - name: 'copying jdk'
      copy:
        src: '/root/ansible_workspace/jdk-8u171-linux-x64.rpm'
        dest: '/root/'
Run ansible-playbook hadoop_eg.yml -v
Step 2:- Installing Hadoop and JDK
To install Hadoop we have to use the command module, because this old Hadoop RPM cannot be installed directly through the package module (hence the rpm -i --force below).
The JDK, on the other hand, can be installed with the regular package manager module. Now open hadoop_eg.yml and add the following tasks:
    - name: 'installing hadoop'
      command:
        cmd: 'rpm -i /root/hadoop-1.2.1-1.x86_64.rpm --force'
        warn: no
    - name: 'installing jdk'
      yum: # module used to manage packages on Red Hat systems
        name: '/root/jdk-8u171-linux-x64.rpm' # given a local file path, yum installs that file instead of searching a repository
        state: present
        disable_gpg_check: yes # disable the GPG check because we don't have the GPG key right now
Run ansible-playbook hadoop_eg.yml -v
Step 3:- Copying hdfs-site.xml and core-site.xml
Our next task is to create hdfs-site.xml and core-site.xml. It is always good practice to create the files on the controller node and then copy them to the target nodes.
Create a file named hdfs-site.xml and write the following lines:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.{{node}}.dir</name>
    <value>/{{node}}_dir</value>
  </property>
</configuration>
{{node}} is a Jinja2 placeholder. We are going to replace it dynamically at run time: it becomes 'name' on the master node and 'data' on the slave nodes, depending on the type of node.
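For illustration, on a slave node (where node is 'data') the template above would render to:

```
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/data_dir</value>
  </property>
</configuration>
```

On the master node the same template would instead produce dfs.name.dir and /name_dir.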
Now create a file named core-site.xml and write the following lines:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://{{master_ip}}:{{master_port}}</value>
  </property>
</configuration>
In the same way, the placeholders in this file will be overwritten, which keeps the setup dynamic.
Now it's time to create some variables. It is always good practice to keep all the variables in a separate file, so let's create a file, say hadoop_eg_vars.yml, and write the following lines:
master_ip: '192.168.43.118' # this node will be the master
ip_for_checking: '192.168.43.118' # a separate copy of the master IP; we need it because the master play overrides master_ip with 0.0.0.0
master_port: '9001' # you can use any port number you like
node: 'data' # default value, used on the slave (data) nodes
Now open hadoop_eg.yml and add the following line right after hosts at the top of the file, so the play can use the vars file:
  vars_files: 'hadoop_eg_vars.yml' # if the vars file lives in another folder, give its full or relative path
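To be explicit about the placement, the top of the playbook would now look like this (a sketch; the task list is the one built up in the earlier steps):

```
- hosts: all
  vars_files:
    - 'hadoop_eg_vars.yml' # play-level variables, visible to every task below
  tasks:
    - name: 'copying hadoop'
      copy:
        src: '/root/ansible_workspace/hadoop-1.2.1-1.x86_64.rpm'
        dest: '/root/'
    # ... remaining tasks from Steps 1 and 2 ...
```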
Now, at the bottom of the file, add the following lines to copy hdfs-site.xml and core-site.xml with the correct values onto the master node:
    - name: 'copying hdfs and core on master'
      vars:
        node: 'name'
        master_ip: '0.0.0.0'
      block: # groups multiple tasks together
        - template: # this module replaces the placeholders with actual values
            src: '/root/ansible_workspace/hdfs-site.xml'
            dest: '/etc/hadoop/'
        - template:
            src: '/root/ansible_workspace/core-site.xml'
            dest: '/etc/hadoop/'
      when: ansible_facts["default_ipv4"]["address"] == ip_for_checking # run these tasks only when the target node is the master
ansible_facts is a predefined variable that stores all the details Ansible gathers about the target node.
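If you want to see the address that the condition above compares against, you can inspect the gathered facts from the controller node with an ad-hoc command:

```
ansible all -m setup -a 'filter=ansible_default_ipv4'
```

This prints each target node's default IPv4 details, including the address used in the when condition.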
Before executing the playbook, let's write the code for the slave nodes as well:
    - name: 'copying hdfs and core on slave'
      block:
        - template:
            src: '/root/ansible_workspace/hdfs-site.xml'
            dest: '/etc/hadoop/'
        - template:
            src: '/root/ansible_workspace/core-site.xml'
            dest: '/etc/hadoop/'
      when: ansible_facts['default_ipv4']['address'] != master_ip
Run ansible-playbook hadoop_eg.yml -v
Step 4:- Formatting the master node
To format the master node, open hadoop_eg.yml and add the following lines:
    - name: 'formatting master'
      command:
        cmd: 'hadoop namenode -format'
        warn: no
      when: ansible_facts['default_ipv4']['address'] == master_ip
Before running this, let's also write the code to start the nodes.
Step 5:- Starting nodes
To start the name node on the master and the data nodes on the slaves, open hadoop_eg.yml and add the following lines:
    - name: 'starting namenode'
      command:
        cmd: 'hadoop-daemon.sh start namenode'
        warn: no
      when: ansible_facts['default_ipv4']['address'] == master_ip
    - name: 'starting datanode'
      command:
        cmd: 'hadoop-daemon.sh start datanode'
        warn: no
      when: ansible_facts['default_ipv4']['address'] != master_ip
Run ansible-playbook hadoop_eg.yml -v
Now wait a minute or so for all the nodes to connect.
Then go to the master node and run hadoop dfsadmin -report to check the status of the cluster.
As we can see, the three data nodes have successfully connected to our cluster, and their IPs are listed in the report.
Conclusion
Find all the code files here ➡️ https://github.com/suyash222/asnible_hadoop_eg/
Thanks to everyone for reading my article till the end. If you have any doubts, please comment; if you have any suggestions, please mail me. All comments, both positive and negative, are more than welcome.
Contact Detail
LinkedIn: https://www.linkedin.com/in/suyash-garg-50245b1b7
Additional Tags
#vimaldaga #righteducation #educationredefine #rightmentor #worldrecordholder #linuxworld #makingindiafutureready #righeudcation #arthbylw #ansible #facts #inventory #webserver #hadoop