Hadoop installation

https://www.edureka.co/blog/setting-up-a-multi-node-cluster-in-hadoop-2-x/

vmhost1: This is where the Name Node, Cloudera Manager, and many other roles are located.

vmhost2: Data Node

vmhost3: Data Node


The replication factor is 3.

Key considerations when sizing each node:

OS

CPU

Memory

Network bandwidth

Storage

Pre-Installation Steps

1. Set the SELinux policy to disabled. Modify the following parameter in the /etc/selinux/config file.
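For example, in /etc/selinux/config (a reboot is required for the change to take effect):

SELINUX=disabled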

2. Disable firewall.

chkconfig iptables off

3. Set swappiness to 0 in /etc/sysctl.conf file.
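For example, append the line below to /etc/sysctl.conf and apply it with sysctl -p:

vm.swappiness = 0

sysctl -p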


4. Disable IPV6 in /etc/sysctl.conf file.

5. Configure passwordless SSH for root user.

1. Download and run the Cloudera Manager Server installer using the wget command.
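A sketch of the typical commands (the archive.cloudera.com installer URL is an assumption; Cloudera has since moved its downloads behind a login):

wget https://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin

chmod u+x cloudera-manager-installer.bin

sudo ./cloudera-manager-installer.bin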

The installer will pop up a browser window pointing to http://localhost:7180/.

2. Logon screen

3. Choose version

4. Specify hostnames for the CDH cluster installation

5. Select repository

6. Java installation

7. SSH login credentials

8. Cluster installation: detect version, finish screen

9. Role assignment: HDFS, Hive

10. Database setup: review, complete, finish

#### Cloudera Installation on Google Cloud Platform (GCP) #### 

You can use any domain name; here it is san.com. (It is easier to copy the steps below into a notepad first.)

Create 4 machines with the same configuration (because there is a quota restriction), as below.

You have to select the same zone for all systems (otherwise the systems won't communicate over private IP).

1. Go to Create an instance in GCP (Google Cloud Platform) and follow the steps below (NRMBFN):

  1. Name (any name) 

  1. Region, Zone: any region, but the same for all nodes (e.g. Las Vegas, us-west4-a) 

  1. Machine configuration: E2, 8 GB 

  1. Boot disk: CentOS 7, 50 GB 

  1. Firewall: allow HTTP and HTTPS 

  1. Networking: cdh1.san.com, then Create 

2. Edit the config file /etc/ssh/sshd_config as the root user (all 4 nodes).

Make sure the parameters below are set:

[cdh1 ~]$ sudo su - root 

[root@cdh1 ~]# vi /etc/ssh/sshd_config +38 

PasswordAuthentication yes     (line 38, :38 in vi) 

PermitRootLogin yes            (line 65, :65 in vi) 

Restart sshd 

[root@cdh1 ~]# systemctl restart sshd 

Reset your password (san123); this lets you log in as root via PuTTY.

[root@cdh1 ~]# passwd 

3. Create the rc.local file and set the hostname (all 4 nodes):

touch /etc/rc.d/rc.local  

chmod u+x /etc/rc.d/rc.local 

systemctl enable  rc-local 

Set up the hostname on each node:

hostnamectl set-hostname cdh1.san.com 

hostnamectl set-hostname cdh2.san.com 

hostnamectl set-hostname cdh3.san.com 

hostnamectl set-hostname cdh4.san.com 
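hostnamectl persists the hostname locally, but GCP can reset it on reboot; a hedged workaround is to re-apply it from the rc.local file created above, e.g. on cdh1:

echo 'hostnamectl set-hostname cdh1.san.com' >> /etc/rc.d/rc.local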

  

  

4. Change the runlevel on all 4 nodes to multi-user.target:

systemctl set-default multi-user.target 
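Verify the change with:

systemctl get-default     (should print multi-user.target)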

5. Disable SELinux and the firewall on all 4 nodes:

[root@cdh1 ~]# vi /etc/selinux/config +7  

Change SELINUX to disabled:

SELINUX=disabled  

[root@localhost ~]# systemctl disable firewalld 

6. Log in to the first node, cdh1, and install the MySQL metastore (used for the Cloudera Manager Server (SCM) database, Hive, Oozie, etc.). Install MySQL and the MySQL community server, and start the MySQL service (copy these into notepad first):

yum localinstall \ 

[root@cdh1 ~]#yum install mysql-community-server -y 

Use the settings below in /etc/my.cnf (in vi, :1,$d removes all existing contents); then paste the new contents:

[root@cdh1 ~]#vi /etc/my.cnf 
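The original file contents were lost here; below is a minimal sketch along the lines of Cloudera's recommended MySQL settings (the values are illustrative assumptions, not the author's exact file):

[mysqld]

datadir = /var/lib/mysql

socket = /var/lib/mysql/mysql.sock

transaction-isolation = READ-COMMITTED

log-error = /var/log/mysqld.log

max_connections = 550

innodb_flush_log_at_trx_commit = 2

innodb_buffer_pool_size = 4G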

[root@cdh1 ~]#systemctl enable mysqld.service 

[root@cdh1 ~]#systemctl start mysqld.service 

check the running status using  

[root@cdh1 ~]#systemctl status mysqld.service 
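With the MySQL 5.7 community server, the initial root password is written to the log file; you can usually find it with:

[root@cdh1 ~]# grep 'temporary password' /var/log/mysqld.log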

7. If the root password is not present in the log file, use the commands below to reset the password.

============================================================== 

[root@cdh1 ~]# sudo systemctl stop mysqld 

[root@cdh1 ~]# sudo systemctl set-environment MYSQLD_OPTS="--skip-grant-tables" 

[root@cdh1 ~]# sudo systemctl start mysqld 

[root@cdh1 ~]# mysql -u root 

mysql> update mysql.user set plugin='mysql_native_password'; 

mysql> UPDATE mysql.user SET authentication_string = PASSWORD('Bigdata123!') WHERE User = 'root'; 

mysql> FLUSH PRIVILEGES; 

mysql> quit 

Once the password is reset, stop MySQL, clear the skip-grant-tables option, and start it again:

[root@cdh1 ~]# sudo systemctl stop mysqld 

[root@cdh1 ~]# sudo systemctl unset-environment MYSQLD_OPTS 

[root@cdh1 ~]# sudo systemctl start mysqld 

Log in to MySQL: 

[root@cdh1 ~]# mysql -u root -p 

mysql> quit 

=====================================================================  

8. Set up /etc/hosts with the entries below (for communication between systems); change the IPs.

[root@localhost ~]# vi /etc/hosts   (replace the IPs below with your private/internal IPs from GCP) 

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6 

10.182.0.4 cdh1.san.com  cdh1 

10.182.0.5 cdh2.san.com  cdh2 

10.182.0.6 cdh3.san.com  cdh3 

10.182.0.7 cdh4.san.com  cdh4 
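A quick sanity check that the new entries resolve from this node:

for h in cdh1 cdh2 cdh3 cdh4; do ping -c 1 $h; done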

Reboot all 4 nodes:

[root@cdh1 ~]# reboot 

Reset the root password to "san123" on all the nodes.

Configure passwordless SSH between the master  and slave nodes. 

Log in to the cdh1 node (generate the SSH key and cross-check it):

[root@cdh1 ~]# yum install sshpass -y             (for non-interactive password authentication) 

rm -rf ~/.ssh/id_rsa* 

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa 

ls -ltr ~/.ssh 

Now, from cdh1, run ssh cdh1, ssh cdh2, ssh cdh3, and ssh cdh4 once each (in order to add the host keys).

To copy the RSA public key to all the nodes in the cluster, use the command below (grep 10 matches the internal IPs; san123 is the password):

[root@lcdh1]# for i in `cat /etc/hosts | grep 10 | awk '{print $1}'`; do sshpass -p san123 ssh-copy-id $i; done 

Now if you log in to any node from cdh1, it will not ask for a password.
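You can verify this with a quick loop; each command should print the remote hostname without prompting for a password:

for h in cdh1 cdh2 cdh3 cdh4; do ssh $h hostname; done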

9. Install and configure ClusterShell on the cdh1 node only:

[root@cdh1 ~]# yum install epel* -y 

[root@cdh1 ~]# yum install clustershell -y 

Create the config file below (for NN, SNN, and DN grouping):

[root@cdh1 ~]# vi /etc/clustershell/groups.d/local.cfg    (delete the existing contents and use the internal IPs of cdh1-4) 

nn: 10.182.0.4 

snn: 10.182.0.5   

dn: 10.182.0.5 10.182.0.6 10.182.0.7   

edge: 10.182.0.7   

all: 10.182.0.4 10.182.0.5 10.182.0.6 10.182.0.7 

[root@cdh1 etc]# clush -g all -b "date"    (it should show the time from all 4 machines; if it does, the ClusterShell config is good) 

10. Install and configure the NTP service, forcing all the machines to get their time from an NTP server:

[root@cdh1 ~]# clush -g all -b "yum install ntp -y" 

[root@cdh1 ~]# clush -g all  -b "systemctl enable ntpd" 

[root@cdh1 ~]# clush -g all  -b "systemctl restart ntpd" 

[root@cdh1 ~]# cat /etc/ntp.conf    (GCP adds an NTP server here by default) 

Check the time on all the nodes and make sure it is in sync:

[root@cdh1 ~]# clush -g all -b "date" 
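If the clocks drift, you can also inspect each node's NTP peers (ntpq ships with the ntp package installed above):

[root@cdh1 ~]# clush -g all -b "ntpq -p"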

Copy /etc/hosts to all the nodes in the cluster (for communication between systems):

[root@cdh1 ~]# clush -g all  --copy /etc/hosts    (cross-check whether it was copied) 

Disable the firewall running on all the nodes (for easy communication):

[root@cdh1 ~]# clush -g all  -b "systemctl disable firewalld" 

[root@cdh1 ~]# clush -g all  -b "systemctl stop firewalld" 

11. Install Java on all 4 nodes (the JDK is required for the Hadoop framework):

[root@cdh1 ~]# clush -g all  -b "yum install java-1.8.0-openjdk.x86_64 -y" 

Make sure Java is installed on all the nodes by executing the command below:

[root@cdh1 ~]# clush -g all  -b "java -version" 

Install the MySQL Java connector on all 4 nodes:

[root@cdh1 ~]# clush -g all  -b "yum -y install mysql-connector-java" 

After installation, check that mysql-connector-java.jar is present in /usr/share/java/:

[root@cdh1 ~]# clush -g all  -b "ls -al /usr/share/java/mysql-connector-java.jar" 

Disable IPv6 on all the nodes (add the entries below):

[root@cdh1 ~]#sudo vi /etc/sysctl.conf 

net.ipv6.conf.all.disable_ipv6 = 1 

net.ipv6.conf.default.disable_ipv6 = 1 

net.ipv6.conf.lo.disable_ipv6 = 1 

net.ipv6.conf.enp0s3.disable_ipv6 = 1 

vm.swappiness = 10 

Copy the same file to all other nodes:- 

[root@cdh1 ~]# clush -g all  --copy /etc/sysctl.conf 
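To apply the new settings on every node without a reboot (an error about an unknown key just means that NIC name does not exist on that node):

[root@cdh1 ~]# clush -g all -b "sysctl -p"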

On all 4 nodes, execute the commands below (prerequisites):

----------------------------------------------------------------------------------------------- 

cd /etc/yum.repos.d 

echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag     (sudo echo ... > fails because the redirect runs without root) 

sudo yum -y install wget 

sudo yum -y install createrepo     

sudo yum -y install yum-utils createrepo 

sudo yum -y install MySQL-python* 

sudo yum -y install python* 

sudo yum -y install httpd 

sudo yum -y install telnet 

sudo yum -y install bind* 

sudo yum -y install openssh* 

sudo yum -y install rpmdevtools 

sudo yum -y install ntp* 

sudo yum -y install redhat-lsb* 

sudo yum -y install cyrus* 

sudo yum -y install mod_ssl* 

sudo yum -y install portmap* 

sudo yum -y install openssl* 

sudo yum -y install mlocate* 

sudo yum -y install sshpass* 

sudo yum -y remove snappy 

sudo yum -y install gcc 

sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo 

sudo yum -y install apache-maven 

sudo updatedb 

sudo systemctl disable firewalld.service 

sudo systemctl enable httpd 

sudo systemctl enable ntpd 

sudo systemctl enable ntpdate 

sudo systemctl status firewalld.service 

sudo systemctl status httpd 

sudo systemctl status ntpd 

 ---------------------------------------------------------------------------------------------- 

12. Download all the RPMs for Java and Cloudera Manager and keep them in /var/www/html/cm on cdh1:

[root@cdh1 ~]#yum install httpd -y 

Enable and start the httpd service 

[root@cdh1 ~]#systemctl enable httpd 

[root@cdh1 ~]#systemctl start httpd 

Download all the required RPMs for Cloudera Manager (prerequisites) from the Cloudera repository URL:

[root@cdh1 ~]#mkdir /var/www/html/cm 

[root@cdh1 ~]# cd /var/www/html/cm     (copy into notepad and execute the commands below) 
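The download URL was lost from the original; a hedged sketch assuming the public CM 5.16.2 RPM archive layout (now behind Cloudera credentials):

wget --recursive --no-parent --no-directories https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/5.16.2/RPMS/x86_64/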

Download the Cloudera parcels from the URL:

[root@cdh1 html]# mkdir -p /var/www/html/CDH5.16.2/parcels 

[root@cdh1 html]# cd /var/www/html/CDH5.16.2/parcels 
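The parcel URL was also lost; a hedged sketch assuming the public CDH 5 parcel archive layout (the file names match those used below):

wget https://archive.cloudera.com/cdh5/parcels/5.16.2/CDH-5.16.2-1.cdh5.16.2.p0.8-el7.parcel

wget https://archive.cloudera.com/cdh5/parcels/5.16.2/CDH-5.16.2-1.cdh5.16.2.p0.8-el7.parcel.sha1

wget https://archive.cloudera.com/cdh5/parcels/5.16.2/manifest.json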

[root@cdh1 parcels]# mv CDH-5.16.2-1.cdh5.16.2.p0.8-el7.parcel.sha1 CDH-5.16.2-1.cdh5.16.2.p0.8-el7.parcel.sha     (Cloudera Manager expects the checksum file to end in .sha) 

Create the yum repo using the below command:- 

[root@cdh1 html]# createrepo /var/www/html/cm 

[root@cdh1 parcels]# createrepo /var/www/html/CDH5.16.2/parcels 

Create the Cloudera Manager repository (replace with your cdh1 internal IP below):

[root@cdh1 cm]# vi /etc/yum.repos.d/cloudera-manager.repo 

[cloudera-manager] 

name = Cloudera Manager Version 5.16.2 

baseurl = http://<cdh1-internal-ip>/cm 

gpgcheck = 1 

[root@cdh1 parcels]# vi /etc/yum.repos.d/cloudera-cdh.repo 

[cloudera-cdh5] 

name=Clouderaparcels-5.16.2_LocalRepo 

baseurl=http://10.128.0.17/CDH5.16.2/parcels 

enabled=1 

gpgcheck=0 

make a directory:- 

[root@cdh1 parcels]# mkdir -p /opt/cloudera/parcels 

copy the parcels to /opt/cloudera/parcels directory:- 

[root@cdh1 parcels]#  cp /var/www/html/CDH5.16.2/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8-el7.parcel.sha /opt/cloudera/parcels/ 

[root@cdh1 parcels]#  cp /var/www/html/CDH5.16.2/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8-el7.parcel /opt/cloudera/parcels/ 

Copy the repository files to all the nodes in the cluster:

[root@cdh1 cm]# clush -g all  --copy /etc/yum.repos.d/cloudera-cdh.repo 

[root@cdh1 cm]# clush -g all  --copy /etc/yum.repos.d/cloudera-manager.repo 
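To confirm every node now sees both repositories, a quick check:

[root@cdh1 cm]# clush -g all -b "yum clean all"

[root@cdh1 cm]# clush -g all -b "yum repolist"     (should list cloudera-manager and cloudera-cdh5)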

Install the packages below on the first node, cdh1, only:

[root@cdh1 cm]# yum install cloudera-manager-agent cloudera-manager-daemons cloudera-manager-server -y 

13. Prepare the Cloudera Manager database:

Log in to the MySQL database and create the databases below:

[root@cdh1 cm]# mysql -u root -p 

Enter password: 

mysql> show databases; 

ERROR 1820 (HY000): You must reset your password using ALTER USER statement before executing this statement. 

mysql> SET PASSWORD = PASSWORD('Bigdata123!');     (or run /usr/bin/mysql_secure_installation) 

mysql> grant all on *.* TO 'root'@'%' IDENTIFIED BY 'Bigdata123!' WITH GRANT OPTION; 

mysql> uninstall plugin validate_password;     (in order to allow a simple password) 

 ----------------------------------------------------------------------------------------- 

create database amon DEFAULT CHARACTER SET utf8; 

grant all on amon.* to 'amon'@'%' identified by 'root@123#'; 

create database scm DEFAULT CHARACTER SET utf8; 

grant all on scm.* to 'scm'@'%' identified by 'root@123#'; 

create database rman DEFAULT CHARACTER SET utf8; 

grant all on rman.* to 'rman'@'%' identified by 'root@123#'; 

create database metastore DEFAULT CHARACTER SET utf8; 

grant all on metastore.* to 'hive'@'%' identified by 'root@123#'; 

create database sentry DEFAULT CHARACTER SET utf8; 

grant all on sentry.* to 'sentry'@'%' identified by 'root@123#'; 

create database nav DEFAULT CHARACTER SET utf8; 

grant all on nav.* to 'nav'@'%' identified by 'root@123#'; 

create database navms DEFAULT CHARACTER SET utf8; 

grant all on navms.* to 'navms'@'%' identified by 'root@123#'; 

create database hue DEFAULT CHARACTER SET utf8; 

grant all on hue.* to 'hue'@'%' identified by 'root@123#'; 

create database oozie DEFAULT CHARACTER SET utf8; 

grant all on oozie.* to 'oozie'@'%' identified by 'root@123#'; 

create database hive DEFAULT CHARACTER SET utf8; 

grant all on hive.* to 'hive'@'%' identified by 'root@123#'; 

flush privileges; 

quit 

----------------------------------------------------------------------------------------- 

Create the schema for the SCM database (Cloudera uses the SCM DB to store metadata):

Log in to the node cdh1 where MySQL is installed.

[root@cdh1 cm]# /usr/share/cmf/schema/scm_prepare_database.sh mysql scm scm 

Enter SCM password:root@123# 

Last line of output: All done, your SCM database is configured correctly 

[root@cdh1 cm]# sudo cat /etc/cloudera-scm-server/db.properties      (the DB credentials the CDH SCM server loads) 

Last line of output: com.cloudera.cmf.db.password=root@123# 

14. After this, restart cloudera-scm-server on the cdh1 node:

[root@cdh1 cm]#systemctl enable cloudera-scm-server 

[root@cdh1 cm]#systemctl restart cloudera-scm-server 

[root@cdh1 cm]#systemctl status cloudera-scm-server 

Now check Cloudera Manager by following the log (in the backend it creates the schema and tables; this takes about 5 minutes):

[root@cdh1 cm]# tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log (log of CDH SCM server) 

First line of output: MetricSchemaManager: Registering cross entity aggregates... 

Wait for five minutes; the Cloudera Manager daemon will listen on port 7180. 

Check that the port is reachable (Cloudera Manager runs on cdh1): 

telnet cdh1.san.com 7180 

  

15. Access Cloudera Manager using the link below (use your external IP).

Open the firewall for port 7180 in GCP: go to GCP and create a firewall rule.

  1. Create a new GCP firewall rule (you must do this after every restart of GCP)

  1. Name, target (all instances in your network), source IP range (your public IP), then Create 

Log in to the Cloudera portal using your cdh1 public IP (username: admin, password: admin) 

Select Cloudera Enterprise Trial (60 days), then Continue, Continue 

[root@cdh1 ~]# cat /etc/hosts    (get the IPs and paste them into the window) 

10.182.0.4,10.182.0.5,10.182.0.6,10.182.0.7     (search for these IPs), then Continue 

[root@cdh1 yum.repos.d]# cat cloudera-manager.repo 

Continue --> Continue 

[root@cdh1 ~]# clush -g all  -b "echo never > /sys/kernel/mm/transparent_hugepage/defrag" 

[root@cdh1 ~]# clush -g all  -b "echo never > /sys/kernel/mm/transparent_hugepage/enabled" 

 

 

Role assignments:

Name Node: cdh4 

Secondary Name Node: cdh3 

Balancer: cdh4 

NFS Gateway: the edge node, i.e. cdh1 

Data Nodes: select nodes 1, 2, 3 

Cross-check that the cluster is running.

Now shut down the GCP machines:

[root@cdh1 ~]# clush -g all  -b "poweroff" 

When you start the GCP machines again, the external IPs will change, so you have to create the firewall rule again in order to access Cloudera Manager.

You can execute a sample MapReduce job using the command below:

  

[root@cdh1 cm]# hadoop jar /opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/jars/hadoop-examples.jar wordcount  /user/balaji/1015mbfile /user/balaji/out3 
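To inspect the job output afterwards (part file names follow the standard Hadoop reduce-output convention):

hadoop fs -ls /user/balaji/out3

hadoop fs -cat /user/balaji/out3/part-r-00000 | head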

  

 

 

 

----------------------------------------------------------------------------------------------------------------------------- 

Start the GCP nodes and do the steps below after every restart of the GCP systems:

  1. Create a new GCP firewall rule (you must do this after every restart of GCP)

  1. Name, target (all instances in your network), source IP range (your public IP), then Create 

Log in to the Cloudera portal using your cdh1 public IP (username: admin, password: admin) 
