jmarturi « Clúster Big Data

The first step is to generate a public/private key pair for SSH access on the administrator node:

# ssh-keygen -t rsa -N '' -f .ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in .ssh/id_rsa
Your public key has been saved in .ssh/id_rsa.pub

Next, we distribute the public key to each node of the future cluster:

# ssh-copy-id -f -i .ssh/id_rsa.pub root@hadoop-master1
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: ".ssh/id_rsa.pub"
root@hadoop-master1's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'root@hadoop-master1'"
and check to make sure that only the key(s) you wanted were added.
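
Since the same command has to be repeated for every node, a small loop can help. The following is a minimal sketch, not part of the original procedure: the node list is an assumption (only hadoop-master1 and hadoop-worker3 are named in these posts) and distribute_key is a hypothetical helper. By default it only prints the commands; setting DRY_RUN=0 would actually run them.

```shell
#!/bin/sh
# Hypothetical node list; replace with the real host names of the cluster.
NODES="hadoop-master1 hadoop-worker3"

# Print (default) or run ssh-copy-id for each node given after the key path.
distribute_key() {
    key="$1"
    shift
    for node in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh-copy-id -f -i $key root@$node"
        else
            ssh-copy-id -f -i "$key" "root@$node"
        fi
    done
}

# $NODES is left unquoted on purpose so it splits into one argument per host.
distribute_key .ssh/id_rsa.pub $NODES
```

Each real run still prompts for the root password of the target node, exactly as in the transcript above.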

Then we check that key-based SSH access works:

# ssh root@hadoop-master1

The administrator node will, among other things, run an Ambari server for deploying and later managing the cluster. We therefore have to install and configure this Ambari server. The installation is performed as follows:

# apt update
# apt install ambari-server

Note that this installation also installs a PostgreSQL server.

Once the installation is complete, we configure the server with the ambari-server setup command:

# ambari-server setup
Using python /usr/bin/python
Setup ambari-server
Checking SELinux...
WARNING: Could not run /usr/sbin/sestatus: OK
Customize user account for ambari-server daemon [y/n] (n)? n
Adjusting ambari-server permissions and ownership...
Checking firewall status...
Checking JDK...
Do you want to change Oracle JDK [y/n] (n)? n
Check JDK version for Ambari Server...
JDK version found: 8
Minimum JDK version is 8 for Ambari. Skipping to setup different JDK for Ambari Server.
Checking GPL software agreement...
Completing setup...
Configuring database...
Enter advanced database configuration [y/n] (n)? n
Configuring database...
Default properties detected. Using built-in database.
Configuring ambari database...
Checking PostgreSQL...
Configuring local database...
Configuring PostgreSQL...
Backup for pg_hba found, reconfiguration not required
Creating schema and user...
done.
Creating tables...
done.
Extracting system views...
Adjusting ambari-server permissions and ownership...
Ambari Server 'setup' completed successfully.

Note the answers given to the different questions during the configuration process.

Finally, to install the BigTop 3.1.1 stack and make it available when deploying the cluster, we have to install the BigTop “management pack”:

# apt install bigtop-ambari-mpack bigtop-utils
# service ambari-server stop
# env -u _JAVA_HOME ambari-server install-mpack --mpack=/usr/lib/bigtop-ambari-mpack/bgtp-ambari-mpack-1.0.0.0-SNAPSHOT-bgtp-ambari-mpack.tar.gz --verbose 
# service ambari-server start

The most laborious step in installing the operating systems on the servers is configuring the disks, partitions and file systems. The configuration follows what is specified in the storage configuration for master and worker nodes.

Below is how the partitions, file systems and their mount points ended up after installation on the master node:

# lsblk
NAME                   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                      8:0    0   2.2T  0 disk
├─vg3-lv--zoo--db      253:7    0     1T  0 lvm  /hadoop/zookeeper
└─vg3-lv--reserved     253:8    0   1.2T  0 lvm
sdb                      8:16   0   2.2T  0 disk
├─vg2-lv--zoo--journal 253:5    0     1T  0 lvm  /journal/zookeeper
└─vg2-lv--dfs--journal 253:6    0   1.2T  0 lvm  /journal/hdfs
sdc                      8:32   0 447.1G  0 disk
├─sdc1                   8:33   0   1.1G  0 part /boot/efi
├─sdc2                   8:34   0     1G  0 part /boot
├─sdc3                   8:35   0   443G  0 part
│ ├─vg0-lv--root       253:2    0   100G  0 lvm  /
│ ├─vg0-lv--home       253:3    0    10G  0 lvm  /home
│ └─vg0-lv--var        253:4    0   333G  0 lvm  /var
└─sdc4                   8:36   0     2G  0 part [SWAP]
sdd                      8:48   0   4.4T  0 disk
└─sdd1                   8:49   0   4.4T  0 part
  ├─vg1-lv--dfs        253:0    0   2.2T  0 lvm  /hadoop/hdfs
  └─vg1-lv--pgsql      253:1    0   2.2T  0 lvm  /var/lib/pgsql

The same is shown below for the worker nodes:

#  lsblk
NAME             MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                8:0    0   2.2T  0 disk
└─sda1             8:1    0   2.2T  0 part /hadoop/hdfs/data6
sdb                8:16   0   2.2T  0 disk
└─sdb1             8:17   0   2.2T  0 part /hadoop/hdfs/data5
sdc                8:32   0   2.2T  0 disk
└─sdc1             8:33   0   2.2T  0 part /hadoop/hdfs/data4
sdd                8:48   0   2.2T  0 disk
└─sdd1             8:49   0   2.2T  0 part /hadoop/hdfs/data2
sde                8:64   0   2.2T  0 disk
└─sde1             8:65   0   2.2T  0 part /hadoop/hdfs/data3
sdf                8:80   0   2.2T  0 disk
└─sdf1             8:81   0   2.2T  0 part /hadoop/hdfs/data1
sdg                8:96   0 447.1G  0 disk
├─sdg1             8:97   0   1.1G  0 part /boot/efi
├─sdg2             8:98   0     1G  0 part /boot
├─sdg3             8:99   0   443G  0 part
│ ├─vg0-lv--home 253:0    0    10G  0 lvm  /home
│ ├─vg0-lv--root 253:1    0   100G  0 lvm  /
│ └─vg0-lv--var  253:2    0   333G  0 lvm  /var
└─sdg4             8:100  0     2G  0 part [SWAP]
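
A quick way to confirm that every data disk ended up mounted where expected is to compare a list of mount points with the kernel's mount table. The sketch below assumes a Linux /proc/mounts-style table (mount point in the second field); missing_mounts is a hypothetical helper, and the list of paths is taken from the worker lsblk listing above.

```shell
#!/bin/sh
# Mount points seen in the worker lsblk listing.
EXPECTED="/hadoop/hdfs/data1 /hadoop/hdfs/data2 /hadoop/hdfs/data3
/hadoop/hdfs/data4 /hadoop/hdfs/data5 /hadoop/hdfs/data6"

# Print every expected mount point that is absent from the mounts table
# given as first argument (mount point = second field, as in /proc/mounts).
missing_mounts() {
    table="$1"
    shift
    for mp in "$@"; do
        awk -v mp="$mp" '$2 == mp { found = 1 } END { exit !found }' "$table" ||
            echo "$mp"
    done
}

if [ -r /proc/mounts ]; then
    # $EXPECTED is unquoted on purpose: one argument per mount point.
    missing=$(missing_mounts /proc/mounts $EXPECTED)
    if [ -z "$missing" ]; then
        echo "all data directories are mounted"
    else
        echo "not mounted:"
        echo "$missing"
    fi
fi
```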

Disabling Transparent Huge Pages (THP) avoids the high CPU usage that THP causes. This is done by managing the associated service.

# systemctl daemon-reload
# systemctl start disable-transparent-huge-pages
# cat /sys/kernel/mm/transparent_hugepage/enabled 
always madvise [never]
# cat /sys/kernel/mm/transparent_hugepage/defrag 
always defer defer+madvise madvise [never]
# systemctl enable disable-transparent-huge-pages
Created symlink /etc/systemd/system/basic.target.wants/disable-transparent-huge-pages.services.service. /etc/systemd/system/disable-transparent-huge-page
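
In the two sysfs files, the value in square brackets is the active mode. For scripted checks it can be extracted with sed. This is a minimal sketch; thp_active_mode is a hypothetical helper name, and the two paths are the ones queried in the transcript above.

```shell
#!/bin/sh
# Print the active mode, i.e. the word enclosed in [ ], of a THP sysfs file.
thp_active_mode() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

for f in /sys/kernel/mm/transparent_hugepage/enabled \
         /sys/kernel/mm/transparent_hugepage/defrag; do
    if [ -r "$f" ]; then
        echo "$f: $(thp_active_mode "$f")"
    fi
done
```

On a node configured as above, both lines would be expected to end in "never".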

If this service does not exist, its service definition file must be created first so that it can be managed:

[Unit]
Description=Disable Transparent Huge Pages (THP)
DefaultDependencies=no
After=sysinit.target local-fs.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never | tee /sys/kernel/mm/transparent_hugepage/enabled > /dev/null && echo never | tee /sys/kernel/mm/transparent_hugepage/defrag > /dev/null'

[Install]
WantedBy=basic.target

Reducing swapping is recommended on all nodes, as is disabling IP version 6.

# /etc/sysctl.conf - Configuration file for setting system variables 
# See /etc/sysctl.d/ for additional system variables.
# See sysctl.conf (5) for information.
#
#kernel.domainname = example.com
# Uncomment the following to stop low-level messages on console 
#kernel.printk = 3 4 1 3
#########################################################
# Configuration for hadoop cluster deployment
# To help detect unreachable nodes with less latency
net.ipv4.tcp_retries2=5
# To reduce swapping 
vm.swappiness=1
# To disable IPv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
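
Because a malformed line in /etc/sysctl.conf is easy to overlook, it can be useful to read the configured values back. The following sketch parses a sysctl.conf-style file; sysctl_conf_get is a hypothetical helper, and SYSCTL_CONF is an assumed variable that lets the file path be overridden.

```shell
#!/bin/sh
# Print the value assigned to KEY ($2) in a sysctl.conf-style FILE ($1).
# Comment lines are skipped, blanks around '=' are ignored, last match wins.
sysctl_conf_get() {
    awk -F= -v key="$2" '
        /^[ \t]*[#;]/ { next }
        {
            k = $1; v = $2
            gsub(/[ \t]/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) val = v
        }
        END { if (val != "") print val }' "$1"
}

conf="${SYSCTL_CONF:-/etc/sysctl.conf}"
if [ -r "$conf" ]; then
    for key in vm.swappiness net.ipv6.conf.all.disable_ipv6 \
               net.ipv6.conf.default.disable_ipv6 net.ipv6.conf.lo.disable_ipv6; do
        echo "$key = $(sysctl_conf_get "$conf" "$key")"
    done
fi
```

A key that prints with an empty value is missing from the file or written in a form the parser does not recognise. After correcting the file, sysctl -p loads the settings from /etc/sysctl.conf without a reboot.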

The default firewall configuration is too restrictive for any Hadoop deployment. If the Big Data cluster has its own secured, isolated network, there is no need for an additional firewall on each system.

# ufw status 
Status: inactive
# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source           destination
Chain FORWARD (policy ACCEPT)
target prot opt source           destination
Chain OUTPUT (policy ACCEPT)
target prot opt source           destination
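
To verify this state on many nodes without reading the listing by eye, the chain policies can be checked in a script. The sketch below reads iptables -L output from standard input; chains_all_accept is a hypothetical helper, and it checks only the default policies, not individual rules. The sample input is the listing shown above.

```shell
#!/bin/sh
# Succeed only if the `iptables -L` listing on stdin contains at least one
# chain with a default policy and every such policy is ACCEPT.
chains_all_accept() {
    awk '/^Chain .*\(policy / { n++; if ($0 !~ /\(policy ACCEPT/) bad = 1 }
         END { exit (n == 0 || bad) }'
}

# Sample input copied from the listing above; on a real node one would pipe
# `iptables -L` into the function instead.
if chains_all_accept <<'EOF'
Chain INPUT (policy ACCEPT)
target prot opt source           destination
Chain FORWARD (policy ACCEPT)
target prot opt source           destination
Chain OUTPUT (policy ACCEPT)
target prot opt source           destination
EOF
then
    echo "all chains accept by default"
else
    echo "at least one chain does not accept by default"
fi
```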

Tuning the tcp_retries2 parameter of the system's network stack allows failing nodes to be detected more quickly. Setting this parameter to 5 on each node can help detect unreachable nodes with less latency. The setting is made in the /etc/sysctl.conf file.

#
# /etc/sysctl.conf - Configuration file for setting system variables 
# See /etc/sysctl.d/ for additional system variables.
# See sysctl.conf (5) for information.
#
#kernel.domainname = example.com
# Uncomment the following to stop low-level messages on console
#kernel.printk = 3 4 1 3
###########################
# Configuration for hadoop cluster deployment to help detect
# unreachable nodes with less latency
net.ipv4.tcp_retries2=5
###########################
# Functions previously found in netbase
#

# vim /etc/sysctl.conf
# sysctl -w net.ipv4.tcp_retries2=5 
net.ipv4.tcp_retries2 = 5
# sysctl net.ipv4.tcp_retries2
net.ipv4.tcp_retries2 = 5
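
Editing the file by hand works, but when the same change must be applied on every node, a scripted, repeatable edit is less error-prone. The sketch below is one possible approach, not the method used in the post: sysctl_conf_set is a hypothetical helper that rewrites an existing assignment or appends a new one, and the demonstration runs against a scratch file rather than /etc/sysctl.conf.

```shell
#!/bin/sh
# Set KEY ($2) to VALUE ($3) in the existing sysctl.conf-style FILE ($1):
# an active assignment of KEY is rewritten, otherwise a new line is appended.
# Commented-out assignments are left untouched.
sysctl_conf_set() {
    file="$1"; key="$2"; value="$3"
    tmp=$(mktemp) || return 1
    awk -v key="$key" -v value="$value" '
        /=/ {
            k = $0
            sub(/=.*/, "", k)
            gsub(/[ \t]/, "", k)
            if (k == key) { print key " = " value; done = 1; next }
        }
        { print }
        END { if (!done) print key " = " value }' "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Demonstration on a scratch file, so nothing under /etc is modified.
scratch=$(mktemp)
printf '%s\n' '#kernel.domainname = example.com' 'net.ipv4.tcp_retries2=15' > "$scratch"
sysctl_conf_set "$scratch" net.ipv4.tcp_retries2 5
cat "$scratch"
rm -f "$scratch"
```

Running the function a second time with the same arguments leaves the file unchanged, which makes it safe to call repeatedly from a provisioning script.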

AppArmor must be disabled during the installation and configuration of the cluster. It can be re-enabled afterwards, once installation is complete and while the cluster is running.

# systemctl disable apparmor
Synchronizing state of apparmor.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable apparmor
Removed /etc/systemd/system/sysinit.target.wants/apparmor.service.
# reboot

# service apparmor status
● apparmor.service - Load AppArmor profiles
   Loaded: loaded (/lib/systemd/system/apparmor.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:apparmor(7)
           https://gitlab.com/apparmor/apparmor/wikis/home/
# apparmor_status
apparmor module is loaded.
15 profiles are loaded.
15 profiles are in enforce mode.
   /snap/snapd/18357/usr/lib/snapd/snap-confine
   /snap/snapd/18357/usr/lib/snapd/snap-confine//mount-namespace-capture-helper
   snap-update-ns.lxd
   snap.lxd.activate
   snap.lxd.benchmark
   snap.lxd.buginfo
   snap.lxd.check-kernel
   snap.lxd.daemon
   snap.lxd.hook.configure
   snap.lxd.hook.install
   snap.lxd.hook.remove
   snap.lxd.lxc
   snap.lxd.lxc-to-lxd
   snap.lxd.lxd
   snap.lxd.migrate
0 profiles are in complain mode.
0 processes have profiles defined. 
0 processes are in enforce mode.
0 processes are in complain mode.
0 processes are unconfined but have a profile defined.

On each node, ulimit -n sets the maximum number of files (file descriptors) that a process can have open simultaneously. With the default value of 1024, the system can appear to have run out of disk space and report that no inodes are available. This value should be set to 64000 on each node. The file to configure is /etc/security/limits.conf.

#[domain]      [type]      [item]       [value]
#
#*            soft         core         0
#root         hard         core         100000
#*            hard         rss          10000
#@student     hard         nproc        20
#@faculty     soft         nproc        20
#@faculty     hard         nproc        50
#ftp          hard         nproc        0
#ftp          -            chroot       /ftp
#@student     -            maxlogins    4

root         soft          nofile       64000
root         hard          nofile       64000
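
Limits from /etc/security/limits.conf are applied when a new session is opened, so it is worth checking the effective value after logging in again. A minimal sketch follows; nofile_ok is a hypothetical helper, and 64000 is the value recommended above.

```shell
#!/bin/sh
REQUIRED=64000   # value recommended in the post

# Succeed if LIMIT ($1, a number or "unlimited") is at least the required
# value ($2, defaulting to $REQUIRED).
nofile_ok() {
    [ "$1" = "unlimited" ] || [ "$1" -ge "${2:-$REQUIRED}" ]
}

current=$(ulimit -n)
if nofile_ok "$current"; then
    echo "open-file limit OK: $current"
else
    echo "open-file limit too low: $current (want >= $REQUIRED)"
fi
```

Note that the two entries above apply to the root user only, so the check is meaningful when run in a root session.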

Syslog must be enabled on each node of the cluster to preserve the log files of processes and jobs that terminated abruptly or failed.

# service rsyslog status
● rsyslog.service - System Logging Service
     Loaded: loaded (/lib/systemd/system/rsyslog.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2023-12-19 10:59:38 UTC; 1h 56min ago
TriggeredBy: ● syslog.socket
       Docs: man:rsyslogd(8)
             https://www.rsyslog.com/doc/
   Main PID: 1604 (rsyslogd)
      Tasks: 4 (limit: 308999)
     Memory: 3.6M
     CGroup: /system.slice/rsyslog.service
             └─1604 /usr/sbin/rsyslogd -n -iNONE
Dec 19 10:59:38 hadoop-worker3 systemd[1]: Starting System Logging Service...
Dec 19 10:59:38 hadoop-worker3 rsyslogd[1604]: imuxsock: Acquired UNIX socket '/run/systemd/journal/syslog'
Dec 19 10:59:38 hadoop-worker3 rsyslogd[1604]: rsyslogd's groupid changed to 110
Dec 19 10:59:38 hadoop-worker3 rsyslogd[1604]: rsyslogd's userid changed to 104
Dec 19 10:59:38 hadoop-worker3 rsyslogd[1604]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="1604"
Dec 19 10:59:38 hadoop-worker3 systemd[1]: Started System Logging Service.

Clúster Big Data

Find and access all the information about Tartanga's Hadoop cluster

Author Archives: jmarturi

Task D: Configure passwordless SSH access

Task M: Install and configure the Ambari server

Task A.1: Installation, disks and file systems (includes tasks J and K)

Task I.8: Pre-deployment configuration, THP

Task I.7: Pre-deployment configuration, swapping and IPv6

Task I.6: Pre-deployment configuration, firewall

Task I.5: Pre-deployment configuration, tcp_retries

Task I.4: Pre-deployment configuration, AppArmor

Task I.3: Pre-deployment configuration, ulimit

Task I.2: Pre-deployment configuration, syslog