Bringing up interface ib0: ib_ipoib device ib0 does not seem to be present, delaying initialization. [FAILED]
... ibwarn: [2943] mad_rpc_open_port: can't open UMAD port ((null):0)
```
* Enter the text to be shown in monitoring:
Enter the link to the RB ticket:
```bash
for node in $(nodeset -e 'cn[97,98]'); do
    echo "$node"
    # %m is the month (%M would be minutes)
    qmgr -c "set node $node comment = \"$(date +%Y%m%d)/hrb33/ib0 down BR#3036\""
done
```
Verification on mgmt: check which jobs are running on the node:
```bash
rspbs --get-node-jobs | grep cn98
```
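The node comment set in the loop above follows the pattern date/admin login/reason plus a ticket number. A minimal sketch of a helper that builds this string is shown below. The name `build_node_comment` and its argument order are assumptions made for illustration and are not part of the existing tooling; the loop only prints the `qmgr` commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch only: build_node_comment is a hypothetical helper, not an existing tool.
# Usage: build_node_comment <admin-login> <reason> <ticket-number> [YYYYMMDD]
build_node_comment() {
    local user="$1" reason="$2" ticket="$3"
    # Default to today's date; %m is the month, %M would be minutes.
    local day="${4:-$(date +%Y%m%d)}"
    printf '%s/%s/%s BR#%s\n' "$day" "$user" "$reason" "$ticket"
}

# Print (do not execute) the qmgr command for each node.
for node in cn97 cn98; do
    comment=$(build_node_comment hrb33 'ib0 down' 3036)
    printf "qmgr -c 'set node %s comment = \"%s\"'\n" "$node" "$comment"
done
```

Passing the date as an optional fourth argument keeps the helper deterministic when needed, for example when re-creating a comment for an older ticket.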
* Verification: check the log to see who was logging in at that time:
```bash
grep -r "maw00" /var/log/ | less
/var/log/secure
.. login2 authpriv crit pam gdm-password: pam_succeed_if(gdm-password:auth): error retrieving information about user maw00
.. login2 authpriv err pam gdm-password: gkr-pam: error looking up user information for: maw00
```
* Clear it via the nagios cmd file:
```bash
for node in $(nodeset -e 'login[1,2]'); do
    echo "[$(date +%s)] PROCESS_SERVICE_CHECK_RESULT;$node;Log Alerts;0;OK." > /var/spool/nagios/nagios.cmd
done
```

```
cn17 offline,job-busy /20150213/jir13/vadny disk cn18 BR#3060
cn18 offline /20150213/jir13/vadny disk BR#3060
```
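The passive check result written to `/var/spool/nagios/nagios.cmd` earlier follows the Nagios external command format `[epoch] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output`. The sketch below wraps that format in a function so the fields are harder to mistype. The function name `nagios_passive_result` is an assumption for illustration; the example only prints the lines and does not write to the command file.

```bash
#!/usr/bin/env bash
# Sketch only: nagios_passive_result is a hypothetical helper name.
# Usage: nagios_passive_result <host> <service> <return-code> <output> [epoch]
nagios_passive_result() {
    local host="$1" service="$2" code="$3" output="$4"
    # Default to the current time in seconds since the epoch.
    local now="${5:-$(date +%s)}"
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$now" "$host" "$service" "$code" "$output"
}

# Return code 0 means OK. The lines are only printed here; on the Nagios host,
# redirect the output to /var/spool/nagios/nagios.cmd to submit them.
for node in login1 login2; do
    nagios_passive_result "$node" "Log Alerts" 0 "OK."
done
```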
* Look up information about the node and its paired node
```bash
/root/jose/admin_tools/get_node_info.sh cn131
/root/jose/admin_tools/get_node_info_v2.sh cn131
```
* After recovery:
* Return the node to service
```bash
pbsnodes -r cn131
```
* Remove the node comment
```bash
qmgr -c 'set node cn131 comment = ""'
```
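The two recovery steps above (clearing the offline state and removing the comment) can be combined into one function. This is a minimal sketch, assuming the hypothetical names `return_node`, `run` and `DRY_RUN`; with `DRY_RUN=1`, the default here, the commands are printed rather than executed, and the printed form loses the shell quoting around the `qmgr` argument.

```bash
#!/usr/bin/env bash
# Sketch only: return_node, run and DRY_RUN are hypothetical names.
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

return_node() {
    local node="$1"
    run pbsnodes -r "$node"                       # clear the offline state
    run qmgr -c "set node $node comment = \"\""   # remove the node comment
}

return_node cn131
```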
## IB
On accelerated nodes, one of the two IB HCA ports is disabled so that InfiniBand works optimally, see: cat /etc/rc.local
```bash
[root@cn189 ~]# ibstat mlx4_1
CA 'mlx4_1'
    CA type: MT4099
    Number of ports: 1
    Firmware version: 2.11.500
    Hardware version: 0
    Node GUID: 0x08003800013a7058
    System image GUID: 0x08003800013a705a
    Port 1:
        State: Down
        Physical state: Disabled
        Rate: 10
        Base lid: 213
        LMC: 0
        SM lid: 12
        Capability mask: 0x02514868
        Port GUID: 0x08003800013a7059
        Link layer: InfiniBand