Troubleshooting
This page contains troubleshooting tips to help find the cause of issues.
Slurmrestd
Test that the Slurm slurmrestd API responds properly on its Unix socket with this command:
$ curl --silent --unix-socket /run/slurmrestd/slurmrestd.socket http://slurm/slurm/v0.0.40/diag | \
jq '.statistics | with_entries(select(.key | startswith("jobs")))'
{
"jobs_submitted": 385,
"jobs_started": 407,
"jobs_completed": 411,
"jobs_canceled": 0,
"jobs_failed": 0,
"jobs_pending": 0,
"jobs_running": 0
}
This command should print JSON output with the current job statistics of the cluster.
Test Slurm accounting in the REST API with this command:
$ curl --silent --unix-socket /run/slurmrestd/slurmrestd.socket http://slurm/slurmdb/v0.0.40/config | \
jq '.clusters[].nodes'
"cn[1-4]"
This command should print the set of compute nodes in the cluster.
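When these commands print nothing at all, the cause may be a missing socket or an HTTP error hidden by --silent. The following minimal sketch wraps the same curl call in a shell function that reports both cases explicitly. The function name slurmrestd_get and the SLURMRESTD_SOCKET variable are hypothetical; the default socket path is the one used in the commands above.

```shell
# Hypothetical helper (not shipped with Slurm or Slurm-web): query slurmrestd
# on its Unix socket and report failures instead of printing nothing.
# SLURMRESTD_SOCKET is a hypothetical override; the default is the socket path
# used in the commands above.

# slurmrestd_get PATH: send a GET request for PATH (e.g. /slurm/v0.0.40/diag)
# to slurmrestd and print the response body.
slurmrestd_get() {
    local socket="${SLURMRESTD_SOCKET:-/run/slurmrestd/slurmrestd.socket}"
    # -S is true only if the file exists and is a socket.
    if [ ! -S "$socket" ]; then
        echo "error: $socket is not a Unix socket (is slurmrestd running?)" >&2
        return 1
    fi
    # --fail: return a non-zero exit status on HTTP errors (status >= 400).
    # --show-error: still print curl error messages despite --silent.
    curl --silent --show-error --fail \
        --unix-socket "$socket" "http://slurm$1"
}
```

Usage, with the same endpoints as above:
$ slurmrestd_get /slurm/v0.0.40/diag | jq '.statistics.jobs_running'
$ slurmrestd_get /slurmdb/v0.0.40/config | jq '.clusters[].nodes'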
Logs of slurmrestd are available with this command:
# journalctl --unit slurmrestd.service
Informational and debug messages can be filtered out, keeping only messages of priority notice and higher (including warnings and errors), with this command:
# journalctl --priority=notice --unit slurmrestd.service
Native Services
This section provides instructions to troubleshoot Slurm-web when running as native services (i.e. slurm-web-gateway.service and slurm-web-agent.service).
Test that the Slurm-web gateway API is available with this command:
$ curl http://localhost:5012/api/version
Slurm-web gateway v4.0.0
Test that the Slurm-web agent API is available with this command:
$ curl http://localhost:5013/version
Slurm-web agent v4.0.0
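A version request can also succeed while being answered by the wrong component, for example when the gateway and agent ports are swapped in the configuration. The following sketch checks both that the request succeeds and that the expected component answered. It assumes replies keep the format shown above ("Slurm-web gateway v4.0.0", "Slurm-web agent v4.0.0"); the function names slurmweb_component and check_slurmweb are hypothetical.

```shell
# Hypothetical helpers (not part of Slurm-web). They assume version replies
# look like "Slurm-web gateway v4.0.0" or "Slurm-web agent v4.0.0".

# slurmweb_component TEXT: print "gateway" or "agent" from a version reply,
# or return 1 if TEXT does not look like a Slurm-web version reply.
slurmweb_component() {
    case "$1" in
        "Slurm-web gateway v"*) echo gateway ;;
        "Slurm-web agent v"*) echo agent ;;
        *) return 1 ;;
    esac
}

# check_slurmweb URL COMPONENT: fetch URL and verify that COMPONENT answered.
check_slurmweb() {
    local url="$1" expected="$2" reply actual
    # --fail turns HTTP errors (status >= 400) into a non-zero exit status.
    if ! reply="$(curl --silent --show-error --fail "$url")"; then
        echo "FAIL $url: no successful HTTP response" >&2
        return 1
    fi
    if ! actual="$(slurmweb_component "$reply")"; then
        echo "FAIL $url: unexpected reply: $reply" >&2
        return 1
    fi
    if [ "$actual" != "$expected" ]; then
        echo "FAIL $url: answered by $actual, expected $expected" >&2
        return 1
    fi
    echo "OK $url: $reply"
}
```

Usage, with the native service URLs above:
$ check_slurmweb http://localhost:5012/api/version gateway
$ check_slurmweb http://localhost:5013/version agent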
Logs of native services are available with these commands:
# journalctl --unit slurm-web-agent.service
# journalctl --unit slurm-web-gateway.service
WSGI Services
This section provides instructions to troubleshoot Slurm-web when running as WSGI applications on production HTTP servers.
Test that the Slurm-web gateway API is available with this command:
$ curl http://localhost/api/version
Slurm-web gateway v4.0.0
Test that the Slurm-web agent API is available with this command:
$ curl http://localhost/agent/version
Slurm-web agent v4.0.0
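Behind an HTTP server, a failing request usually still returns an HTTP status code, and that code hints at which layer is failing. The sketch below prints the status code of a URL with a rough interpretation. The mapping is an assumption based on common reverse-proxy behavior (for instance, Nginx answers 502 when it cannot reach its upstream backend) and may differ with your HTTP server configuration; the function names http_hint and probe_url are hypothetical.

```shell
# Hypothetical helpers (not part of Slurm-web): print the HTTP status code of a
# URL with a rough hint about which layer is likely failing. The mapping below
# is an assumption and depends on the HTTP server configuration.

# http_hint CODE: print a rough interpretation of an HTTP status code.
http_hint() {
    case "$1" in
        200) echo "OK" ;;
        000) echo "no HTTP response: check that the HTTP server is running" ;;
        404) echo "not found: check the HTTP server routes for this URL" ;;
        502|503|504) echo "backend unreachable: check the uWSGI services" ;;
        *) echo "unexpected status: check the HTTP server error logs" ;;
    esac
}

# probe_url URL: print "CODE URL: hint".
probe_url() {
    local url="$1" code
    # curl writes 000 for %{http_code} when no HTTP response was received.
    code="$(curl --silent --output /dev/null --write-out '%{http_code}' "$url")" || true
    printf '%s %s: %s\n' "$code" "$url" "$(http_hint "$code")"
}
```

Usage, with the WSGI URLs above:
$ probe_url http://localhost/api/version
$ probe_url http://localhost/agent/version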
Logs of uWSGI services are available with these commands:
# journalctl --unit slurm-web-agent-uwsgi.service
# journalctl --unit slurm-web-gateway-uwsgi.service
Check the HTTP server error logs for possible errors:
- Nginx: in file /var/log/nginx/error.log
- Apache2:
  - On Debian/Ubuntu: in file /var/log/apache2/error.log
  - On RHEL (and compatible) and Fedora: in file /var/log/httpd/error_log
- Caddy: run this command:
  # journalctl --unit caddy.service
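When it is not obvious which HTTP server runs on a host, it may help to print the end of whichever Nginx or Apache2 error log listed above exists. This is a minimal sketch; the function name show_http_errors and the HTTP_ERROR_LOGS override are hypothetical, and Caddy is left out because its logs are read with journalctl.

```shell
# Hypothetical helper (not part of Slurm-web): print the last lines of each
# HTTP server error log that exists and is readable on this host.
# HTTP_ERROR_LOGS is a hypothetical override (space-separated paths); the
# default covers the Nginx and Apache2 paths listed above.
# Relies on word splitting of the unquoted list (bash/sh behavior).

# show_http_errors [LINES]: print the last LINES lines (default 20) of each log.
# Returns 1 if none of the logs could be read.
show_http_errors() {
    local lines="${1:-20}" status=1 log
    for log in ${HTTP_ERROR_LOGS:-/var/log/nginx/error.log /var/log/apache2/error.log /var/log/httpd/error_log}; do
        if [ -r "$log" ]; then
            echo "==> $log <=="
            tail -n "$lines" "$log"
            status=0
        fi
    done
    return "$status"
}
```

Usage, to show the last 50 lines of each existing log:
# show_http_errors 50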
LDAP Settings
The command slurm-web-ldap-check is automatically installed with the Slurm-web gateway component. This utility validates the LDAP settings in the gateway configuration file.
Run this utility with this command:
# /usr/libexec/slurm-web/slurm-web-ldap-check
INFO ⸬ Running slurm-web-ldap-check
Found 10 user(s) in LDAP directory:
- sstevenson (Scott Stevenson) [users, admin]
- jwalls (Jennifer Walls) [users, biology]
- strevino (Samantha Trevino) [users, biology]
- cingram (Christopher Ingram) [users, biology]
- nlee (Nathan Lee) [users, biology]
- mdavis (Michael Davis) [users, physic, acoustic]
- mgardner (Micheal Gardner) [users, physic, acoustic]
- kthomas (Kevin Thomas) [users, physic, acoustic]
- clewis (Charles Lewis) [users, physic, optic]
- msantos (Michelle Santos) [users, physic, optic]
When LDAP is configured successfully, the command prints the list of users in the LDAP directory with their group memberships, as seen by the Slurm-web gateway. Otherwise, it prints a message to help diagnose the source of the error.
More debug messages can be printed with these options:
# /usr/libexec/slurm-web/slurm-web-ldap-check --debug --debug-flags rfl
Notably, this prints all LDAP requests sent to the LDAP server, with their filters, and all intermediate results.
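When a user reports missing permissions, it can be useful to check whether the gateway sees that user with the expected groups. The following sketch extracts one user's groups from the utility output. It assumes the output format shown above (one line "- login (Full Name) [group, ...]" per user), which may change between versions; the function name ldap_user_groups is hypothetical.

```shell
# Hypothetical helper (not part of Slurm-web): read slurm-web-ldap-check output
# on stdin and print the groups of the user whose login is given as argument.
# Assumes user lines look like: "- cingram (Christopher Ingram) [users, biology]".
# Returns 1 if the user is not found.
ldap_user_groups() {
    awk -v login="$1" '
        # First field is the "-" bullet, second field is the login.
        $1 == "-" && $2 == login {
            lb = index($0, "[")
            rb = index($0, "]")
            if (lb > 0 && rb > lb) {
                # Print the text between the square brackets.
                print substr($0, lb + 1, rb - lb - 1)
                found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Usage, merging standard error in case part of the output is written there:
# /usr/libexec/slurm-web/slurm-web-ldap-check 2>&1 | ldap_user_groups cingram
With the directory shown above, this would print users, biology.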
Authorization Policy
To understand the roles and permissions granted by the authorization policy on clusters, users can open the Settings > Account menu to view their permissions on each cluster. For example:
In this example, the user cingram is a member of the users and biology groups in the LDAP directory.
On cluster emulator, he is assigned roles special and user, with permissions on view-jobs, view-qos and view-stats actions.
On cluster tiny, he is assigned roles admin and users, with permissions on view-accounts, view-jobs, view-nodes, view-partitions, view-qos, view-reservations and view-stats actions.