Troubleshooting

This page contains troubleshooting tips to help identify the cause of issues.

Slurmrestd

Test that the slurmrestd API responds properly on its Unix socket with this command:

$ curl --silent --unix-socket /run/slurmrestd/slurmrestd.socket http://slurm/slurm/v0.0.40/diag | \
  jq '.statistics | with_entries(select(.key | startswith("jobs")))'
{
  "jobs_submitted": 385,
  "jobs_started": 407,
  "jobs_completed": 411,
  "jobs_canceled": 0,
  "jobs_failed": 0,
  "jobs_pending": 0,
  "jobs_running": 0
}

This command should print JSON output with the current job statistics of the cluster.

Test that Slurm accounting is available in the REST API with this command:

$ curl --silent --unix-socket /run/slurmrestd/slurmrestd.socket http://slurm/slurmdb/v0.0.40/config | \
  jq .clusters[].nodes
"cn[1-4]"

This command should print the set of compute nodes in the cluster.
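The two checks above can be reduced to a pass/fail test on the HTTP status of each endpoint. The following is a minimal sketch, not part of Slurm-web: the function name `check_endpoint` is hypothetical, and the socket path and URLs are the ones used in the commands above, which may differ on your system.

```shell
#!/bin/sh
# Hypothetical helper: print whether slurmrestd answers HTTP 200 for one URL.
check_endpoint() {
  socket=$1
  url=$2
  # --output /dev/null discards the body; --write-out prints only the status code.
  status=$(curl --silent --output /dev/null --write-out '%{http_code}' \
    --unix-socket "$socket" "$url")
  if [ "$status" = "200" ]; then
    echo "OK $url"
  else
    # curl reports 000 when no HTTP response was received at all.
    echo "FAIL $url (HTTP status: $status)"
    return 1
  fi
}
```

For example, `check_endpoint /run/slurmrestd/slurmrestd.socket http://slurm/slurmdb/v0.0.40/config` covers the accounting check. A status of 000 usually points to the socket or the service itself rather than to the API.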

Logs of slurmrestd are available with this command:

# journalctl --unit slurmrestd.service

To filter out informational and debug messages and keep only messages at notice priority or above (notices, warnings and errors), use this command:

# journalctl --priority=notice --unit slurmrestd.service
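When the journal is large, it can help to restrict the output to a recent time window. The sketch below wraps the command above in a hypothetical `recent_problems` function. `--since` and `--no-pager` are standard journalctl options, while the function name and the default one-hour window are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: show messages at notice priority or above for one unit,
# limited to a recent time window (default: the last hour).
recent_problems() {
  unit=$1
  since=${2:-"1 hour ago"}
  journalctl --priority=notice --unit "$unit" --since "$since" --no-pager
}
```

Run as root, `recent_problems slurmrestd.service` shows the last hour, and `recent_problems slurmrestd.service today` shows everything since midnight.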

Native Services

This section provides instructions to troubleshoot Slurm-web when running with native services (i.e. slurm-web-gateway.service and slurm-web-agent.service).

Test that the Slurm-web gateway API is available with this command:

$ curl http://localhost:5012/api/version
Slurm-web gateway v4.0.0

Test that the Slurm-web agent API is available with this command:

$ curl http://localhost:5013/version
Slurm-web agent v4.0.0

Logs of native services are available with these commands:

# journalctl --unit slurm-web-agent.service
# journalctl --unit slurm-web-gateway.service
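Both version checks can be scripted so that a missing or failing service is reported explicitly instead of producing empty output. This is a minimal sketch: `check_version` is a hypothetical helper, the 5-second timeout is an arbitrary choice, and the URLs in the usage note are the ones from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: query a version endpoint and report whether it answered.
check_version() {
  name=$1
  url=$2
  # --fail makes curl exit non-zero on HTTP errors (4xx/5xx);
  # --max-time avoids hanging when the service does not answer.
  if version=$(curl --silent --fail --max-time 5 "$url"); then
    echo "$name: $version"
  else
    echo "$name: no valid response from $url"
    return 1
  fi
}
```

For example, `check_version gateway http://localhost:5012/api/version` followed by `check_version agent http://localhost:5013/version`. If a check fails, the logs of the corresponding service are the next place to look.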

WSGI Services

This section provides instructions to troubleshoot Slurm-web when running as WSGI applications on production HTTP servers.

Test that the Slurm-web gateway API is available with this command:

$ curl http://localhost/api/version
Slurm-web gateway v4.0.0

Test that the Slurm-web agent API is available with this command:

$ curl http://localhost/agent/version
Slurm-web agent v4.0.0

Logs of uWSGI services are available with these commands:

# journalctl --unit slurm-web-agent-uwsgi.service
# journalctl --unit slurm-web-gateway-uwsgi.service

Check the logs of the HTTP server for possible errors:

Nginx

In file /var/log/nginx/error.log

Apache2

  • On Debian/Ubuntu: in file /var/log/apache2/error.log
  • On RHEL (and compatible) and Fedora: in file /var/log/httpd/error_log

Caddy

Run this command:

# journalctl --unit caddy.service
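For the file-based logs (Nginx and Apache2), the sketch below prints the last lines of whichever error log exists on the host. `show_http_errors` is a hypothetical helper, the default paths are the ones listed above, and the 20-line limit is an arbitrary choice. Caddy is not covered because its logs are read from the journal.

```shell
#!/bin/sh
# Hypothetical helper: print the last lines of each error log that exists.
# With no arguments, it tries the Nginx and Apache2 paths listed above.
show_http_errors() {
  if [ "$#" -eq 0 ]; then
    set -- /var/log/nginx/error.log \
           /var/log/apache2/error.log \
           /var/log/httpd/error_log
  fi
  for log in "$@"; do
    # Skip servers that are not installed or logs that cannot be read.
    if [ -r "$log" ]; then
      echo "== $log"
      tail -n 20 "$log"
    fi
  done
}
```

Run as root, `show_http_errors` with no arguments checks the default paths. A non-standard log location can be passed as an argument.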

LDAP Settings

The slurm-web-ldap-check command is automatically installed with the Slurm-web gateway component. This utility validates the LDAP settings in the gateway configuration file.

Run this utility with this command:

# /usr/libexec/slurm-web/slurm-web-ldap-check
INFO ⸬ Running slurm-web-ldap-check
Found 10 user(s) in LDAP directory:
- sstevenson (Scott Stevenson) [users, admin]
- jwalls (Jennifer Walls) [users, biology]
- strevino (Samantha Trevino) [users, biology]
- cingram (Christopher Ingram) [users, biology]
- nlee (Nathan Lee) [users, biology]
- mdavis (Michael Davis) [users, physic, acoustic]
- mgardner (Micheal Gardner) [users, physic, acoustic]
- kthomas (Kevin Thomas) [users, physic, acoustic]
- clewis (Charles Lewis) [users, physic, optic]
- msantos (Michelle Santos) [users, physic, optic]

When LDAP is configured successfully, the command prints the list of users in the LDAP directory with their group memberships, as seen by the Slurm-web gateway. Otherwise, it prints a message to help diagnose the source of the error.

More debug messages can be printed with these options:

# /usr/libexec/slurm-web/slurm-web-ldap-check --debug --debug-flags rfl

This notably prints all requests sent to the LDAP server, with their filters, and all intermediate results.
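On directories with many users, it can be convenient to filter the user list by group, for example to confirm who would match a group referenced in the authorization policy. The following sketch is not part of Slurm-web: `members_of` is a hypothetical filter that assumes the output format shown above (`- login (Full Name) [group1, group2]`) and that the user list is written to standard output.

```shell
#!/bin/sh
# Hypothetical filter: read slurm-web-ldap-check output on stdin and print the
# logins of the users who are members of the given group.
# Assumes lines formatted as "- login (Full Name) [group1, group2]".
members_of() {
  awk -v group="$1" '
    /^- / {
      login = $2
      # Extract the text between the square brackets at the end of the line.
      start = index($0, "[")
      stop = index($0, "]")
      if (start == 0 || stop <= start) next
      n = split(substr($0, start + 1, stop - start - 1), groups, ", ")
      for (i = 1; i <= n; i++)
        if (groups[i] == group) print login
    }'
}
```

For example, `/usr/libexec/slurm-web/slurm-web-ldap-check | members_of biology` would list only the logins in the biology group.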

Authorization Policy

To help understand the roles and permissions granted by the authorization policy, users can open the Settings > Account menu to view their permissions on each cluster. For example:

[Screenshot: user permissions on clusters]

In this example, the user cingram is a member of the users and biology groups in the LDAP directory.

On the emulator cluster, he is assigned the roles special and user, with permissions on the view-jobs, view-qos and view-stats actions.

On the tiny cluster, he is assigned the roles admin and users, with permissions on the view-accounts, view-jobs, view-nodes, view-partitions, view-qos, view-reservations and view-stats actions.