Overview
Discover the most advanced features of Slurm-web.
Dashboard
Slurm-web includes a dashboard with high-level metrics that give quick insight into the status and operation of HPC clusters.
Multi-Clusters
Slurm-web can be deployed on a central server to monitor all HPC clusters in your organization from a single interface.
From anywhere in the interface, you can jump to another cluster and easily compare their statuses.
Jobs Status
Easily visualize job statuses with colored badges and quickly spot possible failures.
Slurm-web represents the status of each Slurm job with a colored badge. This makes it easy to assess the state of the job queue at a glance. Never miss errors when they occur!
Jobs filters and sorting
The job queue can be filtered by many criteria (job state, user, account, QOS, partition) and sorted by priority, ID, state, user, etc.
Filters can be applied and removed instantly with just a few clicks. This makes it trivial to observe specific job flows and better understand Slurm scheduling.
Live Jobs Status
Slurm-web can track specific jobs throughout their lifetime with live updates.
Watch your jobs run with a visual representation of their progress.
Nodes Status
The live status of compute nodes can be visualized in an advanced, interactive graphical representation of the racks, based on data extracted from RacksDB. Just hover the mouse pointer over a specific node to get all its details.
Filters can be applied to quickly identify nodes that are out of production.
The cluster status can be displayed in fullscreen to get a constant overview of its health and activity.
Advanced Reservations
Resources can be pre-allocated for a particular purpose in Slurm with advanced reservations. Slurm-web displays these reservations with their resources, duration, and authorized users and accounts.
QOS
Slurm supports QOS with many features and parameters. Slurm-web displays the defined QOS in a concise, synthetic view.
This makes it easy to spot differences between QOS and change limits to adjust the scheduling policy. The user interface includes built-in help messages that explain the limits involved.
Reactive
The Slurm-web interface is continuously updated in near real-time with fresh data fetched from the clusters. Tables and diagrams are updated atomically with the latest changes, so you never need to reload pages.
Responsive
The Slurm-web interface is designed to be accessible on all devices, from smartphones to the largest desktop screens.
Enterprise Authentication
Slurm-web supports user authentication against enterprise LDAP directories (FreeIPA, Active Directory, OpenLDAP, etc.).
Access can be restricted to specific groups of users. Both legacy NIS and RFC 2307bis schemas are fully supported.
Advanced RBAC Permissions
Administrators can define an advanced authorization policy based on roles (RBAC) and LDAP groups to control all user permissions in Slurm-web.
Custom Service Messages
Integrate custom service messages directly into the Slurm-web interface to communicate efficiently with users.
Transparent Caching
Slurm-web can use the Redis in-memory database to cache Slurm status, in order to maximize performance and significantly reduce the load on the Slurm scheduler.
Users can track the job list in near real-time very efficiently. Finally, drop the load generated by endless loops of squeue!
Metrics
Slurm-web is designed to integrate with Prometheus (or any compatible solution) to manage a wide range of Slurm metrics.
Metrics on the status of computing resources and jobs are exported in the standard OpenMetrics format, designed to be collected by Prometheus and stored in a time-series database. Slurm-web queries this database to produce charts from these metrics.
These charts give you a clear view of how the state of your production HPC clusters evolves over time.
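For readers unfamiliar with OpenMetrics, the following sketch renders a gauge metric family in the OpenMetrics text exposition format. The metric name slurm_nodes, its help text, and the cluster and state labels are hypothetical examples chosen for illustration; they are not the names Slurm-web actually exports. Label values are escaped for backslash, double quote and newline, and the exposition is terminated by the mandatory "# EOF" line.

```python
from typing import Dict, List, Tuple

# A sample is a set of labels plus a numeric value.
Sample = Tuple[Dict[str, str], float]


def escape_label_value(value: str) -> str:
    # OpenMetrics escapes backslash, double quote and newline in label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, samples: List[Sample]) -> str:
    """Render one gauge metric family (metadata lines, then samples)."""
    lines = [f"# TYPE {name} gauge", f"# HELP {name} {help_text}"]
    for labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
        )
        lines.append(f"{name}{{{label_str}}} {value}" if label_str else f"{name} {value}")
    return "\n".join(lines)


def render_exposition(families: List[str]) -> str:
    # An OpenMetrics text exposition must end with "# EOF".
    return "\n".join(families + ["# EOF"]) + "\n"


# Hypothetical node counts per state for a hypothetical cluster "foo".
nodes = render_gauge(
    "slurm_nodes",
    "Number of compute nodes per state.",
    [({"cluster": "foo", "state": "idle"}, 42.0),
     ({"cluster": "foo", "state": "allocated"}, 58.0)],
)
print(render_exposition([nodes]))
```

A Prometheus server scraping such an endpoint stores each labeled sample as a separate time series, which is what allows charts to plot per-state evolution over time.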