Real-time contextual identification of workload

ESM detects all platform workload and makes that data available in real time, allowing users and administrators to instantly see whether their jobs and interactive queries are behaving as expected, and whether any action is required.

Within ESM, all workload is contextually labeled in a relevant way. With SAS as an example, ESM detects the full job name, server context, job flow hierarchy, and scheduler queue of any detected jobs or user sessions. Interactive sessions are also labeled with the true user who launched them, in cases where that differs from the ID they are executing under (i.e. the real SAS Metadata or LDAP user rather than 'sassrv' or 'cas').

Self-service administration and event investigation

If a user notices something unexpected, ESM provides them with a secure and controlled way to terminate a workload, including cleaning up any temporary storage associated with it. A user must have the necessary hierarchical permissions to terminate a job, and the termination is saved to an audit log alongside the justification provided by the user.

Similarly, users can view log events in real time and drill down into logs where required (subject to the same authentication). These features allow end users to investigate and manage all of their workload without ever needing to be granted host-level access.

Top-down and bottom-up performance analysis

For administrators, who are typically more interested in the overall health of the platform, ESM offers top-down workflows that begin as cluster-wide performance heatmaps and let them narrow in on hotspots and drill through to the individual jobs or users that may be causing undesired behaviour.

Users, who are typically more concerned with the performance and resource utilisation of their own jobs, can instantly see how their code is performing and how it compares to the last time it ran. Individual steps can be labeled or annotated, allowing users to identify problems and optimise where required.

Scheduled batch job flow analysis and optimisation

A core feature of ESM is extensive job monitoring. ESM collects detailed performance data on all jobs, including return code, flow and queue information where available. ESM is compatible with IBM® Platform™ LSF® and can easily be integrated with schedulers such as BMC® Control-M™.

ESM monitors jobs for warnings, errors or any other configured events in real time, and allows instant drilldown directly to specific sections of job logs. Its job flow visualisation and investigation features are designed to accelerate root-cause analysis and identification of anomalies to help prevent future exceptions.

Point-and-click chargeback classification

ESM profiles all workload and labels the resource metrics with metadata describing the user, job name, queue and any other available logical contexts. ESM aggregates and retains this performance data for a configurable period to allow for historical performance comparison and for cost allocation.

ESM provides a drag-and-drop interface that lets users generate cost allocation rules by classifying workload 'cost items' into departmental 'cost buckets'. The resulting ruleset can then be used to generate periodic departmental resource breakdown reports that aid in the implementation of a chargeback strategy for multi-tenant environments.
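
To make the idea of classifying 'cost items' into 'cost buckets' concrete, the following is a minimal illustrative sketch in Python. It is not ESM's implementation or API: the `CostItem` and `Rule` types, the wildcard matching, the `unallocated` fallback bucket and all sample data are assumptions made purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class CostItem:
    """One unit of profiled workload (hypothetical shape, not ESM's schema)."""
    user: str
    job_name: str
    queue: str
    cpu_seconds: float


@dataclass(frozen=True)
class Rule:
    """Maps workload whose metadata field matches a pattern to a cost bucket."""
    field: str    # name of the CostItem attribute to test
    pattern: str  # shell-style wildcard pattern, e.g. "finance_*"
    bucket: str   # departmental cost bucket


def classify(item, rules, default="unallocated"):
    """Return the bucket of the first matching rule, or the default bucket."""
    for rule in rules:
        if fnmatchcase(getattr(item, rule.field), rule.pattern):
            return rule.bucket
    return default


def breakdown(items, rules):
    """Sum CPU seconds per cost bucket."""
    totals = defaultdict(float)
    for item in items:
        totals[classify(item, rules)] += item.cpu_seconds
    return dict(totals)


# Hypothetical ruleset and workload, for illustration only.
rules = [
    Rule("queue", "finance_*", "Finance"),
    Rule("user", "mkt_*", "Marketing"),
]
items = [
    CostItem("alice", "monthly_close", "finance_batch", 120.0),
    CostItem("mkt_bob", "campaign_score", "normal", 30.0),
    CostItem("carol", "adhoc_query", "normal", 10.0),
    CostItem("dave", "ledger_load", "finance_batch", 60.0),
]
report = breakdown(items, rules)
```

In this sketch the first matching rule wins, and anything that matches no rule falls into a catch-all bucket so that no usage disappears from the report. A real ruleset might order or weight its rules differently.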