ESM detects all platform workload and makes that data available in real time, allowing users and administrators to instantly see whether their jobs and interactive queries are behaving as expected, and whether any action is required.
Within ESM, all workload is labeled with its relevant context. With SAS as an example, ESM detects the full job name, server context, job flow hierarchy, and scheduler queue of any detected jobs or user sessions. Interactive sessions are also labeled with the true user that launched them, in cases where that differs from the ID they are executing under (for example, the real SAS Metadata or LDAP user rather than 'sassrv' or 'cas').
If a user notices something unexpected, ESM provides them with a secure and controlled way to terminate a workload, including cleaning up any temporary storage associated with it. A user must have the necessary hierarchical permissions to terminate a job, and the termination is saved to an audit log alongside the justification provided by the user.
Similarly, users can view log events in real time, and drill down into logs where required (subject to the same authentication). These features allow end users to investigate and manage all of their workload without ever needing to be granted host-level access.
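The controlled termination described above (a permission check, a mandatory justification, and an audit record) can be illustrated with a minimal Python sketch. This is not ESM's API: the role names, the `Actor` and `Job` types, and the `may_terminate` and `terminate` functions are all hypothetical, and the step that would actually stop the process and clean up temporary storage is left as a comment.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Job:
    job_id: str
    owner: str
    owner_team: str


@dataclass(frozen=True)
class Actor:
    name: str
    role: str  # hypothetical roles: "user", "team_lead", "admin"
    team: str


def may_terminate(actor: Actor, job: Job) -> bool:
    """Hypothetical hierarchy: admins may stop any job, team leads may stop
    their own team's jobs, and ordinary users only their own jobs."""
    if actor.role == "admin":
        return True
    if actor.role == "team_lead":
        return actor.team == job.owner_team
    return actor.name == job.owner


def terminate(actor: Actor, job: Job, justification: str, audit_log: list) -> None:
    """Check the justification and permissions, then record the termination."""
    if not justification.strip():
        raise ValueError("a justification is required")
    if not may_terminate(actor, job):
        raise PermissionError(f"{actor.name} may not terminate {job.job_id}")
    # A real system would stop the workload and remove its temporary
    # storage here; this sketch only writes the audit record.
    audit_log.append({
        "job_id": job.job_id,
        "terminated_by": actor.name,
        "justification": justification,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
```

In this sketch a denied request raises an error before anything is written, so the audit log contains only terminations that were actually authorised.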
For administrators, who are typically more interested in the overall health of the platform, ESM offers top-down workflows that begin as cluster-wide performance heatmaps, and allow them to narrow in on hotspots and drill through to individual jobs or users that may be causing undesired behaviour.
Users who are typically more concerned with the performance and resource utilisation of their own jobs can instantly see how their code is performing and how it compares to the last time it ran. Individual steps can be labeled or annotated, allowing them to identify problems and optimise where required.
A core feature of ESM is extensive job monitoring. ESM collects detailed performance data on all jobs, including return code, flow and queue information where available. ESM is compatible with IBM® Platform™ LSF® and can be integrated with schedulers like BMC® Control-M™ easily.
ESM monitors jobs for warnings, errors or any other configured events in real time, and allows instant drilldown directly to specific sections of job logs. Its job flow visualisation and investigation features are designed to accelerate root-cause analysis and identification of anomalies to help prevent future exceptions.
ESM profiles all workload and labels the resource metrics with metadata describing the user, job name, queue and any other available logical contexts. ESM aggregates and retains this performance data for a configurable period to allow for historical performance comparison and for cost allocation.
ESM provides a drag-and-drop interface that lets users generate cost allocation rules by classifying workload 'cost items' into departmental 'cost buckets'. The resulting ruleset can then be used to generate periodic departmental resource breakdown reports that aid in the implementation of a chargeback strategy for multi-tenant environments.
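To make the idea of classifying 'cost items' into 'cost buckets' concrete, here is a minimal Python sketch of how such a ruleset might be applied to labeled resource metrics. It is an illustration under stated assumptions, not ESM's implementation: the `CostItem` fields, the rule format, the first-match-wins ordering, the "Unallocated" default, and the use of CPU seconds as the metric are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CostItem:
    """One unit of profiled workload with its metadata labels (hypothetical)."""
    user: str
    job_name: str
    queue: str
    cpu_seconds: float


def classify(item, rules, default="Unallocated"):
    """Return the bucket of the first rule whose field matches the item."""
    for field_name, value, bucket in rules:
        if getattr(item, field_name) == value:
            return bucket
    return default


def departmental_breakdown(items, rules):
    """Sum CPU seconds per bucket and return each bucket's share of the total."""
    totals = defaultdict(float)
    for item in items:
        totals[classify(item, rules)] += item.cpu_seconds
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {bucket: 0.0 for bucket in totals}
    return {bucket: total / grand_total for bucket, total in totals.items()}


# Hypothetical ruleset: (metadata field, value to match, cost bucket).
RULES = [
    ("queue", "risk_batch", "Risk"),
    ("user", "alice", "Marketing"),
]

items = [
    CostItem("alice", "forecast", "adhoc", 30.0),
    CostItem("bob", "var_calc", "risk_batch", 60.0),
    CostItem("carol", "etl", "adhoc", 10.0),
]

shares = departmental_breakdown(items, RULES)
```

With these sample items, the 'Risk' bucket accounts for 60% of CPU time, 'Marketing' for 30%, and the remaining 10% falls into the default bucket, which is the kind of periodic breakdown a chargeback report would present.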
“We no longer need to stay awake until 2am to identify resource conflicts that threaten our regulatory batch runs. In addition, whereas some of our interactive users would bring our shared platform down at least once a week, since deploying ESM we have enjoyed 11 months without a single major incident.”
SAS Administrator, Npower
“Within 24 hours of installing ESM, we were able to identify and fix a long-standing issue that was causing our analysts to suffer unacceptable delays when running their forecasts. After the recommended change, our application response times dropped back to seconds, and thanks to ESM, our effective capacity tripled.”
IT Support Manager, Frontline
“Our implementation of ESM helped CZ enhance the load process enormously. We process billions of rows on a daily basis, and where detailed performance information was previously missing, thanks to ESM we were able to optimise our data load processes so that they finish by early morning instead of the late afternoon.”
Cluster BI, CZ Groep (cz.nl)