Mitch's Technical Blog

WhereScape RED Job Failure Notifications

September 10, 2017
articles, etl, wherescape (revised July 15, 2018)
permalink

Summary: Solve WhereScape’s lack of built-in email support, using simple SQL Server components.

WhereScape RED’s Scheduler Doesn’t Do Email

WhereScape RED‘s ability to create and schedule a collection of sequential, grouped tasks (a “job”) is a powerful ETL tool. But can you configure a job to send email when it fails? Unfortunately, no.

Earlier this year, perhaps after receiving enough questions about the lack of native email support, WhereScape posted this support article. It describes a solution where the job’s failure event invokes a batch file, which calls a SQL proc, which polls the Wherescape repository for failure details specific to the job, then shoots off an email. I started to play with that awkward solution – not getting it to work after 20 minutes of effort – when I realized it doesn’t have to be this hard.

A Simple Solution

I retained the golden nuggets of WhereScape’s stored procedure: the metadata repository join logic, which identifies failure details; and the useful email formatting. I ditched the awkward failure event hook/batch file call. I opted instead to poll for failures at a regular interval, such as every two minutes. This is a “pull” approach, compared to “pushing” a failure notification at the individual job level.

A SQL Agent job runs frequently, in my case every two minutes. Its sole purpose is to invoke the stored procedure ScanJobFailures.

Here is the code.


USE Utils
IF NOT EXISTS (SELECT 1
    FROM INFORMATION_SCHEMA.ROUTINES
    WHERE ROUTINE_NAME = 'ScanJobFailures'
    AND ROUTINE_SCHEMA = 'ws'
    AND ROUTINE_TYPE = 'PROCEDURE'
)
BEGIN
  -- Create stored proc
  EXEC ('CREATE PROC ws.ScanJobFailures AS SELECT 1');
  -- Grant permissions (if required)
END;
GO
ALTER PROCEDURE ws.ScanJobFailures (@LookBackSeconds INT, @MailProfile VARCHAR(100), @Recipients NVARCHAR(4000), @Subject NVARCHAR(4000) = 'Wherescape Job Failure Summary', @JobName VARCHAR(1000) = NULL)
AS
/*
Procedure:    Utils.ws.ScanJobFailures
Created:      2017-08-31
Version:      1.0
Example:      EXEC Utils.ws.ScanJobFailures @LookBackSeconds = 1000, @MailProfile = 'Default', @Recipients = 'email@domain.com'
*/
BEGIN
DECLARE @JobNameSubject NVARCHAR(4000)
SET @JobNameSubject = COALESCE(@JobName, 'All Jobs')
SET @Subject = @@SERVERNAME + ': ' + @Subject + ' (Job = ' + @JobNameSubject + ', Lookback = Last ' + CONVERT(VARCHAR, ABS(@LookBackSeconds)) + ' Seconds)';
SET NOCOUNT ON
DECLARE @body   NVARCHAR(MAX),
    @Cutoff DATETIME2(3) 
SET @Cutoff = DATEADD(s, ABS(@LookBackSeconds) * -1, GETDATE());
IF EXISTS (
  SELECT 1
  FROM  WslWarehouse.dbo.ws_wrk_audit_log
  WHERE  wa_time_stamp >= @Cutoff 
      AND wa_status IN ('E', 'F')
      AND (@JobName IS NULL OR wa_job = @JobName)
)
BEGIN

  SET @body =
    N'<table cellspacing="0" cellpadding="10" width="100%" border="1">' +
    N'<tr><td><b>Job Name</b></td>' +
    N'<td><b>Task Name</b></td>' +
    N'<td><b>Status</b></td>' +
    N'<td><b>Error Message</b></td>' +
    N'<td><b>Sequence</b></td>' +
    N'<td><b>Time</b></td>' +
    N'<td><b>Message Code</b></td>' +
    N'<td><b>Description</b></td>' +
    N'<td><b>Returned Message</b></td></tr>' +
    CAST( (
    SELECT
      wa_job AS td,   '',
      ISNULL(wa_task, '.')  AS td,  '',
      wa_status  AS td,  '',
      ISNULL(wa_message, '.')  AS td,   '',
      wa_sequence AS td,  '',
      CONVERT(VARCHAR, wa_time_stamp, 107) + ' ' + CONVERT(VARCHAR, wa_time_stamp, 108) AS td,   '',
      ISNULL(wa_db_msg_code, 0)  AS td,   '',
      ISNULL(wa_db_msg_desc, '.')  AS td,   '',
      ISNULL(wtr_return_msg,'.')  AS td,   '',
      wa_row_number  AS td,  ''
    FROM  WslWarehouse.dbo.ws_wrk_audit_log
    LEFT OUTER JOIN WslWarehouse.dbo.ws_wrk_task_run
    ON    wa_task_key = wtr_task_key
    WHERE  wa_time_stamp >= @Cutoff 
        AND wa_status IN ('E', 'F')
        AND (@JobName IS NULL OR wa_job = @JobName)
    ORDER BY wa_sequence, wa_time_stamp, COALESCE(wa_row_number,0)
    FOR XML PATH('tr'), TYPE )
      AS NVARCHAR(MAX)) +
    N'</table>';
  EXEC msdb.dbo.sp_send_dbmail
    @profile_name = @MailProfile,
    @recipients = @recipients,
    @subject =  @subject,
    @body =  @body,
    @body_format =  'html',
    @importance = 'high'
END
END
GO

Installation (On SQL Server)

Disclaimer: Blindly install code from the Internet only when you know what you’re doing.

These instructions assume you have SQL Mail configured and working, and that you know the name of your server’s mail profile.

Tip: Configure @LookBackSeconds to be a second or two more than the job schedule frequency (translated to seconds). That provides a margin of overlap, to be sure your failure is detected. It also minimizes the possibility of multiple notifications for the same failure.

That’s it.

Flexibility

I like this approach because of its flexibility. Suppose you have a WhereScape job that runs every couple hours around the clock, and it sometimes fails in the middle of the night. If those particular failures are not critical (e.g. because it can self-correct next time), then avoiding alerts of those failures is simple: Set up your SQL Agent job’s schedule(s) to avoid that time period.

If you’re ambitious, you could grow this into a full-blown monitoring framework. You might have dozens of WhereScape jobs to monitor, and you want lots of fine-grained control – but don’t want the mess of having dozens of SQL Agent jobs to set up and track. In that case you could build a table-driven mechanism with lookback values and monitoring timeframes configured per job. You could indicate whether to monitor for success, failure, or both. Et cetera.

PS: An interesting related project might be building something similar for SQL Agent jobs.


Contact: hello at escapefromsql.net