ClaimSearch Services - Login issues and issues accessing services
Incident Report for ISO ClaimSearch
Postmortem

DESCRIPTION:

ClaimSearch Services - Login issues and issues accessing services

IMPACT:

Customer Impact:

Customers were unable to access most of the ClaimSearch Services.

Incident date: Jun 13, 2023, 2:49 PM ET

Resolution date: Jun 13, 2023, 5:00 PM ET

ROOT CAUSE:

AWS outage in the US-East region: On June 13 at 2:49 PM ET, all ClaimSearch applications experienced issues and became inaccessible. AWS experienced increased error rates and latency for Lambda function invocations within the US-East region. The AWS team's investigation traced this to a latent software defect in the AWS Lambda subsystem that manages compute capacity for processing incoming function invocations. The defect was triggered by the scaling of the Lambda front-end fleet and caused invocations to fail.

CORRECTIVE ACTION:

AWS Corrective Action - Once the traffic subsided, the Lambda front-end fleet was scaled down to resolve the issue.

Verisk Corrective Action - Began failover to US-West. The failover was also slow, most likely because other AWS customers were failing over from East to West at the same time. AWS East recovered on its own.

PREVENTATIVE MEASURES:

  1. Implement an HA (high-availability) pair and conduct regular failover testing. Develop synchronization between East and West for multi-region deployments.
  2. Explore running the Login service as hot/hot, and Claims Inquiry and Visual ClaimSearch as hot/warm.
  3. Create a runbook for bringing up US-West and failing back to US-East.
  4. Assess the feasibility of using DynamoDB vs. PostgreSQL for DR failover capability.
Posted Aug 14, 2023 - 09:30 EDT

Resolved
The entire backlog has been processed and all services have returned to normal. We apologize for any inconvenience.
Posted Jun 13, 2023 - 19:35 EDT
Monitoring
Some services are returning to normal. For XML there will be delays as we process through the backlog. We apologize for any inconvenience.
Posted Jun 13, 2023 - 17:09 EDT
Investigating
There is currently an outage with our service provider. Multiple ClaimSearch Services are impacted.
Posted Jun 13, 2023 - 15:22 EDT
This incident affected: ClaimSearch Website, Visual Platform, Claims Inquiry (Netmap), Claims Reporting, Decision Net, NICB, ClaimDirector and System to System Interfaces (XML).