XML Delays - UF Flow has delayed Web and XML Responses
Incident Report for ISO ClaimSearch
Postmortem

RETROSPECTIVE SUMMARY:

Incident date:  3/24/2023

Resolution date: 3/24/2023

Retrospective date:  3/25/2023

CUSTOMER IMPACT: XML responses were delayed between 2 PM and 4 PM ET. Average throughput time was approximately 18 minutes.

ROOT CAUSE: 

In 2019, a DynamoDB table was created to cache SSA Keys information for the UF Search application. On March 24, 2023, the table was deleted, but the UF Search code still contained references to it. During scale-up, each new task tried to create the DynamoDB table at startup, and because the role had no create-table permission, the task failed during startup. This resulted in a high queue depth alert being triggered for the XML UF Search queue.

NOTE: In 2019, the role had the create-table privilege. The table is no longer needed: the code was cleaned up to remove the table references, and the Avengers team can delete the table. There was a mismatch in the roles between ACCP and prod. The table had been identified for cleanup (it contained 28GB of data). There is currently a process in place for this, the CFT (CloudFormation template). When the initial work was done in ~2019, this process was not in place.
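
The startup failure described under ROOT CAUSE can be illustrated with a minimal sketch. This is not the UF Search code. It only shows one way a task could check for the table at startup without calling create-table, so that a missing table or a role without create permission does not stop the task from starting. It assumes a boto3-style DynamoDB client, which exposes describe_table and client.exceptions.ResourceNotFoundException. The table name, the function names, and the FakeDynamoDBClient stand-in are hypothetical and exist only so the sketch runs without AWS access.

```python
class _NotFound(Exception):
    """Stand-in for the DynamoDB client's ResourceNotFoundException."""


class FakeDynamoDBClient:
    """Hypothetical in-memory stand-in for a boto3 DynamoDB client."""

    class exceptions:
        ResourceNotFoundException = _NotFound

    def __init__(self, tables):
        self._tables = set(tables)

    def describe_table(self, TableName):
        # Mirrors the real call shape: describe_table(TableName=...)
        if TableName not in self._tables:
            raise _NotFound(TableName)
        return {"Table": {"TableName": TableName, "TableStatus": "ACTIVE"}}


def table_exists(client, table_name):
    """Return True if the table can be described, False if it is missing."""
    try:
        client.describe_table(TableName=table_name)
        return True
    except client.exceptions.ResourceNotFoundException:
        return False


def choose_cache_mode(client, table_name):
    """Pick a cache mode at startup without ever attempting to create the table.

    Returns "dynamodb" when the table exists and "disabled" otherwise, so a
    deleted table degrades caching instead of failing task startup.
    """
    if table_exists(client, table_name):
        return "dynamodb"
    return "disabled"


# Hypothetical table name, for illustration only.
TABLE = "uf-search-ssa-key-cache"
print(choose_cache_mode(FakeDynamoDBClient([TABLE]), TABLE))
print(choose_cache_mode(FakeDynamoDBClient([]), TABLE))
```

With a real client, the same functions would be called with boto3.client("dynamodb") in place of the stand-in. Whether disabling the cache is acceptable depends on the application, and that choice is an assumption of this sketch.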

MONITORING: 

Are there any improvements in monitoring and alerts? Yes

RCA CATEGORY: Defect 

CORRECTIVE ACTION: 

Restoring the table took longer than expected, so the Avengers team granted table creation access in the PROD role to resolve the issue.

PREVENTIVE ACTION ITEMS BY POINT OF FAILURE:

Posted Sep 25, 2023 - 16:18 EDT

Resolved
We experienced a degradation of service in XML responses between 2 PM and 4 PM ET today.

We apologize for any inconvenience.
Posted Mar 23, 2023 - 14:00 EDT