RETROSPECTIVE SUMMARY:
Incident date: 25th March 10:21 PM
Resolution date: 25th March 11:26 PM
Retrospective date: 27th March.
CUSTOMER IMPACT: Delay in the claims process by an hour.
ROOT CAUSE: On March 25th, the certificate for the domain ccacts.iso.com expired. The ACM was unable to renew this certificate as the validation request was not completed by the domain owners. This resulted in the Tokenization SSL error when connecting to CTS, which is impacting the claims being processed.
The lambda step function was retrying multiple times during the certificate expiry.
MONITORING: Are there any improvements in monitoring and alerts? Yes. The Heightens team is already revisiting the existing monitoring to check why the alerts did not trigger during this incident.
Are there any improvements in the troubleshooting process that could reduce the resolution time? The incident management team was not paged during the incident, that delayed in reaching out the incident management team. Reiterate the Dev team to page the incident management team in Pager Duty.
CHANGE: Was this incident caused by a change? No
RCA CATEGORY: Human Error.
CORRECTIVE ACTION: The DNS team approved the validation request, and the certificate was automatically renewed.
PREVENTIVE ACTION ITEMS BY POINT OF FAILURE: