RETROSPECTIVE SUMMARY:
Incident date: 4/10/2023
Resolution date: 4/10/2023
Retrospective date: 4/14/2023 (delayed due to resource availability)
ATTENDEES: Kanika, Mangesh, Rebecca, Mohan, Dean, Praveen, Praphul, Sijo, Paresh, Arvind, Rita, Ram, Mohammed, Tim.
CUSTOMER IMPACT: Customers reported missing responses; claims were lost in processing for some customers.
ROOT CAUSE: In October 2022, XML schema changes, along with related enhancements across the entire system, were deployed to the acceptance environment. The changes to the response schema file removed two optional fields that were relevant only in a specific case: an individual claims party had more than one coverage on a claim, and both coverages had a close date and a suit-filed indicator. This case is rare and does not occur frequently in the test system, but it affects some claims in production, particularly those submitted by companies with closed dates. These changes were deployed to production on Sunday morning, April 9, 2023. The defect in the response schema file had been identified during testing in November 2022; it specifically affects the check on the XML responses going from ClaimSearch to the customer.
MONITORING: Are there any improvements to monitoring and alerting? These alerts currently go only to an MS Teams channel. An action item has been captured to page on-call teams through PagerDuty.
Are there any improvements to the troubleshooting process that could reduce the resolution time? No.
CHANGE: Was this incident caused by a change? Yes
Was the change tested in Production? No
Was the change tested in Acceptance? Yes – the defect was logged but not fixed prior to production deployment
Is there any due diligence that needs to be done to avoid similar incidents caused by a change? Yes. All changes need to flow through the BRP, receive proper prioritization, and be communicated to all development and support teams. The extended timeline of this particular change was also a contributing factor in deploying a known defect.
RCA CATEGORY: Defect (identified in testing but not fixed).
CORRECTIVE ACTION: The schema file was reverted.
The Aviators team then reviewed the schema file, identified the missing attribute, applied it to the current schema, redeployed and tested it in acceptance, and then redeployed it to production.
PREVENTIVE ACTION ITEMS BY POINT OF FAILURE: