TIMING:
February 14, 4:13 PM ET to February 16, 10:57 AM ET
DESCRIPTION:
ClaimSearch customers were unable to log in to ClaimSearch services.
IMPACT:
ClaimDirector was unavailable to customers on Thursday, February 15, and Friday, February 16. The outage also affected NICB Services and Visual Platform, and high queue depth caused processing delays in the System-to-System interfaces (XML, FTP, Web).
ROOT CAUSE:
On Wednesday, February 14, ClaimDirector's scoring queues started alerting in the late afternoon. By February 15, high database load had escalated this into a major outage. The DBA team determined that insufficient statistics gathering on the involved party table, combined with the table's growth over time, led to bad query plans and degraded database performance. As a result, ClaimDirector was unavailable, NICB Services and Visual Platform were affected, and the System-to-System interfaces (XML, FTP, Web) experienced processing delays due to high queue depth.
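Why table growth alone can leave statistics stale is easiest to see from PostgreSQL's auto-analyze rule: a table is re-analyzed only after the number of rows changed since the last ANALYZE exceeds a base threshold plus a fraction of the table's row count. The sketch below assumes the involved party table relied on autovacuum with stock settings (`autovacuum_analyze_threshold = 50`, `autovacuum_analyze_scale_factor = 0.1`). The report does not say how statistics were gathered, and the function names are hypothetical.

```python
# Minimal sketch of PostgreSQL's auto-analyze trigger rule.
# Assumption: stock autovacuum settings; the actual configuration of the
# involved party table is not stated in this report.

AUTOVACUUM_ANALYZE_THRESHOLD = 50       # PostgreSQL default base threshold (rows)
AUTOVACUUM_ANALYZE_SCALE_FACTOR = 0.1   # PostgreSQL default scale factor (10%)


def analyze_threshold(reltuples: float,
                      base: int = AUTOVACUUM_ANALYZE_THRESHOLD,
                      scale: float = AUTOVACUUM_ANALYZE_SCALE_FACTOR) -> float:
    """Rows that must change before autovacuum re-analyzes the table."""
    return base + scale * reltuples


def needs_analyze(reltuples: float, rows_changed_since_analyze: int) -> bool:
    """Hypothetical helper: True once changes exceed the analyze threshold."""
    return rows_changed_since_analyze > analyze_threshold(reltuples)


if __name__ == "__main__":
    # The threshold grows linearly with table size, so a large table can
    # absorb many changed rows before its statistics are refreshed.
    for rows in (10_000, 1_000_000, 100_000_000):
        print(f"{rows:>11,} rows -> re-analyze after {analyze_threshold(rows):,.0f} changed rows")
```

Under these assumed defaults, a 100-million-row table would need roughly 10 million changed rows before an automatic re-analyze, leaving ample room for the planner's estimates to drift from the real data distribution.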
CORRECTIVE ACTION:
· The DBAs ran VACUUM on the impacted tables.
· ClaimDirector tasks were stopped after they were determined to be causing unusually high database load.
· The Engineering and DBA teams implemented tuned queries to improve database performance.
· The Engineering teams implemented a temporary fix that disabled tokenization in ClaimDirector, then re-enabled the ClaimDirector tasks.
· The DBAs increased the number of reader nodes in the PostgreSQL database to process the queue backlog.
PREVENTATIVE MEASURES: