Technical problems
Incident Report for Kundo
Postmortem

We apologize for any inconvenience caused by this incident. A summary of the events and measures taken follows.

Timeline (in CET)

15:06 We deploy a new version of our systems which includes the removal of several database indices that were believed to be unused.

15:13 While the database indices are being updated, we see a sudden increase in database query times and our services start to experience timeouts. Our engineers start troubleshooting in order to identify and solve the issue.

15:20 We post a status message notifying about the incident.

15:26 We determine that the increased database load is caused by the removal of the indices during the deployment. The increased load on our database prevents web requests to our application from being successfully processed.

15:28 The database queries which cause the increased load are identified.

15:48 We release a new version of our systems in which the database indices are recreated and monitor the results.

16:08 The indices are recreated but we still experience a high database load. We continue looking for a solution to mitigate the problem.

16:25 We optimize the queries that cause the increased load and monitor the results. This allows our systems to recover and return to normal operation.

16:42 The incident is marked as resolved after a period of successful monitoring.

What happened?

We released a new version of our systems which included the removal of several database indices. It was mistakenly assumed that the indices were unused and could be safely removed. The removal caused an increased load on our database which prevented our systems from successfully handling web requests.

The issue was solved by recreating the indices and optimizing the database queries that were causing the increased load.

Mitigations and actions

To prevent this from happening again, we have started an initiative that will review our process for how we implement database changes. More specifically, we will review what is required when changing a database index and how we can identify whether it is in use or not.
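
As an illustration of what such a usage check could look like, here is a minimal sketch. It assumes a PostgreSQL database, which this report does not confirm, and reads the idx_scan counter from the pg_stat_user_indexes statistics view. The helper name, the threshold parameter and the sample rows are hypothetical and do not describe our actual tooling.

```python
from typing import Iterable, List, NamedTuple

# Assumption: PostgreSQL. pg_stat_user_indexes exposes idx_scan, the number of
# index scans initiated on each index since statistics were last reset.
INDEX_USAGE_SQL = """
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
"""


class IndexUsage(NamedTuple):
    """One row of the query above."""
    schema: str
    table: str
    index: str
    scans: int


def removal_candidates(rows: Iterable[IndexUsage], max_scans: int = 0) -> List[str]:
    """Return schema-qualified names of indices scanned at most max_scans times."""
    return [f"{r.schema}.{r.index}" for r in rows if r.scans <= max_scans]


# Hypothetical sample rows, standing in for the result of INDEX_USAGE_SQL.
sample = [
    IndexUsage("public", "tickets", "tickets_created_idx", 0),
    IndexUsage("public", "tickets", "tickets_status_idx", 18452),
]
print(removal_candidates(sample))
```

A low scan count is only a hint. The counter covers the period since statistics were last reset, each server (including read replicas) keeps its own statistics, and an index that backs a unique constraint can be needed even if it is never scanned.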

In the longer term, we will implement improvements to our database architecture in order to reduce the impact that database problems have on our system as a whole.

Further details or questions

You are most welcome to contact us via email for more information: support@kundo.se

Posted Jan 27, 2022 - 13:14 CET

Resolved
We have solved the issue with high database load, and response times are back to normal. We will investigate why this outage occurred and take steps to prevent it and similar issues from happening in the future.
Posted Jan 19, 2022 - 16:42 CET
Identified
We are still seeing some database load issues and have prepared another solution to help with the remaining issues.
Posted Jan 19, 2022 - 16:31 CET
Update
We are continuing to monitor for any further issues.
Posted Jan 19, 2022 - 16:21 CET
Monitoring
We have released our solution to the issue and are monitoring the situation.
Posted Jan 19, 2022 - 16:12 CET
Update
Kundo Knowledge has been restored to partial availability, and we are still working on releasing our solution to the issue.
Posted Jan 19, 2022 - 15:58 CET
Update
We will soon release a solution to the issue and monitor whether it resolves the problem.
Posted Jan 19, 2022 - 15:43 CET
Update
We have identified the issue and are working on a solution to the database load.
Posted Jan 19, 2022 - 15:32 CET
Update
The issue is related to high database load, causing a major outage. We are taking steps to correct this.
Posted Jan 19, 2022 - 15:26 CET
Investigating
Right now we are experiencing technical problems. We will update this message as soon as we have more information.
Posted Jan 19, 2022 - 15:20 CET
This incident affected: Kundo, Dashboard, Mail, Chat, Forum, Knowledge and Help Center, Statistics and Social media channels (Facebook).