Problems accessing Kundo
Incident Report for Kundo
Postmortem

We apologize for any inconvenience caused by this incident. A summary of the events and measures taken follows.

Summary

Kundo was the target of a large-scale DDoS (distributed denial-of-service) attack, which caused intermittent performance issues and unavailability for many users throughout the day.

Timeline (in CET)

08:14 - An alarm about reduced performance in our systems is triggered.

08:17 - Resources for serving web requests are added to reduce user impact.

08:21 - Our incident process is initiated and troubleshooting is started by the whole development team.

08:32 - The target of the attack is identified.

08:32 - The root cause of the incident is identified as an unusually large number of requests to our servers.

08:33 - Information about the incident is published on status.kundo.se

09:17 - Further resources for serving web requests are added.

09:18 - We start blocking requests from identified attackers.

09:39 - Performance has improved for the majority of users, but further resources for serving web requests are added.

10:19 - Further resources are added to stabilize the servers.

10:29 - All systems are confirmed to be fully available and the incident status is set to Resolved.

13:27 - We are informed of general unavailability of Kundo.

13:29 - Our incident process is initiated and troubleshooting is started.

13:37 - Information about the incident is published on status.kundo.se

13:39 - Resources for serving web requests are added.

13:49 - All systems are confirmed to be fully available and the incident status is set to Resolved.

14:57 - An alarm about reduced performance in our chat service is triggered.

14:57 - Our incident process is initiated and troubleshooting is started.

15:05 - A caching improvement for our chat is released.

15:06 - A caching configuration change is made in the web servers.

13:49 - All systems, including Chat, are confirmed to be available and the incident status is set to Resolved.

What happened?

Kundo was the target of a DDoS attack that flooded the web servers with requests, bringing request processing to a near halt. The large number of requests also made other parts of the system unstable. We already had several DDoS mitigations in place, as well as auto scaling of server capacity, which normally handle this kind of situation automatically. In this case, those precautions proved inadequate, which resulted in periods of downtime.

To mitigate the problem, the capacity of the servers was increased and the malicious requests from the attackers were blocked. Increasing the capacity posed some difficulties for the system setup and required manual intervention to succeed. The blocked requests still put some load on the system, which made it struggle to recover.

A series of caching improvements, increased server capacity and some manual handling finally made the system available again.

We have identified several actions that will mitigate the impact of failures like these in the future, including:

  • Improve our incident process to make internal communication even more effective
  • Improve our system to make scaling up capacity more reliable
  • Improve our system to ward off attacks more effectively

Several of these have already been implemented as of the publication date of this postmortem.

Further details or questions

You are most welcome to contact us via email for more information: support@kundo.se

Posted Feb 19, 2023 - 22:35 CET

Resolved
The incident has been resolved and Kundo is working as normal.
Posted Feb 08, 2023 - 10:24 CET
Monitoring
We have implemented a solution and are monitoring the situation.
Posted Feb 08, 2023 - 10:17 CET
Identified
We are still having intermittent problems with Kundo and are working to resolve the situation.
Posted Feb 08, 2023 - 10:02 CET
Update
We are continuing to monitor for any further issues.
Posted Feb 08, 2023 - 09:31 CET
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Feb 08, 2023 - 09:10 CET
Identified
The issue has been identified and a fix is being implemented.
Posted Feb 08, 2023 - 08:51 CET
Update
We are continuing to investigate this issue.
Posted Feb 08, 2023 - 08:51 CET
Update
We are continuing to investigate this issue.
Posted Feb 08, 2023 - 08:42 CET
Investigating
Some users are experiencing problems accessing Kundo at the moment. We are investigating the issue.
Posted Feb 08, 2023 - 08:33 CET
This incident affected: Kundo, Dashboard, Mail, Chat, Calls, Forum, Knowledge and Help Center, Statistics and Social media channels (Facebook).