Auto-Mitigation with Dynatrace AI – or shall we call it Self-Healing?

After our AI-Driven DevOps webinar with Anil from Verizon Enterprise, I got into a debate with my colleague Dave Anderson about what to call the auto-mitigation approach we discussed during the webinar: Is it “auto mitigation”, “auto remediation”, or shall we be bold and call it “self-healing”?

Instead of getting caught up in terminology, I decided to write this blog post, explain our thoughts, and let you decide what we should call it.

Here is the animated slide from our webinar that started the discussion:

AI Detected Problem details allow us to build smarter automated mitigation actions. No need to wake up engineers at 2AM every time a problem happens.

Our point in the webinar was that the data analyzed by the Dynatrace AI (Artificial Intelligence) lets us build and trigger much smarter auto-mitigation actions. Here are our thoughts on each step in the slide:

  • Escalate at 2AM? Dynatrace automatically detects the problem and how many end users and service endpoints are impacted. This translates directly into the severity level in our escalation process.
  • Auto Mitigate! Dynatrace is aware of all important events across all entities involved in the problem (e.g., a network connection issue, a critical log message after a configuration change, or CPU exhaustion). This allows us to write smarter auto-mitigation steps that address the root cause and not the symptom of the problem!
  • Update Dev Ticket! If the mitigation actions work, we can automatically update the JIRA ticket with the executed actions, and developers can discuss what happened last night in the daily stand-up.
  • Mark Bad Commits! If the mitigation actions didn’t solve the problem, we still have the option to roll back and mark the responsible Pull Request as BAD. A detailed analysis can then be done in the post-mortem retrospective!
  • Escalate as last resort! If rolling back doesn’t solve the situation, it is definitely time to escalate, even at 2AM!
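
The five steps above can be sketched as a small decision flow. This is only an illustration under stated assumptions, not how Dynatrace or our demo implements it: the `Problem` class, the severity thresholds, the event names, and the `mitigations`, `is_resolved`, and `rollback` callbacks are all hypothetical placeholders for whatever your monitoring, ticketing, and deployment tooling provides.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Problem:
    """Hypothetical, simplified view of an AI-detected problem."""
    problem_id: str
    impacted_users: int
    impacted_endpoints: int
    root_cause_events: List[str] = field(default_factory=list)


def severity(problem: Problem) -> str:
    """Derive an escalation severity from impact (thresholds are made up)."""
    if problem.impacted_users >= 1000 or problem.impacted_endpoints >= 10:
        return "high"
    if problem.impacted_users >= 100:
        return "medium"
    return "low"


def handle_problem(
    problem: Problem,
    mitigations: Dict[str, Callable[[Problem], None]],
    is_resolved: Callable[[Problem], bool],
    rollback: Callable[[Problem], None],
    log: List[str],
) -> str:
    """Walk the steps: mitigate, update ticket, roll back, then escalate."""
    log.append(f"severity={severity(problem)}")

    # Try the mitigation that matches each root-cause event, in order.
    for event in problem.root_cause_events:
        action = mitigations.get(event)
        if action is None:
            continue  # no automated action known for this event
        action(problem)
        log.append(f"mitigation:{event}")
        if is_resolved(problem):
            # Placeholder for "update the dev ticket with executed actions".
            log.append("ticket_updated")
            return "mitigated"

    # Mitigation did not help: roll back and flag the responsible change.
    rollback(problem)
    log.append("rollback;commit_marked_bad")
    if is_resolved(problem):
        return "rolled_back"

    # Last resort: wake somebody up.
    log.append("escalated")
    return "escalated"
```

In a real setup, `mitigations` would map root-cause event types to remediation functions, and `is_resolved` would ask the monitoring system whether the problem is still open. The `log` list stands in for the ticket update that developers would review in the daily stand-up.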

Auto Mitigation Implementation with AWS Lambda

Inspired by the work on Auto-Mitigation that my colleague Alois Reitbauer has done over the last couple of months, we sat down with our easyTravel Demo Team. easyTravel, in case you’ve never heard of it, is the #1 application we use to demo the capabilities of Dynatrace Fullstack Monitoring and the Dynatrace AI. It is also available for anyone to download and install.

easyTravel comes with many different components and services, plus some built-in problem patterns that can be enabled or scheduled on demand or via REST. easyTravel was also recently enhanced so that individual dockerized components, such as the backend service, can be scaled up and down.

Rafal Psciuk, Team Lead in our Gdansk office, thought about good auto-remediation use cases for when something goes wrong with easyTravel. He implemented two use cases, which he recently demoed to me, and I found them just “AWSome.”
