Overview

PagerDuty is an operational reliability platform that provides tools for incident management, AIOps, automation, and customer service operations. Its core function is to help organizations manage critical incidents across their IT infrastructure and applications, from initial detection to resolution. The platform integrates with a wide range of monitoring, observability, and ticketing systems, consolidating alerts into actionable incidents.

For developers and operations teams, PagerDuty offers on-call scheduling, automated escalation policies, and notification channels to ensure that the right responders are alerted promptly. It supports various notification methods, including push notifications, SMS, phone calls, and email, configurable based on urgency and team preferences. The platform's incident response capabilities include collaborative tools for responders to communicate, share information, and coordinate efforts to resolve issues efficiently.
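To make the idea of an escalation policy concrete, the sketch below models one as an ordered list of tiers, each with a timeout. It is only an illustration of the concept: the class and function names, the responder labels, and the timeouts are all hypothetical, and it does not represent how PagerDuty implements escalation internally.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EscalationRule:
    """One tier of a hypothetical escalation policy."""
    responders: list[str]
    timeout_minutes: int  # wait this long for an acknowledgement before escalating


def responders_at(rules: list[EscalationRule], minutes_unacknowledged: int) -> list[str]:
    """Return who is notified once an incident has gone unacknowledged this long."""
    elapsed = 0
    for rule in rules:
        elapsed += rule.timeout_minutes
        if minutes_unacknowledged < elapsed:
            return rule.responders
    # Every tier has timed out: this sketch keeps notifying the last tier.
    return rules[-1].responders


# Hypothetical three-tier policy
policy = [
    EscalationRule(["primary-on-call"], 15),
    EscalationRule(["secondary-on-call"], 15),
    EscalationRule(["engineering-manager"], 30),
]

print(responders_at(policy, 20))
```

In this model, an incident that is still unacknowledged after 20 minutes has passed the first tier's 15-minute timeout, so the second tier is notified.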

PagerDuty's AIOps capabilities leverage machine learning to reduce alert noise, identify critical events, and automate incident correlation. This helps teams focus on high-impact issues and predict potential problems before they escalate. For instance, the platform can group related alerts from disparate systems into a single incident, reducing alert fatigue and accelerating the investigation process. Automation features allow teams to define runbooks and execute automated diagnostic or remediation steps directly from an incident, further streamlining response workflows.
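The grouping of related alerts into a single incident can be illustrated with a deliberately simple, rule-based sketch. PagerDuty's own correlation relies on machine learning and is not shown here; the function name, the alert fields, and the choice of grouping key below are assumptions made purely for illustration.

```python
from collections import defaultdict


def group_alerts(alerts, key_fields=("service", "check")):
    """Group alerts that share the same values for key_fields into one bucket."""
    incidents = defaultdict(list)
    for alert in alerts:
        key = tuple(alert.get(field) for field in key_fields)
        incidents[key].append(alert)
    return dict(incidents)


# Hypothetical alerts arriving from three different monitoring tools
alerts = [
    {"source": "datadog", "service": "checkout", "check": "latency", "message": "p99 > 2s"},
    {"source": "prometheus", "service": "checkout", "check": "latency", "message": "SLO burn rate high"},
    {"source": "cloudwatch", "service": "search", "check": "cpu", "message": "CPU > 90%"},
]

grouped = group_alerts(alerts)
for key, members in grouped.items():
    print(key, "->", len(members), "alert(s)")
```

Here the two latency alerts for the checkout service collapse into one group, so a responder would see two incidents rather than three separate notifications.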

Beyond technical incident response, PagerDuty extends to customer service and security operations: customer service teams can proactively communicate with affected customers during outages, and security teams receive real-time alerts for security events. The platform is designed for organizations whose critical services demand continuous uptime, rapid response to any degradation, and a structured approach to managing operational disruptions. Its REST API and SDKs facilitate integration into existing toolchains and enable custom automation of incident workflows, as detailed in the PagerDuty Developer Documentation.

Key features

  • Incident Management: Centralized platform for detecting, triaging, and resolving operational incidents, including automated incident creation and tracking.
  • On-Call Management: Configurable on-call schedules, escalation policies, and notification rules to ensure appropriate team members are alerted.
  • AIOps: Machine learning algorithms to reduce alert noise, correlate events, and provide insights to accelerate root cause analysis.
  • Automated Workflows: Ability to define and execute automated diagnostic, remediation, and response actions triggered by incidents.
  • Event Intelligence: Consolidates events from monitoring tools, filters noise, and groups related alerts into actionable incidents.
  • Real-time Collaboration: Tools for responders to communicate, share updates, and coordinate efforts during an incident.
  • Reporting and Analytics: Provides metrics on incident volume, resolution times, and team performance to identify areas for improvement.
  • Integrations: Pre-built integrations with a wide range of monitoring, observability, ITSM, and communication tools.
  • Mobile App: Provides incident management on the go, allowing responders to acknowledge, resolve, or escalate incidents from mobile devices.

Pricing

PagerDuty offers a free tier for small teams and several paid plans with varying features and support levels. Pricing is typically per user per month, with annual billing often providing a discount.

Plan | Key features | Starting price (per user/month, annual billing) | Notes
Free | Basic on-call management, up to 5 users, 100 alerts/month | Free | Limited functionality, suitable for small teams or personal use.
Professional | Full incident management, advanced on-call, 200 alerts/month/user | $25 | Includes integrations, mobile app, and basic reporting.
Business | AIOps event intelligence, advanced automation, unlimited alerts | $39 | Adds machine learning for noise reduction and automated diagnostics.
Enterprise | Advanced security, global event routing, enhanced analytics, dedicated support | Custom | Designed for large organizations with complex operational needs.

Pricing as of May 2026. For the most current pricing details, refer to the official PagerDuty pricing page.

Common integrations

PagerDuty integrates with a variety of monitoring, observability, CI/CD, and IT service management (ITSM) tools. Key integration categories include:

  • Monitoring & Observability: Datadog, New Relic, Prometheus, Grafana, AWS CloudWatch, Azure Monitor, Splunk. These integrations enable PagerDuty to ingest alerts from various sources, as described in the PagerDuty Integration Guide.
  • ITSM & Ticketing: ServiceNow, Jira Service Management, Zendesk. These allow for automatic creation of tickets or synchronization of incident status.
  • Communication & Collaboration: Slack, Microsoft Teams, Zoom, Statuspage. These facilitate real-time communication and status updates during incidents.
  • Cloud Providers: Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP). Integrations enable monitoring of cloud infrastructure and services.
  • CI/CD & Deployment: Jenkins, GitHub Actions, GitLab. Used to trigger incidents based on deployment failures or build issues.
  • Security: Splunk ES, Panther, CrowdStrike. For alerting on security events and coordinating security incident response.

Alternatives

  • Opsgenie (Atlassian): A competing incident management platform offering on-call scheduling, alerting, and incident response capabilities, often integrated with other Atlassian products like Jira.
  • Splunk On-Call (formerly VictorOps): An incident management tool focused on DevOps and SRE teams, providing alerting, on-call scheduling, and collaboration features.
  • Rootly: An incident management platform built on Slack, emphasizing automation and post-incident analysis for modern incident response.
  • Zendesk Support: While primarily a customer service platform, Zendesk offers on-call scheduling and incident management features, particularly useful for coordinating customer-facing responses to outages.

Getting started

To begin using PagerDuty, you typically set up a service, integrate it with a monitoring tool, and configure an on-call schedule. The following Python example demonstrates how to trigger an incident by sending an event to the PagerDuty Events API v2. It assumes you have the routing key (also called the integration key) for your service; the Events API does not require a separate REST API key.

import requests
import json

# Replace with your PagerDuty Integration Key (Routing Key for Events API v2)
ROUTING_KEY = "YOUR_ROUTING_KEY"

# Define the incident payload
payload = {
    "routing_key": ROUTING_KEY,
    "event_action": "trigger",
    "payload": {
        "summary": "[CRITICAL] Web server high CPU usage detected",
        "source": "monitoring-system-01",
        "severity": "critical",
        "component": "web-server",
        "group": "production-frontend",
        "class": "performance",
        "custom_details": {
            "load_average": "15.2",
            "cpu_utilization": "98%",
            "server_ip": "192.168.1.100"
        }
    },
    "dedup_key": "web-server-high-cpu-12345"
}

# PagerDuty Events API v2 endpoint
url = "https://events.pagerduty.com/v2/enqueue"

headers = {
    "Content-Type": "application/json"
}

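try:
    # A timeout keeps the script from hanging if the endpoint is unreachable
    response = requests.post(url, headers=headers, data=json.dumps(payload), timeout=10)
    response.raise_for_status()  # Raise an HTTPError for bad responses (4xx or 5xx)

    # The Events API replies with 202 Accepted once the event has been queued
    print(f"Event accepted: {response.status_code}")
    print(response.json())

except requests.exceptions.HTTPError as http_err:
    print(f"HTTP error occurred: {http_err}")
    print(f"Response body: {http_err.response.text}")
except Exception as err:
    # Covers connection errors, timeouts, and unparseable response bodies
    print(f"An error occurred: {err}")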

This code snippet sends a trigger event to PagerDuty, which will create a new incident. The routing_key directs the event to the correct service, and the payload contains details about the incident, including its summary, source, severity, and custom details. The dedup_key ensures that subsequent events with the same key update an existing incident instead of creating new ones, a common practice in incident management to prevent alert storms.
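The same dedup_key also lets a monitoring tool acknowledge or resolve the incident once the condition clears, since the Events API v2 accepts acknowledge and resolve as event_action values alongside trigger. The sketch below only builds the follow-up payload; build_followup_event is a hypothetical helper name rather than part of any PagerDuty SDK, and the key values are the placeholders from the trigger example.

```python
def build_followup_event(routing_key: str, dedup_key: str, action: str) -> dict:
    """Build an Events API v2 payload that acknowledges or resolves an existing alert."""
    if action not in ("acknowledge", "resolve"):
        raise ValueError(f"unsupported follow-up action: {action!r}")
    # Follow-up events need no nested "payload" object; the dedup_key
    # identifies which open alert the action applies to.
    return {
        "routing_key": routing_key,
        "event_action": action,
        "dedup_key": dedup_key,
    }


resolve_event = build_followup_event(
    "YOUR_ROUTING_KEY", "web-server-high-cpu-12345", "resolve"
)
print(resolve_event)
```

The resulting dictionary can be posted to the same https://events.pagerduty.com/v2/enqueue endpoint in the same way as the trigger event.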

For more detailed information on integrating and using the API, refer to the PagerDuty API Reference.
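The REST API is separate from the Events API: it is served from api.pagerduty.com and authenticates with an API key sent in a Token header, not a routing key. As a rough sketch, the snippet below builds (but does not send) a request that lists open incidents. It uses only the Python standard library; the helper name and the placeholder key are assumptions, and the API Reference remains the authority on available filters and pagination.

```python
from __future__ import annotations

import urllib.parse
import urllib.request

REST_API_KEY = "YOUR_REST_API_KEY"  # placeholder: a REST API key, not a routing key


def build_list_incidents_request(api_key: str, statuses: list[str]) -> urllib.request.Request:
    """Build a GET request for incidents in the given statuses, without sending it."""
    # The statuses filter is passed as a repeated "statuses[]" query parameter
    query = urllib.parse.urlencode({"statuses[]": statuses}, doseq=True)
    return urllib.request.Request(
        f"https://api.pagerduty.com/incidents?{query}",
        headers={
            "Authorization": f"Token token={api_key}",
            "Accept": "application/vnd.pagerduty+json;version=2",
        },
        method="GET",
    )


request = build_list_incidents_request(REST_API_KEY, ["triggered", "acknowledged"])
print(request.full_url)
# To send it: urllib.request.urlopen(request, timeout=10), then parse the JSON body.
```

Building the request separately from sending it makes the URL and headers easy to inspect before any real API key is involved.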