Overview
VictorOps, now operating as Splunk On-Call, is an incident management platform that centralizes alerts, automates on-call rotations, and facilitates real-time incident response. Acquired by Splunk in 2018, the platform integrates with various monitoring, logging, and collaboration tools to provide a unified view of system health and operational issues Splunk VictorOps overview. It is designed for development, operations, and SRE teams responsible for maintaining high availability and rapid resolution of service disruptions.
The platform's core functionality revolves around intelligent alert routing, which directs critical notifications to the appropriate on-call personnel based on predefined schedules and escalation policies VictorOps on-call scheduling documentation. Routing is also meant to reduce alert fatigue: noisy or informational alerts can be filtered so that, ideally, only actionable alerts page someone. When an incident occurs, VictorOps provides tools for immediate communication across multiple channels, including chat, SMS, and email, so teams can collaborate during outages. The platform also includes features for incident tracking, stakeholder communication, and post-incident analysis, supporting a continuous improvement cycle for operational resilience.
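On the sending side, routing usually starts with choosing a routing key. In Splunk On-Call a routing key is associated with team escalation policies, and it is the last path segment of the alert URL in the Getting started example. The sketch below maps the service that raised an alert to a routing key. The function name, service names, and key names are all hypothetical placeholders; real routing keys are created in your account.

```bash
#!/usr/bin/env bash
# Hypothetical sender-side mapping from the service that raised an alert to a
# Splunk On-Call routing key. Key names are placeholders; real routing keys are
# created in your account and associated there with team escalation policies.
routing_key_for_service() {
  case "$1" in
    postgres|mysql|redis)        echo "database-oncall" ;;
    web-application|api-gateway) echo "web-oncall" ;;
    *)                           echo "default-oncall" ;;  # fallback so no alert goes unrouted
  esac
}

# Example: pick the routing key for an alert about the web application
ROUTING_KEY=$(routing_key_for_service "web-application")
echo "$ROUTING_KEY"   # prints web-oncall
```

The catch-all branch is a deliberate choice. An alert from an unrecognized service still reaches some team instead of being silently dropped.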
VictorOps is particularly suited for organizations that require robust on-call management, automated incident workflows, and comprehensive reporting on incident metrics. Its integration capabilities allow it to act as a central hub for incident data from diverse monitoring systems, such as Datadog, New Relic, and Prometheus. The platform supports various notification methods and allows for custom escalation paths, which can be critical for maintaining service level agreements (SLAs). For example, a critical production alert might immediately notify a senior engineer via phone call, while a lower-priority alert could trigger an email to a broader team after a delay. This layered approach helps ensure that incidents are addressed with the appropriate urgency and by the right personnel.
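The layered escalation described above amounts to a small decision table keyed on severity and on how long an alert has gone unacknowledged. The toy model below expresses that table as a function. The severities, the 5- and 30-minute delays, and the targets are hypothetical. In Splunk On-Call the actual policy is configured per team in the product, so this only illustrates the logic such a policy encodes.

```bash
#!/usr/bin/env bash
# Toy model of a layered escalation policy. Delays and targets are hypothetical;
# the real policy is configured per team in Splunk On-Call, not in a script.
# Usage: escalation_action <severity> <minutes_unacknowledged>
escalation_action() {
  local severity="$1" minutes="$2"
  case "$severity" in
    CRITICAL)
      if [ "$minutes" -lt 5 ]; then
        echo "phone:senior-engineer"       # step 1: page immediately by phone
      else
        echo "phone:engineering-manager"   # step 2: still unacknowledged after 5 min
      fi
      ;;
    WARNING)
      if [ "$minutes" -lt 30 ]; then
        echo "none"                        # hold: give the issue time to clear
      else
        echo "email:platform-team"         # delayed email to a broader team
      fi
      ;;
    *)
      echo "none"                          # INFO and unknown severities never page
      ;;
  esac
}

escalation_action CRITICAL 0    # prints phone:senior-engineer
escalation_action CRITICAL 10   # prints phone:engineering-manager
escalation_action WARNING 45    # prints email:platform-team
```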
Beyond real-time response, VictorOps contributes to operational maturity through its post-incident review capabilities. It automatically logs incident timelines, actions taken, and team communications, providing a foundation for blameless post-mortems. This data helps teams identify root causes, implement preventative measures, and refine their incident response playbooks. The platform also offers reporting on key performance indicators (KPIs) such as mean time to acknowledge (MTTA) and mean time to resolve (MTTR), allowing organizations to track improvements in their operational efficiency over time VictorOps post-incident review documentation. This focus on both immediate response and long-term improvement aligns with modern SRE practices, which emphasize continuous learning and automation in operations.
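MTTA and MTTR are averages of per-incident durations measured from when the incident was triggered: until acknowledgement for MTTA, and until resolution for MTTR. The sketch below computes both from a CSV file. The column layout (incident_id, triggered_at, acknowledged_at, resolved_at, as Unix seconds) and the function name are assumptions for illustration, not a documented Splunk On-Call export format, so adapt the column numbers to the data you actually have.

```bash
#!/usr/bin/env bash
# Compute MTTA and MTTR (in minutes) from a CSV with a header row and columns
# incident_id,triggered_at,acknowledged_at,resolved_at (Unix seconds).
# The file layout is hypothetical; adjust $2..$4 to match your data.
mtta_mttr() {
  awk -F',' 'NR > 1 {
      ack_total += $3 - $2   # trigger -> acknowledgement
      res_total += $4 - $2   # trigger -> resolution
      n++
    }
    END {
      if (n == 0) { print "no incidents"; exit 1 }
      printf "MTTA=%.1f min MTTR=%.1f min\n", ack_total / n / 60, res_total / n / 60
    }' "$1"
}

# Example with two incidents: acknowledged after 2 and 6 minutes,
# resolved after 30 and 60 minutes.
sample=$(mktemp)
cat > "$sample" <<'EOF'
incident_id,triggered_at,acknowledged_at,resolved_at
101,1700000000,1700000120,1700001800
102,1700010000,1700010360,1700013600
EOF
mtta_mttr "$sample"   # prints MTTA=4.0 min MTTR=45.0 min
rm -f "$sample"
```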
Alternative platforms such as PagerDuty offer similar incident management capabilities, with an emphasis on alert management and on-call automation PagerDuty incident management features. The choice between platforms often depends on specific integration needs, the existing tool ecosystem, and pricing models. As part of Splunk's portfolio, VictorOps benefits from deeper integration with Splunk's observability and security products, which gives a more consolidated experience to teams already using Splunk.
Key features
- Intelligent Alert Routing: Centralizes alerts from monitoring tools and routes them to the correct on-call teams or individuals based on predefined rules, schedules, and escalation policies VictorOps alert routing guide.
- On-Call Scheduling: Manages complex on-call rotations, shifts, and overrides, providing a visual calendar for team members to view their schedules.
- Real-time Incident Communication: Facilitates collaboration through integrated chat, conference bridges, and status pages, allowing teams to communicate and coordinate during active incidents.
- Automated Escalation Policies: Configures multi-step escalation paths to ensure that critical alerts are acknowledged and addressed, escalating to higher-priority contacts or teams if an alert goes unacknowledged.
- Runbook Automation: Enables the execution of automated actions or pre-defined steps in response to specific alerts, helping to streamline initial incident response.
- Post-Incident Review: Captures incident timelines, actions, and communications automatically to support blameless post-mortems and identify areas for process improvement.
- Reporting and Analytics: Provides metrics on incident response performance, including Mean Time To Acknowledge (MTTA) and Mean Time To Resolve (MTTR), to track operational efficiency.
- Extensive Integrations: Offers pre-built integrations with a wide range of monitoring, logging, ticketing, and collaboration tools, and supports custom integrations via a REST API and webhooks VictorOps API reference.
- Mobile App: Provides dedicated mobile applications for on-call teams to receive alerts, manage incidents, and collaborate on the go.
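The REST API mentioned under Extensive Integrations can also be scripted, for example to check who is on call. The sketch below is a minimal example. The base URL https://api.victorops.com, the X-VO-Api-Id and X-VO-Api-Key headers, and the two endpoint paths are assumptions to verify against the official API reference. The API ID and key for this public API are separate from the REST endpoint integration key used to send alerts. By default the script only prints the requests it would make.

```bash
#!/usr/bin/env bash
# Sketch of calling the Splunk On-Call public REST API.
# ASSUMPTIONS (verify in the API reference): base URL, auth header names,
# and the endpoint paths used below. Credentials are placeholders.
VO_API_ID="YOUR_API_ID"
VO_API_KEY="YOUR_API_KEY"
DRY_RUN="${DRY_RUN:-1}"   # default: print requests; run with DRY_RUN=0 to send them

vo_get() {
  local url="https://api.victorops.com$1"
  if [ "$DRY_RUN" = "1" ]; then
    echo "GET $url"
    return 0
  fi
  curl -sS --fail \
    -H "Accept: application/json" \
    -H "X-VO-Api-Id: $VO_API_ID" \
    -H "X-VO-Api-Key: $VO_API_KEY" \
    "$url"
}

vo_get /api-public/v1/oncall/current   # who is on call right now, per team
vo_get /api-public/v1/incidents        # incidents currently open
```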
Pricing
VictorOps is sold as Splunk On-Call. Pricing is subscription-based, with tiers for different organizational needs. As of May 2026, the publicly available pricing is as follows:
| Tier | Description | Price (per user, per month, billed annually) |
|---|---|---|
| Standard | Core on-call management, alert routing, basic incident response. | $20 |
| Professional | Includes Standard features plus advanced analytics, runbook automation, and enhanced integrations. | Contact Vendor |
| Enterprise | Includes Professional features plus advanced security, compliance, and dedicated support. | Contact Vendor |
For detailed and up-to-date pricing information, including specific feature breakdowns for Professional and Enterprise tiers, refer to the official Splunk On-Call pricing page.
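Because the Standard price is quoted per user per month but billed annually, the yearly invoice is users × $20 × 12. The helper below performs that arithmetic. It covers only the list price from the table above and ignores taxes or negotiated discounts. Professional and Enterprise are quote-based and are not modeled. The function name is illustrative.

```bash
#!/usr/bin/env bash
# Rough annual spend (USD) for the Standard tier at the listed
# $20 per user per month, billed annually. List price only.
standard_annual_cost() {
  local users="$1"
  echo $(( users * 20 * 12 ))
}

standard_annual_cost 25   # prints 6000
```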
Common integrations
VictorOps (Splunk On-Call) integrates with a broad ecosystem of tools across monitoring, logging, collaboration, and ticketing categories:
- Monitoring & APM: Datadog VictorOps Datadog integration, New Relic, Prometheus, Grafana, AWS CloudWatch, Azure Monitor, Google Cloud Monitoring.
- Logging & Observability: Splunk Enterprise, ELK Stack (Elasticsearch, Logstash, Kibana), Sumo Logic.
- Collaboration & ChatOps: Slack VictorOps Slack integration, Microsoft Teams, Zoom, PagerDuty (for specific workflows).
- Ticketing & ITSM: Jira Service Management VictorOps Jira Service Management integration, ServiceNow, Zendesk.
- Cloud Providers: Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP).
- Source Control: GitHub, GitLab.
Alternatives
- PagerDuty: A leading incident management platform offering on-call scheduling, automated escalations, and real-time incident response capabilities.
- Opsgenie: Part of the Atlassian suite, providing alert management, on-call scheduling, and incident command center features, often integrated with Jira.
- Rootly: An incident management platform built on top of Slack, focusing on automation, post-mortems, and incident retrospectives.
- Freshservice: An IT service management (ITSM) solution that includes incident management, service desk, and asset management functionalities.
- Zendesk Incident Management: Offers incident response tools integrated within the Zendesk service platform, focusing on customer impact and communication.
Getting started
To get started with VictorOps (Splunk On-Call), you typically begin by setting up an integration that sends alerts from your monitoring tools. Here's an example of sending a test alert with curl to the generic REST endpoint integration. It assumes you have enabled that integration, which provides the API key used in the URL, and that you have a routing key mapped to a team's escalation policy. The alert simulates a critical notification from a hypothetical monitoring system.
```bash
# Replace with the key of your REST endpoint integration and one of your routing keys
API_KEY="YOUR_VICTOROPS_API_KEY"
ROUTING_KEY="YOUR_ROUTING_KEY"

# REST endpoint integration URL: .../alert/<API_KEY>/<ROUTING_KEY>
API_ENDPOINT="https://alert.victorops.com/integrations/generic/20131114/alert/${API_KEY}/${ROUTING_KEY}"

# Payload fields:
#   message_type      CRITICAL, WARNING, INFO, ACKNOWLEDGEMENT, or RECOVERY
#   entity_id         stable identifier for the problem; alerts sharing it map to one incident
#   state_message     human-readable summary of the alert
#   monitoring_tool   name of the system that raised the alert
#   state_start_time  when the problem started, as a number of seconds since the Unix epoch
#   alert_url, severity, host, service: extra custom fields attached to the alert
JSON_PAYLOAD=$(cat <<EOF
{
  "message_type": "CRITICAL",
  "entity_id": "web-server-01-cpu-spike",
  "state_message": "CPU usage on web-server-01 is critically high (>90%)",
  "monitoring_tool": "Prometheus",
  "state_start_time": $(date -u +%s),
  "alert_url": "http://prometheus.example.com/alerts/cpu-spike",
  "severity": "CRITICAL",
  "host": "web-server-01",
  "service": "web-application"
}
EOF
)

# Send the alert
curl -X POST \
  -H "Content-Type: application/json" \
  -d "$JSON_PAYLOAD" \
  "$API_ENDPOINT"
```
This curl command sends a CRITICAL alert to your VictorOps timeline. Replace YOUR_VICTOROPS_API_KEY and YOUR_ROUTING_KEY with the values from your VictorOps (Splunk On-Call) account. The entity_id ties related alerts together: further alerts with the same entity_id update the existing incident instead of opening a new one, and a RECOVERY alert with that entity_id resolves it. Once the alert arrives, VictorOps processes it according to your routing and escalation policies and notifies the appropriate on-call team members.
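To close the loop, the monitoring side sends a RECOVERY message with the same entity_id once the problem clears. The sketch below builds that payload with a small helper and, by default, only prints it. It uses the REST endpoint URL format .../integrations/generic/20131114/alert/<API key>/<routing key>. The keys are placeholders. The helper name build_alert_payload and the DRY_RUN switch are hypothetical conveniences, not part of any Splunk tooling.

```bash
#!/usr/bin/env bash
# Resolve an incident by sending RECOVERY with the entity_id used for the
# original CRITICAL alert. Keys are placeholders.
API_KEY="YOUR_VICTOROPS_API_KEY"
ROUTING_KEY="YOUR_ROUTING_KEY"
API_ENDPOINT="https://alert.victorops.com/integrations/generic/20131114/alert/${API_KEY}/${ROUTING_KEY}"
DRY_RUN="${DRY_RUN:-1}"   # default: print the payload; run with DRY_RUN=0 to send it

# build_alert_payload MESSAGE_TYPE ENTITY_ID STATE_MESSAGE  (hypothetical helper)
# printf does no JSON escaping, so arguments must not contain " or \
build_alert_payload() {
  printf '{"message_type":"%s","entity_id":"%s","state_message":"%s","monitoring_tool":"Prometheus"}\n' \
    "$1" "$2" "$3"
}

PAYLOAD=$(build_alert_payload RECOVERY "web-server-01-cpu-spike" \
  "CPU usage on web-server-01 is back to normal")

if [ "$DRY_RUN" = "1" ]; then
  echo "$PAYLOAD"
else
  curl -X POST -H "Content-Type: application/json" -d "$PAYLOAD" "$API_ENDPOINT"
fi
```

Keeping the entity_id identical between the CRITICAL and RECOVERY messages is what lets the platform match them to the same incident.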
For more complex integrations, such as setting up a monitoring tool like Datadog or New Relic, you would typically follow the specific integration guides provided in the VictorOps documentation, which often involve configuring webhooks or API keys within the monitoring tool itself to send events to VictorOps.
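For Prometheus, one of the monitoring tools listed above, no custom script is needed. Alertmanager has a built-in victorops_configs receiver type, and its default API URL already points at the REST endpoint integration, so the receiver mainly needs the API key and a routing key. The sketch below writes a minimal configuration and validates it with amtool if that tool is installed. It assumes Alertmanager is already wired to Prometheus. The keys are placeholders, and the receiver name and file location are arbitrary choices.

```bash
#!/usr/bin/env bash
# Write a minimal Alertmanager config that forwards every alert to
# Splunk On-Call via the built-in victorops_configs receiver.
# Keys are placeholders; receiver name and file path are arbitrary.
cat > alertmanager.yml <<'EOF'
route:
  receiver: splunk-oncall
receivers:
  - name: splunk-oncall
    victorops_configs:
      - api_key: YOUR_VICTOROPS_API_KEY
        routing_key: YOUR_ROUTING_KEY
        # also forward resolved alerts so incidents can recover automatically
        send_resolved: true
EOF

# Validate the file if amtool (shipped with Alertmanager) is available
if command -v amtool >/dev/null 2>&1; then
  amtool check-config alertmanager.yml
fi
```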