FEATURES · ALL INCLUDED, EVERY PLAN

Everything your SRE team needs, powered by AI.

Nova AI Ops is a single AI-native platform that replaces Datadog, PagerDuty, Grafana, Splunk, and 12 more tools. Built for SRE, DevOps, and platform engineering teams who refuse to accept 3am pages as normal. Every feature on this page is included in every Nova AI Ops plan — no hidden modules, no per-host pricing, no professional services required.

At the core of Nova AI Ops is a fleet of 100 autonomous AI agents organized into 12 specialized teams. These agents continuously ingest metrics, logs, traces, and events from your entire infrastructure, correlate signals across systems, and execute remediation runbooks with confidence scoring. When an incident requires human judgment, Nova pages the right engineer with a pre-built summary containing the root cause, impacted services, suggested fix, and every relevant metric. When an incident can be safely auto-remediated, Nova does it in seconds, before your customers notice.

AI and Automation

The AI layer is what makes Nova different from every AIOps tool built before 2024. We don't use statistical anomaly detection dressed up as AI. We deploy actual autonomous agents that read runbooks, execute diagnostic commands, and make remediation decisions with confidence scoring. Agents own specific categories (pod lifecycle, network policies, certificate management, database health) and coordinate when incidents span multiple domains. Every agent decision is logged with its reasoning for compliance and explainability.

100 AI Agents

12 specialized teams of AI agents working 24/7. From Core Response to Security, each agent is a domain expert.

  • Core Response, Infrastructure, Cloud, DevOps teams
  • Real-time trust scores and performance tracking
  • Full audit trail of every AI decision

AI Runbooks

Auto-generated executable runbooks that learn from your incident history.

  • AI-generated runbooks for every severity level
  • What-if simulation before real incidents
  • One-click execution with approval workflows

Auto-Remediation

AI agents that execute fixes autonomously. Rollbacks, scaling, restarts, and config changes.

  • Configurable autonomy levels
  • Approval queue for high-risk actions
  • Complete rollback capability
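
How autonomy levels, confidence scoring, and the approval queue could fit together is easiest to see in a small sketch. The example below is illustrative only and is not Nova's implementation: the level names, the `ProposedAction` fields, the `route` function, and the 0.9 confidence threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Autonomy(Enum):
    """Hypothetical autonomy levels; the names are illustrative only."""
    SUGGEST_ONLY = 1       # never act alone, always ask a human
    APPROVE_HIGH_RISK = 2  # act alone on low-risk fixes, queue the rest
    FULL_AUTO = 3          # act alone whenever confidence is high enough


@dataclass
class ProposedAction:
    name: str          # e.g. "restart", "rollback", "scale"
    confidence: float  # agent's confidence in the fix, 0.0 to 1.0
    high_risk: bool    # whether the action is classed as high risk


def route(action: ProposedAction, level: Autonomy, min_confidence: float = 0.9) -> str:
    """Return "execute", "queue_for_approval", or "page_human"."""
    # Low confidence always goes to a person, whatever the autonomy level.
    if action.confidence < min_confidence:
        return "page_human"
    if level is Autonomy.SUGGEST_ONLY:
        return "queue_for_approval"
    if level is Autonomy.APPROVE_HIGH_RISK and action.high_risk:
        return "queue_for_approval"
    return "execute"


if __name__ == "__main__":
    restart = ProposedAction("restart", confidence=0.97, high_risk=False)
    rollback = ProposedAction("rollback", confidence=0.97, high_risk=True)
    print(route(restart, Autonomy.APPROVE_HIGH_RISK))
    print(route(rollback, Autonomy.APPROVE_HIGH_RISK))
```

In this sketch a confident, low-risk restart runs immediately, while an equally confident but high-risk rollback waits in the approval queue.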

Observability

One unified view across metrics, logs, traces, and events. No more jumping between Datadog dashboards, Splunk queries, Grafana panels, and Jaeger waterfalls to piece together what happened. Nova ingests from every major observability source, correlates across signal types, and surfaces the single timeline of cause and effect that your engineers actually need during an incident. Cardinality-independent pricing means you can tag everything without watching your monthly bill explode.

Golden Signals Dashboard

The four golden signals of SRE observability in one view: latency, traffic, errors, and saturation.

  • Real-time P50, P95, P99 percentiles
  • Traffic throughput and error rate tracking
  • Instant comparison against 24h baselines
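
For readers who want to see what a percentile comparison against a 24h baseline involves, here is a minimal sketch. It uses the nearest-rank percentile definition and plain Python lists; the function names are assumptions for the example, and Nova's own calculation may differ.

```python
def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100, avoiding floating-point rounding.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]


def compare_to_baseline(current: list[float], baseline: list[float]) -> dict[str, float]:
    """Percent change of P50, P95, and P99 versus the baseline window.

    Assumes the baseline percentiles are nonzero (true for latencies).
    """
    changes = {}
    for p in (50, 95, 99):
        base = percentile(baseline, p)
        now = percentile(current, p)
        changes[f"p{p}"] = (now - base) / base * 100
    return changes


if __name__ == "__main__":
    baseline_ms = [float(x) for x in range(1, 101)]  # stand-in for 24h of latencies
    current_ms = [2 * x for x in baseline_ms]        # every request twice as slow
    print(compare_to_baseline(current_ms, baseline_ms))
```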

Real-time Dashboard

A single pane of glass for your entire infrastructure. Live metrics with zero query lag.

  • Unified view across all clouds and services
  • Sub-second metric refresh
  • Custom layouts with drag-and-drop Studio

Log Explorer

Search billions of log lines in milliseconds with anomaly detection and pattern analysis.

  • Full-text search across all sources
  • Automatic correlation with traces and metrics
  • Smart log pattern detection

Incident Management

Full incident lifecycle management in one platform: detection, triage, escalation, remediation, and postmortem. No PagerDuty subscription on top of Nova, no separate incident.io bill, no Rootly integration to maintain. Nova includes on-call scheduling with timezone awareness, escalation policies with acknowledgment timeouts, war room creation with Slack and Zoom bridges, and automated postmortem generation that writes the first draft for you based on the incident timeline and every action taken.

Intelligent Incident Management

AI-powered detection, smart severity scoring, blast radius analysis, and automated lifecycle management.

  • Reduce MTTR from 4+ hours to under 30 minutes
  • 80% less alert noise with AI deduplication
  • Auto-generated post-mortems
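
To make the idea of alert deduplication concrete, here is a deliberately simple sketch. It is a fixed-window, fingerprint-based stand-in, not the AI deduplication Nova uses: the `Alert` fields, the `(service, check)` fingerprint, and the 300-second window are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    service: str
    check: str
    timestamp: float  # seconds since epoch
    message: str = ""


def deduplicate(alerts: list[Alert], window_seconds: float = 300.0) -> list[Alert]:
    """Keep the first alert per (service, check) and drop repeats inside the window."""
    last_kept: dict[tuple[str, str], float] = {}
    kept: list[Alert] = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.service, alert.check)
        previous = last_kept.get(key)
        # A repeat only gets through once the window since the last kept alert has passed.
        if previous is None or alert.timestamp - previous >= window_seconds:
            kept.append(alert)
            last_kept[key] = alert.timestamp
    return kept


if __name__ == "__main__":
    raw = [
        Alert("api", "latency", 0),
        Alert("api", "latency", 60),   # repeat inside the window, dropped
        Alert("db", "cpu", 30),
        Alert("api", "latency", 400),  # window has passed, kept
    ]
    for alert in deduplicate(raw):
        print(alert.service, alert.check, alert.timestamp)
```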

On-Call Management

Intelligent scheduling that respects time zones, workload, and fatigue levels.

  • Automated rotation with fairness balancing
  • Smart escalation based on skill match
  • Multi-channel notifications

AI Post-Mortems

Automatically generated incident reports with timeline reconstruction and root cause analysis.

  • Auto-generated timeline from signals
  • Actionable recommendations ranked by impact
  • Blameless format following SRE best practices

Monitoring and Testing

Synthetic checks from 30+ global regions, real user monitoring (RUM) with Core Web Vitals, uptime monitoring with per-second granularity, and chaos engineering primitives for proactive resilience testing. Nova doesn't just tell you when something is down; it warns you before it goes down by learning your baseline over 14 days and flagging drift in latency, error rates, and saturation. Chaos experiments run on schedules you control, so you can validate runbooks before you need them.

Synthetic Monitoring

Proactively test your APIs, websites, and critical user flows from 20+ global locations.

  • Multi-step user flow testing
  • SSL certificate expiry monitoring
  • Response time degradation detection
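
As a rough illustration of what SSL certificate expiry monitoring checks, the sketch below computes the days left on a certificate from its notAfter timestamp using Python's standard `ssl` module. The function names and the 30-day renewal threshold are assumptions for the example and say nothing about how Nova implements the check.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining before a certificate's notAfter time.

    `not_after` uses the format found in the dict returned by
    SSLSocket.getpeercert(), for example "Jan  5 09:34:43 2018 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / SECONDS_PER_DAY


def needs_renewal(not_after: str, threshold_days: float = 30.0,
                  now: Optional[float] = None) -> bool:
    """True when the certificate expires within the threshold (or already has)."""
    return days_until_expiry(not_after, now) <= threshold_days


if __name__ == "__main__":
    stamp = "Jan  5 09:34:43 2018 GMT"
    ten_days_before = ssl.cert_time_to_seconds(stamp) - 10 * SECONDS_PER_DAY
    print(days_until_expiry(stamp, now=ten_days_before))
    print(needs_renewal(stamp, now=ten_days_before))
```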

Predictive Detection

ML models trained on your infrastructure patterns detect anomalies before they become incidents.

  • Pattern recognition from historical data
  • Capacity forecasting and trend extrapolation
  • Early warning system with confidence scores
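
Trend extrapolation for capacity forecasting can be pictured with a straight-line fit. The sketch below is a plain least-squares line over daily disk usage, offered as a simple stand-in and not as the ML models described above; the function names and the sample data are assumptions for the example.

```python
from typing import Optional


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(days: list[float], used_pct: list[float],
                    capacity_pct: float = 100.0) -> Optional[float]:
    """Days after the last sample until the fitted line reaches capacity.

    Returns None when usage is flat or shrinking, since the line never hits capacity.
    """
    slope, intercept = fit_line(days, used_pct)
    if slope <= 0:
        return None
    return (capacity_pct - intercept) / slope - days[-1]


if __name__ == "__main__":
    # Disk usage growing two percentage points per day from 50%.
    print(days_until_full([0, 1, 2, 3, 4], [50, 52, 54, 56, 58]))
```

A linear fit is only reasonable for steady growth; bursty or seasonal workloads need a richer model.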

Performance Trends

Track infrastructure performance over weeks and months. Spot degradation trends early.

  • Long-term trend analysis
  • Capacity planning with growth projections
  • Cost optimization recommendations

Platform

Enterprise-grade infrastructure underneath every feature: SOC 2 Type II compliance, HIPAA and PCI-DSS ready, SAML SSO with any IdP, SCIM provisioning, role-based access control down to the resource level, and full audit logging with 7-year retention for regulated industries. Multi-region active-active deployment, a 99.99% uptime SLA for enterprise tiers, private cloud options for customers who require data residency, and a public API that every UI action is built on. Anything you see in the Nova UI, you can automate.

Service Catalog

Real-time health, dependency mapping, and SLO compliance across every service.

  • Interactive dependency topology map
  • SLO tracking with error budget burn rates
  • Automatic service discovery
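
Error budget burn rate has a standard definition that is worth spelling out: the observed error rate divided by the error rate the SLO allows. A burn rate of 1.0 uses the budget exactly over the SLO window, and anything higher exhausts it early. The sketch below shows the arithmetic; the function names and request counts are assumptions for the example.

```python
def error_budget(slo_target: float) -> float:
    """Allowed failure fraction, e.g. roughly 0.001 for a 99.9% SLO."""
    return 1.0 - slo_target


def burn_rate(failed: int, total: int, slo_target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows."""
    if total == 0:
        return 0.0
    return (failed / total) / error_budget(slo_target)


def budget_remaining(failed: int, total: int, slo_target: float) -> float:
    """Fraction of the error budget left, given the requests seen so far.

    Goes negative once more requests have failed than the budget allows.
    """
    allowed_failures = error_budget(slo_target) * total
    return 1.0 - failed / allowed_failures


if __name__ == "__main__":
    # 5 failures in 10,000 requests against a 99.9% SLO.
    print(burn_rate(5, 10_000, 0.999))
    print(budget_remaining(5, 10_000, 0.999))
```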

Approval Queue

Enterprise-grade change management for AI-driven actions with full audit trails.

  • Role-based approval workflows
  • Risk scoring for every AI action
  • SOC 2 and ISO 27001 compliance

Nova Shell

AI-powered terminal that translates natural language into infrastructure commands.

  • Natural language to kubectl, SQL, AWS CLI
  • Safe mode with dry-run preview
  • Full command history with rollback

Quick Setup

Most teams are fully operational in under 5 minutes. One-line agent install, OAuth-based cloud account connection, and auto-discovery of your services, databases, and existing monitoring tools. Nova imports your existing Datadog dashboards, PagerDuty escalation policies, and Grafana alert rules automatically, so you don't have to rebuild anything. Run Nova alongside your existing stack during the evaluation period and deprecate tools at your own pace once you trust the results.

Install in Under 3 Minutes

One command installs the Nova AI Agent on any Linux or macOS server. The agent monitors CPU, memory, disk, network, processes, containers, and logs automatically.

  • Single command install on any Linux or macOS server
  • Automatic systemd service creation and startup on Linux
  • Instant connection to 100 AI agents
  • Supports Ubuntu, Debian, CentOS, RHEL, Amazon Linux, macOS