arXiv 2507.13757 - Self-Healing Databases
Proposes a self-healing framework for databases using MAML for anomaly detection, GNNs for failure prediction, and multi-objective RL for autonomous recovery.
Summary
Traditional database operations rely on human intervention for anomaly detection and failure recovery, creating bottlenecks as system complexity grows. This paper proposes a fully autonomous self-healing framework that operates end-to-end without human involvement. Model-Agnostic Meta-Learning (MAML) enables rapid adaptation to unseen workload patterns with minimal labeled training data, addressing the cold-start problem in anomaly detection for novel workload types.
Graph Neural Networks model the dependency structure of database components, enabling the system to predict cascading failures before they fully materialize. By representing component relationships as a graph, the GNN can identify which components are at risk when an anomaly is detected in a related node. This dependency-aware approach outperforms approaches that treat components independently.
The recovery phase uses Multi-Objective Reinforcement Learning to balance competing objectives — minimize latency, maximize throughput, and control resource utilization — while selecting and executing recovery actions. This replaces brittle rule-based recovery scripts with a learned policy that generalizes across failure types and system states.
Key Arguments
- MAML enables rapid adaptation to unseen workload patterns with minimal labeled training data via few-shot learning
- GNNs model database component dependencies to predict cascading failures before full manifestation
- Multi-objective RL balances latency, throughput, and resource utilization for optimal recovery action selection
- Existing offline optimization approaches fail under dynamic, unpredictable production workloads
- The framework maintains database availability through autonomous action without human intervention
Concepts Covered
- Self-Healing Systems — primary treatment of ML-driven autonomous self-healing
- Observability — anomaly detection as an extension of observability infrastructure
- Resiliency Patterns — autonomous recovery as an evolution of manual resiliency patterns
- Reactive Architecture — responsive under failure conditions
Quality Notes
Cutting-edge research introducing ML-based self-healing as a new architectural concern for database infrastructure. Introduces the MAML+GNN+RL combination as a viable autonomous operations pattern. Relevant to the Self-Healing Systems concept page.