Lesson 8.3: Stateful Application Management

Introduction

Stateful applications require special handling: backups, restores, migrations, and data consistency. This lesson covers patterns for managing stateful applications in operators, including backup/restore, rolling updates, and ensuring data consistency.

Theory: Stateful Application Management

Stateful applications have persistent data that must be managed carefully.

Why Stateful Applications Are Complex

Data Persistence:

  • Data must survive pod restarts
  • Data must be backed up
  • Data must be restored
  • Data consistency is critical

Lifecycle Management:

  • Complex deployment procedures
  • Ordered pod creation/deletion
  • StatefulSet requirements
  • Rolling update challenges

Data Operations:

  • Backup and restore
  • Data migration
  • Version upgrades
  • Disaster recovery

StatefulSet Characteristics

Pod Identity:

  • Stable network identity
  • Stable storage
  • Ordered creation/deletion
  • Predictable naming

Storage:

  • Persistent volumes
  • Pod-specific storage
  • Data survives pod restarts
  • Storage class management

Ordering:

  • Pods created in order
  • Pods deleted in reverse order
  • Enables initialization
  • Supports stateful workloads

Backup and Restore

Backup Strategy:

  • Regular backups
  • Point-in-time backups
  • Incremental backups
  • Backup validation

Restore Strategy:

  • Restore from backup
  • Point-in-time restore
  • Data validation
  • Rollback capability

Consistency:

  • Ensure data consistency
  • Transactional operations
  • Quiesce before backup
  • Verify after restore

Understanding stateful applications helps you build operators that manage data reliably.

Stateful Application Challenges

Key Challenges

graph TB
    CHALLENGES[Challenges]
    
    CHALLENGES --> BACKUP[Backup]
    CHALLENGES --> RESTORE[Restore]
    CHALLENGES --> MIGRATION[Migration]
    CHALLENGES --> CONSISTENCY[Data Consistency]
    
    BACKUP --> SCHEDULE[Scheduled Backups]
    RESTORE --> POINT[Point-in-Time Restore]
    MIGRATION --> ZERO[Zero-Downtime]
    CONSISTENCY --> TRANSACTIONS[Transactions]
    
    style CHALLENGES fill:#FFB6C1

Backup and Restore Patterns

Backup Flow

sequenceDiagram
    participant Operator
    participant Database
    participant Backup as Backup System
    participant Storage as Storage
    
    Operator->>Database: Trigger Backup
    Database->>Database: Create Snapshot
    Database->>Backup: Export Data
    Backup->>Storage: Store Backup
    Storage-->>Backup: Backup Stored
    Backup-->>Operator: Backup Complete
    Operator->>Operator: Update Status
    
    Note over Operator: Backup scheduled<br/>or on-demand

Implementing Backups

type BackupSpec struct {
    DatabaseRef corev1.LocalObjectReference `json:"databaseRef"`
    Schedule    string                      `json:"schedule,omitempty"` // Cron format
    Retention   int                         `json:"retention,omitempty"` // Days
}

type BackupStatus struct {
    Phase          string    `json:"phase,omitempty"`
    BackupTime     time.Time `json:"backupTime,omitempty"`
    BackupLocation string    `json:"backupLocation,omitempty"`
    Size           string    `json:"size,omitempty"`
}

func (r *BackupReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    backup := &backupv1.Backup{}
    if err := r.Get(ctx, req.NamespacedName, backup); err != nil {
        return ctrl.Result{}, err
    }
    
    // Get Database
    db := &databasev1.Database{}
    err := r.Get(ctx, client.ObjectKey{
        Name:      backup.Spec.DatabaseRef.Name,
        Namespace: backup.Namespace,
    }, db)
    
    if err != nil {
        return ctrl.Result{}, err
    }
    
    // Perform backup
    if err := r.performBackup(ctx, db, backup); err != nil {
        backup.Status.Phase = "Failed"
        r.Status().Update(ctx, backup)
        return ctrl.Result{}, err
    }
    
    backup.Status.Phase = "Completed"
    backup.Status.BackupTime = metav1.Now()
    backup.Status.BackupLocation = r.getBackupLocation(backup)
    
    return ctrl.Result{}, r.Status().Update(ctx, backup)
}

Restore Patterns

Restore Flow

sequenceDiagram
    participant User
    participant Operator
    participant Database
    participant Backup as Backup System
    
    User->>Operator: Request Restore
    Operator->>Backup: Get Backup
    Backup-->>Operator: Backup Data
    Operator->>Database: Stop Database
    Operator->>Database: Restore Data
    Database-->>Operator: Restore Complete
    Operator->>Database: Start Database
    Operator->>Operator: Update Status
    
    Note over Operator: Point-in-time<br/>or latest backup

Implementing Restores

type RestoreSpec struct {
    BackupRef corev1.LocalObjectReference `json:"backupRef"`
    DatabaseRef corev1.LocalObjectReference `json:"databaseRef"`
    PointInTime *time.Time `json:"pointInTime,omitempty"`
}

func (r *RestoreReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    restore := &restorev1.Restore{}
    if err := r.Get(ctx, req.NamespacedName, restore); err != nil {
        return ctrl.Result{}, err
    }
    
    // Get Backup
    backup := &backupv1.Backup{}
    err := r.Get(ctx, client.ObjectKey{
        Name:      restore.Spec.BackupRef.Name,
        Namespace: restore.Namespace,
    }, backup)
    
    if err != nil {
        return ctrl.Result{}, err
    }
    
    // Get Database
    db := &databasev1.Database{}
    err = r.Get(ctx, client.ObjectKey{
        Name:      restore.Spec.DatabaseRef.Name,
        Namespace: restore.Namespace,
    }, db)
    
    // Perform restore
    if err := r.performRestore(ctx, db, backup, restore); err != nil {
        restore.Status.Phase = "Failed"
        r.Status().Update(ctx, restore)
        return ctrl.Result{}, err
    }
    
    restore.Status.Phase = "Completed"
    restore.Status.RestoreTime = metav1.Now()
    
    return ctrl.Result{}, r.Status().Update(ctx, restore)
}

Rolling Updates

Rolling Update Strategy

graph TB
    UPDATE[Rolling Update]
    
    UPDATE --> STEP1[Step 1: Update Pod 1]
    STEP1 --> WAIT1[Wait for Ready]
    WAIT1 --> STEP2[Step 2: Update Pod 2]
    STEP2 --> WAIT2[Wait for Ready]
    WAIT2 --> STEP3[Step 3: Update Pod 3]
    STEP3 --> COMPLETE[Complete]
    
    style UPDATE fill:#90EE90

Managing Rolling Updates

func (r *DatabaseReconciler) updateStatefulSet(ctx context.Context, db *databasev1.Database) error {
    statefulSet := &appsv1.StatefulSet{}
    err := r.Get(ctx, client.ObjectKey{
        Name:      db.Name,
        Namespace: db.Namespace,
    }, statefulSet)
    
    if err != nil {
        return err
    }
    
    // Check if update needed
    desiredImage := db.Spec.Image
    currentImage := statefulSet.Spec.Template.Spec.Containers[0].Image
    
    if desiredImage != currentImage {
        // Update image
        statefulSet.Spec.Template.Spec.Containers[0].Image = desiredImage
        
        // StatefulSet will perform rolling update automatically
        if err := r.Update(ctx, statefulSet); err != nil {
            return err
        }
        
        // Wait for update to complete
        return r.waitForRollingUpdate(ctx, statefulSet)
    }
    
    return nil
}

func (r *DatabaseReconciler) waitForRollingUpdate(ctx context.Context, ss *appsv1.StatefulSet) error {
    // Wait for all pods to be updated
    return wait.PollImmediate(5*time.Second, 5*time.Minute, func() (bool, error) {
        err := r.Get(ctx, client.ObjectKeyFromObject(ss), ss)
        if err != nil {
            return false, err
        }
        
        // Check if update complete
        return ss.Status.UpdatedReplicas == *ss.Spec.Replicas, nil
    })
}

Data Consistency

Consistency Guarantees

graph TB
    CONSISTENCY[Consistency]
    
    CONSISTENCY --> STRONG[Strong Consistency]
    CONSISTENCY --> EVENTUAL[Eventual Consistency]
    
    STRONG --> TRANSACTIONS[Transactions]
    STRONG --> LOCKING[Locking]
    
    EVENTUAL --> REPLICATION[Replication]
    EVENTUAL --> CONFLICT[Conflict Resolution]
    
    style STRONG fill:#90EE90
    style EVENTUAL fill:#FFE4B5

Ensuring Consistency

func (r *DatabaseReconciler) ensureDataConsistency(ctx context.Context, db *databasev1.Database) error {
    // For StatefulSets, consistency is handled by:
    // 1. Ordered pod creation
    // 2. Persistent volumes
    // 3. Pod identity
    
    // Check if all replicas are in sync
    statefulSet := &appsv1.StatefulSet{}
    err := r.Get(ctx, client.ObjectKey{
        Name:      db.Name,
        Namespace: db.Namespace,
    }, statefulSet)
    
    if err != nil {
        return err
    }
    
    // Verify all replicas are ready and consistent
    if statefulSet.Status.ReadyReplicas != *statefulSet.Spec.Replicas {
        return fmt.Errorf("not all replicas ready")
    }
    
    // Perform consistency check
    return r.performConsistencyCheck(ctx, db)
}

Key Takeaways

  • Backups protect data from loss
  • Restores recover from backups
  • Rolling updates update without downtime
  • Data consistency ensures correctness
  • StatefulSets provide ordered, stable pods
  • Persistent volumes maintain data
  • Point-in-time restore recovers to specific time

Understanding for Building Operators

When managing stateful applications:

  • Implement backup functionality
  • Support restore operations
  • Handle rolling updates carefully
  • Ensure data consistency
  • Use StatefulSets for stateful workloads
  • Leverage persistent volumes
  • Test backup/restore scenarios

References

Official Documentation

Further Reading

  • Kubernetes: Up and Running by Kelsey Hightower, Brendan Burns, and Joe Beda - Chapter 7: StatefulSets
  • Kubernetes Operators by Jason Dobies and Joshua Wood - Chapter 17: Stateful Applications
  • StatefulSet Patterns

Next Steps

Now that you understand stateful applications, let’s learn about real-world patterns and best practices.