Lab 7.4: Optimizing Performance

Related Lesson: Lesson 7.4: Performance and Scalability

Objectives

  • Implement rate limiting
  • Add caching strategies
  • Optimize reconciliation
  • Profile and optimize performance

Prerequisites

  • Completion of Lab 7.3
  • Operator with HA setup
  • Understanding of performance concepts

Exercise 1: Configure Controller Rate Limiting

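Controller-runtime (used by kubebuilder) rate-limits requeues through its workqueue. By default it uses client-go’s DefaultTypedControllerRateLimiter, which applies the longer of two delays: a per-item exponential backoff (5 ms base, 1000 s cap) and an overall token bucket (10 requests per second, burst 100). Let’s configure concurrency and the rate limiter explicitly.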

Task 1.1: Configure MaxConcurrentReconciles

Update your controller’s SetupWithManager in internal/controller/database_controller.go:

import (
    "time"

    "k8s.io/client-go/util/workqueue"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/controller"
    "sigs.k8s.io/controller-runtime/pkg/reconcile"
    // ... keep your existing imports (appsv1, corev1, databasev1, etc.)
)

func (r *DatabaseReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&databasev1.Database{}).
        Owns(&appsv1.StatefulSet{}).
        Owns(&corev1.Service{}).
        Owns(&corev1.Secret{}).
        WithOptions(controller.Options{
            // At most 2 Database requests are reconciled in parallel (default: 1).
            // The workqueue never hands the same object to two workers at once.
            MaxConcurrentReconciles: 2,
            // Per-item exponential backoff for failed reconciles and Requeue: true
            // (typed API, controller-runtime v0.19+). Unlike the default limiter,
            // this one has no overall 10 qps token bucket.
            RateLimiter: workqueue.NewTypedItemExponentialFailureRateLimiter[reconcile.Request](
                5*time.Millisecond, // base delay after the first failure
                1000*time.Second,   // maximum delay (about 16.7 minutes)
            ),
        }).
        Complete(r)
}

Task 1.2: Add Rate Limiting for External API Calls (Optional)

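If your operator calls external APIs, add a client-side rate limiter so that all concurrent reconciles together stay under the API’s quota. The golang.org/x/time/rate package provides a token-bucket limiter: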

import (
    "golang.org/x/time/rate" // needed in both database_controller.go and cmd/main.go
)

type DatabaseReconciler struct {
    client.Client
    Scheme     *runtime.Scheme
    APILimiter *rate.Limiter // shared by all workers, so it caps the total call rate
}

// In cmd/main.go when creating the reconciler:
if err = (&controller.DatabaseReconciler{
    Client:     mgr.GetClient(),
    Scheme:     mgr.GetScheme(),
    APILimiter: rate.NewLimiter(rate.Limit(10), 1), // 10 req/sec, burst of 1
}).SetupWithManager(mgr); err != nil {
    setupLog.Error(err, "unable to create controller", "controller", "Database")
    os.Exit(1)
}

// In Reconcile, wait for a token before making external API calls:
func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // Wait blocks until a token is available. It returns an error if ctx is
    // cancelled or its deadline would pass before a token becomes available.
    if err := r.APILimiter.Wait(ctx); err != nil {
        return ctrl.Result{}, err
    }
    // ... reconciliation with external API calls ...
    return ctrl.Result{}, nil
}

Exercise 2: Add Field Indexing for Fast Lookups

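The manager’s client reads from an informer-backed cache, so Get and List calls are served locally instead of hitting the API server. You can register custom field indexes on that cache so that a List filtered by a field becomes an index lookup instead of a scan over every cached object.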

Task 2.1: Create Field Indexer

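Add indexing in cmd/main.go before the manager is started. This example assumes your DatabaseSpec has an Environment string field; if it does not, add one (and run make manifests generate) or index a field your spec already has.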

// In cmd/main.go, after creating manager but before SetupWithManager

// Index databases by environment for fast filtering
if err := mgr.GetFieldIndexer().IndexField(
    context.Background(),
    &databasev1.Database{},
    "spec.environment",
    func(obj client.Object) []string {
        db := obj.(*databasev1.Database)
        if db.Spec.Environment == "" {
            return nil
        }
        return []string{db.Spec.Environment}
    },
); err != nil {
    setupLog.Error(err, "unable to create field index")
    os.Exit(1)
}
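Conceptually, IndexField keeps a map from each extracted value to the objects that produced it, and a MatchingFields query becomes a lookup in that map. The sketch below models this with plain Go types; database, fieldIndex, and buildIndex are hypothetical stand-ins, not controller-runtime's implementation.

```go
package main

import "sort"

// database is a hypothetical stand-in for the Database custom resource.
type database struct {
	Name        string
	Environment string
}

// fieldIndex maps an extracted field value to the names of matching objects.
type fieldIndex map[string][]string

// buildIndex runs the extractor over every object, following the IndexField
// contract: an extractor may return several values, or nil to leave the
// object out of the index entirely.
func buildIndex(objs []database, extract func(database) []string) fieldIndex {
	idx := fieldIndex{}
	for _, o := range objs {
		for _, v := range extract(o) {
			idx[v] = append(idx[v], o.Name)
		}
	}
	for v := range idx {
		sort.Strings(idx[v]) // deterministic order for lookups
	}
	return idx
}
```

Returning nil from the extractor, as the environment indexer above does for an empty Environment, keeps the object out of the index, so it never matches a MatchingFields query on that field.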

Task 2.2: Use Indexes in Controller

// In your reconciler, use MatchingFields for indexed queries.
// The field name must match the one registered with IndexField; otherwise
// the cached List returns an error because no such index exists.
func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // Fast lookup using the index (across all namespaces)
    prodDatabases := &databasev1.DatabaseList{}
    if err := r.List(ctx, prodDatabases, client.MatchingFields{
        "spec.environment": "production",
    }); err != nil {
        return ctrl.Result{}, err
    }

    // Restrict a query to the request's namespace
    nsDatabases := &databasev1.DatabaseList{}
    if err := r.List(ctx, nsDatabases, client.InNamespace(req.Namespace)); err != nil {
        return ctrl.Result{}, err
    }

    // ... rest of reconciliation
    return ctrl.Result{}, nil
}
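InNamespace and MatchingFields can also be passed to the same List call. As far as we understand controller-runtime's cache, it stores each indexed value under a namespaced key as well as a cluster-wide key, so both forms stay a single lookup; treat the exact key format below as an assumption. object, buildNamespacedIndex, and lookup are hypothetical names.

```go
package main

import "sort"

// allNamespaces is the (assumed) key prefix used for cluster-wide lookups.
const allNamespaces = "__all_namespaces"

// object is a hypothetical stand-in for a namespaced Database.
type object struct {
	Namespace, Name, Environment string
}

// buildNamespacedIndex stores each value twice: under "<namespace>/<value>"
// and under "__all_namespaces/<value>".
func buildNamespacedIndex(objs []object) map[string][]string {
	idx := map[string][]string{}
	for _, o := range objs {
		if o.Environment == "" {
			continue // not indexed, like an extractor returning nil
		}
		ref := o.Namespace + "/" + o.Name
		nsKey := o.Namespace + "/" + o.Environment
		allKey := allNamespaces + "/" + o.Environment
		idx[nsKey] = append(idx[nsKey], ref)
		idx[allKey] = append(idx[allKey], ref)
	}
	for k := range idx {
		sort.Strings(idx[k])
	}
	return idx
}

// lookup answers InNamespace(ns) + MatchingFields{field: value};
// pass ns == "" for a cluster-wide query.
func lookup(idx map[string][]string, ns, value string) []string {
	if ns == "" {
		ns = allNamespaces
	}
	return idx[ns+"/"+value]
}
```

In the operator itself, the combined query would be r.List(ctx, list, client.InNamespace(req.Namespace), client.MatchingFields{"spec.environment": "production"}).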

Exercise 3: Optimize Reconciliation

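Task 3.1: Batch Operations

When a reconcile pass already holds a list of Databases (for example, from an indexed List), you can group them by the work they need and process each group in turn. The Kubernetes API has no bulk create or update, so each object still costs its own API calls; the version below collects errors instead of returning on the first one, so a single broken Database does not block the rest. It assumes an existing reconcileDatabase(ctx, *databasev1.Database) error helper and the "errors" and "fmt" imports.

func (r *DatabaseReconciler) reconcileBatch(ctx context.Context, databases []databasev1.Database) error {
    // Group by operation: an empty phase means the Database was never reconciled
    var toCreate, toUpdate []*databasev1.Database
    for i := range databases {
        db := &databases[i] // point at the slice element rather than a copy
        if db.Status.Phase == "" {
            toCreate = append(toCreate, db)
        } else {
            toUpdate = append(toUpdate, db)
        }
    }

    var errs []error

    // Creates first
    for _, db := range toCreate {
        if err := r.reconcileDatabase(ctx, db); err != nil {
            errs = append(errs, fmt.Errorf("create %s/%s: %w", db.Namespace, db.Name, err))
        }
    }

    // Then updates
    for _, db := range toUpdate {
        if err := r.reconcileDatabase(ctx, db); err != nil {
            errs = append(errs, fmt.Errorf("update %s/%s: %w", db.Namespace, db.Name, err))
        }
    }

    // errors.Join (Go 1.20+) returns nil when errs is empty
    return errors.Join(errs...)
}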

Exercise 4: Monitor Performance with Built-in Metrics

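Controller-runtime automatically exposes built-in metrics (reconcile counts and latencies, workqueue statistics, client-go request metrics, and Go runtime stats) on the manager’s metrics endpoint. Your postgres-operator also registers custom metrics of its own.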

Task 4.1: Review Existing Metrics Code

Your operator already has metrics in internal/controller/metrics.go:

cd ~/postgres-operator

# Review the metrics file
cat internal/controller/metrics.go

You should see these custom metrics already defined:

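package controller

import (
    "github.com/prometheus/client_golang/prometheus"
    "sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (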
    // ReconcileTotal counts the total number of reconciliations
    ReconcileTotal = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "database_reconcile_total",
            Help: "Total number of reconciliations per controller",
        },
        []string{"result"}, // success, error, requeue
    )

    // ReconcileDuration measures the duration of reconciliations
    ReconcileDuration = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "database_reconcile_duration_seconds",
            Help:    "Duration of reconciliations in seconds",
            Buckets: prometheus.DefBuckets,
        },
        []string{"result"},
    )

    // DatabasesTotal tracks the current number of Database resources
    DatabasesTotal = prometheus.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "database_resources_total",
            Help: "Current number of Database resources by phase",
        },
        []string{"phase"},
    )

    // DatabaseInfo provides information about each database
    DatabaseInfo = prometheus.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "database_info",
            Help: "Information about Database resources",
        },
        []string{"name", "namespace", "image", "phase"},
    )
)

func init() {
    // Register custom metrics with the global registry
    metrics.Registry.MustRegister(
        ReconcileTotal,
        ReconcileDuration,
        DatabasesTotal,
        DatabaseInfo,
    )
}

Task 4.2: Review Metrics Usage in Controller

Check how metrics are used in the Reconcile function:

# See how metrics are recorded in the reconcile loop
grep -A 10 "Defer metrics" internal/controller/database_controller.go

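You should see something like the following (start and reconcileResult are set earlier in Reconcile; the deferred closure reads reconcileResult’s final value when Reconcile returns):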

// Defer metrics recording
defer func() {
    duration := time.Since(start).Seconds()
    ReconcileDuration.WithLabelValues(reconcileResult).Observe(duration)
    ReconcileTotal.WithLabelValues(reconcileResult).Inc()
}()

And the database info metric being set:

DatabaseInfo.WithLabelValues(
    db.Name,
    db.Namespace,
    db.Spec.Image,
    db.Status.Phase,
).Set(1)
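The deferred closure works because it captures reconcileResult by reference: whatever value the variable holds when Reconcile returns is the label that gets recorded, including on early error returns. Below is a self-contained sketch of the pattern, with a hypothetical recorder type standing in for the two Prometheus vectors and a hypothetical reconcile function standing in for Reconcile.

```go
package main

import (
	"errors"
	"time"
)

// recorder is a hypothetical stand-in for ReconcileTotal / ReconcileDuration.
type recorder struct {
	total     map[string]int
	durations map[string][]time.Duration
}

func newRecorder() *recorder {
	return &recorder{total: map[string]int{}, durations: map[string][]time.Duration{}}
}

// reconcile shows the pattern: set the label as the outcome becomes known,
// and let a single deferred closure record it on every return path.
func reconcile(rec *recorder, fail, requeue bool) error {
	start := time.Now()
	reconcileResult := "success"
	defer func() {
		rec.durations[reconcileResult] = append(rec.durations[reconcileResult], time.Since(start))
		rec.total[reconcileResult]++
	}()

	if fail {
		reconcileResult = "error"
		return errors.New("simulated failure")
	}
	if requeue {
		reconcileResult = "requeue"
	}
	return nil
}
```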

Task 4.3: Access Metrics Endpoint

The metrics endpoint requires authentication with a bearer token. We’ll use a ServiceAccount token to authenticate.

# For Docker: build and deploy the operator with network policies enabled
make docker-build IMG=postgres-operator:latest
kind load docker-image postgres-operator:latest --name k8s-operators-course
make deploy IMG=postgres-operator:latest

# For Podman: build and deploy the operator (use the localhost/ prefix to match the loaded image)
make docker-build IMG=postgres-operator:latest CONTAINER_TOOL=podman
podman save localhost/postgres-operator:latest -o /tmp/postgres-operator.tar
kind load image-archive /tmp/postgres-operator.tar --name k8s-operators-course
rm /tmp/postgres-operator.tar
make deploy IMG=localhost/postgres-operator:latest

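# Restart the operator if already deployed, then wait for the new pod to be ready
kubectl rollout restart deploy -n postgres-operator-system postgres-operator-controller-manager
kubectl rollout status deploy -n postgres-operator-system postgres-operator-controller-manager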

# Port forward to metrics endpoint (using HTTPS on port 8443)
kubectl port-forward -n postgres-operator-system \
  svc/postgres-operator-controller-manager-metrics-service 8443:8443 &

# Get a token for authentication (use the controller-manager service account)
TOKEN=$(kubectl create token postgres-operator-controller-manager -n postgres-operator-system)

# Create a Database to generate some metrics
kubectl apply -f - <<EOF
apiVersion: database.example.com/v1
kind: Database
metadata:
  name: valid-db
spec:
  image: postgres:14
  replicas: 1
  databaseName: mydb
  username: admin
  storage:
    size: 10Gi
EOF

# View custom database metrics (using -k for self-signed cert, -H for auth header)
curl -sk -H "Authorization: Bearer $TOKEN" https://localhost:8443/metrics | grep database_

# View reconciliation metrics
curl -sk -H "Authorization: Bearer $TOKEN" https://localhost:8443/metrics | grep database_reconcile

# View controller-runtime built-in metrics
curl -sk -H "Authorization: Bearer $TOKEN" https://localhost:8443/metrics | grep controller_runtime

# View all metrics
curl -sk -H "Authorization: Bearer $TOKEN" https://localhost:8443/metrics | head -100

# Stop port-forward
pkill -f "port-forward.*8443"

Note: The metrics endpoint uses Kubernetes RBAC for authorization. The ServiceAccount must have the metrics-reader ClusterRole (configured in Lab 7.2).

Task 4.4: View Metrics in Prometheus

If you have Prometheus set up (from Lab 7.2), view metrics there:

# Port forward to Prometheus
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090 &

# Open http://localhost:9090 and query:
# - database_reconcile_total
# - database_reconcile_duration_seconds
# - database_resources_total
# - database_info

# Stop port-forward when done
pkill -f "port-forward.*9090"

Example Prometheus queries:

  • database_reconcile_total: total reconciliations by result
  • rate(database_reconcile_total[5m]): reconciliations per second, averaged over 5 minutes
  • database_reconcile_duration_seconds_bucket: reconciliation latency histogram buckets
  • histogram_quantile(0.99, rate(database_reconcile_duration_seconds_bucket[5m])): p99 latency (one value per result label)
  • database_resources_total: current databases by phase
  • database_info: information about each database

Exercise 5: Load Testing

Task 5.1: Create Many Resources

# Create 50 databases for load testing (each gets its own StatefulSet; lower the count if your kind cluster runs short of resources)
for i in {1..50}; do
  kubectl apply -f - <<EOF
apiVersion: database.example.com/v1
kind: Database
metadata:
  name: test-db-$i
  namespace: default
spec:
  image: postgres:14
  replicas: 1
  databaseName: db$i
  username: admin
  storage:
    size: 1Gi
EOF
done

echo "Created 50 test databases"

Task 5.2: Monitor Performance Under Load

Note: kubectl top requires metrics-server to be installed. The course setup script (scripts/setup-kind-cluster.sh) installs it automatically. If you get “Metrics API not available”, install it manually:

# Install metrics-server (if not already installed)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Patch for kind (disable TLS verification for kubelet)
kubectl patch deployment metrics-server -n kube-system --type='json' -p='[
  {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"},
  {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-preferred-address-types=InternalIP"}
]'

# Wait for it to be ready
kubectl rollout status deployment/metrics-server -n kube-system

Now monitor performance:

# Watch operator resource usage (requires metrics-server)
watch kubectl top pods -n postgres-operator-system -l control-plane=controller-manager

# Alternative if metrics-server is not available: show configured requests/limits (not actual usage)
kubectl get pods -n postgres-operator-system -l control-plane=controller-manager -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources}{"\n"}{end}'

# In another terminal, watch reconciliation metrics
# Port forward to metrics endpoint (using HTTPS on port 8443)
kubectl port-forward -n postgres-operator-system \
  svc/postgres-operator-controller-manager-metrics-service 8443:8443 &

# Get a token for authentication (use the controller-manager service account)
TOKEN=$(kubectl create token postgres-operator-controller-manager -n postgres-operator-system)

# Print the reconcile counters every 5 seconds (press Ctrl+C to stop the loop)
while true; do
  curl -sk -H "Authorization: Bearer $TOKEN" https://localhost:8443/metrics 2>/dev/null | grep database_reconcile_total
  sleep 5
done

# Check workqueue metrics (workqueue_depth is the number of requests waiting to be reconciled)
curl -sk -H "Authorization: Bearer $TOKEN" https://localhost:8443/metrics | grep workqueue

# Check controller logs for reconciliation activity
kubectl logs -n postgres-operator-system -l control-plane=controller-manager --tail=20 -f

Task 5.3: Verify All Resources Are Reconciled

# Check status of all databases
kubectl get databases -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,READY:.status.ready

# Count databases in each phase
kubectl get databases -o jsonpath='{range .items[*]}{.status.phase}{"\n"}{end}' | sort | uniq -c
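The jsonpath | sort | uniq -c pipeline groups Databases by status.phase; an empty line stands for resources the operator has not reconciled yet. The same summary in Go, with hypothetical names and example phase values (use whatever phases your operator actually sets):

```go
package main

import "sort"

// phaseCount pairs a phase with the number of Databases in it.
type phaseCount struct {
	Phase string
	Count int
}

// summarizePhases counts Databases per status.phase, like `sort | uniq -c`,
// reporting an empty phase (not yet reconciled) as "<none>".
func summarizePhases(phases []string) []phaseCount {
	counts := map[string]int{}
	for _, p := range phases {
		if p == "" {
			p = "<none>"
		}
		counts[p]++
	}
	out := make([]phaseCount, 0, len(counts))
	for p, c := range counts {
		out = append(out, phaseCount{p, c})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Phase < out[j].Phase })
	return out
}
```

When every Database has been reconciled, the "<none>" entry disappears from the summary.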

Cleanup

# Delete test resources
kubectl delete databases --all

# Undeploy operator
make undeploy

Lab Summary

In this lab, you:

  • Configured controller-runtime rate limiting
  • Added field indexing for fast lookups
  • Optimized reconciliation with MaxConcurrentReconciles
  • Reviewed the operator’s custom performance metrics
  • Load tested the operator with many resources

Key Learnings

  1. Controller-runtime has built-in rate limiting via the RateLimiter option
  2. MaxConcurrentReconciles controls parallelism
  3. Field indexes enable fast filtered queries
  4. Built-in and custom metrics are served over HTTPS at :8443/metrics and require an authorized bearer token
  5. Custom metrics use the Prometheus client library and are registered with controller-runtime’s metrics.Registry
  6. Load testing validates operator performance at scale
  7. client.MatchingFields{} leverages indexes for fast lookups

Solutions

Complete working solutions for this lab are available in the solutions directory:

Congratulations!

You’ve completed Module 7! You now understand:

  • Packaging and distribution
  • RBAC and security
  • High availability
  • Performance optimization

In Module 8, you’ll learn about advanced topics and real-world patterns!