Problem
Design and implement a saga that orchestrates a multi-step order processing flow across multiple microservices. A traditional distributed transaction (two-phase commit) is impractical here because each service owns its own database and the services interact only through messages. A saga instead runs a sequence of local transactions, each paired with a compensating action, to keep data consistent across services.
The Order Processing Flow
- Create Order (Order Service): Create an order record with status "pending."
- Reserve Inventory (Inventory Service): Decrement available stock for each item.
- Process Payment (Payment Service): Charge the customer's payment method.
- Fulfill Order (Fulfillment Service): Create a shipment and schedule delivery.
- Send Confirmation (Notification Service): Send order confirmation email and push notification.
If any step fails, all previously completed steps must be compensated (rolled back).
Compensating Actions
- Cancel Order (compensates Create Order): Update order status to "cancelled."
- Unreserve Inventory (compensates Reserve Inventory): Add the reserved quantity back to available stock.
- Refund Payment (compensates Process Payment): Issue a refund for the charged amount.
- Cancel Fulfillment (compensates Fulfill Order): Cancel the shipment if it has not been dispatched.
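The pairing of forward steps and compensating actions can be written down as data. The following is a minimal sketch, not a prescribed design: the names `SagaStep`, `ORDER_SAGA`, and `compensation_plan` and the snake_case step identifiers are assumptions made for illustration. It only shows which compensations apply, and in what order, when a given step fails.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SagaStep:
    name: str                    # forward action (hypothetical identifier)
    service: str                 # service that owns the local transaction
    compensation: Optional[str]  # compensating action, or None if none is listed


# Forward steps in execution order, paired with the compensations listed above.
ORDER_SAGA = [
    SagaStep("create_order", "order", "cancel_order"),
    SagaStep("reserve_inventory", "inventory", "unreserve_inventory"),
    SagaStep("process_payment", "payment", "refund_payment"),
    SagaStep("fulfill_order", "fulfillment", "cancel_fulfillment"),
    # No compensation is listed for the confirmation. It is the last step,
    # so no later step can fail and force it to be undone.
    SagaStep("send_confirmation", "notification", None),
]


def compensation_plan(failed_step: str) -> List[str]:
    """Compensations for the steps completed before `failed_step`, newest first."""
    names = [step.name for step in ORDER_SAGA]
    completed = ORDER_SAGA[: names.index(failed_step)]
    return [s.compensation for s in reversed(completed) if s.compensation]
```

For example, `compensation_plan("process_payment")` returns `["unreserve_inventory", "cancel_order"]`, and `compensation_plan("create_order")` returns an empty list because nothing has completed yet.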
Requirements
- Orchestration: Implement an orchestrator that manages the saga lifecycle and tracks which steps have completed.
- Compensation: If step N fails, execute the compensating actions for steps N-1 down to 1, that is, in reverse order of completion.
- Idempotency: Both forward actions and compensating actions must be idempotent (safe to retry).
- Visibility: Provide a way to query the current state of any saga instance.
- Timeout Handling: If a step does not complete within its timeout, treat the step as failed and trigger compensation.
- Dead Letter: If compensation itself fails, escalate to a dead-letter queue for manual intervention.
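To make the requirements above concrete, here is a deliberately small in-memory sketch of an orchestrator. It is an illustration under stated assumptions rather than a complete design: steps are plain Python callables instead of queue messages, a timeout is modelled as the forward action raising `TimeoutError`, state lives in a dictionary rather than a database, and the dead-letter queue is a list. All class and attribute names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Tuple


class SagaStatus(Enum):
    RUNNING = "running"
    COMPLETED = "completed"
    COMPENSATED = "compensated"
    DEAD_LETTERED = "dead_lettered"


# A step is (name, forward action, compensating action); both take the saga id.
Step = Tuple[str, Callable[[str], None], Callable[[str], None]]


@dataclass
class Saga:
    saga_id: str
    status: SagaStatus = SagaStatus.RUNNING
    completed: List[str] = field(default_factory=list)  # steps that succeeded


class Orchestrator:
    def __init__(self, steps: List[Step]):
        self.steps = steps
        self.sagas: Dict[str, Saga] = {}                # in-memory only (assumption)
        self.dead_letters: List[Tuple[str, str]] = []   # stand-in for a real DLQ

    def run(self, saga_id: str) -> Saga:
        saga = Saga(saga_id)
        self.sagas[saga_id] = saga
        for name, forward, _ in self.steps:
            try:
                forward(saga_id)  # TimeoutError is handled like any other failure
            except Exception:
                self._compensate(saga)
                return saga
            saga.completed.append(name)
        saga.status = SagaStatus.COMPLETED
        return saga

    def _compensate(self, saga: Saga) -> None:
        undo = {name: comp for name, _, comp in self.steps}
        # Undo completed steps newest first (N-1 down to 1).
        for name in reversed(saga.completed):
            try:
                undo[name](saga.saga_id)
            except Exception:
                # Compensation failed: escalate and stop for manual intervention.
                self.dead_letters.append((saga.saga_id, name))
                saga.status = SagaStatus.DEAD_LETTERED
                return
        saga.status = SagaStatus.COMPENSATED

    def state(self, saga_id: str) -> Saga:
        """Visibility: return the current state of a saga instance."""
        return self.sagas[saga_id]
```

If the third of three steps raises, the two earlier compensations run in reverse order and `state(saga_id).status` ends as `SagaStatus.COMPENSATED`. If a compensation itself raises, the saga id and step name are appended to `dead_letters` and the status becomes `SagaStatus.DEAD_LETTERED`. The sketch does not retry anything, so it does not exercise the idempotency requirement.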
Constraints
- Services communicate via message queues (e.g., SQS, RabbitMQ).
- Each service has its own PostgreSQL database.
- The saga orchestrator must survive restarts (persist saga state).
- The system processes 200 orders per minute at peak.
- End-to-end order processing should complete within 30 seconds.
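One way to satisfy the restart constraint is to write the saga's state to a table before each command is sent, and to reload unfinished sagas on startup. The sketch below is an assumption-laden illustration: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without a server, and the table name `saga_state`, its columns, and the status strings `running` and `compensating` are all hypothetical.

```python
import json
import sqlite3

# Hypothetical schema: one row per saga; `completed` holds a JSON array of step names.
SCHEMA = """
CREATE TABLE IF NOT EXISTS saga_state (
    saga_id   TEXT PRIMARY KEY,
    status    TEXT NOT NULL,
    completed TEXT NOT NULL
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save(conn: sqlite3.Connection, saga_id: str, status: str, completed: list) -> None:
    """Upsert the saga row in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO saga_state (saga_id, status, completed) VALUES (?, ?, ?) "
            "ON CONFLICT(saga_id) DO UPDATE SET "
            "status = excluded.status, completed = excluded.completed",
            (saga_id, status, json.dumps(completed)),
        )


def unfinished(conn: sqlite3.Connection) -> list:
    """Sagas to resume after a restart: those not yet in a terminal state."""
    rows = conn.execute(
        "SELECT saga_id, status, completed FROM saga_state "
        "WHERE status IN ('running', 'compensating') ORDER BY saga_id"
    ).fetchall()
    return [(sid, status, json.loads(done)) for sid, status, done in rows]
```

On startup, the orchestrator would call `unfinished(conn)` and continue each saga from its last recorded step, which is safe only because the steps are required to be idempotent. As a rough sizing check, 200 orders per minute with a 30-second budget implies on the order of 100 sagas in flight at once (200 / 60 × 30), so the table of active sagas stays small at the stated peak.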
What to Design
- The saga orchestrator architecture and state machine
- The message format between orchestrator and services
- How saga state is persisted and recovered after crashes
- The compensation execution strategy
- Error handling and escalation paths
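As a starting point for the message format, the sketch below shows one possible command envelope and a service-side wrapper that makes handling idempotent. Everything here is an assumption for illustration: the field names, the `execute`/`compensate` action values, the idempotency key built from saga id, step, and action, and the in-memory reply cache (a real service would keep it in its own database).

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict


@dataclass(frozen=True)
class Command:
    saga_id: str
    step: str       # e.g. "reserve_inventory" (hypothetical identifier)
    action: str     # "execute" or "compensate"
    payload: dict
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    @property
    def idempotency_key(self) -> str:
        # Identical for every redelivery or retry of the same logical action.
        return f"{self.saga_id}:{self.step}:{self.action}"

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(raw: str) -> "Command":
        return Command(**json.loads(raw))


class IdempotentHandler:
    """Apply each logical action at most once; replay the stored reply on repeats."""

    def __init__(self, apply: Callable[[Command], dict]):
        self.apply = apply
        self.replies: Dict[str, dict] = {}  # in-memory stand-in for a dedupe table

    def handle(self, raw: str) -> dict:
        cmd = Command.from_json(raw)
        key = cmd.idempotency_key
        if key not in self.replies:
            result = self.apply(cmd)  # if this raises, nothing is recorded
            self.replies[key] = {
                "saga_id": cmd.saga_id,
                "step": cmd.step,
                "action": cmd.action,
                "status": "succeeded",
                "result": result,
            }
        return self.replies[key]
```

Delivering the same serialized command twice invokes `apply` once and returns the same reply both times, which is the behaviour a retrying orchestrator or an at-least-once queue relies on. Failure replies, timeouts, and the escalation path are left out of this sketch.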