📱 Research: Flutter Integration Testing Evaluation
Issue: #533
Focus: Maestro vs. Marionette MCP (LeanCode)
Status: ✅ Completed
Target Apps: KROW Client App & KROW Staff App
1. Executive Summary & Recommendation
Following a technical spike implementing full authentication flows (Login/Signup) for both KROW platforms, Maestro is the recommended integration testing framework.
While Marionette MCP offers an innovative LLM-driven approach for exploratory debugging, it lacks the determinism required for a production-grade CI/CD pipeline. Maestro provides the stability, speed, and native OS interaction necessary to gate our releases effectively.
Why Maestro Wins for KROW:
- Zero-Flake Execution: Built-in wait logic handles Firebase Auth latency without hard-coded `sleep()` calls.
- Platform Parity: Single `.yaml` definitions drive both iOS and Android build variants.
- Non-Invasive: Maestro tests the compiled `.apk` or `.app` (black-box), ensuring we test exactly what the user sees.
- System Level Access: Handles native OS permission dialogs (Camera/Location/Notifications), which Marionette cannot "see."
2. Technical Evaluation Matrix
| Criteria | Maestro | Marionette MCP | Winner |
|---|---|---|---|
| Test Authoring | High Speed: Declarative YAML; Maestro Studio recorder. | Variable: Requires precise Prompt Engineering. | Maestro |
| Execution Latency | Low: Instantaneous interaction (~5s flows). | High: LLM API roundtrips (~45s+ flows). | Maestro |
| Environment | Works on Release/Production builds. | Restricted to Debug/Profile modes. | Maestro |
| CI/CD Readiness | Native CLI; easy GitHub Actions integration. | High overhead; depends on external AI APIs. | Maestro |
| Context Awareness | Interacts with Native OS & Bottom Sheets. | Limited to the Flutter Widget Tree. | Maestro |
3. Spike Analysis & Findings
Tool A: Maestro (The Standard)
We verified the login.yaml and signup.yaml flows across both apps. Maestro successfully abstracted the asynchronous nature of our Data Connect and Firebase backends.
- Pros:
  - Semantics Driven: By targeting `Semantics(identifier: '...')` in our `/design_system/`, tests remain stable even if the UI text changes for localization.
  - Automatic Tolerance: It detects spinning loaders and waits for destination widgets automatically.
- Cons:
  - Requires strict adherence to adding `Semantics` wrappers on all interactive components.
Tool B: Marionette MCP (The Experiment)
We spiked this using the marionette_flutter binding and executing via Cursor/Claude.
- Pros:
  - Phenomenal for visual "smoke testing" and live-debugging UI issues via natural language.
- Cons:
  - Non-Deterministic: Prone to "hallucinations" during heavy network traffic.
  - Architecture Blocker: Requires the Dart VM Service to be active, making it impossible to test against hardened production builds.
4. Implementation & Migration Blueprint
Phase 1: Semantics Enforcement
We must enforce a linting rule or PR checklist: All interactive widgets in @krow/design_system must include a unique identifier.
    // Standardized Implementation
    Semantics(
      identifier: 'login_submit_button',
      child: KrowPrimaryButton(
        onPressed: _handleLogin,
        label: 'Sign In',
      ),
    )
Phase 2: Repository Structure (Implemented)
Maestro flows are co-located with each app under auth/:
- `apps/mobile/apps/client/maestro/auth/sign_in.yaml`: Client sign-in
- `apps/mobile/apps/client/maestro/auth/sign_up.yaml`: Client sign-up
- `apps/mobile/apps/staff/maestro/auth/sign_in.yaml`: Staff sign-in (phone + OTP)
- `apps/mobile/apps/staff/maestro/auth/sign_up.yaml`: Staff sign-up (phone + OTP)
Credentials are injected via environment variables (never hardcoded). Use `make test-e2e` to run the suite.
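To illustrate how such a flow can combine `Semantics` identifiers with injected credentials, here is a minimal sketch of a client sign-in flow. It is not the committed file: the `appId`, the assumption that the client app signs in with email and password, the variable names `TEST_EMAIL` and `TEST_PASSWORD`, and every identifier except `login_submit_button` are hypothetical.

```yaml
# Illustrative sketch of a client sign-in flow (not the committed sign_in.yaml).
# Hypothetical: appId, the email/password fields, TEST_EMAIL/TEST_PASSWORD,
# and every id except login_submit_button.
appId: com.example.krow.client
---
- launchApp:
    clearState: true          # start from a signed-out state on every run
- tapOn:
    id: "login_email_field"   # matched via Semantics(identifier: ...), not visible text
- inputText: ${TEST_EMAIL}    # supplied at run time, never stored in the flow
- tapOn:
    id: "login_password_field"
- inputText: ${TEST_PASSWORD}
- tapOn:
    id: "login_submit_button"
# Wait up to 15 s for the post-login screen instead of sleeping for a fixed time.
- extendedWaitUntil:
    visible:
      id: "home_screen"
    timeout: 15000
```

With this shape, the values would be passed on the command line, for example `maestro test -e TEST_EMAIL=... -e TEST_PASSWORD=... apps/mobile/apps/client/maestro/auth/sign_in.yaml`. Selecting by `id` rather than by label text is what keeps the flow stable when strings are localized.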
Phase 3: CI/CD Integration
The Maestro CLI will be added to our GitHub Actions workflow to automate quality gates.
- Trigger: Every PR targeting `main` or `develop`.
- Action: Generate a build, execute `maestro test`, and block merge on failure.
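One possible shape for this gate is sketched below as a GitHub Actions workflow for the Android client build. Treat it as a starting point under stated assumptions rather than the final pipeline: the workflow file name, job name, emulator API level, secret names (`E2E_EMAIL`, `E2E_PASSWORD`), and variable names are hypothetical, and it assumes the client app builds with a plain `flutter build apk` from its own directory.

```yaml
# .github/workflows/e2e.yml (hypothetical file name)
name: e2e
on:
  pull_request:
    branches: [main, develop]

jobs:
  maestro-android:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: subosito/flutter-action@v2
        with:
          channel: stable
      - name: Install Maestro CLI
        run: |
          curl -fsSL "https://get.maestro.mobile.dev" | bash
          echo "$HOME/.maestro/bin" >> "$GITHUB_PATH"
      - name: Build client APK
        working-directory: apps/mobile/apps/client
        run: flutter build apk --release
      - name: Run sign-in flow on an emulator
        uses: reactivecircus/android-emulator-runner@v2
        env:
          # Hypothetical secret names; credentials stay out of the repository.
          TEST_EMAIL: ${{ secrets.E2E_EMAIL }}
          TEST_PASSWORD: ${{ secrets.E2E_PASSWORD }}
        with:
          api-level: 33
          arch: x86_64
          working-directory: apps/mobile/apps/client
          script: |
            adb install build/app/outputs/flutter-apk/app-release.apk
            maestro test -e TEST_EMAIL="$TEST_EMAIL" -e TEST_PASSWORD="$TEST_PASSWORD" maestro/auth/sign_in.yaml
```

A failing `maestro test` fails the job, but the merge is only blocked if the job is marked as a required status check in the branch protection rules for `main` and `develop`. Emulator hardware-acceleration setup and an iOS job are omitted here for brevity.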