# 📱 Research: Flutter Integration Testing Evaluation

* **Issue:** #533
* **Focus:** Maestro vs. Marionette MCP (LeanCode)
* **Status:** ✅ Completed
* **Target Apps:** `KROW Client App` & `KROW Staff App`

---

## 1. Executive Summary & Recommendation

Following a technical spike that implemented the full authentication flows (Login/Signup) for both KROW apps, **Maestro is the recommended integration testing framework.**

While **Marionette MCP** offers an innovative LLM-driven approach to exploratory debugging, it lacks the determinism required for a production-grade CI/CD pipeline. Maestro provides the stability, speed, and native OS interaction needed to gate our releases effectively.

### Why Maestro Wins for KROW

* **Zero-Flake Execution:** Built-in wait logic absorbs Firebase Auth latency without hard-coded `sleep()` calls.
* **Platform Parity:** A single `.yaml` flow drives both the iOS and Android builds.
* **Non-Invasive:** Maestro tests the compiled `.apk` or `.app` (black-box), so we test exactly what the user sees.
* **System-Level Access:** Handles native OS permission dialogs (Camera/Location/Notifications), which Marionette cannot "see."

---

## 2. Technical Evaluation Matrix

| Criteria | Maestro | Marionette MCP | Winner |
| :--- | :--- | :--- | :--- |
| **Test Authoring** | **High Speed:** Declarative YAML; Maestro Studio recorder. | **Variable:** Requires precise prompt engineering. | **Maestro** |
| **Execution Latency** | **Low:** Direct UI interactions; full flows run in ~5s. | **High:** LLM API round-trips; full flows take ~45s+. | **Maestro** |
| **Environment** | Works on Release/Production builds. | Restricted to Debug/Profile modes. | **Maestro** |
| **CI/CD Readiness** | Native CLI; easy GitHub Actions integration. | High overhead; depends on external AI APIs. | **Maestro** |
| **Context Awareness** | Interacts with native OS UI and bottom sheets. | Limited to the Flutter widget tree. | **Maestro** |

---

## 3. Spike Analysis & Findings

### Tool A: Maestro (The Standard)

We verified the `login.yaml` and `signup.yaml` flows across both apps. Maestro absorbed the asynchronous latency of our **Data Connect** and **Firebase** backends without hard-coded waits in the flows.

* **Pros:**
  * **Semantics-Driven:** Because flows target `Semantics(identifier: '...')` in our `/design_system/`, tests remain stable even when UI text changes for localization.
  * **Automatic Tolerance:** Maestro detects spinning loaders and waits for the destination widgets automatically.
* **Cons:**
  * Requires strict adherence to adding `Semantics` wrappers on all interactive components.

### Tool B: Marionette MCP (The Experiment)

We spiked this with the `marionette_flutter` binding, driving the app from **Cursor/Claude**.

* **Pros:**
  * Excellent for visual "smoke testing" and for live-debugging UI issues through natural-language prompts.
* **Cons:**
  * **Non-Deterministic:** Prone to "hallucinations" under heavy network traffic.
  * **Architecture Blocker:** Requires the Dart VM Service to be active, which makes it impossible to test against hardened production builds.

---

## 4. Implementation & Migration Blueprint

### Phase 1: Semantics Enforcement

We must enforce a linting rule or PR checklist: all interactive widgets in `@krow/design_system` must include a unique `identifier`.

```dart
// Standardized implementation: give every interactive widget a stable,
// unique identifier so tests do not depend on the (localized) label text.
// Flutter exposes `identifier` to the native accessibility tree
// (resource-id on Android, accessibilityIdentifier on iOS).
Semantics(
  identifier: 'login_submit_button',
  child: KrowPrimaryButton(
    onPressed: _handleLogin,
    label: 'Sign In',
  ),
)
```

### Phase 2: Repository Structure (Implemented)

Maestro flows are co-located with each app:

* `apps/mobile/apps/client/maestro/login.yaml`: Client login
* `apps/mobile/apps/client/maestro/signup.yaml`: Client signup
* `apps/mobile/apps/staff/maestro/login.yaml`: Staff login (phone + OTP)
* `apps/mobile/apps/staff/maestro/signup.yaml`: Staff signup (phone + OTP)

Each `maestro/` directory contains a README with run instructions.

**Marionette MCP:** `marionette_flutter` has been added to both apps, and `MarionetteBinding` is initialized in debug mode.
See [marionette-spike-usage.md](marionette-spike-usage.md) for the prompts and workflow used in the spike.

### Phase 3: CI/CD Integration

The Maestro CLI will be added to our **GitHub Actions** workflow to automate quality gates:

* **Trigger:** Every PR targeting `main` or `develop`.
* **Action:** Build the apps, run `maestro test` against the flows, and block the merge on failure.
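The Phase 1 identifier rule could be enforced by the same pipeline, before `maestro test` runs. The following is a minimal sketch of such a check, not an existing KROW tool. It assumes identifiers are written as single-quoted string literals (`identifier: 'login_submit_button'`) and that IDs should be lower snake_case; the script name, the function `findIdentifierProblems`, and the naming rule are all hypothetical.

```dart
import 'dart:io';

/// Matches `identifier: '...'` in Semantics widgets.
/// Assumption (hypothetical convention): identifiers are single-quoted literals.
final _identifierPattern = RegExp(r"identifier:\s*'([^']+)'");

/// Assumed naming rule: lower snake_case, e.g. `login_submit_button`.
final _snakeCase = RegExp(r'^[a-z][a-z0-9]*(_[a-z0-9]+)*$');

/// Returns human-readable problems found across [sources], a map from
/// file path to file contents. An empty list means the check passed.
List<String> findIdentifierProblems(Map<String, String> sources) {
  final firstSeen = <String, String>{}; // identifier -> file where first seen
  final problems = <String>[];
  for (final entry in sources.entries) {
    for (final match in _identifierPattern.allMatches(entry.value)) {
      final id = match.group(1)!;
      if (!_snakeCase.hasMatch(id)) {
        problems.add('${entry.key}: "$id" is not snake_case');
      }
      final previous = firstSeen[id];
      if (previous != null) {
        problems.add('${entry.key}: "$id" duplicates an identifier in $previous');
      } else {
        firstSeen[id] = entry.key;
      }
    }
  }
  return problems;
}

/// Hypothetical usage: dart run check_semantic_ids.dart <file.dart> ...
void main(List<String> args) {
  final sources = {
    for (final path in args) path: File(path).readAsStringSync(),
  };
  final problems = findIdentifierProblems(sources);
  for (final problem in problems) {
    stderr.writeln(problem);
  }
  // A non-zero exit code fails the CI step, which blocks the merge.
  exit(problems.isEmpty ? 0 : 1);
}
```

Keeping the core logic in a pure function that takes file contents makes the check easy to unit-test; only `main` touches the file system. Identifiers passed in through variables (e.g. a reusable component forwarding an `id` parameter) are not matched by this regex, so a real lint rule built on the Dart analyzer would be more thorough.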