# Research: Flutter Integration Testing Tools Evaluation

**Issue:** #533 | **Focus:** Maestro vs. Marionette MCP
**Status:** Completed | **Target Apps:** KROW Client App & KROW Staff App

---

## 1. Executive Summary & Recommendation

After performing a hands-on spike implementing the core authentication flows (Login and Signup) for both the KROW Client and Staff applications, we have reached a definitive conclusion regarding the project's testing infrastructure.

### 🏆 Final Recommendation: **Maestro**

**Maestro is the recommended tool for all production-level integration and E2E testing.**

While **Marionette MCP** provides an impressive AI-driven interaction layer that is highly valuable for *local development and exploratory debugging*, it is not yet suitable for a stable, deterministic CI/CD pipeline. For KROW Workforce, where reliability and repeatable validation of release builds are paramount, **Maestro** is the superior architectural choice.

---

## 2. Hands-on Spike Findings

### Flow A: Client & Staff Signup

* **Challenge:** New signups require dismissing native OS permission dialogs (Location, Notifications) and handling asynchronous OTP (One-Time Password) entry.
* **Maestro Result:** **Pass.** Successfully dismissed iOS/Android native dialogs and used `inputText` to simulate OTP entry. The auto-wait feature cleanly handled the delay between tapping "Verify" and the Dashboard appearing.
* **Marionette MCP Result:** **Fail (Partial).** Could not tap the native "Allow" button on OS dialogs, stalling the flow. Required manual intervention to bypass permissions.

### Flow B: Client & Staff Login

* **Challenge:** Reliably targeting text fields and asserting successful login states across different themes and localizations.
* **Maestro Result:** **Pass.** Used semantic identifiers (`identifier: 'login_email_field'`), which remained stable even when UI labels changed. Test execution took ~12 seconds.
* **Marionette MCP Result:** **Pass (Inconsistent).** The AI successfully identified fields by visible text, but execution time exceeded 60 seconds due to multiple LLM reasoning cycles.

---

## 3. Comparative Matrix

| Evaluation Criteria | Maestro | Marionette MCP |
| :--- | :--- | :--- |
| **Deterministic Consistency** | **10/10** (Tests run the same way every time) | **4/10** (AI behavior can vary per run) |
| **Execution Speed** | **High** (Direct binary communication) | **Low** (Bottlenecked by LLM API latency) |
| **Native Modal Support** | **Full** (Handles OS permissions/dialogs) | **None** (Limited to the Flutter widget tree) |
| **CI/CD Readiness** | **Production Ready** (Lightweight CLI) | **Experimental** (High cost/overhead) |
| **Release Build Testing** | **Yes** (Interacts via the accessibility layer) | **No** (Requires VM Service / debug mode) |
| **Learning Curve** | **Low** (YAML is human-readable) | **Medium** (Requires prompt engineering) |

---

## 4. Deep Dive: Why Maestro Wins for KROW

### 1. Handling the "Native Wall"

KROW apps rely heavily on native features (camera for document uploads, location for hub check-ins). **Maestro** communicates with the mobile OS directly, allowing it to "click" outside the Flutter canvas. **Marionette** lives entirely inside the Dart VM; if a native permission popup appears, the test effectively dies.

### 2. Maintenance & Non-Mobile Engineering Support

KROW's growth requires that non-mobile engineers and QA teams contribute to testing.

* **Maestro** uses declarative YAML. A search test looks like `tapOn: "Search"`. It is readable by anyone.
* **Marionette** requires managing an MCP server and writing precise AI prompts, which is harder to standardize across a large team.

### 3. CI/CD Pipeline Efficiency

We need our GitHub Actions to run fast. Maestro tests are lightweight and can run in parallel on cloud emulators.
Marionette requires an LLM call for *every single step*, which would balloon our CI costs and significantly increase PR wait times.

---

## 5. Implementation & Migration Roadmap

To transition to the recommended Maestro-based testing suite, we will execute the following:

### Phase 1: Design System Hardening (Current Sprint)

* Update the `krow_design_system` package to ensure all `UiButton`, `UiTextField`, and `UiCard` components include a `Semantics` wrapper with an `identifier` property.
* Example: `Semantics(identifier: 'primary_action_button', child: child)`

### Phase 2: Core Flow Implementation

* Create a `/maestro` directory in each app's root.
* Implement the "Golden Flows": `login.yaml`, `signup.yaml`, `post_job.yaml`, and `check_in.yaml`.

### Phase 3: CI/CD Integration

* Configure GitHub Actions to trigger `maestro test` on every PR merged into `dev`.
* Establish "Release Build Verification", where Maestro runs against the final `.apk`/`.ipa` before staging deployment.

### Phase 4: Clean Up

* Remove `marionette_flutter` from `pubspec.yaml` to keep the production binary small and the security surface area low.

---

## 6. Final Verdict

**Maestro** is the engine for our automation, while **Marionette MCP** remains a powerful tool for developers to use locally for code exploration and rapid UI debugging. We will move forward with **Maestro** for all regression and release-blocking test suites.

---
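## Appendix A: Sketch of a Phase 2 Golden Flow

To make the Phase 2 deliverable concrete, a minimal `maestro/login.yaml` could look like the sketch below. Only `login_email_field` comes from the spike; the `appId`, the password-field identifier, and the visible labels are assumptions that must match the real app.

```yaml
# Hypothetical sketch of maestro/login.yaml.
# appId, 'login_password_field', and the button/screen labels are
# assumptions for illustration only.
appId: com.krow.client
---
- launchApp
- tapOn:
    id: "login_email_field"        # semantic identifier from Phase 1
- inputText: "qa@example.com"
- tapOn:
    id: "login_password_field"     # hypothetical identifier
- inputText: "not-a-real-password"
- tapOn: "Log In"
# Maestro auto-waits for the UI to settle, so no explicit sleep is
# needed between tapping "Log In" and asserting the next screen.
- assertVisible: "Dashboard"
```

Because the flow targets semantic identifiers rather than visible text, it should survive label and localization changes, which is exactly the stability the spike observed.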
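## Appendix B: Sketch of a Phase 1 Semantics Wrapper

The Phase 1 hardening can be sketched as follows for a design-system text field. This is an illustrative shape, not the actual `krow_design_system` source; the `semanticsIdentifier` parameter name and the widget internals are assumptions.

```dart
import 'package:flutter/material.dart';

// Hypothetical sketch of a krow_design_system component after Phase 1.
// The parameter name `semanticsIdentifier` is an assumption.
class UiTextField extends StatelessWidget {
  const UiTextField({super.key, required this.label, this.semanticsIdentifier});

  final String label;
  final String? semanticsIdentifier;

  @override
  Widget build(BuildContext context) {
    final field = TextField(decoration: InputDecoration(labelText: label));
    if (semanticsIdentifier == null) return field;
    // Semantics.identifier surfaces a stable ID to the platform
    // accessibility tree, which Maestro can target via `id:` even in
    // release builds and regardless of the visible label text.
    return Semantics(identifier: semanticsIdentifier, child: field);
  }
}
```

Note that `Semantics.identifier` requires a reasonably recent Flutter SDK (it was added in Flutter 3.19), so Phase 1 should confirm the pinned SDK version before rollout.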
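## Appendix C: Skeleton of the Phase 3 Workflow

A possible starting point for the Phase 3 GitHub Actions integration is sketched below. The runner, action versions, emulator setup, and build steps are all assumptions; the app build/install steps are deliberately elided.

```yaml
# Hypothetical skeleton of .github/workflows/maestro.yml — not a
# drop-in config. Runner choice, action versions, and API level are
# assumptions to be validated against our actual pipeline.
name: maestro-e2e
on:
  pull_request:
    branches: [dev]
jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Maestro CLI
        run: |
          curl -fsSL "https://get.maestro.mobile.dev" | bash
          echo "$HOME/.maestro/bin" >> "$GITHUB_PATH"
      # ... build the app and install the .apk on the emulator ...
      - name: Run Golden Flows on an Android emulator
        uses: reactivecircus/android-emulator-runner@v2
        with:
          api-level: 34
          script: maestro test maestro/
```

Because Maestro is a plain CLI, the same `maestro test maestro/` invocation also backs the "Release Build Verification" stage against the final `.apk`/`.ipa`.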