Research: Flutter Integration Testing Tools Evaluation

Issue: #533 | Focus: Maestro vs. Marionette MCP | Status: Completed | Target Apps: KROW Client App & KROW Staff App


1. Executive Summary & Recommendation

After performing a hands-on spike implementing core authentication flows (Login and Signup) for both the KROW Client and Staff applications, we have reached a definitive conclusion regarding the project's testing infrastructure.

🏆 Final Recommendation: Maestro

Maestro is the recommended tool for all production-level integration and E2E testing.

While Marionette MCP provides an impressive AI-driven interaction layer that is highly valuable for local development and exploratory debugging, it is not yet suitable for a stable, deterministic CI/CD pipeline. For KROW Workforce, where reliability and repeatable validation of release builds are paramount, Maestro is the superior architectural choice.


2. Hands-on Spike Findings

Flow A: Client & Staff Signup

  • Challenge: New signups require dismissing native OS permission dialogs (Location, Notifications) and handling asynchronous OTP (One-Time Password) entry.
  • Maestro Result: Pass. Successfully dismissed iOS/Android native dialogs and used `inputText` to simulate OTP entry. The auto-wait feature cleanly handled the delay between tapping "Verify" and the Dashboard appearing.
  • Marionette MCP Result: Fail (Partial). Could not tap the native "Allow" button on OS dialogs, stalling the flow. Required manual intervention to bypass permissions.
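
The signup flow described above can be sketched as a Maestro flow. This is a hedged illustration, not the spike's actual file: the `appId`, screen labels, and semantic identifiers are placeholders.

```yaml
# Sketch of a Maestro signup flow. appId, labels, and identifiers
# are placeholders, not the real KROW values.
appId: com.krow.client
---
- launchApp
- tapOn: "Sign Up"
- tapOn:
    id: "signup_phone_field"
- inputText: "5551234567"
- tapOn: "Verify"
# Native OS permission dialogs can be tapped directly,
# because Maestro drives the OS rather than the widget tree:
- tapOn: "Allow"
# Maestro auto-waits for the next element, absorbing the OTP delay:
- assertVisible: "Dashboard"
```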

Flow B: Client & Staff Login

  • Challenge: Reliably targeting TextFields and asserting Successful Login states across different themes/localizations.
  • Maestro Result: Pass. Used semantic identifiers (`identifier: 'login_email_field'`), which remained stable even when UI labels changed. Test execution took ~12 seconds.
  • Marionette MCP Result: Pass (Inconsistent). The AI successfully identified fields by visible text, but execution time exceeded 60 seconds due to multiple LLM reasoning cycles.
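
For reference, identifier-based targeting in Maestro looks like the following. The flow is a hypothetical sketch; the `appId`, credentials, and identifier names are assumed, not taken from the actual test suite.

```yaml
# Hypothetical login flow targeting Semantics identifiers
# rather than visible labels (all names are placeholders).
appId: com.krow.client
---
- launchApp
- tapOn:
    id: "login_email_field"
- inputText: "qa@example.com"
- tapOn:
    id: "login_password_field"
- inputText: "correct-horse-battery"
- tapOn: "Log In"
- assertVisible:
    id: "dashboard_root"
```

Because the flow references identifiers instead of on-screen text, it survives copy changes and localization.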

3. Comparative Matrix

| Evaluation Criteria | Maestro | Marionette MCP |
| --- | --- | --- |
| Deterministic Consistency | 10/10 (Tests run the same way every time) | 4/10 (AI behavior can vary per run) |
| Execution Speed | High (Direct binary communication) | Low (Bottlenecked by LLM API latency) |
| Native Modal Support | Full (Handles OS permissions/dialogs) | None (Limited to the Flutter widget tree) |
| CI/CD Readiness | Production Ready (Lightweight CLI) | Experimental (High cost/overhead) |
| Release Build Testing | Yes (Interacts via accessibility layer) | No (Requires VM Service / debug mode) |
| Learning Curve | Low (YAML is human-readable) | Medium (Requires prompt engineering) |

4. Deep Dive: Why Maestro Wins for KROW

1. Handling the "Native Wall"

KROW apps rely heavily on native features (Camera for document uploads, Location for hub check-ins). Maestro communicates with the mobile OS directly, allowing it to "click" outside the Flutter canvas. Marionette lives entirely inside the Dart VM; if a native permission popup appears, the test effectively dies.

2. Maintenance & Non-Mobile Engineering Support

KROW's growth requires that non-mobile engineers and QA teams contribute to testing.

  • Maestro uses declarative YAML. A search step is as simple as `tapOn: "Search"`. It is readable by anyone.
  • Marionette requires managing an MCP server and writing precise AI prompts, which is harder to standardize across a large team.
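
To make the readability point concrete, a complete search flow fits in a handful of lines. This is an illustrative sketch; the `appId` and the exact labels are assumptions.

```yaml
# A full search flow in declarative YAML (appId and labels are placeholders)
appId: com.krow.client
---
- launchApp
- tapOn: "Search"
- inputText: "warehouse shift"
- assertVisible: "Results"
```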

3. CI/CD Pipeline Efficiency

We need our GitHub Actions to run fast. Maestro tests are lightweight and can run in parallel on cloud emulators. Marionette requires an LLM call for every single step, which would balloon our CI costs and increase PR wait times significantly.


5. Implementation & Migration Roadmap

To transition to the recommended Maestro-based testing suite, we will execute the following:

Phase 1: Design System Hardening (Current Sprint)

  • Update the `krow_design_system` package to ensure all `UiButton`, `UiTextField`, and `UiCard` components include a `Semantics` wrapper with an `identifier` property.
  • Example: `Semantics(identifier: 'primary_action_button', child: child)`

Phase 2: Core Flow Implementation

  • Create a /maestro directory in each app's root.
  • Implement "Golden Flows": login.yaml, signup.yaml, post_job.yaml, and check_in.yaml.

Phase 3: CI/CD Integration

  • Configure GitHub Actions to trigger `maestro test` on every PR merged into `dev`.
  • Establish "Release Build Verification" where Maestro runs against the final `.apk`/`.ipa` before staging deployment.
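
A minimal workflow for this phase might look like the sketch below. The runner choice, branch name, and flow directory are assumptions, and emulator/simulator boot steps are deliberately omitted; adjust to the real pipeline. The `curl | bash` line is Maestro's documented CLI installer.

```yaml
# Hypothetical GitHub Actions job (runner, paths, and branch are assumptions;
# device/emulator provisioning steps are omitted for brevity).
name: maestro-e2e
on:
  pull_request:
    branches: [dev]
jobs:
  e2e:
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Maestro CLI
        run: curl -Ls "https://get.maestro.mobile.dev" | bash
      - name: Run golden flows
        run: |
          export PATH="$PATH:$HOME/.maestro/bin"
          maestro test maestro/
```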

Phase 4: Clean Up

  • Remove `marionette_flutter` from `pubspec.yaml` to keep the production binary small and the security surface area low.

6. Final Verdict

Maestro is the engine for our automation, while Marionette MCP remains a powerful tool for developers to use locally for code exploration and rapid UI debugging. We will move forward with Maestro for all regression and release-blocking test suites.