📱 Research: Flutter Integration Testing Evaluation

Issue: #533
Focus: Maestro vs. Marionette MCP (LeanCode)
Status: Completed
Target Apps: KROW Client App & KROW Staff App


1. Executive Summary & Recommendation

Following a technical spike implementing full authentication flows (Login/Signup) for both KROW platforms, Maestro is the recommended integration testing framework.

While Marionette MCP offers an innovative LLM-driven approach for exploratory debugging, it lacks the determinism required for a production-grade CI/CD pipeline. Maestro provides the stability, speed, and native OS interaction necessary to gate our releases effectively.

Why Maestro Wins for KROW:

  • Zero-Flake Execution: Built-in wait logic handles Firebase Auth latency without hard-coded sleep() calls.
  • Platform Parity: Single .yaml definitions drive both iOS and Android build variants.
  • Non-Invasive: Maestro tests the compiled .apk or .app (Black-box), ensuring we test exactly what the user sees.
  • System Level Access: Handles native OS permission dialogs (Camera/Location/Notifications), which Marionette cannot see because it only inspects the Flutter widget tree.
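
As a rough illustration of the system-level access point, a flow can reset permissions at launch and then dismiss the OS dialog like any other element. This is a sketch only: the appId is a placeholder, and the button label "Allow" is an assumption that varies by OS version and locale.

```yaml
# Hypothetical flow; appId and dialog text are placeholders, not values from the spike.
appId: com.example.krow.client
---
- launchApp:
    clearState: true
    # Reset permissions so the OS dialog is shown again on this run.
    permissions: { all: unset }
# The permission dialog is drawn by the OS, outside the Flutter widget tree.
- tapOn:
    text: "Allow"
    optional: true   # do not fail the flow if the dialog never appears
```

Marking the tap as optional keeps the flow usable on devices where the permission was already granted.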

2. Technical Evaluation Matrix

| Criteria | Maestro | Marionette MCP | Winner |
| --- | --- | --- | --- |
| Test Authoring | High Speed: Declarative YAML; Maestro Studio recorder. | Variable: Requires precise Prompt Engineering. | Maestro |
| Execution Latency | Low: Instantaneous interaction (~5s flows). | High: LLM API roundtrips (~45s+ flows). | Maestro |
| Environment | Works on Release/Production builds. | Restricted to Debug/Profile modes. | Maestro |
| CI/CD Readiness | Native CLI; easy GitHub Actions integration. | High overhead; depends on external AI APIs. | Maestro |
| Context Awareness | Interacts with Native OS & Bottom Sheets. | Limited to the Flutter Widget Tree. | Maestro |

3. Spike Analysis & Findings

Tool A: Maestro (The Standard)

We verified the sign-in and sign-up flows (sign_in.yaml and sign_up.yaml) across both apps. Maestro absorbed the asynchronous behavior of our Data Connect and Firebase backends without explicit waits in the flows.

  • Pros:
    • Semantics Driven: By targeting Semantics(identifier: '...') in our /design_system/, tests remain stable even if the UI text changes for localization.
    • Automatic Tolerance: It detects spinning loaders and waits for destination widgets automatically.
  • Cons:
    • Requires strict adherence to adding Semantics wrappers on all interactive components.
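
A minimal sketch of what identifier-based targeting looks like in a flow, assuming the Semantics identifier is exposed to Maestro through the id selector. The appId and the home_screen identifier are hypothetical; only login_submit_button appears elsewhere in this document. Credential entry is omitted here for brevity.

```yaml
# Sketch only; appId and 'home_screen' are illustrative placeholders.
appId: com.example.krow.client
---
- launchApp
# ... credential input steps omitted ...
# Select by Semantics identifier, so a translated button label does not break the test.
- tapOn:
    id: "login_submit_button"
# Most commands retry implicitly; an explicit bound is only needed for slow backends.
- extendedWaitUntil:
    visible:
      id: "home_screen"
    timeout: 15000   # milliseconds
```

The explicit extendedWaitUntil step is optional; it is shown here as one way to put an upper bound on backend latency instead of a fixed sleep.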

Tool B: Marionette MCP (The Experiment)

We spiked this using the marionette_flutter binding and executing via Cursor/Claude.

  • Pros:
    • Phenomenal for visual "smoke testing" and live-debugging UI issues via natural language.
  • Cons:
    • Non-Deterministic: Prone to "hallucinations" during heavy network traffic.
    • Architecture Blocker: Requires the Dart VM Service to be active, making it impossible to test against hardened production builds.

4. Implementation & Migration Blueprint

Phase 1: Semantics Enforcement

We must enforce a linting rule or PR checklist: All interactive widgets in @krow/design_system must include a unique identifier.

// Standardized Implementation
Semantics(
  identifier: 'login_submit_button',
  child: KrowPrimaryButton(
    onPressed: _handleLogin,
    label: 'Sign In',
  ),
)

Phase 2: Repository Structure (Implemented)

Maestro flows are co-located with each app under maestro/auth/:

  • apps/mobile/apps/client/maestro/auth/sign_in.yaml — Client sign-in
  • apps/mobile/apps/client/maestro/auth/sign_up.yaml — Client sign-up
  • apps/mobile/apps/staff/maestro/auth/sign_in.yaml — Staff sign-in (phone + OTP)
  • apps/mobile/apps/staff/maestro/auth/sign_up.yaml — Staff sign-up (phone + OTP)

Credentials are injected via env variables (never hardcoded). Use make test-e2e to run the suite.
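
For illustration, a staff sign-in flow could read its phone number and OTP from variables passed on the command line, roughly as below. The appId, the widget identifiers, and the variable names TEST_PHONE and TEST_OTP are all assumptions made for this sketch, not the contents of the implemented flows.

```yaml
# Sketch only. One way to invoke it, with values supplied by the shell or CI secrets:
#   maestro test -e TEST_PHONE="$E2E_PHONE" -e TEST_OTP="$E2E_OTP" \
#     apps/mobile/apps/staff/maestro/auth/sign_in.yaml
appId: com.example.krow.staff   # placeholder
---
- launchApp
- tapOn:
    id: "staff_phone_field"        # hypothetical identifier
- inputText: ${TEST_PHONE}
- tapOn:
    id: "staff_send_code_button"   # hypothetical identifier
- tapOn:
    id: "staff_otp_field"          # hypothetical identifier
- inputText: ${TEST_OTP}
```

Because the values arrive through -e flags, nothing sensitive needs to be committed alongside the flow files.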

Phase 3: CI/CD Integration

The Maestro CLI will be added to our GitHub Actions workflow to automate quality gates.

  • Trigger: Every PR targeting main or develop.
  • Action: Generate a build, execute maestro test, and block merge on failure.
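
One possible shape for this workflow is sketched below. It is a starting point under several assumptions: an Android-only job for the client app, a debug APK at the default Flutter output path (no flavors), hypothetical secret names, and the community actions subosito/flutter-action and reactivecircus/android-emulator-runner. None of this reflects an existing workflow in the repository.

```yaml
# Sketch only; job name, secret names, API level, and APK path are assumptions.
name: e2e
on:
  pull_request:
    branches: [main, develop]
jobs:
  maestro-client:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: subosito/flutter-action@v2
        with:
          channel: stable
      - name: Build client APK
        working-directory: apps/mobile/apps/client
        # Debug build for simplicity; swap for a release build once signing is available in CI.
        run: flutter build apk --debug
      - name: Install Maestro CLI
        run: |
          curl -fsSL "https://get.maestro.mobile.dev" | bash
          echo "$HOME/.maestro/bin" >> "$GITHUB_PATH"
      - name: Enable KVM for the emulator
        run: |
          echo 'KERNEL=="kvm", GROUP="kvm", MODE="0666", OPTIONS+="static_node=kvm"' | sudo tee /etc/udev/rules.d/99-kvm4all.rules
          sudo udevadm control --reload-rules
          sudo udevadm trigger --name-match=kvm
      - name: Run Maestro flows on an emulator
        uses: reactivecircus/android-emulator-runner@v2
        env:
          E2E_EMAIL: ${{ secrets.E2E_EMAIL }}
          E2E_PASSWORD: ${{ secrets.E2E_PASSWORD }}
        with:
          api-level: 33
          arch: x86_64
          working-directory: apps/mobile/apps/client
          script: |
            adb install build/app/outputs/flutter-apk/app-debug.apk
            maestro test -e EMAIL="$E2E_EMAIL" -e PASSWORD="$E2E_PASSWORD" maestro/auth/
```

A failing maestro test exits non-zero and fails the job; actually blocking the merge additionally requires marking this check as required in the branch protection rules for main and develop.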