
# 📱 Research: Flutter Integration Testing Evaluation
**Issue:** #533
**Focus:** Maestro vs. Marionette MCP (LeanCode)
**Status:** ✅ Completed
**Target Apps:** `KROW Client App` & `KROW Staff App`

---

## 1. Executive Summary & Recommendation

Following a technical spike implementing full authentication flows (Login/Signup) for both KROW platforms, **Maestro is the recommended integration testing framework.**

While **Marionette MCP** offers an innovative LLM-driven approach for exploratory debugging, it lacks the determinism required for a production-grade CI/CD pipeline. Maestro provides the stability, speed, and native OS interaction necessary to gate our releases effectively.

### Why Maestro Wins for KROW:
* **Low-Flake Execution:** Built-in wait logic absorbs Firebase Auth latency without hard-coded `sleep()` calls.
* **Platform Parity:** A single `.yaml` definition drives both the iOS and Android builds.
* **Non-Invasive:** Maestro tests the compiled `.apk` or `.app` as a black box, so we test exactly what the user sees.
* **System-Level Access:** Handles native OS permission dialogs (Camera/Location/Notifications), which Marionette cannot "see."
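
As a rough illustration of the system-level access point, the sketch below pre-sets permissions at launch and taps a native prompt if one appears. The bundle ID is hypothetical, and the `Allow` label is an assumption that varies by OS version and locale.

```yaml
# Hypothetical bundle ID; substitute the real application ID.
appId: com.example.krow.staff
---
# Pre-set permissions at launch so dialogs for these never interrupt the flow.
- launchApp:
    clearState: true
    permissions:
      camera: allow
      location: allow
      notifications: unset   # left unset so the OS prompt is shown

# Tap the native prompt if it appears. The button label is an assumption;
# `optional: true` keeps the flow from failing when no dialog is shown.
- tapOn:
    text: "Allow"
    optional: true
```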

---

## 2. Technical Evaluation Matrix

| Criteria | Maestro | Marionette MCP | Winner |
| :--- | :--- | :--- | :--- |
| **Test Authoring** | **High Speed:** Declarative YAML; Maestro Studio recorder. | **Variable:** Requires precise prompt engineering. | **Maestro** |
| **Execution Latency** | **Low:** Near-instant interactions (~5s flows). | **High:** LLM API round trips (~45s+ flows). | **Maestro** |
| **Environment** | Works on Release/Production builds. | Restricted to Debug/Profile modes. | **Maestro** |
| **CI/CD Readiness** | Native CLI; easy GitHub Actions integration. | High overhead; depends on external AI APIs. | **Maestro** |
| **Context Awareness** | Interacts with Native OS & Bottom Sheets. | Limited to the Flutter Widget Tree. | **Maestro** |

---

## 3. Spike Analysis & Findings

### Tool A: Maestro (The Standard)
We verified the `login.yaml` and `signup.yaml` flows across both apps. Maestro's built-in waiting absorbed the asynchronous latency of our **Data Connect** and **Firebase** backends.

* **Pros:**
  * **Semantics Driven:** By targeting `Semantics(identifier: '...')` in our `/design_system/`, tests remain stable even if the UI text changes for localization.
  * **Automatic Tolerance:** It detects spinning loaders and waits for destination widgets automatically.
* **Cons:**
  * Requires strict adherence to adding `Semantics` wrappers on all interactive components.
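
A minimal sketch of what such a flow might look like, not the spiked file itself. The bundle ID, the field and screen identifiers, and the variable names are hypothetical; only `login_submit_button` appears elsewhere in this document.

```yaml
# Hypothetical bundle ID; substitute the real application ID.
appId: com.example.krow.client
---
- launchApp:
    clearState: true

# Identifiers below are assumed; they must match Semantics(identifier: ...) in the app.
- tapOn:
    id: "login_email_field"
- inputText: ${CLIENT_EMAIL}
- tapOn:
    id: "login_password_field"
- inputText: ${CLIENT_PASSWORD}
- tapOn:
    id: "login_submit_button"

# No sleep step: the wait below polls until the element is visible
# or the timeout (in milliseconds) elapses.
- extendedWaitUntil:
    visible:
      id: "home_screen"
    timeout: 15000
```

`assertVisible` also retries for a short built-in period, so `extendedWaitUntil` is only needed where a backend call may outlast that default.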

### Tool B: Marionette MCP (The Experiment)
We spiked this using the `marionette_flutter` binding, driving it via **Cursor/Claude**.

* **Pros:**
  * Phenomenal for visual "smoke testing" and live-debugging UI issues via natural language.
* **Cons:**
  * **Non-Deterministic:** Prone to "hallucinations" during heavy network traffic.
  * **Architecture Blocker:** Requires the Dart VM Service to be active, making it impossible to test against hardened production builds.

---

## 4. Implementation & Migration Blueprint


### Phase 1: Semantics Enforcement
We must enforce, via a lint rule or a PR checklist, that every interactive widget in `@krow/design_system` carries a unique `identifier`.

```dart
// Standardized Implementation
Semantics(
  identifier: 'login_submit_button',
  child: KrowPrimaryButton(
    onPressed: _handleLogin,
    label: 'Sign In',
  ),
)
```
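
For illustration, a flow step could then target that identifier instead of the visible label. This is a fragment of a flow rather than a complete file, and the commented-out text selector is shown only for contrast.

```yaml
# Stable: matches Semantics(identifier: 'login_submit_button').
- tapOn:
    id: "login_submit_button"

# Brittle alternative, shown for contrast only: breaks once 'Sign In' is translated.
# - tapOn: "Sign In"
```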

### Phase 2: Repository Structure (Implemented)
Maestro flows are co-located with each app under `auth/`:

* `apps/mobile/apps/client/maestro/auth/sign_in.yaml` — Client sign-in
* `apps/mobile/apps/client/maestro/auth/sign_up.yaml` — Client sign-up
* `apps/mobile/apps/staff/maestro/auth/sign_in.yaml` — Staff sign-in (phone + OTP)
* `apps/mobile/apps/staff/maestro/auth/sign_up.yaml` — Staff sign-up (phone + OTP)

Credentials are injected via environment variables (never hardcoded). Use `make test-e2e` to run the suite.
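
A hedged sketch of how a flow can consume injected credentials. The variable names (`STAFF_PHONE`, `STAFF_OTP`), the bundle ID, and the element identifiers are hypothetical and may differ from the implemented flows.

```yaml
# Hypothetical bundle ID; substitute the real application ID.
appId: com.example.krow.staff
---
- launchApp:
    clearState: true

# ${...} values are resolved at run time from variables passed to Maestro.
- tapOn:
    id: "phone_number_field"      # assumed identifier
- inputText: ${STAFF_PHONE}
- tapOn:
    id: "send_code_button"        # assumed identifier
- tapOn:
    id: "otp_field"               # assumed identifier
- inputText: ${STAFF_OTP}
```

When invoking the CLI directly, values can be supplied with the `-e` flag, for example `maestro test -e STAFF_PHONE=... -e STAFF_OTP=... apps/mobile/apps/staff/maestro/auth/sign_in.yaml`.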

### Phase 3: CI/CD Integration
The Maestro CLI will be added to our **GitHub Actions** workflow to automate quality gates.

* **Trigger:** Every PR targeting `main` or `develop`.
* **Action:** Generate a build, execute `maestro test`, and block merge on failure.
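
One possible shape for this workflow is sketched below for the Android client app only. The workflow name, secret names, variable names, API level, and the choice of a debug build are assumptions; `subosito/flutter-action` and `reactivecircus/android-emulator-runner` are community actions chosen for illustration, not a confirmed part of our pipeline.

```yaml
name: e2e-maestro   # hypothetical workflow name

on:
  pull_request:
    branches: [main, develop]

jobs:
  maestro-android:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: subosito/flutter-action@v2
        with:
          channel: stable

      # Debug build keeps the sketch free of signing setup; a release build
      # would work as well once signing is configured.
      - name: Build APK
        working-directory: apps/mobile/apps/client
        run: flutter build apk --debug

      - name: Install Maestro CLI
        run: |
          curl -fsSL "https://get.maestro.mobile.dev" | bash
          echo "$HOME/.maestro/bin" >> "$GITHUB_PATH"

      # Boots an emulator, installs the build, and runs every flow in the folder.
      # Secret and variable names are placeholders.
      - name: Run Maestro flows
        uses: reactivecircus/android-emulator-runner@v2
        with:
          api-level: 33
          arch: x86_64
          script: |
            adb install apps/mobile/apps/client/build/app/outputs/flutter-apk/app-debug.apk
            maestro test -e CLIENT_EMAIL="${{ secrets.E2E_CLIENT_EMAIL }}" -e CLIENT_PASSWORD="${{ secrets.E2E_CLIENT_PASSWORD }}" apps/mobile/apps/client/maestro/auth
```

A failing `maestro test` exits non-zero and fails the job. Blocking the merge additionally requires marking this job as a required status check in the branch protection rules for `main` and `develop`. Emulator hardware acceleration on hosted runners may need extra setup that is omitted here.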