# 📱 Research: Flutter Integration Testing Evaluation
**Issue:** #533

**Focus:** Maestro vs. Marionette MCP (LeanCode)

**Status:** ✅ Completed

**Target Apps:** `KROW Client App` & `KROW Staff App`

---
## 1. Executive Summary & Recommendation
Following a technical spike implementing full authentication flows (Login/Signup) for both KROW platforms, **Maestro is the recommended integration testing framework.**

While **Marionette MCP** offers an innovative LLM-driven approach for exploratory debugging, it lacks the determinism required for a production-grade CI/CD pipeline. Maestro provides the stability, speed, and native OS interaction necessary to gate our releases effectively.
### Why Maestro Wins for KROW:
* **Low-Flake Execution:** Built-in wait logic absorbs Firebase Auth latency without hard-coded `sleep()` calls.
* **Platform Parity:** Single `.yaml` definitions drive both iOS and Android build variants.
* **Non-Invasive:** Maestro tests the compiled `.apk` or `.app` as a black box, so we test exactly what the user sees.
* **System Level Access:** Handles native OS permission dialogs (Camera/Location/Notifications), which Marionette cannot see because they sit outside the Flutter widget tree.
---
## 2. Technical Evaluation Matrix
| Criteria | Maestro | Marionette MCP | Winner |
| :--- | :--- | :--- | :--- |
| **Test Authoring** | **High Speed:** Declarative YAML; Maestro Studio recorder. | **Variable:** Requires precise prompt engineering. | **Maestro** |
| **Execution Latency** | **Low:** Near-instant interactions (~5s flows). | **High:** LLM API round trips (~45s+ flows). | **Maestro** |
| **Environment** | Works on Release/Production builds. | Restricted to Debug/Profile modes. | **Maestro** |
| **CI/CD Readiness** | Native CLI; easy GitHub Actions integration. | High overhead; depends on external AI APIs. | **Maestro** |
| **Context Awareness** | Interacts with Native OS & Bottom Sheets. | Limited to the Flutter Widget Tree. | **Maestro** |
---
## 3. Spike Analysis & Findings
### Tool A: Maestro (The Standard)
We verified the `login.yaml` and `signup.yaml` flows across both apps. Maestro's built-in waiting absorbed the asynchronous responses of our **Data Connect** and **Firebase** backends without extra synchronization code in the flows.
* **Pros:**
    * **Semantics Driven:** By targeting `Semantics(identifier: '...')` in our `/design_system/`, tests remain stable even if the UI text changes for localization.
    * **Automatic Tolerance:** It detects spinning loaders and waits for destination widgets automatically.
* **Cons:**
    * Requires strict adherence to adding `Semantics` wrappers on all interactive components.
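A minimal sketch of how a flow can lean on these identifiers rather than on visible text. Only `login_submit_button` comes from this document (the Phase 1 snippet); the `appId` and the `home_screen` identifier are placeholders, and the credential entry steps are omitted for brevity.

```yaml
# Hypothetical client flow: appId and home_screen are illustrative placeholders.
appId: com.example.krow.client
---
- launchApp
# Select by semantics identifier instead of the visible label ("Sign In"),
# so the step keeps working when the copy is changed or localized.
- assertVisible:
    id: "login_submit_button"
- tapOn:
    id: "login_submit_button"
# Give the auth round trip an explicit upper bound (milliseconds)
# before expecting the destination screen.
- extendedWaitUntil:
    visible:
      id: "home_screen"
    timeout: 15000
```

`assertVisible` already retries for a short default window, so `extendedWaitUntil` is only worth adding on steps where the backend can be noticeably slow.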
### Tool B: Marionette MCP (The Experiment)
We spiked this using the `marionette_flutter` binding, driving it from **Cursor/Claude**.
* **Pros:**
    * Phenomenal for visual "smoke testing" and live-debugging UI issues via natural language.
* **Cons:**
    * **Non-Deterministic:** Prone to "hallucinations" during heavy network traffic.
    * **Architecture Blocker:** Requires the Dart VM Service to be active, making it impossible to test against hardened production builds.
---
## 4. Implementation & Migration Blueprint
### Phase 1: Semantics Enforcement
We must enforce a linting rule or PR checklist: All interactive widgets in `@krow/design_system` must include a unique `identifier`.
```dart
// Standardized Implementation
Semantics(
  identifier: 'login_submit_button',
  child: KrowPrimaryButton(
    onPressed: _handleLogin,
    label: 'Sign In',
  ),
)
```
### Phase 2: Repository Structure (Implemented)
Maestro flows are co-located with each app under `maestro/auth/`:
* `apps/mobile/apps/client/maestro/auth/sign_in.yaml`: Client sign-in
* `apps/mobile/apps/client/maestro/auth/sign_up.yaml`: Client sign-up
* `apps/mobile/apps/staff/maestro/auth/sign_in.yaml`: Staff sign-in (phone + OTP)
* `apps/mobile/apps/staff/maestro/auth/sign_up.yaml`: Staff sign-up (phone + OTP)

Credentials are injected via environment variables (never hardcoded). Use `make test-e2e` to run the suite.
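As an illustration of the injection pattern, a flow can reference variables with `${...}` and receive their values at run time. The sketch below is not the contents of the real `sign_in.yaml`: the `appId`, every `id`, and the variable names `TEST_PHONE` and `TEST_OTP` are assumptions made for the example.

```yaml
# Hypothetical staff sign-in flow: appId, ids, and variable names are placeholders.
appId: com.example.krow.staff
---
- launchApp:
    clearState: true
- tapOn:
    id: "staff_phone_field"
# Value comes from the environment at run time, never from the file.
- inputText: ${TEST_PHONE}
- tapOn:
    id: "staff_send_code_button"
- tapOn:
    id: "staff_otp_field"
- inputText: ${TEST_OTP}
- assertVisible:
    id: "staff_home_screen"
```

With the Maestro CLI, values can be passed per run via `-e`, for example `maestro test -e TEST_PHONE=<number> -e TEST_OTP=<code> apps/mobile/apps/staff/maestro/auth/sign_in.yaml`. Whether `make test-e2e` forwards variables this way is not stated here, so treat the command as one possible wiring.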
### Phase 3: CI/CD Integration
The Maestro CLI will be added to our **GitHub Actions** workflow to automate quality gates.
* **Trigger:** Every PR targeting `main` or `develop`.
* **Action:** Generate a build, execute `maestro test`, and block merge on failure.
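One possible shape for that workflow, shown for the Android client app only. This is a sketch under several assumptions: the job and secret names are invented, the debug build variant and APK output path assume an unflavored Flutter project, and the emulator settings are arbitrary. The referenced actions (`actions/checkout`, `actions/setup-java`, `subosito/flutter-action`, `reactivecircus/android-emulator-runner`) are third-party or GitHub-provided and would need to be vetted before adoption.

```yaml
name: e2e-maestro
on:
  pull_request:
    branches: [main, develop]

jobs:
  maestro-android:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: "17"
      - uses: subosito/flutter-action@v2
        with:
          channel: stable
      # Assumption: an unflavored debug build; swap in the release variant once signing works in CI.
      - name: Build client APK
        working-directory: apps/mobile/apps/client
        run: flutter build apk --debug
      - name: Install Maestro CLI
        run: |
          curl -fsSL "https://get.maestro.mobile.dev" | bash
          echo "$HOME/.maestro/bin" >> "$GITHUB_PATH"
      # Hardware acceleration for the emulator on Linux runners.
      - name: Enable KVM
        run: |
          echo 'KERNEL=="kvm", GROUP="kvm", MODE="0666", OPTIONS+="static_node=kvm"' | sudo tee /etc/udev/rules.d/99-kvm4all.rules
          sudo udevadm control --reload-rules
          sudo udevadm trigger --name-match=kvm
      - name: Run auth flows on an emulator
        uses: reactivecircus/android-emulator-runner@v2
        env:
          # Hypothetical secret names.
          TEST_EMAIL: ${{ secrets.E2E_TEST_EMAIL }}
          TEST_PASSWORD: ${{ secrets.E2E_TEST_PASSWORD }}
        with:
          api-level: 29
          arch: x86_64
          script: |
            adb install apps/mobile/apps/client/build/app/outputs/flutter-apk/app-debug.apk
            maestro test -e TEST_EMAIL="$TEST_EMAIL" -e TEST_PASSWORD="$TEST_PASSWORD" apps/mobile/apps/client/maestro/auth/
```

A failing `maestro test` fails the job; actually blocking the merge additionally requires marking this job as a required status check in the branch protection rules for `main` and `develop`. If `make test-e2e` already wraps the install and test commands, the script section could call it instead.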