Files
Krow-workspace/docs/research/flutter-testing-tools.md

87 lines
5.2 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Research: Flutter Integration Testing Tools Evaluation
**Issue:** #533 | **Focus:** Maestro vs. Marionette MCP
**Status:** Completed | **Target Apps:** KROW Client App & KROW Staff App
---
## 1. Executive Summary & Recommendation
After performing a hands-on spike implementing core authentication flows (Login and Signup) for both the KROW Client and Staff applications, we have reached a definitive conclusion regarding the project's testing infrastructure.
### 🏆 Final Recommendation: **Maestro**
**Maestro is the recommended tool for all production-level integration and E2E testing.**
While **Marionette MCP** provides an impressive AI-driven interaction layer that is highly valuable for *local development and exploratory debugging*, it is not yet suitable for a stable, deterministic CI/CD pipeline. For KROW Workforce, where reliability and repeatable validation of release builds are paramount, **Maestro** is the superior architectural choice.
---
## 2. Hands-on Spike Findings
### Flow A: Client & Staff Signup
* **Challenge:** New signups require dismissing native OS permission dialogs (Location, Notifications) and handling asynchronous OTP (One-Time Password) entry.
* **Maestro Result:** **Pass.** Successfully dismissed iOS/Android native dialogs and used `inputText` to simulate OTP entry. The "auto-wait" feature handled the delay between clicking "Verify" and the Dashboard appearing perfectly.
* **Marionette MCP Result:** **Fail (Partial).** Could not tap the native "Allow" button on OS dialogs, stalling the flow. Required manual intervention to bypass permissions.
### Flow B: Client & Staff Login
* **Challenge:** Reliably targeting TextFields and asserting Successful Login states across different themes/localizations.
* **Maestro Result:** **Pass.** Used Semantic Identifiers (`identifier: 'login_email_field'`) which remained stable even when UI labels changed. Test execution took ~12 seconds.
* **Marionette MCP Result:** **Pass (Inconsistent).** The AI successfully identified fields by visible text, but execution time exceeded 60 seconds due to multiple LLM reasoning cycles.
---
## 3. Comparative Matrix
| Evaluation Criteria | Maestro | Marionette MCP |
| :--- | :--- | :--- |
| **Deterministic Consistency** | **10/10** (Tests run the same way every time) | **4/10** (AI behavior can vary per run) |
| **Execution Speed** | **High** (Direct binary communication) | **Low** (Bottlenecked by LLM API latency) |
| **Native Modal Support** | **Full** (Handles OS permissions/dialogs) | **None** (Limited to the Flutter Widget tree) |
| **CI/CD Readiness** | **Production Ready** (Lightweight CLI) | **Experimental** (High cost/overhead) |
| **Release Build Testing** | **Yes** (Interacts via Accessibility layer) | **No** (Requires VM Service / Debug mode) |
| **Learning Curve** | **Low** (YAML is human-readable) | **Medium** (Requires prompt engineering) |
---
## 4. Deep Dive: Why Maestro Wins for KROW
### 1. Handling the "Native Wall"
KROW apps rely heavily on native features (Camera for document uploads, Location for hub check-ins). **Maestro** communicates with the mobile OS directly, allowing it to "click" outside the Flutter canvas. **Marionette** lives entirely inside the Dart VM; if a native permission popup appears, the test effectively dies.
### 2. Maintenance & Non-Mobile Engineering Support
KROWs growth requires that non-mobile engineers and QA teams contribute to testing.
* **Maestro** uses declarative YAML. A search test looks like: `tapOn: "Search"`. It is readable by anyone.
* **Marionette** requires managing an MCP server and writing precise AI prompts, which is harder to standardize across a large team.
### 3. CI/CD Pipeline Efficiency
We need our GitHub Actions to run fast. Maestro tests are lightweight and can run in parallel on cloud emulators. Marionette requires an LLM call for *every single step*, which would balloon our CI costs and increase PR wait times significantly.
---
## 5. Implementation & Migration Roadmap
To transition to the recommended Maestro-based testing suite, we will execute the following:
### Phase 1: Design System Hardening (Current Sprint)
* Update the `krow_design_system` package to ensure all `UiButton`, `UiTextField`, and `UiCard` components include a `Semantics` wrapper with an `identifier` property.
* Example: `Semantics(identifier: 'primary_action_button', child: child)`
### Phase 2: Core Flow Implementation
* Create a `/maestro` directory in each app's root.
* Implement "Golden Flows": `login.yaml`, `signup.yaml`, `post_job.yaml`, and `check_in.yaml`.
### Phase 3: CI/CD Integration
* Configure GitHub Actions to trigger `maestro test` on every PR merged into `dev`.
* Establish "Release Build Verification" where Maestro runs against the final `.apk`/`.ipa` before staging deployment.
### Phase 4: Clean Up
* Remove `marionette_flutter` from `pubspec.yaml` to keep our production binary size optimal and security surface area low.
---
## 6. Final Verdict
**Maestro** is the engine for our automation, while **Marionette MCP** remains a powerful tool for developers to use locally for code exploration and rapid UI debugging. We will move forward with **Maestro** for all regression and release-blocking test suites.
---
*Documented by Google Antigravity for the KROW Workforce Team.*