# Research: Flutter Integration Testing Tools Evaluation

**Issue:** #533 | **Focus:** Maestro vs. Marionette MCP
**Status:** Completed | **Target Apps:** KROW Client App & KROW Staff App

---

## 1. Executive Summary & Recommendation

After performing a hands-on spike implementing core authentication flows (Login and Signup) for both the KROW Client and Staff applications, we have reached a definitive conclusion regarding the project's testing infrastructure.

### 🏆 Final Recommendation: **Maestro**

**Maestro is the recommended tool for all production-level integration and E2E testing.**

While **Marionette MCP** provides an impressive AI-driven interaction layer that is highly valuable for *local development and exploratory debugging*, it is not yet suitable for a stable, deterministic CI/CD pipeline. For KROW Workforce, where reliability and repeatable validation of release builds are paramount, **Maestro** is the superior architectural choice.

---

## 2. Hands-on Spike Findings

### Flow A: Client & Staff Signup

* **Challenge:** New signups require dismissing native OS permission dialogs (Location, Notifications) and handling asynchronous OTP (One-Time Password) entry.
* **Maestro Result:** **Pass.** Successfully dismissed the iOS/Android native dialogs and used `inputText` to simulate OTP entry. The "auto-wait" feature neatly handled the delay between tapping "Verify" and the Dashboard appearing.
* **Marionette MCP Result:** **Fail (Partial).** Could not tap the native "Allow" button on OS dialogs, stalling the flow; manual intervention was required to bypass permissions.
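
To make the difference concrete, the signup flow above can be sketched as a Maestro YAML flow. This is an illustrative sketch only: the `appId`, button labels, and OTP value are hypothetical placeholders, not the actual KROW values.

```yaml
# Illustrative sketch — appId, labels, and values are hypothetical.
appId: com.krow.client
---
- launchApp
- tapOn: "Sign Up"
- inputText: "jane@example.com"
- tapOn: "Verify"
# Maestro can tap native OS dialogs outside the Flutter canvas:
- tapOn: "Allow"
# Simulated OTP entry; auto-wait absorbs the async delay before the next view:
- inputText: "123456"
- assertVisible: "Dashboard"
```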

### Flow B: Client & Staff Login

* **Challenge:** Reliably targeting `TextField`s and asserting successful login states across different themes and localizations.
* **Maestro Result:** **Pass.** Used semantic identifiers (`identifier: 'login_email_field'`), which remained stable even when UI labels changed. Test execution took ~12 seconds.
* **Marionette MCP Result:** **Pass (Inconsistent).** The AI successfully identified fields by their visible text, but execution time exceeded 60 seconds due to multiple LLM reasoning cycles.
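
The stable-identifier approach from the spike can be expressed in Maestro as follows. This is a sketch: the `login_email_field` identifier comes from the spike, but the `appId`, the password identifier, and the credentials are assumptions.

```yaml
# Sketch — appId, login_password_field, and credentials are assumptions.
appId: com.krow.staff
---
- launchApp
- tapOn:
    id: "login_email_field"
- inputText: "staff@example.com"
- tapOn:
    id: "login_password_field"
- inputText: "not-a-real-password"
- tapOn: "Log In"
- assertVisible: "Dashboard"
```

Because the flow targets identifiers rather than visible labels, it survives copy changes and localization without edits.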

---

## 3. Comparative Matrix

| Evaluation Criteria | Maestro | Marionette MCP |
| :--- | :--- | :--- |
| **Deterministic Consistency** | **10/10** (Tests run the same way every time) | **4/10** (AI behavior can vary per run) |
| **Execution Speed** | **High** (Direct binary communication) | **Low** (Bottlenecked by LLM API latency) |
| **Native Modal Support** | **Full** (Handles OS permissions/dialogs) | **None** (Limited to the Flutter widget tree) |
| **CI/CD Readiness** | **Production ready** (Lightweight CLI) | **Experimental** (High cost/overhead) |
| **Release Build Testing** | **Yes** (Interacts via the accessibility layer) | **No** (Requires VM Service / debug mode) |
| **Learning Curve** | **Low** (YAML is human-readable) | **Medium** (Requires prompt engineering) |

---

## 4. Deep Dive: Why Maestro Wins for KROW

### 1. Handling the "Native Wall"

KROW apps rely heavily on native features (Camera for document uploads, Location for hub check-ins). **Maestro** communicates with the mobile OS directly, allowing it to "tap" outside the Flutter canvas. **Marionette** lives entirely inside the Dart VM; if a native permission popup appears, the test effectively dies.

### 2. Maintenance & Non-Mobile Engineering Support

KROW’s growth requires that non-mobile engineers and QA teams contribute to testing.

* **Maestro** uses declarative YAML. A search step looks like `tapOn: "Search"` and is readable by anyone.
* **Marionette** requires managing an MCP server and writing precise AI prompts, which is harder to standardize across a large team.
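
For example, a complete search smoke test could read, in its entirety, as follows (the `appId`, query, and labels are hypothetical):

```yaml
# Sketch — appId, query text, and labels are hypothetical.
appId: com.krow.client
---
- launchApp
- tapOn: "Search"
- inputText: "warehouse shift"
- assertVisible: "Results"
```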

### 3. CI/CD Pipeline Efficiency

We need our GitHub Actions to run fast. Maestro tests are lightweight and can run in parallel on cloud emulators. Marionette requires an LLM call for *every single step*, which would balloon our CI costs and significantly increase PR wait times.

---

## 5. Implementation & Migration Roadmap

To transition to the recommended Maestro-based testing suite, we will execute the following phases:

### Phase 1: Design System Hardening (Current Sprint)

* Update the `krow_design_system` package to ensure all `UiButton`, `UiTextField`, and `UiCard` components include a `Semantics` wrapper with an `identifier` property.
* Example: `Semantics(identifier: 'primary_action_button', child: child)`
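
A minimal sketch of what a hardened design-system component might look like. The constructor shape and internals are illustrative assumptions, not the actual `krow_design_system` API; only the `Semantics`/`identifier` pattern comes from the plan above.

```dart
import 'package:flutter/material.dart';

// Illustrative sketch — the real UiButton API may differ.
class UiButton extends StatelessWidget {
  const UiButton({
    super.key,
    required this.semanticsId,
    required this.onPressed,
    required this.child,
  });

  final String semanticsId; // e.g. 'primary_action_button'
  final VoidCallback onPressed;
  final Widget child;

  @override
  Widget build(BuildContext context) {
    // The identifier surfaces to the platform accessibility layer,
    // which is what Maestro targets — even in release builds.
    return Semantics(
      identifier: semanticsId,
      child: ElevatedButton(onPressed: onPressed, child: child),
    );
  }
}
```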

### Phase 2: Core Flow Implementation

* Create a `/maestro` directory in each app's root.
* Implement the "Golden Flows": `login.yaml`, `signup.yaml`, `post_job.yaml`, and `check_in.yaml`.

### Phase 3: CI/CD Integration

* Configure GitHub Actions to trigger `maestro test` on every PR merged into `dev`.
* Establish "Release Build Verification", where Maestro runs against the final `.apk`/`.ipa` before staging deployment.
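
As a sketch, the Phase 3 trigger could look like the workflow below. Action versions, the install URL, and the emulator setup (omitted here) are assumptions to be verified against the current Maestro documentation.

```yaml
# .github/workflows/maestro.yml — illustrative sketch only.
name: Maestro E2E
on:
  push:
    branches: [dev]   # runs after a PR is merged into dev
jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Install the Maestro CLI (install script URL is an assumption):
      - run: curl -fsSL "https://get.maestro.mobile.dev" | bash
      - run: echo "$HOME/.maestro/bin" >> "$GITHUB_PATH"
      # Emulator boot omitted; run the golden flows from the app root:
      - run: maestro test maestro/
```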

### Phase 4: Clean Up

* Remove `marionette_flutter` from `pubspec.yaml` to keep our production binary size optimal and security surface area low.

---

## 6. Final Verdict

**Maestro** is the engine for our automation, while **Marionette MCP** remains a powerful tool for developers to use locally for code exploration and rapid UI debugging. We will move forward with **Maestro** for all regression and release-blocking test suites.

---