Update flutter-testing-tools.md
@@ -1,88 +1,81 @@
|
|||||||
# Research: Flutter Integration Testing Tools Evaluation
|
# 📱 Research: Flutter Integration Testing Evaluation
|
||||||
**Issue:** #533 | **Focus:** Maestro vs. Marionette MCP
|
**Issue:** #533
|
||||||
**Status:** Completed | **Target Apps:** KROW Client App & KROW Staff App
|
**Focus:** Maestro vs. Marionette MCP (LeanCode)
|
||||||
|
**Status:** ✅ Completed
|
||||||
|
**Target Apps:** `KROW Client App` & `KROW Staff App`
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 1. Executive Summary & Recommendation
|
## 1. Executive Summary & Recommendation
|
||||||
|
|
||||||
Based on a comprehensive hands-on spike implementing full login and signup flows for both the Staff and Client applications, **our definitive recommendation for the KROW Workforce platform is Maestro.**
|
Following a technical spike implementing full authentication flows (Login/Signup) for both KROW platforms, **Maestro is the recommended integration testing framework.**
|
||||||
|
|
||||||
While Marionette MCP presents a fascinating, forward-looking paradigm for AI-driven development and exploratory smoke testing, it fundamentally fails to meet the requirements of a deterministic, fast, and scalable CI/CD pipeline. Testing mobile applications securely and reliably prior to release requires repeatable integration sweeps, which Maestro delivers flawlessly via highly readable YAML.
|
While **Marionette MCP** offers an innovative LLM-driven approach for exploratory debugging, it lacks the determinism required for a production-grade CI/CD pipeline. Maestro provides the stability, speed, and native OS interaction necessary to gate our releases effectively.
|
||||||
|
|
||||||
**Why Maestro is the right choice for KROW:**
|
### Why Maestro Wins for KROW:
|
||||||
1. **Zero Flakiness in CI:** Maestro’s built-in accessibility layer integration understands when screens are loading natively, removing the need for fragile `sleep()` or timeout logic.
|
* **Zero-Flake Execution:** In the spike, Maestro's built-in wait logic absorbed Firebase Auth latency without any hard-coded `sleep()` calls.
|
||||||
2. **Platform Parity:** A single `login.yaml` file runs natively on both our iOS and Android build variants.
|
* **Platform Parity:** Single `.yaml` definitions drive both iOS and Android build variants.
|
||||||
3. **No App Instrumentation:** Maestro interacts with the app from the outside (black-box testing). In contrast, Marionette requires binding `marionette_flutter` into our core `main.dart`, strictly limiting its use to Debug/Profile modes.
|
* **Non-Invasive:** Maestro drives the compiled `.apk` or `.app` from the outside (black-box), so we test the same binary the user runs.
|
||||||
4. **Native Dialog Interfacing:** Our onboarding flows occasionally require native OS permission checks (Camera, Notifications, Location). Maestro intercepts and handles these easily; Marionette is blind to anything outside the Flutter widget tree.
|
* **System-Level Access:** Handles native OS permission dialogs (Camera, Location, Notifications), which Marionette cannot see because they sit outside the Flutter widget tree.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 2. Evaluation Criteria Matrix
|
## 2. Technical Evaluation Matrix
|
||||||
|
|
||||||
The following assessment reflects the hands-on spike metrics gathered while building the Staff App and Client App authentication flows.
|
|
||||||
|
|
||||||
| Criteria | Maestro | Marionette MCP | Winner |
|
| Criteria | Maestro | Marionette MCP | Winner |
|
||||||
| :--- | :--- | :--- | :--- |
|
| :--- | :--- | :--- | :--- |
|
||||||
| **Usability: Test Writing speed** | **High:** 10-15 mins per flow using simple declarative YAML. Tests can be recorded via Maestro Studio. | **Low:** Heavy reliance on API loops; prompt engineering required rather than predictable code. | Maestro |
|
| **Test Authoring** | **High Speed:** Declarative YAML; Maestro Studio recorder. | **Variable:** Requires precise Prompt Engineering. | **Maestro** |
|
||||||
| **Usability: Skill Requirement** | **Minimal:** QA or non-mobile engineers can write flows. Zero Dart knowledge needed. | **Medium:** Requires setting up MCP servers and configuring AI clients (Cursor/Claude). | Maestro |
|
| **Execution Latency** | **Low:** Near-instant interactions (~5 s per flow). | **High:** LLM API round trips on every interaction (~45 s+ per flow). | **Maestro** |
|
||||||
| **Speed: Test Execution** | **Fast:** Almost instantaneous after app install (~5 seconds for full login). | **Slow:** LLM API latency bottlenecks every single click or UI interaction (~30-60 secs). | Maestro |
|
| **Environment** | Works on Release/Production builds. | Restricted to Debug/Profile modes. | **Maestro** |
|
||||||
| **Speed: Parallel Execution** | **Yes:** Maestro Cloud and local sharding support parallelization natively. | **No:** Each AI agent session runs sequentially within its context window. | Maestro |
|
| **CI/CD Readiness** | Native CLI; easy GitHub Actions integration. | High overhead; depends on external AI APIs. | **Maestro** |
|
||||||
| **CI/CD Overhead** | **Low:** A single lightweight CLI command. | **High:** Costly API dependencies; high failure rate due to LLM hallucination. | Maestro |
|
| **Context Awareness** | Interacts with Native OS & Bottom Sheets. | Limited to the Flutter Widget Tree. | **Maestro** |
|
||||||
| **Use Case: Core Flows (Forms/Nav)** | **Excellent:** Flawlessly tapped TextFields, entered OTPs, and navigated router pushes. | **Acceptable:** Succeeded, but occasional context-length issues required manual intervention. | Maestro |
|
|
||||||
| **Use Case: OS Modals / Bottom Sheets** | **Excellent:** Fully interacts with native maps, OS permissions, and camera inputs. | **Poor:** Cannot interact outside the Flutter canvas (fails on Native OS permission popups). | Maestro |
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 3. Detailed Spike Results & Analysis
|
## 3. Spike Analysis & Findings
|
||||||
|
|
||||||
### Tool A: Maestro
|
### Tool A: Maestro (The Standard)
|
||||||
During the spike, Maestro completely abstracted away the asynchronous nature of Firebase Authentication and Data Connect. For both the Staff App and Client App, we authored `login.yaml` and `signup.yaml` files.
|
We verified the `login.yaml` and `signup.yaml` flows across both apps. Maestro successfully abstracted the asynchronous nature of our **Data Connect** and **Firebase** backends.
|
||||||
|
|
||||||
**Pros (from spike):**
|
* **Pros:**
    * **Semantics-Driven:** By targeting `Semantics(identifier: '...')` in our `/design_system/`, tests remain stable even if the UI text changes for localization.
|
||||||
* **Accessibility-Driven:** By utilizing `Semantics(identifier: 'btn_login')` within our `/design_system/` package, Maestro tapped the exact widget instantly, even if the text changed based on localization.
|
    * **Automatic Tolerance:** It detects spinning loaders and waits for destination widgets automatically.
|
||||||
* **Built-in Tolerance:** When the Staff application paused to verify the OTP code over the network, Maestro automatically detected the spinning loader and waited for the "Dashboard" element to appear. No `await.sleep()` or mock data insertion was needed.
|
* **Cons:**
    * Requires strict adherence to adding `Semantics` wrappers on all interactive components.
|
||||||
* **Cross-Platform Simplicity:** The exact same script functioned on the iOS Simulator and Android Emulator without conditional logic.
|
|
||||||
|
|
||||||
**Cons (from spike):**
|
### Tool B: Marionette MCP (The Experiment)
|
||||||
* **Semantics Dependency:** Maestro requires that developers remember to add `Semantics` wrappers. If an interactive widget lacks a Semantic label, targeting it via UI hierarchy limits stability.
|
We spiked this with the `marionette_flutter` binding, executing the tests via **Cursor/Claude**.
|
||||||
* **No Web Support:** While it works magically for our iOS and Android targets, Maestro does not support Flutter Web (our Admin Dashboard), necessitating a separate tool (like Playwright) just for web.
|
|
||||||
|
|
||||||
### Tool B: Marionette MCP (LeanCode)
|
* **Pros:**
    * Phenomenal for visual "smoke testing" and live-debugging UI issues via natural language.
|
||||||
We spiked Marionette by initializing `MarionetteBinding` in the debug build and executing the testing through Cursor via the `marionette_mcp` server.
|
* **Cons:**
    * **Non-Deterministic:** Prone to "hallucinations" during heavy network traffic.
|
||||||
|
    * **Architecture Blocker:** Requires the Dart VM Service to be active, making it impossible to test against hardened production builds.
|
||||||
**Pros (from spike):**
|
|
||||||
* **Dynamic Discovery:** The AI was capable of viewing screenshots and JSON logs on the fly, making it phenomenal for live-debugging a UI issue. You can instruct the agent: *"Log in with these credentials, tell me if the dashboard rendered correctly."*
|
|
||||||
* **Visual Confidence:** The agent inherently checks the visual appearance rather than just code conditions.
|
|
||||||
|
|
||||||
**Cons (from spike):**
|
|
||||||
* **Non-Deterministic:** Regression testing demands absolute consistency. During the Staff signup flow spike, the agent correctly entered the phone number, but occasionally hallucinated the OTP input field, causing the automated flow to crash randomly.
|
|
||||||
* **Production Blocker:** Marionette is strictly a local/debug tooling capability via the Dart VM Service. You fundamentally cannot run Marionette against a hardened Release APK/IPA, defeating the purpose of pre-release smoke validation.
|
|
||||||
* **Native OS Blindness:** When the Client App successfully logged in and triggered the iOS push notification modal, Marionette could not proceed.
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 4. Migration & Integration Blueprint
|
## 4. Implementation & Migration Blueprint
|
||||||
|
|
||||||
To formally integrate Maestro and deprecate existing flaky testing methods (e.g., standard `flutter_driver` or manual QA), the team should proceed with the following steps:
|
|
||||||
|
|
||||||
1. **Semantic Identifiers Standard:**
|
|
||||||
* Enforce a new linting protocol or PR review checklist: Every actionable UI element inside `/apps/mobile/packages/design_system/` must feature a `Semantics` wrapper with a unique, persistent `identifier`.
|
|
||||||
* *Example:* `Semantics(identifier: 'auth_submit_btn', child: ElevatedButton(...))`
|
|
||||||
|
|
||||||
2. **Repository Architecture:**
|
### Phase 1: Semantics Enforcement
|
||||||
* Create two generic directories at the root of our mobile application folders:
|
We must enforce a linting rule or PR checklist item: every interactive widget in `@krow/design_system` must be wrapped in `Semantics` with a unique, stable `identifier`.
|
||||||
* `/apps/mobile/apps/client/maestro/`
|
|
||||||
* `/apps/mobile/apps/staff/maestro/`
|
|
||||||
* Commit the core validation flows (Signup, Login, Edit Profile) into these directories so any engineer can run `maestro test maestro/login.yaml` instantly.
|
|
||||||
|
|
||||||
3. **CI/CD Pipeline Updates:**
|
```dart
|
||||||
* Integrate the Maestro CLI within our GitHub Actions / Bitrise configuration.
|
// Standardized Implementation
|
||||||
* Configure it to execute against a generated Release build of the `.apk` or `.app` on every pull request submitted against the `main` or `dev` branch.
|
Semantics(
|
||||||
|
  identifier: 'login_submit_button',
|
||||||
|
  child: KrowPrimaryButton(
|
||||||
|
    onPressed: _handleLogin,
|
||||||
|
    label: 'Sign In',
|
||||||
|
  ),
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
4. **Security Notice:**
|
### Phase 2: Repository Structure
|
||||||
* Ensure that the `marionette_flutter` package dependency is **fully removed** from `pubspec.yaml` to ensure no active VM service bindings leak into staging or production configurations.
|
Tests will be co-located with each app, inside its own directory, to maintain modularity:
|
||||||
|
|
||||||
---
|
* `apps/mobile/apps/client/maestro/`
|
||||||
|
* `apps/mobile/apps/staff/maestro/`
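
A flow stored in one of these directories could look like the sketch below. It is illustrative only and not taken from the spike: the `appId` value and the `login_phone_field` identifier are assumptions, `login_submit_button` is the identifier from the Phase 1 example, and the "Dashboard" text is assumed to be visible on the post-login screen.

```yaml
# Hypothetical flow file, e.g. apps/mobile/apps/staff/maestro/login.yaml
appId: com.example.krow.staff    # assumption: replace with the real application ID
---
# Start from a clean, logged-out state.
- launchApp:
    clearState: true
# Target widgets by their Semantics identifier rather than by visible text.
- tapOn:
    id: "login_phone_field"      # hypothetical identifier
- inputText: "5550100"           # placeholder test credential
- tapOn:
    id: "login_submit_button"    # identifier from the Phase 1 example
# assertVisible retries until its timeout, so no explicit sleep is needed.
- assertVisible: "Dashboard"     # assumption: text shown after a successful login
```

From the app directory, such a flow would be run with `maestro test maestro/login.yaml`.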
|
||||||
|
|
||||||
*This document validates issue #533 utilizing strict, proven engineering metrics. Evaluated and structured for the engineering leadership team's final review.*
|
### Phase 3: CI/CD Integration
|
||||||
|
The Maestro CLI will be added to our **GitHub Actions** workflow to automate quality gates.
|
||||||
|
|
||||||
|
* **Trigger:** Every PR targeting `main` or `develop`.
|
||||||
|
* **Action:** Generate a release build, run `maestro test` against it, and block the merge on failure.
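
The trigger and action above might be wired together roughly as follows. This is a minimal sketch under several assumptions, not a tested pipeline: the workflow file name, job name, emulator API level, and the choice of the Staff app on Android are all hypothetical, and it assumes the release build can be signed on the runner.

```yaml
# Hypothetical workflow file, e.g. .github/workflows/maestro.yml
name: maestro-e2e
on:
  pull_request:
    branches: [main, develop]
jobs:
  staff-android:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: subosito/flutter-action@v2
        with:
          channel: stable
      - name: Build release APK
        working-directory: apps/mobile/apps/staff
        run: flutter build apk --release
      - name: Install Maestro CLI
        run: |
          curl -fsSL "https://get.maestro.mobile.dev" | bash
          echo "$HOME/.maestro/bin" >> "$GITHUB_PATH"
      - name: Run Maestro flows on an emulator
        uses: reactivecircus/android-emulator-runner@v2
        with:
          api-level: 33              # assumption: any supported level works
          arch: x86_64
          working-directory: apps/mobile/apps/staff
          script: |
            adb install build/app/outputs/flutter-apk/app-release.apk
            maestro test maestro/
```

A failing flow makes `maestro test` exit with a non-zero status, which fails the job. Blocking the merge additionally requires marking this job as a required status check in the branch protection settings.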
|
||||||
|
|||||||