# Hotfix Process

**For Emergency Production Fixes**

---

## 🚨 When to Hotfix

Use a hotfix when:
- ✅ Critical bug in production affecting users
- ✅ Data loss or security vulnerability
- ✅ Service unavailable or major feature broken
- ✅ Customer-blocking issue

**Don't use a hotfix for:**
- ❌ Minor bugs (can wait for next release)
- ❌ Feature requests
- ❌ Nice-to-have improvements
- ❌ Styling issues

---

## 🔄 Hotfix Process

### Step 1: Assess & Declare Emergency

```
Issue: [Brief description]
Severity: CRITICAL / HIGH / MEDIUM
Product: [Staff Mobile / Client Mobile / Web / Backend]
Environment: Production
Impact: [How many users affected]
```

Once severity is confirmed as CRITICAL or HIGH, start the hotfix immediately. MEDIUM issues can wait for the next release.

---

### Step 2: Create Hotfix Branch

```bash
# From production tag
git checkout -b hotfix/krow-withus-worker-mobile-v0.1.1 \
  krow-withus-worker-mobile/prod-v0.1.0

# Verify you're on the right tag
git log -1 --oneline
```

**Format**: `hotfix/-v`

---

### Step 3: Fix the Bug

```bash
# Make your fix
# Edit files, test locally

# Stage only the changes that belong to the fix
git add -p

# Commit with clear message
git commit -m "fix: [issue description]

HOTFIX for production

Issue: [what happened]
Solution: [what was fixed]
Tested: [how was it tested locally]"
```

**Keep it minimal:**
- Only fix the specific bug
- Don't refactor or optimize
- Don't add new features

---

### Step 4: Update Version

Update PATCH version only (0.1.0 → 0.1.1):

**For Mobile** (`apps/mobile/apps/*/pubspec.yaml`):
```yaml
# Old
version: 0.1.0+5

# New
version: 0.1.1+6  # PATCH bumped, build number incremented
```

**For Web** (`apps/web/package.json`):
```json
"version": "0.1.1"
```

**For Backend** (`backend/*/package.json`):
```json
"version": "0.1.1"
```

---

### Step 5: Update CHANGELOG

Add an entry to the **top** of the appropriate CHANGELOG:

```markdown
| 2026-03-05 | 0.1.1 | HOTFIX: [Issue fixed] |

(previous entries below...)
```

---

### Step 6: Code Review (Expedited)

```bash
# Push hotfix branch
git push origin hotfix/krow-withus-worker-mobile-v0.1.1

# Create PR on GitHub with URGENT label (assumes an URGENT label exists in the repo)
gh pr create --title "HOTFIX: [Issue description]" \
  --label "URGENT" \
  --body "**URGENT PRODUCTION FIX**

Issue: [What was broken]
Impact: [Users affected]
Solution: [What was fixed]
Testing: [Local verification]"
```

**Get approval within 15 minutes if possible.**

---

### Step 7: Merge to Main

```bash
# Review complete - merge
git checkout main
git pull origin main
# --ff-only aborts if main has commits that the hotfix branch lacks
git merge --ff-only hotfix/krow-withus-worker-mobile-v0.1.1
git push origin main
```

---

### Step 8: Create Production Tag

```bash
# Create tag from main
git tag -a krow-withus-worker-mobile/prod-v0.1.1 \
  -m "HOTFIX: [Issue fixed]"

git push origin krow-withus-worker-mobile/prod-v0.1.1
```

---

### Step 9: Deploy to Production

```bash
# Follow your deployment procedure
# Higher priority than normal releases
./scripts/deploy-mobile-production.sh krow-withus-worker-mobile/prod-v0.1.1
```

**Deployment time**: Within 30 minutes of approval

---

### Step 10: Verify & Monitor

```bash
# Smoke tests
# - App launches
# - Core features work
# - No new errors

# Monitor for 2 hours
# - Watch error logs
# - Check user reports
# - Verify fix worked
```

---

### Step 11: Communicate

**Immediately after deployment:**

```markdown
🚨 PRODUCTION HOTFIX DEPLOYED

Product: Worker Mobile
Version: 0.1.1
Issue: [Fixed issue]
Impact: [Resolved for X users]
Status: ✅ Deployed & verified

No user action required.
Service restored to normal.
```

**24 hours later:**

```markdown
✅ HOTFIX STATUS UPDATE

Production hotfix v0.1.1 deployed 24 hours ago.
Zero errors reported post-deployment.
System stable.

Thank you for your patience!
```

---

## ⏱️ Timeline

```
T+0:      Issue detected & reported
T+5min:   Severity assessed, hotfix branch created
T+15min:  Fix implemented, code review started
T+30min:  Approved & merged, tag created
T+45min:  Deployed to production
T+60min:  Smoke tests pass, monitoring enabled
T+120min: Declare emergency resolved, communicate
T+1day:   Follow-up communication
```

**Total time: 2-4 hours from detection to resolution, including post-deploy monitoring**

---

## 🚫 Common Mistakes to Avoid

❌ **Don't**:
- Skip code review (even in an emergency)
- Bundle multiple unrelated fixes into one hotfix
- Forget to update the version number
- Forget the CHANGELOG entry
- Deploy without testing
- Forget to communicate with users

✅ **Do**:
- Keep the hotfix minimal and focused
- Test every fix locally first
- Get at least one approval
- Update all version files
- Deploy immediately after approval
- Monitor actively for 2+ hours

---

## 📋 Hotfix Checklist

Copy for each emergency:

```
Hotfix: [Product] v[Old Version] → v[New Version]

□ Severity assessed & documented
□ Branch created from production tag
□ Bug fixed & tested locally
□ Version number updated (PATCH only)
□ CHANGELOG entry added
□ Commit message clear
□ Code review requested (marked URGENT)
□ Approval obtained
□ Merged to main
□ Production tag created
□ Tag pushed to remote
□ Deployed to production
□ Smoke tests passed
□ Error logs monitored (2+ hours)
□ Users notified
□ GitHub Release created
□ Incident documented

Total Time: ___ minutes
```

---

## 🔍 Post-Incident

After the emergency is resolved:

1. **Document what happened**
   - Root cause analysis
   - Why it wasn't caught before
   - What testing was missed

2. **Schedule postmortem** (within 24 hours)
   - Review what went wrong
   - Discuss prevention
   - Update processes if needed

3. **Plan prevention**
   - Add test coverage
   - Update CI/CD checks
   - Improve monitoring

4. **Communicate findings**
   - Share with team
   - Update documentation
   - Prevent recurrence

---

## 📞 Emergency Contacts

When an issue is detected:

1. **Notify**:
   - Release Engineer
   - DevOps
   - Product Owner
   - Affected Team

2. **Communication Channel**:
   - Slack: #emergency-releases
   - Time-sensitive decisions on call

3. **Decision Maker**:
   - Product Owner approves rollback vs hotfix
   - Release Engineer executes
   - DevOps monitors infrastructure

---

## 🔗 Related

- [OVERALL_RELEASE_PLAN.md](./OVERALL_RELEASE_PLAN.md) - Main release strategy
- [MOBILE_RELEASE_PLAN.md](./MOBILE_RELEASE_PLAN.md) - Mobile-specific process
- [../../CHANGELOG.md](../../CHANGELOG.md) - Version history

---

**Last Updated**: 2026-03-05

**Severity Levels**:
- 🔴 CRITICAL: Service down, data loss, security (< 1 hour)
- 🟠 HIGH: Major feature broken, workaround available (< 4 hours)
- 🟡 MEDIUM: Minor feature affected (next release OK)
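
---

The PATCH bump in Step 4 and the branch and tag names in Steps 2 and 8 follow a mechanical pattern, so they can be derived rather than typed by hand under time pressure. The following is only a minimal sketch, not part of the existing tooling: the helper names `bump_patch` and `hotfix_names` are hypothetical, and it assumes every version has the shape `MAJOR.MINOR.PATCH` with an optional numeric `+BUILD` suffix, as in the `pubspec.yaml` example.

```bash
# Hypothetical helpers (illustration only, not existing project scripts).

# bump_patch VERSION
# Increments PATCH, and the +BUILD number when one is present.
# The body runs in a subshell "( ... )" so its variables stay local.
bump_patch() (
  version="$1"
  n='(0|[1-9][0-9]*)'   # a number without leading zeros
  printf '%s\n' "$version" | grep -Eq "^$n\.$n\.$n(\+$n)?\$" || {
    echo "bump_patch: expected MAJOR.MINOR.PATCH[+BUILD], got '$version'" >&2
    exit 1
  }

  core="${version%%+*}"           # strip the optional +BUILD suffix
  build=""
  case "$version" in
    *+*) build="${version#*+}" ;; # keep only the build number
  esac

  major="${core%%.*}"
  rest="${core#*.}"
  minor="${rest%%.*}"
  patch="${rest#*.}"

  next="${major}.${minor}.$((patch + 1))"
  if [ -n "$build" ]; then
    next="${next}+$((build + 1))"
  fi
  printf '%s\n' "$next"
)

# hotfix_names PRODUCT VERSION
# Prints the branch and tag names in the pattern used in Steps 2 and 8.
hotfix_names() (
  product="$1"
  semver="${2%%+*}"               # branch and tag names omit the build number
  printf 'branch=hotfix/%s-v%s\n' "$product" "$semver"
  printf 'tag=%s/prod-v%s\n' "$product" "$semver"
)

# Usage with the values from this document
new_version="$(bump_patch "0.1.0+5")"
echo "$new_version"               # 0.1.1+6
hotfix_names "krow-withus-worker-mobile" "$new_version"
# branch=hotfix/krow-withus-worker-mobile-v0.1.1
# tag=krow-withus-worker-mobile/prod-v0.1.1
```

Rejecting malformed input up front keeps a typo from silently producing a wrong tag name in the middle of an incident. The version files themselves still need to be edited as described in Step 4.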