Cloudflare Workers Subrequest Limit — Post-mortem

TL;DR ¶

Deploying a Jira time-report API on Cloudflare Pages (Workers edge runtime) caused silent data loss for 25 out of 57 issues. The root cause was Cloudflare Workers' 50-subrequest-per-invocation limit on the free tier. Fixed by embedding worklogs in the JQL search response instead of making separate API calls per issue.

The Problem ¶

When selecting all 9 Jira boards on the time report page, team member “Ngọc Định Nguyễn” showed 8h instead of the correct 30.5h. Selecting only boards 36+37 showed the correct value.

Investigation Timeline ¶

Attempt 1: Sequential board processing (failed) ¶

Hypothesis: The combined JQL query across all boards was silently dropping some projects.

Fix tried: Query each board independently in a for loop, merge results.

Result: Same data loss. The issue wasn’t in the JQL query — all 57 issues were found correctly.

Attempt 2: Deduplication bug (failed) ¶

Hypothesis: A processedIssueKeys Set was marking issues as “done” even when their worklog fetch failed. Later boards seeing the same issue would skip it.

Fix tried: Remove cross-board deduplication, deduplicate worklogs by ID in the aggregation step instead.

Result: Fixed a real bug, but data still lost on Cloudflare. Worked perfectly locally.

Attempt 3: Rate limiting / batch size (failed) ¶

Hypothesis: Jira API rate limit (429) causing worklog fetches to silently return [].

Fix tried:

Reduce batch size from 10 → 3, increase delay 100ms → 400ms
Add 3-attempt retry with exponential backoff for 429/5xx

Result: Still failed on Cloudflare. Worked locally. The setTimeout delays were eating into Cloudflare’s CPU time budget.

Attempt 4: Full parallel, no sleep (failed) ¶

Hypothesis: setTimeout is unreliable on Cloudflare Workers edge runtime.

Fix tried: Fire all 57 worklog requests in parallel with Promise.all, no delays.

Result: Same data loss on Cloudflare (42/79 worklogs). Locally: 2.5s, all data correct.

Attempt 5: Concurrency limiter (failed) ¶

Hypothesis: 57 concurrent requests overwhelming something.

Fix tried: Semaphore pattern — process in chunks of 5-8, no setTimeout.

Result: Same. At this point, added debug logging to surface the actual error.

Breakthrough: Error surfacing ¶

Added sentinel objects to track which issues failed and why:

STATUS: Failed 25 issues (sample error: status=none msg=Too many subrequests 
by single Worker invocation. To configure this limit, refer to 
https://developers.cloudflare.com/workers/wrangler/configuration/#limits)

The error was not a Jira rate limit. It was a Cloudflare platform limit.

Attempt 6: wrangler.toml `[limits]` (failed) ¶

Fix tried: Add [limits] subrequests = 1000 to wrangler.toml.

Result: This setting requires a paid Workers plan and doesn’t apply to Pages on the free tier.

Attempt 7: Embed worklogs in JQL response (SUCCESS) ¶

Insight: Instead of making 57 separate /issue/{key}/worklog API calls, request the worklog field directly in the JQL search. Jira embeds worklogs in each issue’s response.

Fix: Add 'worklog' to the JQL fields array, then extract worklogs from issue.fields.worklog.worklogs[]. Only fall back to the dedicated worklog endpoint when Jira truncates (total > returned count).

Subrequest count: 9 board JQL calls + ~0-5 fallback calls = ~9-14 total (well under 50 limit).

Result: All data correct on Cloudflare. Ngọc Định = 30.5h ✓.

Key Lessons ¶

1. Cloudflare Workers has a hard 50-subrequest limit (free tier) ¶

Every fetch() call from your Worker counts as a subrequest. This includes:

API calls to external services (Jira, GitHub, etc.)
Even failed/retried requests count

The limit is per invocation, not per second. You can’t work around it with batching or delays.

Paid plan: 1000 subrequests (configurable via wrangler.toml).

2. The error is silent by default ¶

When you hit the limit, subsequent fetch() calls throw a generic error with status=none. There’s no special HTTP status code or header — it looks like a network failure. If your code catches errors and returns [], the data loss is completely invisible.

3. setTimeout is problematic on edge runtime ¶

Cloudflare Workers count setTimeout delays toward CPU time limits. A pattern like “batch of 5 + 400ms delay” that works locally can cause different failures on the edge (timeouts, unexpected behavior).

4. Embed data in search queries when possible ¶

Instead of:

1 search query → N results → N individual detail queries

Use:

1 search query with extra fields → N results with details embedded

Most APIs (Jira, GitHub, etc.) support requesting additional fields in search/list endpoints. This is always preferable on platforms with subrequest limits.

5. Local testing doesn’t catch platform limits ¶

The code worked perfectly locally (Node.js has no subrequest limit). Always test on the actual deployment platform, especially for:

Subrequest/fetch limits
CPU time limits
Memory limits
Response size limits

6. Debug logging should surface errors, not hide them ¶

The original code:

catch (error) {
  return []; // silently swallow
}

Should have been:

catch (error) {
  sendStatus(controller, `Failed: ${error.message}`);
  return [];
}

Surface errors in the response stream so the UI can show them.

Architecture Before vs After ¶

Before (66 subrequests) ¶

Board 1 → JQL search → issue list     ─┐
Board 2 → JQL search → issue list      │  9 requests
...                                     │
Board 9 → JQL search → issue list     ─┘

Issue 1 → GET /worklog                 ─┐
Issue 2 → GET /worklog                  │  57 requests
...                                     │
Issue 57 → GET /worklog                ─┘

Total: 66 subrequests ❌ (exceeds 50 limit)

After (9 subrequests) ¶

Board 1 → JQL search (with worklog field) → issues + worklogs  ─┐
Board 2 → JQL search (with worklog field) → issues + worklogs   │  9 requests
...                                                              │
Board 9 → JQL search (with worklog field) → issues + worklogs  ─┘

Extract worklogs from issue.fields.worklog (no API call needed)
Fallback: only if worklog.total > worklogs.length (rare)

Total: ~9 subrequests ✓

Performance Comparison ¶

Metric	Sequential (v1)	Parallel+sleep (v2)	Embedded (final)
Subrequests	66	66	~9
Local time	25s	20s	2.5s
Cloudflare	❌ data loss	❌ data loss	✓ correct
Ngọc Định hours	8h (wrong)	8h (wrong)	30.5h (correct)

Date: 2026-04-03 | Project: jira-report | Platform: Cloudflare Pages (Workers edge runtime)