Cloudflare Outage Analysis (Feb 20, 2026): Connectivity Issues, 403 Errors, and Workers AI 429 Spikes

On February 20, 2026, Cloudflare reported multiple service disruptions: connectivity issues for a subset of BYOIP prefixes, elevated 403 errors on the 1.1.1.1 landing page, and elevated 429 error rates in Workers AI. While core services were not fully down, users experienced timeouts, HTTP 403 responses, and HTTP 429 rate-limit errors.

This article provides a complete technical breakdown of:

  • What happened
  • Which services were affected
  • Why specific HTTP errors occurred
  • The networking layer implications
  • Lessons for DevOps and infrastructure engineers
  • How to diagnose similar issues in your own environment

Incident Overview

Cloudflare reported three primary issues:

  1. Connectivity issues affecting a subset of BYOIP prefixes
  2. Elevated 403 errors on the 1.1.1.1 landing page
  3. Workers AI elevated 429 error rates

Each issue had a different root cause and affected a different layer of the stack.


1. Connectivity Issues and BYOIP Prefix Impact

What Happened?

Cloudflare identified an issue affecting a subset of BYOIP (Bring Your Own IP) prefixes. The company mitigated the underlying problem and advised impacted customers to re-advertise their prefixes via the dashboard.

Users may have experienced:

  • Timeouts
  • Routing failures
  • Inability to reach certain services
  • Intermittent packet loss

What Is BYOIP?

BYOIP allows customers to use their own IP address ranges while leveraging Cloudflare’s global network.

This means:

  • The customer owns the IP block
  • Cloudflare announces (advertises) it via BGP
  • Traffic flows through Cloudflare’s edge

If prefix advertisement fails or becomes inconsistent, routing problems occur.
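
To see how a prefix is registered in an Internet Routing Registry, one quick check is a whois query against RADb (a sketch; 192.0.2.0/24 is a documentation placeholder, and a standard whois client is assumed):

whois -h whois.radb.net 192.0.2.0/24

If the route object is missing or stale, downstream networks may filter the announcement.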


Why Prefix Advertisement Matters

Cloudflare relies on BGP (Border Gateway Protocol) to announce IP prefixes across the internet.

If:

  • A prefix is withdrawn
  • An advertisement becomes inconsistent
  • A route propagation issue occurs

Then traffic destined for that prefix may:

  • Be blackholed (silently dropped)
  • Be routed incorrectly
  • Time out entirely

Likely Technical Scenario

While full internal details were not disclosed, typical causes include:

  • Route dampening
  • Incorrect BGP attributes
  • Edge propagation delay
  • Partial prefix withdrawal

Because only a subset of BYOIP prefixes was affected, the evidence points to a targeted advertisement inconsistency rather than a full global routing failure.


How to Diagnose Similar Issues

If you suspect prefix routing issues:

Check prefix registration and ownership:

whois <IP>

Use route views:

https://bgp.he.net/

Test connectivity:

traceroute <IP>
mtr <IP>
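
For an unattended snapshot, mtr's report mode aggregates per-hop loss and latency over a fixed number of cycles:

mtr --report --report-cycles 10 <IP>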

Validate DNS resolution:

dig yourdomain.com

If DNS resolves but traffic times out, suspect routing layer problems.
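
To rule out a resolver-specific problem, compare answers from two different public resolvers:

dig @1.1.1.1 yourdomain.com +short
dig @8.8.8.8 yourdomain.com +short

Matching answers combined with failing traffic strengthen the case for a routing-layer issue.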


2. Elevated 403 Errors on 1.1.1.1 Landing Page

Important Clarification

Cloudflare stated:

DNS resolution using Public DNS resolver 1.1.1.1 is not impacted.

This is critical.

The DNS service remained operational.

The landing page returned elevated 403 errors.


Why a Landing Page Can Fail While DNS Works

  1. The DNS resolver runs as a separate service
  2. The web landing page is an HTTP service
  3. CDN caching and access-control policies apply at the HTTP layer

Thus:

  • DNS (UDP/TCP port 53) may function normally
  • The HTTPS landing page may still return 403
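
You can verify this separation directly: the first command exercises the resolver, the second fetches the landing page over HTTPS (example.com stands in for any resolvable domain):

dig @1.1.1.1 example.com +short
curl -I https://1.1.1.1

During this incident, the first would succeed while the second could return 403.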

What Does HTTP 403 Mean?

403 = Forbidden.

Possible causes:

  • WAF rule triggered
  • Access policy misconfiguration
  • Origin permission problem
  • Edge misrouting
  • Cache configuration error

Since Cloudflare mentioned cache purging and impact to the debugging page, this suggests:

  • An edge configuration issue
  • Cache invalidation inconsistency
  • A possible permission-scope misalignment

How to Reproduce / Diagnose 403 Errors

curl -I https://1.1.1.1

Look for:

  • Server headers
  • CF-Ray header
  • CF-Cache-Status
  • Response time
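
To surface just those headers, filter the response (a simple grep sketch):

curl -sI https://1.1.1.1 | grep -iE 'server|cf-ray|cf-cache-status'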

Compare behavior from multiple regions.


Operational Insight

This type of issue highlights separation between:

  • Network layer
  • Application layer
  • DNS layer
  • Edge cache layer

Understanding these layers prevents false assumptions during incidents.


3. Workers AI Elevated 429 Error Rates

What Happened?

Users experienced elevated rates of HTTP 429 errors when calling Workers AI.

429 = Too Many Requests.

This indicates rate limiting.


Why 429 Happens

Possible causes:

  • Sudden traffic spike
  • AI model resource saturation
  • Token limit exhaustion
  • Regional compute imbalance
  • Auto-scaling delay

AI workloads are bursty and GPU-bound. If demand exceeds scaling capacity, rate limiting protects the system.


Understanding 429 in Depth

A typical 429 response includes:

Retry-After: <seconds>

This header tells the client how long to wait before retrying.

Well-designed clients should:

  • Implement exponential backoff
  • Respect the Retry-After header
  • Avoid aggressive retry storms
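
A minimal shell sketch of that pattern, assuming a hypothetical placeholder endpoint (https://api.example.com) and standard curl/awk:

url="https://api.example.com"            # placeholder endpoint
for attempt in 1 2 3 4 5; do
  # -D writes response headers to a file; -w prints just the status code
  status=$(curl -s -o /dev/null -D headers.tmp -w '%{http_code}' "$url")
  [ "$status" != "429" ] && break        # stop retrying on any non-429 status
  # Prefer the server-supplied Retry-After; fall back to exponential backoff
  wait=$(awk -F': ' 'tolower($1) == "retry-after" {print $2 + 0}' headers.tmp)
  sleep "${wait:-$((2 ** attempt))}"
done
rm -f headers.tmp

Production clients typically add random jitter to the backoff so that many clients do not retry in lockstep.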

Diagnosing 429 Issues

curl -I https://api.example.com

Check:

  • Retry-After header
  • Response latency
  • Region consistency
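
To capture the status code, the Retry-After header, and the response latency in a single call (the endpoint is a placeholder):

curl -s -o /dev/null -D - -w 'status=%{http_code} time=%{time_total}s\n' https://api.example.com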

If 429 spikes coincide with:

  • Product launches
  • AI feature rollouts
  • High compute bursts

Then capacity planning needs adjustment.


Timeline Analysis

The incident lifecycle followed this pattern:

  1. Investigating
  2. Identified
  3. Mitigated
  4. Monitoring

This is standard SRE incident management flow.


Architectural Lessons

1. Separation of Layers Matters

DNS availability does not equal HTTP availability.

Routing availability does not equal application availability.

Engineers must isolate:

  • L3 (Network)
  • L4 (Transport)
  • L7 (Application)

2. BYOIP Adds Complexity

While powerful, BYOIP introduces:

  • Additional BGP dependencies
  • Propagation risks
  • Edge advertisement responsibility

Organizations using BYOIP must:

  • Monitor prefix visibility
  • Use multi-vendor route validation
  • Set up BGP alerting
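
As one hedged sketch of prefix-visibility monitoring, the public RIPEstat data API can be polled for a prefix's routing status (192.0.2.0/24 is a documentation placeholder; jq is assumed to be installed):

curl -s "https://stat.ripe.net/data/routing-status/data.json?resource=192.0.2.0/24" | jq '.data'

Alerting on a drop in the number of BGP peers that see the prefix catches partial withdrawals early.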

3. Rate Limiting Is Protection, Not Failure

429 errors often mean:

The system is protecting itself.

But:

  • Poor client retry design can worsen incidents
  • Retry storms amplify load
  • Backoff logic must be implemented correctly

DevOps Playbook: What To Do During Similar Incidents

Step 1: Validate DNS

dig domain.com

Step 2: Test TCP Connectivity

nc -zv domain.com 443

Step 3: Check HTTP Layer

curl -I https://domain.com

Step 4: Compare Multiple Regions

Use:

  • VPN
  • Remote monitoring nodes
  • Cloud VMs in different regions
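
Putting the steps together, a minimal sketch of the playbook as one script (example.com is a placeholder; dig, nc, and curl are assumed to be available):

host="example.com"   # substitute the affected domain
echo "== DNS ==";  dig +short "$host"
echo "== TCP ==";  nc -zv -w 5 "$host" 443
echo "== HTTP =="; curl -sI --max-time 10 "https://$host" | head -n 5

If DNS resolves but the TCP check fails, suspect the routing layer; if TCP succeeds but HTTP returns 403 or 429, the problem sits at the application or edge layer.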

Was This a Major Outage?

No full global DNS failure occurred.

However:

  • Routing issues for a subset of BYOIP prefixes
  • HTTP access disruptions on the 1.1.1.1 landing page
  • Rate limiting in Workers AI

These represent partial service degradation.

Modern infrastructure rarely fails completely. It degrades in layers.

Understanding partial degradation is critical for modern engineers.


Strategic Takeaways for Infrastructure Teams

  1. Monitor BGP announcements continuously
  2. Separate DNS health from HTTP health monitoring
  3. Implement robust rate limiting observability
  4. Use synthetic monitoring from multiple continents
  5. Document incident timelines thoroughly

Final Thoughts

This incident demonstrates how modern cloud infrastructure failures are rarely binary.

Instead, they involve:

  • Prefix advertisement complexity
  • Layered service architecture
  • Rate limiting safeguards
  • Cache and edge configuration subtleties

For engineers building on global networks, the key lesson is:

You must think in layers.

Routing.
DNS.
Edge.
Application.
AI compute.

Each layer can fail independently.

And diagnosing them requires disciplined isolation.