Arbitrum Network Outage
Incident Report for Block Farms
Postmortem

Incident Report

Incident Name: Chainlink Arbitrum Mainnet Outage

Report Date: June 7, 2023

Incident Dates: June 7, 2023, 18:00 - 19:30

Incident Duration: ~90 minutes

Incident Description:

The Arbitrum Chainlink node stopped serving requests because its RPC connection timed out. On further inspection of the RPC connections, the RPC nodes were neither producing blocks correctly nor syncing with the chain.
The network issue appeared to originate from the relayer not working properly. An RPC node upstream of the relayer kept producing blocks, but all self-hosted or non-Foundation RPC nodes were downstream of the relayer, which left them out of sync.
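
The out-of-sync condition described above can be detected by comparing the block height reported by a self-hosted RPC node against a reference endpoint. The following is a minimal sketch, not the tooling used during this incident: it assumes both endpoints speak standard Ethereum JSON-RPC (`eth_blockNumber`), and the helper names, the URLs passed in, and the `max_lag` threshold of 50 blocks are all hypothetical.

```python
import json
import urllib.request


def block_number_request(request_id=1):
    """Build the JSON-RPC payload for the standard eth_blockNumber call."""
    return {
        "jsonrpc": "2.0",
        "method": "eth_blockNumber",
        "params": [],
        "id": request_id,
    }


def parse_block_number(response):
    """eth_blockNumber returns the height as a hex string, e.g. "0x10"."""
    return int(response["result"], 16)


def fetch_block_number(rpc_url, timeout=5.0):
    """POST eth_blockNumber to rpc_url and return the height as an int.

    rpc_url is whatever endpoint the operator wants to probe (assumption).
    A timeout here surfaces the same symptom the Chainlink node saw.
    """
    body = json.dumps(block_number_request()).encode("utf-8")
    req = urllib.request.Request(
        rpc_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_block_number(json.load(resp))


def sync_lag(reference_height, local_height):
    """Number of blocks the local node is behind the reference (never negative)."""
    return max(0, reference_height - local_height)


def is_out_of_sync(reference_height, local_height, max_lag=50):
    """True when the local node trails the reference by more than max_lag blocks.

    max_lag=50 is an illustrative threshold, not a recommended value.
    """
    return sync_lag(reference_height, local_height) > max_lag
```

In use, an operator would call `fetch_block_number` once for the reference endpoint and once for the self-hosted node, then pass both heights to `is_out_of_sync`. Keeping the comparison separate from the network call makes the threshold logic easy to test without a live node.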

Root Cause:
The Arbitrum network relayer went offline because the Arbitrum Foundation had not paid its gas (ETH) bill; with 0 ETH available for gas, transactions could not be processed through the relayer.

Read the Arbitrum Foundation's full incident report:
https://arbitrumfoundation.notion.site/arbitrumfoundation/June-7-2023-Batch-Poster-Outage-d49c50df42864c7b83521fd7aa5897f2
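
A gas-funding shortfall of the kind described in the root cause is generally detectable ahead of time by polling an account's balance and alerting below a threshold. The sketch below is illustrative only and does not describe the Arbitrum Foundation's systems: it assumes the standard Ethereum JSON-RPC method `eth_getBalance`, and the account address, helper names, and threshold are hypothetical placeholders.

```python
from decimal import Decimal

WEI_PER_ETH = 10**18


def balance_request(address, request_id=1):
    """Build the JSON-RPC payload for eth_getBalance at the latest block.

    address is a placeholder for whichever account pays gas (assumption).
    """
    return {
        "jsonrpc": "2.0",
        "method": "eth_getBalance",
        "params": [address, "latest"],
        "id": request_id,
    }


def wei_from_response(response):
    """eth_getBalance returns the balance in wei as a hex string."""
    return int(response["result"], 16)


def eth_to_wei(eth):
    """Convert an ETH amount given as a string (e.g. "0.5") to integer wei.

    Decimal avoids the rounding error a float multiplication could introduce.
    """
    return int(Decimal(eth) * WEI_PER_ETH)


def is_balance_low(balance_wei, threshold_wei):
    """True when the balance has dropped below the alert threshold."""
    return balance_wei < threshold_wei
```

For example, `is_balance_low(wei_from_response(resp), eth_to_wei("0.5"))` would flag an account holding less than 0.5 ETH, where `resp` is the decoded JSON-RPC response and 0.5 ETH is an arbitrary example threshold.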

Resolution:
The team had to wait for the Arbitrum network to resolve the root cause on its end.

Preventative Measures:
None

Follow-Up:

Reviewing the incident with team members to share lessons learned and identify opportunities for improvement.
Updating documentation on incident response procedures for cases where the incident originates directly in the underlying network.
Conducting a post-incident review to evaluate the effectiveness of the preventative measures put in place.

Impact:

The incident resulted in ~90 minutes of downtime for one (1) Arbitrum Mainnet Chainlink node service. Transactions were processed intermittently whenever blockFarms RPC endpoints could sync properly, but service was degraded overall for the full 90-minute window.

Incident Owner: Matthew Gladson (Team Lead)

Posted Jun 12, 2023 - 06:59 UTC

Resolved
Incident Report

Incident Name: Chainlink Arbitrum Mainnet Outage

Components Affected:
(1) Chainlink Feed DON1 - Arbitrum Mainnet,
(2) Network RPC Services - Arbitrum Mainnet

Report Date: June 7, 2023

Incident Dates: June 7, 2023, 18:00 - 19:30

Incident Duration: ~90 minutes

Incident Description:

The Arbitrum Chainlink node stopped serving requests because its RPC connection timed out. On further inspection of the RPC connections, the RPC nodes were neither producing blocks correctly nor syncing with the chain. This issue did not originate from blockFarms; it was an Arbitrum network-wide outage.

The network issue appeared to originate from the relayer not working properly. An RPC node upstream of the relayer kept producing blocks, but all self-hosted or non-Foundation RPC nodes were downstream of the relayer, which left them out of sync.

Incident Owner: Matthew Gladson (Team Lead)
Posted Jun 07, 2023 - 09:00 UTC