Resolved
The issue has been fully resolved. All affected components are operating normally again.
Monitoring
Operations have been restored for all projects. We are continuing to monitor the situation.
Monitoring
The majority of projects have been fixed and are now operating normally. Our team is working to fix the few remaining projects.
Identified
We have identified the underlying cause of this incident and are actively working to restore operations for all affected projects. Most projects have already been migrated to a working pageserver node. We expect it to take about an hour to restore operations for the remaining projects.
Investigating
We detected an unreachable pageserver node in our cluster. We are aware of the issue and are investigating the scope, root cause, and potential mitigations.