
Oracle Ithaca Upgrade Postmortem (Incident #2)

Date: 2022-04-01

Authors: dcale

Status: Complete, action items in progress

Summary:

Impact: Denial of service for smart contracts relying on the get_price entrypoint. No funds were directly at risk at any time during the incident.

Root Causes: The upgrade to Ithaca introduced a 7x increase in gas consumption for contracts in a non-cached state. Because the data transmitters used a fixed gas value, their transactions failed.

Trigger: Ithaca upgrade.

Resolution: Data transmitters have been updated with new client software that allows gas usage to be configured.

Detection: Failed transactions after the Ithaca upgrade.

Action Items:

Action Item | Type | Owner | State
Identify reason for failure | mitigate | dcale | DONE
Use a no-op transaction as a hotfix to bring the oracle into storage | mitigate | florin | DONE
Adapt client software to allow gas usage to be specified in the job | mitigate | dcale | DONE
Deploy new scheduler to allow specification of gas and storage usage | mitigate | florin | DONE
Trigger upgrade request of data transmitters | mitigate | pascal | DONE
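
The action items on specifying gas and storage usage per job can be sketched roughly as follows. This is a minimal illustration in Python, not the actual client or scheduler code: the `TransmitterJob` class, the `resolve_limits` helper, the job name, and the default values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical fallback values; real limits depend on the contract and protocol.
DEFAULT_GAS_LIMIT = 15_000
DEFAULT_STORAGE_LIMIT = 100


@dataclass(frozen=True)
class TransmitterJob:
    """A hypothetical job description handed to a data transmitter."""
    name: str
    gas_limit: Optional[int] = None       # None means "use the client default"
    storage_limit: Optional[int] = None   # None means "use the client default"


def resolve_limits(job: TransmitterJob) -> Tuple[int, int]:
    """Return (gas_limit, storage_limit), preferring values set in the job."""
    gas = job.gas_limit if job.gas_limit is not None else DEFAULT_GAS_LIMIT
    storage = (
        job.storage_limit if job.storage_limit is not None else DEFAULT_STORAGE_LIMIT
    )
    if gas <= 0 or storage < 0:
        raise ValueError("gas_limit must be positive and storage_limit non-negative")
    return gas, storage


# A job that overrides only the gas limit keeps the default storage limit.
job = TransmitterJob(name="price-update", gas_limit=105_000)
print(resolve_limits(job))
```

Under this assumed design, the limits travel with the job, so they could be raised after a protocol upgrade without first distributing new client software to every data transmitter.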

Lessons Learned

What went well

  • The Tezos community helped to identify the issue. (thank you!!!)

What went wrong

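  • Wrong assumptions about the protocol were made: the data transmitters used a fixed gas value, which presumed that gas consumption would not change across protocol upgrades.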
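
The root cause in the summary shows the kind of assumption involved: a fixed gas value that was sufficient before the upgrade. A minimal Python sketch of that failure mode follows. Only the 7x factor comes from this report; the gas figures and the `operation_succeeds` helper are hypothetical.

```python
# Hypothetical numbers for illustration; the real values are not given in this report.
CACHED_GAS = 10_000          # assumed gas consumed when the contract is cached
NON_CACHED_MULTIPLIER = 7    # the 7x increase reported for the non-cached state
FIXED_GAS_LIMIT = 15_000     # assumed hard-coded limit sized for the cached case


def operation_succeeds(gas_consumed: int, gas_limit: int) -> bool:
    """An operation fails when it needs more gas than its declared limit."""
    return gas_consumed <= gas_limit


cached_ok = operation_succeeds(CACHED_GAS, FIXED_GAS_LIMIT)
non_cached_ok = operation_succeeds(
    CACHED_GAS * NON_CACHED_MULTIPLIER, FIXED_GAS_LIMIT
)
print(cached_ok, non_cached_ok)  # prints "True False"
```

With these assumed figures, the same fixed limit covers the cached case but not the non-cached one.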

Where we got lucky

  • The market was stable at the time, so the impact was minimal.

Conclusion

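  • Tezos is an evolving protocol. This comes with many advantages, but protocol upgrades can also change costs such as gas consumption, and clients need to account for that.
  • Upgrading data transmitters takes a long time (two weeks in this incident).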

Timeline

2022-04-01 (all times UTC)

  • 19:06 First operations start to fail
  • 19:34 Asked the Tezos developer chat about Ithaca changes that could cause this
  • 21:20 Gas limit is discovered as a symptom of the failures
  • 22:12 Protocol caching is discovered as cause
  • 09:23+1 Understanding what influences cache and identifying "no-op" as potential fix
  • 10:41+1 Signatures requested from multisig participants
  • 13:22+1 Hotfix to bring oracle back into cache (onxPpj6W9E1PKxVd4y3jBeXC7fbvYJ42YDLJx2t3rfY6Z3JYayG)

Note: After the hotfix, the data transmitters were properly adapted. Distributing and upgrading them took the various parties two weeks.