LogoLogo
SDKAPI
Version-v1.4.0 (current)
Version-v1.4.0 (current)
  • Learn
    • Introduction
      • Obol Collective
      • OBOL Incentives
      • Key Staking Concepts
      • Obol vs Other DV Implementations
      • Obol Splits
      • DV Launchpad
      • Frequently Asked Questions
    • Charon
      • Introduction to Charon
      • Distributed Key Generation
      • Cluster Configuration
      • Charon Networking
      • CLI Reference
    • Futher Reading
      • Ethereum and Its Relationship With DVT
      • Community Testing
      • Peer Score
      • Useful Links
  • Run a DV
    • Quickstart
      • Quickstart Overview
      • Create a DV Alone
      • Create a DV With a Group
      • Push Metrics to Obol Monitoring
    • Prepare to Run a DV
      • How and Where To Run DVs
      • Deployment Best Practices
      • Test a Cluster
    • Running a DV
      • Activate a DV
      • Update a DV
      • Monitoring Your Node
      • Claim Rewards
      • Exit a DV
    • Partner Integrations
      • Create an EigenLayer DV
      • Create a Lido CSM DV
      • DappNode
  • Advanced & Troubleshooting
    • Advanced Guides
      • Create a DV Using the SDK
      • Migrate an Existing Validator
      • Enable MEV
      • Combine DV Private Key Shares
      • Self-Host a Relay
      • Advanced Docker Configs
      • Beacon node authentication
    • Troubleshooting
      • Errors & Resolutions
      • Handling DKG Failure
      • Client Configuration
      • Test Commands
    • Security
      • Overview
      • Centralization Risks and Mitigation
      • Obol Bug Bounty Program
      • Smart Contract Audit
      • Software Development at Obol
      • Charon Threat Model
      • Contacts
  • Community & Governance
    • Governance
      • Collective Overview
      • The Token House
      • The RAF
      • Delegate Guide
      • RAF1 Guide
    • The OBOL Token
      • Token Utility
      • Token Distribution & Liquidity
      • TGE FAQ
    • Community
      • Staking Mastery Program
      • Techne
    • Contribution & Feedback
      • Filing a Bug Report
      • Documentation Standards
      • Feedback
  • Walkthrough Guides
    • Walkthroughs
      • Walkthrough Guides
  • SDK
    • Intro
    • Enumerations
      • FORK_MAPPING
    • Classes
      • Client
    • Interfaces
      • ClusterDefinition
      • RewardsSplitPayload
    • Type-Aliases
      • BuilderRegistration
      • BuilderRegistrationMessage
      • ClusterCreator
      • ClusterLock
      • ClusterOperator
      • ClusterPayload
      • ClusterValidator
      • DepositData
      • DistributedValidator
      • ETH_ADDRESS
      • OperatorPayload
      • SplitRecipient
      • TotalSplitPayload
    • Functions
      • validateClusterLock
  • API
    • What is this API?
    • System
    • Metrics
    • Cluster Definition
    • Cluster Lock
    • State
    • DV Exit
    • Cluster Effectiveness
    • Terms And Conditions
    • Techne Credentials
    • Address
    • OWR Information
  • Specification
Powered by GitBook
On this page
  • Cleaning up the .charon directory
  • Further debugging
Edit on GitHub
  1. Advanced & Troubleshooting
  2. Troubleshooting

Handling DKG Failure

Handling DKG failure

While the DKG process has been tested and validated against many different configuration instances, it can still encounter issues which might result in failure.

Our DKG is designed in a way that doesn't allow for inconsistent results: either it finishes correctly for every peer, or it fails.

This is a safety feature: you don't want to deposit an Ethereum distributed validator that not every operator is able to participate in.

The most common source of issues lies in the network stack: if any of the peers' Internet connection glitches substantially, the DKG will fail. If you are attempting to run the dkg command in two places at once, or you have a charon run command with the same charon-enr-private-key as you are trying to DKG with, these may also disrupt a key generation ceremony.

Charon's DKG doesn't allow peer reconnection once the process is started, but it does allow for re-connections before that.

When you see the following message:

14:08:34.505 INFO dkg        Waiting to connect to all peers...

this means your Charon instance is waiting for all the other cluster peers to start their DKG process: at this stage, peers can disconnect and reconnect at will, the DKG process will still continue.

A log line will confirm the connection of a new peer:

14:08:34.523 INFO dkg        Connected to peer 1 of 3                 {"peer": "fantastic-adult"}
14:08:34.529 INFO dkg        Connected to peer 2 of 3                 {"peer": "crazy-bunch"}
14:08:34.673 INFO dkg        Connected to peer 3 of 3                 {"peer": "considerate-park"}

As soon as all the peers are connected, this message will be shown:

14:08:34.924 INFO dkg        All peers connected, starting DKG ceremony

Past this stage no disconnections are allowed, and all peers must leave their terminals open in order for the DKG process to complete: this is a synchronous phase, and every peer is required in order to reach completion.

If for some reason the DKG process fails, you would see error logs that resemble this:

14:28:46.691 ERRO cmd        Fatal error: sync step: p2p connection failed, please retry DKG: context canceled

As the error message suggests, the DKG process needs to be retried.

Cleaning up the .charon directory

One cannot simply retry the DKG process: Charon refuses to overwrite any runtime file in order to avoid inconsistencies and private key loss.

When attempting to re-run a DKG with an unclean data directory - which is either .charon or what was specified with the --data-dir CLI parameter - this is the error that will be shown:

14:44:13.448 ERRO cmd        Fatal error: data directory not clean, cannot continue {"disallowed_entity": "cluster-lock.json", "data-dir": "/compose/node0"}

The disallowed_entity field lists all the files that Charon refuses to overwrite, while data-dir is the full path of the runtime directory the DKG process is using.

In order to retry the DKG process one must delete the following entities, if present:

  • validator_keys directory

  • cluster-lock.json file

  • deposit-data.json file

The charon-enr-private-key file must be preserved, failure to do so requires the DKG process to be restarted from the beginning by creating a new cluster definition.

If you're doing a DKG with a custom cluster definition - for example, create with charon create dkg, rather than the Obol Launchpad - you can re-use the same file.

Once this process has been completed, the cluster operators can retry a DKG.

Further debugging

If you are trying to create an extremely large, geographically diverse cluster, there is a chance the process could be timing out. Consider adding the flags --timeout=5m --shutdown-delay=60s to allow more time for the ceremony to complete and safely shut down across all nodes.

In order for the Obol team to debug your issue as quickly and precisely as possible, please provide full logs in text form, not through screenshots or display photos.

Providing complete debug logs from all peers is particularly important, since it allows the team to reconstruct precisely what happened throughout the ceremony.

PreviousErrors & ResolutionsNextClient Configuration

Last updated 6 days ago

If for some reason the DKG process still fails, node operators are advised to reach out to the Obol team by opening an , detailing the troubleshooting steps that were taken and providing debug logs.

To enable debug logs, first clean up the Charon data directory as explained in , then run your DKG command while appending --log-level=debug at the end.

issue
the previous section