BitClout Code Walkthrough
Today, bitclout.com is powered by the following repos. Together, these repos make up the entirety of what runs on bitclout.com while also supporting the ability for anyone to run their own BitClout node with all of the same data that bitclout.com has access to:
- The core repo is a Golang repo that contains all of the "consensus" code behind BitClout. It's meant to be a kernel that's embedded as a library into projects that want to build on the BitClout firehose.
- The backend repo embeds core as a library and exposes a rich API on top of it to support transaction construction, submitting transactions to the blockchain, storing user data, and more. In some sense, it's the first "reference" app built on the core BitClout blockchain.
Below is a simple diagram that shows visually how these repositories fit together:
We think the easiest way to understand the architecture is to describe how a node syncs with other nodes, and then to walk through key codepaths with pointers to functions and line numbers. We do this below. We use the following commit hashes to refer to the code:
- The entrypoint to everything the node does is main.go. It's better to start tracing from the backend repo's main rather than the core repo's main, since the core repo is mainly intended to be used as a library. Moreover, since backend uses the core repo as a library, we will hit all of the core functionality by starting here anyway.
- There is a lot of indirection in main introduced by the fact that we are using Viper to manage our command-line flags. When the backend binary is run, a command is passed, such as "run," which triggers a Run() function defined in the cmd package.
- Note the core repo's flags are effectively imported into backend. This allows for maximum composability, whereby someone can include the core repo and get all of its functionality embedded into their binary for free.
- When a node starts up, it looks for peers that it can download blocks and transactions from. There are two main ways a node finds peers:
- Because it would cost O($1M) to buy all of the seeds, and because a node only needs one valid sync peer in order to thwart an "eclipse" attack, and because a node can iterate over tens of thousands of DNS records per second, and because DNS seeds can be changed by node operators if a particular prefix is monopolized, we think this is a safe way to find initial peers.
- Command-line flags.
- --connect-ips means a node will connect to the specified peers and nothing else.
- --add-ips means these peers will be added to the list of peers that the node is going to try to connect to. When we spin up new nodes, we often use --connect-ips with a trusted node because it's easier than bootstrapping from the sea of nodes that are running on the wider internet.
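The difference between the two flags can be sketched in a few lines of Go. This is only an illustration: choosePeers is a hypothetical helper and not a function from the core repo, and the rule that --add-ips peers are tried before discovered peers is an assumption made for the example.

```go
package main

import "fmt"

// choosePeers is a hypothetical helper (not part of the core repo).
// If connectIPs is non-empty, the node dials only those peers. Otherwise it
// dials the addIPs entries first, followed by any discovered addresses,
// skipping duplicates. The ordering is an assumption for this sketch.
func choosePeers(connectIPs, addIPs, discovered []string) []string {
	if len(connectIPs) > 0 {
		return append([]string(nil), connectIPs...)
	}
	seen := make(map[string]bool)
	var out []string
	for _, list := range [][]string{addIPs, discovered} {
		for _, ip := range list {
			if !seen[ip] {
				seen[ip] = true
				out = append(out, ip)
			}
		}
	}
	return out
}

func main() {
	// --connect-ips set: everything else is ignored.
	fmt.Println(choosePeers([]string{"10.0.0.1:17000"}, nil, []string{"1.2.3.4:17000"}))
	// Only --add-ips set: it is merged with the discovered peers.
	fmt.Println(choosePeers(nil, []string{"10.0.0.2:17000"}, []string{"1.2.3.4:17000", "10.0.0.2:17000"}))
}
```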
- The ConnectionManager is responsible for managing all connections with peers. It's initialized using a Start() function that is kicked off in main.go. Tracing the code starting from this function is a great way to understand how connections with peers are established and maintained.
- When the ConnectionManager connects to a peer, it does a "version negotiation" similar to Bitcoin. This happens in ConnectPeer(). If the peer passes this version negotiation, then the peer is passed off to server.go via a "newPeerChan." server.go is then responsible for doing higher-level interactions with the peer.
- Peer messages just contain messages that came from one of the peers that the node was connected to. You can see there aren't very many of them, and they're fairly straightforward.
- Control messages are basically notifications about things that happened internally to the node. For example, a peer connected or disconnected.
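To make the split between peer messages and control messages concrete, here is a minimal sketch of a single goroutine draining both kinds of message. The type names, field names, and channel layout are assumptions for illustration and do not reproduce the real types in server.go.

```go
package main

import "fmt"

// peerMessage and controlMessage are hypothetical stand-ins for the two
// message categories described above.
type peerMessage struct {
	PeerID int
	Kind   string
}

type controlMessage struct {
	PeerID int
	Event  string
}

// runMessageLoop handles both channels on one goroutine until both are
// closed, and returns a log of what it handled.
func runMessageLoop(peerMsgs <-chan peerMessage, controlMsgs <-chan controlMessage) []string {
	var handled []string
	for peerMsgs != nil || controlMsgs != nil {
		select {
		case m, ok := <-peerMsgs:
			if !ok {
				peerMsgs = nil // closed: stop selecting on this channel
				continue
			}
			handled = append(handled, fmt.Sprintf("peer %d sent %s", m.PeerID, m.Kind))
		case c, ok := <-controlMsgs:
			if !ok {
				controlMsgs = nil
				continue
			}
			handled = append(handled, fmt.Sprintf("control: %s (peer %d)", c.Event, c.PeerID))
		}
	}
	return handled
}

func main() {
	peerMsgs := make(chan peerMessage, 2)
	controlMsgs := make(chan controlMessage, 1)
	controlMsgs <- controlMessage{PeerID: 7, Event: "peer connected"}
	peerMsgs <- peerMessage{PeerID: 7, Kind: "INV"}
	peerMsgs <- peerMessage{PeerID: 7, Kind: "HEADER_BUNDLE"}
	close(peerMsgs)
	close(controlMsgs)
	for _, line := range runMessageLoop(peerMsgs, controlMsgs) {
		fmt.Println(line)
	}
}
```

Note that select picks among ready channels at random, so the relative order of peer and control messages is not guaranteed in this sketch.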
- The initial sync for a node is currently completely single-threaded. A sync peer is found and other peer messages are largely ignored until the node has downloaded up to the last 24 hours' worth of blocks.
- Below are the steps to syncing with a peer, which can be traced by following the functions in server.go:
- The node processes the peer's headers in _handleHeaderBundle() and responds with different messages depending on how synced the peer is.
- If the node has exhausted the peer's headers, then it downloads blocks until it has a block for every corresponding header that the peer sent it. This is exactly the same as the "headers-first" synchronization that Bitcoin does. The MsgBitCloutGetBlocks message is sent in
- Once the node has all the headers it needs from the peer, and if the node has downloaded and validated all the blocks from this peer, then the node is fully synced.
- Once we get to this state, the node listens to INV messages from all of its peers. If it sees an INV message for a new block that it doesn't have yet, then it will send the peer a GetHeaders request, which will kick off this headers-first process for the single missing header/block.
- Once the node has gotten through this loop, it is fully synced and in a "steady-state." At this point, the node listens for INV messages from its peer to update its state.
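The headers-first loop described above can be sketched as two small steps: extend the local header chain with the headers a peer sent, then request blocks for any header that doesn't have one yet. The types and function names below are hypothetical and greatly simplified (no proof-of-work checks, no forks); they are not the functions in server.go.

```go
package main

import "fmt"

// header is a simplified, hypothetical block header.
type header struct {
	Hash     string
	PrevHash string
	Height   int
}

// acceptHeaders appends headers from the bundle that extend the current tip
// and stops at the first one that doesn't connect. chain must be non-empty
// (it always contains at least the genesis header).
func acceptHeaders(chain, bundle []header) []header {
	for _, h := range bundle {
		tip := chain[len(chain)-1]
		if h.PrevHash != tip.Hash || h.Height != tip.Height+1 {
			break
		}
		chain = append(chain, h)
	}
	return chain
}

// blocksToRequest returns, in height order, the hashes of headers for which
// no block has been downloaded yet, capped at maxPerRequest.
func blocksToRequest(chain []header, haveBlock map[string]bool, maxPerRequest int) []string {
	var req []string
	for _, h := range chain {
		if len(req) == maxPerRequest {
			break
		}
		if !haveBlock[h.Hash] {
			req = append(req, h.Hash)
		}
	}
	return req
}

func main() {
	chain := []header{{Hash: "genesis", Height: 0}}
	bundle := []header{
		{Hash: "h1", PrevHash: "genesis", Height: 1},
		{Hash: "h2", PrevHash: "h1", Height: 2},
	}
	chain = acceptHeaders(chain, bundle)
	haveBlock := map[string]bool{"genesis": true}
	fmt.Println(blocksToRequest(chain, haveBlock, 10))
}
```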
- MsgBitCloutInv messages are processed via messageHandler() just like everything else.
- INV messages can be for a block, as mentioned previously, or for a transaction. Below is the case for a transaction.
- Note that some "handle" functions are defined in peer.go rather than server.go. When this is the case, the server.go _handlePeerMessages() function will just enqueue the message for the peer's thread to process. This is done in order to move processing into another thread for efficiency reasons (not doing this would cause server.go to be *too* single-threaded). Here you can see server.go delegate the call to peer.go, and here you see peer.go dequeuing it to process it. Note that there are several messages that are delegated in this way, all defined in the
- When a transaction is processed in server.go, it is basically just calling mempool.go. If the transaction is valid, then it will be added to the mempool, and if not, then it will be rejected. In order to validate a transaction, mempool uses the previously mentioned ConnectTransaction() function defined in
- Now we understand how a node syncs initial blocks, and how it accepts new blocks and transactions in the steady-state. The next step is to understand how blocks are created and mined:
- _getBlockTemplate() contains the logic for constructing a new block. It basically does the following:
- Add txns from the mempool to the block until the block is full.
- Compute the fee, merkle root, etc.
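As a rough illustration of the two steps above, the sketch below fills a template from an already-ordered mempool until the next transaction no longer fits, and sums the fees. The txn type, its fields, and buildTemplate are hypothetical, and the merkle root computation is left out to keep the example short.

```go
package main

import "fmt"

// txn is a hypothetical, simplified mempool entry.
type txn struct {
	ID        string
	SizeBytes int
	FeeNanos  uint64
}

// buildTemplate adds txns in mempool order until the next one would push the
// block past maxBlockSizeBytes, and returns the included txns plus the total
// fee. Stopping at the first txn that doesn't fit is a simplification.
func buildTemplate(mempool []txn, maxBlockSizeBytes int) ([]txn, uint64) {
	var included []txn
	var totalFees uint64
	size := 0
	for _, t := range mempool {
		if size+t.SizeBytes > maxBlockSizeBytes {
			break
		}
		included = append(included, t)
		size += t.SizeBytes
		totalFees += t.FeeNanos
	}
	return included, totalFees
}

func main() {
	mempool := []txn{
		{ID: "a", SizeBytes: 400, FeeNanos: 10},
		{ID: "b", SizeBytes: 500, FeeNanos: 20},
		{ID: "c", SizeBytes: 300, FeeNanos: 5},
	}
	included, fees := buildTemplate(mempool, 1000)
	fmt.Println(len(included), fees)
}
```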
- block_producer.go just produces block templates, but it's up to miners to compute winning hashes. That happens via a remote process as follows:
- Miners run remote_miner_main.go and connect to any node they want via a flag. This can be their own local node or a remote node like api.bitclout.com.
- remote_miner_main.go will continuously call GetBlockTemplate() on the chosen node and hash it until it's found a block. Once it has found a winning hash, it calls SubmitBlock(), which then causes the node to process it and broadcast it to the rest of the network.
- Because all nodes expose get-block-template, all nodes can be used to mine blocks in this way. Miners generally don't need to do anything other than point to a valid BitClout node somewhere on the network.
- Note that we are currently working on increasing the nonce size to 64 bits up from 32 bits. This will result in ExtraNonce being basically deprecated, and will make GetBlockTemplate() much faster because it won't have to copy a block.
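The "hash it until it's found a block" step of the remote miner can be sketched as a nonce search. Everything here is an assumption for illustration: double SHA-256 stands in for whatever hash function the chain actually uses, the header layout (template bytes followed by a 64-bit nonce) is made up, and mine is not a function from remote_miner_main.go.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// mine tries nonces 0..maxNonce and returns the first one for which
// doubleSHA256(template || nonce) is <= target when both are compared as
// big-endian numbers. The hash function and layout are stand-ins.
func mine(template []byte, target [32]byte, maxNonce uint64) (uint64, [32]byte, bool) {
	buf := make([]byte, len(template)+8)
	copy(buf, template)
	for nonce := uint64(0); ; nonce++ {
		binary.BigEndian.PutUint64(buf[len(template):], nonce)
		first := sha256.Sum256(buf)
		hash := sha256.Sum256(first[:])
		if bytes.Compare(hash[:], target[:]) <= 0 {
			return nonce, hash, true
		}
		if nonce == maxNonce {
			return 0, [32]byte{}, false
		}
	}
}

func main() {
	// An easy target: the first byte of the hash must be zero.
	var target [32]byte
	for i := 1; i < len(target); i++ {
		target[i] = 0xff
	}
	nonce, hash, ok := mine([]byte("block template bytes"), target, 1<<20)
	fmt.Println(ok, nonce, fmt.Sprintf("%x", hash[:4]))
}
```

In the real flow, the template would come from GetBlockTemplate() and the winning header would be handed to SubmitBlock() rather than printed.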
- Once a block has been submitted via SubmitBlock, it is then relayed to other peers via the INV mechanism described previously. This happens as follows:
- It downloads headers until it is fully synced with the Bitcoin peer.
- All we really need from a Bitcoin node is its header chain.
- In addition to the header chain, new blocks are downloaded from the Bitcoin node in order to extract valid BitcoinExchange transactions from them. Basically, any transaction that sends Bitcoin to the sink address, defined here, is recognized as being able to print BitClout on the Bitcoin chain.
- The BitcoinManager does some other things; for example, it is used to broadcast BitcoinExchange transactions to many peers at once. But its main purpose is to download the Bitcoin header chain and, to a lesser extent, to download new blocks and extract valid BitcoinExchange transactions from them.
- Note also that using a single Bitcoin peer may seem insecure, but because the node checks that the minimum work is above a certain threshold, it's generally not an issue. Additionally, nodes that run bitclout.com are pointed at specific trustworthy Bitcoin peers using --bitcoin_connect_peer.
Below we trace how seeds and transactions are created while giving detail on their format and how validation works.
- First, a user lands on bitclout.com, which is the Angular frontend.
- They all hit corresponding API endpoints defined on the node's JSON API, which is fully defined in frontend_server.go.
- When a node starts up it opens up three ports: A "web" port that serves the Angular app, a "protocol" port that is used to connect with peers and process all blockchain-related messages, and an "API" port that is used to handle requests from the Angular app.
- By default these ports are: 4002=Angular app, 17001=JSON API, 17000=protocol port
- Note that the “web” port is deprecated in favor of running the frontend Angular app as a stand-alone service. So very soon a node will only have a JSON API port and a protocol port.
- Anytime the Angular app needs to do something like construct a transaction or download the data for a user, it uses the API port. The JSON API is like "glue" between the blockchain and the frontend.
- Creating and storing the seed
- When a user hits “Sign Up,” they are taken to identity.bitclout.com.
- All of the seed phrases stored in localStorage are encrypted using a call to
- Note: This is tab-to-tab communication. bitclout.com opens identity.bitclout.com, identity generates the encryptedSeedHex, and then sends it back to bitclout.com. This same process works if you replace bitclout.com with the host of your own third-party node. The difference is that your third-party node will need to ask the user for permission in order to get encryptedSeedHex sent back to it.
- Once bitclout.com has the encryptedSeedHex, it uses it to sign things. It does this by calling various operations on an iframe of identity.bitclout.com embedded within it.
- Why is this so complicated? Why send encryptedSeedHex back to the host? Wouldn’t it be better to just keep everything in identity.bitclout.com?
- The reason for this setup is that iOS devices do not allow identity.bitclout.com to access persistent localStorage when it’s embedded as an iframe in bitclout.com. This is due to Apple’s crusade against third-party cookies. However, Apple does allow identity.bitclout.com to access its cookies when it’s embedded as an iframe on bitclout.com if those cookies are set as first-party cookies.
- So, what do we do? We push the user to create their seed on identity.bitclout.com, where we can set an encryption key as a first-party cookie. Then, back on bitclout.com we store the encryptedSeedHex. When signing is needed, the encryptedSeedHex is passed to the identity.bitclout.com iframe, which has access to the encryption key in the cookie, which it then uses to decrypt the encryptedSeedHex and sign the transaction.
- One drawback of this approach is that cookies are sent to identity.bitclout.com automatically when the page or iframe loads. This is not ideal, but that information is useless without the actual seed. Moreover, and critically, cookies are only used on iOS devices. On non-iOS devices, the encryption key is stored in localStorage. This means that only iOS devices are subject to this drawback.
- One other drawback is that an XSS attack on bitclout.com or a third-party node could technically give the attacker access to the encryptedSeedHex. However, this information is useless without the encryption key stored exclusively in identity.bitclout.com.
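The split between the encryptedSeedHex (held by the host) and the encryption key (held by identity) can be illustrated with a small round trip. This is a sketch only: identity runs in the browser, and AES-GCM with a random nonce is an assumed cipher chosen for the example, not necessarily what identity uses. The function names are hypothetical.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
)

// encryptSeed seals the seed with the key and returns hex(nonce || ciphertext).
// AES-GCM is an assumption made for this sketch.
func encryptSeed(key, seed []byte) (string, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return "", err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	return hex.EncodeToString(gcm.Seal(nonce, nonce, seed, nil)), nil
}

// decryptSeed reverses encryptSeed. It fails if the key is wrong, which is
// why the encrypted seed alone is useless to an attacker.
func decryptSeed(key []byte, encryptedSeedHex string) ([]byte, error) {
	raw, err := hex.DecodeString(encryptedSeedHex)
	if err != nil {
		return nil, err
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(raw) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ciphertext := raw[:gcm.NonceSize()], raw[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	key := make([]byte, 32) // held only by identity (cookie or localStorage)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	encryptedSeedHex, err := encryptSeed(key, []byte("example seed bytes"))
	if err != nil {
		panic(err)
	}
	// The host stores encryptedSeedHex; only identity can decrypt it.
	seed, err := decryptSeed(key, encryptedSeedHex)
	fmt.Println(string(seed), err)
}
```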
- When a user does any kind of "write" operation in the app, such as submitting a post, liking, or updating their profile, a corresponding endpoint in frontend_server.go is called to construct a transaction. That transaction is then returned unsigned, signed by the identity iframe, and then submitted back to core via
- As an example, consider /send-bitclout, which is relatively straightforward:
- First, a universal view is fetched. More on this later, but it basically gives the endpoint a "union" of the "state" between what's in the mempool and what's in the blocks. For example, if someone sent you BitClout in a txn that's in the mempool, you can use the view to find that UTXO. And if they sent it to you in a txn that's been mined into a block, you can also find it in that view.
- In order to create the spend transaction, the endpoint needs to find UTXOs for the user. This almost always happens in AddInputsAndChangeToTransaction(), which is a good function to trace through. I'm not aware of any transaction assembly that does not utilize this function for UTXO fetching.
- The key function is GetSpendableUtxosForPublicKey(), which generates a universal view that includes txns from the mempool and then returns all UTXOs that are associated with the particular public key. These UTXOs can then be assembled into a transaction.
- Again, basically all transaction assembly runs through this codepath.
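The idea of the universal view and of input selection can be sketched as follows. The helpers below are hypothetical simplifications and not the real AddInputsAndChangeToTransaction() or GetSpendableUtxosForPublicKey(): spendable UTXOs are the union of mined and mempool UTXOs minus anything already spent in the mempool, and inputs are picked in order until they cover the spend amount plus the fee.

```go
package main

import (
	"errors"
	"fmt"
)

// utxo is a hypothetical, simplified unspent output.
type utxo struct {
	ID          string
	AmountNanos uint64
}

// spendableUtxos merges UTXOs from mined blocks with UTXOs created by
// mempool txns, and drops any that a mempool txn has already spent.
func spendableUtxos(confirmed, mempoolCreated []utxo, spentInMempool map[string]bool) []utxo {
	var out []utxo
	for _, list := range [][]utxo{confirmed, mempoolCreated} {
		for _, u := range list {
			if !spentInMempool[u.ID] {
				out = append(out, u)
			}
		}
	}
	return out
}

// selectInputs picks UTXOs in order until they cover spend + fee and
// returns the chosen inputs together with the change amount.
func selectInputs(spendable []utxo, spendNanos, feeNanos uint64) ([]utxo, uint64, error) {
	need := spendNanos + feeNanos
	var total uint64
	var inputs []utxo
	for _, u := range spendable {
		if total >= need {
			break
		}
		inputs = append(inputs, u)
		total += u.AmountNanos
	}
	if total < need {
		return nil, 0, errors.New("insufficient funds")
	}
	return inputs, total - need, nil
}

func main() {
	confirmed := []utxo{{ID: "a", AmountNanos: 50}, {ID: "b", AmountNanos: 30}}
	mempoolCreated := []utxo{{ID: "c", AmountNanos: 40}}
	spent := map[string]bool{"a": true}
	inputs, change, err := selectInputs(spendableUtxos(confirmed, mempoolCreated, spent), 50, 5)
	fmt.Println(inputs, change, err)
}
```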
- Then the transaction is sent back to the frontend and signed.
- The transaction is then validated and broadcast in
- Once the transaction is in the mempool, the node will eventually relay the transaction to its peers via a separate thread running in server.go that's kicked off in
- This thread is basically looking at the mempool at regular intervals and sending transactions to peers that they don't already have. This is how a transaction that's generated in the UI makes it to the rest of the network.
- Once a transaction has gone into the mempool then we're done. It will eventually be mined into a block.
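The relay thread's job, sending each peer only the transactions it doesn't already have, can be sketched with a per-peer set of known hashes. The function and the bookkeeping below are hypothetical and only illustrate the idea; they are not the relay code in server.go.

```go
package main

import "fmt"

// txnsToRelay returns the mempool txn hashes that the peer is not yet known
// to have, and records them as known so that they are not sent twice.
func txnsToRelay(mempoolTxnHashes []string, peerKnown map[string]bool) []string {
	var toSend []string
	for _, h := range mempoolTxnHashes {
		if !peerKnown[h] {
			toSend = append(toSend, h)
			peerKnown[h] = true
		}
	}
	return toSend
}

func main() {
	peerKnown := map[string]bool{"txn1": true}
	mempool := []string{"txn1", "txn2", "txn3"}
	fmt.Println(txnsToRelay(mempool, peerKnown)) // first pass: only the new txns
	fmt.Println(txnsToRelay(mempool, peerKnown)) // second pass: nothing left to send
}
```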
- A note on the
- This endpoint is called when a user buys BitClout using Bitcoin in the "Buy BitClout" tab. It does the following:
- Constructs a Bitcoin transaction sending the user's Bitcoin to the "sink" address
- Broadcasts it to the Bitcoin blockchain
- Waits some amount of time for the transaction to propagate
- Checks to see if a double-spend occurred during this interval.
- If no double-spend was detected, the transaction is added to the BitClout mempool with the expectation that it will eventually mine into a Bitcoin block (and subsequently a BitClout block).
- The fee is generally set to 2x the "fastest" fee to ensure a very high probability that the txn is processed. This is currently set in the frontend, but there is no reason why it can't be enforced in either the frontend_server.go code or in the mempool itself prior to accepting the Bitcoin txn.
- Once this transaction has been accepted into the mempool, the user can immediately spend it.
- This means there will be some risk of reversion of the user's transactions if the transaction isn't ultimately confirmed by the Bitcoin blockchain. But we have yet to have someone successfully double-spend against the latest iteration of the double-spend checking logic.
- BitcoinExchange transactions can also be added to the mempool via relay from other peers. In this case, the node can be set to ignore unmined Bitcoin transactions from peers so there is minimal risk of a double-spend or reversion.
- Importantly, no matter what the mempool does, the BitClout blockchain will not allow a BitcoinExchange transaction into it without at least one block of work on it. In practice, three blocks of work are required because miners wait for three blocks in order to be safe. This happens via a param called MinerBitcoinMinBurnWorkBlocks that is utilized by the block producer.
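A minimal sketch of the work check a block producer could apply is shown below. It assumes that the Bitcoin block containing the transaction counts as the first block of work, and hasEnoughWork is a hypothetical helper rather than a function from the core repo.

```go
package main

import "fmt"

// hasEnoughWork reports whether a Bitcoin txn mined at txnBlockHeight has at
// least minBurnWorkBlocks blocks of work on it, given the current Bitcoin tip.
// A negative txnBlockHeight means the txn has not been mined yet. Counting
// the containing block as one block of work is an assumption of this sketch.
func hasEnoughWork(txnBlockHeight, bitcoinTipHeight, minBurnWorkBlocks int) bool {
	if txnBlockHeight < 0 || txnBlockHeight > bitcoinTipHeight {
		return false
	}
	blocksOfWork := bitcoinTipHeight - txnBlockHeight + 1
	return blocksOfWork >= minBurnWorkBlocks
}

func main() {
	fmt.Println(hasEnoughWork(100, 100, 1)) // just mined, one block required
	fmt.Println(hasEnoughWork(100, 100, 3)) // just mined, three blocks required
	fmt.Println(hasEnoughWork(100, 102, 3)) // two more blocks on top
	fmt.Println(hasEnoughWork(-1, 102, 1))  // still unmined
}
```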
- All of these messages have serialization functions called ToBytes() that are defined by us in order to guarantee that all nodes serialize to the exact same bytes. If we were to rely on protobufs or JSON, nodes could get different serialized byte strings for the same messages because these formats do not guarantee consistent serialization across machines.
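To show why hand-written serialization helps, here is a small sketch of deterministic encoding for a map, a structure whose iteration order is not stable in Go. The encoding (sorted keys, uvarint length prefixes) and the serializeMap helper are assumptions for illustration and are not the actual ToBytes() wire format.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// writeUvarint appends x to buf as an unsigned varint.
func writeUvarint(buf *bytes.Buffer, x uint64) {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], x)
	buf.Write(tmp[:n])
}

// serializeMap encodes the map as: entry count, then for each key in sorted
// order, the key length, key bytes, value length, and value bytes. Sorting
// the keys means every node produces the same bytes for the same map.
func serializeMap(m map[string][]byte) []byte {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var buf bytes.Buffer
	writeUvarint(&buf, uint64(len(keys)))
	for _, k := range keys {
		writeUvarint(&buf, uint64(len(k)))
		buf.WriteString(k)
		writeUvarint(&buf, uint64(len(m[k])))
		buf.Write(m[k])
	}
	return buf.Bytes()
}

func main() {
	a := map[string][]byte{"x": {1}, "y": {2}}
	b := map[string][]byte{"y": {2}, "x": {1}}
	fmt.Println(bytes.Equal(serializeMap(a), serializeMap(b)))
}
```

Since transactions are hashed and signed over their serialized bytes, a byte-for-byte stable encoding like this is what lets every node agree on the same hash.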
- Transactions are based on UTXO's. They contain the following:
- TxnMeta. More on this later.
- PublicKey. In BitClout, transactions are very simple and only have one public key that can be deemed to be the "executor" of the transaction. The transaction is generally always signed by this public key.
- ExtraData. This is a flexible map that arbitrary data can be added to. It is currently used to support Reclouts via IsQuoteReclouted params. It can be used to augment a transaction without causing a hard fork, which significantly increases the extensibility of BitClout by the community. For example, one can trivially add a "pinned posts" feature using ExtraData without consulting the core BitClout devs about it.
- Transaction metadata is used to determine what type of transaction we're dealing with. For each type of transaction in the system, a metadata type is defined that implements the BitCloutTxnMetadata interface. The full list of transaction types can be viewed here. To see descriptions of each one, simply find where that transaction type implements the interface.
- For example, here is the BitcoinExchangeMetadata. You can see it contains a full Bitcoin transaction plus a merkle proof into the Bitcoin blockchain. This is how a node verifies that a particular Bitcoin transaction has a sufficient amount of work on it.
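For readers unfamiliar with merkle proofs, the check can be sketched as follows. It uses Bitcoin-style double SHA-256 over concatenated child hashes; the proof representation (a leaf index plus a list of sibling hashes) and the function names are assumptions for this example, not the layout used by BitcoinExchangeMetadata.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// doubleSHA256 hashes b twice with SHA-256, as Bitcoin does.
func doubleSHA256(b []byte) [32]byte {
	first := sha256.Sum256(b)
	return sha256.Sum256(first[:])
}

// hashPair computes the parent of two child hashes.
func hashPair(left, right [32]byte) [32]byte {
	var pair [64]byte
	copy(pair[:32], left[:])
	copy(pair[32:], right[:])
	return doubleSHA256(pair[:])
}

// verifyMerkleProof walks from the leaf up to the root. At each level the
// low bit of index says whether the current node is a left (even) or right
// (odd) child, which decides the concatenation order with its sibling.
func verifyMerkleProof(leaf [32]byte, index int, siblings [][32]byte, root [32]byte) bool {
	cur := leaf
	for _, sib := range siblings {
		if index%2 == 0 {
			cur = hashPair(cur, sib)
		} else {
			cur = hashPair(sib, cur)
		}
		index /= 2
	}
	return cur == root
}

func main() {
	// Build a four-leaf tree by hand and prove that leaf 2 is in it.
	var leaves [4][32]byte
	for i := range leaves {
		leaves[i] = doubleSHA256([]byte{byte(i)})
	}
	n01 := hashPair(leaves[0], leaves[1])
	n23 := hashPair(leaves[2], leaves[3])
	root := hashPair(n01, n23)
	proof := [][32]byte{leaves[3], n01}
	fmt.Println(verifyMerkleProof(leaves[2], 2, proof, root))
}
```

The root obtained this way would then be compared against the merkle root in a Bitcoin header that the node has already validated as part of its header chain.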
- TODO: The comments on these transaction types could use some work.
- Validation works by applying the transaction to a "view," which is basically a "simulation" of what would happen if the transaction were written to the database, but that doesn’t actually modify the database. This is useful because a view can allow you to "simulate" what would happen if you applied a bunch of transactions to the database in sequence in order to validate whole blocks before ever actually writing anything to the database. And this is exactly what
- We can walk through connecting an UpdateProfile transaction to see how it works.
- Note that deleting something from the view never actually deletes a mapping, it only marks it as isDeleted=true. This is because the flush needs to propagate this change to the db, and it can only do that if it knows the entry is scheduled to be deleted by leaving it in the view.
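The reason for keeping deleted entries around can be seen in a toy version of a view. In this sketch the "database" is just a map, and profileView, profileEntry, and their methods are hypothetical names. The in-memory overlay shadows the database until flush() is called, and a tombstone (isDeleted=true) is what tells flush() to remove the row.

```go
package main

import "fmt"

// profileEntry is a hypothetical view entry with a tombstone flag.
type profileEntry struct {
	Username  string
	isDeleted bool
}

// profileView overlays in-memory changes on top of a db (a map here).
type profileView struct {
	db      map[string]string
	entries map[string]*profileEntry
}

// get consults the overlay first and falls back to the db.
func (v *profileView) get(publicKey string) (string, bool) {
	if e, ok := v.entries[publicKey]; ok {
		if e.isDeleted {
			return "", false
		}
		return e.Username, true
	}
	name, ok := v.db[publicKey]
	return name, ok
}

// set records a new value in the overlay only.
func (v *profileView) set(publicKey, username string) {
	v.entries[publicKey] = &profileEntry{Username: username}
}

// remove leaves a tombstone instead of dropping the mapping, so that flush
// knows the row must also be deleted from the db.
func (v *profileView) remove(publicKey string) {
	v.entries[publicKey] = &profileEntry{isDeleted: true}
}

// flush writes every overlay entry to the db and clears the overlay.
func (v *profileView) flush() {
	for pk, e := range v.entries {
		if e.isDeleted {
			delete(v.db, pk)
		} else {
			v.db[pk] = e.Username
		}
	}
	v.entries = map[string]*profileEntry{}
}

func main() {
	v := &profileView{
		db:      map[string]string{"alicePK": "alice"},
		entries: map[string]*profileEntry{},
	}
	v.remove("alicePK")
	_, found := v.get("alicePK")
	fmt.Println(found, len(v.db)) // the view hides the entry, the db still has it
	v.flush()
	fmt.Println(len(v.db)) // the tombstone has been propagated to the db
}
```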
- Every transaction has both a
- _disconnect restores the view to the state it was in before the transaction was connected.
- _disconnect code is rarely used, but it supports reorgs of blocks, which happen from time to time.
- During a reorg, we need to disconnect some transactions from some blocks and connect transactions from some other blocks in order to validate the fork, *before* writing anything to the db. This happens in ProcessBlock here.
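The connect/disconnect pairing can be sketched with a simplified UpdateProfile transaction. Everything here is hypothetical: the view is a plain map, and the idea that connecting returns an "undo" record which disconnecting consumes is an assumption about how one could structure this, not a description of the actual _connect and _disconnect code.

```go
package main

import "fmt"

// updateProfileTxn is a hypothetical, simplified transaction.
type updateProfileTxn struct {
	PublicKey   string
	NewUsername string
}

// undoOp remembers what the view looked like before a txn was connected.
type undoOp struct {
	PublicKey    string
	PrevUsername string
	PrevExisted  bool
}

// connectUpdateProfile applies the txn to the view and returns the record
// needed to reverse it later.
func connectUpdateProfile(view map[string]string, t updateProfileTxn) undoOp {
	prev, existed := view[t.PublicKey]
	view[t.PublicKey] = t.NewUsername
	return undoOp{PublicKey: t.PublicKey, PrevUsername: prev, PrevExisted: existed}
}

// disconnectUpdateProfile restores the view to its state before the txn.
func disconnectUpdateProfile(view map[string]string, op undoOp) {
	if op.PrevExisted {
		view[op.PublicKey] = op.PrevUsername
	} else {
		delete(view, op.PublicKey)
	}
}

func main() {
	view := map[string]string{"alicePK": "alice"}
	txns := []updateProfileTxn{
		{PublicKey: "alicePK", NewUsername: "alice2"},
		{PublicKey: "bobPK", NewUsername: "bob"},
	}
	var undo []undoOp
	for _, t := range txns {
		undo = append(undo, connectUpdateProfile(view, t))
	}
	fmt.Println(view)
	// During a reorg, txns are disconnected in reverse order.
	for i := len(undo) - 1; i >= 0; i-- {
		disconnectUpdateProfile(view, undo[i])
	}
	fmt.Println(view)
}
```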