BSC Node Setup Story

I had an AWS c6g.2xlarge instance; even setting the volume's IOPS to 16k didn't seem to help after one day of syncing. I've started a new AWS EC2 i3 instance with a 900G ephemeral NVMe SSD to verify whether even higher IOPS can get the sync to finish eventually. Will update this thread if I make progress. Date when the comment was posted: 2021-05-28.

Update 1

Some raw numbers to check during syncing:

i3.xlarge instance with ephemeral local NVMe, IOPS 70K: iostat MB_wrtn/s averaged about 10MB/s

# /usr/local/bin/geth attach http://localhost:8545 --exec "eth.syncing"
{
  currentBlock: 5045035,
  highestBlock: 7796190,
  knownStates: 96385243,
  pulledStates: 95904052,
  startingBlock: 0
}

# iostat -m -d 10 10
Linux 4.14.231-173.361.amzn2.x86_64 (ip-172-31-66-40.ec2.internal) 	05/28/2021 	_x86_64_	(4 CPU)

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
xvda              9.06         0.04         0.10        236        587
nvme0n1         125.74         0.00       164.96          6     982969

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
xvda              0.30         0.00         0.00          0          0
nvme0n1          69.80         0.00         4.22          0         42

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
xvda              0.00         0.00         0.00          0          0
nvme0n1         202.20         0.00        18.09          0        180

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
xvda             39.20         0.00         0.34          0          3
nvme0n1         117.30         0.00         6.66          0         66

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
xvda              0.20         0.00         0.00          0          0
nvme0n1          84.00         0.00         4.57          0         45

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
xvda             35.60         0.00         0.26          0          2
nvme0n1         192.90         0.00        18.79          0        187

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
xvda             31.20         0.10         0.18          1          1
nvme0n1          87.20         0.00         4.41          0         44

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
xvda              0.40         0.00         0.00          0          0
nvme0n1          55.70         0.00         3.03          0         30

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
xvda             36.50         0.00         0.28          0          2
nvme0n1         185.50         0.00        18.31          0        183

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
xvda              0.00         0.00         0.00          0          0
nvme0n1          97.10         0.00         6.20          0         62

c6g.2xlarge instance with gp3 volume, IOPS 16K: MB_wrtn/s was almost 0

# /usr/local/bin/geth attach http://localhost:8545 --exec "eth.syncing"
{
  currentBlock: 7797374,
  highestBlock: 7797452,
  knownStates: 687576703,
  pulledStates: 687420045,
  startingBlock: 7794295
}

# iostat -m -d 10 10
Linux 4.14.231-173.361.amzn2.aarch64 (ip-172-31-3-204.ec2.internal) 	05/28/2021 	_aarch64_	(8 CPU)

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
nvme0n1        1631.77        15.09         1.67   10541477    1166468

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
nvme0n1        1503.30        12.04         0.66        120          6

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
nvme0n1         759.20         5.85         0.00         58          0

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
nvme0n1        1555.24        12.45         0.00        124          0

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
nvme0n1        1356.30        10.76         0.00        107          0

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
nvme0n1         750.30         6.12         1.07         61         10

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
nvme0n1         837.80         6.93         0.00         69          0

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
nvme0n1         927.60         7.53         0.00         75          0

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
nvme0n1        1528.40        12.61         0.06        126          0

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
nvme0n1        1484.80        12.07         0.00        120          0
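
If you want to benchmark a candidate volume's random-write behavior before committing to a multi-day sync, fio is one option. This is a sketch that was not part of the original setup: fio must be installed first, and the test file path under the mounted volume is arbitrary.

# quick random-write benchmark; 4k blocks roughly approximate geth's small random writes
fio --name=randwrite --filename=/opt/bsc/fio.test --size=1G --rw=randwrite \
    --bs=4k --iodepth=32 --ioengine=libaio --direct=1 --runtime=30 --time_based --group_reporting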

Update 2

Finally synced in 10 hours!
Based on the geth process log, these were some of the final signals before the node became fully synced:

  • Indexing transactions: geth kept printing a percentage until it reached 100, which took maybe 20 minutes. During this period the eth.syncing result was frozen, but note that pulledStates had caught up with knownStates (see the one-liner after this list).
  • Generating state snapshot: this is a new feature in geth 1.10 for a faster snapshot sync mode; the log kept saying resuming/aborting. Just let it run for a while and don't worry about it.
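
A quick way to watch for that first signal from the command line, using the same console attach as above (a small sketch; it prints true once pulledStates catches up, or 'synced' after eth.syncing flips to false):

# true when pulledStates has caught up with knownStates, 'synced' once syncing finishes
/usr/local/bin/geth attach http://localhost:8545 --exec "var s = eth.syncing; s ? s.pulledStates === s.knownStates : 'synced'"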
# /usr/local/bin/geth attach http://localhost:8545 --exec "eth.syncing"
{
  currentBlock: 7808427,
  highestBlock: 7808533,
  knownStates: 340871401,
  pulledStates: 340871401,
  startingBlock: 7803641
}

# /usr/local/bin/geth attach http://localhost:8545 --exec "eth.syncing"
false

# date
Fri May 28 17:48:12 UTC 2021

# journalctl -f -u geth
May 28 17:50:38 ip-172-31-66-40.ec2.internal geth[3882]: INFO [05-28|17:50:38.371] Imported new chain segment               blocks=1    txs=104     mgas=14.633   elapsed=271.241ms   mgasps=53.948  number=7,808,617 hash=d33527..30585e dirty=1019.97MiB
May 28 17:50:38 ip-172-31-66-40.ec2.internal geth[3882]: INFO [05-28|17:50:38.373] Unindexed transactions                   blocks=1    txs=77      tail=5,458,618 elapsed=1.835ms
May 28 17:50:41 ip-172-31-66-40.ec2.internal geth[3882]: INFO [05-28|17:50:41.471] Imported new chain segment               blocks=1    txs=212     mgas=22.106   elapsed=392.647ms   mgasps=56.300  number=7,808,618 hash=a543ea..e0b103 dirty=1020.60MiB
May 28 17:50:41 ip-172-31-66-40.ec2.internal geth[3882]: INFO [05-28|17:50:41.472] Unindexed transactions                   blocks=1    txs=59      tail=5,458,619 elapsed=1.510ms

Update 3

I'll share my AWS EC2 based configuration here, hoping it can help others struggling to build a fully synced node.

1. EC2 Instance Config

I used an i3.xlarge EC2 instance with 4 vCPUs, 32GB of memory, a 900G local NVMe drive, and an 8G gp3 volume hosting the operating system. The AMI was Amazon Linux 2 AMI (HVM), SSD Volume Type, 64-bit (x86).

When the instance first starts you have to create a file system on the 900G block device with mkfs -t xfs /dev/nvme0n1 (check the device id by running lsblk), then mkdir /opt/bsc and mount /dev/nvme0n1 /opt/bsc.
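
Collected in one place, a minimal disk-preparation sketch (device name /dev/nvme0n1 assumed; confirm with lsblk on your instance):

# identify the ephemeral NVMe device, create an XFS file system on it, and mount it at /opt/bsc
lsblk
mkfs -t xfs /dev/nvme0n1
mkdir -p /opt/bsc
mount /dev/nvme0n1 /opt/bsc
# note: this mount does not survive a reboot; re-run it (or add an fstab entry with the nofail option) after restarting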

2. /opt/bsc/node/config.toml
[Eth]
NetworkId = 56
SyncMode = "fast"
NoPruning = false
NoPrefetch = false
LightPeers = 100
DatabaseCache = 24000
DatabaseFreezer = ""
TrieCleanCache = 256
TrieDirtyCache = 256
UltraLightFraction = 75
TrieTimeout = 5000000000000
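# note: duration fields in this file are Go time.Duration values in nanoseconds
# (TrieTimeout above is 5,000s; Rejournal and Lifetime below are 1h and 3h)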
EnablePreimageRecording = false
EWASMInterpreter = ""
EVMInterpreter = ""

[Eth.Miner]
DelayLeftOver = 50000000
GasFloor = 8000000
GasCeil = 8000000
GasPrice = 1000000000
Recommit = 3000000000
Noverify = false

[Eth.Ethash]
CacheDir = "ethash"
CachesInMem = 2
CachesOnDisk = 3
CachesLockMmap = false
DatasetDir = "/opt/bsc/node/ethash"
DatasetsInMem = 1
DatasetsOnDisk = 2
DatasetsLockMmap = false
PowMode = 0

[Eth.TxPool]
Locals = []
NoLocals = false
Journal = "transactions.rlp"
Rejournal = 3600000000000
PriceLimit = 1000000000
PriceBump = 10
AccountSlots = 512
GlobalSlots = 10000
AccountQueue = 256
GlobalQueue = 5000
Lifetime = 10800000000000

[Eth.GPO]
Blocks = 20
Percentile = 60
OracleThreshold = 1000

[Node]
DataDir = "/opt/bsc/node/data"
HTTPHost = "0.0.0.0"
NoUSB = true
InsecureUnlockAllowed = false
IPCPath = "geth.ipc"
HTTPPort = 8545
HTTPVirtualHosts = ["localhost"]
HTTPModules = ["net", "web3", "eth", "debug"]
WSPort = 8546
WSModules = ["net", "web3", "eth", "debug"]

[Node.P2P]
MaxPeers = 1000
NoDiscovery = false
BootstrapNodes = ["enode://1cc4534b14cfe351ab740a1418ab944a234ca2f702915eadb7e558a02010cb7c5a8c295a3b56bcefa7701c07752acd5539cb13df2aab8ae2d98934d712611443@52.71.43.172:30311","enode://28b1d16562dac280dacaaf45d54516b85bc6c994252a9825c5cc4e080d3e53446d05f63ba495ea7d44d6c316b54cd92b245c5c328c37da24605c4a93a0d099c4@34.246.65.14:30311","enode://5a7b996048d1b0a07683a949662c87c09b55247ce774aeee10bb886892e586e3c604564393292e38ef43c023ee9981e1f8b335766ec4f0f256e57f8640b079d5@35.73.137.11:30311"]
StaticNodes = ["enode://f3cfd69f2808ef64838abd8786342c0b22fdd28268703c8d6812e26e109f9a7cb2b37bd49724ebb46c233289f22da82991c87345eb9a2dadeddb8f37eeb259ac@18.180.28.21:30311","enode://ae74385270d4afeb953561603fcedc4a0e755a241ffdea31c3f751dc8be5bf29c03bf46e3051d1c8d997c45479a92632020c9a84b96dcb63b2259ec09b4fde38@54.178.30.104:30311","enode://d1cabe083d5fc1da9b510889188f06dab891935294e4569df759fc2c4d684b3b4982051b84a9a078512202ad947f9240adc5b6abea5320fb9a736d2f6751c52e@54.238.28.14:30311","enode://f420209bac5324326c116d38d83edfa2256c4101a27cd3e7f9b8287dc8526900f4137e915df6806986b28bc79b1e66679b544a1c515a95ede86f4d809bd65dab@54.178.62.117:30311","enode://c0e8d1abd27c3c13ca879e16f34c12ffee936a7e5d7b7fb6f1af5cc75c6fad704e5667c7bbf7826fcb200d22b9bf86395271b0f76c21e63ad9a388ed548d4c90@54.65.247.12:30311","enode://f1b49b1cf536e36f9a56730f7a0ece899e5efb344eec2fdca3a335465bc4f619b98121f4a5032a1218fa8b69a5488d1ec48afe2abda073280beec296b104db31@13.114.199.41:30311","enode://4924583cfb262b6e333969c86eab8da009b3f7d165cc9ad326914f576c575741e71dc6e64a830e833c25e8c45b906364e58e70cdf043651fd583082ea7db5e3b@18.180.17.171:30311","enode://4d041250eb4f05ab55af184a01aed1a71d241a94a03a5b86f4e32659e1ab1e144be919890682d4afb5e7afd837146ce584d61a38837553d95a7de1f28ea4513a@54.178.99.222:30311","enode://b5772a14fdaeebf4c1924e73c923bdf11c35240a6da7b9e5ec0e6cbb95e78327690b90e8ab0ea5270debc8834454b98eca34cc2a19817f5972498648a6959a3a@54.170.158.102:30311","enode://f329176b187cec87b327f82e78b6ece3102a0f7c89b92a5312e1674062c6e89f785f55fb1b167e369d71c66b0548994c6035c6d85849eccb434d4d9e0c489cdd@34.253.94.130:30311","enode://cbfd1219940d4e312ad94108e7fa3bc34c4c22081d6f334a2e7b36bb28928b56879924cf0353ad85fa5b2f3d5033bbe8ad5371feae9c2088214184be301ed658@54.75.11.3:30311","enode://c64b0a0c619c03c220ea0d7cac754931f967665f9e148b92d2e46761ad9180f5eb5aaef48dfc230d8db8f8c16d2265a3d5407b06bedcd5f0f5a22c2f51c2e69f@54.216.208.163:30311","enode://352a361a9240d4d23bb6fab19cc6dc5a5fc6921abf19de65afe13f1802780aecd67c8c09d8c89043ff86947f171d98ab06906ef616d58e718067e02abea0dda9@79.125.105.65:30311","enode://bb683ef5d03db7d945d6f84b88e5b98920b70aecc22abed8c00d6db621f784e4280e5813d12694c7a091543064456ad9789980766f3f1feb38906cf7255c33d6@54.195.127.237:30311","enode://11dc6fea50630b68a9289055d6b0fb0e22fb5048a3f4e4efd741a7ab09dd79e78d383efc052089e516f0a0f3eacdd5d3ffbe5279b36ecc42ad7cd1f2767fdbdb@46.137.182.25:30311","enode://21530e423b42aed17d7eef67882ebb23357db4f8b10c94d4c71191f52955d97dc13eec03cfeff0fe3a1c89c955e81a6970c09689d21ecbec2142b26b7e759c45@54.216.119.18:30311","enode://d61a31410c365e7fcd50e24d56a77d2d9741d4a57b295cc5070189ad90d0ec749d113b4b0432c6d795eb36597efce88d12ca45e645ec51b3a2144e1c1c41b66a@34.204.129.242:30311","enode://bb91215b1d77c892897048dd58f709f02aacb5355aa8f50f00b67c879c3dffd7eef5b5a152ac46cdfb255295bec4d06701a8032456703c6b604a4686d388ea8f@75.101.197.198:30311","enode://786acbdf5a3cf91b99047a0fd8305e11e54d96ea3a72b1527050d3d6f8c9fc0278ff9ef56f3e56b3b70a283d97c309065506ea2fc3eb9b62477fd014a3ec1a96@107.23.90.162:30311","enode://4653bc7c235c3480968e5e81d91123bc67626f35c207ae4acab89347db675a627784c5982431300c02f547a7d33558718f7795e848d547a327abb111eac73636@54.144.170.236:30311","enode://c6ffd994c4ef130f90f8ee2fc08c1b0f02a6e9b12152092bf5a03dd7af9fd33597d4b2e2000a271cc0648d5e55242aeadd6d5061bb2e596372655ba0722cc704@54.147.151.108:30311","enode://99b07e9dc5f204263b87243146743399b2bd60c98f68d1239a3461d09087e6c417e40f1106fa606ccf54159feabdddb4e7f367559b349a6511e66e525de4906e@54.81.225.170:30311","enode://1479af5ea7bda822e8747d0b967309bc
ed22cad5083b93bc6f4e1d7da7be067cd8495dc4c5a71579f2da8d9068f0c43ad6933d2b335a545b4ae49a846122b261@52.7.247.132:30311"]
ListenAddr = ":30311"
EnableMsgEvents = false

[Node.HTTPTimeouts]
ReadTimeout = 30000000000
WriteTimeout = 30000000000
IdleTimeout = 120000000000
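
To sanity-check the file before wiring it into systemd, geth can echo back the configuration it actually parsed (assuming the BSC fork keeps upstream go-ethereum's dumpconfig subcommand):

# print the effective configuration after merging the TOML file with built-in defaults
/usr/local/bin/geth --config /opt/bsc/node/config.toml dumpconfig | head -40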
3. /etc/systemd/system/geth.service
[Unit]
Description=BSC geth go client
After=syslog.target network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/geth --config /opt/bsc/node/config.toml --metrics --metrics.addr 0.0.0.0
KillMode=process
KillSignal=SIGINT
TimeoutStopSec=90
Restart=on-failure
RestartSec=10s

[Install]
WantedBy=multi-user.target
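
After creating or editing the unit file, reload systemd so it picks up the change (the enable/start commands themselves appear in the next section):

systemctl daemon-reload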
4. Chain Initialization Steps
# follow the guide to setup AWS EC2 time sync
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html

# use /tmp as the working directory
cd /tmp

# download bsc 1.1.0-beta
wget https://github.com/binance-chain/bsc/releases/download/v1.1.0-beta/geth_linux
chmod +x geth_linux
mv geth_linux /usr/local/bin/geth

# initialize genesis
wget https://github.com/binance-chain/bsc/releases/download/v1.1.0-beta/mainnet.zip
unzip mainnet.zip 
/usr/local/bin/geth --datadir /opt/bsc/node/data init genesis.json

# start service
systemctl enable geth
systemctl start geth

# check syncing progress
/usr/local/bin/geth attach http://localhost:8545 --exec "eth.syncing"

# check peer count
curl -s 127.0.0.1:6060/debug/metrics | grep peer

# check geth daemon logs
journalctl -f -u geth
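
Since "net" is listed in HTTPModules in config.toml, the peer count is also available through the same console attach used for eth.syncing, which is handy if the metrics port isn't reachable:

# peer count via the console API (net module is exposed over HTTP in config.toml)
/usr/local/bin/geth attach http://localhost:8545 --exec "net.peerCount"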

Update 4

Cost

According to the AWS Pricing Calculator, an i3.xlarge instance costs about $228.40 per month (roughly $0.31/hour on-demand).

Caveats

The i3.xlarge instance's NVMe disk is ephemeral, which means the data is gone if you ever stop the EC2 instance. Since the EC2 SLA is not particularly high (99.99%, which translates into up to 52m 35s of downtime per year), it is recommended to run at least two of the above instances behind a load balancer to ensure continuous service.
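
One way to put two such nodes behind a load balancer is a health check that only passes once a node is fully synced. Below is a minimal sketch using the standard eth_syncing JSON-RPC call; the script name and how you attach it to your load balancer's health check are up to you:

#!/bin/bash
# report healthy (exit 0) only when eth_syncing returns false, i.e. the node is fully synced
RESULT=$(curl -s -H 'Content-Type: application/json' \
  --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' \
  http://localhost:8545)
echo "$RESULT" | grep -q '"result":false' && exit 0
exit 1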

Update 5

The instance type should now be changed to i3.2xlarge, because disk usage is approaching the upper limit of the i3.xlarge's SSD.
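
To see how close a running node is to that limit:

# check usage of the data mount and the size of the chain data itself
df -h /opt/bsc
du -sh /opt/bsc/node/data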