Client Analysis - Data Collection
- Retrieve network data about nodes (such as client, IP, hosting provider, peers, etc.).
- Analyze EL-CL pairings, client diversity, client effectiveness, and peering differences between clients.
- Set up a data pipeline to ingest data into a data store for further analysis.
- Without setting up a node or a crawler, there was no way to query the required information.
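As an illustration of the client diversity goal above, here is a minimal sketch of how client shares could be computed once node data is available. It assumes each node reports a user-agent style string whose first `/`-separated token is the client name (for example `Geth/v1.10.26/linux-amd64`). The function names `client_name` and `client_shares` are hypothetical and are not taken from the project's scripts.

```python
from collections import Counter
from typing import Dict, Iterable


def client_name(user_agent: str) -> str:
    """Return the lowercase client name from a user-agent style string.

    Assumption: the client name is the first '/'-separated token.
    Empty strings are reported as 'unknown'.
    """
    name = user_agent.split("/", 1)[0].strip().lower()
    return name or "unknown"


def client_shares(user_agents: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of nodes running each client."""
    counts = Counter(client_name(ua) for ua in user_agents)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}
```

Normalizing the name to lowercase keeps variants such as `Geth` and `geth` in the same bucket; any further grouping (for example by version) would need additional parsing.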
We identified and experimented with the open source network crawler implementations listed below:
- node-crawler (Eth1) : 🧡 The setup documentation was good, but it could only parse 100 mainnet Eth1 nodes/day, and there is no guide on how to configure this limit.
- node-watch (Eth2) : 💚 Easy to set up. It crawled Eth2 nodes and stored them in MongoDB. We wrote scripts to extract the data from MongoDB and load it into an AWS RDS MySQL instance.
- CrawlEth (Eth1) : ❤️ It appears to be a useful tool, but it was difficult to set up due to a lack of documentation.
We were able to set up nodewatch on an AWS instance and parse the following information for Eth2 nodes:
- Fork Digest
- Sync Status
- Code: nodewatch-to-db.py
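The script referenced above is not reproduced here. As a rough sketch of the kind of transformation involved in moving a crawler document from MongoDB into a relational row, the example below flattens one document into a tuple. The field names (`_id`, `user_agent`, `fork_digest`, `sync.status`), the table name `eth2_nodes`, and the function `node_doc_to_row` are all assumptions made for illustration; the actual nodewatch schema and the actual script may differ.

```python
from typing import Any, Dict, Tuple

# Hypothetical target table and columns; '%s' is the placeholder style
# used by common MySQL drivers for Python.
INSERT_SQL = (
    "INSERT INTO eth2_nodes (node_id, client, fork_digest, is_synced) "
    "VALUES (%s, %s, %s, %s)"
)


def node_doc_to_row(doc: Dict[str, Any]) -> Tuple[str, str, str, bool]:
    """Flatten one crawler document into (node_id, client, fork_digest, is_synced).

    All field names are assumed; missing fields fall back to empty or
    'unknown' values so that one bad document does not stop the load.
    """
    node_id = str(doc.get("_id", ""))
    user_agent = str(doc.get("user_agent", ""))
    client = user_agent.split("/", 1)[0].lower() or "unknown"
    fork_digest = str(doc.get("fork_digest", ""))
    sync = doc.get("sync") or {}
    is_synced = bool(sync.get("status", False))
    return (node_id, client, fork_digest, is_synced)
```

A list of such tuples could then be passed, together with `INSERT_SQL`, to the `executemany` method of a DB-API cursor connected to the MySQL instance.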
We were unable to retrieve some of the desired information (such as the peers of each node). This is an action item on our roadmap.
We were also unable to set up an Eth1 crawler.
We ended up crawling ethernodes to retrieve the required Eth1 information.
- Code: ethernode-to-db.py
To retrieve hosting provider information, we used MaxMind's free database. We have to manually filter out ISPs without hosting infrastructure to remove noise.
- Code: geoip-to-db.py
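The manual filtering step could look roughly like the sketch below, which drops rows whose organization appears on a hand-maintained exclusion list. The row layout (a dict with `ip` and `org` keys), the contents of `EXCLUDED_ORGS`, and the function `keep_hosting_rows` are assumptions for illustration only; they are not taken from geoip-to-db.py.

```python
from typing import Any, Dict, Iterable, List, Set

# Hypothetical, hand-curated list of organizations treated as access ISPs
# without hosting infrastructure. Entries are lowercase for matching.
EXCLUDED_ORGS: Set[str] = {
    "comcast cable communications",
    "deutsche telekom ag",
}


def keep_hosting_rows(
    rows: Iterable[Dict[str, Any]],
    excluded: Set[str] = EXCLUDED_ORGS,
) -> List[Dict[str, Any]]:
    """Return only rows whose organization is known and not on the exclusion list."""
    kept = []
    for row in rows:
        org = (row.get("org") or "").strip().lower()
        if org and org not in excluded:
            kept.append(row)
    return kept
```

Rows with a missing organization are dropped as well, since they cannot be attributed to a hosting provider. An allowlist of known hosting providers would be the stricter alternative, at the cost of missing smaller providers.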