I like to learn something new every day, whether or not it is related to my PhD research (big data value creation). I have investigated and learned many big data artifacts across multiple layers, namely:
- OpenStack, OpenFlow, Cumulus VX, vSphere, HPC, etc. in the technology/infrastructure layer
- MPP/distributed solutions such as Hadoop, Spark, Elasticsearch, Kafka, MapD, etc. in the application layer
- SQL, NoSQL, streaming storage, in-memory storage and GPU storage in the data layer
- Beam, Airflow, Oozie, Kubernetes, Ambari, etc. in the process and business layer
A laptop/desktop is enough for learning an artifact, but mastering it (e.g., setting up, operating, monitoring, troubleshooting and maintaining the artifact in a cluster) is too heavy for a laptop. For example, when observing performance while scaling out, a laptop/desktop is likely to run out of CPU or memory after setting up just a few VMs. Therefore, we need a server-grade lab that has more resources.
However, building a lab is expensive enough that I thought several times before getting one. When I needed more resources, I usually relied on these two solutions:
- Using the TU Delft HPC. As a non-sudo user, I am not allowed to install applications. Although I could build applications from source, important applications such as Vagrant, Docker and Kubernetes are not usable this way. The usual workaround is to use Conda to build the binaries together with their dependencies. However, not everything is available in the Conda repository.
- Using a cloud-based solution such as AWS or GCP. However, recent experiences make me want to leave this solution. When preparing for the CCA 131 exam, I had spent almost $50 halfway through the course, even though most of the VMs I built used spot instances (the cheapest option). I also had to spend almost $20 on transcribing the courses with Google Cloud Speech-to-Text. Cloud platforms are easy to start and play with, but the cost can be surprising. Moreover, I am not a fan of UI-based applications. Although they are easy and practical, replication and flexibility are difficult to achieve.
Therefore, having a home lab became a priority for me because of the many courses on my bucket list: vSphere, SDN, Hadoop/Spark, etc. I started looking for hardware on Aliexpress and Marktplaats (a second-hand online marketplace in the Netherlands). New parts are expensive in Europe, so I prefer buying from Aliexpress. However, shipping large and heavy items from China is expensive, so I bought those on Marktplaats.
The result can be seen in the following table. With a budget of less than $500, I got a system with a 16-thread Xeon at 2.6 GHz, 64 GB RAM, a 512 GB NVMe PCIe SSD, two GPUs (GTX 650 Ti and GTX 590; I need both GPUs to try multi-GPU experiments) and a 1000 W power supply.
Item | Condition | Source | Qty. | Unit Price (US $) | Total (US $) |
Motherboard X79 | New | https://www.aliexpress.com/item/33017449005.html | 1 | 74.40 | 74.40 |
RAM 32 GB (DDR3, ECC, 1866) | New | https://www.aliexpress.com/item/32960292204.html | 2 | 45.59 | 91.18 |
NVMe PCIe SSD 512 GB | New | https://www.aliexpress.com/item/4000026399982.html | 1 | 56.85 | 56.85 |
CPU Xeon E5-2650 v2 (16 threads) | Refurbished | https://www.aliexpress.com/item/32825475599.html | 1 | 56.90 | 56.90 |
Power Supply Cooler Master 1000W | Second-hand | Marktplaats | 1 | 66.10 | 66.10 |
Computer Case ATX | Second-hand | Marktplaats | 1 | 8.80 | 8.80 |
GPU NVIDIA GTX 650 Ti | Second-hand | Marktplaats | 1 | 16.50 | 16.50 |
GPU NVIDIA GTX 590 | Second-hand | Marktplaats | 1 | 60.60 | 60.60 |
Heatsink + Fan CPU | New | https://www.aliexpress.com/item/4000540173318.html | 1 | 26.36 | 26.36 |
Heatsink SSD | New | https://www.aliexpress.com/item/4000053073130.html | 1 | 1.19 | 1.19 |
TOTAL | | | | | 458.88 |
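For anyone adapting this parts list, the total can be re-checked with a few lines of Python. This is only a small sketch: the quantities and prices are copied from the table, while the `parts` list, the helper name and the `BUDGET_USD` constant are my own naming for illustration.

```python
# Bill of materials from the table: (item, quantity, unit price in US $).
parts = [
    ("Motherboard X79", 1, 74.40),
    ("RAM 32 GB (DDR3, ECC, 1866)", 2, 45.59),
    ("NVMe PCIe SSD 512 GB", 1, 56.85),
    ("CPU Xeon E5-2650 v2", 1, 56.90),
    ("Power Supply Cooler Master 1000W", 1, 66.10),
    ("Computer Case ATX", 1, 8.80),
    ("GPU NVIDIA GTX 650 Ti", 1, 16.50),
    ("GPU NVIDIA GTX 590", 1, 60.60),
    ("Heatsink + Fan CPU", 1, 26.36),
    ("Heatsink SSD", 1, 1.19),
]

BUDGET_USD = 500  # the "less than $500" target mentioned in the text


def line_total_cents(qty, unit_price):
    """Return the line total in integer cents to avoid float rounding drift."""
    return qty * round(unit_price * 100)


total_cents = sum(line_total_cents(qty, price) for _, qty, price in parts)
total = total_cents / 100

print(f"Total: ${total:.2f}")
print(f"Under budget: {total < BUDGET_USD}")
```

Swapping in your own prices shows quickly whether a different part still keeps the build under the budget.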
The first use of the system is to build the Cumulus demo. The topology requires 16 VMs, as shown in the figure below. With the home lab, this is not a heavy job.
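As a rough sanity check on why 16 VMs fit, the sketch below splits the machine's resources evenly across the VMs. The 64 GB and 16 threads come from the build above; the 8 GB host reserve and the even split are purely my assumptions for illustration, since the actual per-VM sizes of the demo are not listed here.

```python
TOTAL_RAM_GB = 64      # from the parts list (2 x 32 GB)
TOTAL_THREADS = 16     # Xeon E5-2650 v2
VM_COUNT = 16          # VMs in the Cumulus demo topology
HOST_RESERVE_GB = 8    # assumption: memory kept for the host OS and hypervisor


def even_split(total, reserve, count):
    """Share of a resource per VM after setting aside a host reserve."""
    return (total - reserve) / count


ram_per_vm_gb = even_split(TOTAL_RAM_GB, HOST_RESERVE_GB, VM_COUNT)
threads_per_vm = even_split(TOTAL_THREADS, 0, VM_COUNT)

print(f"RAM per VM: {ram_per_vm_gb} GB")
print(f"Threads per VM: {threads_per_vm}")
```

Under these assumptions each VM gets 3.5 GB of RAM and one hardware thread without overcommitting, which a typical laptop cannot offer.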
To check how good the system is, the easiest way is to run an existing benchmark from Phoronix. This benchmark seems to be a good fit, since its comparison currently includes 31 tested systems, both AMD (Ryzen, A-series, E-series, FX) and Intel (Core i-series, Xeon).
Rodinia 2.4:
    pts/rodinia-1.2.2 [Test: OpenMP CFD Solver]
    Test 1 of 22
    Estimated Trial Run Count:    3
    Estimated Test Run-Time:      10 Minutes
    Estimated Time To Completion: 3 Hours, 36 Minutes [03:47 CET Jan 25]
        Started Run 1 @ 00:11:13
        Started Run 2 @ 00:12:12
        Started Run 3 @ 00:13:11

    Test: OpenMP CFD Solver:
        55.601
        54.503
        54.654

    Average: 54.919 Seconds
    Deviation: 1.08%

    Seconds < Lower Is Better
    AMD E-350 ................ 685.06 |=========================================
    AMD A6-7400K ............. 248.07 |===============
    AMD Athlon II X3 425 ..... 234.93 |==============
    Intel Core 2 Duo E8400 ... 206.00 |============
    Pentium G3258 ............ 156.22 |=========
    Core i3 2100 ............. 155.79 |=========
    AMD A10-7800 ............. 139.20 |========
    AMD A10-5800K ............ 130.06 |========
    AMD A10-7870K ............ 119.94 |=======
    Core i5 2400S ............ 114.38 |=======
    Core i7 870 .............. 91.46  |=====
    AMD FX-8350 .............. 70.46  |====
    Core i7 990X ............. 63.64  |====
    Willow6C ................. 62.35  |====
    Core i5 7600K ............ 59.91  |====
    <<HomeLabMN356 ............. 54.92  |===>>
    Core i5 8500T ............ 53.72  |===
    Willow16C ................ 50.33  |===
    Ryzen 5 2400G ............ 45.91  |===
    Core i7 7740X ............ 39.02  |==
    Core i5 8400 ............. 35.98  |==
    Ryzen 7 1700 ............. 34.92  |==
    Core i7 5960X ............ 33.82  |==
    Ryzen 5 2600 ............. 32.70  |==
    Ryzen 5 2600X ............ 31.17  |==
    Ryzen 7 2700 ............. 29.28  |==
    Core i7 8700K ............ 29.04  |==
    Ryzen 7 1800X ............ 28.52  |==
    Ryzen 7 2700X ............ 27.22  |==
    Core i9 7900X ............ 18.23  |=
    Ryzen Threadripper 1950X . 13.34  |=
    Core i9 7980XE ........... 11.93  |=
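The "Average" and "Deviation" figures in this output can be reproduced from the three run times. The sketch below assumes the deviation is the sample standard deviation expressed as a percentage of the mean; I have not confirmed that this is how the Phoronix Test Suite defines it, but it matches the reported values for these runs.

```python
import statistics

# The three OpenMP CFD Solver run times (seconds) from the output.
runs = [55.601, 54.503, 54.654]

average = statistics.mean(runs)

# Assumption: "Deviation" = sample standard deviation relative to the mean.
deviation_pct = statistics.stdev(runs) / average * 100

print(f"Average: {average:.3f} Seconds")
print(f"Deviation: {deviation_pct:.2f}%")
```

A deviation of about 1% across runs suggests the measurement is stable enough to compare against the other systems in the list.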
I hope this short article helps you build your own home lab!