🚑 PostgreSQL in the ER? Meet pgFirstAid!

The worst nightmare of any DBA or backend engineer: the database is corrupt, the log shows PANIC: page verification failed, and the last backup turns out to be broken. 😱 Before you give up and lose your data, there is a repository on GitHub that acts as a “defibrillator” for your Postgres. It is called pgFirstAid. Let's dissect it! 👇 🛑 1. The Problem (the main issue): PostgreSQL is designed to be very strict …

Home Lab

I like to learn something new every day, whether or not it is related to my PhD research (big data value creation). I have investigated and learned many big data artifacts across multiple layers: OpenStack, OpenFlow, Cumulus VX, vSphere, HPC, etc. in the technology/infrastructure layer; MPP/distributed solutions such as Hadoop, Spark, Elasticsearch, Kafka, MapD, etc. in …

Benchmark Python’s Dataframe: Pandas vs. Datatable vs. PySpark SQL

Setup. Machine: 16-thread Xeon 2.6 GHz, 32 GB RAM, NVMe PCIe x16. System: Ubuntu 16.04, Spark 2.4.4, Python 3.7.4, Pandas 0.25.1, Datatable 0.10.1. Data: 100 million rows of generated CSV (1.6 GB gzip compressed). Operation: create a dataframe from a compressed file; group the dataframe by 3 columns; aggregate 2 different columns with 2 different functions (group …
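To make the benchmarked operation concrete, here is a minimal pandas sketch of the three steps (load a gzip-compressed CSV, group by three columns, aggregate two columns with two functions). The column names `k1`, `k2`, `k3`, `v1`, `v2`, the tiny generated sample file, and the choice of `sum` and `mean` are all assumptions for illustration; the excerpt does not say which columns or functions the actual benchmark used.

```python
import gzip
import os
import tempfile

import pandas as pd


def make_sample_csv(path, n_rows=1000):
    """Write a small gzip-compressed CSV with hypothetical columns k1, k2, k3, v1, v2."""
    with gzip.open(path, "wt", newline="") as fh:
        fh.write("k1,k2,k3,v1,v2\n")
        for i in range(n_rows):
            fh.write(f"{i % 2},{i % 3},{i % 5},{i},{i * 0.5}\n")


def group_and_aggregate(path):
    """Load the compressed CSV, group by three key columns, aggregate two value columns."""
    # Step 1: create a dataframe from the compressed file.
    df = pd.read_csv(path, compression="gzip")
    # Steps 2 and 3: group by 3 columns, then apply a different function to each
    # of 2 columns (sum and mean are assumed here, not taken from the post).
    return (
        df.groupby(["k1", "k2", "k3"], as_index=False)
          .agg(v1_sum=("v1", "sum"), v2_mean=("v2", "mean"))
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        sample_path = os.path.join(tmp_dir, "sample.csv.gz")
        make_sample_csv(sample_path)
        print(group_and_aggregate(sample_path).head())
```

The named-aggregation form of `agg` used here requires pandas 0.25 or later, which matches the version listed in the setup.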

Apache Hadoop: What is that & how to install and use it? (Part 2)

Part 2: How to install a standalone Hadoop. Now, we are going to install a standalone Hadoop. The easiest way is to use a VM sandbox provided by vendors such as Hortonworks/Cloudera and MapR. However, since the sandbox has many components (not only Hadoop, but also HBase, Spark, Hive, Oozie, etc.), it requires substantial resources (4 …

Apache Hadoop: What is that & how to install and use it? (Part 1)

Next: How to install a standalone Hadoop. Part 1: Understanding Apache Hadoop as a Big Data Distributed Processing & Storage Cluster. In the last post, I discussed when we prefer a distributed approach such as Hadoop or Spark over a monolithic one. I will discuss Apache Hadoop in more detail in this article. This …

Standardized patterns for improving the data quality of big data

Abstract: Data seldom create value by themselves. They need to be linked and combined from multiple sources, which can often come with variable data quality. The task of improving data quality is a recurring challenge. In this paper, we use a case study of a large telecom company to develop a generic process pattern model …

Arista: Linux-based networking devices

For years, the networking industry has been dominated by Cisco and its operating system, IOS. IOS, IMHO, is not designed to be customized by its users. Configuration, monitoring, and O&M must be done manually, line by line, on Cisco IOS. A revolution has occurred in network operating systems. Disruptive switches like Arista and bare-metal switches that …

Containers on ARM SBCs

Most single-board computers (SBCs) today are powered by ARM. Containerization on SBCs like the Raspberry Pi or Orange Pi brings a lot of flexibility to a smart home project: modularity, scalability, decoupling, and interconnectivity.

Saving electricity by suspending idle servers

Electricity is a (very) expensive resource in Europe. By putting the servers into sleep/suspend mode while they are idle, I can save 80% of the power consumption. A very-low-energy microcomputer (e.g. a Raspberry Pi or even a smartphone) is enough to wake them up when they are needed.
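The excerpt does not say how the servers are woken up. One common mechanism is Wake-on-LAN, so the sketch below assumes the servers' network cards have it enabled. A "magic packet" is 6 bytes of `0xFF` followed by the target MAC address repeated 16 times, sent as a UDP broadcast. The function names, the example MAC address, and the choice of port 9 are illustrative, not taken from the post.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC repeated 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def send_wake_on_lan(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet on the local network (assumes WoL is enabled on the target)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


if __name__ == "__main__":
    # Hypothetical MAC address of a suspended server.
    send_wake_on_lan("00:11:22:33:44:55")
```

A script like this is light enough to run on a Raspberry Pi, which can stay powered on while the larger servers sleep.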