data_engineer - .:: Data Sains Lab ::.

🚑 PostgreSQL in the ER? Meet pgFirstAid!

3rd December 2025 by agungw132

The worst nightmare of any DBA or backend engineer: the database is corrupt, the log says PANIC: page verification failed, and the latest backup turns out to be broken. 😱 Before you give up and lose your data, there is a repository on GitHub that acts as a "defibrillator" for your Postgres. Its name is pgFirstAid. Let's dissect it! 👇 🛑 1. The Problem (the core issue): PostgreSQL is designed to be very strict …

Home Lab

25th January 2020 by agungw132

I like to learn something new every day, whether or not it is related to my PhD research (big data value creation). I have investigated and learned many big data artifacts in multiple layers, e.g., OpenStack, OpenFlow, Cumulus VX, vSphere, and HPC in the technology/infrastructure layer, and MPP/distributed solutions such as Hadoop, Spark, Elasticsearch, Kafka, and MapD in …

Benchmark Python’s Dataframe: Pandas vs. Datatable vs. PySpark SQL

24th January 2020 by agungw132

Setup
Machine: 16-thread Xeon 2.6 GHz, 32 GB RAM, NVMe PCIe x16
System: Ubuntu 16.04, Spark 2.4.4, Python 3.7.4, Pandas 0.25.1, Datatable 0.10.1
Data: generated CSV with 100 million rows (1.6 GB gzip-compressed)
Operation: create a dataframe from a compressed file; group the dataframe by 3 columns; aggregate 2 different columns with 2 different functions (group …
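To make the benchmarked workload concrete, here is a minimal, library-free sketch of the same three steps: read a gzip-compressed CSV, group by three columns, and aggregate two columns with two different functions. It is not the benchmark code from the post. The column names (`k1`, `k2`, `k3`, `v1`, `v2`), the choice of sum and mean as the two aggregates, and the tiny generated dataset are all assumptions made for illustration. A plain-Python version like this can serve as a reference result when checking what a dataframe library returns.

```python
import csv
import gzip
import io
import random
import time
from collections import defaultdict


def make_gzip_csv(n_rows, seed=0):
    """Return gzip-compressed CSV bytes with 3 key columns and 2 value columns."""
    rng = random.Random(seed)
    text = io.StringIO()
    writer = csv.writer(text)
    writer.writerow(["k1", "k2", "k3", "v1", "v2"])  # hypothetical column names
    for _ in range(n_rows):
        writer.writerow([
            rng.randrange(3), rng.randrange(3), rng.randrange(3),  # group keys
            rng.randrange(100), rng.randrange(100),                # values
        ])
    return gzip.compress(text.getvalue().encode("utf-8"))


def group_and_aggregate(gz_bytes):
    """Group by (k1, k2, k3) and return {key: (sum of v1, mean of v2)}.

    Sum and mean are assumed aggregates; the post's excerpt does not name them.
    """
    v1_sums = defaultdict(int)
    v2_sums = defaultdict(int)
    counts = defaultdict(int)
    # Step 1: read the compressed file as text, row by row.
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", newline="") as handle:
        for row in csv.DictReader(handle):
            # Step 2: the group key is the tuple of the three key columns.
            key = (row["k1"], row["k2"], row["k3"])
            # Step 3: accumulate what the two aggregates need.
            v1_sums[key] += int(row["v1"])
            v2_sums[key] += int(row["v2"])
            counts[key] += 1
    return {key: (v1_sums[key], v2_sums[key] / counts[key]) for key in counts}


if __name__ == "__main__":
    data = make_gzip_csv(100_000)
    start = time.perf_counter()
    result = group_and_aggregate(data)
    elapsed = time.perf_counter() - start
    print(f"{len(result)} groups in {elapsed:.3f} s")
```

Timing only the `group_and_aggregate` call, as above, keeps data generation out of the measurement; a dataframe library would be timed around its equivalent read, group, and aggregate calls.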

Apache Hadoop: What is that & how to install and use it? (Part 2)

4th June 2019 / 3rd June 2019 by agungw132

Part 2: How to install a standalone Hadoop. Now we are going to install a standalone Hadoop. The easiest way is to use a VM sandbox provided by vendors such as Hortonworks/Cloudera and MapR. However, since the sandbox has many components (not only Hadoop, but also HBase, Spark, Hive, Oozie, etc.), it requires substantial resources (4 …

Apache Hadoop: What is that & how to install and use it? (Part 1)

3rd June 2019 by agungw132

Next: How to install a standalone Hadoop. Part 1: Understanding Apache Hadoop as a Big Data Distributed Processing & Storage Cluster. In the last post, I discussed when a distributed approach such as Hadoop or Spark is preferable to a monolithic one. In this article, I discuss Apache Hadoop in more detail. This …

Repository of public datasets

2nd June 2019 by agungw132

For anyone looking for datasets for their project.

Standardized patterns for improving the data quality of big data

2nd June 2019 by agungw132

Abstract: Data seldom create value by themselves. They need to be linked and combined from multiple sources, which can often come with variable data quality. The task of improving data quality is a recurring challenge. In this paper, we use a case study of a large telecom company to develop a generic process pattern model …

Arista: Linux-based networking devices

3rd June 2019 / 2nd June 2019 by agungw132

For years, the networking industry has been dominated by Cisco and its operating system, IOS. IOS, IMHO, is not designed to be customized by users: configuration, monitoring, and O&M have to be done manually, line by line. A revolution has since occurred in network operating systems. Disruptive switches like Arista and bare-metal switches that …

Containers on ARM SBCs

2nd June 2019 by agungw132

Most single-board computers (SBCs) today are powered by ARM. Containerization on SBCs like the Raspberry Pi or Orange Pi brings a lot of flexibility to a smart home project: modularity, scalability, decoupling, and interconnectivity.

Saving electricity by suspending idle servers

2nd June 2019 by agungw132

Electricity is a (very) expensive resource in Europe. By putting the servers into sleep/suspend mode while they are idle, I can save 80% of the power consumption. A very-low-power microcomputer (e.g., a Raspberry Pi or even a smartphone) is enough to wake them up when they are needed.
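The excerpt does not say how the low-power device wakes the servers. One common mechanism is Wake-on-LAN, so the sketch below assumes that: it builds the standard "magic packet" (six 0xFF bytes followed by the target MAC address repeated 16 times) and sends it as a UDP broadcast. The function names, the broadcast address, and port 9 are illustrative choices, and the server's network card and firmware must have Wake-on-LAN enabled for this to have any effect.

```python
import socket


def build_magic_packet(mac):
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC repeated 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac, broadcast="255.255.255.255", port=9):
    """Send the magic packet as a UDP broadcast (port 9 is a common convention)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcasting must be enabled explicitly on the socket.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

On the always-on device, a call such as `wake("00:11:22:33:44:55")` (a placeholder MAC address) would be triggered whenever a suspended server is needed.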