Apache Hadoop: What is that & how to install and use it? (Part 1)

Next: How to install a standalone Hadoop. Part 1: Understanding Apache Hadoop as a Big Data Distributed Processing & Storage Cluster. In the last post, I discussed when we prefer a distributed approach such as Hadoop or Spark over a monolithic approach. In this article, I will discuss Apache Hadoop in more detail. This …

Big Data & AI landscape 2018

As described by AgileEngine (https://agileengine.com/megatrends-in-big-data/), there are four megatrends in big data in 2019: 1) From “big data” to “just data”, because most organizations have already embraced big data. 2) Machine learning is the new engine, after many organizations have struggled to create value from big data. 3) Everyone moves to the cloud because of the following benefits: …

Google BigQuery, a serverless Datawarehouse-as-a-Service to batch query huge datasets (Part 1)

Next: Part 2. Google BigQuery (GBQ) as a serverless service from Google. Serverless is one of the big data solutions to watch in 2018, according to Computer World UK (/compwuk). Google BigQuery (GBQ) is an example of an enterprise-grade serverless service (either Function-as-a-Service, FaaS, or Datawarehouse-as-a-Service) offered by Google Cloud Platform. GBQ was first launched as …

Apache Zeppelin, a polyglot data science tool

Polyglot applications for big data exploration will become more important in the future. They allow us to run multiple interpreters in a single notebook and transfer variables among kernels. Programming in multiple languages unleashes the strengths of each interpreter.

Standardized patterns for improving the data quality of big data

Abstract: Data seldom create value by themselves. They need to be linked and combined from multiple sources, which can often come with variable data quality. The task of improving data quality is a recurring challenge. In this paper, we use a case study of a large telecom company to develop a generic process pattern model …

Arista, Linux-based networking devices

For years, the networking industry has been dominated by Cisco and its operating system, IOS. IOS, IMHO, is not designed to be customized by users. Configuration, monitoring, and O&M must be done manually, line by line, on Cisco IOS. A revolution has occurred in network operating systems. Disruptive switches like Arista and bare-metal switches that …

Containers on ARM SBCs

Most single-board computers (SBCs) today are powered by ARM. Containerization on SBCs like the Raspberry Pi or Orange Pi brings a lot of flexibility to a smart home project: modularity, scalability, decoupling, and interconnectivity.

Saving electricity by suspending idle servers

Electricity is a (very) expensive resource in Europe. By putting servers into sleep/suspend mode while idle, I can save 80% of their power consumption. A very low-energy microcomputer (e.g., a Raspberry Pi or even a smartphone) is enough to wake them up when they are needed.
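
One common way for a small always-on device to wake a suspended server is Wake-on-LAN; the excerpt does not say which mechanism the post uses, so treat the following as a minimal sketch under that assumption. The function names, the broadcast address, and UDP port 9 are illustrative choices, and the server's network card must have Wake-on-LAN enabled for this to have any effect.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 bytes of 0xFF, then the MAC repeated 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast (address and port are assumed defaults)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcasting must be enabled explicitly on the socket.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

On the microcomputer, calling `send_magic_packet("00:11:22:33:44:55")` with the server's real MAC address (the one shown here is a placeholder) would send the wake-up request on the local network.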

Tableau, the “de facto” distributed visualization platform for big data

I had not checked Tableau for a long time; the application has since incorporated new connectors for recent technologies, e.g., Google BigQuery and Spark SQL. It surely positions itself as the ‘de facto’ distributed visualization platform for big data. I wonder when Tableau will deliver its desktop version for Linux. Microsoft Excel, Text File, Microsoft Access, JSON File …