Apache Hadoop: What is that & how to install and use it? (Part 2)

Part 2: How to install a standalone Hadoop. Now we are going to install a standalone Hadoop. The easiest way is to use a VM sandbox provided by vendors such as Hortonworks/Cloudera and MapR. However, since the sandbox bundles many components (not only Hadoop, but also HBase, Spark, Hive, Oozie, etc.), it requires substantial resources (4 …
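For a lighter alternative to the vendor sandboxes, Hadoop can also run in standalone mode (a single JVM against the local filesystem). A minimal sketch, assuming a Hadoop 3.x tarball has already been downloaded; the version number and JDK path are illustrative and should be adjusted to your environment:

```shell
# Unpack the distribution (version is illustrative).
tar -xzf hadoop-3.3.6.tar.gz
cd hadoop-3.3.6

# Point Hadoop at a JDK; adjust the path to your installation.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

# In standalone mode no daemons run; MapReduce jobs execute locally.
# Run the bundled example job against some sample input.
mkdir input
cp etc/hadoop/*.xml input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar \
    grep input output 'dfs[a-z.]+'
cat output/*
```

This mode is useful for trying out MapReduce jobs without the memory footprint of a full sandbox VM.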

Apache Hadoop: What is that & how to install and use it? (Part 1)

Next: How to install a standalone Hadoop. Part 1: Understanding Apache Hadoop as a Big Data Distributed Processing & Storage Cluster. In the last post, I discussed on which occasions we prefer a distributed approach such as Hadoop or Spark over a monolithic approach. I will discuss Apache Hadoop in more detail in this article. This …

Big Data & AI landscape 2018

As described by AgileEngine (https://agileengine.com/megatrends-in-big-data/), there are four megatrends in big data in 2019: 1) from “big data” to “just data”, because most organizations have already embraced big data; 2) machine learning is the new engine, after many organizations struggled to create value from big data; 3) everyone moves to the cloud, because of the following benefits: …

Apache Zeppelin, a polyglot data science tool

The use of polyglot applications for big data exploration will become more important in the future. Such applications allow us to run multiple interpreters in a single notebook and transfer variables among multiple kernels. Using multiple programming languages unleashes the strong points of each interpreter.
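As a sketch of how this variable transfer looks in a Zeppelin notebook: the paragraph magics and the ZeppelinContext helper `z` are Zeppelin features, while the variable name below is purely illustrative.

```
%python
# Notebook paragraph 1: compute something in Python and share it
# through the ZeppelinContext so other interpreters can read it.
z.put("top_n", 10)

%spark.pyspark
# Notebook paragraph 2 (a separate cell, different interpreter):
# retrieve the value that the Python paragraph shared.
top_n = z.get("top_n")
print(top_n)
```

Each paragraph runs in its own interpreter process; the ZeppelinContext acts as the shared bridge between them.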

Standardized patterns for improving the data quality of big data

Abstract: Data seldom create value by themselves. They need to be linked and combined from multiple sources, which often come with variable data quality. The task of improving data quality is a recurring challenge. In this paper, we use a case study of a large telecom company to develop a generic process pattern model …

Tableau, the “de facto” distributed visualization platform for big data

It has been a long time since I last checked Tableau; the application has incorporated new connectors for recent technologies, e.g., Google BigQuery, Spark SQL, etc. It surely positions itself as the “de facto” distributed visualization platform for big data. I am wondering when Tableau will deliver its desktop version on the Linux platform. …

Big Data Application & eTOM

Which department in an enterprise is concerned with a certain big data objective? The most compelling answer is to combine the eTOM framework with big data use cases. To understand how big data applications are implemented in practice, we need to pinpoint the applications on the business processes in an organization. For that purpose, we …

Google Conversation atop Translate API

I have met a lot of promising Indonesians, but a lack of English made them unconfident in delivering their capabilities. I just noticed Android’s conversation service atop Google Translate, which could reduce that hassle. By simply connecting an Android phone to a Bluetooth earphone, we can watch two people speak confidently in their mother tongues. This is an example: Waverly …

Sonoff RF Bridge (before the firmware flashed)

Turning a remote controller into a (WiFi) lamp switcher using a Sonoff RF Bridge that converts a learned incoming RF signal into an MQTT publishing action. The original Sonoff application is used (it needs to send the message to the Sonoff cloud). An IFTTT (If-This-Then-That) applet is created to respond to an RF request to turn on a …
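The bridge's core job is a lookup: each learned RF code maps to an MQTT topic and payload. A minimal Python sketch of that mapping; the RF codes and topic names are hypothetical, and a real setup would publish the result through an MQTT client such as paho-mqtt rather than just returning it.

```python
# Hypothetical table of RF codes learned by the Sonoff RF Bridge,
# mapped to the MQTT (topic, payload) pair each should trigger.
RF_CODE_TO_ACTION = {
    "0x1A2B3C": ("home/livingroom/lamp", "ON"),
    "0x1A2B3D": ("home/livingroom/lamp", "OFF"),
}

def rf_to_mqtt(rf_code):
    """Translate an incoming RF code into an MQTT (topic, payload)
    pair, or None if the code was never learned."""
    return RF_CODE_TO_ACTION.get(rf_code)
```

For example, `rf_to_mqtt("0x1A2B3C")` yields `("home/livingroom/lamp", "ON")`, which the bridge would then publish so that IFTTT (or any MQTT subscriber) can react.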