Google BigQuery, a serverless Datawarehouse-as-a-Service to batch query huge datasets (Part 2)

How to use GBQ: Note that the data must be located in Google Cloud, whether in Google Cloud Storage, Google Drive, Cloud Bigtable, or the output of any of Google’s SaaS products. Google BigQuery also provides a number of public datasets that users can instantly combine with their own data, such as NOAA, Bitcoin, WorldBank, …

Apache Hadoop: What is that & how to install and use it? (Part 2)

Part 2: How to install a standalone Hadoop. Now we are going to install a standalone Hadoop. The easiest way is to use a VM sandbox provided by vendors such as Hortonworks/Cloudera and MapR. However, since the sandbox has many components (not only Hadoop, but also HBase, Spark, Hive, Oozie, etc.), it requires substantial resources (4 …

Apache Hadoop: What is that & how to install and use it? (Part 1)

Next: How to install a standalone Hadoop. Part 1: Understanding Apache Hadoop as a Big Data Distributed Processing & Storage Cluster. In the last post, I discussed when a distributed approach such as Hadoop or Spark is preferable to a monolithic approach. In this article, I discuss Apache Hadoop in more detail. This …

Big Data & AI landscape 2018

As described by AgileEngine (https://agileengine.com/megatrends-in-big-data/), there are four megatrends in big data in 2019: 1) from “big data” to “just data”, because most organizations have already embraced big data; 2) machine learning is the new engine, after many organizations struggled to create value from big data; 3) everyone to the cloud, because of the following benefits: …

Google BigQuery, a serverless Datawarehouse-as-a-Service to batch query huge datasets (Part 1)

Next: Part 2. Google BigQuery (GBQ) as a serverless service from Google. Serverless is one of the big data solutions to watch in 2018, according to Computer World UK (/compwuk). Google BigQuery (GBQ) is an example of an enterprise-grade serverless service (either Function-as-a-Service, FaaS, or Datawarehouse-as-a-Service) offered by Google Cloud Platform. GBQ was first launched as …

Apache Zeppelin, a polyglot data science tool

Polyglot applications for big data exploration will become more important in the future. They allow us to run multiple interpreters in a single notebook and transfer variables among multiple kernels. Programming in multiple languages lets us draw on the strong points of each interpreter.

Standardized patterns for improving the data quality of big data

Abstract: Data seldom create value by themselves. They need to be linked and combined from multiple sources, which can often come with variable data quality. The task of improving data quality is a recurring challenge. In this paper, we use a case study of a large telecom company to develop a generic process pattern model …

Arista, a Linux-based networking device

For years, the networking industry has been dominated by Cisco and its operating system, IOS. IOS, IMHO, is not designed to be customized by users. Configuration, monitoring, and O&M have to be done manually, line by line, on Cisco IOS. A revolution has occurred in network operating systems. Disruptive switches like Arista and bare-metal switches that …

Containers on ARM SBCs

Most single-board computers (SBCs) today are powered by ARM. Containerization on SBCs like Raspberry Pi or Orange Pi brings a great deal of flexibility to a Smart Home project: modularity, scalability, decoupling, and interconnectivity.