agungw132 - .:: Data Sains Lab ::.

Apache Hadoop: What is that & how to install and use it? (Part 1)

3rd June 2019 by agungw132

Next: How to install a standalone Hadoop. Part 1: Understanding Apache Hadoop as a Big Data Distributed Processing & Storage Cluster. In the last post, I discussed when a distributed approach such as Hadoop or Spark is preferable to a monolithic one. In this article, I discuss Apache Hadoop in more detail. This …

Big Data & AI landscape 2018

2nd June 2019 by agungw132

As described by AgileEngine (https://agileengine.com/megatrends-in-big-data/), there are four megatrends in big data in 2019: 1) from “big data” to “just data”, because most organizations have already embraced big data; 2) machine learning is the new engine, after many organizations struggled to create value from big data; 3) everyone to the cloud, because of the following benefits: …

Google BigQuery, a serverless Datawarehouse-as-a-Service to batch query huge datasets (Part 1)

2nd June 2019 (updated 5th June 2019) by agungw132

Next: Part 2. Google BigQuery (GBQ) as a serverless service from Google. Serverless is one of the big data solutions to watch in 2018, according to Computer World UK (/compwuk). Google BigQuery (GBQ) is an example of an enterprise-grade serverless service (either Function-as-a-Service, FaaS, or Datawarehouse-as-a-Service) offered by Google Cloud Platform. GBQ was first launched as …

Apache Zeppelin, a polyglot data science tool

2nd June 2019 by agungw132

Polyglot applications for big data exploration will become more important in the future. They allow us to run multiple interpreters in a single notebook and to transfer variables among multiple kernels. Programming in multiple languages brings out the strong points of each interpreter.

Repository of public datasets

2nd June 2019 by agungw132

For anyone who is looking for datasets for his/her project.

Standardized patterns for improving the data quality of big data

2nd June 2019 by agungw132

Abstract: Data seldom create value by themselves. They need to be linked and combined from multiple sources, which can often come with variable data quality. The task of improving data quality is a recurring challenge. In this paper, we use a case study of a large telecom company to develop a generic process pattern model …

Arista, Linux-based networking devices

2nd June 2019 (updated 3rd June 2019) by agungw132

For years, the networking industry has been dominated by Cisco and its operating system, IOS. IOS, IMHO, is not designed to be customized by users. Configuration, monitoring, and O&M have to be done manually, line by line, on Cisco IOS. A revolution has occurred in network operating systems. Disruptive switches like Arista and bare-metal switches that …

Containers on ARM SBCs

2nd June 2019 by agungw132

Most single-board computers (SBCs) today are powered by ARM. Containerization on SBCs like Raspberry Pi or Orange Pi brings a lot of flexibility to a Smart Home project, i.e., modularity, scalability, decoupling, and interconnectivity.

Saving electricity by suspending idle servers

2nd June 2019 by agungw132

Electricity is a (very) expensive resource in Europe. By putting the servers into sleep/suspend mode while they are idle, I can save 80% of the power consumption. A very-low-energy microcomputer (e.g. a Raspberry Pi or even a smartphone) is enough to wake them up when they are needed.
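The excerpt does not say how the wake-up is triggered. One common way to do it from a small always-on device is Wake-on-LAN, so the following is only a minimal sketch under that assumption: it presumes the servers' network cards and firmware have Wake-on-LAN enabled. The function names, the broadcast address, and the UDP port are illustrative choices, not taken from the post.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 bytes of 0xFF, then the MAC repeated 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast (address and port are assumptions)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcasting must be enabled explicitly on the socket.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

On the Raspberry Pi, calling `wake("00:11:22:33:44:55")` with the sleeping server's real MAC address (the one shown here is a placeholder) would send the packet on the local network.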

Tableau, the “de facto” distributed visualization platform for big data

2nd June 2019 (updated 3rd June 2019) by agungw132

I had not checked Tableau for a long time. The application has since incorporated new connectors for recent technologies, e.g., Google BigQuery, Spark SQL, etc. It surely positions itself as the ‘de facto’ distributed visualization platform for big data. I wonder when Tableau will deliver its desktop version on the Linux platform. Microsoft Excel, Text File, Microsoft Access, JSON File …
