Battle of Time-series Databases (TSDB)

The rise of time-series databases has been accelerated by the massive growth of IoT devices. They offer users low-latency queries and distributed storage. Competition among TSDB vendors is dynamic: newcomers keep introducing better performance and more functionality. The full comparison is available here: https://docs.google.com/spreadsheets/d/1sMQe9oOKhMhIVw9WmuCEWdPtAoccJ4a-IuZv4fXDHxM/edit#gid=0

Integrating RF smoke sensors into Smart Home

Reflecting on the fire at the office of the Ministry of Transportation: this is an industrial-grade smoke detector that sounds a very loud alarm and transmits an “RF-433 MHz alarm” when exposed to heavy smoke (e.g., in case of fire). I converted it to a network device using an MQTT-RF bridge, so: 1) it can be monitored anywhere, anytime, and …

Big data networking disruptive technology

Investigating big data technology in my job also means observing the revolution in networking technology as the means of transporting the data. For years, the networking space was dominated by just a handful of players, e.g., Cisco and Juniper. I was surprised to see disruptive innovators such as Cumulus Linux, which offers independent network …

Big Data Expo NL

I was lucky to visit the annual Big Data Expo in Utrecht, the Netherlands, on 19-20 September 2018 (https://www.bigdata-expo.nl/en). Here are some presentations I could capture:
1. HOW TABLEAU ENABLED ABN AMRO TO VISUALIZE BIG DATA EFFICIENTLY (https://www.dropbox.com/sh/piazgsw9esw8dsl/AAAjjdjNNQFO0AbDRGyVns6ya?dl=0)
2. HOW TO LET AN ELEPHANT DANCE; IMPLEMENTATION OF A DATA LAKE IN NEAR REAL-TIME AT A DUTCH INSURER …

Tensorframes: Tensorflow + Spark

Combining the best data-intensive solution (Apache Spark) with the best compute-intensive approach (TensorFlow with GPU) results in TensorFrames. The speedup is remarkable. Hopefully, I can get a multi-GPU cluster to play with. See the Spark Summit EU talk by Tim Hunter, from Spark Summit.

InfluxDB compression

I’m always amazed at how people improve data storage techniques, e.g., InfluxDB, a time-series NoSQL database that not only responds to queries very fast (even aggregations over a long time range) but also uses substantially less storage space. My smart home project, which collects almost 200 measurement points every 10 seconds, …
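To give a feel for the ingest volume behind that figure, here is a minimal sketch that counts how many points such a setup writes per day. It assumes exactly 200 points per batch and a fixed 10-second interval (the excerpt says "almost 200"); the function name `points_per_day` is hypothetical, and nothing here measures InfluxDB's actual storage footprint.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def points_per_day(points_per_batch: int, interval_seconds: int) -> int:
    """Number of measurement points written in one day.

    Assumes one batch of `points_per_batch` points is written every
    `interval_seconds` seconds, with no gaps (an idealization).
    """
    batches_per_day = SECONDS_PER_DAY // interval_seconds
    return points_per_batch * batches_per_day


# Assumed values taken from the excerpt: ~200 points every 10 seconds.
daily_points = points_per_day(200, 10)
print(f"{daily_points:,} points per day")
```

At this rate the database receives well over a million points per day, which is why the on-disk size per point matters for a small home setup.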

Combining neural network and GPU in Google Cloud Platform

Imagine we want to recognize or identify an object in images streamed from camera feeds (e.g., to recognize a thief or suspect at immigration checkpoints, airports, stations, etc.). For this, the convolutional neural network (CNN) is currently the most widely used method. Popular CNN architectures such as LeNet, AlexNet, VGG, GoogLeNet, ResNet, YOLO, etc. could be …

Polyglot data science applications

There is no such thing as a “Swiss Army knife” tool; every tool has its advantages in certain circumstances. For example, R has the most comprehensive statistical packages, but it lacks scalability support. Python, another language, has a large and active community, so finding a solution there is easy. What if …

Why do we need a big data platform such as Hadoop & Spark?

In the last post, I mentioned that aggregating and sorting a 100-million-row dataset (~2.4 GB) using a monolithic approach (R data.table, Python pandas, awk, Perl) takes 4 seconds to 5 minutes to complete. Spark, a distributed platform that can be parallelized horizontally, takes almost 2 minutes. I extended the trial using Spark atop YARN …

Be cautious about including legacy resources as part of the big data system

Very often, organizations insist on involving legacy resources (e.g., applications, data storage) in the big data system. On the one hand, this can accelerate and ease the implementation of a big data use case; on the other, it creates a bottleneck in the workflow that would be problematic in the long term. If the monolithic applications …