InfluxDB compression

I’m always amazed at how people improve data storage techniques, e.g., InfluxDB, a time-series NoSQL database that not only responds to queries very fast (even aggregations over a long time range) but also uses substantially less storage space. My smart home project, which collects almost 200 measurement points every 10 seconds, …

Mailbox checker

As an online shopper, I need to regularly check my mailbox, which is 5 floors away from my home. It is really annoying to check whether a new delivery has arrived. I came up with this mailbox checker solution, which should draw little power (infrequent battery replacement, >6 months) and be small (portable). To detect mailbox door activity, I …

Syncsort DMX-h & IBM SPSS Modeler

Two other popular data processing platforms in the IT world are explored here, i.e., DMX-h and SPSS Modeler. 1) DMX-h I was an extensive user of this beast of a software in 2009-2012. It is an amazing ETL platform; I used it to process terabytes of chunked files, which completed in a short time (compared to a relational …

My first experiment with Zigbee

Instead of buying an expensive, proprietary Zigbee gateway, I bought a Zigbee sniffer that could be “changed” into a universal gateway (https://lnkd.in/dj9HwBF). The cheapest Zigbee device I could play with is the Ikea Tradfri lamp (~7 euro). But it’s better value than buying a full set of Ikea smart lighting (cost > 25 …

Zwave+ vs. WiFi-based IoT devices

There are at least 4 competing IoT connectivity technologies that already have numerous products rolled out in the market, i.e., RF-433 MHz, WiFi (ESP8266-based devices), Zigbee, and Zwave+. The first two have abundant cheap products; the other two have limited vendor-independent commercial products. I saw a lot of Zigbee & Zwave vendor-locked devices at IoT Expo …

Why we need a big data platform such as Hadoop & Spark?

In the last post, I mentioned that aggregating & sorting a 100-million-row dataset (~2.4 GB) using a monolithic approach takes 4 seconds to 5 minutes (R data.table, Python pandas, awk, perl) to complete. Spark, a distributed platform that can be parallelized horizontally, takes almost 2 minutes. I extend the trial using Spark atop YARN …

Be cautious to include legacy resources as part of the big data system

Very often, organizations insist on involving legacy resources (e.g., applications, data storage) in the big data system. On the one hand, this could accelerate and ease the implementation of a big data use case; on the other hand, it creates a bottleneck in the workflow that would be problematic in the long term. If the monolithic applications …