Flexibility vs. Speed

Data science tools such as RapidMiner, Dataiku, and KNIME offer a lot of flexibility and provide easy-to-understand building blocks that abstract data processing functions. This lets data analysts implement a business case quickly. However, it comes at a price: execution slows down because variables are transferred between tasks. Here is the trial. Aggregating 100 …

CPU vs. GPU

Inspired by Matt Dowle's benchmark (https://h2oai.github.io/db-benchmark/), I compared his results with a GPU (details: https://lnkd.in/e7iHg7N). For processing big data, a K20 GPU (2 GB) is slightly better than a 20-core Xeon 2.6 GHz CPU with 125.8 GB RAM, and even much better in some tests 🙂 Of course, the performance comes at a price. Thanks to Omnisci …

Google Colab vs. Microsoft Azure notebook

Although I have known about these services for a while, I only recently paid attention to two “serverless” notebook services in the cloud: Google Colab and Microsoft Azure Notebooks. Here are my short reviews. <<Google Colab>> 1. Only supports Python (currently 3.6.7 and 2.7.15). You can install packages through pip directly from the notebook. No way …

Google Cloud Vision API to detect vehicle plate number

This is why every organization invests in artificial intelligence & machine learning. Once they have an “extensively trained” model that performs very well, they start selling it. Example: I tried the Google Vision API to detect vehicle plates. The accuracy is amazing! Put many CCTVs on the roads, feed the image streams, predict the plate …

Combining neural network and GPU in Google Cloud Platform

Imagine we want to recognize or identify an object in images streamed from camera feeds (for example, to recognize a thief or suspect at immigration checkpoints, airports, stations, etc.). To do that, the convolutional neural network (CNN) is currently the most widely used method. Popular CNN architectures such as LeNet, AlexNet, VGG, GoogLeNet, ResNet, YOLO, etc. could be …

Syncsort DMX-h & IBM SPSS Modeler

This post explores two other popular data processing platforms in the IT world: DMX-h and SPSS Modeler. 1) DMX-h: I was a heavy user of this beast of a software in 2009-2012. It is an amazing ETL platform; I used it to process terabytes of chunked files, which completed in a short time (compared to a relational …

Data science & ML (commercial) tools: their competitive landscape

In the last post, I mentioned Gartner’s Magic Quadrant as well as the competitive landscape of BI products. KDnuggets covers the data science & ML products in their article (https://bit.ly/2Pococi). Some interesting observations: 1) KNIME & MathWorks have increased their completeness of vision over the last 3 years. KDnuggets quotes KNIME: “With a wealth of …

Precision agriculture for a rural area with limited connectivity

Among the many hyped agriculture technologies under the Industry 4.0 umbrella that I have seen, this Microsoft solution, FarmBeats, appropriately addresses Indonesia’s geographical challenges. As a solution architect, what I like most about the presentation is how they synthesized the problem and designed a solution for it. Here is a summary of the challenges: 1) limited connectivity …

Polyglot data science applications

There is no such thing as a “Swiss Army knife” tool; every tool has its advantages in certain circumstances. For example, R has the most comprehensive statistical packages, but it lacks scalability support. Python, another language, has such an active community that finding a solution there is trivial. What if …