Battle of ML/DL framework on stand-alone vs. distributed platform

Will steep improvements in algorithms plus falling hardware costs (CPU, memory, disk) make the distributed approach irrelevant? IMHO, at this time, the winner is H2O.ai, which gives impressive performance in stand-alone mode and also supports distributed platforms (i.e., atop Spark using H2O Sparkling Water). I was so surprised that Stanford’s statistics maestro, Tibshirani & …

Dataiku: flexible data science tools

As the previous post showed, the flexibility offered by data science tools greatly reduces performance, i.e., execution speed. Fortunately, Dataiku, a data science tool, provides multiple ways to aggregate big data: 1) using the built-in building blocks; 2) using a custom R script with the built-in I/O blocks; or 3) using an independent custom R …

Flexibility vs. Speed

Data science tools such as RapidMiner, Dataiku, and KNIME offer a great deal of flexibility and provide easy-to-understand building blocks that abstract data processing functions. They allow data analysts to implement a business case quickly. However, this comes at a price: slower execution due to variable transfer between tasks. Here is the trial. Aggregating 100 …

GPU Database

I think one of the most promising technologies of the next couple of years is the use of GPUs to accelerate all kinds of jobs. One company following this direction is OmniSci (formerly MapD). They have a live demo showing how fast a GPU processes almost 400 million tweets and visualizes them geographically in less …

Google Cloud Vision API to detect vehicle plate number

This is why every organization engages in artificial intelligence & machine learning. Once they have an “extensively trained” model that performs very well, they start selling it. Example: I tried the Google Cloud Vision API to detect vehicle plates. The accuracy is amazing! Put many CCTVs on the roads, feed the image streams, predict the plate …

Syncsort DMX-h & IBM SPSS Modeler

Two other popular data processing platforms in the IT world are explored here, i.e., DMX-h and SPSS Modeler. 1) DMX-h: I was an extensive user of this beast of a software package in 2009-2012. It is an amazing ETL platform; I used it to process terabytes of chunked files, which completed in a short time (compared to a relational …

Data science & ML (commercial) tools: their competitive landscape

In the last post, I mentioned Gartner’s Magic Quadrant as well as the competitive landscape of BI products. KDnuggets covers the data science & ML products in their article (https://bit.ly/2Pococi). Some interesting observations: 1) KNIME & MathWorks have increased their completeness of vision over the last 3 years. KDnuggets quotes KNIME “With a wealth of …

Polyglot data science applications

There is no such thing as a “Swiss Army knife” tool; every tool has its advantages in certain circumstances, e.g., we know R has the most comprehensive statistical packages, but it lacks scalability support. Python, another language, has tons of community discussion, so finding a solution from the community is easy. What if …