Flexibility vs. Speed

Data science tools such as Rapidminer, Dataiku, and KNIME offer a great deal of flexibility and provide easy-to-understand building blocks that abstract data processing functions. They allow data analysts to implement a business case quickly. However, this comes at a price: execution slows down because of variable transfer between tasks.

Here is a trial: aggregating 100 million rows of data on a 16-core Xeon 2.4 GHz machine with 144 GB of RAM.
Workflow description:
1) transfer data to RAM;
2) set index;
3) aggregate;
4) sort descending;
5) sample five rows.
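
To make the five steps concrete, here is a minimal sketch in plain Python. It is not the R script or the Rapidminer process used in the trial. The function name `run_workflow`, the synthetic two-column data (a group id and a value), and the scaled-down row count are all assumptions made for illustration, and "aggregate" is assumed to mean a per-group sum.

```python
import random
from collections import defaultdict


def run_workflow(n_rows=100_000, n_groups=1_000, seed=42):
    """Hypothetical stand-in for the five workflow steps (not the original script)."""
    rng = random.Random(seed)

    # 1) transfer data to RAM: synthetic (group_id, value) rows replace the real data set
    rows = [(rng.randrange(n_groups), rng.random()) for _ in range(n_rows)]

    # 2) set index: map each group id to the positions of its rows
    index = defaultdict(list)
    for pos, (group_id, _value) in enumerate(rows):
        index[group_id].append(pos)

    # 3) aggregate: sum the value column per group (assumed aggregation)
    totals = {
        group_id: sum(rows[pos][1] for pos in positions)
        for group_id, positions in index.items()
    }

    # 4) sort descending by the aggregated value
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

    # 5) sample five rows from the sorted result
    sample = rng.sample(ranked, 5)
    return ranked, sample


ranked, sample = run_workflow()
for group_id, total in sample:
    print(group_id, round(total, 2))
```

The row count here is far below the 100 million rows of the trial, so the sketch only shows what each step does and says nothing about the timings.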

Workflow using Rapidminer’s built-in blocks
R script ported to Rapidminer

1) Rapidminer workflow: ~5 minutes
2) R data.table workflow: 4 seconds

