![](https://i2.wp.com/1xltkxylmzx3z8gd647akcdvov-wpengine.netdna-ssl.com/wp-content/uploads/2016/06/rapidminer-logo-retina.png?w=640&ssl=1)
Data science tools such as Rapidminer, Dataiku, and KNIME offer a great deal of flexibility and provide easy-to-understand building blocks that abstract away data-processing functions. They let data analysts implement a business case quickly. However, this comes at a price: execution slows down because variables have to be transferred between tasks.
Here is a trial.
Aggregating 100 million rows of data on a 16-core 2.4 GHz Xeon with 144 GB of RAM.
Workflow description:
1) transfer data to RAM;
2) set index;
3) aggregate;
4) sort descending;
5) sample five rows.
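The five steps above can be sketched in plain Python to show the shape of the pipeline. This is a minimal illustration under stated assumptions, not the author's R script, which appears only as a screenshot. It assumes each row is a `(key, value)` pair, treats "set index" as building a hash index on the key, and assumes the aggregation is a sum per key. The function name `run_workflow` is hypothetical. Being pure standard-library Python, it would be far slower than data.table at 100 million rows.

```python
import random
from collections import defaultdict


def run_workflow(rows, n_sample=5, seed=None):
    """Hypothetical sketch of the workflow: index, aggregate, sort, sample.

    Assumes step 1 (transfer data to RAM) is done: `rows` is an
    in-memory list of (key, value) tuples.
    """
    # Step 2: "set index" -- assumed here to be a hash index grouping values by key.
    index = defaultdict(list)
    for key, value in rows:
        index[key].append(value)

    # Step 3: aggregate -- assumed to be a sum of values per key.
    totals = {key: sum(values) for key, values in index.items()}

    # Step 4: sort the aggregated rows in descending order of their totals.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

    # Step 5: sample five rows (or all of them if there are fewer groups).
    rng = random.Random(seed)
    sample = rng.sample(ranked, min(n_sample, len(ranked)))
    return ranked, sample
```

For example, `run_workflow([("a", 1), ("b", 5), ("a", 2)], seed=0)` ranks `("b", 5)` above `("a", 3)` and samples both groups, since there are fewer than five.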
![](https://dl.dropbox.com/s/bklr6pfd1qizfeg/rapidminer-blocks.png)
![](https://dl.dropbox.com/s/85z409lmd3uypem/r-rscript.png)
–Result–
1) Rapidminer workflow: ~5 minutes
2) R data.table workflow: 4 seconds
![](https://dl.dropbox.com/s/ch7rpkbofztmzcm/rapidminer-vs-datatable.png)