Data mining - What framework should I use for asynchronous algorithms?

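I have a problem with a very large dataset (who doesn't?). It is stored in chunks such that the variance between chunks is small (i.e., each chunk is representative of the whole). I want to experiment with algorithms that do some classification asynchronously, but I want to write the code myself.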

The pseudocode would look something like this:

start a master
distribute 10 chunks on 10 slaves
while some criterion is not met
 for each s in slaves:
  classify the data inexactly using some kind of iterative algorithm and return the classifier to the master
 master waits for any 2 slaves to report a classifier, averages the classifiers, and sends the average back so the slaves can continue
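
To make the loop above concrete, here is a minimal single-machine sketch in Python. It is only an illustration under several assumptions that are not part of my actual setup: threads from `concurrent.futures` stand in for the slaves, the chunks are small synthetic 2-D datasets, and the "inexact" classifier is a few gradient steps of logistic regression. The names `make_chunk`, `local_steps`, `average`, `accuracy`, and `train` are all hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def make_chunk(n, seed):
    """Synthetic stand-in for one data chunk: label is 1 when x0 + x1 > 0."""
    rng = random.Random(seed)
    xs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    return [(x, 1 if x[0] + x[1] > 0 else 0) for x in xs]


def local_steps(weights, chunk, steps=5, lr=1.0):
    """Inexact local solve: a few batch gradient steps of logistic regression."""
    w = list(weights)
    n = len(chunk)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for (x0, x1), y in chunk:
            p = 1.0 / (1.0 + math.exp(-(w[0] * x0 + w[1] * x1 + w[2])))
            grad[0] += (p - y) * x0
            grad[1] += (p - y) * x1
            grad[2] += p - y
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w


def average(models):
    """Coordinate-wise mean of several weight vectors."""
    return [sum(ws) / len(ws) for ws in zip(*models)]


def accuracy(w, data):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(
        (1 if w[0] * x0 + w[1] * x1 + w[2] > 0 else 0) == y
        for (x0, x1), y in data
    )
    return hits / len(data)


def train(chunks, rounds=20, quorum=2):
    """Master loop: wait for at least `quorum` reports, average, resubmit."""
    weights = [0.0, 0.0, 0.0]
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # Maps each in-flight future to the index of the chunk it works on.
        pending = {
            pool.submit(local_steps, weights, c): i for i, c in enumerate(chunks)
        }
        for _ in range(rounds):
            done = set()
            while len(done) < quorum:
                # Block only on workers that have not reported yet.
                finished, _not_done = wait(
                    set(pending) - done, return_when=FIRST_COMPLETED
                )
                done |= finished
            weights = average([f.result() for f in done])
            # Only the workers that reported restart from the averaged model;
            # the others keep whatever (possibly stale) result they hold.
            for f in done:
                i = pending.pop(f)
                pending[pool.submit(local_steps, weights, chunks[i])] = i
    return weights


if __name__ == "__main__":
    chunks = [make_chunk(200, seed) for seed in range(10)]
    w = train(chunks)
    print(w, accuracy(w, [row for c in chunks for row in c]))
```

The part I care about is the `wait(..., return_when=FIRST_COMPLETED)` loop: the master never blocks on all ten workers, only on enough of them to form a quorum. What I am looking for is a framework that gives me this kind of control when the workers are separate machines rather than threads.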

What framework should I use? Hadoop, Spark, something else?

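If I were doing this in plain C, I would use pthreads and have very fine-grained control over threads, locks, and mutexes. Is there a comparable framework for this kind of distributed data science setting?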