Computational Science - Algorithm for comparing two large sets - 吾爱随笔录

Algorithm for comparing two large sets

Computational Science, Algorithms

2021-12-04 13:56:39

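I am new to the field of algorithms and know nothing about the terminology used, so please bear with me. I have two large sets of numbers, A and B, where $A = \{x \mid 0 < x < 9999999999\}$ and $B = \{y \mid 0 < y < 9999999999\}$. Each set has more than a million elements. What is the best algorithm for computing the set differences $A \setminus B$ and $B \setminus A$? The sets are sorted (ordered, I should say). With my pea-sized brain, all I can think of is comparing each element of A against each element of B, one by one. Can someone point me to the best approach? I would also welcome answers along the lines of "go read this textbook first" or "pick up a good textbook on such-and-such first".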

3 answers

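In CPython, sets are implemented as open-addressing hash tables. In other words, each element is placed in the table according to its hash (rather than by sorting), which is what lets it be told apart from the other values and found again quickly, in expected constant time per lookup. So in Python you can simply use the built-in set operators: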

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}  # etc.
a & b  # intersection: {3, 4}
a | b  # union: {1, 2, 3, 4, 5, 6}
a ^ b  # symmetric difference: {1, 2, 5, 6}
a - b  # A \ B: {1, 2}
b - a  # B \ A: {5, 6}
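A minimal sketch of the hash-set approach at roughly the scale in the question. The helper name `set_differences` and the random test data are hypothetical, chosen only for illustration. Building each set takes expected linear time, and each difference is then computed with expected constant-time membership checks, so the whole job is expected linear rather than the quadratic all-pairs comparison.

```python
import random


def set_differences(a_values, b_values):
    """Return (A \\ B, B \\ A) using Python's built-in hash-based sets.

    Hypothetical helper: building each set is expected linear time, and
    each difference uses expected constant-time membership checks.
    """
    a = set(a_values)
    b = set(b_values)
    return a - b, b - a


if __name__ == "__main__":
    # Hypothetical test data: one million distinct values per set, drawn
    # from 1..9999999998 to match 0 < x < 9999999999 in the question.
    rng = random.Random(42)
    a_values = rng.sample(range(1, 9_999_999_999), 1_000_000)
    b_values = rng.sample(range(1, 9_999_999_999), 1_000_000)

    a_minus_b, b_minus_a = set_differences(a_values, b_values)
    print(len(a_minus_b), len(b_minus_a))

    # Small check using the sets from the example above.
    print(set_differences([1, 2, 3, 4], [3, 4, 5, 6]))  # prints ({1, 2}, {5, 6})
```

Note that this approach ignores the fact that the inputs are already sorted; it trades that information for simplicity and extra memory for the hash tables.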

If you are using C++, the standard library supports this directly (std::set_difference):

http://www.cplusplus.com/reference/algorithm/set_difference/

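That documentation even includes an equivalent "pseudocode" (really just more C++) that you can use to port the idea to other languages. The algorithm is essentially the "merge" step of merge sort. Note that std::set_difference works on any pair of sorted ranges rather than requiring std::set, which is a good thing: a sorted std::vector is enough.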
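As a rough illustration of porting that idea, here is a Python transliteration of the merge-style loop commonly given as the reference implementation of std::set_difference. The function name `sorted_difference` is hypothetical, and the sketch assumes both inputs are Python lists sorted in ascending order. Like the C++ version, it uses only `<` comparisons and copies each element of the first range that has no matching element in the second.

```python
def sorted_difference(a, b):
    """Return the elements of sorted list `a` that do not appear in sorted list `b`.

    Mirrors the merge-style loop of std::set_difference: both inputs must be
    sorted ascending, and elements are compared only with `<`.
    """
    result = []
    i = j = 0
    while i < len(a):
        if j == len(b):
            # `b` is exhausted: everything left in `a` belongs to the difference.
            result.extend(a[i:])
            break
        if a[i] < b[j]:
            # a[i] is smaller than every remaining element of b, so it is not in b.
            result.append(a[i])
            i += 1
        else:
            if not (b[j] < a[i]):
                # Neither is smaller, so they are equal: a[i] is matched, skip it.
                i += 1
            j += 1
    return result


print(sorted_difference([1, 2, 3, 4], [3, 4, 5, 6]))  # prints [1, 2]
print(sorted_difference([3, 4, 5, 6], [1, 2, 3, 4]))  # prints [5, 6]
```

Calling it twice with the arguments swapped gives both $A \setminus B$ and $B \setminus A$, each in time linear in the combined length of the two lists.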

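The set differences cannot be computed in less than linear time, because an entry anywhere in either set might belong to $A \setminus B$ or to $B \setminus A$ (or to neither), so every entry has to be examined.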
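To put the linear bound in numbers: in a merge-style pass over two sorted lists, each comparison consumes at least one element, so the pass needs at most $|A| + |B|$ comparisons. Taking $|A| = |B| = 10^6$ purely as an illustrative size (the question only says "more than a million"), the comparison with the all-pairs approach is roughly:

```latex
\begin{aligned}
\text{all pairs:}  \quad & |A| \cdot |B| = 10^{6} \cdot 10^{6} = 10^{12} \text{ comparisons},\\
\text{merge pass:} \quad & \le |A| + |B| = 2 \times 10^{6} \text{ comparisons}.
\end{aligned}
```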

An algorithm that achieves this bound (on sorted lists of data) is as follows:

Given input sets A,B, each a strictly ascending list. 
Initialize A\B and B\A as empty lists.

Until A or B is empty, compare the heads of both lists:
    If the heads are equal, remove both heads.
    If the head of A precedes the head of B, remove it from A
       and append it to A\B.
    If the head of B precedes the head of A, remove it from B
       and append it to B\A.

Now A or B is empty.
Append any entries left in A to A\B, or any entries left in B to B\A.
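The steps above can be sketched in Python roughly as follows, assuming both inputs are strictly ascending lists. The function name `set_differ` is hypothetical (borrowed for readability, not a library function). Instead of literally removing list heads, the sketch advances two indices, because `list.pop(0)` takes linear time in Python and would make the whole pass quadratic.

```python
def set_differ(a, b):
    """Return (A \\ B, B \\ A) for strictly ascending lists a and b.

    Single merge-style pass: at most len(a) + len(b) loop iterations.
    """
    a_minus_b = []
    b_minus_a = []
    i = j = 0
    # Until A or B is exhausted, compare the current heads.
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Equal heads: the value is in both sets, so drop it from both.
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Head of A precedes head of B, so it cannot occur later in B.
            a_minus_b.append(a[i])
            i += 1
        else:
            # Head of B precedes head of A, so it cannot occur later in A.
            b_minus_a.append(b[j])
            j += 1
    # At most one of these tails is non-empty.
    a_minus_b.extend(a[i:])
    b_minus_a.extend(b[j:])
    return a_minus_b, b_minus_a


print(set_differ([1, 2, 3, 4], [3, 4, 5, 6]))  # prints ([1, 2], [5, 6])
```

Both differences come out of one pass and are themselves sorted, so they can be fed straight into further merge-based operations.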

In Prolog, assuming A and B are sorted lists of distinct numeric values:

/*  set_differ(A,B,AminusB,BminusA)  */
set_differ([ ],B,[ ],B) :- !.
set_differ(A,[ ],A,[ ]) :- !.
set_differ([H|A],[H|B],AminusB,BminusA) :-
    !,
    set_differ(A,B,AminusB,BminusA).
set_differ([Ha|A],[Hb|B],AminusB,BminusA) :-
    ( Ha < Hb )
      -> ( AminusB = [Ha|A_B], set_differ(A,[Hb|B],A_B,BminusA) )
      ;  ( BminusA = [Hb|B_A], set_differ([Ha|A],B,AminusB,B_A) ) .

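Because it is tail-recursive, this is memory-efficient: on Prolog systems with last-call optimization, the stack does not grow as the lists are walked.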
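Python does not eliminate tail calls, so a literal recursive port of the Prolog predicate would exceed the default recursion limit (1000) on million-element lists. A minimal sketch of the same single pass written as an iterative generator, assuming both inputs are ascending iterables of distinct numbers; the name `stream_set_differ` and the `"A-B"`/`"B-A"` tags are hypothetical. It keeps only the current head of each input in memory, so the inputs could be streamed (for example, from sorted files) instead of being loaded as lists.

```python
from typing import Iterable, Iterator, Tuple


def stream_set_differ(a: Iterable[int], b: Iterable[int]) -> Iterator[Tuple[str, int]]:
    """Yield ("A-B", x) for x in A \\ B and ("B-A", y) for y in B \\ A.

    Both inputs must be ascending with distinct values. Only one head per
    input is held at a time, so generators work as inputs.
    """
    it_a, it_b = iter(a), iter(b)
    sentinel = object()  # marks an exhausted input
    ha = next(it_a, sentinel)
    hb = next(it_b, sentinel)
    while ha is not sentinel and hb is not sentinel:
        if ha == hb:
            # Present in both sets: advance both inputs.
            ha, hb = next(it_a, sentinel), next(it_b, sentinel)
        elif ha < hb:
            yield ("A-B", ha)
            ha = next(it_a, sentinel)
        else:
            yield ("B-A", hb)
            hb = next(it_b, sentinel)
    # Drain whichever input still has entries left.
    while ha is not sentinel:
        yield ("A-B", ha)
        ha = next(it_a, sentinel)
    while hb is not sentinel:
        yield ("B-A", hb)
        hb = next(it_b, sentinel)


# Hypothetical usage with generators instead of in-memory lists.
evens = (x for x in range(0, 20, 2))
threes = (x for x in range(0, 20, 3))
for tag, value in stream_set_differ(evens, threes):
    print(tag, value)
```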
