Data Mining - Generate a random NumPy ndarray of 0s and 1s with the number of 1s in a specific range - 吾爱随笔录


data-mining python statistics numpy probability programming

2022-02-14 21:32:42

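I want to generate a random NumPy ndarray of 0s and 1s in which the number of 1s in each row is between 2 and 5. I tried x = np.random.randint(0, 2, size=(10, 10)), but that gives me no control over how many 1s appear. I also tried np.random.choice(), but that only lets me set the probability of each value; for example, 1s would make up 0.2 of the array. I want that proportion to vary within a specific range instead. I also tried the following code for each row of my ndarray:

# The upper bound of randint is exclusive, so (2, 6) yields 2 to 5 ones
one_count = np.random.randint(2, 6)
# colnumber is the number of columns in the row
zero_count = colnumber - one_count
my_array = [0] * zero_count + [1] * one_count
# Shuffle in place so the ones land in random positions
np.random.shuffle(my_array)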

Can you help me find a better solution?

1 Answer

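It ultimately depends on the nature of the underlying data that your simulation is meant to represent. For example, if you have some type of censored Poisson process, then what I am about to show will not make sense.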

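However, one approach is to generate every possible row that satisfies your condition. For 10 columns there are 627 such rows: {10 choose 2} + {10 choose 3} + {10 choose 4} + {10 choose 5} = 45 + 120 + 210 + 252. You can then sample at random from that full set.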

import itertools as it
import numpy as np
np.random.seed(10)

# Let's build the full set of possible rows (2 to 5 ones among 10 positions)
res = []
zr = np.zeros(10)
for i in range(2,6):
    for p in it.combinations(range(10),i):
        on = zr.copy()
        on[list(p)] = 1
        res.append(on.copy())

resnp = np.stack(res,axis=0)

# Now let's sample 1000 rows (with replacement) from this set
total_perms = resnp.shape[0]
samp = np.random.choice(total_perms,1000)
res_samp = resnp[samp]

# Check to make sure this is OK
np.unique(res_samp.sum(axis=1),return_counts=True)

If you have observed data, you can estimate probabilities from it and pass them to the p argument of np.random.choice.
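As a rough sketch of that idea, the snippet below estimates how often each count of 1s occurs in some observed data and turns that into a per-row probability vector for np.random.choice. The observed_counts array is made-up illustration data, and the names rows, ones_per_row and p are hypothetical, not part of the original answer.

```python
import itertools as it
import numpy as np

np.random.seed(10)

# Rebuild the candidate rows: every length-10 row with 2 to 5 ones.
rows = []
for k in range(2, 6):
    for idx in it.combinations(range(10), k):
        row = np.zeros(10, dtype=int)
        row[list(idx)] = 1
        rows.append(row)
rows = np.stack(rows, axis=0)
ones_per_row = rows.sum(axis=1)

# Assumption: made-up "observed" numbers of ones per row, for illustration only.
observed_counts = np.array([2, 3, 3, 4, 3, 5, 2, 3, 4, 3])

# Empirical probability of each count k, split evenly over the rows with k ones.
p = np.zeros(len(rows))
for k in range(2, 6):
    mask = ones_per_row == k
    p[mask] = np.mean(observed_counts == k) / mask.sum()
p /= p.sum()  # guard against floating-point drift

sample = rows[np.random.choice(len(rows), 1000, p=p)]
```

With these weights, the share of sampled rows with k ones should roughly track the share of k in observed_counts.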

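In this setup there are more rows for 10 choose 5 than for 10 choose 2, so rows with five 1s are drawn more often than rows with two. If you want each count of 1s to occur with equal probability, you can set the total probability of the 10-choose-2 rows equal to that of the 10-choose-5 rows (and likewise for the other counts).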
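One way this reweighting might look, assuming each of the four counts (2, 3, 4, 5) should receive a total probability of 1/4: every row with k ones gets weight 0.25 / C(10, k). The variable names here are illustrative and not from the original answer.

```python
import itertools as it
from math import comb
import numpy as np

np.random.seed(10)

# Rebuild the candidate rows: every length-10 row with 2 to 5 ones.
rows = []
for k in range(2, 6):
    for idx in it.combinations(range(10), k):
        row = np.zeros(10, dtype=int)
        row[list(idx)] = 1
        rows.append(row)
rows = np.stack(rows, axis=0)
ones_per_row = rows.sum(axis=1)

# Give each count a total mass of 1/4, shared equally by its C(10, k) rows.
p = np.array([0.25 / comb(10, int(k)) for k in ones_per_row])
p /= p.sum()  # guard against floating-point drift

sample = rows[np.random.choice(len(rows), 1000, p=p)]

# Tally how many sampled rows have each count of ones.
counts, freq = np.unique(sample.sum(axis=1), return_counts=True)
```

Under these weights the four entries of freq should each be close to 250, rather than skewed toward rows with five 1s.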
