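For something like this you can take a simpler approach. One idea is to sample at random from the cities a given passenger has visited, using each city's share of that passenger's trips as its probability.
Here is one way to do it. I added a few more rows to the dataframe so the behaviour is easier to see. Suppose you have: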
    Passenger       Trip
0        John     London
1        Jack     Girona
2        Jack      Paris
3         Joe     Sydney
4         Joe  Amsterdam
5         Joe  Barcelona
6         Joe  Barcelona
7        John     London
8        John      Paris
9        Jill    Newyork
10        Jim     Sydney
11       Jack      Paris
12      James     Sydney
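To follow along, the frame above can be rebuilt by hand. This is only a sketch: the column names `Passenger` and `Trip` and the row order are copied from the printout, and the variable name `df` is the one used in the rest of this answer.

```python
import pandas as pd

# Rows copied from the printed dataframe, in the same order
df = pd.DataFrame({
    "Passenger": ["John", "Jack", "Jack", "Joe", "Joe", "Joe", "Joe",
                  "John", "John", "Jill", "Jim", "Jack", "James"],
    "Trip": ["London", "Girona", "Paris", "Sydney", "Amsterdam",
             "Barcelona", "Barcelona", "London", "Paris", "Newyork",
             "Sydney", "Paris", "Sydney"],
})

print(df)
```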
You can define a function like the following to sample at random from the existing data in the dataframe:
import numpy as np

def random_sample(df, name):
    # Group the dataframe by Passenger and count
    # how often each trip occurs
    g = df.groupby('Passenger').Trip.value_counts()
    # Normalize the counts so the probabilities add up to 1
    freq = g[name] / g[name].sum()
    # Draw a random destination, weighted by
    # its probability
    random_name = np.random.choice(a=freq.index, size=1,
                                   p=freq.values)[0]
    # Return the likelihood of the randomly chosen
    # destination, and the destination itself
    return freq[random_name], random_name
Usage
Suppose we want to draw a random destination for Joe and also know how likely it was. The destinations Joe has visited, with their counts, are:
Trip
Barcelona    2
Amsterdam    1
Sydney       1
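As a quick sanity check on those counts, here is a small sketch that turns them into probabilities. It assumes only Joe's four trips from the example data; `joe_trips`, `counts` and `probs` are illustrative names, and `value_counts(normalize=True)` is used as a shortcut for dividing the counts by their sum.

```python
import pandas as pd

# Joe's trips, copied from the example data
joe_trips = pd.Series(["Sydney", "Amsterdam", "Barcelona", "Barcelona"],
                      name="Trip")

# Number of visits per destination
counts = joe_trips.value_counts()

# Same counts divided by the total, so the values sum to 1
probs = joe_trips.value_counts(normalize=True)

print(probs)
```

Barcelona accounts for 2 of Joe's 4 trips, so it gets a probability of 0.5, and the other two destinations get 0.25 each.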
For example, calling the function five times could give (the output is random, so yours will differ):
for _ in range(5):
    freq, dest = random_sample(df, 'Joe')
    print('Chosen destination {} with a probability of {}'.format(dest, freq))
Chosen destination Sydney with a probability of 0.25
Chosen destination Barcelona with a probability of 0.5
Chosen destination Barcelona with a probability of 0.5
Chosen destination Barcelona with a probability of 0.5
Chosen destination Sydney with a probability of 0.25
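Five draws are too few to judge whether the weighting works, so it can help to draw many times and compare the observed shares with the probabilities. The sketch below is an assumption-laden variant rather than the function above: it uses `np.random.default_rng` with an arbitrary seed of 0 for reproducibility, draws 10,000 samples in one call, and only includes Joe's rows.

```python
import numpy as np
import pandas as pd

# Joe's rows from the example data
df = pd.DataFrame({
    "Passenger": ["Joe", "Joe", "Joe", "Joe"],
    "Trip": ["Sydney", "Amsterdam", "Barcelona", "Barcelona"],
})

# Probability of each destination for Joe
probs = df.loc[df["Passenger"] == "Joe", "Trip"].value_counts(normalize=True)

# Seed chosen arbitrarily, only so the run is repeatable
rng = np.random.default_rng(0)

# Draw many destinations at once, weighted by their probabilities
draws = rng.choice(probs.index.to_numpy(), size=10_000, p=probs.to_numpy())

# Observed share of each destination among the draws
empirical = pd.Series(draws).value_counts(normalize=True)

print(empirical)
```

The observed shares should land close to 0.5 for Barcelona and 0.25 for each of the other two, with small deviations from sampling noise.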