Where is the error in the following code?

Data Mining, Python

2022-02-28 21:12:44

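I am trying to analyze a movie database downloaded from IMDb using Python. While generating some plots, I ran into an error that puzzles me. I want to produce a grid of small figures, one per genre, that might reveal hidden patterns. Here is the code: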

fig, axes = plt.subplots(nrows=4, ncols=6, figsize=(12, 8), 
                         tight_layout=True)

bins = np.arange(1950,2012,3)
for ax, genre in zip(axes.ravel(), movieGenre):
    ax.hist(movieDF[movieDF['%s'%genre]==1].year, bins=bins, histtype='stepfilled', normed=True, color='r', alpha=.3, ec='None')
    ax.hist(movieDF.year, bins=bins, histtype='stepfilled', ec='None', normed=True, zorder=0, color='grey')
    ax.annotate(genre, xy=(1955, 3e-2), fontsize=14)
    ax.xaxis.set_ticks(np.arange(1950, 2013, 30))
    ax.set_yticks([])
    ax.set_xlabel('Year')

The first hist call fails, but when I comment it out, the second one works. Here is the traceback:

 KeyError                                  Traceback (most recent call last)
<ipython-input-158-c2e7c2737372> in <module>()
      4 bins = np.arange(1950,2012,3)
      5 for ax, genre in zip(axes.ravel(), movieGenre):
----> 6     ax.hist(movieDF[movieDF['%s'%genre]==1].year, bins=bins, histtype='stepfilled', normed=True, color='r', alpha=.3, ec='None')
      7     ax.hist(movieDF.year, bins=bins, histtype='stepfilled', ec='None', normed=True, zorder=0, color='grey')
      8     ax.annotate(genre, xy=(1955, 3e-2), fontsize=14)

/Users/dt/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/matplotlib/axes.pyc in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
   8247         # Massage 'x' for processing.
   8248         # NOTE: Be sure any changes here is also done below to 'weights'
-> 8249         if isinstance(x, np.ndarray) or not iterable(x[0]):
   8250             # TODO: support masked arrays;
   8251             x = np.asarray(x)

/Users/dt/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
    477     def __getitem__(self, key):
    478         try:
--> 479             result = self.index.get_value(self, key)
    480 
    481             if not np.isscalar(result):

/Users/dt/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
   1169 
   1170         try:
-> 1171             return self._engine.get_value(s, k)
   1172         except KeyError as e1:
   1173             if len(self) > 0 and self.inferred_type == 'integer':



KeyError: 0

Here are the first few rows of the data:

     imdbID     title                      rating    vote      runtime  year    genre
0   tt0111161   The Shawshank Redemption    9.3      1,439,277  142    1994 [Crime, Drama]
1   tt0468569   The Dark Knight             9.0      1,410,124  152    2008 [Action, Crime, Drama]
2   tt1375666   Inception                   8.8      1,209,159  148    2010 [Action, Mystery, Sci-Fi, Thriller]
3   tt0137523   Fight Club                  8.9      1,123,462  139    1999 [Drama]
4   tt0110912   Pulp Fiction                8.9      1,117,193  154    1994 [Crime, Drama]

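movieGenre is simply every distinct genre collected from the 'genre' column, with duplicates removed: movieGenre = set(movieDF.genre.sum()). I then added one column per genre to the movieDF data frame, so that a cell is True if the movie belongs to that genre and False otherwise. For example, for the movie Inception, the Action column is marked True but the Crime column is marked False, and so on.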
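For readers who want to reproduce the setup, here is a minimal sketch of the preparation described above. It assumes only the five rows shown in the table; the set comprehension and the apply call are one possible way to build movieGenre and the per-genre columns, not necessarily the exact code that was used.

```python
import pandas as pd

# Five sample rows copied from the table above (only the columns needed here).
movieDF = pd.DataFrame({
    "title": ["The Shawshank Redemption", "The Dark Knight", "Inception",
              "Fight Club", "Pulp Fiction"],
    "year": [1994, 2008, 2010, 1999, 1994],
    "genre": [["Crime", "Drama"],
              ["Action", "Crime", "Drama"],
              ["Action", "Mystery", "Sci-Fi", "Thriller"],
              ["Drama"],
              ["Crime", "Drama"]],
})

# Collect every distinct genre (sorted so the column order is reproducible).
movieGenre = sorted({g for genres in movieDF["genre"] for g in genres})

# Add one boolean column per genre: True if the movie belongs to that genre.
for g in movieGenre:
    movieDF[g] = movieDF["genre"].apply(lambda genres, g=g: g in genres)

print(movieDF[["title", "Action", "Crime"]])
```

With this sample, the Inception row has Action set to True and Crime set to False, matching the description.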

Thanks.

1 Answer

First, since you seem to want help with a Python error only, your question may be better suited to Stack Overflow.

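Second, it is no fun trying to answer this question with the information you have provided. This is certainly not a minimal working example (MWE): I cannot copy your code as-is and run it myself. I am left guessing at a lot of things: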

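Are you importing matplotlib?
Are you aliasing matplotlib.pyplot as plt?
Are you importing numpy and aliasing it as np?
Does your traceback literally end with KeyError is 0, or is it KeyError: 0, or KeyError: False? Can you show the full traceback? Tracebacks for an MWE are usually not long.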

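In the positional argument of the offending line (movieDF[movieDF['%s'%genre]==1].year), you are asking for a key (movieDF['%s'%genre]==1) that looks like it should evaluate to a Boolean. Is that always going to be 0? Because you did not provide a sample data set, I am grasping at straws; I should not have to download the data from IMDb myself.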
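One quick way to check what that key evaluates to is to build it on its own and inspect it. The sketch below uses a made-up five-row frame (the Drama values are taken from the sample rows in the question); the variable name mask is just for illustration.

```python
import pandas as pd

# Tiny stand-in for movieDF: years and the Drama flag from the sample rows.
movieDF = pd.DataFrame({
    "year": [1994, 2008, 2010, 1999, 1994],
    "Drama": [True, True, False, True, True],
})

genre = "Drama"
mask = movieDF['%s' % genre] == 1   # same expression as in the question

print(mask.dtype)   # element type of the comparison result
print(mask.sum())   # number of rows where the genre flag is set
```

If mask is a boolean Series with a non-zero sum, the filter itself is selecting rows, and the problem lies further along.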

Edit:

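Just to troubleshoot, can you put x = movieDF[movieDF['%s'%genre]==1].year on a line by itself and tell me what x[0] is? What type is x? Is it a dictionary, while the .hist() method requires a list or some other array? The documentation reads as though the method does not accept dictionaries... What type is movieDF.year?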
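As a sketch of that troubleshooting step, using a made-up five-row frame rather than the real data: a filtered pandas Series keeps its original index labels, so x[0] looks for the label 0, not the first element. If row 0 does not belong to the genre, that lookup raises KeyError: 0, which matches the last frames of the traceback, where the old matplotlib hist code evaluates x[0]. Passing the underlying array (x.values) is one possible workaround, shown here with np.histogram so that the example does not depend on matplotlib.

```python
import numpy as np
import pandas as pd

# Stand-in data: row 0 is NOT an Action movie, as in the sample rows.
movieDF = pd.DataFrame({
    "year": [1994, 2008, 2010, 1999, 1994],
    "Action": [False, True, True, False, False],
})

x = movieDF[movieDF["Action"] == 1].year
print(type(x))         # a pandas Series, not a dict or a plain array
print(list(x.index))   # surviving index labels; label 0 is gone

# Label-based lookup: fails because no row carries the label 0.
try:
    x[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

first_by_position = x.iloc[0]   # positional access works regardless of labels

# Plain NumPy array: positional indexing only, safe to hand to a histogram.
years = x.values
counts, edges = np.histogram(years, bins=np.arange(1950, 2012, 3))
```

In the plotting loop from the question, the equivalent change would be to pass movieDF[movieDF[genre] == 1].year.values to ax.hist.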
