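I'm using Python to analyze a movie database downloaded from IMDb. While generating some plots, I ran into an error that confuses me. I'm trying to build a small-multiples grid of histograms (one panel per genre) that might reveal hidden patterns. Here is the code: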
fig, axes = plt.subplots(nrows=4, ncols=6, figsize=(12, 8),
                         tight_layout=True)
bins = np.arange(1950,2012,3)
for ax, genre in zip(axes.ravel(), movieGenre):
    ax.hist(movieDF[movieDF['%s'%genre]==1].year, bins=bins, histtype='stepfilled', normed=True, color='r', alpha=.3, ec='None')
    ax.hist(movieDF.year, bins=bins, histtype='stepfilled', ec='None', normed=True, zorder=0, color='grey')
    ax.annotate(genre, xy=(1955, 3e-2), fontsize=14)
    ax.xaxis.set_ticks(np.arange(1950, 2013, 30))
    ax.set_yticks([])
    ax.set_xlabel('Year')
The first hist call fails, but when I comment it out, the second one works. Here is the traceback:
KeyError Traceback (most recent call last)
<ipython-input-158-c2e7c2737372> in <module>()
4 bins = np.arange(1950,2012,3)
5 for ax, genre in zip(axes.ravel(), movieGenre):
----> 6 ax.hist(movieDF[movieDF['%s'%genre]==1].year, bins=bins, histtype='stepfilled', normed=True, color='r', alpha=.3, ec='None')
7 ax.hist(movieDF.year, bins=bins, histtype='stepfilled', ec='None', normed=True, zorder=0, color='grey')
8 ax.annotate(genre, xy=(1955, 3e-2), fontsize=14)
/Users/dt/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/matplotlib/axes.pyc in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
8247 # Massage 'x' for processing.
8248 # NOTE: Be sure any changes here is also done below to 'weights'
-> 8249 if isinstance(x, np.ndarray) or not iterable(x[0]):
8250 # TODO: support masked arrays;
8251 x = np.asarray(x)
/Users/dt/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
477 def __getitem__(self, key):
478 try:
--> 479 result = self.index.get_value(self, key)
480
481 if not np.isscalar(result):
/Users/dt/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
1169
1170 try:
-> 1171 return self._engine.get_value(s, k)
1172 except KeyError as e1:
1173 if len(self) > 0 and self.inferred_type == 'integer':
KeyError: 0
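The last frames hint at a likely cause: this version of matplotlib's `hist` evaluates `x[0]`, and on a pandas Series with an integer index, `[0]` is a label lookup, not a positional one. Boolean filtering keeps the original row labels, so for any genre that excludes row 0 (The Shawshank Redemption is Crime/Drama, so e.g. Action), label 0 no longer exists. The unfiltered `movieDF.year` still contains label 0, which would explain why the second call works. The minimal sketch below uses a hypothetical toy frame (not the real data) to show the behavior with pandas alone. `.iloc[0]` or `.values` sidestep the label lookup, so passing `movieDF[...].year.values` to `hist` is a plausible workaround.

```python
import pandas as pd

# Hypothetical toy frame modeled on the columns shown in the question.
df = pd.DataFrame({
    "title": ["The Shawshank Redemption", "The Dark Knight", "Inception"],
    "year": [1994, 2008, 2010],
    "Action": [False, True, True],
})

# Boolean filtering keeps the original row labels: row 0 is dropped.
action_years = df[df["Action"] == 1].year
print(action_years.index.tolist())  # [1, 2]

# With an integer index, Series[0] looks up the *label* 0, which is gone.
try:
    action_years[0]
except KeyError as exc:
    print("KeyError:", exc)

# Positional access and the underlying NumPy array both avoid the label lookup.
print(action_years.iloc[0])    # 2008
print(action_years.values[0])  # 2008
```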
Here are the first few rows of the data:
   imdbID     title                     rating  vote       runtime  year  genre
0  tt0111161  The Shawshank Redemption  9.3     1,439,277  142      1994  [Crime, Drama]
1  tt0468569  The Dark Knight           9.0     1,410,124  152      2008  [Action, Crime, Drama]
2  tt1375666  Inception                 8.8     1,209,159  148      2010  [Action, Mystery, Sci-Fi, Thriller]
3  tt0137523  Fight Club                8.9     1,123,462  139      1999  [Drama]
4  tt0110912  Pulp Fiction              8.9     1,117,193  154      1994  [Crime, Drama]
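movieGenre collects all distinct genres from the 'genre' column, with duplicates removed: movieGenre = set(movieDF.genre.sum()). I then added one column per genre to the movieDF DataFrame, so that a cell is True if the movie belongs to that genre and False otherwise. For example, for Inception the Action column is True, but the Crime column is False, and so on.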
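A minimal sketch of the setup just described, assuming `genre` holds Python lists as in the printout above. The three-row frame is hypothetical. It swaps in `explode()` (one row per list element) for `genre.sum()`, and sorts the result because a `set` has no stable order, which would otherwise make the panel order in the subplot grid arbitrary.

```python
import pandas as pd

# Hypothetical three-row frame mirroring the 'year' and 'genre' columns.
movieDF = pd.DataFrame({
    "title": ["The Shawshank Redemption", "The Dark Knight", "Inception"],
    "year": [1994, 2008, 2010],
    "genre": [["Crime", "Drama"],
              ["Action", "Crime", "Drama"],
              ["Action", "Mystery", "Sci-Fi", "Thriller"]],
})

# Distinct genres: explode() flattens the lists, set() removes duplicates,
# sorted() fixes the order so subplot panels come out the same every run.
movieGenre = sorted(set(movieDF.genre.explode()))

# One boolean column per genre: True if that genre is in the movie's list.
for genre in movieGenre:
    movieDF[genre] = movieDF.genre.apply(lambda genres: genre in genres)

print(movieDF[["title", "Action", "Crime"]])
```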
Thanks.