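If I understand correctly, ImageNet's labels are based on WordNet:
ImageNet is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a "synonym set" or "synset". There are more than 100,000 synsets in WordNet, the majority of them nouns (80,000+). In ImageNet, we aim to provide on average 1000 images to illustrate each synset. Images of each concept are quality-controlled and human-annotated. In its completion, we hope ImageNet will offer tens of millions of cleanly sorted images for most of the concepts in the WordNet hierarchy.
Source: http://image-net.org/about-overview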
The text above already says that WordNet defines a hierarchy. You can see this with the Python nltk package:
    import nltk
    from nltk.corpus import wordnet

    nltk.download('wordnet')  # fetch the WordNet corpus (only needed once)


    def get_hyponyms(synset):
        """Return the set of all direct and indirect hyponyms of synset."""
        hyponyms = set()
        for hyponym in synset.hyponyms():
            hyponyms |= get_hyponyms(hyponym)
        return hyponyms | set(synset.hyponyms())


    dog = wordnet.synset('dog.n.01')
    print(get_hyponyms(dog))
    harrier = wordnet.synset('harrier.n.02')
    print(get_hyponyms(harrier))
(thanks to Stephan Ds for the answer)
gives
    set([Synset('harrier.n.02'), Synset('water_spaniel.n.01'), Synset('standard_poodle.n.01'), Synset('dandie_dinmont.n.01'), Synset('courser.n.03'), Synset('wirehair.n.01'), Synset('toy_manchester.n.01'), Synset('puppy.n.01'), Synset('briard.n.01'), Synset('beagle.n.01'), Synset('siberian_husky.n.01'), Synset('manchester_terrier.n.01'), Synset('bloodhound.n.01'), Synset('gordon_setter.n.01'), Synset('leonberg.n.01'), Synset('king_charles_spaniel.n.01'), Synset('yorkshire_terrier.n.01'), Synset('sealyham_terrier.n.01'), Synset('american_water_spaniel.n.01'), Synset('skye_terrier.n.01'), Synset('clumber.n.01'), Synset('pembroke.n.01'), Synset('wire-haired_fox_terrier.n.01'), Synset('shih-tzu.n.01'), Synset('newfoundland.n.01'), Synset('retriever.n.01'), Synset('cocker_spaniel.n.01'), Synset('springer_spaniel.n.01'), Synset('american_foxhound.n.01'), Synset('large_poodle.n.01'), Synset('lapdog.n.01'), Synset('bull_mastiff.n.01'), Synset('affenpinscher.n.01'), Synset('irish_water_spaniel.n.01'), Synset('terrier.n.01'), Synset('keeshond.n.01'), Synset('vizsla.n.01'), Synset('dalmatian.n.02'), Synset('bird_dog.n.01'), Synset('irish_terrier.n.01'), Synset('miniature_poodle.n.01'), Synset('dachshund.n.01'), Synset('australian_terrier.n.01'), Synset('blenheim_spaniel.n.01'), Synset('weimaraner.n.01'), Synset('soft-coated_wheaten_terrier.n.01'), Synset('doberman.n.01'), Synset('kelpie.n.02'), Synset('water_dog.n.02'), Synset('feist.n.01'), Synset('attack_dog.n.01'), Synset('french_bulldog.n.01'), Synset('papillon.n.01'), Synset('bedlington_terrier.n.01'), Synset('foxhound.n.01'), Synset('labrador_retriever.n.01'), Synset('great_dane.n.01'), Synset('kerry_blue_terrier.n.01'), Synset('miniature_schnauzer.n.01'), Synset('pariah_dog.n.01'), Synset('border_terrier.n.01'), Synset('staghound.n.01'), Synset('norwegian_elkhound.n.01'), Synset('redbone.n.01'), Synset('pooch.n.01'), Synset('old_english_sheepdog.n.01'), Synset('police_dog.n.01'), Synset('welsh_terrier.n.01'), Synset('spitz.n.01'), Synset('boxer.n.04'), Synset('tibetan_terrier.n.01'), Synset('shetland_sheepdog.n.01'), Synset('boarhound.n.01'), Synset('border_collie.n.01'), Synset('wolfhound.n.01'), Synset('lhasa.n.02'), Synset('scotch_terrier.n.01'), Synset('coondog.n.01'), Synset('giant_schnauzer.n.01'), Synset('japanese_spaniel.n.01'), Synset('german_short-haired_pointer.n.01'), Synset('entlebucher.n.01'), Synset('griffon.n.03'), Synset('griffon.n.02'), Synset('welsh_springer_spaniel.n.01'), Synset('clydesdale_terrier.n.01'), Synset('hound.n.01'), Synset('brittany_spaniel.n.01'), Synset('corgi.n.01'), Synset('pekinese.n.01'), Synset('mastiff.n.01'), Synset('flat-coated_retriever.n.01'), Synset('sennenhunde.n.01'), Synset('schipperke.n.01'), Synset('english_toy_spaniel.n.01'), Synset('ibizan_hound.n.01'), Synset('airedale.n.01'), Synset('cardigan.n.02'), Synset('miniature_pinscher.n.01'), Synset('bluetick.n.01'), Synset('west_highland_white_terrier.n.01'), Synset('seizure-alert_dog.n.01'), Synset('pomeranian.n.01'), Synset('english_foxhound.n.01'), Synset('bernese_mountain_dog.n.01'), Synset('norfolk_terrier.n.01'), Synset('greater_swiss_mountain_dog.n.01'), Synset('collie.n.01'), Synset('chow.n.03'), Synset('pug.n.01'), Synset('scottish_deerhound.n.01'), Synset('groenendael.n.01'), Synset('golden_retriever.n.01'), Synset('schnauzer.n.01'), Synset('irish_setter.n.01'), Synset('german_shepherd.n.01'), Synset('walker_hound.n.01'), Synset('english_setter.n.01'), Synset('english_springer.n.01'), Synset('sporting_dog.n.01'), Synset('afghan_hound.n.01'), Synset('cairn.n.02'), Synset('rhodesian_ridgeback.n.01'), Synset('chesapeake_bay_retriever.n.01'), Synset('irish_wolfhound.n.01'), Synset('fox_terrier.n.01'), Synset('sled_dog.n.01'), Synset('toy_dog.n.01'), Synset('staffordshire_bullterrier.n.01'), Synset('bullterrier.n.01'), Synset('seeing_eye_dog.n.01'), Synset('samoyed.n.03'), Synset('bouvier_des_flandres.n.01'), Synset('otterhound.n.01'), Synset('kuvasz.n.01'), Synset('cur.n.01'), Synset('guide_dog.n.01'), Synset('malinois.n.01'), Synset('malamute.n.01'), Synset('poodle.n.01'), Synset('curly-coated_retriever.n.01'), Synset('toy_spaniel.n.01'), Synset('basset.n.01'), Synset('toy_terrier.n.01'), Synset('tibetan_mastiff.n.01'), Synset('basenji.n.01'), Synset('field_spaniel.n.01'), Synset('mexican_hairless.n.01'), Synset('setter.n.02'), Synset('great_pyrenees.n.01'), Synset('american_staffordshire_terrier.n.01'), Synset('rottweiler.n.01'), Synset('standard_schnauzer.n.01'), Synset('black-and-tan_coonhound.n.01'), Synset('borzoi.n.01'), Synset('bulldog.n.01'), Synset('sausage_dog.n.01'), Synset('sussex_spaniel.n.01'), Synset('spaniel.n.01'), Synset('working_dog.n.01'), Synset('belgian_sheepdog.n.01'), Synset('watchdog.n.02'), Synset('silky_terrier.n.01'), Synset('eskimo_dog.n.01'), Synset('brabancon_griffon.n.01'), Synset('whippet.n.01'), Synset('plott_hound.n.01'), Synset('liver-spotted_dalmatian.n.01'), Synset('coonhound.n.01'), Synset('saluki.n.01'), Synset('toy_poodle.n.01'), Synset('hearing_dog.n.01'), Synset('chihuahua.n.03'), Synset('lakeland_terrier.n.01'), Synset('shepherd_dog.n.01'), Synset('rat_terrier.n.01'), Synset('italian_greyhound.n.01'), Synset('komondor.n.01'), Synset('saint_bernard.n.01'), Synset('norwich_terrier.n.01'), Synset('pointer.n.04'), Synset('appenzeller.n.01'), Synset('maltese_dog.n.01'), Synset('hunting_dog.n.01'), Synset('smooth-haired_fox_terrier.n.01'), Synset('housedog.n.01'), Synset('boston_bull.n.01'), Synset('pinscher.n.01'), Synset('greyhound.n.01')])
    set([])
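The empty set printed for 'harrier.n.02' means that synset has no hyponyms, so it is a leaf of the hierarchy, while 'dog.n.01' is an internal node. To make that distinction explicit, here is a minimal sketch that works on any hierarchy given a function returning a node's direct children. The `TOY_HYPONYMS` dictionary and the names `children_of`, `descendants`, `is_leaf` and `leaves_below` are my own, chosen for illustration; the dictionary is a tiny hand-written excerpt, not the real WordNet data.

```python
# Hypothetical, hand-written excerpt of a hyponym hierarchy (illustration only).
TOY_HYPONYMS = {
    "dog.n.01": ["hunting_dog.n.01", "puppy.n.01"],
    "hunting_dog.n.01": ["hound.n.01"],
    "hound.n.01": ["harrier.n.02", "beagle.n.01"],
}


def children_of(node):
    """Direct hyponyms of a node in the toy hierarchy (empty list for leaves)."""
    return TOY_HYPONYMS.get(node, [])


def descendants(node, children):
    """All direct and indirect children of node, collected iteratively."""
    seen = set()
    stack = list(children(node))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children(current))
    return seen


def is_leaf(node, children):
    """A node is a leaf when it has no direct children."""
    return not children(node)


def leaves_below(node, children):
    """Only those descendants of node that are leaves."""
    return {d for d in descendants(node, children) if is_leaf(d, children)}


print(sorted(descendants("dog.n.01", children_of)))
print(sorted(leaves_below("dog.n.01", children_of)))
print(is_leaf("harrier.n.02", children_of))
```

With nltk, I would expect the same functions to work on real synsets by passing `lambda s: s.hyponyms()` as `children`, but I have only tried the toy version above.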
Now my questions:
Are all ImageNet images in the leaves, e.g. 'harrier.n.02', or are some images labeled only as 'dog.n.01'?
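(Sub-question: if I remember correctly, ImageNet was labeled through Amazon Mechanical Turk by ordinary people, not by experts in the subject; for dogs, that would be something like biologists who specialise in dogs. How do the people there know all these different kinds of dogs? I don't know what a wolfhound is... So how do they check that a label is correct and specific enough?)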
Are the 21841 synsets (source) leaves or inner nodes?
Are all images in ImageNet in the leaves? How many leaves are there?
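To make concrete what I would like to check: given the hierarchy and the set of synsets that actually have images, I want to split that set into leaves and inner nodes. Below is a rough sketch under stated assumptions. The `hyponyms` mapping and the `labelled` set are made up and say nothing about what ImageNet really contains; `wnid` and `split_labels` are names I invented. The ID format (part-of-speech letter followed by the 8-digit zero-padded WordNet offset, e.g. n02084071 for 'dog.n.01') is my understanding of the ImageNet convention.

```python
def wnid(pos, offset):
    """Format an ImageNet-style ID: POS letter plus zero-padded 8-digit offset."""
    return "{}{:08d}".format(pos, offset)


def split_labels(labelled, hyponyms):
    """Split labelled nodes into (leaf_labels, internal_labels).

    hyponyms maps a node to the list of its direct children; nodes that are
    missing from the mapping or map to an empty list count as leaves.
    """
    leaf, internal = set(), set()
    for node in labelled:
        if hyponyms.get(node):
            internal.add(node)
        else:
            leaf.add(node)
    return leaf, internal


# Hypothetical data, for illustration only.
hyponyms = {
    "dog.n.01": ["hound.n.01", "puppy.n.01"],
    "hound.n.01": ["harrier.n.02", "beagle.n.01"],
}
labelled = {"dog.n.01", "harrier.n.02", "beagle.n.01"}

leaf_labels, internal_labels = split_labels(labelled, hyponyms)
print(sorted(leaf_labels))
print(sorted(internal_labels))
print(wnid("n", 2084071))
```

If `internal_labels` came out non-empty on the real data, that would mean some images are labeled only at an inner node such as 'dog.n.01'.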