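I like to start simple and add complexity only when it's needed. I'd begin by stripping out the TV features we don't care about, then return what's left, assuming the brand always comes before the model. In Python, for example: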
def get_brand_model(description):
    """
    Returns the brand and model number from a TV description

    >>> get_brand_model('50" Sony KDL 50W756CSAEP Smart LED Full HD')
    ('SONY', 'KDL 50W756CSAEP')
    >>> get_brand_model('55" Samsung UE55JU6400 Smart LED HD')
    ('SAMSUNG', 'UE55JU6400')
    >>> get_brand_model('LG 55LF652V 55" SMART 3D FULL HD')
    ('LG', '55LF652V')
    >>> get_brand_model("HITACHI 55HGW69 55'' LED ULTRA SMART WIFI")
    ('HITACHI', '55HGW69')
    >>> get_brand_model('TV 65" SAMSUNG UE65KS7500 4K LED Smart')
    ('SAMSUNG', 'UE65KS7500')
    """
    def keep_word(word):
        # Basic filter to remove TV features from the input string
        skip_words = ['3d', '720p', '1080p', 'hd', '4k', 'smart', 'wifi',
                      'led', 'full', 'tv', 'ultra', 'inch']
        # Screen sizes carry an inch mark, e.g. 55" or 55''
        is_measurement = '"' in word or "'" in word
        return word.lower() not in skip_words and not is_measurement

    # split() with no argument also copes with repeated spaces
    words = [w.upper() for w in description.split() if keep_word(w)]
    # Return a tuple of (brand, model number)
    return (words[0], ' '.join(words[1:]))
This may need some tweaking, but all 5 examples from the question pass when you run the included doctests.
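In case it helps, here is one way the doctests could be run from a script. This is only a minimal sketch: the function is a trimmed-down stand-in for the one in this answer (two of the five examples, same filtering rules), so that the block runs on its own.

```python
import doctest


def get_brand_model(description):
    """
    Trimmed-down stand-in for the function in this answer.

    >>> get_brand_model('LG 55LF652V 55" SMART 3D FULL HD')
    ('LG', '55LF652V')
    >>> get_brand_model('TV 65" SAMSUNG UE65KS7500 4K LED Smart')
    ('SAMSUNG', 'UE65KS7500')
    """
    skip_words = {'3d', '720p', '1080p', 'hd', '4k', 'smart', 'wifi',
                  'led', 'full', 'tv', 'ultra', 'inch'}
    # Drop feature words and anything carrying an inch mark (" or ')
    words = [w.upper() for w in description.split()
             if w.lower() not in skip_words and '"' not in w and "'" not in w]
    return (words[0], ' '.join(words[1:]))


if __name__ == '__main__':
    # Runs the examples in the docstring; failures are reported on stdout
    doctest.run_docstring_examples(get_brand_model, globals(),
                                   name='get_brand_model')
```

Alternatively, if the function lives in a file (say `tv_parser.py`, a made-up name), `python -m doctest -v tv_parser.py` should run the same examples without any extra code.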