此响应基于您的域。例如销售,一个产品可能不是每天都售出,因此不会有记录。
根据您希望每天在数据集中看到产品出现的数据集,您可以考虑以下方法。
数学上:
A = the set of all possibilities (i.e. product occurrence for each date - you could generate this and use )
B = sample data set provided would be provided
C = A - B
= days that a product was missing.
DPART1 = You could then continue by retrieving the min(date) and max(date) for each product in the dataset to represent the introduction of the new product and possible cease of an existing product.
D= You could then filter the dataset (C) to remove dates less than the min(date) and greater than the maxdate).
In terms of sql:
A - Cartesion/Cross product of all your products and dates
B = Your current sample data set
C = SELECT * FROM A MINUS SELECT * FROM B
DPART1 = SELECT PRODUCT, MIN(date) as INTRODUCED_DATE, MAX(date) as CEASED_DATE from YourSampleDataSET GROUP BY PRODUCT
D = SELECT C.PRODUCT, C.Date FROM C LEFT JOIN DPART1 ON C.PRODUCT = DPART1.PRODUCT
WHERE DPART1.PRODUCT IS NULL OR (
C.Date BETWEEN DPART1.INTRODUCED_DATE AND DPART1.CEASED_DATE
)
NB。DPART1.PRODUCT IS NULL确保您不会过滤可能不在样本子集中的产品。
操作方面的考虑,我会推荐您的数据库中的索引来协助查询,并在可能的情况下进行分区。