I think you have already listed the main ways of doing this: you can either iterate or merge. Which one is "best" depends on your use case.
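For reference, the merging route might look roughly like the sketch below. It is only an illustration under assumptions: the frame names `full` and `data`, the column name `date`, and the numeric C2 values are all made up here, and the two missing dates are chosen arbitrarily.

```python
import numpy as np
import pandas as pd

# Skeleton with every date twice and A/B alternating in C1
dates = pd.date_range(start="2019-01-01", end="2019-01-10")
full = pd.DataFrame({
    "date": dates.repeat(2),
    "C1": ["A", "B"] * len(dates),
})

# Hypothetical starting data: same layout, two dates absent, invented C2 values
absent = pd.to_datetime(["2019-01-02", "2019-01-05"])
data = full[~full["date"].isin(absent)].copy()
data["C2"] = np.arange(len(data), dtype=float)

# A left merge keeps every skeleton row; indicator=True adds a "_merge"
# column telling us which rows had no match in the starting data
merged = full.merge(data, on=["date", "C1"], how="left", indicator=True)
merged["was_missing"] = merged["_merge"] == "left_only"
merged = merged.drop(columns="_merge")

print(merged.head(6))
```

Rows that were absent from `data` end up with NaN in C2 and True in `was_missing`, so they can be filled afterwards in a single vectorised step.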
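Here is one way to do it by iterating over the dataframe. This approach gives you more control over what gets filled in, i.e. you can add further conditions to decide which values to fill. I start by creating a new dataframe that contains the complete set of dates, with the alternating values A B A B in column C1: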
import pandas as pd
import numpy as np
dates = pd.date_range(start="1/1/2019", end="1/10/2019")
repeated_dates = np.repeat(dates, 2)
df = pd.DataFrame(index=repeated_dates, columns=["C1", "C2"])
df["C1"] = (len(df) // 2) * ["A", "B"]
# See first 5 rows
print(df.head())
            C1   C2
2019-01-01   A  NaN
2019-01-01   B  NaN
2019-01-02   A  NaN
2019-01-02   B  NaN
2019-01-03   A  NaN
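We will fill in the values of column C2 later.
Next, make a dataframe (which is effectively your starting data) by dropping a couple of rows from the "result" dataframe above, then loop over the dates to collect the C2 values: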
# Dropping by label removes every row that carries that label,
# so both rows of 2019-01-02 and of 2019-01-05 disappear
df_missing = df.drop(df.index[[3, 9]])

C2_col = []
# grouping by the index (i.e. the date) gives us two rows at a time
for date, group in df.groupby(df.index):
    try:
        # see which values your data has for this date and extract them
        day = df_missing.loc[date, ["C1", "C2"]]
        C2_A, C2_B = day.C2.values
    # If the date wasn't there, we can catch the error and give any values we want
    except KeyError:
        # Could now use more conditions, e.g. on the date or previous values
        C2_A = C2_B = "was_missing"
    # Keep the values in a list
    C2_col.extend([C2_A, C2_B])

# Overwrite the column that was full of NaN values
df["C2"] = C2_col
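Printing df after the loop shows that all dates and the A B A B pattern are present, and that we can insert whatever we like for the dates whose values were missing: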
print(df)
            C1           C2
2019-01-01   A          NaN
2019-01-01   B          NaN
2019-01-02   A  was_missing
2019-01-02   B  was_missing
2019-01-03   A          NaN
2019-01-03   B          NaN
2019-01-04   A          NaN
2019-01-04   B          NaN
2019-01-05   A  was_missing
2019-01-05   B  was_missing
2019-01-06   A          NaN
2019-01-06   B          NaN
2019-01-07   A          NaN
2019-01-07   B          NaN
2019-01-08   A          NaN
2019-01-08   B          NaN
2019-01-09   A          NaN
2019-01-09   B          NaN
2019-01-10   A          NaN
2019-01-10   B          NaN
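The comment in the except branch mentions that further conditions could decide what gets filled in. A minimal sketch of that idea follows; the helper `fill_value` and its weekday/weekend rule are purely hypothetical and only stand in for whatever condition suits your data. It also checks date membership up front instead of catching KeyError, which is a stylistic variation rather than a required change.

```python
import pandas as pd

# Same skeleton as above: each date twice, C1 alternating A/B
dates = pd.date_range(start="2019-01-01", end="2019-01-10")
df = pd.DataFrame(index=dates.repeat(2), columns=["C1", "C2"])
df["C1"] = len(dates) * ["A", "B"]

# Starting data with two dates removed (drop by label removes both rows of a date)
df_missing = df.drop(df.index[[3, 9]])
present_dates = set(df_missing.index)


def fill_value(date):
    """Hypothetical rule: choose the placeholder from the day of the week."""
    return "missing_weekend" if date.dayofweek >= 5 else "missing_weekday"


C2_col = []
for date in dates:
    if date in present_dates:
        # Date exists in the data: copy its two C2 values across
        C2_col.extend(df_missing.loc[date, "C2"].tolist())
    else:
        # Date is absent: the A row and the B row both get the conditional placeholder
        C2_col.extend([fill_value(date)] * 2)

df["C2"] = C2_col
print(df.loc[df["C2"].notna()])
```

With this rule, 2019-01-02 (a Wednesday) is filled with "missing_weekday" and 2019-01-05 (a Saturday) with "missing_weekend"; all other rows keep the NaN values they had in the starting data.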