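I'm having trouble understanding the dimensions a VAE needs, especially for the mu, logvar, and z layers.
Suppose my input is 512x512 with 1 color channel (CT images) and a batch size of 32. My encoder and decoder then look like this: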
self.encoder = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),   # 32x512x512
    nn.ReLU(True),
    nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),  # 32x256x256
    nn.ReLU(True),
    nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),  # 32x128x128
    nn.ReLU(True),
    nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),  # 32x64x64
    nn.ReLU(True),
    nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),  # 32x32x32
    nn.ReLU(True))
self.decoder = nn.Sequential(
    nn.ConvTranspose2d(32, 32, kernel_size=4, stride=2, padding=1),  # 32x64x64
    nn.ReLU(True),
    nn.ConvTranspose2d(32, 32, kernel_size=4, stride=2, padding=1),  # 32x128x128
    nn.ReLU(True),
    nn.ConvTranspose2d(32, 32, kernel_size=4, stride=2, padding=1),  # 32x256x256
    nn.ReLU(True),
    nn.ConvTranspose2d(32, 32, kernel_size=3, stride=1, padding=1),  # 32x256x256
    nn.ReLU(True),
    nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),   # 1x512x512
    nn.Sigmoid())
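As a sanity check on the shape comments, here is a minimal pure-Python sketch that traces the spatial size through both stacks. It assumes PyTorch's default dilation=1 and output_padding=0, which is what the layers above use; the helper names (conv2d_out, conv_transpose2d_out, trace) and the ENCODER/DECODER lists are only for illustration and are not part of my model.

```python
def conv2d_out(size, kernel_size, stride, padding):
    """Output size of nn.Conv2d along one spatial axis (dilation=1 assumed)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def conv_transpose2d_out(size, kernel_size, stride, padding):
    """Output size of nn.ConvTranspose2d along one axis
    (dilation=1 and output_padding=0 assumed)."""
    return (size - 1) * stride - 2 * padding + kernel_size


# (kernel_size, stride, padding) for each layer, copied from the model above.
ENCODER = [(3, 1, 1), (3, 2, 1), (3, 2, 1), (3, 2, 1), (3, 2, 1)]
DECODER = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (3, 1, 1), (4, 2, 1)]


def trace(size, layers, out_fn):
    """Return the spatial size after each layer, starting from `size`."""
    sizes = []
    for kernel_size, stride, padding in layers:
        size = out_fn(size, kernel_size, stride, padding)
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    enc = trace(512, ENCODER, conv2d_out)
    dec = trace(enc[-1], DECODER, conv_transpose2d_out)
    print("encoder sizes:", enc)
    print("decoder sizes:", dec)
```

If this arithmetic is right, the encoder ends at 32 channels of 32x32 and the decoder gets back to 512x512, so the flattened feature vector per sample would have 32 * 32 * 32 = 32768 elements.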
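What are the correct dimensions for mu/logvar and z? I'm using latent_dim = 1000 and filter_depth = 32.
I'm not sure whether the input size of the mu/logvar linear layers is correct: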
mu = nn.Linear(self.filter_depth * 32 * 32, self.latent_dim)
logvar = nn.Linear(self.filter_depth * 32 * 32, self.latent_dim)
z = nn.Linear(self.latent_dim, self.filter_depth * 32 * 32)
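To make the question concrete, this is the shape bookkeeping I would expect, written as a small pure-Python sketch rather than actual model code. It assumes the encoder output is flattened before the mu/logvar layers, that the latent sample comes from the usual reparameterization (mu + eps * exp(0.5 * logvar)), and that the output of the z layer is reshaped back to the encoder's feature-map shape before the decoder. The names vae_shapes and linear_params are hypothetical helpers.

```python
from math import prod


def vae_shapes(batch=32, filter_depth=32, feat_hw=32, latent_dim=1000):
    """Expected tensor shapes around the latent bottleneck (assumed layout)."""
    feat = (batch, filter_depth, feat_hw, feat_hw)  # encoder output
    flat = (batch, prod(feat[1:]))                  # flattened per sample
    latent = (batch, latent_dim)
    return {
        "encoder_out": feat,
        "flattened": flat,      # input to the mu and logvar linear layers
        "mu": latent,
        "logvar": latent,
        "z_sample": latent,     # mu + eps * exp(0.5 * logvar)
        "z_projected": flat,    # output of the z linear layer
        "decoder_in": feat,     # z_projected reshaped for the decoder
    }


def linear_params(in_features, out_features):
    """Parameter count of nn.Linear with bias: weights plus biases."""
    return in_features * out_features + out_features


if __name__ == "__main__":
    for name, shape in vae_shapes().items():
        print(f"{name:12s} {shape}")
    print("mu/logvar params each:", linear_params(32 * 32 * 32, 1000))
    print("z layer params:", linear_params(1000, 32 * 32 * 32))
```

Under these assumptions each of the three linear layers holds roughly 32.8 million parameters, so the bottleneck would dominate the size of the model compared with the convolutional layers.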