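In a variational autoencoder, what dimensions should the mean and variance of the latent distribution, and the latent vector itself, have?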

artificial-intelligence variational-autoencoder
2021-10-31 00:59:19

I am having trouble understanding the dimensions a VAE requires, especially for the mu, logvar, and z layers.

Suppose my input is 512x512 with 1 color channel (CT images) and a batch size of 32. My encoder and decoder look like this:

self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),  # 32x512x512
            nn.ReLU(True),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),  # 32x256x256
            nn.ReLU(True),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),  # 32x128x128
            nn.ReLU(True),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),  # 32x64x64
            nn.ReLU(True),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),  # 32x32x32
            nn.ReLU(True))

self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 32, kernel_size=4, stride=2, padding=1),  # 32x64x64
            nn.ReLU(True),
            nn.ConvTranspose2d(32, 32, kernel_size=4, stride=2, padding=1),  # 32x128x128
            nn.ReLU(True),
            nn.ConvTranspose2d(32, 32, kernel_size=4, stride=2, padding=1),  # 32x256x256
            nn.ReLU(True),
            nn.ConvTranspose2d(32, 32, kernel_size=3, stride=1, padding=1),  # 32x256x256
            nn.ReLU(True),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),  # 1x512x512
            nn.Sigmoid())
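
The spatial sizes of both stacks can be checked by hand with the standard output-size formulas for `nn.Conv2d` and `nn.ConvTranspose2d`. The following is a minimal sketch in plain Python, assuming `dilation=1` and `output_padding=0` (the PyTorch defaults, which the layers above do not override). The helper names `conv2d_out` and `conv_transpose2d_out` are made up for illustration.

```python
def conv2d_out(size, kernel, stride, padding):
    """Output length of nn.Conv2d along one axis (assumes dilation=1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose2d_out(size, kernel, stride, padding):
    """Output length of nn.ConvTranspose2d along one axis
    (assumes dilation=1 and output_padding=0)."""
    return (size - 1) * stride - 2 * padding + kernel


# (kernel_size, stride, padding) of each layer, copied from the question
encoder_layers = [(3, 1, 1), (3, 2, 1), (3, 2, 1), (3, 2, 1), (3, 2, 1)]
decoder_layers = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (3, 1, 1), (4, 2, 1)]

size = 512  # input height/width
encoder_sizes = []
for k, s, p in encoder_layers:
    size = conv2d_out(size, k, s, p)
    encoder_sizes.append(size)

# The decoder starts from the encoder's final spatial size
decoder_sizes = []
for k, s, p in decoder_layers:
    size = conv_transpose2d_out(size, k, s, p)
    decoder_sizes.append(size)

# filter_depth * H * W of the last encoder feature map
flat_features = 32 * encoder_sizes[-1] ** 2

print(encoder_sizes)   # [512, 256, 128, 64, 32]
print(decoder_sizes)   # [64, 128, 256, 256, 512]
print(flat_features)   # 32768
```

Under these assumptions the encoder ends at 32x32x32 (32768 values per image) and the decoder maps a 32x32x32 tensor back to 1x512x512.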

What are the correct dimensions for mu/logvar and z? Here latent_dim = 1000 and filter_depth = 32.

I am not sure whether the input size of the mu/logvar linear layers is correct:

mu = nn.Linear(self.filter_depth * 32 * 32, self.latent_dim)
logvar = nn.Linear(self.filter_depth * 32 * 32, self.latent_dim)
z = nn.Linear(self.latent_dim, self.filter_depth * 32 * 32)
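
One way to check these sizes is to wire the layers into a module and push a dummy batch through it. The sketch below is only an illustration under stated assumptions, not a reference implementation: the class name `ConvVAE` and the attribute names `fc_mu`, `fc_logvar`, and `fc_z` are hypothetical, the encoder output is flattened to `filter_depth * 32 * 32 = 32768` features before the linear layers, sampling uses the usual reparameterization `z = mu + exp(0.5 * logvar) * eps`, and the dummy batch has 2 images instead of 32 to keep the check light.

```python
import torch
import torch.nn as nn


class ConvVAE(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, latent_dim=1000, filter_depth=32, feat_hw=32):
        super().__init__()
        self.filter_depth = filter_depth
        self.feat_hw = feat_hw  # spatial size of the last encoder feature map
        flat = filter_depth * feat_hw * feat_hw  # 32 * 32 * 32 = 32768

        # Encoder and decoder copied from the question
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1), nn.ReLU(True),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(True),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(True),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(True),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(True),
            nn.ConvTranspose2d(32, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(True),
            nn.ConvTranspose2d(32, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(True),
            nn.ConvTranspose2d(32, 32, kernel_size=3, stride=1, padding=1), nn.ReLU(True),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1), nn.Sigmoid(),
        )

        # Linear layers with the sizes proposed in the question
        self.fc_mu = nn.Linear(flat, latent_dim)
        self.fc_logvar = nn.Linear(flat, latent_dim)
        self.fc_z = nn.Linear(latent_dim, flat)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)   # logvar = log(sigma^2)
        eps = torch.randn_like(std)     # eps ~ N(0, I), same shape as mu
        return mu + std * eps

    def forward(self, x):
        h = self.encoder(x)                  # (B, 32, 32, 32)
        h = torch.flatten(h, start_dim=1)    # (B, 32768)
        mu = self.fc_mu(h)                   # (B, latent_dim)
        logvar = self.fc_logvar(h)           # (B, latent_dim)
        z = self.reparameterize(mu, logvar)  # (B, latent_dim)
        h = self.fc_z(z)                     # (B, 32768)
        h = h.view(-1, self.filter_depth, self.feat_hw, self.feat_hw)
        return self.decoder(h), mu, logvar, z


model = ConvVAE(latent_dim=1000, filter_depth=32)
x = torch.rand(2, 1, 512, 512)  # dummy batch of 2 single-channel images
with torch.no_grad():
    recon, mu, logvar, z = model(x)

print(recon.shape, mu.shape, logvar.shape, z.shape)
```

In this sketch `mu`, `logvar`, and `z` each have shape `(batch_size, latent_dim)`, and the `in_features` of the two encoder-side linear layers equals the flattened encoder output, so the sizes in the question are consistent with the encoder shown. The reconstruction comes back as `(batch_size, 1, 512, 512)`.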
0 answers
No replies yet.