According to Bayes' theorem, $f(\theta \mid x_1,\dots,x_n) = \dfrac{f(x_1,\dots,x_n \mid \theta)\, f(\theta)}{f(x_1,\dots,x_n)}$ holds, that is, $\text{posterior} = \dfrac{\text{likelihood} \times \text{prior}}{\text{evidence}}$.
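Notice that maximum likelihood estimation omits the prior (equivalently, it treats the prior as flat) and treats the evidence as a constant: the evidence does not depend on $\theta$, so it vanishes when taking the partial derivative with respect to $\theta$. If the prior is instead kept and set to a zero-mean Gaussian, its log acts as the L2 regularization (weight decay) term; strictly speaking, that is MAP estimation rather than MLE.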
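To make this step explicit, here is a short derivation sketch. It assumes an isotropic Gaussian prior $\theta \sim \mathcal{N}(0, \tau^2 I)$; the symbol $\tau$ is introduced here for illustration only.

$$
\log f(\theta \mid x_1,\dots,x_n) = \log f(x_1,\dots,x_n \mid \theta) + \log f(\theta) - \log f(x_1,\dots,x_n)
$$

$$
\frac{\partial}{\partial \theta} \log f(x_1,\dots,x_n) = 0
\quad \text{(the evidence does not depend on } \theta \text{)}
$$

$$
\log f(\theta) = -\frac{1}{2\tau^2} \lVert \theta \rVert_2^2 + \text{const}
\;\;\Longrightarrow\;\;
\hat{\theta}_{\text{MAP}} = \arg\max_{\theta} \left[ \log f(x_1,\dots,x_n \mid \theta) - \frac{1}{2\tau^2} \lVert \theta \rVert_2^2 \right]
$$

With a flat prior, $\log f(\theta)$ is constant as well, and the maximizer reduces to the maximum likelihood estimate.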
It maximizes the likelihood by adjusting $\theta$, in effect treating $f(\theta \mid x_1,\dots,x_n)$ as proportional to $f(x_1,\dots,x_n \mid \theta)$, which is easy to compute (its negative log is usually the loss), and writes the likelihood as $L(\theta \mid x)$. The true posterior $\dfrac{f(x_1,\dots,x_n \mid \theta)\, f(\theta)}{f(x_1,\dots,x_n)}$ can rarely be worked out because the evidence (the denominator), $\int f(x_1,\dots,x_n, \theta)\, d\theta$, is generally intractable.
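As a rough illustration, here is a minimal sketch on a toy problem where the evidence integral is still cheap: estimating the mean $\theta$ of a Gaussian with known standard deviation. The data, `sigma`, `tau`, and the grid are all made-up assumptions for this example. In one dimension the integral can be approximated on a grid; with many parameters that is no longer feasible, which is the intractability mentioned above.

```python
import numpy as np

# Hypothetical toy data: draws assumed to come from N(theta, sigma^2).
x = np.array([1.8, 2.4, 1.9, 2.6, 2.3])
sigma = 1.0  # assumed known noise standard deviation
tau = 1.0    # assumed standard deviation of the zero-mean Gaussian prior

# Grid of candidate values for the single parameter theta.
theta = np.linspace(-5.0, 5.0, 10001)
step = theta[1] - theta[0]


def log_likelihood(theta, x, sigma):
    """log f(x_1..x_n | theta) for i.i.d. Gaussian data, for each grid value."""
    resid = x[:, None] - theta[None, :]
    const = x.size * np.log(sigma * np.sqrt(2.0 * np.pi))
    return -0.5 * np.sum(resid ** 2, axis=0) / sigma ** 2 - const


def log_prior(theta, tau):
    """log f(theta) for a zero-mean Gaussian prior."""
    return -0.5 * theta ** 2 / tau ** 2 - np.log(tau * np.sqrt(2.0 * np.pi))


# MLE: maximize the likelihood alone (prior ignored, evidence never needed).
loglik = log_likelihood(theta, x, sigma)
theta_mle = theta[np.argmax(loglik)]

# MAP: maximize likelihood * prior; the prior shrinks the estimate toward 0,
# which is what an L2 penalty does.
log_joint = loglik + log_prior(theta, tau)
theta_map = theta[np.argmax(log_joint)]

# Evidence: integrate the joint over theta (simple Riemann sum on the grid).
evidence = np.sum(np.exp(log_joint)) * step

# Full posterior density on the grid; dividing by the evidence only rescales.
posterior = np.exp(log_joint) / evidence

print("MLE:", theta_mle)
print("MAP:", theta_map)
print("evidence:", evidence)
```

The MLE lands on the sample mean, while the MAP estimate is pulled toward zero by the prior. Dividing by the evidence rescales the curve but does not move its maximum, which is why the maximization can skip it.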
Hope this helps.