There's no guarantee that having smaller weights is actually better. Lasso and ridge regression work by imposing prior knowledge/assumptions/constraints on the solution. This approach will work well if the prior/assumptions/constraints are well suited to the actual distribution that generated the data, and may not work well otherwise. Regarding simplicity/complexity, it's not the individual models that are simpler or more complex. Rather, it's the family of models under consideration.
From a geometric perspective, lasso and ridge regression impose constraints on the weights. For example, the common penalty/Lagrangian form of ridge regression:
$$\min_\beta \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2$$
can be re-written in the equivalent constraint form:
$$\min_\beta \; \|y - X\beta\|_2^2 \quad \text{s.t. } \|\beta\|_2^2 \le c$$
This makes it clear that ridge regression constrains the weights to lie within a ball (bounded by a hypersphere) whose radius is governed by the regularization parameter. Similarly, lasso constrains the weights to lie within a polytope (the ℓ1 ball) whose size is governed by the regularization parameter. These constraints mean that most of the original parameter space is off-limits, and we search for the optimal weights within a much smaller region. This smaller region can be considered less 'complex' than the full space.
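As a quick illustration of the constraint-region view, here is a small sketch (NumPy on synthetic data; the design matrix, true weights, and λ values are made up for the example) that solves ridge regression in its penalty form via the closed-form solution $(X^TX + \lambda I)^{-1}X^Ty$ and shows that $\|\beta\|_2^2$ shrinks as λ grows, i.e. the implied constraint radius $c$ gets smaller:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 50 observations, 5 features (illustrative values only)
X = rng.normal(size=(50, 5))
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 1.5])
y = X @ beta_true + rng.normal(scale=0.5, size=50)

def ridge(X, y, lam):
    """Closed-form ridge solution: (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# As lambda grows, the squared norm of the weights shrinks, i.e. the
# equivalent constraint radius c in the constrained form gets smaller.
for lam in [0.0, 1.0, 10.0, 100.0]:
    b = ridge(X, y, lam)
    print(f"lambda={lam:6.1f}  ||beta||^2 = {b @ b:.3f}")
```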
From a Bayesian perspective, one can think about the posterior distribution over all possible choices of weights. Both lasso and ridge regression are equivalent to MAP estimation after placing a prior on the weights (lasso uses a Laplacian prior and ridge regression uses a Gaussian prior). A narrower posterior corresponds to greater restriction and less complexity, because high posterior density is given to a smaller set of parameters. For example, multiplying the likelihood function by a narrow Gaussian prior (which corresponds to a large ridge penalty) produces a narrower posterior.
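To see the MAP equivalence concretely for ridge, the sketch below (again synthetic data; the noise level σ and prior width τ are assumed values) computes the posterior mode under a Gaussian prior $\beta \sim N(0, \tau^2 I)$ and checks that it matches the penalty-form ridge solution with $\lambda = \sigma^2/\tau^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 1.5])
sigma = 0.5                       # assumed noise standard deviation
y = X @ beta_true + rng.normal(scale=sigma, size=50)

tau = 1.0                         # assumed prior std dev: beta ~ N(0, tau^2 I)
lam = sigma**2 / tau**2           # equivalent ridge penalty

p = X.shape[1]
# MAP estimate (posterior mode) under Gaussian likelihood + Gaussian prior
beta_map = np.linalg.solve(X.T @ X / sigma**2 + np.eye(p) / tau**2,
                           X.T @ y / sigma**2)
# Ridge estimate in penalty form with lambda = sigma^2 / tau^2
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print(np.allclose(beta_map, beta_ridge))   # True: the two estimates coincide
```

A narrower prior (smaller τ) corresponds to a larger λ, which is exactly the "narrow Gaussian prior = large ridge penalty" correspondence described above.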
One of the primary reasons to impose constraints/priors is that choosing the optimal model from a more restricted family is less likely to overfit than choosing it from a less restricted family. This is because the less restricted family affords 'more' ways to fit the data, and it's increasingly likely that one of them will be able to fit random fluctuations in the training set. For a more formal treatment, see the bias-variance tradeoff. This doesn't necessarily mean that choosing a model from a more restricted family will work well. Getting good performance requires that the restricted family actually contains good models. This means we have to choose a prior/constraint that's well-matched to the specific problem at hand.
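A small simulation can make the overfitting point concrete. The sketch below (synthetic data with more features than are truly relevant; the sample sizes and λ grid are arbitrary choices) compares training and test error for a nearly unregularized fit against increasingly restricted ridge fits. The nearly unrestricted fit achieves low training error but worse test error, while a very large λ restricts the family so much that it underfits:

```python
import numpy as np

rng = np.random.default_rng(2)

# Few training points, many features: the unrestricted family can fit noise.
n_train, n_test, p = 30, 1000, 25
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.0, 0.5]           # only a few weights actually matter

X_tr = rng.normal(size=(n_train, p))
y_tr = X_tr @ beta_true + rng.normal(scale=1.0, size=n_train)
X_te = rng.normal(size=(n_test, p))
y_te = X_te @ beta_true + rng.normal(scale=1.0, size=n_test)

def ridge(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

for lam in [1e-8, 1.0, 10.0, 1000.0]:      # ~OLS, mild, moderate, too strong
    b = ridge(X_tr, y_tr, lam)
    mse_tr = np.mean((y_tr - X_tr @ b) ** 2)
    mse_te = np.mean((y_te - X_te @ b) ** 2)
    print(f"lambda={lam:8.2g}  train MSE={mse_tr:6.3f}  test MSE={mse_te:6.3f}")
```

In this setup the prior (weights near zero, few of them large) happens to match how the data were generated, so moderate shrinkage helps; with a true coefficient vector of large, dense weights, the same constraint could hurt instead.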