What is the meaning of the $\ell_p$ norm in this model for sparse channel estimation?

Tags: information-processing, self-study, norm
2022-02-21 19:14:52

The signal $y(n)$ is given by

$$y(n)=\sum_{i=0}^{L-1} h(i)\,x(n-i)+v(n) \tag{1}$$

where $h=[h_1,h_2,\ldots,h_L]^T$ is the $L\times 1$ channel vector and $v(n)$ is additive noise with variance $\sigma_v^2$.

In Eq. (20) of the paper http://www.eurasip.org/Proceedings/Eusipco/Eusipco2008/papers/1569101936.pdf, there is the term $\|h\|_p^p$ with $0<p\le 1$.

Consider an array $h=[1,\,0.2,\,0,\,0.5]$. How is $\|h\|_p^p$ computed for this $h$?

The authors in Eq. (24) present the derivative of the cost function given in Eq. (20). The derivative is taken with respect to $h$ and involves $p$, $\lambda$, and powers of $\tilde h$.

2 Answers

The symbol $\|h\|_p$ denotes the $p$-norm of a vector $h$, which is defined as

$$\|h\|_p=\left(|h_1|^p+|h_2|^p+\cdots+|h_n|^p\right)^{1/p}.$$

When they write $\|h\|_p^p$, it is just that quantity raised to the $p$-th power, so that the root disappears. Mathematically:

$$\|h\|_p^p=\sum_{i=1}^{n}|h_i|^p$$
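To make this concrete on the array from the question, here is a minimal NumPy sketch (the helper names lp_p and lp_norm are my own, not from the paper):

```python
import numpy as np

def lp_p(h, p):
    """||h||_p^p = sum_i |h_i|^p (no p-th root)."""
    return np.sum(np.abs(h) ** p)

def lp_norm(h, p):
    """||h||_p = (sum_i |h_i|^p)^(1/p); a true norm only for p >= 1."""
    return lp_p(h, p) ** (1.0 / p)

h = np.array([1.0, 0.2, 0.0, 0.5])

# p = 0.5: ||h||_p^p = 1^0.5 + 0.2^0.5 + 0^0.5 + 0.5^0.5 ≈ 2.154
print(lp_p(h, 0.5))                   # 2.1543...
# p = 1: the root changes nothing, so both quantities equal 1.7
print(lp_norm(h, 1.0), lp_p(h, 1.0))  # 1.7 1.7
```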

To add to @Tendero's answer, the expression $\sum_k |x_k|^p$ is sometimes called the "power $p$-norm" when $p\ge 1$. Most often, you can see mentions of the "squared $\ell_2$ norm" or "$\ell_2$ norm squared". For $p=1$, the exponent does not modify the computation, so it is just called the $\ell_1$ norm.

The use of the power is often more convenient mathematically and computationally: having a $p$-th root $(\cdot)^{1/p}$ can be cumbersome when computing derivatives to find extrema.
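As a worked illustration (my own, not taken from the paper), compare the partial derivatives of the two forms for $h_i\ne 0$:

$$\frac{\partial}{\partial h_i}\|h\|_p^p = p\,|h_i|^{p-1}\operatorname{sgn}(h_i),
\qquad
\frac{\partial}{\partial h_i}\|h\|_p = \|h\|_p^{\,1-p}\,|h_i|^{p-1}\operatorname{sgn}(h_i).$$

The power form gives a simple per-coordinate term, while the rooted form drags the global factor $\|h\|_p^{\,1-p}$ into every partial derivative; a term of the first kind is presumably what appears in the paper's Eq. (24).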

When $p\ge 1$, this quantity satisfies all the norm axioms. But when $0<p<1$, the triangle inequality is no longer satisfied, so it should not be called a norm. The correct denomination is a quasi-norm, with a modulus of concavity $K$ such that

$$\ell_p(x+y)\le K\left(\ell_p(x)+\ell_p(y)\right).$$
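A quick numerical check of this failure, using the standard counterexample $x=(1,0)$, $y=(0,1)$ with $p=1/2$ (my own illustration; for $0<p<1$ the constant $K=2^{1/p-1}$ is known to suffice):

```python
import numpy as np

def lp(x, p):
    """(sum_i |x_i|^p)^(1/p); a true norm only for p >= 1."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

p = 0.5
x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

# ell_p(x + y) = (1 + 1)^2 = 4, yet ell_p(x) + ell_p(y) = 1 + 1 = 2:
print(lp(x + y, p))         # 4.0 -> triangle inequality fails
print(lp(x, p) + lp(y, p))  # 2.0 -> K = 2^(1/p - 1) = 2 restores the bound
```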

When $p=0$, this is neither a norm nor a quasi-norm. It can be called a cardinality function, a sparsity measure, or a count index: it counts the number of non-zero entries of the vector.

In signal processing, where sparsity is considered useful, $\ell_0$ is usually the target to minimize: the number of non-zero samples, or the number of taps of a filter.
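On the question's example array, this count reads (a one-line sketch):

```python
import numpy as np

h = np.array([1.0, 0.2, 0.0, 0.5])
# ||h||_0 counts the non-zero entries: 3 of the 4 taps are active.
print(np.count_nonzero(h))  # 3
```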

However, it is quite intractable (combinatorial, and not differentiable). Under some theoretical conditions, the minimization of an $\ell_0$ penalty can be replaced by an $\ell_1$ penalty, the "last" convex $\ell_p^p$ penalty.

Still, more and more works address non-convex penalties ($p<1$), which approximate $\ell_0$ more closely, as the tabulation below suggests.
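One way to see this approximation is to tabulate the scalar penalty $|t|^p$ for decreasing $p$ (my own illustration): as $p\to 0$, $|t|^p$ flattens toward the 0/1 indicator that $\ell_0$ applies to each entry.

```python
import numpy as np

t = np.array([0.0, 0.01, 0.1, 0.5, 1.0])
for p in (1.0, 0.5, 0.1):
    # As p -> 0, |t|^p -> 1 for every t != 0 while staying 0 at t = 0,
    # i.e. the penalty approaches the 0/1 counting behaviour of ell_0.
    print(p, np.abs(t) ** p)
```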

Finally, in a Bayesian context, a Laplacian prior can be encapsulated in an $\ell_1$ penalty, just as a Gaussian prior corresponds to an $\ell_2$ penalty; see for instance Why is Laplace prior producing sparse solutions?.
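As a sketch of that connection (here $X$ is my notation for the convolution matrix built from $x(n)$, an assumption to match the model in Eq. (1)): with Gaussian noise of variance $\sigma_v^2$ and an i.i.d. Laplacian prior $p(h_i)\propto e^{-|h_i|/b}$, maximizing the posterior gives

$$\hat h_{\mathrm{MAP}}=\arg\min_h\;\frac{1}{2\sigma_v^2}\,\|y-Xh\|_2^2+\frac{1}{b}\,\|h\|_1,$$

i.e. an $\ell_1$-penalized least-squares problem, which is why Laplacian priors produce sparse solutions.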