Computational Science - How can I constrain each component of the optimized vector to be non-negative?

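I am building a gradient descent model for portfolio optimization. The model is finished and runs without any problems. However, there is one issue I cannot solve: the optimal vector has negative components. Because my portfolio does not allow short selling, such a solution is not acceptable for my portfolio.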

Model description

The cost function for my portfolio problem is defined as

$F(x)=\frac{\beta_1}{2}(x^T\boldsymbol{\Sigma}x)-\mu^Tx+\frac{\gamma}{2}(e^Tx-1)^2+\lambda(e^Tx-1)+\frac{\beta_2}{2}\Vert{x}\Vert^2_2+\rho\sum_{i=1}^n\max(0,-x_i)^2$

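where $x$ is the vector I minimize over and $\boldsymbol{\Sigma}$ is the positive definite covariance matrix. The term $\frac{\gamma}{2}(e^Tx-1)^2+\lambda(e^Tx-1)$ is the augmented Lagrangian term for the constraint $e^Tx=1$, and $\rho\sum_{i=1}^n\max(0,-x_i)^2$ is the penalty imposed to enforce $x_i>0$ on all components of $x$.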
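
For reference, differentiating $F$ term by term gives the gradient that a descent iteration needs. This is only a sketch derived from the cost function above: it takes $e$ to be the all-ones vector (as `np.sum(v)` in the code suggests), $\boldsymbol{\Sigma}$ to be symmetric, and $\max$ to act componentwise. The step size $\alpha_k$ is a placeholder symbol that the post does not define, and the multiplier update mirrors what `aug_lag` does in the code further down.

$$\nabla F(x)=\beta_1\boldsymbol{\Sigma}x-\mu+\gamma(e^Tx-1)e+\lambda e+\beta_2x-2\rho\max(0,-x)$$

$$x^{k+1}=x^k-\alpha_k\nabla F(x^k),\qquad \lambda^{k+1}=\lambda^k+\gamma(e^Tx^{k+1}-1)$$

Note that the penalty term only contributes to the gradient in components where $x_i<0$, so it pushes negative components back toward zero without forbidding them outright.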

As can be seen, my goal is to minimize $F$ over $x$ using the method of steepest descent. I therefore started building my Python code as follows:

import numpy as np
from numpy.linalg import norm

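def penalty_f(v):
    # Quadratic penalty on negative components: sum_i max(0, -x_i)^2
    return np.sum(np.maximum(0., np.negative(v))**2)

def penalty_df(v):
    # Gradient of penalty_f: -2*max(0, -x_i) in each component
    return -2.*np.maximum(0., np.negative(v))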

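def aug_lag(lam, v):
    # Multiplier update for the constraint e^T x = 1 (gam is a module-level global)
    new_lam = lam + gam*(np.sum(v)-1.)
    return new_lam

def lipschitz(v):
    # Step size min(1, 1/L), where L estimates the Lipschitz constant of the
    # gradient; the rho penalty term is not included in this estimate
    lips = beta_1*np.sqrt(np.trace(cov@cov)) + beta_2*np.sqrt(len(v)) + gam*len(v)
    return min(1,1/lips)

def project_f(v, lam_k):
    # Cost F(x). cov, mean, b, beta_1, beta_2, gam and rho are module-level
    # globals not shown in this post; b scales the mean term and does not
    # appear in the formula for F above
    func = 0.5*beta_1*v.T@cov@v - b*mean.T@v + gam/2*(np.sum(v)-1.)**2 + lam_k*(np.sum(v)-1.) + beta_2/2*v@v + rho*penalty_f(v)
    return func

def project_df(v, lam_k):
    # Gradient of project_f with respect to v
    par_func = beta_1*cov@v - b*mean + gam*np.ones_like(v)*(np.sum(v)-1.) + lam_k*np.ones_like(v) + beta_2*v + rho*penalty_df(v)
    return par_func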

def gradient_descent(f, df, ini_v, tolerance, lam, MAX_ITER=10000, output_fname='output.txt'):
    vec = np.array(ini_v, dtype=float)  # copy, so the caller's array is not modified in place
    negative = 0
    aug_lam = lam
    with open(output_fname, 'w') as out:
        for i in range(MAX_ITER):
            alpha = lipschitz(vec)
            f_value = f(vec, aug_lam)
            gradient = df(vec, aug_lam)
            direction = np.negative(gradient)
            vec += alpha * direction
            aug_lam = aug_lag(aug_lam, vec)
            print("No. of zeros: ", len(vec)-np.count_nonzero(vec))
            print(norm(gradient,2), "," , f_value)
            msg = f'{f_value},{norm(gradient,2)}, {str(list(vec))}\n'
            out.write(msg)

            # stopping criteria
            if norm(gradient,2) < tolerance:
                print ("The optimum vector for", {df}, " is at ", vec,"at iteration ", i+1)
                # count negative components without reusing the loop variable i
                negative = int(np.sum(vec < 0))
                print("No of negative: ", negative)
                print("Vector sum: ",np.sum(vec))
                print("Gradient: ", norm(gradient,2))
                break
            
            if i == MAX_ITER - 1:  # last iteration; range(MAX_ITER) never reaches MAX_ITER
                print ('Higher no. of iterations is needed for', {df})
                print ("Vector: ", vec)
                print("Vector sum: ",np.sum(vec))
                print("Gradient: ", norm(gradient,2))
                
    return vec
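
A sanity check that is not part of the original post: before trusting the optimizer's output, the analytic gradient can be compared with a central finite-difference approximation of the cost. The sketch below is self-contained and uses synthetic data. The parameter values, the random covariance matrix, and the names `cost`, `grad` and `numerical_grad` are all assumptions made for illustration, and it takes `b = 1` so that the cost matches the formula for $F$.

```python
import numpy as np

# Hypothetical parameters and synthetic data, chosen only for this check.
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
cov = A @ A.T + n * np.eye(n)        # symmetric positive definite
mean = rng.standard_normal(n)
beta_1, beta_2, gam, rho, lam = 1.0, 0.1, 10.0, 5.0, 0.3

def cost(x):
    """F(x) as written in the post, with e the all-ones vector."""
    s = np.sum(x) - 1.0
    neg = np.maximum(0.0, -x)
    return (0.5 * beta_1 * x @ cov @ x - mean @ x + 0.5 * gam * s**2
            + lam * s + 0.5 * beta_2 * x @ x + rho * np.sum(neg**2))

def grad(x):
    """Analytic gradient of cost."""
    s = np.sum(x) - 1.0
    neg = np.maximum(0.0, -x)
    return (beta_1 * cov @ x - mean + (gam * s + lam) * np.ones_like(x)
            + beta_2 * x - 2.0 * rho * neg)

def numerical_grad(f, x, h=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = h
        g[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return g

x0 = rng.standard_normal(n)          # mix of positive and negative components
max_err = np.max(np.abs(grad(x0) - numerical_grad(cost, x0)))
print("max abs difference:", max_err)
```

If the two gradients agree closely, the gradient code is probably consistent with the cost. Any remaining negative weights would then come from the formulation itself (a finite `rho` penalizes negative components but does not forbid them) rather than from a coding error.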

This is the result I obtained from `gradient_descent` using 33 stocks:

The optimum vector for {<function project_df at 0x000001FA50FF2D30>}  is at  [-0.03442667 -0.01839447  0.03305939 -0.04319195  0.00353058  0.13003235
  0.09405175  0.00022958  0.08010633 -0.02831581 -0.04066535 -0.02875569
  0.04953858 -0.04019486  0.22866438 -0.03474848 -0.00029308  0.09792178
  0.11669689 -0.08569448  0.00405406  0.05544077  0.12273198  0.09615974
 -0.04374827  0.16994986  0.00670517  0.04772238  0.01729612  0.05218545
  0.03484764 -0.03438397 -0.00811171] at iteration  1230
No of negative:  13
Vector sum:  1.0000000001114249
Gradient:  9.961948573544883e-07

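Negative components start to appear once I evaluate 30 or more stocks. Is there anything I can improve or should watch out for when performing portfolio optimization?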

P.S. The mean and covariance data are available at this link: https://www.dropbox.com/scl/fi/rycj948t4bnq5u60m13ow/Covariance.xlsx?dl=0&rlkey=d8u18ntuxk7wjcl1eup8gmxoa

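import pandas as pd

# Sheet 0 is read as the covariance matrix, sheet 1 as the mean vector (no header rows)
df1 = pd.read_excel('Covariance.xlsx', sheet_name=0, header=None)
df2 = pd.read_excel('Covariance.xlsx', sheet_name=1, header=None)
np_cov_1 = df1.values
mean_1 = df2.values.reshape(len(df2))
# Equal-weight starting vector: one entry per row of df1, each equal to 1/len(df2)
ini_vec_1 = np.full(len(df1), 1. / len(df2))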
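
Since the model assumes a positive definite covariance matrix, it may be worth checking the loaded arrays before running the optimizer. The helper below is a hypothetical sketch (`check_inputs` is not from the post) and is demonstrated on a small synthetic matrix rather than on the spreadsheet data.

```python
import numpy as np

def check_inputs(cov, mean, tol=1e-10):
    """Return basic diagnostics for a covariance matrix and a mean vector."""
    cov = np.asarray(cov, dtype=float)
    mean = np.asarray(mean, dtype=float)
    square = cov.ndim == 2 and cov.shape[0] == cov.shape[1]
    return {
        "shapes_match": bool(square and mean.shape == (cov.shape[0],)),
        "symmetric": bool(square and np.allclose(cov, cov.T, atol=tol)),
        # eigvalsh expects a symmetric matrix, so symmetrize first
        "min_eigenvalue": (float(np.linalg.eigvalsh(0.5 * (cov + cov.T)).min())
                           if square else None),
    }

# Synthetic stand-in for the spreadsheet data
demo_cov = np.array([[2.0, 0.5], [0.5, 1.0]])
demo_mean = np.array([0.1, 0.2])
report = check_inputs(demo_cov, demo_mean)
print(report)
```

With the real data this would be called as `check_inputs(np_cov_1, mean_1)`. A smallest eigenvalue that is zero or negative would mean the positive definiteness assumption does not hold for the sample covariance.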