嵌入式应用的定点三角法

电器工程 C 嵌入式 手臂
2022-01-11 06:38:35

我需要在嵌入式应用程序中进行旋转(和其他)转换,需要 sin() cos() 和 tan() 函数。我知道您可以使用查找表,这是我自己研究时能找到的唯一解决方案,但是那里有一个好的定点三角库吗?

我正在考虑为应用程序使用皮质 M3,所以我想尽可能地远离浮点以保持应用程序的流畅。

3个回答

在嵌入式应用程序中进行三角函数的一个好方法是使用多项式逼近您需要的函数。代码紧凑,数据由几个系数组成,唯一需要的操作是乘法和加法/减法。许多嵌入式系统都有硬件乘法器,可以提供良好的性能。

您反对为此使用定点 Cortex 库吗?

q31_t arm_sin_q31 (q31_t x)
快速逼近 Q31 数据的三角正弦函数。

从:

CMSIS-DSP:DSP 库集合,包含 60 多个函数,适用于各种数据类型:定点(小数 q7、q15、q31)和单精度浮点(32 位)。该库适用于 Cortex-M0、Cortex-M3和 Cortex-M4。

它使用带有二次插值的查找表,但速度非常快。您可以将其调整为线性插值以获得更快的速度但更多的错误。

另请注意,即使 Cortex M4 也不一定具有 FPU。如果他们这样做,我已经看到他们被称为“M4F”。

该答案旨在通过两个变体的具体示例来扩充当前接受的答案,并提供一些具体的设计建议。

如果所需的精度相当高并且有可用的硬件乘法器,则多项式逼近通常是一种更好的方法。即使在使用一次插值(例如线性、二次)和压缩方案(例如二分表)时,表的大小也会迅速增加,需要超过大约 16 个“好”位。

强烈建议对多项式使用极小极大近似,因为它们可以最小化生成它们的区间内的最大误差。与泰勒级数展开相比,这可以显着减少特定精度所需的项数,泰勒级数展开仅在展开它们的点处提供最佳精度。Mathematica、Maple 等常用工具和开源Sollya 工具提供了生成极小极大近似值的内置方法。

乘高运算是定点算术中多项式评估的基本计算构建块。它们返回整数乘积的更重要的一半。大多数体系结构提供有符号和无符号变体,其他体系结构提供乘法运算,并在两个寄存器中返回双宽度结果。一些架构甚至提供乘高加加组合,这可能特别有用。优化编译器通常能够将与这些操作相对应的 HLL 源代码惯用语(例如在下面的 ISO-C 代码中使用的惯用语)翻译成适当的硬件指令。

为了最大限度地提高多项式评估的准确性,人们希望通过选择具有最大可能小数位数的定点格式,在中间计算期间始终利用可能的最大位数。为了提高效率,当与乘高运算结合使用时,等于寄存器宽度的比例因子避免了通过移位重新缩放的需要。

虽然 Horner 方案通常用于浮点计算以评估高精度的多项式,但这在定点计算中通常是不必要的,并且由于多项式评估的冗长依赖链暴露了乘法延迟,因此可能对性能有害。允许最佳利用具有多周期延迟的流水线乘法器的并行评估方案通常是可取的。在下面的代码中,我将每个多项式的项成对组合,并从中建立对完整多项式的评估。

下面的 ISO-C 代码演示了根据这些设计原则在区间 [0, π/2] 上同时计算正弦和余弦,其中输入和输出采用 S8.23 (Q8.23) 格式。它实现了基本上完全准确的结果,最大误差约为 10 -7和 80+% 的结果正确四舍五入。

第一个变体 insincos_fixed_nj()使用将参数简化为 [0, π/4] 的经典方法,以及对该区间上的正弦和余弦的多项式逼近。然后重建阶段基于象限将多项式值映射到正弦和余弦。中的第二个变体sincos_fixed_ollyw基于OllyW 的博客文章他们建议将变换 a = (2/π)x-1/2 应用于区间 [-1/2, 1/2],然后需要在区间上近似 sin ((2πa + π)/4 和 cos ((2πa + π)/4。这些 ( sin , cos ) 的级数展开是相同的,只是奇次幂项的符号相反。这意味着可以分别对奇次幂和偶次幂项求和,然后计算正弦和余弦作为累加和的和和差。

使用编译器资源管理器,我使用 Clang 11.0 为armv7-a具有完全优化的 32 位 ARM 目标编译 ( -O3)。两种变体都编译成 41 条指令子程序,每个子程序使用 9 个存储的 32 位常量。sincos_fixed_ollyw()比使用多一条乘法指令,sincos_fixed_nj但寄存器压力略低。当使用 Clang 为其他架构目标构建时,情况似乎相似,因此人们希望尝试两种变体,看看哪个在给定平台上表现更好。可以通过将正弦结果除以余弦结果来计算正切。

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>

#define SINCOS_NJ    (1)
#define SINCOS_OLLYW (2)
#define VARIANT      (SINCOS_NJ)

/* a single instruction in many 32-bit architectures */
uint32_t umul32hi (uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* a single instruction in many 32-bit architectures */
int32_t mul32hi (int32_t a, int32_t b)
{
    return (int32_t)(uint32_t)((uint64_t)((int64_t)a * b) >> 32);
}

/*
  compute sine and cosine of argument in [0, PI/2]
  input and output in S8.23 format
  max err sine = 9.86237533e-8  max err cosine = 1.02729891e-7
  rms err sine = 4.11141973e-8  rms err cosine = 4.11752018e-8
  sin correctly rounded: 10961278 (83.19%)  
  cos correctly rounded: 11070113 (84.01%)
*/
void sincos_fixed_nj (int32_t x, int32_t *sine, int32_t *cosine)
{
    // minimax polynomial approximation for sine on [0, PI/4]
    const uint32_t s0 = (uint32_t)(1.9510998390614986e-4 * (1LL << 32) + 0.5);
    const uint32_t s1 = (uint32_t)(8.3322080317884684e-3 * (1LL << 32) + 0.5);
    const uint32_t s2 = (uint32_t)(1.6666648373939097e-1 * (1LL << 32) + 0.5);
    const uint32_t s3 = (uint32_t)(9.9999991734512150e-1 * (1LL << 32) + 0.5);
    // minimax polynomial approximation for cosine on [0, PI/4]
    const uint32_t c0 = (uint32_t)(1.3578890357166529e-3 * (1LL << 32) + 0.5);
    const uint32_t c1 = (uint32_t)(4.1654359549283981e-2 * (1LL << 32) + 0.5);
    const uint32_t c2 = (uint32_t)(4.9999838648363948e-1 * (1LL << 32) + 0.5);
    const uint32_t c3 = (uint32_t)(9.9999997159466147e-1 * (1LL << 32) + 0.5);
    // auxilliary constants
    const int32_t hpi_p23 = (int32_t)(3.141592653590 / 2 * (1LL << 23) + 0.5);
    const int32_t qpi_p23 = (int32_t)(3.141592653590 / 4 * (1LL << 23) + 0.5);
    const int32_t one_p23 = (int32_t)(1.0000000000000e+0 * (1LL << 23) + 0.5);
    uint32_t a, s, q, h, l, t, sn, cs;

    /* reduce range from [0, PI/2] to [0, PI/4] */
    t = (x > qpi_p23) ? (hpi_p23 - x) : x; // S8.23

    /* scale up argument for maximum precision in intermediate computation */
    a = t << 9; // U0.32

    /* pre-compute a**2 and a**4 */
    s = umul32hi (a, a); // U0.32
    q = umul32hi (s, s); // U0.32

    /* approximate sine on [0, PI/4] */
    h = s3 - umul32hi (s2, s); // U0.32
    l = umul32hi (s1 - umul32hi (s0, s), q); // U0.32
    sn = umul32hi (h + l, a); // U0.32

    /* approximate cosine on [0, PI/4] */
    h = c3 - umul32hi (c2, s); // U0.32
    l = umul32hi (c1 - umul32hi (c0, s), q); // U0.32
    cs = h + l; // U0.32

    /* round results to target precision */
    sn = ((sn + 256) >> 9); // S8.23
    cs = ((cs + 256) >> 9); // S8.23

    /* cosine result overflows U0.32 format for small arguments */
    cs = (t < 0xb50) ? one_p23 : cs; // S8.23

    /* map sine/cosine approximations based on quadrant */
    *sine   = (t != x) ? cs : sn; // S8.23
    *cosine = (t != x) ? sn : cs; // S8.23
}   

/*
  compute sine and cosine of argument in [0, PI/2]
  input and output in S8.23 format
  max err sine = 1.13173883e-7  max err cosine = 1.13158773e-7
  rms err sine = 4.30955921e-8  rms err cosine = 4.31472191e-8
  sin correctly rounded: 10844170 (82.30%)  
  cos correctly rounded: 10855609 (82.38%)

  Based on an approach by OllyW (http://www.olliw.eu/2014/fast-functions/, 
  retrieved 10/23/2020). We transform a = 2/PI*x-1/2, then we approximate
  sin ((2*PI*a + PI)/4 and cos ((2*PI*a + PI)/4. Except for sign flipping
  in the odd-power terms of the expansions the two series expansions match:

https://www.wolframalpha.com/input/?i=series++sin+%28%282*pi*a+%2B+pi%29%2F4%29
https://www.wolframalpha.com/input/?i=series++cos+%28%282*pi*a+%2B+pi%29%2F4%29

  This means we can sum the odd-power and the even-power terms seperately,
  then compute the sum and difference of those sums giving sine and cosine.
*/
void sincos_fixed_ollyw (int32_t x, int32_t *sine, int32_t *cosine)
{
    // minimax polynomial approximation for sin ((2*PI*a + PI)/4 on [-0.5, 0.5]
    const uint32_t c0 = (uint32_t)(7.0710676768794656e-1 * (1LL << 32) + 0.5);
    const uint32_t c1 = (uint32_t)((1.110721191857 -.25) * (1LL << 32) + 0.5);
    const uint32_t c2 = (uint32_t)(8.7235601339489222e-1 * (1LL << 32) + 0.5);
    const uint32_t c3 = (uint32_t)(4.5677902549505234e-1 * (1LL << 32) + 0.5);
    const uint32_t c4 = (uint32_t)(1.7932640877552330e-1 * (1LL << 32) + 0.5);
    const uint32_t c5 = (uint32_t)(5.6449491763487458e-2 * (1LL << 32) + 0.5);
    const uint32_t c6 = (uint32_t)(1.4444266213104129e-2 * (1LL << 32) + 0.5);
    const uint32_t c7 = (uint32_t)(3.4931597765535116e-3 * (1LL << 32) + 0.5);
    // auxiliary constants
    const uint32_t twoopi = (uint32_t)(2/3.1415926535898 * (1LL << 32) + 0.5);
    const uint32_t half_p31 = (uint32_t)(0.5000000000000 * (1LL << 31) + 0.5);
    const uint32_t quarter_p30 = (uint32_t)(0.2500000000 * (1LL << 30) + 0.5);
    uint32_t s, t, q, h, l;
    int32_t a, o, e, sn, cs;

    /* scale up argument for maximum precision in intermediate computation */
    t = (uint32_t)x << 8; // U1.31

    /* a = 2*PI*x - 0.5 */
    a = umul32hi (twoopi, t) - half_p31; // S0.31

    /* precompute a**2 and a**4 */
    s = (uint32_t)mul32hi (a, a) << 2; // U0.32
    q = umul32hi (s, s); // U0.32

    /* sum odd power terms; add in second portion of c1 (= 0.25) at the end */
    h = c1 - umul32hi (c3, s); // U0.32
    l = umul32hi ((c5 - umul32hi (c7, s)), q); // U0.32
    o = ((h + l) >> 2) + quarter_p30; // S1.30
    o = mul32hi (o, a); // S2.29

    /* sum even power terms */
    h = c0 - umul32hi (c2, s); // U0.32
    l = umul32hi ((c4 - umul32hi (c6, s)), q); // U0.32
    e = (h + l) >> 3; // S2.29 

    /* compute sine and cosine as sum and difference of odd / even terms */
    sn = e + o; // S2.29 sum -> sine 
    cs = e - o; // S2.29 difference -> cosine

    /* round results to target precision */
    sn = (sn + 32) >> 6; // S8.23
    cs = (cs + 32) >> 6; // S8.23

    *sine = sn;
    *cosine = cs;
}

double s8p23_to_double (int32_t a)
{
    return (double)a / (1LL << 23);
}

int32_t double_to_s8p23 (double a)
{
    return (int32_t)(a * (1LL << 23) + 0.5);
}

/* exhaustive test of S8.23 fixed-point sincos on [0,PI/2] */
int main (void)
{
    double errc, errs, maxerrs, maxerrc, errsqs, errsqc;
    int32_t arg, sin_correctly_rounded, cos_correctly_rounded;

#if VARIANT == SINCOS_OLLYW
    printf ("S8.23 fixed-point sincos OllyW variant\n");
#elif VARIANT == SINCOS_NJ
    printf ("S8.23 fixed-point sincos NJ variant\n");
#else // VARIANT
#error unsupported VARIANT
#endif // VARIANT

    maxerrs = 0; 
    maxerrc = 0;
    errsqs = 0;
    errsqc = 0;
    sin_correctly_rounded = 0;
    cos_correctly_rounded = 0;

    for (arg = 0; arg <= double_to_s8p23 (3.14159265358979 / 2); arg++) {
        double argf, refs, refc;
        int32_t sine, cosine, refsi, refci;
#if VARIANT == SINCOS_OLLYW
        sincos_fixed_ollyw (arg, &sine, &cosine);
#elif VARIANT == SINCOS_NJ
        sincos_fixed_nj (arg, &sine, &cosine);
#endif // VARIANT
        argf = s8p23_to_double (arg);
        refs = sin (argf);
        refc = cos (argf);
        refsi = double_to_s8p23 (refs);
        refci = double_to_s8p23 (refc);
        /* print function values near endpoints of interval */
        if ((arg < 5) || (arg > 0xc90fd5)) {
            printf ("arg=%08x  sin=%08x  cos=%08x\n", arg, sine, cosine);
        }
        if (sine == refsi) sin_correctly_rounded++;
        if (cosine == refci) cos_correctly_rounded++;
        errs = fabs (s8p23_to_double (sine) - refs);
        errc = fabs (s8p23_to_double (cosine) - refc);
        errsqs += errs * errs;
        errsqc += errc * errc;
        if (errs > maxerrs) maxerrs = errs;
        if (errc > maxerrc) maxerrc = errc;
    }
    printf ("max err sine = %15.8e  max err cosine = %15.8e\n", 
            maxerrs, maxerrc);
    printf ("rms err sine = %15.8e  rms err cosine = %15.8e\n", 
            sqrt (errsqs / arg), sqrt (errsqc / arg));
    printf ("sin correctly rounded: %d (%.2f%%)  cos correctly rounded: %d (%.2f%%)\n", 
            sin_correctly_rounded, 1.0 * sin_correctly_rounded / arg * 100,
            cos_correctly_rounded, 1.0 * cos_correctly_rounded / arg * 100);
    return EXIT_SUCCESS;
}

封闭的测试框架的输出基本上应该是这样的:

S8.23 fixed-point sincos NJ variant
arg=00000000  sin=00000000  cos=00800000
arg=00000001  sin=00000001  cos=00800000
arg=00000002  sin=00000002  cos=00800000
arg=00000003  sin=00000003  cos=00800000
arg=00000004  sin=00000004  cos=00800000
arg=00c90fd6  sin=00800000  cos=00000005
arg=00c90fd7  sin=00800000  cos=00000004
arg=00c90fd8  sin=00800000  cos=00000003
arg=00c90fd9  sin=00800000  cos=00000002
arg=00c90fda  sin=00800000  cos=00000001
arg=00c90fdb  sin=00800000  cos=00000000
max err sine = 9.86237533e-008  max err cosine = 1.02729891e-007
rms err sine = 4.11141973e-008  rms err cosine = 4.11752018e-008
sin correctly rounded: 10961278 (83.19%)  cos correctly rounded: 11070113 (84.01%)

fixed-point sincos OllyW variant
arg=00000000  sin=00000000  cos=00800000
arg=00000001  sin=00000001  cos=00800000
arg=00000002  sin=00000002  cos=00800000
arg=00000003  sin=00000003  cos=00800000
arg=00000004  sin=00000004  cos=00800000
arg=00c90fd6  sin=00800000  cos=00000005
arg=00c90fd7  sin=00800000  cos=00000004
arg=00c90fd8  sin=00800000  cos=00000003
arg=00c90fd9  sin=00800000  cos=00000002
arg=00c90fda  sin=00800000  cos=00000001
arg=00c90fdb  sin=00800000  cos=00000000
max err sine = 1.13173883e-007  max err cosine = 1.13158773e-007
rms err sine = 4.30955921e-008  rms err cosine = 4.31472191e-008
sin correctly rounded: 10844170 (82.30%)  cos correctly rounded: 10855609 (82.38%)