How do I convert a uint8 array to a base64-encoded string?

Tags: javascript arrays base64

2021-02-10 13:14:49

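I have a WebSocket connection over which I receive base64-encoded strings. I convert them to uint8 and work on them, but now I need to send data back: I have a uint8 array and need to convert it to a base64 string so that I can send it. How can I do this conversion?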

6 Answers

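If your data may contain multi-byte sequences (not a plain ASCII sequence) and your browser has TextDecoder, then you should use it to decode your data (specify the required encoding for the TextDecoder):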

var u8 = new Uint8Array([65, 66, 67, 68]);
var decoder = new TextDecoder('utf8');
var b64encoded = btoa(decoder.decode(u8));

If you need to support browsers that do not have TextDecoder (currently only IE and Edge), the best option is to use a TextDecoder polyfill.

If your data contains plain ASCII (not multi-byte Unicode/UTF-8), there is a simple alternative using String.fromCharCode that should be fairly universally supported:

var ascii = new Uint8Array([65, 66, 67, 68]);
var b64encoded = btoa(String.fromCharCode.apply(null, ascii));

And to decode the base64 string back to a Uint8Array:

var u8_2 = new Uint8Array(atob(b64encoded).split("").map(function(c) {
    return c.charCodeAt(0); }));

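If you have very large array buffers, the apply call may fail and you may need to chunk the buffer (based on the one posted by @RohitSengar). Again, note that this is only correct if your buffer contains only non-multi-byte ASCII characters: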

function Uint8ToString(u8a){
  var CHUNK_SZ = 0x8000;
  var c = [];
  for (var i=0; i < u8a.length; i+=CHUNK_SZ) {
    c.push(String.fromCharCode.apply(null, u8a.subarray(i, i+CHUNK_SZ)));
  }
  return c.join("");
}
// Usage
var u8 = new Uint8Array([65, 66, 67, 68]);
var b64encoded = btoa(Uint8ToString(u8));

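base64 string or Uint8Array. Using TextDecoder here is definitely wrong: if your Uint8Array has bytes in the range 128..255, the text decoder will wrongly convert them into Unicode characters, which breaks the base64 converter.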
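To illustrate the comment above, here is a minimal sketch assuming an environment where btoa and TextDecoder are both available as globals (modern browsers, recent Node.js). The sample bytes are made up for the demonstration. It compares the per-byte String.fromCharCode route with the TextDecoder route on bytes outside the ASCII range.

```javascript
// Sample binary data containing bytes above 127 (illustrative values).
const bytes = new Uint8Array([0, 127, 128, 255]);

// Route 1: map each byte to one character code, then base64-encode.
const viaCharCodes = btoa(String.fromCharCode.apply(null, bytes));

// Route 2: decode as UTF-8 first. The lone bytes 128 and 255 are not valid
// UTF-8, so the decoder substitutes U+FFFD, which btoa cannot encode.
const decoded = new TextDecoder('utf-8').decode(bytes);
let textDecoderError = null;
try {
  btoa(decoded);
} catch (e) {
  textDecoderError = e;
}

console.log(viaCharCodes);              // "AH+A/w=="
console.log(textDecoderError !== null); // true
```

Even when the TextDecoder route does not throw, the decoded string no longer corresponds one-to-one to the original bytes, so the base64 output would describe different data.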

2021-03-17 13:14:49

This will not work if the byte array is not valid Unicode.

2021-04-01 13:14:49

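@MichaelPaulukonis My guess is that it is actually String.fromCharCode.apply that causes the stack size to be exceeded. If you have a very large Uint8Array, you will probably need to build the string iteratively instead of using apply. The apply() call passes every element of the array as an argument to fromCharCode, so if the array is 128000 bytes long, you are attempting a function call with 128000 arguments, which is likely to blow the stack.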
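A minimal sketch of the iterative approach described in this comment. The helper name uint8ToBinaryString is hypothetical, and the 500000-byte test array is just an example size.

```javascript
// Hypothetical helper: builds the binary string one byte at a time, so
// String.fromCharCode is never called with more than one argument.
function uint8ToBinaryString(u8a) {
  let s = '';
  for (let i = 0; i < u8a.length; i++) {
    s += String.fromCharCode(u8a[i]);
  }
  return s;
}

// A buffer large enough that a single apply() call could hit engine limits.
const big = new Uint8Array(500000).fill(65);
const b64Big = btoa(uint8ToBinaryString(big));

console.log(b64Big.length); // 666668
```

This trades the argument-count limit for many small string concatenations. Whether it is faster or slower than chunking depends on the engine.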

2021-04-03 13:14:49

Thanks. All I needed was btoa(String.fromCharCode.apply(null, myArray))

2021-04-03 13:14:49

This works for me in Firefox, but Chrome chokes with "Uncaught RangeError: Maximum call stack size exceeded" (while doing the btoa).

2021-04-11 13:14:49

If you are using Node.js, you can use this code to convert a Uint8Array to base64:

var b64 = Buffer.from(u8).toString('base64');
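For completeness, a short sketch of the full round trip in Node.js, using the same sample bytes as the earlier answers. Buffer is specific to Node.js, so this does not apply to browsers without a polyfill.

```javascript
// Encode: Uint8Array -> base64 string.
const u8 = new Uint8Array([65, 66, 67, 68]);
const b64 = Buffer.from(u8).toString('base64');

// Decode: base64 string -> Uint8Array (copies the decoded bytes).
const back = new Uint8Array(Buffer.from(b64, 'base64'));

console.log(b64);  // "QUJDRA=="
console.log(back); // Uint8Array [65, 66, 67, 68]
```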

Amazing! Thanks. Best answer.

2021-03-14 13:14:49

In terms of performance, this is a better answer than the hand-rolled functions above.

2021-04-02 13:14:49

An excellent solution: short, precise, and efficient.

2021-04-04 13:14:49

A very simple JavaScript solution, with a test!

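// Declared with var so the assignments also work in strict mode.
var ToBase64 = function (u8) {
    return btoa(String.fromCharCode.apply(null, u8));
};

// Note: returns a plain Array of byte values, not a Uint8Array.
var FromBase64 = function (str) {
    return atob(str).split('').map(function (c) { return c.charCodeAt(0); });
};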

var u8 = new Uint8Array(256);
for (var i = 0; i < 256; i++)
    u8[i] = i;

var b64 = ToBase64(u8);
console.debug(b64);
console.debug(FromBase64(b64));

It fails on large data (e.g. images) with RangeError: Maximum call stack size exceeded

2021-03-14 13:14:49

The cleanest solution!

2021-03-16 13:14:49

I get the error "InvalidCharacterError: String contains an invalid character." when trying to use the FromBase64 function on the string eJyLjjYy1CEXxeqM6h5Wui1giFzdhngMIEo3BS4fomE qpsWumMB4VPulQ==

2021-04-03 13:14:49

It also makes TypeScript unhappy, but it seems to work.

2021-04-05 13:14:49

Perfect solution.

2021-04-09 13:14:49

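All the solutions proposed so far have serious problems. Some fail on large arrays, some produce wrong output, some throw an error on the btoa call if the intermediate string contains multi-byte characters, and some consume more memory than needed.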

So I implemented a direct conversion function that works regardless of the input. It converts about 5 million bytes per second on my machine.

https://gist.github.com/enepomnyaschih/72c423f727d395eeaa09697058238727

/*
MIT License
Copyright (c) 2020 Egor Nepomnyaschih
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
*/

/*
// This constant can also be computed with the following algorithm:
const base64abc = [],
	A = "A".charCodeAt(0),
	a = "a".charCodeAt(0),
	n = "0".charCodeAt(0);
for (let i = 0; i < 26; ++i) {
	base64abc.push(String.fromCharCode(A + i));
}
for (let i = 0; i < 26; ++i) {
	base64abc.push(String.fromCharCode(a + i));
}
for (let i = 0; i < 10; ++i) {
	base64abc.push(String.fromCharCode(n + i));
}
base64abc.push("+");
base64abc.push("/");
*/
const base64abc = [
	"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
	"N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z",
	"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
	"n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
	"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "+", "/"
];

/*
// This constant can also be computed with the following algorithm:
const l = 256, base64codes = new Uint8Array(l);
for (let i = 0; i < l; ++i) {
	base64codes[i] = 255; // invalid character
}
base64abc.forEach((char, index) => {
	base64codes[char.charCodeAt(0)] = index;
});
base64codes["=".charCodeAt(0)] = 0; // ignored anyway, so we just need to prevent an error
*/
const base64codes = [
	255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
	255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
	255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 62, 255, 255, 255, 63,
	52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 255, 255, 255, 0, 255, 255,
	255, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
	15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 255, 255, 255, 255, 255,
	255, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
	41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51
];

function getBase64Code(charCode) {
	if (charCode >= base64codes.length) {
		throw new Error("Unable to parse base64 string.");
	}
	const code = base64codes[charCode];
	if (code === 255) {
		throw new Error("Unable to parse base64 string.");
	}
	return code;
}

export function bytesToBase64(bytes) {
	let result = '', i, l = bytes.length;
	for (i = 2; i < l; i += 3) {
		result += base64abc[bytes[i - 2] >> 2];
		result += base64abc[((bytes[i - 2] & 0x03) << 4) | (bytes[i - 1] >> 4)];
		result += base64abc[((bytes[i - 1] & 0x0F) << 2) | (bytes[i] >> 6)];
		result += base64abc[bytes[i] & 0x3F];
	}
	if (i === l + 1) { // 1 octet yet to write
		result += base64abc[bytes[i - 2] >> 2];
		result += base64abc[(bytes[i - 2] & 0x03) << 4];
		result += "==";
	}
	if (i === l) { // 2 octets yet to write
		result += base64abc[bytes[i - 2] >> 2];
		result += base64abc[((bytes[i - 2] & 0x03) << 4) | (bytes[i - 1] >> 4)];
		result += base64abc[(bytes[i - 1] & 0x0F) << 2];
		result += "=";
	}
	return result;
}

export function base64ToBytes(str) {
	if (str.length % 4 !== 0) {
		throw new Error("Unable to parse base64 string.");
	}
	const index = str.indexOf("=");
	if (index !== -1 && index < str.length - 2) {
		throw new Error("Unable to parse base64 string.");
	}
	let missingOctets = str.endsWith("==") ? 2 : str.endsWith("=") ? 1 : 0,
		n = str.length,
		result = new Uint8Array(3 * (n / 4)),
		buffer;
	for (let i = 0, j = 0; i < n; i += 4, j += 3) {
		buffer =
			getBase64Code(str.charCodeAt(i)) << 18 |
			getBase64Code(str.charCodeAt(i + 1)) << 12 |
			getBase64Code(str.charCodeAt(i + 2)) << 6 |
			getBase64Code(str.charCodeAt(i + 3));
		result[j] = buffer >> 16;
		result[j + 1] = (buffer >> 8) & 0xFF;
		result[j + 2] = buffer & 0xFF;
	}
	return result.subarray(0, result.length - missingOctets);
}

export function base64encode(str, encoder = new TextEncoder()) {
	return bytesToBase64(encoder.encode(str));
}

export function base64decode(str, decoder = new TextDecoder()) {
	return decoder.decode(base64ToBytes(str));
}
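As a worked example of the bit arithmetic inside bytesToBase64 above, the following standalone sketch encodes a single group of three bytes into four characters. The names alphabet and encodeTriple are hypothetical and not part of the gist. The alphabet is written as a string here purely for brevity.

```javascript
// Standard base64 alphabet (index 0..63 -> character).
const alphabet =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: splits 3 x 8 bits into 4 x 6 bits and looks each
// 6-bit value up in the alphabet.
function encodeTriple(b0, b1, b2) {
  return alphabet[b0 >> 2] +                       // top 6 bits of b0
    alphabet[((b0 & 0x03) << 4) | (b1 >> 4)] +     // low 2 of b0, top 4 of b1
    alphabet[((b1 & 0x0F) << 2) | (b2 >> 6)] +     // low 4 of b1, top 2 of b2
    alphabet[b2 & 0x3F];                           // low 6 bits of b2
}

console.log(encodeTriple(65, 66, 67)); // "QUJD" (the bytes of "ABC")
```

When the input length is not a multiple of three, the missing bits are treated as zero and "=" padding is appended, which is what the two trailing if branches in bytesToBase64 handle.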

Is it faster to have base64abc as an array of strings than to make it a string, "ABCDEFG..."?

2021-03-13 13:14:49

This is beautiful.

2021-03-14 13:14:49

I tried to use it in a Word Web Add-in with Edge and got the error "TextDecoder is not defined". Fortunately, I only needed the bytesToBase64 function and could remove the dependency.

2021-03-16 13:14:49

function Uint8ToBase64(u8Arr){
  var CHUNK_SIZE = 0x8000; //arbitrary number
  var index = 0;
  var length = u8Arr.length;
  var result = '';
  var slice;
  while (index < length) {
    slice = u8Arr.subarray(index, Math.min(index + CHUNK_SIZE, length)); 
    result += String.fromCharCode.apply(null, slice);
    index += CHUNK_SIZE;
  }
  return btoa(result);
}

If you have a very large Uint8Array, you can use this function. It is for JavaScript and is useful with FileReader readAsArrayBuffer.

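This isn't safe, is it? If a chunk boundary cuts through a multi-byte UTF-8 encoded character, fromCharCode() won't be able to create sensible characters from the bytes on either side of the boundary, will it?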

2021-03-23 13:14:49

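@Jens What multi-byte UTF-8 encoded characters in a binary data array? We are not dealing with Unicode strings here but with arbitrary binary data, which should not be treated as UTF-8 code points.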
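A small sketch backing up this reply: because each byte is mapped to exactly one character code, where the chunk boundaries fall does not affect the result. The function toBase64Chunked is a hypothetical variant of the answer's function with a configurable chunk size, and the sample bytes are chosen to include a UTF-8 multi-byte sequence.

```javascript
// Hypothetical variant of the chunked converter with a configurable chunk size.
function toBase64Chunked(u8, chunkSize) {
  let binary = '';
  for (let i = 0; i < u8.length; i += chunkSize) {
    // subarray clamps the end index, so the last chunk may be shorter.
    binary += String.fromCharCode.apply(null, u8.subarray(i, i + chunkSize));
  }
  return btoa(binary);
}

// 0xE2 0x82 0xAC is the UTF-8 encoding of the euro sign.
const data = new Uint8Array([0x41, 0xE2, 0x82, 0xAC, 0x42]);
const splitInside = toBase64Chunked(data, 2);   // boundaries cut the sequence
const singleChunk = toBase64Chunked(data, 0x8000);

console.log(splitInside === singleChunk); // true
```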

2021-03-30 13:14:49

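@Jens The String.fromCharCode.apply() method cannot reproduce UTF-8: UTF-8 characters vary in length from one to four bytes, yet String.fromCharCode.apply() examines a UInt8Array one UInt8 at a time, so it wrongly assumes that each character is exactly one byte long and independent of its neighbours. If the characters encoded in the input UInt8Array all happen to be in the ASCII (single-byte) range, it will work by chance, but it cannot reproduce full UTF-8. For that you need TextDecoder or a similar algorithm.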
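The difference this comment describes can be seen with a single non-ASCII character. This is a minimal sketch assuming TextEncoder and TextDecoder are available as globals. The sample character is arbitrary.

```javascript
// "é" is encoded in UTF-8 as the two bytes 0xC3 0xA9.
const utf8Bytes = new TextEncoder().encode('\u00E9');

// Per-byte conversion yields two characters, U+00C3 and U+00A9 ("Ã©").
const perByte = String.fromCharCode.apply(null, utf8Bytes);

// TextDecoder interprets the two bytes together as one character.
const decodedText = new TextDecoder('utf-8').decode(utf8Bytes);

console.log(perByte.length);     // 2
console.log(decodedText.length); // 1
console.log(btoa(perByte));      // "w6k=" (base64 of the raw bytes C3 A9)
```

So the per-byte string is the wrong tool for recovering text, but it is exactly what btoa needs when the goal is to base64-encode the raw bytes.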

2021-04-08 13:14:49

Interestingly, in Chrome I timed this on a 300 KB+ buffer and found that doing it in chunks like you do is a bit slower than doing it byte by byte. That surprised me.
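For anyone who wants to repeat this kind of measurement, here is a rough timing sketch. It assumes a global performance.now() (browsers, recent Node.js), and the helper names byteByByte, chunked and timeIt are hypothetical. A single run like this is noisy and results vary by engine and version, so treat the numbers as indicative only.

```javascript
// Builds the binary string one character at a time.
function byteByByte(u8) {
  let s = '';
  for (let i = 0; i < u8.length; i++) s += String.fromCharCode(u8[i]);
  return btoa(s);
}

// Builds the binary string in 0x8000-byte chunks via apply().
function chunked(u8) {
  let s = '';
  for (let i = 0; i < u8.length; i += 0x8000) {
    s += String.fromCharCode.apply(null, u8.subarray(i, i + 0x8000));
  }
  return btoa(s);
}

// Runs fn once and reports its output and elapsed milliseconds.
function timeIt(fn, input) {
  const start = performance.now();
  const out = fn(input);
  return { out: out, ms: performance.now() - start };
}

// Roughly 300 KB of sample data covering all byte values.
const buf = new Uint8Array(300 * 1024);
for (let i = 0; i < buf.length; i++) buf[i] = i & 0xFF;

const r1 = timeIt(byteByByte, buf);
const r2 = timeIt(chunked, buf);
console.log('byte by byte: ' + r1.ms + ' ms, chunked: ' + r2.ms + ' ms');
console.log(r1.out === r2.out); // true: both produce the same base64
```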

2021-04-09 13:14:49

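@Matt Interesting. In the meantime, Chrome may have started detecting this kind of conversion and optimizing it specifically, and chunking the data may reduce the efficiency of that optimization.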

2021-04-11 13:14:49
