仅限 Puppeteer 的解决方案
这可以单独使用 puppeteer 来完成。您描述的问题response.buffer
是在导航时被清除,可以通过一个接一个地处理每个请求来规避。
怎么运行的
下面的代码page.setRequestInterception
用于拦截所有请求。如果当前有正在处理/正在等待的请求,则将新请求放入队列中。然后,response.buffer()
可以在没有其他请求可能异步擦除缓冲区的问题的情况下使用,因为没有并行请求。一旦处理了当前处理的请求/响应,就会处理下一个请求。
代码
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const [page] = await browser.pages();
const results = []; // collects all results
let paused = false;
let pausedRequests = [];
const nextRequest = () => { // continue the next request or "unpause"
if (pausedRequests.length === 0) {
paused = false;
} else {
// continue first request in "queue"
(pausedRequests.shift())(); // calls the request.continue function
}
};
await page.setRequestInterception(true);
page.on('request', request => {
if (paused) {
pausedRequests.push(() => request.continue());
} else {
paused = true; // pause, as we are processing a request now
request.continue();
}
});
page.on('requestfinished', async (request) => {
const response = await request.response();
const responseHeaders = response.headers();
let responseBody;
if (request.redirectChain().length === 0) {
// body can only be access for non-redirect responses
responseBody = await response.buffer();
}
const information = {
url: request.url(),
requestHeaders: request.headers(),
requestPostData: request.postData(),
responseHeaders: responseHeaders,
responseSize: responseHeaders['content-length'],
responseBody,
};
results.push(information);
nextRequest(); // continue with next request
});
page.on('requestfailed', (request) => {
// handle failed request
nextRequest();
});
await page.goto('...', { waitUntil: 'networkidle0' });
console.log(results);
await browser.close();
})();