How do I download a file using Selenium's WebDriver?

software-testing automated-testing selenium-webdriver selenium2 python file
2022-01-26 23:20:09

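Basically, I want to at least check that a downloadable file exists and that its download link works, and preferably also get things like the file size.

Here's an example: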

link = self.browser.find_element_by_link_text('link text')
href = link.get_attribute('href')
download = self.browser.get(href)
print download

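The fourth line prints "None", presumably because I haven't manually clicked a "Save" button, and even if I had, I doubt WebDriver would be able to "see" the file.

Any ideas? I'm using Firefox as my browser under test, and I know that handling of downloaded files is to some extent browser- and/or OS-specific.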

4 Answers

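Here's a solution. Set Firefox's preferences to save automatically and not to pop up the download window. Then you just request the file and it downloads.

So, something like this: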

FirefoxProfile fxProfile = new FirefoxProfile();

fxProfile.setPreference("browser.download.folderList",2);
fxProfile.setPreference("browser.download.manager.showWhenStarting",false);
fxProfile.setPreference("browser.download.dir","c:\\mydownloads");
fxProfile.setPreference("browser.helperApps.neverAsk.saveToDisk","text/csv");

WebDriver driver = new FirefoxDriver(fxProfile);
driver.navigate().to("http://www.foo.com/bah.csv");

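Given that you now have a download directory, are never asked whether to save, and no download manager appears, automating from this point on should be straightforward.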
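Since the question uses the Python bindings, the same preferences can be sketched in Python. This is a minimal sketch that assumes Selenium 4's `Options.set_preference`; the helper names `firefox_download_prefs` and `start_firefox` are made up for illustration.

```python
def firefox_download_prefs(download_dir, mime_types=("text/csv",)):
    """Return the Firefox preferences from the Java example as a dict."""
    return {
        "browser.download.folderList": 2,  # 2 = use the custom directory below
        "browser.download.manager.showWhenStarting": False,
        "browser.download.dir": download_dir,
        # MIME types that are saved without asking, comma-separated
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


def start_firefox(download_dir):
    """Start Firefox with the download preferences applied (assumes Selenium 4)."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in firefox_download_prefs(download_dir).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

With this in place, `start_firefox("c:\\mydownloads").get("http://www.foo.com/bah.csv")` should save the CSV without a dialog, provided the server really sends the file as `text/csv`.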

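You can check the response headers to see whether you get a 200 OK (or perhaps a redirect, depending on the outcome you expect), which tells you that the file exists.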
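One way to do that header check from Python, without downloading the body, is a HEAD request. The sketch below uses only the standard library; `check_download_link` is a hypothetical helper, and the optional `cookies` argument is assumed to have the shape returned by Selenium's `driver.get_cookies()` (a list of dicts with `name` and `value` keys).

```python
import urllib.error
import urllib.request


def check_download_link(url, cookies=None, timeout=10):
    """Send a HEAD request and return (status_code, content_length_or_None)."""
    request = urllib.request.Request(url, method="HEAD")
    if cookies:
        # Simplification: every cookie is sent, regardless of its domain or path.
        request.add_header(
            "Cookie", "; ".join(f"{c['name']}={c['value']}" for c in cookies)
        )
    try:
        # urlopen follows redirects, so the status is that of the final response.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            length = response.headers.get("Content-Length")
            return response.status, (int(length) if length is not None else None)
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as exceptions; report the code instead.
        return error.code, None
```

For the question's use case this could be called as `check_download_link(link.get_attribute('href'), self.browser.get_cookies())`. The size is only available when the server sends a `Content-Length` header.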

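Here's my implementation.

It finds the link on the page and extracts the URL it points to. It then uses Apache Commons HttpClient to replicate the browser session Selenium is using and downloads the file. There are cases where it won't work (when the link found on the page doesn't actually point to the downloadable file but to a layer that prevents automated downloads).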

In general it works well and is cross-platform and cross-browser compatible.
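For readers on the Python bindings, the same idea (reuse the browser's cookies in a separate HTTP client, then fetch the file) can be sketched with the standard library alone. This is not a port of the Java class that follows; `download_with_session` is a hypothetical helper, and `cookies` is assumed to be the list of dicts returned by `driver.get_cookies()`.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlsplit


def download_with_session(url, cookies, download_dir, timeout=30):
    """Fetch url with the browser's cookies and save it under download_dir."""
    request = urllib.request.Request(url)
    if cookies:
        # Simplification: every cookie is sent, regardless of its domain or path.
        request.add_header(
            "Cookie", "; ".join(f"{c['name']}={c['value']}" for c in cookies)
        )
    # Name the local file after the last path segment of the URL.
    filename = os.path.basename(urlsplit(url).path) or "download"
    target = os.path.join(download_dir, filename)
    with urllib.request.urlopen(request, timeout=timeout) as response, \
            open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # stream in chunks, not all at once
    return target
```

A call such as `download_with_session(link.get_attribute('href'), driver.get_cookies(), tempfile.gettempdir())` returns the saved path, so `os.path.getsize` can then report the file size.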

The code is:

 /*
  * Copyright (c) 2010-2011 Ardesco Solutions - http://www.ardescosolutions.com
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
  * You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing, software
  * distributed under the License is distributed on an "AS IS" BASIS,
  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */

package com.lazerycode.ebselen.customhandlers;

import com.google.common.annotations.Beta;
import com.lazerycode.ebselen.EbselenCore;
import com.lazerycode.ebselen.handlers.FileHandler;
import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.cookie.CookiePolicy;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;

import java.io.*;
import java.net.URL;
import java.util.Set;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Beta
public class FileDownloader {

    private static final Logger LOGGER = LoggerFactory.getLogger(EbselenCore.class);
    private WebDriver driver;
    private String downloadPath = System.getProperty("java.io.tmpdir");

    public FileDownloader(WebDriver driverObject) {
        this.driver = driverObject;
    }

    /**
     * Get the current location that files will be downloaded to.
     *
     * @return The filepath that the file will be downloaded to.
     */
    public String getDownloadPath() {
        return this.downloadPath;
    }

    /**
     * Set the path that files will be downloaded to.
     *
     * @param filePath The filepath that the file will be downloaded to.
     */
    public void setDownloadPath(String filePath) {
        this.downloadPath = filePath;
    }


    /**
     * Load in all the cookies WebDriver currently knows about so that we can mimic the browser cookie state
     *
     * @param seleniumCookieSet
     * @return
     */
    private HttpState mimicCookieState(Set<org.openqa.selenium.Cookie> seleniumCookieSet) {
        HttpState mimicWebDriverCookieState = new HttpState();
        for (org.openqa.selenium.Cookie seleniumCookie : seleniumCookieSet) {
            Cookie httpClientCookie = new Cookie(seleniumCookie.getDomain(), seleniumCookie.getName(), seleniumCookie.getValue(), seleniumCookie.getPath(), seleniumCookie.getExpiry(), seleniumCookie.isSecure());
            mimicWebDriverCookieState.addCookie(httpClientCookie);
        }
        return mimicWebDriverCookieState;
    }

    /**
     * Mimic the WebDriver host configuration
     *
     * @param hostURL
     * @return
     */
    private HostConfiguration mimicHostConfiguration(String hostURL, int hostPort) {
        HostConfiguration hostConfig = new HostConfiguration();
        hostConfig.setHost(hostURL, hostPort);
        return hostConfig;
    }

    public String fileDownloader(WebElement element) throws Exception {
        return downloader(element, "href");
    }

    public String imageDownloader(WebElement element) throws Exception {
        return downloader(element, "src");
    }

    public String downloader(WebElement element, String attribute) throws Exception {
        //Assuming that getAttribute does some magic to return a fully qualified URL
        String downloadLocation = element.getAttribute(attribute);
        if (downloadLocation.trim().equals("")) {
            throw new Exception("The element you have specified does not link to anything!");
        }
        URL downloadURL = new URL(downloadLocation);
        HttpClient client = new HttpClient();
        client.getParams().setCookiePolicy(CookiePolicy.RFC_2965);
        client.setHostConfiguration(mimicHostConfiguration(downloadURL.getHost(), downloadURL.getPort()));
        client.setState(mimicCookieState(driver.manage().getCookies()));
        HttpMethod getRequest = new GetMethod(downloadURL.getPath());
        FileHandler downloadedFile = new FileHandler(downloadPath + downloadURL.getFile().replaceFirst("/|\\\\", ""), true);
        try {
            int status = client.executeMethod(getRequest);
            LOGGER.info("HTTP Status {} when getting '{}'", status, downloadURL.toExternalForm());
            BufferedInputStream in = new BufferedInputStream(getRequest.getResponseBodyAsStream());
            int offset = 0;
            int len = 4096;
            int bytes = 0;
            byte[] block = new byte[len];
            while ((bytes = in.read(block, offset, len)) > -1) {
                downloadedFile.getWritableFileOutputStream().write(block, 0, bytes);
            }
            downloadedFile.close();
            in.close();
            LOGGER.info("File downloaded to '{}'", downloadedFile.getAbsoluteFile());
        } catch (Exception Ex) {
            LOGGER.error("Download failed: {}", Ex);
            throw new Exception("Download failed!");
        } finally {
            getRequest.releaseConnection();
        }
        return downloadedFile.getAbsoluteFile();
    }
}

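As far as I know, there is no easy way to make Selenium download files, because browsers use native dialogs that can't be controlled by JavaScript, so you need some kind of "hack". Check this, hope it helps.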

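I made my own version of a downloader by issuing an Ajax request and returning the bytes. The advantage is that it uses the browser directly, so there is no need to handle authentication and cookies. The disadvantages are that you are bound by the same-origin policy, it may need a lot of memory, and it may fail in older browsers.

Still, it is sometimes very useful: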

import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.concurrent.TimeUnit;

public class AjaxFileDownloader {

    private WebDriver driver;

    public AjaxFileDownloader(WebDriver driverObject) {
        this.driver = driverObject;
        driver.manage().timeouts().setScriptTimeout(15, TimeUnit.SECONDS); // maybe you need a different timeout
    }

    public InputStream download(String url) throws IOException {
        String script = "var url = arguments[0];" +
                "var callback = arguments[arguments.length - 1];" +
                "var xhr = new XMLHttpRequest();" +
                "xhr.open('GET', url, true);" +
                "xhr.responseType = \"arraybuffer\";" + // ask the browser to expose the response body as an ArrayBuffer
                "xhr.onload = function() {" +
                "  var arrayBuffer = xhr.response;" +
                "  var byteArray = new Uint8Array(arrayBuffer);" +
                "  callback(byteArray);" +
                "};" +
                "xhr.send();";
        Object response = ((JavascriptExecutor) driver).executeAsyncScript(script, url);
        // Selenium returns an Array of Long, we need byte[]
        ArrayList<Long> byteList = (ArrayList<Long>) response;
        byte[] bytes = new byte[byteList.size()];
        for(int i = 0; i < byteList.size(); i++) {
            bytes[i] = (byte)(long)byteList.get(i);
        }
        return new ByteArrayInputStream(bytes);
    }

}
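The same Ajax approach can be sketched with the Python bindings. This is an illustration under assumptions rather than a tested port: `download_via_xhr` is a hypothetical name, the script converts the typed array with `Array.from` so the driver hands back a plain list of integers, and an `onerror` handler is added so that a failed request does not simply run into the script timeout.

```python
XHR_SCRIPT = """
var url = arguments[0];
var callback = arguments[arguments.length - 1];
var xhr = new XMLHttpRequest();
xhr.open('GET', url, true);
xhr.responseType = 'arraybuffer';
xhr.onload = function () {
  // Array.from turns the typed array into a plain array of numbers.
  callback(Array.from(new Uint8Array(xhr.response)));
};
xhr.onerror = function () { callback(null); };
xhr.send();
"""


def download_via_xhr(driver, url, script_timeout=15):
    """Download url inside the browser and return its content as bytes."""
    driver.set_script_timeout(script_timeout)  # seconds; adjust for large files
    values = driver.execute_async_script(XHR_SCRIPT, url)
    if values is None:
        raise IOError("XMLHttpRequest failed for %r" % url)
    # The driver returns a list of integers in the range 0..255.
    return bytes(values)
```

The limits described above still apply: the URL has to be same-origin with the page currently loaded, and the whole file is held in memory both in the browser and in Python.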