
Downloading Multiple Files Simultaneously with Guzzle

Posted in PHP

In a recent project I had to download and process a bunch of CSVs. Initially I had an extremely ugly exec() call to Linux’s wget command, for reasons I won’t go into, but a better, PHP-based solution was obviously required. I had previous experience with Guzzle and its pooled requests, so it was the obvious place to go.

Below is the script I ended up with. It takes an array of file URLs and downloads them, three at a time, into a downloads/ directory next to the script (__DIR__.'/downloads/'). It’s not the cleanest thing in the world, but it did the trick, and you should be able to adapt it as you need.

<?php

ini_set('display_errors', '1');
require __DIR__.'/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\RedirectMiddleware;
use Psr\Http\Message\ResponseInterface;

$files = array(
    'http://mysite.com/1.csv',
    'http://mysite.com/2.csv',
    'http://mysite.com/3.csv',
    'http://mysite.com/4.csv',
);

// Guzzle's sink option won't create missing directories, so make sure it exists.
$downloadDir = __DIR__.'/downloads/';
if (!is_dir($downloadDir)) {
    mkdir($downloadDir, 0755, true);
}

// Track redirects so the fulfilled closure can report the final URL a file was
// served from. The history header is only present when a redirect happened.
$client = new Client([
    'allow_redirects' => ['track_redirects' => true]
]);

// Yield one closure per file. The Pool calls each closure with its request
// options and passes the yielded key to the fulfilled/rejected callbacks as
// $index, so $files[$index] is always the URL that request was made for.
$requests = function () use ($client, $files, $downloadDir) {
    foreach ($files as $index => $file) {
        yield $index => function (array $poolOpts) use ($client, $file, $downloadDir) {
            $reqOpts = array_merge($poolOpts, [
                // Sink option specifies the download path for this file
                'sink' => $downloadDir . basename($file)
            ]);

            return $client->getAsync($file, $reqOpts);
        };
    }
};

$pool = new Pool($client, $requests(), [
    'concurrency' => 3,
    'fulfilled' => function (ResponseInterface $response, $index) use ($files) {
        // URLs this file redirected through, in chronological order (empty if none).
        $urls = $response->getHeader(RedirectMiddleware::HISTORY_HEADER);
        $finalUrl = $urls ? end($urls) : $files[$index];

        echo "Downloaded ", $finalUrl, "<br/>\n";
    },
    'rejected' => function ($reason, $index) use ($files) {
        // $reason is normally a Guzzle exception, but a promise can reject with anything.
        $message = $reason instanceof \Throwable ? $reason->getMessage() : 'unknown error';

        echo "Failed to download ", $files[$index], ": ", $message, "<br/>\n";
    },
]);

$pool->promise()->wait();

echo "Finished downloading.<br/>";
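If you adapt the script, two details are worth hardening. First, basename($file) on a raw URL keeps any query string, so http://mysite.com/1.csv?token=abc would be saved as "1.csv?token=abc". Second, the $index that Pool hands to the fulfilled and rejected callbacks is the key of the item yielded by the request generator, so yielding with explicit keys keeps it in step with $files even when the array has string keys. The sketch below is plain PHP (no Guzzle needed) and only illustrates those two ideas; the function names sinkPathFor() and keyedItems(), the example URLs and the "download-<md5>" fallback filename are hypothetical choices, not part of Guzzle.

```php
<?php

// Hypothetical helper, not part of Guzzle: build a sink path from a URL using
// only its path component, so query strings and fragments are dropped.
function sinkPathFor(string $url, string $dir): string
{
    $path = parse_url($url, PHP_URL_PATH);
    $name = is_string($path) ? basename($path) : '';

    if ($name === '') {
        // URL has no usable filename (e.g. "http://mysite.com/"):
        // fall back to a hypothetical hash-based name.
        $name = 'download-' . md5($url);
    }

    return rtrim($dir, '/') . '/' . $name;
}

// Hypothetical generator: yield with the array's own keys, which is what the
// Pool would later pass to the callbacks as $index.
function keyedItems(array $files): Generator
{
    foreach ($files as $key => $url) {
        yield $key => $url;
    }
}

$files = [
    'first'  => 'http://mysite.com/1.csv?token=abc',
    'second' => 'http://mysite.com/',
];

foreach (keyedItems($files) as $index => $url) {
    echo $index, ' => ', sinkPathFor($url, '/tmp/downloads'), "\n";
}
```

In the main script you would use sinkPathFor($file, __DIR__.'/downloads/') as the sink value in place of the basename() concatenation, and yield $index => function (...) so the callback's $index always points back into $files.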