In a recent project I had to download and process a bunch of CSVs. Initially I had an extremely ugly exec() call to Linux's wget command, for reasons I won't go into, but obviously a better, PHP-based solution was required. I had previous experience with Guzzle and its pooled requests, so it was the obvious place to go.
Below is the script I ended up with. It takes an array of file URLs and downloads them all, three at a time, to a __DIR__.'/downloads/' directory (which needs to exist and be writable before you run it). Not the cleanest thing in the world, but it did the trick and you should be able to adapt it as you need.
```php
<?php
ini_set('display_errors', true);

require __DIR__.'/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Response;
use GuzzleHttp\RedirectMiddleware;

$files = array(
    'http://mysite.com/1.csv',
    'http://mysite.com/2.csv',
    'http://mysite.com/3.csv',
    'http://mysite.com/4.csv',
);

// Track redirects so our Pool's fulfilled closure knows which URL the current
// download is for.
$client = new Client([
    'allow_redirects' => ['track_redirects' => true]
]);

// Generator that yields one closure per file; the Pool calls each closure
// with its own options and expects a promise back.
$requests = function() use ($client, $files) {
    foreach ( $files as $file ) {
        yield function($poolOpts) use ($client, $file) {
            $reqOpts = array_merge($poolOpts, [
                // Sink option specifies the download path for this file
                'sink' => __DIR__.'/downloads/' . basename($file)
            ]);

            return $client->getAsync($file, $reqOpts);
        };
    }
};

$pool = new Pool($client, $requests(), [
    'concurrency' => 3,
    'fulfilled' => function(Response $response, $index) use ($files) {
        // Grab the URLs this file redirected through to download in chronological order.
        $urls = $response->getHeader(RedirectMiddleware::HISTORY_HEADER);
        // Without a redirect there is no history header, so fall back to the original URL.
        echo "Downloaded ", ($urls ? end($urls) : $files[$index]), "<br/>\n";
    },
    'rejected' => function(Exception $reason, $index) use ($files) {
        // $index is the generator key, which matches the position in $files.
        echo "Failed to download ", $files[$index], ": ", $reason->getMessage(), "<br/>\n";
    },
]);

$pool->promise()->wait();
echo "Finished downloading.<br/>";
```
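One thing to watch when adapting this: the sink path is built from basename($file), which keeps any query string in the file name and assumes the downloads directory already exists. Below is a minimal sketch of how that could be tightened up. The helper name sinkPathFor() and the hash-based fallback name are my own assumptions for illustration, not part of the script above or of Guzzle.

```php
<?php
// Hypothetical helper (not part of the original script): derive a local
// file path for a URL, ignoring any query string or fragment.
function sinkPathFor(string $url, string $dir): string
{
    // parse_url() returns null or false when there is no usable path.
    $path = parse_url($url, PHP_URL_PATH);
    $name = is_string($path) ? basename($path) : '';

    if ($name === '') {
        // Assumption: fall back to a hash-based name when the URL has no file name.
        $name = md5($url) . '.csv';
    }

    return rtrim($dir, '/') . '/' . $name;
}

// The sink option writes to this path, so make sure the directory exists first.
$dir = __DIR__ . '/downloads';
if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
    throw new RuntimeException("Could not create $dir");
}

// Prints a path ending in /downloads/1.csv, without the query string.
echo sinkPathFor('http://mysite.com/1.csv?token=abc', $dir), "\n";
```

To use it in the script, you would swap the 'sink' value for sinkPathFor($file, __DIR__.'/downloads'). Note that two URLs with the same file name would still overwrite each other, so add your own disambiguation if that can happen with your data.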