Posted in PHP

If you need to make multiple cURL requests with PHP, curl_multi is a great way to do it. The problem with curl_multi as it is typically used is that it waits for all requests to complete before it begins processing any of the responses. For example, if you’re making 100 requests and just one is slow, the other 99 responses sit unprocessed until the slow one comes back. This is wasteful, so I’ve taken a script originally written by Josh Fraser and added my own modifications and improvements. It processes each response as soon as it arrives and then immediately dispatches the next request.
 

Usage:

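The $callback (second) argument of curl_multi_download() can take any callback supported by PHP. It is called with two arguments: the curl_getinfo() array for the finished request and the response body.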
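As a rough illustration of what "any callback supported by PHP" covers, the sketch below invokes four callable forms the same way the downloader does, via call_user_func_array(). The handler names (log_response, ResponseHandler) and the sample $info and $output values are made up for the example; they stand in for real curl_getinfo() and curl_multi_getcontent() results.

```php
<?php
// Hypothetical handlers: each accepts the ($info, $output) pair
// that the downloader passes to its callback.
function log_response(array $info, $output)
{
	return "function: {$info['url']} (" . strlen($output) . " bytes)";
}

class ResponseHandler
{
	public function handle(array $info, $output)
	{
		return "method: {$info['url']}";
	}

	public static function handleStatic(array $info, $output)
	{
		return "static: {$info['url']}";
	}
}

$callbacks = array(
	'log_response',                          // plain function name
	array(new ResponseHandler(), 'handle'),  // object method
	'ResponseHandler::handleStatic',         // static method
	function (array $info, $output) {        // closure
		return "closure: {$info['url']}";
	},
);

// Stand-in data shaped like curl_getinfo() / curl_multi_getcontent() output.
$info = array('url' => 'http://example.com/', 'http_code' => 200);
$output = '<html></html>';

$results = array();
foreach ($callbacks as $callback) {
	$results[] = call_user_func_array($callback, array($info, $output));
}
```

Any of the four entries in $callbacks could be passed as the second argument.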

function curl_multi_download(array $urls, callable $callback, array $custom_options = array())
{
	// re-index so the numeric offsets used below are always valid
	$urls = array_values($urls);

	// make sure the rolling window isn't greater than the # of urls
	$rolling_window = min(5, count($urls));

	$master = curl_multi_init();

	// custom options take precedence over the defaults
	$options = $custom_options + array(
		CURLOPT_RETURNTRANSFER => true,
		CURLOPT_FOLLOWLOCATION => true,
		CURLOPT_MAXREDIRS => 5,
	);

	// start the first batch of requests
	for ( $i = 0; $i < $rolling_window; $i++ )
	{
		$ch = curl_init();
		$options[CURLOPT_URL] = $urls[$i];
		curl_setopt_array($ch, $options);
		curl_multi_add_handle($master, $ch);
	}

	do
	{
		while(($execrun = curl_multi_exec($master, $running)) == CURLM_CALL_MULTI_PERFORM);
		if($execrun != CURLM_OK)
			break;

		// process every request that has completed so far
		while( $done = curl_multi_info_read($master) )
		{
			$info = curl_getinfo($done['handle']);

			// hand the response to the callback. this does not check for
			// failure; $done['result'] holds the cURL error code if needed.
			$output = curl_multi_getcontent($done['handle']);
			call_user_func_array($callback, array($info, $output));

			// $i is the index of the next url that has not been dispatched yet
			if ( isset($urls[$i]) )
			{
				// start a new request (it's important to do this before removing the old one)
				$ch = curl_init();
				$options[CURLOPT_URL] = $urls[$i++];  // increment i
				curl_setopt_array($ch, $options);
				curl_multi_add_handle($master, $ch);

				// a transfer is pending again, so keep the outer loop alive
				$running = true;
			}

			// remove the curl handle that just completed
			curl_multi_remove_handle($master, $done['handle']);
		}

		// wait for network activity instead of spinning the CPU
		if ($running)
			curl_multi_select($master, 1.0);
	} while ($running);

	curl_multi_close($master);
	return true;
}
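To see why the rolling window helps, here is a small simulation with no network access at all. It is only a sketch: simulate_rolling_window() is a hypothetical helper, the durations are invented numbers in arbitrary time units, and it models nothing more than the scheduling order (start a new job the moment one finishes).

```php
<?php
// Hypothetical simulation of rolling-window scheduling.
// $durations: how long each job takes; $window: max jobs in flight.
function simulate_rolling_window(array $durations, $window)
{
	$durations = array_values($durations);
	$active = array();  // job index => time at which it finishes
	$next = 0;          // index of the next job to dispatch
	$clock = 0;         // simulated current time
	$order = array();   // job indexes in completion order

	// fill the initial window
	while ($next < count($durations) && count($active) < $window) {
		$active[$next] = $clock + $durations[$next];
		$next++;
	}

	while ($active) {
		// advance the clock to the earliest finish time and complete that job
		$clock = min($active);
		$job = array_search($clock, $active, true);
		unset($active[$job]);
		$order[] = $job;

		// immediately dispatch the next job into the freed slot
		if ($next < count($durations)) {
			$active[$next] = $clock + $durations[$next];
			$next++;
		}
	}

	return array('order' => $order, 'total' => $clock);
}

// One slow job (5 units) followed by four fast ones, two in flight at a time.
$result = simulate_rolling_window(array(5, 1, 1, 1, 1), 2);
```

In this run the four fast jobs finish one after another while the slow one is still in flight, and everything is done after 5 time units. Running the same jobs in fixed batches of two would take 5 + 1 + 1 = 7 units, because the first batch cannot end before the slow job does.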

 

Bonus Round

Here’s a curl_multi_getcontent_utf8() function, which grabs the response from a single handle just like curl_multi_getcontent() and converts it to UTF-8. (For extra points, use the same method to wrap curl_exec() in a curl_exec_utf8().)

function curl_multi_getcontent_utf8( $ch )
{
	$data = curl_multi_getcontent( $ch );
	if ( !is_string($data) )
		return $data;

	$charset = null;
	$content_type = (string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE);

	/* 1: HTTP Content-Type: header */
	preg_match( '@([\w/+]+)(;\s*charset=(\S+))?@i', $content_type, $matches );
	if ( isset( $matches[3] ) )
		$charset = trim($matches[3], '"\''); // the value may be quoted

	/* 2: <meta> element in the page */
	if ( !isset($charset) )
	{
		preg_match( '@<meta\s+http-equiv="Content-Type"\s+content="([\w/]+)(;\s*charset=([^\s"]+))?@i', $data, $matches );
		if ( isset( $matches[3] ) )
			$charset = $matches[3];
	}

	/* 3: XML declaration in the page (lazy match, so the first encoding="..." wins) */
	if ( !isset($charset) )
	{
		preg_match( '@<\?xml.+?encoding="([^\s"]+)@si', $data, $matches );
		if ( isset( $matches[1] ) )
			$charset = $matches[1];
	}

	/* 4: PHP's heuristic detection */
	if ( !isset($charset) )
	{
		$encoding = mb_detect_encoding($data);
		if ($encoding)
			$charset = $encoding;
	}

	/* 5: Default for HTML */
	if ( !isset($charset) )
	{
		// stripos() returns 0 when the content type starts with text/html
		if (stripos($content_type, "text/html") === 0)
			$charset = "ISO-8859-1";
	}

	/* Convert it if it is anything but UTF-8 */
	/* You can change "UTF-8"  to "UTF-8//IGNORE" to
	   ignore conversion errors and still output something reasonable */
	if ( isset($charset) && strtoupper($charset) != "UTF-8" )
	{
		$converted = iconv($charset, 'UTF-8', $data);
		// keep the original bytes if the conversion fails
		if ( $converted !== false )
			$data = $converted;
	}

	return $data;
}
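Step 1 and the final conversion can be tried in isolation, without making a request. The sketch below assumes the iconv extension is available; charset_from_content_type() is a hypothetical helper that reuses the step 1 pattern, and the header values and sample bytes are invented for the example.

```php
<?php
// Hypothetical helper: extracts the charset from a Content-Type value
// using the same pattern as step 1 of curl_multi_getcontent_utf8().
function charset_from_content_type($content_type)
{
	if (preg_match('@([\w/+]+)(;\s*charset=(\S+))?@i', $content_type, $matches)
		&& isset($matches[3])) {
		// strip optional quotes around the value
		return trim($matches[3], '"\'');
	}
	return null;
}

$with_charset = charset_from_content_type('text/html; charset=ISO-8859-1');
$without_charset = charset_from_content_type('text/html');

// "café" encoded as ISO-8859-1: the é is the single byte 0xE9.
$latin1 = "caf\xE9";

// After conversion the é becomes the two-byte UTF-8 sequence 0xC3 0xA9.
$utf8 = iconv($with_charset, 'UTF-8', $latin1);
```

When the header carries no charset, the helper returns null, which is the case where the function above falls through to the <meta>, XML, and heuristic checks.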


Posted in PHP

If you’ve ever come across the infuriating error

htmlspecialchars(): Invalid multibyte sequence in argument

I have a simple solution for you: Turn display_errors on in your php.ini file!

It turns out there’s a weird bug, one that doesn’t appear to be getting fixed any time soon, that causes htmlspecialchars() to raise this error only when display_errors is set to Off.

See this post for further details, and a very big thank you to Andy Young for writing it and saving me (and I’m sure many others) a lot of time!
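If you want to see what kind of input sets this off, the sketch below feeds htmlspecialchars() a byte sequence that is not valid UTF-8. It is not the fix described above, just a way to check the input. It assumes the mbstring and iconv extensions are loaded, and it assumes the bad bytes are really ISO-8859-1, which is a guess you would need to confirm for your own data.

```php
<?php
// "café" as ISO-8859-1 bytes: 0xE9 on its own is not a valid UTF-8 sequence.
$bad = "caf\xE9";

// With explicit flags and no ENT_SUBSTITUTE / ENT_IGNORE, htmlspecialchars()
// returns an empty string for input that is invalid in the given charset.
$escaped = htmlspecialchars($bad, ENT_QUOTES, 'UTF-8');

// Check the input up front instead of discovering the problem in the output.
$is_valid = mb_check_encoding($bad, 'UTF-8');

// Assumption: the source encoding is ISO-8859-1. Convert, then escape.
$fixed = htmlspecialchars(iconv('ISO-8859-1', 'UTF-8', $bad), ENT_QUOTES, 'UTF-8');
```

Here $escaped comes back empty and $is_valid is false, while $fixed contains the properly encoded word.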
