About array_map:
array_map ( callable $callback , array $array1 [, array $... ] ) : array
Applies the callback to the elements of the given arrays.
array_map() returns an array containing the results of applying the callback function to the corresponding index of array1 (and ... if more arrays are provided), used as arguments for the callback.
The number of parameters that the callback function accepts should match the number of arrays passed to array_map().
Source: https://www.php.net/manual/en/function.array-map.php.
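As a quick illustration of this signature (my own sketch, not code from the manual page): with one array the callback takes one argument, and with two arrays it must take two.

```php
<?php

// Single array: the callback receives one argument per element.
$squares = array_map(fn(int $n): int => $n * $n, [1, 2, 3]);
// $squares === [1, 4, 9]

// Two arrays: the callback must accept two parameters,
// one from each array at the same position.
$sums = array_map(fn(int $a, int $b): int => $a + $b, [1, 2, 3], [10, 20, 30]);
// $sums === [11, 22, 33]
```

Arrow functions (`fn`) require PHP 7.4, which matches the version benchmarked below.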
array_map provides a functional-style paradigm, sparing us from writing explicit foreach loops.
And now the question that led to writing this post is: "What is the performance overhead of using array_map instead of a foreach loop to map an array?"
The PHP version used at the time of writing is 7.4.2.
PHP code used can be found here.
Result:
| # | array_map.php | foreach.php | foreach_init.php |
|---|---------------|-------------|------------------|
| 1 | 1.582978 | 0.107797 | 0.104314 |
| 2 | 1.502175 | 0.118180 | 0.099604 |
| 3 | 1.488776 | 0.105700 | 0.101177 |
| 4 | 1.489551 | 0.121292 | 0.100275 |
| 5 | 1.537671 | 0.106138 | 0.099592 |
| 6 | 1.550987 | 0.106349 | 0.100255 |
| 7 | 1.550303 | 0.105413 | 0.100309 |
| 8 | 1.545123 | 0.106352 | 0.101183 |
| 9 | 1.438311 | 0.105927 | 0.100107 |
| 10 | 1.618930 | 0.106874 | 0.100911 |
Times are in seconds.
Code executed on a MacBook Pro with a 2.4 GHz Dual-Core Intel Core i5 processor and 8 GB of memory.
R code used to build this graphic can be found here and executed on RStudio.cloud (requires an RStudio.cloud account).
With a large volume of data (the benchmark processed an array of 1 000 000 entries), foreach appears to be much faster than array_map. We also notice that pre-allocating the output array only marginally improves performance.
But how does array_map work? The rest of this post is an attempt to dive into its implementation. array_map is implemented in array.c, and its array traversal relies on core macros:
#define ZEND_HASH_FOREACH(_ht, indirect) do { \
HashTable *__ht = (_ht); \
Bucket *_p = __ht->arData; \
Bucket *_end = _p + __ht->nNumUsed; \
for (; _p != _end; _p++) { \
zval *_z = &_p->val; \
if (indirect && Z_TYPE_P(_z) == IS_INDIRECT) { \
_z = Z_INDIRECT_P(_z); \
} \
if (UNEXPECTED(Z_TYPE_P(_z) == IS_UNDEF)) continue;
#define ZEND_HASH_FOREACH_END() \
} \
} while (0)
ZEND_HASH_FOREACH_KEY_VAL_IND:
#define ZEND_HASH_FOREACH_KEY_VAL_IND(ht, _h, _key, _val) \
ZEND_HASH_FOREACH(ht, 1); \
_h = _p->h; \
_key = _p->key; \
_val = _z;
An array is traversed this way:
ZEND_HASH_FOREACH_KEY_VAL_IND(Z_ARRVAL(arrays[0]), num_key, str_key, zv) {
// ...
} ZEND_HASH_FOREACH_END();
If we expand the core macros, this gives us:
zend_ulong num_key;
zend_string *str_key;
zval *zv;
do {
HashTable *__ht = (Z_ARRVAL(arrays[0]));
Bucket *_p = __ht->arData;
Bucket *_end = _p + __ht->nNumUsed;
for (; _p != _end; _p++) {
zval *_z = &_p->val;
if (1 && Z_TYPE_P(_z) == IS_INDIRECT) { /* indirect expands to 1 here */
_z = Z_INDIRECT_P(_z);
}
if (UNEXPECTED(Z_TYPE_P(_z) == IS_UNDEF)) continue;
num_key = _p->h;
str_key = _p->key;
zv = _z;
{
// ...
}
}
} while (0);
Let's focus on the callback invocation:
zval *zv, arg, result;
int ret;
zend_fcall_info fci = empty_fcall_info;
zend_fcall_info_cache fci_cache = empty_fcall_info_cache;
{
fci.retval = &result;
fci.param_count = 1;
fci.params = &arg;
fci.no_separation = 0;
ZVAL_COPY(&arg, zv);
ret = zend_call_function(&fci, &fci_cache);
// Some check-code was removed for clarity.
if (str_key) {
_zend_hash_append(Z_ARRVAL_P(return_value), str_key, &result);
} else {
zend_hash_index_add_new(Z_ARRVAL_P(return_value), num_key, &result);
}
}
zv is a pointer to the array's current value; arg is the zval passed as argument to the callback. zv is copied to arg using ZVAL_COPY (which internally calls ZVAL_COPY_VALUE_EX). w2, of type uint32_t, is a member of the zval value union, so the value copy boils down to:

arg->value.ww.w2 = zv->value.ww.w2;
While array_map offers an elegant way to perform array mapping, it appears to be the slower solution for huge arrays. This can be explained by the extra processing done internally, which makes it safe and robust (reference-counter handling, etc.). Even though foreach is faster, it must be used with care, as we have to handle everything ourselves. It is each developer's decision to find the right trade-off between code speed and clarity/maintainability.
Code used for benchmarking.
<?php
const ENTRY_COUNT = 1000000;
function array_create(int $count = ENTRY_COUNT): array {
    return array_map(
        function($value) {
            return rand();
        },
        range(0, $count - 1)
    );
}
<?php
require_once "array_create.php";
$in = array_create();
$start = microtime(true);
$out = array_map(
    function($value) {
        return $value;
    },
    $in
);
$end = microtime(true);
echo sprintf("%f\n", $end - $start);
<?php
require_once "array_create.php";
$in = array_create();
$start = microtime(true);
$out = [];
foreach ($in as $key => $value) {
    $out[$key] = $value;
}
$end = microtime(true);
echo sprintf("%f\n", $end - $start);
Output array is built on the fly.
<?php
require_once "array_create.php";
$in = array_create();
$start = microtime(true);
$out = array_fill(0, count($in), null);
foreach ($in as $key => $value) {
    $out[$key] = $value;
}
$end = microtime(true);
echo sprintf("%f\n", $end - $start);
Output array is pre-allocated.
Shell script:
for pass in {1..10}
do
echo "Pass ${pass}"
php -f array_map.php
php -f foreach.php
php -f foreach_init.php
done
R Code:
library(tidyverse)
bench_data <- tibble(
pass = 1:10,
array_map = c(1.582978, 1.502175, 1.488776, 1.489551, 1.537671, 1.550987, 1.550303, 1.545123, 1.438311, 1.618930),
foreach = c(0.107797, 0.118180, 0.105700, 0.121292, 0.106138, 0.106349, 0.105413, 0.106352, 0.105927, 0.106874),
foreach_init = c(0.104314, 0.099604, 0.101177, 0.100275, 0.099592, 0.100255, 0.100309, 0.101183, 0.100107, 0.100911)
)
bench_data_long <- bench_data %>%
pivot_longer(cols = c(array_map, foreach, foreach_init), names_to = "method", values_to = "time") %>%
mutate(method = as.factor(method))
bench_data_long %>%
ggplot(aes(pass, time, group = method, colour = method)) +
geom_line() +
geom_point() +
scale_x_continuous("Pass", limits = range(bench_data_long$pass), breaks = bench_data$pass) +
scale_y_continuous("Execution Time (sec)", limits = c(0, max(bench_data_long$time))) +
ggtitle("PHP array_map vs foreach Benchmark")