PHP, array_map vs foreach

About array_map:

array_map ( callable $callback , array $array1 [, array $... ] ) : array

Applies the callback to the elements of the given arrays.
array_map() returns an array containing the results of applying the callback function to the corresponding index of array1 (and if more arrays are provided) used as arguments for the callback.
The number of parameters that the callback function accepts should match the number of arrays passed to array_map().

Source: https://www.php.net/manual/en/function.array-map.php.
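
To make the signature concrete, here is a minimal sketch of array_map called first with one array and then with two, the callback taking as many parameters as there are arrays. The variable names and sample values are illustrative and do not come from the manual.

```php
<?php
// One array: the callback receives one argument per element.
$squares = array_map(
    function ($value) {
        return $value * $value;
    },
    [1, 2, 3]
);

// Two arrays: the callback receives one argument from each array.
$sums = array_map(
    function ($left, $right) {
        return $left + $right;
    },
    [1, 2, 3],
    [10, 20, 30]
);

var_dump($squares === [1, 4, 9]);  // bool(true)
var_dump($sums === [11, 22, 33]);  // bool(true)
```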

array_map offers a functional-style way to map an array, sparing us from writing explicit foreach loops.

And now the question that led to writing this post is:
"What is the performance overhead of using array_map instead of a foreach loop to map an array?"

TL;DR.

PHP version used at time of writing is 7.4.2.

Benchmark

The PHP code used can be found in the Annex (Benchmark Code).

Result:

Pass  array_map.php  foreach.php  foreach_init.php
1     1.582978       0.107797     0.104314
2     1.502175       0.118180     0.099604
3     1.488776       0.105700     0.101177
4     1.489551       0.121292     0.100275
5     1.537671       0.106138     0.099592
6     1.550987       0.106349     0.100255
7     1.550303       0.105413     0.100309
8     1.545123       0.106352     0.101183
9     1.438311       0.105927     0.100107
10    1.618930       0.106874     0.100911

Times are in seconds.
Code executed on a MacBook Pro with a 2.4 GHz Dual-Core Intel Core i5 processor and 8 GB of memory.

[Figure: "PHP array_map vs foreach Benchmark", execution time in seconds per pass for array_map, foreach and foreach_init]

The R code used to build this graphic can be found in the Annex (Benchmark Dataviz) and executed on RStudio.cloud (requires an RStudio.cloud account).

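When we have volume (the benchmark code processed an array of 1 000 000 entries), foreach appears to be much faster than array_map: about 0.11 s versus about 1.53 s on average, roughly a factor of 14.
Also, we notice that pre-allocating the output array has only a marginal impact on performance (about 0.10 s versus about 0.11 s on average).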
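
The averages behind these observations can be recomputed from the result table above. The sketch below is only arithmetic on the published timings, with variable names chosen for illustration. It gives a ratio of about 14 between array_map and foreach, and about 1.08 between foreach and foreach_init.

```php
<?php
// Timings in seconds, copied from the result table.
$timings = [
    'array_map' => [1.582978, 1.502175, 1.488776, 1.489551, 1.537671,
                    1.550987, 1.550303, 1.545123, 1.438311, 1.618930],
    'foreach' => [0.107797, 0.118180, 0.105700, 0.121292, 0.106138,
                  0.106349, 0.105413, 0.106352, 0.105927, 0.106874],
    'foreach_init' => [0.104314, 0.099604, 0.101177, 0.100275, 0.099592,
                       0.100255, 0.100309, 0.101183, 0.100107, 0.100911],
];

// Arithmetic mean of each series (string keys are preserved by array_map).
$means = array_map(
    function (array $series) {
        return array_sum($series) / count($series);
    },
    $timings
);

// How many times slower each method is compared with the next one.
$mapVsForeach = $means['array_map'] / $means['foreach'];
$foreachVsInit = $means['foreach'] / $means['foreach_init'];

printf("array_map / foreach: %.1f\n", $mapVsForeach);
printf("foreach / foreach_init: %.2f\n", $foreachVsInit);
```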

But how does array_map work?
The rest of the post is an attempt to dive into the implementation of array_map.

array_map Implementation

array_map is implemented in array.c.

Array traversal

array_map array traversal uses core macros:

  • ZEND_HASH_FOREACH:

    #define ZEND_HASH_FOREACH(_ht, indirect) do { \
      HashTable *__ht = (_ht); \
      Bucket *_p = __ht->arData; \
      Bucket *_end = _p + __ht->nNumUsed; \
      for (; _p != _end; _p++) { \
        zval *_z = &_p->val; \
        if (indirect && Z_TYPE_P(_z) == IS_INDIRECT) { \
          _z = Z_INDIRECT_P(_z); \
        } \
        if (UNEXPECTED(Z_TYPE_P(_z) == IS_UNDEF)) continue;
    
  • ZEND_HASH_FOREACH_END:

    #define ZEND_HASH_FOREACH_END() \
        } \
      } while (0)
    
  • ZEND_HASH_FOREACH_KEY_VAL_IND:

    #define ZEND_HASH_FOREACH_KEY_VAL_IND(ht, _h, _key, _val) \
      ZEND_HASH_FOREACH(ht, 1); \
      _h = _p->h; \
      _key = _p->key; \
      _val = _z;
    

An array is traversed this way:

ZEND_HASH_FOREACH_KEY_VAL_IND(Z_ARRVAL(arrays[0]), num_key, str_key, zv) {
  // ...
} ZEND_HASH_FOREACH_END();

If we expand the core macros, this gives us:

zend_ulong num_key;
zend_string *str_key;
zval *zv;

do {
  HashTable *__ht = (Z_ARRVAL(arrays[0]));
  Bucket *_p = __ht->arData;
  Bucket *_end = _p + __ht->nNumUsed;

  for (; _p != _end; _p++) {
    zval *_z = &_p->val;
    if (1 && Z_TYPE_P(_z) == IS_INDIRECT) {
      _z = Z_INDIRECT_P(_z);
    }
    if (UNEXPECTED(Z_TYPE_P(_z) == IS_UNDEF)) continue;

    num_key = _p->h;
    str_key = _p->key;
    zv = _z;

    {
      // ...
    }
  }
} while (0);
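
The effect of the IS_UNDEF check can be observed indirectly from PHP code. The sketch below rests on an assumption that the benchmark does not verify: unsetting an element leaves an undefined slot that the traversal skips. What can be checked directly is the visible behaviour: the removed element never reaches the callback, and the remaining keys are preserved.

```php
<?php
$in = [10, 20, 30];
unset($in[1]); // Leaves a hole: only keys 0 and 2 remain.

$seen = [];
$out = array_map(
    function ($value) use (&$seen) {
        $seen[] = $value; // Record every value the callback receives.
        return $value + 1;
    },
    $in
);

var_dump($seen); // Values received by the callback: 10 and 30.
var_dump($out);  // Keys 0 and 2 are kept: [0 => 11, 2 => 31].
```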

Callback

Let's focus on the callback invocation:

zval *zv, arg;
int ret;
zend_fcall_info fci = empty_fcall_info;
zend_fcall_info_cache fci_cache = empty_fcall_info_cache;

{
  fci.retval = &result;
  fci.param_count = 1;
  fci.params = &arg;
  fci.no_separation = 0;

  ZVAL_COPY(&arg, zv);
  ret = zend_call_function(&fci, &fci_cache);
  
  // Some check-code was removed for clarity.
  
  if (str_key) {
    _zend_hash_append(Z_ARRVAL_P(return_value), str_key, &result);
  } else {
    zend_hash_index_add_new(Z_ARRVAL_P(return_value), num_key, &result);
  }  
}

  1. zv is a pointer to the current array value (type zval *).
    arg is the zval passed to the callback; fci.params points to it.
  2. zv is copied into arg using ZVAL_COPY (which internally calls ZVAL_COPY_VALUE_EX and increments the reference counter when the value is refcounted).
    Among other fields, w2, of type uint32_t and a member of zend_value, is copied:
    arg.value.ww.w2 = zv->value.ww.w2;
  3. The callback is invoked through the zend_call_function function.
    A lot happens in this function before the callback is actually executed.
  4. The output array is filled using _zend_hash_append (for a string key) or zend_hash_index_add_new (for a numeric key).
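
The steps above can be approximated in PHP itself. The function below, named map_single_array for illustration only (it is not part of PHP), is a rough userland sketch of the single-array case: traverse the input, call the callback on each value, and store the result under the same key. It leaves out everything the C code does around the call (argument copy, reference counting, error handling).

```php
<?php
// Hypothetical helper: a userland approximation of array_map for one array.
function map_single_array(callable $callback, array $array): array
{
    $result = [];
    foreach ($array as $key => $value) {
        // String and integer keys are both preserved, as in the C code.
        $result[$key] = $callback($value);
    }
    return $result;
}

$in = ['a' => 1, 5 => 2, 'c' => 3];
$double = function ($value) {
    return $value * 2;
};

// Both calls produce the same keys and values, in the same order.
var_dump(map_single_array($double, $in) === array_map($double, $in)); // bool(true)
```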

Wrap up

While array_map offers an elegant way to map an array, it appears to be slower on huge arrays.
This can be explained by the extra processing done internally for each element (argument copy, handling of the reference counter, function call setup, etc.), which makes it safe and robust.

Even though foreach is faster, it must be used with great care, as we have to take care of everything on our own.
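
One example of such a precaution, chosen here as an illustration (the benchmark does not cover it), is iterating by reference. The loop variable stays bound to the last element after the loop, so reusing it without unset() silently corrupts the array. array_map has no loop variable that outlives the call.

```php
<?php
$values = [1, 2, 3];

// Doubling in place with a reference.
foreach ($values as &$value) {
    $value *= 2;
}
// $value still references $values[2] here.

// Reusing $value in a second loop overwrites the last element.
foreach ($values as $value) {
}
var_dump($values); // [2, 4, 4] instead of [2, 4, 6]

// Safe variant: break the reference once the loop is done.
$safe = [1, 2, 3];
foreach ($safe as &$item) {
    $item *= 2;
}
unset($item);
foreach ($safe as $item) {
}
var_dump($safe); // [2, 4, 6]

// array_map variant: nothing leaks into the surrounding scope.
$mapped = array_map(
    function ($v) {
        return $v * 2;
    },
    [1, 2, 3]
);
var_dump($mapped); // [2, 4, 6]
```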

It is up to each developer to find the right trade-off between code speed and clarity/maintainability.

Annex

Benchmark Code

Code used for benchmarking.

array_create.php

<?php
const ENTRY_COUNT = 1000000;

// Builds an array of $count random integers.
function array_create(int $count = ENTRY_COUNT): array {
    return array_map(
        function($value) {
            return rand();
        },
        range(0, $count - 1)
    );
}

array_map.php

<?php
require_once "array_create.php";

$in = array_create();

$start = microtime(true);

$out = array_map(
    function($value) {
        return $value;
    },
    $in
);

$end = microtime(true);
echo sprintf("%f\n", $end - $start);

foreach.php

<?php
require_once "array_create.php";

$in = array_create();

$start = microtime(true);

$out = [];
foreach ($in as $key => $value) {
    $out[$key] = $value;
}

$end = microtime(true);
echo sprintf("%f\n", $end - $start);

Output array is built on the fly.

foreach_init.php

<?php
require_once "array_create.php";

$in = array_create();

$start = microtime(true);

$out = array_fill(0, count($in), null);
foreach ($in as $key => $value) {
    $out[$key] = $value;
}

$end = microtime(true);
echo sprintf("%f\n", $end - $start);

Output array is pre-allocated.

Benchmark Script

Shell script:

for pass in {1..10}
do
    echo "Pass ${pass}"
    php -f array_map.php
    php -f foreach.php
    php -f foreach_init.php
done

Benchmark Dataviz

R Code:

library(tidyverse)

bench_data <- tibble(
  pass = 1:10,
  array_map = c(1.582978, 1.502175, 1.488776, 1.489551, 1.537671, 1.550987, 1.550303, 1.545123, 1.438311, 1.618930),
  foreach = c(0.107797, 0.118180, 0.105700, 0.121292, 0.106138, 0.106349, 0.105413, 0.106352, 0.105927, 0.106874),
  foreach_init = c(0.104314, 0.099604, 0.101177, 0.100275, 0.099592, 0.100255, 0.100309, 0.101183, 0.100107, 0.100911)
)

bench_data_long <- bench_data %>%
  pivot_longer(cols = c(array_map, foreach, foreach_init), names_to = "method", values_to = "time") %>%
  mutate(method = as.factor(method))

bench_data_long %>%
  ggplot(aes(pass, time, group = method, colour = method)) +
    geom_line() +
    geom_point() +
    scale_x_continuous("Pass", limits = range(bench_data_long$pass), breaks = bench_data$pass) +
    scale_y_continuous("Execution Time (sec)", limits = c(0, max(bench_data_long$time))) +
    ggtitle("PHP array_map vs foreach Benchmark")