Extension Writing Part II: Parameters, Arrays, and ZVALs

June 6, 2005

Introduction
Accepting Values
The ZVAL
Creating ZVALs
Arrays
Symbol Tables as Arrays
Reference Counting
    Copies versus References
    Sanity Check
    What’s Next?


Introduction

In Part One of this series you looked at the basic framework of a PHP extension. You declared simple functions that returned both static and dynamic values to the calling script, defined INI options, and declared internal values (globals). In this tutorial, you’ll learn how to accept values passed into your functions from a calling script and discover how PHP and the Zend Engine manage variables internally.


Accepting Values

Unlike in userspace code, the parameters for an internal function aren’t actually declared in the function header. Instead, a reference to the parameter list is passed into every function – whether parameters were passed or not – and that function can then ask the Zend Engine to turn them into something usable.

Let’s take a look at this by defining a new function, hello_greetme(), which will accept one parameter and output it along with some greeting text. As before, we’ll be adding code in three places:

In php_hello.h, next to the other function prototypes:


PHP_FUNCTION(hello_greetme);

In hello.c, at the end of the hello_functions structure:


    PHP_FE(hello_bool, NULL)
    PHP_FE(hello_null, NULL)
    PHP_FE(hello_greetme, NULL)
    {NULL, NULL, NULL}
};

And down near the end of hello.c after the other functions:


PHP_FUNCTION(hello_greetme)
{
    char *name;
    int name_len;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &name, &name_len) == FAILURE) {
        RETURN_NULL();
    }

    php_printf("Hello %s\n", name);

    RETURN_TRUE;
}

The bulk of the zend_parse_parameters() block will almost always look the same. ZEND_NUM_ARGS() tells the Zend Engine how many parameters were actually passed, TSRMLS_CC is present to ensure thread safety, and the return value of the function is checked for SUCCESS or FAILURE. Under normal circumstances zend_parse_parameters() will return SUCCESS; however, if a calling script has attempted to pass too many or too few parameters, or if the parameters passed cannot be converted to the proper data type, Zend will automatically output an error message, leaving your function to return control to the calling script gracefully.

In this example you specified s to indicate that this function expects one and only one parameter, and that this parameter should be converted into a string data type and stored in the char* variable, whose address you pass in so that zend_parse_parameters() can write to it.

Note that the address of an int variable was also passed into zend_parse_parameters(). This allows the Zend Engine to provide the length of the string in bytes so that binary-safe functions don’t need to rely on strlen(name) to determine the string’s length. In fact, using strlen(name) may not even give the correct result, as name may contain one or more NUL ('\0') bytes before the end of the string.
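The difference is easy to see outside PHP. The following plain C sketch (no PHP headers involved; copy_with_length() and copy_with_strlen() are hypothetical helpers written only for this illustration) copies a buffer containing an embedded NUL byte once by trusting strlen() and once by using a known length, the way name_len is meant to be used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies exactly len bytes, embedded NUL bytes
 * included, then terminates the copy. dst must hold len + 1 bytes. */
static size_t copy_with_length(char *dst, const char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}

/* Hypothetical helper: the binary-unsafe variant. strlen() stops at the
 * first NUL byte, so everything after it is silently dropped. */
static size_t copy_with_strlen(char *dst, const char *src)
{
    return copy_with_length(dst, src, strlen(src));
}
```

Called on "Sara\0Golemon", the strlen() version keeps only the 4 bytes before the NUL, while the length-based version keeps all 12.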

Once your function has the name parameter firmly in hand, the next thing it does is output it as part of a formal greeting. Notice that php_printf() is used rather than the more familiar printf(). Using this function is important for several reasons. First, it allows the string being output to be processed through PHP’s output buffering mechanism, which may, in addition to actually buffering the data, perform additional processing such as gzip compression. Second, while stdout is a perfectly fine target for output when using CLI or CGI, most SAPIs expect output to come via a specific pipe or socket. Therefore, simply calling printf() to write to stdout could lead to data being lost, sent out of order, or corrupted, because it bypasses PHP’s output layer.

Finally, the function returns control to the calling program by simply returning TRUE. While you could allow control to reach the end of your function without explicitly returning a value (it will default to NULL), this is considered bad practice. A function that doesn’t have anything meaningful to report should typically return TRUE simply to say, “Everything’s cool, I did what you wanted”.

Because PHP strings may contain NUL bytes, the binary-safe way to output a string, including any NUL bytes and the characters that follow them, is to replace the php_printf() statement with the following block:


php_printf("Hello ");
PHPWRITE(name, name_len);
php_printf("\n");

This block uses php_printf() for the strings which are known not to contain NUL bytes, but uses another macro, PHPWRITE(), for the user-provided string. This macro accepts the length (name_len) provided by zend_parse_parameters() so that the entire contents of name are printed regardless of a stray NUL byte.

zend_parse_parameters() will also handle optional parameters. In the next example, you’ll create a function which expects a long (PHP’s integer data type), a double (float), and an optional Boolean value. A userspace declaration for this function might look something like:


function hello_add($a, $b, $return_long = false) {

    $sum = (int)$a + (float)$b;

    if ($return_long) {
        return intval($sum);
    } else {
        return floatval($sum);
    }
}

In C, this function will look like the following (don’t forget to add entries in php_hello.h and hello_functions[] to enable this when you add it to hello.c):


PHP_FUNCTION(hello_add)
{
    long a;
    double b;
    zend_bool return_long = 0;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "ld|b", &a, &b, &return_long) == FAILURE) {
        RETURN_NULL();
    }

    if (return_long) {
        RETURN_LONG(a + b);
    } else {
        RETURN_DOUBLE(a + b);
    }
}

This time, your data type string reads as: “I want a (l)ong, then a (d)ouble”. The next character, a pipe, signifies that the rest of the parameter list is optional. If an optional parameter is not passed during the function call, zend_parse_parameters() leaves the corresponding C variable untouched, which is why return_long is initialized to 0 above. The final b is, of course, for Boolean. After the data type string, the addresses of a, b, and return_long are passed in so that zend_parse_parameters() can populate them with values.
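One way to picture the pipe is as a split between required and optional arguments. The sketch below is a hypothetical model in plain C: spec_arg_counts() and arity_ok() are not part of the Zend API, and modifier characters are ignored. It only derives the minimum and maximum argument counts from a type string and applies the too-few/too-many check described earlier.

```c
#include <assert.h>

/* Hypothetical model: each type letter consumes one argument, and every
 * letter after '|' is optional. Modifier characters are not modelled. */
static void spec_arg_counts(const char *spec, int *min_args, int *max_args)
{
    int optional = 0;

    *min_args = 0;
    *max_args = 0;
    for (; *spec != '\0'; spec++) {
        if (*spec == '|') {
            optional = 1;   /* everything from here on may be omitted */
            continue;
        }
        (*max_args)++;
        if (!optional) {
            (*min_args)++;
        }
    }
}

/* Returns 1 when argc passes the arity check, 0 where the real parser
 * would emit a warning and return FAILURE. */
static int arity_ok(const char *spec, int argc)
{
    int min_args, max_args;

    spec_arg_counts(spec, &min_args, &max_args);
    return argc >= min_args && argc <= max_args;
}
```

For "ld|b" the model reports two required and three permitted arguments, so hello_add(1) and hello_add(1, 2.0, true, 4) would both be rejected.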

Warning: while int and long are often used interchangeably on 32-bit platforms, using one in place of the other can be very dangerous when your code is recompiled on 64-bit hardware. So remember to use long for longs, and int for string lengths.

Table 1 shows the various types, and their corresponding letter codes and C types which can be used with zend_parse_parameters():

Table 1: Types and letter codes used in zend_parse_parameters()
Type        Code   Variable Type
Boolean     b      zend_bool
Long        l      long
Double      d      double
String      s      char*, int
Resource    r      zval*
Array       a      zval*
Object      o      zval*
zval        z      zval*

You probably noticed right away that the last four types in Table 1 all map to the same C type: a zval*. A zval, as you’ll soon learn, is the structure in which every PHP userspace variable is actually stored. The three “complex” data types, Resource, Array and Object, are type-checked by the Zend Engine when their data type codes are used with zend_parse_parameters(), but because they have no corresponding data type in C, no conversion is actually performed.


The ZVAL

The zval, and PHP userspace variables in general, will easily be the most difficult concepts you’ll need to wrap your head around. They will also be the most vital. To begin with, let’s look at the structure of a zval:


typedef struct _zval_struct {
    union {
        long lval;
        double dval;
        struct {
            char *val;
            int len;
        } str;
        HashTable *ht;
        zend_object_value obj;
    } value;
    zend_uint refcount;
    zend_uchar type;
    zend_uchar is_ref;
} zval;

As you can see, every zval has three basic elements in common: type, is_ref, and refcount. is_ref and refcount will be covered later on in this tutorial; for now let’s focus on type.

By now you should already be familiar with PHP’s eight data types. They’re the seven listed in Table 1, plus NULL, which despite (or perhaps because of) the fact that it literally is nothing, is a type unto its own. Given a particular zval, the type can be examined using one of three convenience macros: Z_TYPE(zval), Z_TYPE_P(zval*), or Z_TYPE_PP(zval**). The only functional difference between these three is the level of indirection they expect in the variable passed to them. The convention of using _P and _PP is repeated in other macros, such as the *VAL macros you’re about to look at.
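To make the layout and the _P/_PP convention concrete, here is a deliberately cut-down stand-in for the structure shown above. The names mini_zval, MINI_TYPE() and friends are hypothetical; only the shape (a type tag that selects one member of a value union, read through macros for zero, one, or two levels of indirection) is meant to mirror the Engine.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for a zval. Not the Zend API. */
typedef enum { MINI_NULL, MINI_BOOL, MINI_LONG, MINI_DOUBLE } mini_type;

typedef struct {
    union {
        long lval;      /* used by MINI_BOOL and MINI_LONG */
        double dval;    /* used by MINI_DOUBLE */
    } value;
    unsigned int refcount;
    unsigned char type;     /* says which union member is valid */
    unsigned char is_ref;
} mini_zval;

/* One accessor, three levels of indirection, following the
 * Z_TYPE() / Z_TYPE_P() / Z_TYPE_PP() naming convention. */
#define MINI_TYPE(z)      ((z).type)
#define MINI_TYPE_P(zp)   MINI_TYPE(*(zp))
#define MINI_TYPE_PP(zpp) MINI_TYPE(**(zpp))
#define MINI_LVAL_P(zp)   ((zp)->value.lval)
```

All three type macros read the same field; they differ only in how many pointers they dereference first.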

The value of type determines which portion of the zval’s value union will be set. The following piece of code demonstrates a scaled-down version of var_dump():


PHP_FUNCTION(hello_dump)
{
    zval *uservar;

    /* pass the address of uservar so zend_parse_parameters() can fill it in */
    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z", &uservar) == FAILURE) {
        RETURN_NULL();
    }

    switch (Z_TYPE_P(uservar)) {
        case IS_NULL:
            php_printf("NULL\n");
            break;
        case IS_BOOL:
            php_printf("Boolean: %s\n", Z_LVAL_P(uservar) ? "TRUE" : "FALSE");
            break;
        case IS_LONG:
            php_printf("Long: %ld\n", Z_LVAL_P(uservar));
            break;
        case IS_DOUBLE:
            php_printf("Double: %f\n", Z_DVAL_P(uservar));
            break;
        case IS_STRING:
            php_printf("String: ");
            PHPWRITE(Z_STRVAL_P(uservar), Z_STRLEN_P(uservar));
            php_printf("\n");
            break;
        case IS_RESOURCE:
            php_printf("Resource\n");
            break;
        case IS_ARRAY:
            php_printf("Array\n");
            break;
        case IS_OBJECT:
            php_printf("Object\n");
            break;
        default:
            php_printf("Unknown\n");
    }

    RETURN_TRUE;
}

As you can see, the Boolean data type shares the same internal element as the long data type. Just as with RETURN_BOOL(), which you used in Part One of this series, FALSE is represented by 0, while TRUE is represented by 1.

When you use zend_parse_parameters() to request a specific data type, such as string, the Zend Engine checks the type of the incoming variable. If it matches, Zend simply passes through the corresponding parts of the zval to the right data types. If it’s of a different type, Zend converts it as is appropriate and/or possible, using its usual type-juggling rules.

Modify the hello_greetme() function you implemented earlier by separating it out into smaller pieces:


PHP_FUNCTION(hello_greetme)
{
    zval *zname;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z", &zname) == FAILURE) {
        RETURN_NULL();
    }

    convert_to_string(zname);

    php_printf("Hello ");
    PHPWRITE(Z_STRVAL_P(zname), Z_STRLEN_P(zname));
    php_printf("\n");

    RETURN_TRUE;
}

This time, zend_parse_parameters() was told to simply retrieve a PHP variable (zval) regardless of type, then the function explicitly cast the variable as a string (similar to $zname = (string)$zname;), and then PHPWRITE() was called with the STRing VALue and STRing LENgth of the zname structure. As you’ve probably guessed, other convert_to_*() functions exist for bool, long, and double.


Creating ZVALs

So far, the zvals you’ve worked with have been allocated by the Zend Engine and will be freed the same way. Sometimes, however, it’s necessary to create your own zval. Consider the following block of code:


{
    zval *temp;

    ALLOC_INIT_ZVAL(temp);

    Z_TYPE_P(temp) = IS_LONG;
    Z_LVAL_P(temp) = 1234;

    zval_ptr_dtor(&temp);
}

ALLOC_INIT_ZVAL(), as its name implies, allocates memory for a zval* and initializes it as a new variable. Once that’s done, the Z_*_P() macros can be used to set the type and value of this variable. zval_ptr_dtor() handles the dirty work of cleaning up the memory allocated for the variable.

The two Z_*_P() calls could have actually been reduced to a single statement:


ZVAL_LONG(temp, 1234);

Similar macros exist for the other types, and follow the same syntax as the RETURN_*() macros you saw in Part One of this series. In fact, the RETURN_*() macros are just thin wrappers for RETVAL_*() and, by extension, ZVAL_*(). The following five versions are all identical:


RETURN_LONG(42);

RETVAL_LONG(42);
return;

ZVAL_LONG(return_value, 42);
return;

Z_TYPE_P(return_value) = IS_LONG;
Z_LVAL_P(return_value) = 42;
return;

return_value->type = IS_LONG;
return_value->value.lval = 42;
return;

If you’re sharp, you’re thinking about what these definitions imply for functions like hello_long(), which use RETURN_*() without ever declaring return_value. “Where does return_value come from and why isn’t it being allocated with ALLOC_INIT_ZVAL()?”, you might be wondering.

While it may be hidden from you in your day-to-day extension writing, return_value is actually a function parameter defined in the prototype of every PHP_FUNCTION() definition. The Zend Engine allocates memory for it and initializes it as NULL so that even if your function doesn’t explicitly set it, a value will still be available to the calling program. When your internal function finishes executing, the Engine passes that value to the calling program, or frees it if the calling program ignores the return value.
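The arrangement can be modelled in a few lines of plain C. In this hypothetical sketch (model_value, MODEL_RETVAL_LONG() and MODEL_RETURN_LONG() are illustrative names, not Zend API), the caller owns a slot initialized to NULL and passes its address in, the RETVAL-style macro only stores into that slot, and the RETURN-style macro stores and then returns.

```c
#include <assert.h>

/* Hypothetical model of the calling convention. Not the Zend API. */
typedef enum { MODEL_NULL, MODEL_LONG } model_type;

typedef struct {
    model_type type;
    long lval;
} model_value;

/* RETVAL-style: only store into the caller-provided slot. */
#define MODEL_RETVAL_LONG(l) \
    do { return_value->type = MODEL_LONG; return_value->lval = (l); } while (0)

/* RETURN-style: store, then leave the function. */
#define MODEL_RETURN_LONG(l) \
    do { MODEL_RETVAL_LONG(l); return; } while (0)

/* A "function" in this model: return_value is just a parameter. */
static void model_answer(model_value *return_value, int want_answer)
{
    if (!want_answer) {
        return;             /* slot keeps the NULL the caller stored */
    }
    MODEL_RETURN_LONG(42);
}

/* The caller's side: initialize the slot to NULL before every call. */
static model_value model_call(int want_answer)
{
    model_value rv;

    rv.type = MODEL_NULL;
    rv.lval = 0;
    model_answer(&rv, want_answer);
    return rv;
}
```

Calling model_call(0) leaves the slot as NULL, just as an internal function that never sets return_value hands NULL back to the script.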


Arrays

Since you’ve used PHP in the past, you’ve already recognized an array as a variable whose purpose is to carry around other variables. The way this is represented internally is through a structure known as a HashTable. When creating arrays to be returned to PHP, the simplest approach involves using one of the functions listed in Table 2.

Table 2: zval array creation functions
PHP Syntax            C Syntax (arr is a zval*)                Meaning
$arr = array();       array_init(arr);                         Initialize a new array

$arr[] = NULL;        add_next_index_null(arr);                Add a value of a given type to
$arr[] = 42;          add_next_index_long(arr, 42);            a numerically indexed array
$arr[] = true;        add_next_index_bool(arr, 1);
$arr[] = 3.14;        add_next_index_double(arr, 3.14);
$arr[] = 'foo';       add_next_index_string(arr, "foo", 1);
$arr[] = $myvar;      add_next_index_zval(arr, myvar);

$arr[0] = NULL;       add_index_null(arr, 0);                  Add a value of a given type to
$arr[1] = 42;         add_index_long(arr, 1, 42);              a specific index in an array
$arr[2] = true;       add_index_bool(arr, 2, 1);
$arr[3] = 3.14;       add_index_double(arr, 3, 3.14);
$arr[4] = 'foo';      add_index_string(arr, 4, "foo", 1);
$arr[5] = $myvar;     add_index_zval(arr, 5, myvar);

$arr['abc'] = NULL;   add_assoc_null(arr, "abc");              Add a value of a given type to
$arr['def'] = 711;    add_assoc_long(arr, "def", 711);         an associatively indexed array
$arr['ghi'] = true;   add_assoc_bool(arr, "ghi", 1);
$arr['jkl'] = 1.44;   add_assoc_double(arr, "jkl", 1.44);
$arr['mno'] = 'baz';  add_assoc_string(arr, "mno", "baz", 1);
$arr['pqr'] = $myvar; add_assoc_zval(arr, "pqr", myvar);

As with the RETURN_STRING() macro, the add_*_string() functions take a 1 or a 0 in the final parameter to indicate whether the string contents should be copied. They also have a kissing cousin in the form of an add_*_stringl() variant for each. The l indicates that the length of the string will be explicitly provided (rather than having the Zend Engine determine it with a call to strlen(), which is not binary-safe).

Using this binary-safe form is as simple as specifying the length just before the duplication parameter, like so:


add_assoc_stringl(arr, "someStringVar", "baz", 3, 1);
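The final duplicate flag is really a statement about ownership. The plain C sketch below is a hypothetical model (owned_string and its helpers are illustrative, and it uses malloc()/free() where an extension would use the emalloc() family): with 1 the container makes its own copy of exactly len bytes, and with 0 it adopts the caller's heap buffer and becomes responsible for freeing it. That is why hello_array() hands over an estrdup()ed string with 0 and never frees it itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the "duplicate" argument. Not the Zend API. */
typedef struct {
    char *val;
    size_t len;
} owned_string;

/* duplicate != 0: store a private copy of exactly len bytes.
 * duplicate == 0: adopt str, which must be heap memory we may free. */
static int owned_string_set(owned_string *slot, char *str, size_t len, int duplicate)
{
    if (duplicate) {
        char *copy = malloc(len + 1);
        if (copy == NULL) {
            return -1;
        }
        memcpy(copy, str, len);   /* explicit length: binary safe */
        copy[len] = '\0';
        slot->val = copy;
    } else {
        slot->val = str;
    }
    slot->len = len;
    return 0;
}

/* The container always frees what it holds, copied or adopted. */
static void owned_string_free(owned_string *slot)
{
    free(slot->val);
    slot->val = NULL;
    slot->len = 0;
}
```

Passing 0 with a string literal or a stack buffer would therefore be a bug: the container would later try to free memory it never owned.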

When you use the add_assoc_*() functions, all array keys are assumed to contain no NUL bytes; the add_assoc_*() functions themselves are not binary-safe with respect to keys. Using keys with NUL bytes in them is discouraged (the technique is already used internally for protected and private object properties), but if doing so is necessary, you’ll learn how when we get into the zend_hash_*() functions later.

To put what you’ve just learned into practice, create the following function to return an array of values to the calling program. Be sure to add entries to php_hello.h and hello_functions[] so as to properly declare this function.


PHP_FUNCTION(hello_array)
{
    char *mystr;
    zval *mysubarray;

    array_init(return_value);

    add_index_long(return_value, 42, 123);

    add_next_index_string(return_value, "I should now be found at index 43", 1);

    add_next_index_stringl(return_value, "I'm at 44!", 10, 1);

    mystr = estrdup("Forty Five");
    add_next_index_string(return_value, mystr, 0);

    add_assoc_double(return_value, "pi", 3.1415926535);

    ALLOC_INIT_ZVAL(mysubarray);
    array_init(mysubarray);
    add_next_index_string(mysubarray, "hello", 1);
    add_assoc_zval(return_value, "subarray", mysubarray);    
}

Building this extension and issuing var_dump(hello_array()); gives:

array(6) {
  [42]=>
  int(123)
  [43]=>
  string(33) "I should now be found at index 43"
  [44]=>
  string(10) "I'm at 44!"
  [45]=>
  string(10) "Forty Five"
  ["pi"]=>
  float(3.1415926535)
  ["subarray"]=>
  array(1) {
    [0]=>
    string(5) "hello"
  }
}

Reading values back out of arrays means extracting them as zval**s directly from a HashTable using the zend_hash family of functions from the ZENDAPI. Let’s start with a simple function which accepts one array as a parameter:


function hello_array_strings($arr) {

    if (!is_array($arr)) return NULL;

    printf("The array passed contains %d elements\n", count($arr));

    foreach($arr as $data) {
        if (is_string($data)) echo "$data\n";
    }
}

Or, in C:


PHP_FUNCTION(hello_array_strings)
{
    zval *arr, **data;
    HashTable *arr_hash;
    HashPosition pointer;
    int array_count;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "a", &arr) == FAILURE) {
        RETURN_NULL();
    }

    arr_hash = Z_ARRVAL_P(arr);
    array_count = zend_hash_num_elements(arr_hash);

    php_printf("The array passed contains %d elements\n", array_count);

    for(zend_hash_internal_pointer_reset_ex(arr_hash, &pointer); zend_hash_get_current_data_ex(arr_hash, (void**) &data, &pointer) == SUCCESS; zend_hash_move_forward_ex(arr_hash, &pointer)) {

        if (Z_TYPE_PP(data) == IS_STRING) {
            PHPWRITE(Z_STRVAL_PP(data), Z_STRLEN_PP(data));
            php_printf("\n");
        }
    }
    RETURN_TRUE;
}

In this function only those array elements which are of type string are output, in order to keep the function brief. You may be wondering why we didn’t just use convert_to_string() as we did in the hello_greetme() function earlier. Let’s give that a shot; replace the for loop above with the following:


    for(zend_hash_internal_pointer_reset_ex(arr_hash, &pointer); zend_hash_get_current_data_ex(arr_hash, (void**) &data, &pointer) == SUCCESS; zend_hash_move_forward_ex(arr_hash, &pointer)) {

        convert_to_string_ex(data);
        PHPWRITE(Z_STRVAL_PP(data), Z_STRLEN_PP(data));
        php_printf("\n");
    }

Now compile your extension again and run the following userspace code through it:


<?php

$a = array('foo',123);
var_dump($a);
hello_array_strings($a);
var_dump($a);

?>

Notice that the original array was changed! Remember, the convert_to_*() functions have the same effect as calling settype(). Since you’re working with the same array that was passed in, changing the type of its elements here changes the original variable. In order to avoid this, you need to first make a copy of the zval. To do this, change that for loop again to the following:


    for(zend_hash_internal_pointer_reset_ex(arr_hash, &pointer); zend_hash_get_current_data_ex(arr_hash, (void**) &data, &pointer) == SUCCESS; zend_hash_move_forward_ex(arr_hash, &pointer)) {

        zval temp;

        temp = **data;
        zval_copy_ctor(&temp);
        convert_to_string(&temp);
        PHPWRITE(Z_STRVAL(temp), Z_STRLEN(temp));
        php_printf("\n");
        zval_dtor(&temp);
    }

The more obvious part of this version – temp = **data – just copies the data members of the original zval, but since a zval may contain additional resource allocations like char* strings, or HashTable* arrays, the dependent resources need to be duplicated with zval_copy_ctor(). From there it’s just an ordinary convert, print, and a final zval_dtor() to get rid of the resources used by the copy.
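The same copy, convert, destroy sequence can be shown outside the Engine. In this hypothetical plain C model, mval stands in for a zval holding either a long or a string, and mval_copy_ctor(), mval_convert_to_string() and mval_dtor() play the roles of zval_copy_ctor(), convert_to_string() and zval_dtor(); none of them are real Zend functions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified value: a long or an owned, NUL-terminated string. */
typedef enum { MVAL_LONG, MVAL_STRING } mval_type;

typedef struct {
    mval_type type;
    long lval;
    char *str;      /* owned when type == MVAL_STRING */
    int len;
} mval;

/* Role of zval_copy_ctor(): after a shallow struct copy, give the copy
 * its own string buffer so the two values never share heap memory. */
static void mval_copy_ctor(mval *v)
{
    if (v->type == MVAL_STRING) {
        char *dup = malloc((size_t)v->len + 1);
        assert(dup != NULL);
        memcpy(dup, v->str, (size_t)v->len + 1);
        v->str = dup;
    }
}

/* Role of zval_dtor(): release whatever the value owns. */
static void mval_dtor(mval *v)
{
    if (v->type == MVAL_STRING) {
        free(v->str);
        v->str = NULL;
    }
}

/* Role of convert_to_string(): rewrites the value in place. */
static void mval_convert_to_string(mval *v)
{
    char buf[32];

    if (v->type == MVAL_STRING) {
        return;
    }
    v->len = snprintf(buf, sizeof(buf), "%ld", v->lval);
    v->str = malloc((size_t)v->len + 1);
    assert(v->str != NULL);
    memcpy(v->str, buf, (size_t)v->len + 1);
    v->type = MVAL_STRING;
}
```

The struct assignment alone would leave both values pointing at the same string buffer; the copy constructor is what makes converting and then freeing the temporary safe.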

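If you’re wondering why you didn’t call zval_copy_ctor() when we first introduced convert_to_string() in hello_greetme(), be aware that passing a plain (non-reference) variable to an internal function does not, by itself, give you a private copy: the Engine shares the caller’s zval and increments its refcount, so converting it in place can change the caller’s variable as well. Appending the / modifier to a type code (for example "z/") asks zend_parse_parameters() to separate the zval with SEPARATE_ZVAL_IF_NOT_REF() before handing it to you. Subordinate resources such as array elements and object properties are never separated for you, so they always need to be copied before they are modified.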

Now that you’ve seen array values, let’s extend the exercise a bit by looking at the keys as well:


for(zend_hash_internal_pointer_reset_ex(arr_hash, &pointer); zend_hash_get_current_data_ex(arr_hash, (void**) &data, &pointer) == SUCCESS; zend_hash_move_forward_ex(arr_hash, &pointer)) {

    zval temp;
    char *key;
    uint key_len;   /* string key length, including the trailing NUL */
    ulong index;

    if (zend_hash_get_current_key_ex(arr_hash, &key, &key_len, &index, 0, &pointer) == HASH_KEY_IS_STRING) {
        PHPWRITE(key, key_len - 1);   /* don't print the terminating NUL */
    } else {
        php_printf("%ld", (long)index);
    }

    php_printf(" => ");

    temp = **data;
    zval_copy_ctor(&temp);
    convert_to_string(&temp);
    PHPWRITE(Z_STRVAL(temp), Z_STRLEN(temp));
    php_printf("\n");
    zval_dtor(&temp);
}

Remember that arrays can have numeric indexes, associative string keys, or both. Calling zend_hash_get_current_key_ex() makes it possible to fetch either type from the current position in the array, and determine its type based on the return value, which may be any of HASH_KEY_IS_STRING, HASH_KEY_IS_LONG, or HASH_KEY_NON_EXISTANT. Since zend_hash_get_current_data_ex() was able to return a zval**, you can safely assume that HASH_KEY_NON_EXISTANT will not be returned, so only the IS_STRING and IS_LONG possibilities need to be checked.
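The reset, fetch, advance loop and the two kinds of key can be modelled with a small ordered table. This is a hypothetical plain C sketch: mini_table, mini_pos and the mini_* functions only echo the shape of the zend_hash_*_ex() calls used above (an external cursor, data handed back through a pointer to the stored slot, and a key-type return code); they are not the real API.

```c
#include <assert.h>
#include <stddef.h>

enum { MINI_KEY_IS_STRING = 1, MINI_KEY_IS_LONG, MINI_KEY_NON_EXISTENT };

typedef struct {
    const char *str_key;    /* NULL when the entry has an integer key */
    long num_key;
    int data;
} mini_entry;

typedef struct {
    mini_entry *entries;    /* kept in insertion order */
    size_t count;
} mini_table;

typedef size_t mini_pos;    /* external cursor, playing the role of HashPosition */

static void mini_reset(const mini_table *t, mini_pos *pos)
{
    (void)t;
    *pos = 0;
}

/* 0 while the cursor is on an element (*data then points into the table),
 * -1 once it has moved past the end. */
static int mini_current_data(const mini_table *t, int **data, const mini_pos *pos)
{
    if (*pos >= t->count) {
        return -1;
    }
    *data = &t->entries[*pos].data;
    return 0;
}

/* Reports the current element's key kind and fills the matching out-parameter. */
static int mini_current_key(const mini_table *t, const char **str_key, long *num_key, const mini_pos *pos)
{
    const mini_entry *e;

    if (*pos >= t->count) {
        return MINI_KEY_NON_EXISTENT;
    }
    e = &t->entries[*pos];
    if (e->str_key != NULL) {
        *str_key = e->str_key;
        return MINI_KEY_IS_STRING;
    }
    *num_key = e->num_key;
    return MINI_KEY_IS_LONG;
}

static void mini_move_forward(const mini_table *t, mini_pos *pos)
{
    if (*pos < t->count) {
        (*pos)++;
    }
}
```

Because the cursor lives in the caller's variable rather than inside the table, two loops can walk the same table independently.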

There’s another way to iterate through a HashTable. The Zend Engine exposes three very similar functions to accommodate this task: zend_hash_apply(), zend_hash_apply_with_argument(), and zend_hash_apply_with_arguments(). The first form just loops through a HashTable, the second form allows a single argument to be passed through as a void*, while the third form allows an unlimited number of arguments via a vararg list. hello_array_walk() shows each of these in action:


static int php_hello_array_walk(zval **element TSRMLS_DC)
{
    zval temp;

    temp = **element;
    zval_copy_ctor(&temp);
    convert_to_string(&temp);
    PHPWRITE(Z_STRVAL(temp), Z_STRLEN(temp));
    php_printf("\n");
    zval_dtor(&temp);

    return ZEND_HASH_APPLY_KEEP;
}

static int php_hello_array_walk_arg(zval **element, char *greeting TSRMLS_DC)
{
    php_printf("%s", greeting);
    php_hello_array_walk(element TSRMLS_CC);

    return ZEND_HASH_APPLY_KEEP;
}

static int php_hello_array_walk_args(zval **element, int num_args, va_list args, zend_hash_key *hash_key)
{
    char *prefix = va_arg(args, char*);
    char *suffix = va_arg(args, char*);
    TSRMLS_FETCH();

    php_printf("%s", prefix);
    php_hello_array_walk(element TSRMLS_CC);
    php_printf("%s\n", suffix);

    return ZEND_HASH_APPLY_KEEP;
}

PHP_FUNCTION(hello_array_walk)
{
    zval *zarray;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "a", &zarray) == FAILURE) {
        RETURN_NULL();
    }

    zend_hash_apply(Z_ARRVAL_P(zarray), (apply_func_t)php_hello_array_walk TSRMLS_CC);
    zend_hash_apply_with_argument(Z_ARRVAL_P(zarray), (apply_func_arg_t)php_hello_array_walk_arg, "Hello " TSRMLS_CC);
    zend_hash_apply_with_arguments(Z_ARRVAL_P(zarray), (apply_func_args_t)php_hello_array_walk_args, 2, "Hello ", "Welcome to my extension!");

    RETURN_TRUE;
}

By now you should be familiar enough with the usage of the functions involved that most of the above code will be obvious. The array passed to hello_array_walk() is looped through three times, once with no arguments, once with a single argument, and a third time with two arguments. In this design, the walk_arg() and walk_args() functions actually rely on the no-argument walk() function to do the job of converting and printing the zval, since the job is common across all three.

In this block of code, as in most places where you’ll use zend_hash_apply(), the apply() functions return ZEND_HASH_APPLY_KEEP. This tells the zend_hash_apply() function to leave the element in the HashTable and continue on with the next one. Other values which can be returned here are: ZEND_HASH_APPLY_REMOVE, which does just what it says – removes the current element and continues applying at the next – and ZEND_HASH_APPLY_STOP, which will halt the array walk at the current element and exit the zend_hash_apply() function completely.
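The three return codes amount to a small contract between the walker and the callback. The following hypothetical plain C model (mini_apply() and the APPLY_* constants borrow the meaning of the Zend names, not their values, and operate on an int array rather than a HashTable) keeps, removes, or stops exactly as described.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for ZEND_HASH_APPLY_KEEP / _REMOVE / _STOP. */
enum { APPLY_KEEP, APPLY_REMOVE, APPLY_STOP };

typedef int (*apply_fn)(int *element);

/* Walks items[0..*count-1], compacting removed elements out in place.
 * On APPLY_STOP the current element and the unvisited tail are kept. */
static void mini_apply(int *items, size_t *count, apply_fn fn)
{
    size_t read, write = 0;

    for (read = 0; read < *count; read++) {
        int result = fn(&items[read]);

        if (result == APPLY_STOP) {
            while (read < *count) {
                items[write++] = items[read++];
            }
            break;
        }
        if (result == APPLY_KEEP) {
            items[write++] = items[read];
        }
        /* APPLY_REMOVE: the element is simply not copied forward */
    }
    *count = write;
}

/* Sample callback: halt at the first value >= 100, drop negatives. */
static int drop_negatives_stop_at_100(int *element)
{
    if (*element >= 100) {
        return APPLY_STOP;
    }
    if (*element < 0) {
        return APPLY_REMOVE;
    }
    return APPLY_KEEP;
}
```

With the sample callback, the walk keeps 5 and 7, drops -1, and stops at 100, so -3 and 8 survive because they are never visited.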

The less familiar component in all this is probably TSRMLS_FETCH(). As you may recall from Part One, the TSRMLS_* macros are part of the Thread Safe Resource Management layer, and are necessary to keep one thread from trampling on another. Because the multi-argument version of zend_hash_apply() uses a vararg list, the tsrm_ls marker doesn’t get passed into php_hello_array_walk_args(). In order to recover it for use when calling back into php_hello_array_walk(), the function calls TSRMLS_FETCH(), which performs a lookup to find the correct thread in the resource pool. (Note: This method is substantially slower than passing the argument directly, so use it only when unavoidable.)
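Why the callback's third parameter must be a va_list, and why it sees nothing beyond what was placed in the list, is easiest to see in a standalone model. In this hypothetical plain C sketch (mini_apply_with_arguments() and scale_and_shift() are illustrative, not Zend API), the walker restarts the variable argument list for each element and the callback pulls its arguments out with va_arg(); anything not in that list, such as a thread context, is simply unavailable to it.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

typedef int (*apply_args_fn)(int *element, int num_args, va_list args);

/* Hypothetical walker: forwards its trailing arguments to fn for every element. */
static void mini_apply_with_arguments(int *items, size_t count, apply_args_fn fn, int num_args, ...)
{
    size_t i;

    for (i = 0; i < count; i++) {
        va_list args;

        /* a va_list can be consumed only once, so restart it per element */
        va_start(args, num_args);
        fn(&items[i], num_args, args);
        va_end(args);
    }
}

/* Sample callback: reads a multiplier and an offset from the list. */
static int scale_and_shift(int *element, int num_args, va_list args)
{
    int factor, offset;

    assert(num_args == 2);
    factor = va_arg(args, int);
    offset = va_arg(args, int);
    *element = *element * factor + offset;
    return 0;
}
```

Called as mini_apply_with_arguments(items, 3, scale_and_shift, 2, 10, 5), every element x becomes 10 * x + 5.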

Iterating through an array using this foreach-style approach is a common task, but often you’ll be looking for a specific value in an array by index number or by associative key. This next function will return a value from an array passed in the first parameter based on the offset or key specified in the second parameter.


PHP_FUNCTION(hello_array_value)
{
    zval *zarray, *zoffset, **zvalue;
    long index = 0;
    char *key = NULL;
    int key_len = 0;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "az", &zarray, &zoffset) == FAILURE) {
        RETURN_NULL();
    }

    switch (Z_TYPE_P(zoffset)) {
        case IS_NULL:
            index = 0;
            break;
        case IS_DOUBLE:
            index = (long)Z_DVAL_P(zoffset);
            break;
        case IS_BOOL:
        case IS_LONG:
        case IS_RESOURCE:
            index = Z_LVAL_P(zoffset);
            break;
        case IS_STRING:
            key = Z_STRVAL_P(zoffset);
            key_len = Z_STRLEN_P(zoffset);
            break;
        case IS_ARRAY:
            key = "Array";
            key_len = sizeof("Array") - 1;
            break;
        case IS_OBJECT:
            key = "Object";
            key_len = sizeof("Object") - 1;
            break;
        default:
            key = "Unknown";
            key_len = sizeof("Unknown") - 1;
    }

    if (key && zend_hash_find(Z_ARRVAL_P(zarray), key, key_len + 1, (void**)&zvalue) == FAILURE) {
        RETURN_NULL();
    } else if (!key && zend_hash_index_find(Z_ARRVAL_P(zarray), index, (void**)&zvalue) == FAILURE) {
        RETURN_NULL();
    }

    *return_value = **zvalue;
    zval_copy_ctor(return_value);
    INIT_PZVAL(return_value);   /* reset the refcount/is_ref copied from the source zval */
}

This function starts off with a switch block that treats type conversion in much the same way as the Zend Engine would. NULL is treated as 0, Booleans are treated as their corresponding 0 or 1 values, doubles are cast to longs (and truncated in the process) and resources are cast to their numerical value. The treatment of resource types is a hangover from PHP 3, when resources really were just numbers used in a lookup and not a unique type unto themselves.

Arrays and objects are simply treated as a string literal of “Array” or “Object”, since no honest attempt at conversion would actually make sense. The final default condition is put in as an ultra-careful catchall just in case this extension gets compiled against a future version of PHP, which may have additional data types.

Since key is only set to non-NULL if the function is looking for an associative key, it can use that value to decide whether it should use an associative or index based lookup. If the chosen lookup fails, it’s because the key doesn’t exist, and the function therefore returns NULL to indicate failure. Otherwise that zval is copied into return_value.
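The decision logic of that switch can be isolated into a plain C function. This is a hypothetical sketch: off_value and resolve_offset() are illustrative stand-ins rather than Zend types, resources and objects are left out for brevity, and, like hello_array_value(), it treats every string as an associative key (PHP itself normalizes a numeric string key such as "5" to the integer 5).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mini type tag and value, standing in for a zval. */
typedef enum { OFF_NULL, OFF_BOOL, OFF_LONG, OFF_DOUBLE, OFF_STRING, OFF_ARRAY } off_type;

typedef struct {
    off_type type;
    long lval;          /* OFF_BOOL and OFF_LONG */
    double dval;        /* OFF_DOUBLE */
    const char *str;    /* OFF_STRING */
} off_value;

/* Mirrors the switch in hello_array_value(): sets either *key (associative
 * lookup) or *index (numeric lookup). Returns 1 for a key lookup. */
static int resolve_offset(const off_value *z, long *index, const char **key)
{
    *index = 0;
    *key = NULL;
    switch (z->type) {
        case OFF_NULL:
            *index = 0;
            break;
        case OFF_DOUBLE:
            *index = (long)z->dval;     /* truncates toward zero */
            break;
        case OFF_BOOL:
        case OFF_LONG:
            *index = z->lval;
            break;
        case OFF_STRING:
            *key = z->str;
            break;
        case OFF_ARRAY:
            *key = "Array";
            break;
        default:
            *key = "Unknown";
    }
    return *key != NULL;
}
```

Note that a double such as -3.9 becomes index -3, because C truncates toward zero when converting to an integer type.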


Symbol Tables as Arrays

If you’ve ever used the $GLOBALS array before, you know that every variable you declare and use in the global scope of a PHP script also appears in this array. Since the internal representation of an array is a HashTable, one question comes to mind: “Is there some special place where the GLOBALS array can be found?” The answer is “Yes”. It’s in the Executor Globals structure as EG(symbol_table), which is of type HashTable (not HashTable*, mind you, just HashTable).

You already know how to find associatively keyed elements in an array, and now that you know where to find the global symbol table, it should be a cinch to look up variables from extension code:


PHP_FUNCTION(hello_get_global_var)
{
    char *varname;
    int varname_len;
    zval **varvalue;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &varname, &varname_len) == FAILURE) {
        RETURN_NULL();
    }

    if (zend_hash_find(&EG(symbol_table), varname, varname_len + 1, (void**)&varvalue) == FAILURE) {
        php_error_docref(NULL TSRMLS_CC, E_NOTICE, "Undefined variable: %s", varname);
        RETURN_NULL();
    }

    *return_value = **varvalue;
    zval_copy_ctor(return_value);
    INIT_PZVAL(return_value);   /* reset the refcount/is_ref copied from the source zval */
}

This should all be intimately familiar to you by now. The function accepts a string parameter and uses that to find a variable in the global scope which it returns as a copy.

The one new item here is php_error_docref(). You’ll find this function, or a near sibling thereof, throughout the PHP source tree. The first parameter is an alternate documentation reference (the current function is used by default). Next is the ubiquitous TSRMLS_CC, followed by a severity level for the error, and finally there’s a printf() style format string and associated parameters for the actual text of the error message. It’s important to always provide some kind of meaningful error whenever your function reaches a failure condition. In fact, now would be a good time to go back and add an error statement to hello_array_value(). The Sanity Check section at the end of this tutorial will include these as well.

In addition to the global symbol table, the Zend Engine also keeps track of a reference to the local symbol table. Since internal functions don’t have symbol tables of their own (why would they need one after all?) the local symbol table actually refers to the local scope of the userland function that called the current internal function. Let’s look at a simple function which sets a variable in the local scope:


PHP_FUNCTION(hello_set_local_var)
{
    zval *newvar;
    char *varname;
    int varname_len;
    zval *value;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "sz", &varname, &varname_len, &value) == FAILURE) {
        RETURN_NULL();
    }

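    ALLOC_INIT_ZVAL(newvar);
    *newvar = *value;
    zval_copy_ctor(newvar);
    INIT_PZVAL(newvar);   /* the struct copy overwrote refcount and is_ref */
    /* zend_hash_update() replaces an existing variable; zend_hash_add() would fail and leak newvar */
    zend_hash_update(EG(active_symbol_table), varname, varname_len + 1, &newvar, sizeof(zval*), NULL);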

    RETURN_TRUE;
}

Absolutely nothing new here. Go ahead and build what you’ve got so far and run some test scripts against it. Make sure that what you expect to happen, does happen.


Reference Counting

So far, the only zvals we’ve added to HashTables have been newly created or freshly copied ones. They’ve stood alone, occupying their own resources and living nowhere but that one HashTable. As a language design concept, this approach to creating and copying variables is “good enough”, but since you’re accustomed to programming in C, you know that it’s not uncommon to save memory and CPU time by not copying a large block of data unless you absolutely have to. Consider this userspace block of code:


<?php

    $a = file_get_contents('fourMegabyteLogFile.log');
    
    $b = $a;
    unset($a);

?>

If $a were copied to $b by doing a zval_copy_ctor() (which performs an estrndup() on the string contents in the process) then this short script would actually use up eight megabytes of memory to store two identical copies of the four megabyte file. Unsetting $a in the final step only adds insult to injury, since the original string gets efree()d. Had this been done in C it would have been a simple matter of: b = a; a = NULL;

Fortunately, the Zend Engine is a bit smarter than that. When $a is first created, an underlying zval is made for it of type string, with the contents of the log file. That zval is assigned to the $a variable through a call to zend_hash_add(). When $a is copied to $b, however, the Engine does something similar to the following:


{
    zval **value;

    zend_hash_find(EG(active_symbol_table), "a", sizeof("a"), (void**)&value);

    ZVAL_ADDREF(*value);

    zend_hash_add(EG(active_symbol_table), "b", sizeof("b"), value, sizeof(zval*), NULL);
}

Of course, the real code is much more complex, but the important part to focus on here is ZVAL_ADDREF(). Remember that there are four principal elements in a zval. You’ve already seen type and value; this time you’re working with refcount. As the name may imply, refcount is a counter of the number of times a particular zval is referenced in a symbol table, array, or elsewhere.

When you used ALLOC_INIT_ZVAL(), refcount was set to 1, so you didn’t have to do anything with it in order to return it or add it into a HashTable a single time. In the code block above, you’ve retrieved a zval from a HashTable, but not removed it, so it has a refcount which matches the number of places it’s already referenced from. In order to reference it from another location, you need to increase its reference count.

When the userspace code calls unset($a), the Engine performs a zval_ptr_dtor() on the variable. What you didn’t see in prior uses of zval_ptr_dtor() is that this call doesn’t necessarily destroy the zval and all its contents. What it actually does is decrease its refcount. Only when the refcount reaches zero does the Zend Engine destroy the zval.
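The $a/$b scenario above can be replayed with a toy refcounted value. In this hypothetical plain C sketch, rc_value, rc_alloc_init(), rc_addref() and rc_ptr_dtor() mimic the roles of a zval, ALLOC_INIT_ZVAL(), ZVAL_ADDREF() and zval_ptr_dtor() respectively, using malloc()/free() instead of the Engine's allocator; the destroyed counter exists only so the demonstration can observe when the payload is actually freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted value. Not the Zend API. */
typedef struct {
    char *payload;          /* stands in for the four-megabyte string */
    unsigned int refcount;
} rc_value;

static int destroyed;       /* counts real destructions, for the demo only */

/* Like ALLOC_INIT_ZVAL(): a new value starts with one reference. */
static rc_value *rc_alloc_init(char *payload)
{
    rc_value *v = malloc(sizeof(*v));

    if (v != NULL) {
        v->payload = payload;
        v->refcount = 1;
    }
    return v;
}

/* Like ZVAL_ADDREF(): one more place now refers to the same value. */
static void rc_addref(rc_value *v)
{
    v->refcount++;
}

/* Like zval_ptr_dtor(): drop one reference; free only on the last one. */
static void rc_ptr_dtor(rc_value **vp)
{
    rc_value *v = *vp;

    *vp = NULL;
    if (--v->refcount == 0) {
        free(v->payload);
        free(v);
        destroyed++;
    }
}
```

Assigning shares one payload between both names, the first rc_ptr_dtor() only lowers the count, and the memory goes away when the last reference is dropped.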

Continued on Page 2.

Copyright © Sara Golemon, 2005. All rights reserved.
