Flattening Arrays (The GNU Awk User’s Guide)


17.4.11.3 Working With All The Elements of an Array

To flatten an array is to create a structure that represents the full array in a fashion that makes it easy for C code to traverse the entire array. Some of the code in extension/testext.c does this, and also serves as a nice example showing how to use the APIs.

We walk through that part of the code one step at a time. First, the gawk script that drives the test extension:

@load "testext"
BEGIN {
    n = split("blacky rusty sophie raincloud lucky", pets)
    printf("pets has %d elements\n", length(pets))
    ret = dump_array_and_delete("pets", "3")
    printf("dump_array_and_delete(pets) returned %d\n", ret)
    if ("3" in pets)
        printf("dump_array_and_delete() did NOT remove index \"3\"!\n")
    else
        printf("dump_array_and_delete() did remove index \"3\"!\n")
    print ""
}

This code creates an array with split() (see String-Manipulation Functions) and then calls dump_array_and_delete(). That function looks up the array whose name is passed as the first argument, and deletes the element at the index passed in the second argument. The awk code then prints the return value and checks if the element was indeed deleted.

Here is the C code that implements dump_array_and_delete(). It has been edited slightly for presentation.

The first part declares variables, sets up the default return value in result, and checks that the function was called with the correct number of arguments:

static awk_value_t *
dump_array_and_delete(int nargs, awk_value_t *result)
{
    awk_value_t value, value2, value3;
    awk_flat_array_t *flat_array;
    size_t count;
    char *name;
    int i;

    assert(result != NULL);
    make_number(0.0, result);

    if (nargs != 2) {
        printf("dump_array_and_delete: nargs not right "
               "(%d should be 2)\n", nargs);
        goto out;
    }

The function then proceeds in steps, as follows. First, retrieve the name of the array, passed as the first argument, and then use that name to look up the array itself. If either operation fails, print an error message and return:

    /* get argument named array as flat array and print it */
    if (get_argument(0, AWK_STRING, & value)) {
        name = value.str_value.str;
        if (sym_lookup(name, AWK_ARRAY, & value2))
            printf("dump_array_and_delete: sym_lookup of %s passed\n",
                   name);
        else {
            printf("dump_array_and_delete: sym_lookup of %s failed\n",
                   name);
            goto out;
        }
    } else {
        printf("dump_array_and_delete: get_argument(0) failed\n");
        goto out;
    }

For testing purposes and to make sure that the C code sees the same number of elements as the awk code, the second step is to get the count of elements in the array and print it:

    if (! get_element_count(value2.array_cookie, & count)) {
        printf("dump_array_and_delete: get_element_count failed\n");
        goto out;
    }

    printf("dump_array_and_delete: incoming size is %lu\n",
           (unsigned long) count);
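
The cast to unsigned long paired with %lu is a common way to print a size_t on compilers that predate C99. As a small aside, here is a hedged, self-contained sketch comparing that idiom with the C99 %zu conversion. The helper names format_count_lu() and format_count_zu() are hypothetical and do not appear in testext.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format a size_t by casting it and using %lu. */
static int
format_count_lu(char *buf, size_t buflen, size_t count)
{
    assert(buf != NULL);
    return snprintf(buf, buflen, "%lu", (unsigned long) count);
}

/* Hypothetical helper: format a size_t with C99's %zu; no cast is needed. */
static int
format_count_zu(char *buf, size_t buflen, size_t count)
{
    assert(buf != NULL);
    return snprintf(buf, buflen, "%zu", count);
}
```

Both helpers produce the same text for values that fit in an unsigned long. On platforms where unsigned long is narrower than size_t, the cast can truncate very large counts, which is one reason to prefer %zu when C99 is available.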

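The third step is to actually flatten the array, requesting string indices (AWK_STRING) and leaving the element values in whatever type they already have (AWK_UNDEFINED), and then to double-check that the count in the awk_flat_array_t is the same as the count just retrieved: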

    if (! flatten_array_typed(value2.array_cookie, & flat_array,
                              AWK_STRING, AWK_UNDEFINED)) {
        printf("dump_array_and_delete: could not flatten array\n");
        goto out;
    }

    if (flat_array->count != count) {
        printf("dump_array_and_delete: flat_array->count (%lu)"
               " != count (%lu)\n",
                (unsigned long) flat_array->count,
                (unsigned long) count);
        goto out;
    }

The fourth step is to retrieve the index of the element to be deleted, which was passed as the second argument. Remember that argument numbers passed to get_argument() are zero-based, and thus the second argument is numbered one:

    if (! get_argument(1, AWK_STRING, & value3)) {
        printf("dump_array_and_delete: get_argument(1) failed\n");
        goto out;
    }

The fifth step is where the “real work” is done. The function loops over every element in the array, printing the index and element values. In addition, upon finding the element with the index that is supposed to be deleted, the function sets the AWK_ELEMENT_DELETE bit in the flags field of the element. When the array is released, gawk traverses the flattened array, and deletes any elements that have this flag bit set:

    for (i = 0; i < flat_array->count; i++) {
        printf("\t%s[\"%.*s\"] = %s\n",
            name,
            (int) flat_array->elements[i].index.str_value.len,
            flat_array->elements[i].index.str_value.str,
            valrep2str(& flat_array->elements[i].value));

        if (strcmp(value3.str_value.str,
                   flat_array->elements[i].index.str_value.str) == 0) {
            flat_array->elements[i].flags |= AWK_ELEMENT_DELETE;
            printf("dump_array_and_delete: marking element \"%s\" "
                   "for deletion\n",
                flat_array->elements[i].index.str_value.str);
        }
    }
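
The loop above prints each index using its stored length (%.*s) but compares indices with strcmp(), which stops at the first NUL byte. If an index could ever contain an embedded NUL character, a comparison that uses the length as well would be safer. The following is a minimal sketch under that assumption. The type counted_str_t and the function counted_str_equal() are hypothetical stand-ins modeled on the str/len pair used above, not part of the gawk API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer-plus-length string; NOT a gawk API type. */
typedef struct {
    const char *str;   /* bytes of the string */
    size_t len;        /* number of bytes, not counting any terminator */
} counted_str_t;

/* Return 1 if both strings have the same length and the same bytes, else 0. */
static int
counted_str_equal(const counted_str_t *a, const counted_str_t *b)
{
    assert(a != NULL && b != NULL);
    if (a->len != b->len)
        return 0;
    return memcmp(a->str, b->str, a->len) == 0;
}
```

With this helper, two indices that share a prefix up to a NUL byte but differ afterwards compare as different, whereas strcmp() would report them as equal.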

The sixth step is to release the flattened array. This tells gawk that the extension is no longer using the array, and that it should delete any elements marked for deletion. gawk also frees any storage that was allocated, so you should not use the pointer (flat_array in this code) once you have called release_flattened_array():

    if (! release_flattened_array(value2.array_cookie, flat_array)) {
        printf("dump_array_and_delete: could not release flattened array\n");
        goto out;
    }
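
The mark-then-release behavior of the fifth and sixth steps can be modeled without gawk at all. The following is a minimal, self-contained sketch. The names toy_element_t, toy_flat_t, TOY_ELEMENT_DELETE, toy_mark(), and toy_release() are hypothetical stand-ins invented for illustration and are not part of the gawk API, and the sketch makes no claim about how gawk implements release internally. It only shows the idea that marking an element changes nothing until the flattened view is released:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; NOT the gawk API types. */
#define TOY_ELEMENT_DELETE 0x01

typedef struct {
    const char *index;   /* element subscript */
    const char *value;   /* element value */
    unsigned flags;      /* TOY_ELEMENT_DELETE marks the element */
} toy_element_t;

typedef struct {
    size_t count;              /* number of entries in elements */
    toy_element_t *elements;   /* contiguous, easy to traverse */
} toy_flat_t;

/* Mark every element whose subscript equals index; return how many matched. */
static size_t
toy_mark(toy_flat_t *flat, const char *index)
{
    size_t i, marked = 0;

    assert(flat != NULL && index != NULL);
    for (i = 0; i < flat->count; i++) {
        if (strcmp(flat->elements[i].index, index) == 0) {
            flat->elements[i].flags |= TOY_ELEMENT_DELETE;
            marked++;
        }
    }
    return marked;
}

/* "Release" the view: drop marked elements in place; return the new count. */
static size_t
toy_release(toy_flat_t *flat)
{
    size_t i, kept = 0;

    assert(flat != NULL);
    for (i = 0; i < flat->count; i++) {
        if ((flat->elements[i].flags & TOY_ELEMENT_DELETE) == 0)
            flat->elements[kept++] = flat->elements[i];
    }
    flat->count = kept;
    return kept;
}
```

Calling toy_mark() on a five-element view leaves the count at five. Only toy_release() removes the marked entry, which mirrors the order of events in the walk-through.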

Finally, because everything was successful, the function sets the return value to 1 to indicate success, and returns:

    make_number(1.0, result);
out:
    return result;
}

Here is the output from running this part of the test:

pets has 5 elements
dump_array_and_delete: sym_lookup of pets passed
dump_array_and_delete: incoming size is 5
        pets["1"] = "blacky"
        pets["2"] = "rusty"
        pets["3"] = "sophie"
dump_array_and_delete: marking element "3" for deletion
        pets["4"] = "raincloud"
        pets["5"] = "lucky"
dump_array_and_delete(pets) returned 1
dump_array_and_delete() did remove index "3"!
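
The overall layout of dump_array_and_delete() (set a failure result first, jump to a single exit label when a check fails, and set success only at the very end) is a general C idiom. Here is a stripped-down sketch of that layout. The function run_steps() and its checks are hypothetical and exist only to show the control flow:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical function showing the "assume failure, goto out" layout.
 * Stores 1 in *result only if every check passes, 0 otherwise,
 * and returns result in either case.
 */
static int *
run_steps(int nargs, const char *name, int *result)
{
    assert(result != NULL);
    *result = 0;    /* default return value: failure */

    if (nargs != 2) {
        printf("run_steps: nargs not right (%d should be 2)\n", nargs);
        goto out;
    }

    if (name == NULL || name[0] == '\0') {
        printf("run_steps: missing array name\n");
        goto out;
    }

    *result = 1;    /* reached only when no check failed */
out:
    return result;
}
```

Because every failure path goes through the same label, any cleanup that must always happen can be placed once, after out:.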