```lua
#({1, 2, 3, 4, 5})              -- evaluates to 5, as expected

#({[1] = 1, [2] = 2, [5] = 5})  -- evaluates to 2, since only 1 and 2 are adjacent to each other
                                -- this is equivalent to table.insert({1,2}, 5, 5)

#({[1] = 1, [2] = 2, [4] = 4})  -- however, evaluates to 4...
```
In fact, if you have the following table

```lua
{[1] = 1, [2] = 2, [4] = 4, [8] = 8, [16] = 16, [32] = 32, [64] = 64}
```

its length will be 64.
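A quick way to try this yourself (strictly speaking, the # operator only guarantees a "border", that is, an index i where t[i] is non-nil and t[i + 1] is nil, so the exact value can vary by Lua version):

```lua
local t = {[1] = 1, [2] = 2, [4] = 4, [8] = 8, [16] = 16, [32] = 32, [64] = 64}
print(#t)  -- 64, even though only 7 keys are set
```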
This comes in part from the design of Lua. Since tables are used as both hash tables and arrays, there must be a way to hash Lua objects while still being able to efficiently calculate the length of the array portion of a table, even when it is populated in random order.
For example, one might think that the natural way of tracking the number of elements in an array is to keep a counter and, on each assignment, check whether the new index is one greater than the current length. However, this will not work in the following example:
```lua
arr = {1, 2}
arr[4] = 4 -- not counted as part of the "array", so do not increase the length counter
arr[3] = 3 -- 3 is #arr + 1, so we increase the length counter, which in effect means that
           -- {1, 2, 3, 4}, when populated in this order, will return 3 as its length
```
This means that every time we assign a numerical key to a table, we MUST recalculate the length of the table from scratch. This is a very expensive operation for large arrays: to populate an array with n items we need 1 + 2 + ... + n = n*(n+1)/2 operations, which means that a table with 1000 elements in it will take C*1001*500 operations for some constant C, and in general C*(n+1)*(n/2), which grows as c*n^2 asymptotically.
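To make the cost concrete, here is a sketch of what that from-scratch recalculation looks like (naive_len is a made-up name for this illustration, not part of the Lua API):

```lua
-- Count consecutive integer keys starting from 1 until the first hole.
local function naive_len(t)
  local n = 0
  while t[n + 1] ~= nil do
    n = n + 1
  end
  return n
end

local t = {}
for i = 1, 1000 do
  t[i] = i
  assert(naive_len(t) == i)  -- O(i) probes on the i-th assignment
end
-- Total work: 1 + 2 + ... + 1000 = 1001 * 500 probes, i.e. quadratic growth.
```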
This is VERY BAD, which is why, if you need to populate data, you should use table.insert (which is natively optimized) rather than assigning keys in random order. But if you have to, then keep the following in mind:
in order to bring the total cost closer to linear, they implemented a bounds-checking algorithm that returns the length with time complexity
T(n) = T(n/2) + O(1)
which is roughly c*ln(n). So for n objects populated at random, the total time complexity is C*n*ln(n), which is exactly what we would expect because, in effect, we're just sorting a random table.
Anyways, the trade-off for this speed gain is the unexpected behavior you see in the examples above. The binary-search bounds-checking algorithm looks for the upper bound of the array in powers of two. Since it's EXTREMELY unlikely for a table to be randomly populated with consecutive powers of two, this is seen as an acceptable trade-off.
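As a rough illustration, the search looks something like this (a plain-Lua sketch, not the actual C implementation inside Lua; find_border is a name made up for this example):

```lua
-- Find a "border": an index i such that t[i] ~= nil and t[i + 1] == nil,
-- using O(log n) probes.
local function find_border(t)
  if t[1] == nil then return 0 end
  local i, j = 1, 2
  -- double j until we overshoot the populated region
  while t[j] ~= nil do
    i = j
    j = j * 2
  end
  -- binary search between i (non-nil) and j (nil)
  while j - i > 1 do
    local m = math.floor((i + j) / 2)
    if t[m] ~= nil then
      i = m
    else
      j = m
    end
  end
  return i
end

print(find_border({1, 2, 3, 4, 5}))              --> 5
print(find_border({[1] = 1, [2] = 2, [4] = 4}))  --> 4, the same surprise as above
```

Because the doubling phase only ever probes powers of two, the table of consecutive powers of two shown earlier never hits a nil until the probe jumps past 64, which is why its reported length is 64.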
For the most part, the only practical use affected by this is a table like:

```lua
tab = {1, 2, [4] = 4}
```
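Even here, the length operator only promises to return a border, so a quick check can report either value depending on the Lua version and how the array and hash parts are laid out internally:

```lua
tab = {1, 2, [4] = 4}
print(#tab)  -- some border of the table: either 2 or 4
```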