ColdFusion Muse

Memory usage on unevenly populated arrays

It sounds like a dull topic. Right up there with "planning for retirement for 20 year olds". However, it certainly is important to understand some of the nuances of arrays and structures because how you use them (and your choices on which one to use can greatly affect the future scalablility of your application.

I saw an excellent discussion recently on an email list. I owe this information on arrays to Pete Freitag and Joe Rinehart. The question posed was, is the "length" of the array strictly tied to the number of members in the array. The reason this is important is because arrays are widely used for reasons of speed. Since an array is a list of references (points) it usually ranks as the fastest way to "loop" through data. I say "usually" to cover my butt. I actually don't know of any faster way in any language - although I'm sure the excellent and knowledgable readers of this blog will happily point some out to me (ha). Anyway, consider this code...

r/>

<cfscript>
   // initialize an array
   foo = arrayNew();
   // set a member at position 1
   foo[1]   =   'fooMember1';
   // set a member at position 200
   foo[200]   =   'fooMember200';
</cfscript>

The value of arrayLength(foo) is 200, but if I try to use the value in say... position 20 (foo[20]) I get an error like "the variable at position 200 of array foo is undedfined". Does that mean that in reality my array is only taking up enough room for 2 variables since I have only 2 populated members? The answer appears to be no. The explanation is as follows:

Coldfusion MX uses a Java array as the holder for the array and as such it resizes the array to hold pointers for all the possible positions up to the last populated member (200 in the example above). Since the pointers are all null until they are defined they throw an error, but they DO appear to exist. This simple test from Joe Rinehart illustrates it nicely. In a controlled environment Joe watched the heap closely and ran these 2 bits of code.

<!--- Testing with 2 sequential values --->
<cfset foo = arrayNew(1) />
<cfset foo[1] = "var1" />
<cfset foo[2] = "var2" />
<!--- about 1k of memory used --->
<!--- --->
<!--- Testing with 2 populated members and non sequential --->
<cfset foo = arrayNew(1) />
<cfset foo[1] = "var1" />
<cfset foo[20000000] = "var2" />
<!--- About 40k of memory used. --->

Obviously it appears that the second example resized the array to 20 million null pointers minus 2. The lesson here is probably that arrays are not a panacea. I've noted that some developers use arrays so religiously that they end up writing ridiculously un-maintainable code as a result. Actually, you can use a structure to do a "sparsely populated" array and make the keys and structure mimic an array. Pete Freitag offers this tricky little piece of code

<cfset a = structnew()>
<cfset a[1] = "var 2">
<cfset a[10000] = "var 2">
<cfoutput>Size of datastructure #a.size()#</cfoutput>

The size in this case is 2 - not 10000. Why? Because it's not actually an array - it's a structure with 2 keys "1" and "10000". Incidentally, you can handle the values in this case as #a.1#, #a[1]# or #a["1"]# - and they would all work. That's pretty neat and the "structSort( )" function coupled with the NUMERIC sort type would allow you to sort the values as if they were an array. Note however, that the naming convention of this structure (making an object mimic an array) is in itself a little confusing because it looks like an array but it isn't. It probably goes against many coding standards - which usually require an object variable to begin with an alpha character.

  • Share:

3 Comments


Leave this field empty

Write a comment

If you subscribe, any new posts to this thread will be sent to your email address.

  • Craig M. Rosenblum's Gravatar
    Posted By
    Craig M. Rosenblum | 5/25/05 5:52 PM
    Wow, nice article, i love boring stuff like this.

    I tend to drool over charts indicating which modularity is best for speed. So this is right up there.

    I mean we want to store data at one point and at another retrieve it. Sometimes with knowing the pointers, and sometimes not. Which makes it difficult to decide which to use.

    However lately, I've had more fun using querysetcell's then i can do queries against the data, and doing sorts in sql code, which i find more natural for some strange reason.

    Back in the day, we had all kinds of charts comparing the memory usage and speed of each method used in coding.

    Very good blog posting!
  • Paul Roe's Gravatar
    Posted By
    Paul Roe | 5/27/05 4:55 PM
    I think that you ment:

    Incidentally, you can handle the values in this case as #a.1#, #a[1]# or #a["1"]# - and they would all work.

    Because #a(1)# or #a("1")# accessing a structure using paren's will throw an error.
  • mkruger's Gravatar
    Posted By
    mkruger | 5/27/05 5:24 PM
    Ah! You are right. I have corrected it.