    Autovivification: What is it and why do I care?

    Date Published: 2001-02-16

    At some point on the learning curve, every Perl hacker has to manage complex data structures such as hashes of hashes and lists of lists. They usually get the hang of it with help from perllol, perldsc, some good books, usenet, #perl and whatever other resources they can find. But one subtle Perl feature seems to trip many of them up, and that is the subject of this tutorial.

    Let's say you create a data structure like this:

    [Code]
    $HoH = {
        'foo' => {
            'x' => 23,
        },
        'bar' => {
            'y' => 18,
        },
    } ;
    

    We can print this using Data::Dumper.

    [Code]
    use Data::Dumper ;
                                                                          
    
    print Dumper $HoH ;

    and we see:

    [Output]
    $VAR1 = {
              'foo' => {
                         'x' => 23
                       },
              'bar' => {
                         'y' => 18
                       }
            };
    

    which is what we expect.

    But now we try to see if there is an entry for $HoH->{'baz'}{'z'}, which we know doesn't exist. And we are smart enough to test it with exists:

    [Code]
    print "baz->z doesn't exist\n" 
        unless exists $HoH->{'baz'}{'z'} ;
    
    print Dumper $HoH ;
    

    But when we look at the data structure again we see:

    [Output]
    $VAR1 = {
              'foo' => {
                         'x' => 23
                       },
              'baz' => {},
              'bar' => {
                         'y' => 18
                       }
            };
    

    Where did that 'baz' entry come from? We never created it. Or did we?

    What happened is that Perl saw that $HoH->{'baz'} was being used as a hash reference (referring to a hash with 'z' as the key) and that $HoH->{'baz'} was not defined (actually it doesn't exist either) so Perl created it for you. That is called autovivification which means bringing to life automagically!
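
    As a minimal sketch of that explanation, the following counts the top level keys before and after the exists test. The variable names $before, $after and $found are only illustrative and are not part of the original code.

```perl
use strict ;
use warnings ;

my $HoH = {
    'foo' => { 'x' => 23 },
    'bar' => { 'y' => 18 },
} ;

my $before = keys %{$HoH} ;

# the test itself is false, but evaluating it dereferences $HoH->{'baz'}
my $found = exists( $HoH->{'baz'}{'z'} ) ? 1 : 0 ;

my $after = keys %{$HoH} ;

print "found: $found\n" ;
print "top level keys before: $before, after: $after\n" ;

# only the intermediate level was created, not the 'z' key we asked about
print "'baz' now exists as an empty hash\n"
    if exists $HoH->{'baz'} and not %{ $HoH->{'baz'} } ;
```

    The key we tested for is still missing; only the level above it was brought to life.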

    Here is the same concept but with anonymous arrays instead of hashes:

    [Code]
    $LoL =	[
    	[ 2, 4, 6 ],
    	[ 3, 5, 7 ],
    ] ;
    
    print Dumper $LoL ;
    

    [Output]
    $VAR1 = [
              [
                2,
                4,
                6
              ],
              [
                3,
                5,
                7
              ]
            ];
    

    [Code]
    print "[2][1] isn't defined\n" 
        unless defined $LoL->[2][1] ;
    

    [Output]
    [2][1] isn't defined
    

    [Code]
    print Dumper $LoL ;
    

    [Output]
    $VAR1 = [
              [
                2,
                4,
                6
              ],
              [
                3,
                5,
                7
              ],
              []
            ];
    

    Notice the anonymous array created in $LoL->[2]! It just got autovivified because the code assumed it had to exist and Perl created it for you.
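
    The defined call is not what triggers this. As a small sketch (the names $rows_before, $rows_after and $value are assumptions made for this illustration), a plain read of a nested element has the same effect on the outer array:

```perl
use strict ;
use warnings ;

my $LoL = [
    [ 2, 4, 6 ],
    [ 3, 5, 7 ],
] ;

my $rows_before = @{$LoL} ;

# no defined or exists here, only a read of a nested element
my $value = $LoL->[2][1] ;

my $rows_after = @{$LoL} ;

print "rows before: $rows_before, after: $rows_after\n" ;
print "value is undefined\n" unless defined $value ;

# the missing row was created so that Perl could look inside it
print "row 2 is an empty array\n"
    if ref( $LoL->[2] ) eq 'ARRAY' and @{ $LoL->[2] } == 0 ;
```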

    Here is another example which is a common idiom and confuses some newbies:

    [Code]
    $list_ref = undef ;
    push @{$list_ref}, 1 .. 4 ;
    
    print Dumper $list_ref ;
    

    [Output]
    $VAR1 = [
              1,
              2,
              3,
              4
            ];
    

    Note that undef is only assigned to $list_ref for this example. In normal code it would probably be a my'ed variable and start out undefined. Without autovivification you would have to assign an empty anonymous array to $list_ref first.

    [Code]
    $list_ref = [] ;
    push @{$list_ref}, 1 .. 4 ;
    

    A variant on that would be:

    [Code]
    push @{$list_ref ||= []}, 1 .. 4 ;
    

    That initializes $list_ref to [] if it is false (most likely it was undefined as in the above cases).

    It is still cleaner, and most likely faster, to let Perl do the defined test and initialization with [] for you.
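
    A typical place where this idiom shows up is building a hash of lists. The sketch below groups words by their first letter; the data and the %by_letter name are made up for the example.

```perl
use strict ;
use warnings ;

my @words = qw( apple avocado banana blueberry cherry ) ;

my %by_letter ;

foreach my $word ( @words ) {

    my $letter = substr( $word, 0, 1 ) ;

    # no check for an existing list; the first push for a letter creates it
    push @{ $by_letter{$letter} }, $word ;
}

foreach my $letter ( sort keys %by_letter ) {

    print "$letter: @{ $by_letter{$letter} }\n" ;
}
```

    Without autovivification the loop body would need its own test and an assignment of [] for each new letter.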

    Autovivification even works on references to scalars:

    [Code]
    my $scalar_ref = undef ;
    ${$scalar_ref} = 'i am referred to' ;
    
    print "ref $scalar_ref value [${$scalar_ref}]\n" ;
    

    Now is the time for some explanation of what is happening under the hood. Autovivification of references only occurs when you dereference an undefined value. If the value is defined but is not a reference of the proper type, Perl will treat it as a symbolic reference, which is not what you want. Remember, symbolic references are black magic and should only be used in very few cases and never by newbies. You should be using strict, which disables symbolic references and thereby detects the error of dereferencing a variable which has a value other than undef or a proper reference.
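
    To make the difference concrete, here is a small sketch of both cases under strict. The variable $not_a_ref and its value are invented for the illustration, and the eval is only there so the script can report the error instead of dying.

```perl
use strict ;
use warnings ;

# undefined value: autovivified into a scalar reference
my $scalar_ref ;
${$scalar_ref} = 'vivified' ;

print "type: ", ref( $scalar_ref ), "\n" ;

# defined non-reference value: strict refs refuses to use it as a name
my $not_a_ref = 'some_name' ;

my $ok = eval { push @{$not_a_ref}, 1 .. 4 ; 1 } ;

print "strict caught it: $@" unless $ok ;
```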

    So Perl first evaluates a dereference expression and sees that the current reference value is undefined. It notes the type of dereference (scalar, array or hash) and allocates an anonymous reference of that type. Perl then stores that new reference value where the undefined value was stored. Then the dereference operation in progress is continued. If you do a nested dereference expression, then each level from top to bottom can cause its own autovivification. Look at this:

    [Code]
    $deep_ref = undef ;
    
    $deep_ref->{'foo'}{'bar'}[1]{'baz'} = 1 ;
    
    print Dumper $deep_ref ;
    

    [Output]
    $VAR1 = {
              'foo' => {
                         'bar' => [
                                    undef,
                                    {
                                      'baz' => 1
                                    }
                                  ]
                       }
            };
    

    Four anonymous references were created there by autovivification working from the top level with $deep_ref all the way down to the hash that has 'baz' for its only key.
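
    One way to check that count is to ask ref for the type at each level. This is only a sketch; the @levels list of label and value pairs is a device for printing, not something from the original code.

```perl
use strict ;
use warnings ;

my $deep_ref ;

$deep_ref->{'foo'}{'bar'}[1]{'baz'} = 1 ;

# one entry per reference that Perl created on the way down
my @levels = (
    [ '$deep_ref'                => $deep_ref ],
    [ '$deep_ref->{foo}'         => $deep_ref->{'foo'} ],
    [ '$deep_ref->{foo}{bar}'    => $deep_ref->{'foo'}{'bar'} ],
    [ '$deep_ref->{foo}{bar}[1]' => $deep_ref->{'foo'}{'bar'}[1] ],
) ;

foreach my $level ( @levels ) {

    my( $name, $ref ) = @{$level} ;

    print "$name is a ", ref( $ref ), " reference\n" ;
}

# index 0 was never assigned; it is only a gap in front of index 1
print "element 0 is undefined\n"
    unless defined $deep_ref->{'foo'}{'bar'}[0] ;
```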

    This last example illustrates the power and primary use of autovivification. If you wanted to assign the lowest level hash before the higher levels existed, without autovivification, you would have to do the loop yourself, testing each level and creating it if needed as you went down. The call would have to take a list of pairs - reference type and index or key. You could simplify it by restricting it to one type:

    [Code]
    sub deep_hash_assign {
    
        my( $ref_ref, $val, @keys ) = @_ ;
    
        unless ( @keys ) {
            warn "deep_hash_assign: no keys" ;
            return ;
        }
    
        foreach my $key ( @keys ) {
    
            my $ref = ${$ref_ref} ;
    
    # this is the autoviv step
            unless ( defined( $ref ) ) {
    
                $ref = { $key => undef } ;
                ${$ref_ref} = $ref ;
            }
    
    # this checks we have a valid hash ref as a current value
    
            unless ( ref $ref eq 'HASH' and exists( $ref->{ $key } ) ) {
    
                warn "deep_hash_assign: not a hash ref at $key in @keys" ;
                return ;
            }
    
    # this points to the next level down the hash tree
    
            $ref_ref = \$ref->{ $key } ;
    
        }
    
        ${$ref_ref} = $val ;
    }
    
    
    $deep_ref2 = undef ;
    
    deep_hash_assign( \$deep_ref2, 17, qw( foo bar baz ) ) ;
    
    print Dumper $deep_ref2 ;
    
    $deep_ref2 = undef ;
    
    deep_hash_assign( \$deep_ref2, 17 ) ;
    

    As you can see, that sub is not very robust (for example, it gives up with a warning when it meets an existing hash that lacks the next key), is clumsy to use and is probably a lot slower than having Perl do it for you. Also it can't handle a mix of hashes and arrays. To do that you would have to also specify hash or array along with each key or index.
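
    For comparison, here is a rough sketch of what the mixed version might look like. The sub name deep_assign and its calling convention (each step is a pair of 'HASH' or 'ARRAY' and a key or index) are hypothetical and are not taken from auto.pl.

```perl
use strict ;
use warnings ;

# hypothetical helper, not from auto.pl
# each step is a pair: [ 'HASH', $key ] or [ 'ARRAY', $index ]
sub deep_assign {

    my( $ref_ref, $val, @steps ) = @_ ;

    return unless @steps ;

    foreach my $step ( @steps ) {

        my( $type, $key ) = @{$step} ;

        return unless $type eq 'HASH' or $type eq 'ARRAY' ;

        my $ref = ${$ref_ref} ;

        # the manual autoviv step: create the missing level of that type
        unless ( defined $ref ) {

            $ref = $type eq 'HASH' ? {} : [] ;
            ${$ref_ref} = $ref ;
        }

        # refuse to walk through a value of the wrong type
        return unless ref( $ref ) eq $type ;

        # point at the slot one level down
        $ref_ref = $type eq 'HASH' ? \$ref->{$key} : \$ref->[$key] ;
    }

    ${$ref_ref} = $val ;

    return 1 ;
}

my $tree ;

deep_assign( \$tree, 17, [ HASH => 'foo' ], [ ARRAY => 1 ], [ HASH => 'baz' ] ) ;

print "value: $tree->{'foo'}[1]{'baz'}\n" ;
```

    Even in this small form the caller has to spell out the type of every level, which is exactly the bookkeeping that autovivification does for you.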

    So autovivification saves code and trouble when assigning deep into a data structure, but why does it also happen when using exists and defined? Many people think that exists and defined should fail at the first level they can. Let's look at exists and defined again with this code:

    [Code]
    %hash = (
    	'foo'	=> 3,
    ) ;
    
    print Dumper \%hash ;
    
    if ( exists( $hash{'bar'}{'baz'} ) ) {
    	print "{'bar'}{'baz'} exists\n" ;
    }
    
    print Dumper \%hash ;
    

    The second dump shows a 'bar' => {} entry in %hash that we never assigned. (A defined test on an array element, such as defined( $hash{'array'}[0] ), would add an 'array' => [] entry the same way.) Where did it come from? Well, the way Perl works, exists and defined do not provide any special contexts to their expressions. So if their expression would autovivify, it will happen before the exists or defined test occurs. This issue has been argued heavily in various fora including p5p, but it won't be changed as too much code works with the current behavior. It is the way Perl treats it and you can't directly get around it. The Perl 6 design discussions have covered this; Perl 6 may do something to support it, possibly controlled by a pragma. But there are still gray areas. If you take a reference deep into a tree where autovivification would be triggered, does passing that to an exists call stop it from happening? Similarly, if you pass a potentially autovivified expression to a sub which may only call defined on it, should that work as it does now?
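
    For a short, fixed path, one indirect way around it is to test each level yourself and let && stop at the first one that is missing. This is a minimal sketch; $found is just an illustrative name.

```perl
use strict ;
use warnings ;

my %hash = ( 'foo' => 3 ) ;

# the && stops at the first missing level, so 'bar' is never dereferenced
my $found = ( exists $hash{'bar'} && exists $hash{'bar'}{'baz'} ) ? 1 : 0 ;

print "found: $found\n" ;
print "keys: @{[ sort keys %hash ]}\n" ;
```

    This only guards against missing levels. A defined value in the path that is not a hash reference is a separate problem, and the chain of tests gets long quickly as the path gets deeper.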

    Here is a sub you can use to test for existence of a key at any level and it will not trigger autovivification:

    [Code]
    sub deep_exists {
    
        my( $hash_ref, @keys ) = @_ ;
    
        unless ( @keys ) {
    
            warn "deep_exists: no keys" ;
            return ;
        }
    
        foreach my $key ( @keys ) {
    
            unless( ref $hash_ref eq 'HASH' ) {
    
                warn "$hash_ref not a HASH ref" ;
                return ;
            }
    
    
            return 0 unless exists( $hash_ref->{$key} ) ;
    
            $hash_ref = $hash_ref->{$key} ;
        }
    
        return 1 ;
    }
    
    %exist_hash = (
    
        'foo'    => {
            'bar'    => 3
        }
    ) ;
    
    print "\$exist_hash{foo}{bar} exists\n"
            if deep_exists( \%exist_hash, qw( foo bar ) ) ;
    
    
    print "\$exist_hash{foo}{bar}{baz} doesn't exist\n"
            unless deep_exists( \%exist_hash, qw( foo bar baz ) ) ;
    
    print Dumper \%exist_hash ;
    

    [Output]
    $VAR1 = {
              'foo' => {
                         'bar' => 3
                       }
            };
    

    Notice that the data structure did not get modified as we didn't trigger autovivification and we exited as soon as an exists call failed. Also it returns 0 on normal failure and undef on detecting an error.

    That sub only works on hashes of hashes and it tests with exists. Here it is, modified to work with hashes or arrays and it uses defined for the test:

    [Code]
    sub deep_defined {
    
        my( $ref, @keys ) = @_ ;
    
        unless ( @keys ) {
    
            warn "deep_defined: no keys" ;
            return ;
        }
    
        foreach my $key ( @keys ) {
    
            if( ref $ref eq 'HASH' ) {
    
    # fail when the key doesn't exist or its value is not defined
    
                return unless defined( $ref->{$key} ) ;
    
                $ref = $ref->{$key} ;
                next ;
            }
    
            if( ref $ref eq 'ARRAY' ) {
    
    # fail when the index is out of range or is not defined
    
                return unless 0 <= $key && $key < @{$ref} ;
    
                return unless defined( $ref->[$key] ) ;
    
                $ref = $ref->[$key] ;
                next ;
            }
    
    # fail when the current level is not a hash or array ref
    
            return ;
        }
    
        return 1 ;
    }
    
    my $defined_tree = {
    
        'foo'    => [
            {
                'bar'    => 3,
                'baz'    => 'four',
            },
            {
                'bar'    => 5,
                'baz'    => 'six',
            }
        ],
        'oof'    => [
            {
                'bar'    => 7,
                'baz'    => 'eight',
            },
            {
                'bar'    => 9,
            }
        ],
    } ;
    
    print "\$defined_tree->{foo}[0]{bar} is defined\n"
            if deep_defined( $defined_tree, 'foo', 0, 'bar' ) ;
    
    
    print "\$defined_tree->{oof}[1]{baz} isn't defined\n"
            unless deep_defined( $defined_tree, 'oof', 1, 'baz' ) ;
    
    print "\$defined_tree->{goof}[1]{baz} isn't defined\n"
            unless deep_defined( $defined_tree, 'goof', 1, 'baz' ) ;
    
    print DumperX $defined_tree ;
    

    [Output]
    $defined_tree->{foo}[0]{bar} is defined
    $defined_tree->{oof}[1]{baz} isn't defined
    $defined_tree->{goof}[1]{baz} isn't defined
    $VAR1 = {
              'oof' => [
                         {
                           'baz' => 'eight',
                           'bar' => 7
                         },
                         {
                           'bar' => 9
                         }
                       ],
              'foo' => [
                         {
                           'baz' => 'four',
                           'bar' => 3
                         },
                         {
                           'baz' => 'six',
                           'bar' => 5
                         }
                       ]
            };
    

    As you can see it works and doesn't autovivify higher levels as it returns when it doesn't find a reference. It is a cleaner subroutine than deep_hash_assign since it can see what there is at each level and do the right thing.

    So to review the concept, autovivification happens when Perl automatically creates a reference of the appropriate type when an undefined scalar value is dereferenced. It is a useful concept and is used in many programs. If Perl didn't do it, you would have to resort to clumsier code and special subroutines to create the new levels of your data structures. Some complain it shouldn't happen with exists or defined, but the sub to work around that is not tricky to create or use. There is interest in having those two operations not autovivify in Perl 6, but that is not certain.

    Note: all the above code is also in the file auto.pl. It uses Data::Dumper and the DumperX sub as regular Dumper seems to have problems with this code.

     

    About the Author:

    Uri Guttman co-authored the award winning paper, "A Fresh Look at Efficient Perl Sorting", presented at the 3rd Perl Conference in August, 1999; was a technical reviewer of Object Oriented Perl by Damian Conway and was a past Technical editor of The Perl Journal. He is also an active participant in the comp.lang.perl.misc newsgroup and boston.pm, the local chapter of Perl Mongers.

     
     

