Hash and an array as a default value

Recently, I needed to schedule a set of operations over a fixed number of slots so that the number of operations in each slot is uniform-ish [1]. Because the number of slots was fixed, it made sense to initialize an array so that each item would have a default empty list of operations (thus avoiding the nil test when adding operations to slots).

schedule = Array.new(number_of_slots) { [] }

This worked nicely for a time, but later a requirement came in to split the operations into groups while keeping a single schedule that balances the total number of operations in each slot regardless of their group. The natural step was to change schedule from an array of arrays into an array of hashes of arrays.

schedule = Array.new(number_of_slots) { Hash.new([]) }

However, this led me to a very strange place. It can be boiled down to this piece of code:

# Create two hashes with an array as the default value
h1 = Hash.new([])
h2 = Hash.new { |hash, key| hash[key] = [] }

# Test that the hashes are empty
puts h1.inspect
# => {}
puts h2.inspect
# => {}

# Add a value to both hashes
h1[:foo] << 42
h2[:foo] << 42

# Verify the values are there
puts h1[:foo].inspect
# => [42]
puts h2[:foo].inspect
# => [42]

# Print the hashes
puts h1.inspect
# => {}
puts h2.inspect
# => {:foo=>[42]}

Where did the value in h1 go? Back to the Ruby documentation:

new(obj) → new_hash … If obj is specified, this single object will be used for all default values.

Ruby 2.2.1 documentation

This says (despite the strangely placed emphasis) that the same object is used as the default for every missing key of the hash. That’s not a problem for immutable values such as integers, but for a mutable object like an array it is.

puts h1[:bar].inspect
# => [42]
puts h2[:bar].inspect
# => []

So now h1 under the key :bar returns the same values as were inserted under :foo. h2 behaves as expected and returns an empty array. Let’s try one more experiment to make sure.

h1[:zoo] << 616
puts h1[:bar].inspect
# => [42, 616]

Now it should be obvious. Every time a missing key is looked up, the same default array is returned. Appending to that result modifies the default array itself. The hash never gets any key assigned, so it appears empty when inspected, yet it returns the shared array for any key. If I ever assigned to a key of the hash (instead of appending to the returned list), that key would actually get set.
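To make the explanation above concrete, here is a small sketch that inspects the shared default directly. It uses only Hash#default, Hash#key? and Object#equal? from the core library; the keys :foo, :bar and :baz are arbitrary.

```ruby
# A single default object lives on the hash itself, not under any key.
h1 = Hash.new([])
h1[:foo] << 42                   # mutates the default array; no key is created

puts h1.default.inspect          # [42]
puts h1[:foo].equal?(h1[:bar])   # true: both lookups return the very same object
puts h1.key?(:foo)               # false

# Assignment, by contrast, stores a real entry under the key.
h1[:baz] += [7]                  # builds a new array from the default, then assigns it
puts h1.keys.inspect             # [:baz]
puts h1[:baz].inspect            # [42, 7]
```

Note that the assigned array starts from whatever the shared default contained at that moment, which is why 42 shows up under :baz as well.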

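Therefore the takeaway is: if you need a Hash whose default value is an Array (or any other mutable object), always use the more verbose block form. Immutable values such as integers and symbols can use the shorter syntax. Strings are mutable in Ruby, so they are safe as a shared default only if they are frozen or never modified in place (for example with <<).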
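Applied to the schedule from the beginning, the fix might look roughly like the sketch below. The helper name build_schedule, the group names and the operation labels are made up for illustration and are not taken from the real code.

```ruby
# Hypothetical helper: builds one hash per slot, each with its own
# per-key default array.
def build_schedule(number_of_slots)
  Array.new(number_of_slots) do
    # The block runs on every missing-key lookup and stores a fresh array.
    Hash.new { |hash, group| hash[group] = [] }
  end
end

schedule = build_schedule(3)
schedule[0][:backup] << "op-1"
schedule[1][:backup] << "op-2"
schedule[0][:report] << "op-3"

puts schedule[0][:backup].inspect   # ["op-1"]
puts schedule[1][:backup].inspect   # ["op-2"]
puts schedule[2].empty?             # true
```

One trade-off of the block form: merely reading a missing key also stores an empty array under it, so use key? or fetch when you only want to check whether a group is present.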

Plus, read the docs carefully, even when they are strangely emphasized.

  1. Each operation had a different periodicity, which makes the problem much harder to solve. Luckily, an approximate solution was sufficient for the use case. Still, I’d like to know if a perfect solution can be found in a reasonable time.
