Ever wanted to iterate over a huge number of objects and found
that find(:all)
doesn’t cut it? Then this plugin
might be for you. Install it with
$ script/plugin install svn://rubyforge.org/var/svn/ar-enumerable/ar_enumerable/trunk
First, a note of caution. This plugin has not seen much practical
use yet, in its current state it is only slightly above a
proof-of-concept. There are several tests included that cover most
of the functionality. If you intend to use this plugin, please
make sure to independently check that it does what you want.
Enumerating ActiveRecord objects
Let’s start with a boring example, to get the general idea
Person.where(['lastname = ?', 'Smith']).map(&:firstname)
Slightly more interesting
Person.where(['lastname = ?', 'Smith']).inject(0) do |income, person| income += person.income end
siblings = Person.where( ['lastname = ?', 'Smith'], :include => :siblings).inject([0, 0]) do |memo, person| memo[0] += person.siblings.count memo[1] += 1 memo end siblings[0].to_f / siblings[1]
The point here is that your database may contain millions of Smiths,
for what it’s worth, the entire population of the earth could have
turned into Smiths, still, only a limited number of them would be
loaded into memory at the same time.
Internally, enumeration works in two completely different ways.
By default, a chunk of objects is retrieved from the database using
OFFSET and LIMIT clauses. This way is supported by all databases,
however it can be slow.
Therefore, the preferred alternative is to use a standard SQL cursor
for iterating over the result set of a query. It would simply look like
this
Person.where(['lastname = ?', 'Smith'], :use_cursor => true).inject(0) do |income, person| income += person.income end
A further parameter is how many records (or rows, in the case of cursors)
are fetched from the database in one access. The default is 1000. You can
set it explicitly like this
Person.where(['lastname = ?', 'Smith'], :fetch_count => 100).inject(0) do |income, person| income += person.income end
Performance Considerations
Some informal tests suggest that for large result sets without
included associations cursor-based enumeration can be faster than
OFFSET/LIMIT-based enumeration by a factor of 3 to 5.
When associations are included, things become more blurry. In this
case, cursor-based enumeration imposes an ordering on the result
set that can slow down things considerably.
The fetch count has considerable influence on performance, basically,
you can buy speed by spending memory.