Update: On 8 Feb 2008 (exactly 1 day after my post), Evan merged an update into Ultrasphinx where "Records of the same class now load as a group" (version 1.8.1 revision 1661). Call that agility :-)

Until revision 1650 the behavior was as described below:
Ultrasphinx returns all the objects that are found in a full text search. Fine, ... but in reality, we only show a few short fields from the table and do not show the long descriptions in the list view. Also, Ultrasphinx by default fetches all the job objects one by one from the databases, yielding the behavior as shown in these kind of logs.

From the query log of the full text search engine (sphinx):

[Thu Feb 7 15:14:01.005 2008] 0.000 sec [ext/0/attr- 385 (0,20)] [complete] bat*
[Thu Feb 7 15:14:08.397 2008] 0.000 sec [ext/0/attr- 144 (0,20)] [complete] batc*
[Thu Feb 7 15:15:25.422 2008] 0.000 sec [ext/0/attr- 144 (0,20)] [complete] batch*
[Thu Feb 7 15:16:56.056 2008] 0.000 sec [ext/0/attr- 144 (0,20)] [complete] batc*

Searches in a 17,000 records set of test data, yielding hundreds of records, cost less than 1 ms (< 0.001 s). But then reading the full details of the 20 first matches with Active Record from mysql costs, 20 times 2 to 3 milliseconds:

Search Query: batc*
** ultrasphinx: searching for {"class_names"=>[], "sort_by"=>"sort_date", "parsed_query"=>"batc*", "sort_mode"=>"descending", "filters"=>{}, "weights"=>{}, "per_page"=>20, "query"=>"batc*", "page"=>1, "facets"=>[]}
** ultrasphinx: search returned 144/144 in 0.0 seconds.
Job Columns (0.003786) SHOW FIELDS FROM `jobs`
Job Load (0.002685) SELECT * FROM `jobs` WHERE (`jobs`.`id` = 16686)
Job Load (0.002705) SELECT * FROM `jobs` WHERE (`jobs`.`id` = 16345)
Job Load (0.002685) SELECT * FROM `jobs` WHERE (`jobs`.`id` = 16126)
Job Load (0.002787) SELECT * FROM `jobs` WHERE (`jobs`.`id` = 16057)
Job Load (0.002614) SELECT * FROM `jobs` WHERE (`jobs`.`id` = 15989)
Job Load (0.002569) SELECT * FROM `jobs` WHERE (`jobs`.`id` = 15912)
Job Load (0.002583) SELECT * FROM `jobs` WHERE (`jobs`.`id` = 15913)
Job Load (0.003253) SELECT * FROM `jobs` WHERE (`jobs`.`id` = 15570)
Job Load (0.003113) SELECT * FROM `jobs` WHERE (`jobs`.`id` = 15445)
Job Load (0.003097) SELECT * FROM `jobs` WHERE (`jobs`.`id` = 15251)
Job Load (0.003092) SELECT * FROM `jobs` WHERE (`jobs`.`id` = 14758)
Job Load (0.003164) SELECT * FROM `jobs` WHERE (`jobs`.`id` = 14725)
Job Load (0.003161) SELECT * FROM `jobs` WHERE (`jobs`.`id` = 14727)
Job Load (0.003274) SELECT * FROM `jobs` WHERE (`jobs`.`id` = 14734)
Job Load (0.003106) SELECT * FROM `jobs` WHERE (`jobs`.`id` = 14482)
Job Load (0.003334) SELECT * FROM `jobs` WHERE (`jobs`.`id` = 13903)
Job Load (0.003261) SELECT * FROM `jobs` WHERE (`jobs`.`id` = 13914)
Job Load (0.003236) SELECT * FROM `jobs` WHERE (`jobs`.`id` = 13805)
Job Load (0.003024) SELECT * FROM `jobs` WHERE (`jobs`.`id` = 13804)
Job Load (0.003349) SELECT * FROM `jobs` WHERE (`jobs`.`id` = 13390)
Completed in 0.21281 (4 reqs/sec) | Rendering: 0.00009 (0%) | DB: 0.06969 (32%) | 200 OK [http://debian-new/jobs/search]

Rendering was set to :nothing => true, to no skew the results with rendering timing.

Turns out that Ultrasphinx has an option (reify = false) to the search.run method, that makes the method only return the ID's of the found jobs. With that option turned on, the development.log becomes:

Search Query: batc*
** ultrasphinx: searching for {"class_names"=>[], "sort_by"=>"sort_date", "parsed_query"=>"batc*", "sort_mode"=>"descending", "filters"=>{}, "weights"=>{}, "per_page"=>20, "query"=>"batc*", "page"=>1, "facets"=>[]}
** ultrasphinx: search returned 144/144 in 0.0 seconds.
Completed in 0.03075 (32 reqs/sec) | Rendering: 0.00008 (0%) | DB: 0.00593 (19%) | 200 OK [http://debian-new/jobs/search]

A 10 fold reduction of database access times from 69 ms to 6 ms :-)

Of course, we still need some actual data from the 20 job objects (from the records in the database), but that are only a few short columns (no "text" fields, only "strings"). And now I refer to the interesting paper on better use of mysql from Rails. So, I implemented the advice to always use a ":select => [ fieldlist ]".

First off, I start with grouping the find for all records in a find call to the database:

@search.run(false)
@job_ids=@search.results
@job_ids_tmp = []
# job_ids seems to contain an array of ["Job", id] items
for job_id in @job_ids
@job_ids_tmp << job_id[1]
end
@job_ids = @job_ids_tmp
@jobs = Job.find(@job_ids)

yielding:

Search Query: batch*
** ultrasphinx: searching for {"class_names"=>[], "sort_by"=>"sort_date", "parsed_query"=>"batch*", "sort_mode"=>"descending", "filters"=>{}, "weights"=>{}, "per_page"=>20, "query"=>"batch*", "page"=>1, "facets"=>[]}
** ultrasphinx: search returned 144/144 in 0.0 seconds.
Job Columns (0.003798) SHOW FIELDS FROM `jobs`
Job Load (0.007347) SELECT * FROM `jobs` WHERE (`jobs`.`id` IN (16686,16345,16126,16057,15989,15912,15913,15570,15445,15251,14758,14725,14727,14734,14482,13903,13914,13805,13804,13390))
Completed in 0.05351 (18 reqs/sec) | Rendering: 0.00009 (0%) | DB: 0.01704 (31%) | 200 OK [http://debian-new/jobs/search]

Reducing the search to 2 searches with a total of approx. 17 ms.

Next step is to reduce the select * to only the fields we really require for the list view in the search page:

@jobs = Job.find(@job_ids, :select => "id, active, ...")

Leading to a further reduction from 17 to 12 ms for this search (actually, not as significant as I had expected ...).

** ultrasphinx: searching for {"class_names"=>[], "sort_by"=>"sort_date", "parsed_query"=>"batch*", "sort_mode"=>"descending", "filters"=>{}, "weights"=>{}, "per_page"=>20, "query"=>"batch*", "page"=>1, "facets"=>[]}
** ultrasphinx: search returned 144/144 in 0.0 seconds.
Job Columns (0.003944) SHOW FIELDS FROM `jobs`
Job Load (0.002425) SELECT id, active, ... FROM `jobs` WHERE (`jobs`.`id` IN (16686,16345,16126,16057,15989,15912,15913,15570,15445,15251,14758,14725,14727,14734,14482,13903,13914,13805,13804,13390))
Completed in 0.04617 (21 reqs/sec) | Rendering: 0.00008 (0%) | DB: 0.01221 (26%) | 200 OK [http://debian-new/jobs/search]

When I include rendering, the global timing now becomes:

Search Query: batc*
** ultrasphinx: searching for {"class_names"=>[], "sort_by"=>"sort_date", "parsed_query"=>"batc*", "sort_mode"=>"descending", "filters"=>{}, "weights"=>{}, "per_page"=>20, "query"=>"batc*", "page"=>1, "facets"=>[]}
** ultrasphinx: search returned 144/144 in 0.0 seconds.
Job Columns (0.003757) SHOW FIELDS FROM `jobs`
Job Load (0.002497) SELECT id, active, ...
FROM `jobs` WHERE (`jobs`.`id` IN (16686,16345,16126,16057,15989,15912,15913,15570,15445,15251,14758,14725,14727,14734,14482,13903,13914,13805,13804,13390))
Rendered jobs/_search (0.01136)
Completed in 0.05846 (17 reqs/sec) | Rendering: 0.01195 (20%) | DB: 0.01253 (21%) | 200 OK [http://debian-new/jobs/search]

Caveat: tests are run in "development" mode (not production) and with the Rails included mysql adapter. So, we tried the production mysql (C++) adapter:

root@debian-new:~# gem install -r mysql
Need to update 4 gems from http://gems.rubyforge.org
....
complete
Select which gem to install for your platform (i486-linux)
1. mysql 2.7.3 (mswin32)
2. mysql 2.7.1 (mswin32)
3. mysql 2.7 (ruby)
4. mysql 2.6 (ruby)
5. mysql 2.5.1 (ruby)
6. Cancel installation
> 3
...
Building native extensions. This could take a while...
[...]
make clean
Successfully installed mysql-2.7
root@debian-new:~#

Now the warning:

WARNING: You're using the Ruby-based MySQL library that ships with Rails. This library is not suited for production. Please install the C-based MySQL library instead (gem install mysql).

is not longer shown.

Speed of the mysql queries improved a further factor of 3 :-)

Processing JobsController#search (for 192.168.0.67 at 2008-02-07 16:40:21) [POST]
Session ID: BAh7CDoOcmV0dXJuX3RvMCIKZmxhc2hJQzonQWN0aW9uQ29udHJvbGxlcjo6%0ARmxhc2g6OkZsYXNoSGFzaHsABjoKQHVzZWR7ADoPYWNjb3VudF9pZGkG--c88f17a31067c4086501a65f545ebf47474b6ca7
Parameters: {"search_address"=>"", "search_region"=>"", "action"=>"search", "controller"=>"jobs", "search_job"=>"batc"}
Account Columns (0.001801) SHOW FIELDS FROM `accounts`
Account Load (0.000164) SELECT * FROM `accounts` WHERE (`accounts`.`id` = 1)
Search Query: batc*
** ultrasphinx: searching for {"class_names"=>[], "sort_by"=>"sort_date", "parsed_query"=>"batc*", "sort_mode"=>"descending", "filters"=>{}, "weights"=>{}, "per_page"=>20, "query"=>"batc*", "page"=>1, "facets"=>[]}
** ultrasphinx: search returned 144/144 in 0.0 seconds.
Job Columns (0.001926) SHOW FIELDS FROM `jobs`
Job Load (0.000145) SELECT id, active, ... FROM `jobs` WHERE (`jobs`.`id` IN (16686,16345,16126,16057,15989,15912,15913,15570,15445,15251,14758,14725,14727,14734,14482,13903,13914,13805,13804,13390))
Rendered jobs/_search (0.01112)
Completed in 0.04905 (20 reqs/sec) | Rendering: 0.01170 (23%) | DB: 0.00404 (8%) | 200 OK [http://debian-new/jobs/search]

Now the rendering is clearly the dominant time consumer, and mysql access and sphinx search time being minimized. Overall, the time spent in mysql interactions was reduced from 69 ms to 4 ms :-)