This project is no longer maintained and has been archived. |
Searching
Introduction
Searching is a huge topic, hence an entire chapter has been devoted to a
behavior called Searchable
. It is a fulltext indexing and searching
tool. It can be used for indexing and searching both database and files.
Consider we have a class called NewsItem
with the following
definition:
// models/NewsItem.php
class NewsItem extends Doctrine_Record
{
public function setTableDefinition()
{
$this->hasColumn('title', 'string', 255);
$this->hasColumn('body', 'clob');
}
}
Here is the same example in YAML format. You can read more about YAML in the YAML Schema Files chapter:
Now lets say we have an application where users are allowed to search for different news items, an obvious way to implement this would be building a form and based on that forms submitted values build DQL queries such as:
// test.php
// ...
$ q = Doctrine_Query::create()
->from('NewsItem i')
->where('n.title LIKE ? OR n.content LIKE ?');
As the application grows these kind of queries become very slow. For
example when using the previous query with parameters %framework%
and %framework%
(this would be equivalent of 'find all news items
whose title or content contains word 'framework') the database would
have to traverse through each row in the table, which would naturally be
very very slow.
Doctrine solves this with its search component and inverse indexes. First lets alter our definition a bit:
// models/NewsItem.php
class NewsItem extends Doctrine_Record
{
// ...
public function setUp()
{
$this->actAs('Searchable', array(
'fields' => array('title', 'content')
)
);
}
}
Here is the same example in YAML format. You can read more about YAML in the YAML Schema Files chapter:
Lets check the SQL that is generated by the above models:
// test.php
// ...
$ sql = Doctrine_Core::generateSqlFromArray(array('NewsItem'));
echo $sql[0] . "";
echo $sql[1] . "";
echo $sql[2];
The above code would output the following SQL query:
CREATE TABLE news_item_index (id BIGINT,
keyword VARCHAR(200),
field VARCHAR(50),
position BIGINT,
PRIMARY KEY(id,
keyword,
field,
position)) ENGINE = INNODB
CREATE TABLE news_item (id BIGINT AUTO_INCREMENT,
title VARCHAR(255),
body LONGTEXT,
PRIMARY KEY(id)) ENGINE = INNODB
ALTER TABLE news_item_index ADD FOREIGN KEY (id) REFERENCES news_item(id) ON UPDATE CASCADE ON DELETE CASCADE
Here we tell Doctrine that NewsItem
class acts as searchable
(internally Doctrine loads Doctrine_Template_Searchable) and
fields title
and content
are marked as fulltext indexed fields.
This means that every time a NewsItem
is added or updated Doctrine
will:
- Update the inverse search index or
- Add new pending entry to the inverse search index (sometimes it can be efficient to update the inverse search index in batches)
Index structure
The structure of the inverse index Doctrine uses is the following:
[ (string) keyword] [ (string) field ] [ (integer) position ] [ (mixed) [foreign_keys] ]
Column | Description |
---|---|
keyword |
The keyword in the text that can be searched for. |
field |
The field where the keyword was found. |
position |
The position where the keyword was found. |
[foreign_keys] |
The foreign keys of the record being indexed. |
In the NewsItem
example the [foreign_keys]
would simply contain
one field named id
with foreign key references to NewsItem(id)
and with onDelete => CASCADE
constraint.
An example row in this table might look something like:
keyword | field | position | id |
---|---|---|---|
database |
title |
3 |
1 |
In this example the word database
is the third word of the title
field of NewsItem
with id of 1
.
Index Building
Whenever a searchable record is being inserted into database Doctrine executes the index building procedure. This happens in the background as the procedure is being invoked by the search listener. The phases of this procedure are:
- Analyze the text using a Doctrine_Search_Analyzer based class
- Insert new rows into index table for all analyzed keywords
Sometimes you may not want to update the index table directly when new searchable entries are added. Rather you may want to batch update the index table in certain intervals. For disabling the direct update functionality you'll need to set the batchUpdates option to true when you attach the behavior:
// models/NewsItem.php
class NewsItem extends Doctrine_Record
{
// ...
public function setUp()
{
$this->actAs('Searchable', array(
'fields' => array('title', 'content')
'batchUpdates' => true
)
);
}
}
Here is the same example in YAML format. You can read more about YAML in the YAML Schema Files chapter:
The actual batch updating procedure can be invoked with the
batchUpdateIndex()
method. It takes two optional arguments:
limit
and offset
. Limit can be used for limiting the number of
batch indexed entries while the offset can be used for setting the first
entry to start the indexing from.
First lets insert a new NewsItem
records:
// test.php
// ...
$ newsItem = new NewsItem();
$ newsItem->title = 'Test';
$ newsItem->body = 'test';
$ newsItem->save();
If you don't have batch updates enabled then the index will
be automatically updated for you when you insert or update
|
// test.php
// ...
$ newsItemTable = Doctrine_Core::getTable('NewsItem');
$ newsItemTable->batchUpdateIndex();
Text Analyzers
By default Doctrine uses Doctrine_Search_Analyzer_Standard for analyzing the text. This class performs the following things:
- Strips out stop-keywords (such as 'and', 'if' etc.) As many commonly used words such as 'and', 'if' etc. have no relevance for the search, they are being stripped out in order to keep the index size reasonable.
- Makes all keywords lowercased. When searching words 'database' and 'DataBase' are considered equal by the standard analyzer, hence the standard analyzer lowercases all keywords.
- Replaces all non alpha-numeric marks with whitespace. In normal text many keywords might contain non alpha-numeric chars after them, for example 'database.'. The standard analyzer strips these out so that 'database' matches 'database.'.
- Replaces all quotation marks with empty strings so that "O'Connor" matches "oconnor"
You can write your own analyzer class by making a class that implements
Doctrine_Search_Analyzer_Interface. Here is an example where we
create an analyzer named MyAnalyzer
:
// models/MyAnalyzer.php
class MyAnalyzer implements Doctrine_Search_Analyzer_Interface
{
public function analyze($text)
{
$text = trim($text);
return $text;
}
}
The search analyzers must only contain one method named
|
This analyzer can then be applied to the search object as follows:
// test.php
// ...
$ newsItemTable = Doctrine_Core::getTable('NewsItem');
$ search = $newsItemTable
->getTemplate('Doctrine_Template_Searchable')
->getPlugin();
$ search->setOption('analyzer', new MyAnalyzer());
Query language
Doctrine_Search provides a query language similar to Apache Lucene. The Doctrine_Search_Query converts human readable, easy-to-construct search queries to their complex DQL equivalents which are then converted to SQL like normal.
Performing Searches
Here is a simple example to retrieve the record ids and relevance data.
// test.php
// ...
$ newsItemTable = Doctrine_Core::getTable('NewsItem');
$ results = $newsItemTable->search('test');
print_r($results);
The above code executes the following query:
SELECT
COUNT(keyword) AS relevance,
id
FROM article_index
WHERE id IN (SELECT
id
FROM article_index WHERE keyword = ?)
AND id IN (SELECT
id
FROM article_index
WHERE keyword = ?)
GROUP BY id
ORDER BY relevance DESC
The output of the code above would be the following:
$ php test.php
Array
(
[0] => Array
(
[relevance] => 1
[id] => 1
)
)
Now you can use those results in another query to retrieve the actual
NewsItem
objects:
// test.php
// ...
$ ids = array();
foreach ($results as $result) {
$ids[] = $result['id'];
}
$ q = Doctrine_Query::create()
->from('NewsItem i')
->whereIn('i.id', $ids);
$ newsItems = $q->execute();
print_r($newsItems->toArray());
The above example would produce the following output:
$ php test.php
Array
(
[0] => Array
(
[id] => 1
[title] => Test
[body] => test
)
)
You can optionally pass the search()
function a query object to
modify with a where condition subquery to limit the results using the
search index.
// test.php
// ...
$ q = Doctrine_Query::create()
->from('NewsItem i');
$ q = Doctrine_Core::getTable('Article')
->search('test', $q);
echo $q->getSqlQuery();
The above call to getSql()
would output the following SQL query:
SELECT
n.id AS n**id,
n.title AS n**title,
n.body AS n__body
FROM news_item n
WHERE n.id IN (SELECT
id
FROM news_item_index
WHERE keyword = ?
GROUP BY id)
Now we can execute the query and get the NewsItem
objects:
// test.php
// ...
$ newsItems = $q->execute();
print_r($newsItems->toArray());
The above example would produce the following output:
$ php test.php
Array
(
[0] => Array
(
[id] => 1
[title] => Test
[body] => test
)
)
File searches
As stated before Doctrine_Search can also be used for searching files. Lets say we have a directory which we want to be searchable. First we need to create an instance of Doctrine_Search_File which is a child of Doctrine_Search providing some extra functionality needed for the file searches.
// test.php
// ...
$ search = new Doctrine_Search_File();
Second thing to do is to generate the index table. By default Doctrine
names the database index class as FileIndex
.
Lets check the SQL that is generated by the above models created:
// test.php
// ...
$ sql = Doctrine_Core::generateSqlFromArray(array('FileIndex'));
The above code would output the following SQL query:
CREATE TABLE file_index (url VARCHAR(255),
keyword VARCHAR(200),
field VARCHAR(50),
position BIGINT,
PRIMARY KEY(url,
keyword,
field,
position)) ENGINE = INNODB
You can create the actual table in the database by using the
Doctrine_Core::createTablesFromArray()
method:
// test.php
// ...
Doctrine_Core::createTablesFromArray(array('FileIndex'));
Now we can start using the file searcher. In this example lets just
index the models
directory:
// test.php
// ...
$ search->indexDirectory('models');
The indexDirectory()
iterates recursively through given directory
and analyzes all files within it updating the index table as necessary.
Finally we can start searching for pieces of text within the indexed files:
// test.php
// ...
$ results = $search->search('hasColumn');
print_r($results);
The above example would produce the following output:
$ php test.php
Array
(
[0] => Array
(
[relevance] => 2
[url] => models/generated/BaseNewsItem.php
)
)
Conclusion
Now that we have learned all about the Searchable
behavior we are
ready to learn in more detail about the NestedSet
behavior in the
Hierarchical Data chapter. The NestedSet
is a large
topic like the Searchable
behavior so it got its own dedicated
chapter as well.