Simple Parser Example
Extend the Doctrine\Common\Lexer\AbstractLexer class and implement the getCatchablePatterns, getNonCatchablePatterns, and getType methods. Here is a very simple example lexer implementation named CharacterTypeLexer. It tokenizes a string into T_UPPER, T_LOWER, and T_NUMBER tokens:
<?php

use Doctrine\Common\Lexer\AbstractLexer;

/**
 * @extends AbstractLexer<CharacterTypeLexer::T_*, string>
 */
class CharacterTypeLexer extends AbstractLexer
{
    const T_UPPER  = 1;
    const T_LOWER  = 2;
    const T_NUMBER = 3;

    // Each single letter or digit becomes its own token.
    protected function getCatchablePatterns(): array
    {
        return [
            '[a-zA-Z0-9]',
        ];
    }

    // No patterns are ignored in this example.
    protected function getNonCatchablePatterns(): array
    {
        return [];
    }

    protected function getType(&$value): int
    {
        if (is_numeric($value)) {
            return self::T_NUMBER;
        }

        if (strtoupper($value) === $value) {
            return self::T_UPPER;
        }

        // Anything else is treated as lower case, so every path returns an int.
        return self::T_LOWER;
    }
}
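To make the roles of the catchable pattern and getType more concrete, the tokenizing step can be approximated without the library. The sketch below is a stand-alone illustration, not doctrine/lexer code: the class name CharacterTypeSketch and its methods are hypothetical, and it assumes the lexer splits its input on the catchable pattern roughly the way preg_split() does. The library's actual internals may differ.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-alone sketch; it does not use doctrine/lexer.
final class CharacterTypeSketch
{
    public const T_UPPER  = 1;
    public const T_LOWER  = 2;
    public const T_NUMBER = 3;

    /** Classify one chunk with the same checks as CharacterTypeLexer::getType(). */
    public static function classify(string $value): int
    {
        if (is_numeric($value)) {
            return self::T_NUMBER;
        }

        if (strtoupper($value) === $value) {
            return self::T_UPPER;
        }

        return self::T_LOWER;
    }

    /**
     * Split the input on the catchable pattern and pair each chunk with its type.
     *
     * @return list<array{value: string, type: int}>
     */
    public static function tokenize(string $input): array
    {
        // The capture group makes preg_split() return the matched characters
        // themselves; PREG_SPLIT_NO_EMPTY drops the empty strings between them.
        $chunks = preg_split(
            '/([a-zA-Z0-9])/',
            $input,
            -1,
            PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
        );

        if ($chunks === false) {
            throw new RuntimeException('preg_split() failed');
        }

        $tokens = [];
        foreach ($chunks as $chunk) {
            $tokens[] = ['value' => $chunk, 'type' => self::classify($chunk)];
        }

        return $tokens;
    }
}

// Usage: keep only the values classified as upper case.
$upper = [];
foreach (CharacterTypeSketch::tokenize('1aBcdEfgHiJ12') as $token) {
    if ($token['type'] === CharacterTypeSketch::T_UPPER) {
        $upper[] = $token['value'];
    }
}

echo implode(', ', $upper), PHP_EOL; // prints "B, E, H, J"
```

Because the pattern matches exactly one character, every letter or digit ends up as its own chunk, and the classification only ever has to decide between a digit, an upper case letter, and a lower case letter.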
Use CharacterTypeLexer to extract an array of upper case characters:
<?php

class UpperCaseCharacterExtracter
{
    public function __construct(private CharacterTypeLexer $lexer)
    {
    }

    /** @return list<string> */
    public function getUpperCaseCharacters(string $string): array
    {
        $this->lexer->setInput($string);
        // Prime the lexer: after this call, lookahead holds the first token.
        $this->lexer->moveNext();

        $upperCaseChars = [];
        while (true) {
            // A null lookahead means the input is exhausted.
            if (!$this->lexer->lookahead) {
                break;
            }

            // Advance: the previous lookahead becomes the current token.
            $this->lexer->moveNext();

            if ($this->lexer->token->isA(CharacterTypeLexer::T_UPPER)) {
                $upperCaseChars[] = $this->lexer->token->value;
            }
        }

        return $upperCaseChars;
    }
}

$upperCaseCharacterExtractor = new UpperCaseCharacterExtracter(new CharacterTypeLexer());
$upperCaseCharacters = $upperCaseCharacterExtractor->getUpperCaseCharacters('1aBcdEfgHiJ12');

print_r($upperCaseCharacters);
The variable $upperCaseCharacters contains all of the upper case characters from the input string: B, E, H, and J.
This is a simple example, but it demonstrates the low-level API that can be used to build more complex parsers.