Parakeet
Overview
Parakeet is a .NET parser combinator library focused on Parsing Expression Grammars (PEG). It provides a declarative approach to defining grammars using C# classes and rules, where each grammar rule is a property that combines primitive parsers with combinators like sequence, choice, repetition, and lookahead. Parakeet generates parse trees from input text, which can then be traversed for interpretation or transformation. The library emphasizes simplicity and readability over raw performance.
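To make the combinator vocabulary concrete, here is a minimal, library-independent sketch of sequence, ordered choice, repetition, and negative lookahead as plain C# functions. None of these helpers (`Ch`, `Seq`, `Or`, `ZeroOrMore`, `Not`) are Parakeet's API; they are hypothetical names used only to illustrate the PEG semantics described above. A parser here is a function from an input string and a position to a new position, or -1 on failure.

```csharp
using System;
using Parser = System.Func<string, int, int>;

// Illustrative helpers only; these are NOT Parakeet's API.

// Match one character satisfying a predicate.
Parser Ch(Func<char, bool> pred) =>
    (s, i) => i < s.Length && pred(s[i]) ? i + 1 : -1;

// Sequence: run a, then run b from where a stopped.
Parser Seq(Parser a, Parser b) => (s, i) =>
{
    int j = a(s, i);
    return j < 0 ? -1 : b(s, j);
};

// Ordered choice: try a; only if it fails, try b from the same position.
Parser Or(Parser a, Parser b) => (s, i) =>
{
    int j = a(s, i);
    return j >= 0 ? j : b(s, i);
};

// Repetition: apply p greedily until it fails or stops consuming input.
Parser ZeroOrMore(Parser p) => (s, i) =>
{
    while (true)
    {
        int j = p(s, i);
        if (j <= i) return i; // failure (-1) or no progress
        i = j;
    }
};

// Negative lookahead: succeed without consuming input when p does NOT match.
Parser Not(Parser p) => (s, i) => p(s, i) < 0 ? i : -1;

Parser digit = Ch(char.IsDigit);
Parser integer = Seq(digit, ZeroOrMore(digit));
Parser intNotFollowedByLetter = Seq(integer, Not(Ch(char.IsLetter)));

Console.WriteLine(integer("123abc", 0));                // 3
Console.WriteLine(intNotFollowedByLetter("123abc", 0)); // -1
Console.WriteLine(intNotFollowedByLetter("123 ", 0));   // 3
```

The lookahead case shows the key PEG property: `Not` inspects the input but returns the position it was given, so nothing is consumed.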
NuGet Package
Parakeet: core parser combinator library
Grammar Definition
using Parakeet;

// Define a grammar by inheriting from Grammar
public class ArithmeticGrammar : Grammar
{
    // Primitive rules
    public Rule Digit => MatchChar(char.IsDigit);
    public Rule Letter => MatchChar(char.IsLetter);
    public Rule WS => MatchChar(char.IsWhiteSpace).ZeroOrMore();

    // Number: an integer, or a decimal with digits on both sides of the point.
    // Decimal is listed first because ordered choice commits to the first match.
    public Rule Integer => Digit.OneOrMore();
    public Rule Decimal => Integer + MatchChar('.') + Integer;
    public Rule Number => (Decimal | Integer) + WS;

    // Identifier: a letter followed by letters, digits, or underscores
    public Rule Identifier => (Letter + (Letter | Digit | MatchChar('_')).ZeroOrMore()) + WS;

    // Operators
    public Rule AddOp => (MatchChar('+') | MatchChar('-')) + WS;
    public Rule MulOp => (MatchChar('*') | MatchChar('/')) + WS;

    // Expression grammar (recursive): Factor refers back to Expr.
    // C# evaluates property getters eagerly, so check how Parakeet defers
    // recursive rule references before relying on this direct form.
    public Rule Factor => Number | (MatchChar('(') + WS + Expr + MatchChar(')') + WS);
    public Rule Term => Factor + (MulOp + Factor).ZeroOrMore();
    public Rule Expr => Term + (AddOp + Term).ZeroOrMore();

    // Entry point
    public override Rule Start => WS + Expr;
}
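The recursive part of this grammar (Factor refers to Expr, which refers to Term, which refers to Factor) depends on the reference being resolved lazily at some point. The sketch below shows one common way to do that in plain C#, outside of Parakeet: a wrapper lambda that looks up the recursive parser only when parsing actually runs. All names (`Ch`, `Seq`, `Or`, `exprRef`) are hypothetical, and the grammar is reduced to single digits and `+` to keep it short.

```csharp
using System;
using Parser = System.Func<string, int, int>;

// Illustrative helpers only; these are NOT Parakeet's API.
Parser Ch(char c) => (s, i) => i < s.Length && s[i] == c ? i + 1 : -1;

Parser Seq(params Parser[] ps) => (s, i) =>
{
    foreach (var p in ps)
    {
        i = p(s, i);
        if (i < 0) return -1;
    }
    return i;
};

Parser Or(Parser a, Parser b) => (s, i) =>
{
    int j = a(s, i);
    return j >= 0 ? j : b(s, i);
};

// Deferred reference: the lambda reads 'expr' only when it is invoked,
// by which time the variable has been assigned its real value.
Parser expr = null!;
Parser exprRef = (s, i) => expr(s, i);

Parser digit = (s, i) => i < s.Length && char.IsDigit(s[i]) ? i + 1 : -1;

// factor := digit | '(' expr ')'
Parser factor = Or(digit, Seq(Ch('('), exprRef, Ch(')')));
Parser plusFactor = Seq(Ch('+'), factor);

// expr := factor ('+' factor)*
expr = (s, i) =>
{
    i = factor(s, i);
    while (i >= 0)
    {
        int j = plusFactor(s, i);
        if (j < 0) break;
        i = j;
    }
    return i;
};

Console.WriteLine(expr("(1+(2+3))+4", 0)); // 11 (whole input consumed)
Console.WriteLine(expr("(1+", 0));         // -1 (unbalanced parenthesis)
```

If `factor` had used `expr` directly instead of `exprRef`, it would have captured the `null` placeholder at construction time; the indirection is what makes the cycle safe.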
Parsing Input
using System;
using Parakeet;

var grammar = new ArithmeticGrammar();
var input = "3 + 4 * (2 - 1)";
var parseResult = grammar.Parse(input);

if (parseResult.Success)
{
    // Dump the resulting parse tree for inspection
    Console.WriteLine("Parse succeeded!");
    Console.WriteLine(parseResult.Node.ToXml());
}
else
{
    // Report where parsing stopped and what the parser expected there
    Console.WriteLine($"Parse failed at position {parseResult.Position}");
    Console.WriteLine($"Expected: {parseResult.Expected}");
}
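A raw failure position is hard to act on in multi-line input. Assuming the reported position is a zero-based character offset into the input (an assumption; confirm against the library), a small helper can turn it into a line and column. `ToLineColumn` is a hypothetical helper, not part of Parakeet.

```csharp
using System;

// Convert a zero-based character offset into a 1-based (line, column) pair.
// Hypothetical helper; assumes the offset indexes into 'text'.
(int Line, int Column) ToLineColumn(string text, int offset)
{
    int line = 1, column = 1;
    int end = Math.Min(offset, text.Length);
    for (int i = 0; i < end; i++)
    {
        if (text[i] == '\n')
        {
            line++;
            column = 1;
        }
        else
        {
            column++;
        }
    }
    return (line, column);
}

var source = "let x = 1;\nlet y = ;\n";
var (ln, col) = ToLineColumn(source, 19); // offset of the stray ';' on line 2
Console.WriteLine($"line {ln}, column {col}"); // line 2, column 9
```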
Common Combinators
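using Parakeet;

public class CommonPatterns : Grammar
{
    // Placeholder rules so the examples below are self-contained
    public Rule RuleA => MatchChar('a');
    public Rule RuleB => MatchChar('b');
    public Rule RuleC => MatchChar('c');
    public Rule Digit => MatchChar(char.IsDigit);
    public Rule WS => MatchChar(char.IsWhiteSpace).ZeroOrMore();
    public Rule Number => Digit.OneOrMore();

    // Sequence: A then B then C
    public Rule Sequence => RuleA + RuleB + RuleC;

    // Choice: A or B or C (ordered, PEG semantics)
    public Rule Choice => RuleA | RuleB | RuleC;

    // Repetition
    public Rule ZeroOrMoreDigits => Digit.ZeroOrMore();
    public Rule OneOrMoreDigits => Digit.OneOrMore();
    public Rule OptionalSign => (MatchChar('+') | MatchChar('-')).Optional();

    // Lookahead (does not consume input)
    public Rule FollowedByDigit => RuleA + Digit.Lookahead();
    public Rule NotFollowedByDigit => RuleA + Digit.NotAt();

    // String matching
    public Rule Keyword => MatchString("function") + WS;
    public Rule Arrow => MatchString("=>") + WS;

    // Character classes. Explicit ASCII ranges are used because
    // char.IsDigit also accepts non-ASCII Unicode digits.
    public Rule HexDigit => MatchChar(c =>
        (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F'));

    // Named rules for better parse tree nodes
    public Rule NamedNumber => Named(Number, "Number");

    // Entry point (arbitrary here; this class only demonstrates combinators)
    public override Rule Start => Sequence;
}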
CSV Parser Example
using System;
using System.Linq;
using Parakeet;

public class CsvGrammar : Grammar
{
    // Basic elements
    public Rule Newline => MatchString("\r\n") | MatchChar('\n');
    public Rule Comma => MatchChar(',');
    public Rule Quote => MatchChar('"');

    // Quoted field: handles escaped quotes (doubled)
    public Rule EscapedQuote => MatchString("\"\"");
    public Rule QuotedContent => (EscapedQuote | MatchChar(c => c != '"')).ZeroOrMore();
    public Rule QuotedField => Quote + QuotedContent + Quote;

    // Unquoted field: any chars except comma, quote, and newline (may be empty)
    public Rule UnquotedField => MatchChar(c => c != ',' && c != '"' && c != '\n' && c != '\r').ZeroOrMore();

    // Field: either quoted or unquoted
    public Rule Field => QuotedField | UnquotedField;

    // Row: fields separated by commas
    public Rule Row => Field + (Comma + Field).ZeroOrMore();

    // CSV file: rows separated by newlines.
    // Because an empty field is valid, a trailing newline is consumed by the
    // repetition as a final empty row rather than by Newline.Optional().
    public Rule File => Row + (Newline + Row).ZeroOrMore() + Newline.Optional();

    public override Rule Start => File;
}

// Usage (top-level statements; C# requires these to appear before any
// type declarations in the same file, so place them in a separate file)
var csv = new CsvGrammar();
var result = csv.Parse("name,age,city\nAlice,30,\"New York\"\nBob,25,London");

if (result.Success)
{
    // Traverse the parse tree to extract data. This assumes each child of
    // the root is a row whose children include one node per Field.
    foreach (var row in result.Node.Children)
    {
        // Text of a quoted field still includes its surrounding quotes
        var fields = row.Children
            .Where(n => n.RuleName == "Field")
            .Select(n => n.Text)
            .ToList();
        Console.WriteLine(string.Join(" | ", fields));
    }
}
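The text matched by a quoted field still contains the surrounding quotes and any doubled quotes, since the grammar only recognizes them. A small post-processing step can recover the field value. `UnquoteCsvField` below is a hypothetical helper operating on plain strings, independent of Parakeet.

```csharp
using System;

// Strip one pair of surrounding quotes and collapse doubled quotes ("")
// into a single quote. Unquoted fields are returned unchanged.
// Hypothetical helper; not part of any library.
string UnquoteCsvField(string raw)
{
    bool quoted = raw.Length >= 2 && raw[0] == '"' && raw[raw.Length - 1] == '"';
    if (!quoted) return raw;
    return raw.Substring(1, raw.Length - 2).Replace("\"\"", "\"");
}

Console.WriteLine(UnquoteCsvField("\"New York\""));       // New York
Console.WriteLine(UnquoteCsvField("\"say \"\"hi\"\"\"")); // say "hi"
Console.WriteLine(UnquoteCsvField("London"));             // London
```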
Simple Programming Language Parser
using Parakeet;

public class MiniLangGrammar : Grammar
{
    // Whitespace and basics (WS covers spaces and tabs only; NL handles line breaks)
    public Rule WS => MatchChar(c => c == ' ' || c == '\t').ZeroOrMore();
    public Rule NL => (MatchString("\r\n") | MatchChar('\n')) + WS;
    public Rule Digit => MatchChar(char.IsDigit);
    public Rule Letter => MatchChar(char.IsLetter);

    // Literals
    public Rule Integer => Digit.OneOrMore() + WS;
    public Rule StringLit =>
        MatchChar('"') + MatchChar(c => c != '"').ZeroOrMore() + MatchChar('"') + WS;
    public Rule BoolLit => (MatchString("true") | MatchString("false")) + WS;
    public Rule Literal => Integer | StringLit | BoolLit;

    // Identifiers
    public Rule Ident => Letter + (Letter | Digit | MatchChar('_')).ZeroOrMore() + WS;

    // Expressions
    public Rule Atom => Literal | Ident | (MatchChar('(') + WS + Expr + MatchChar(')') + WS);
    // Two-character operators come first so that "<" does not shadow "<="
    public Rule CompOp => (MatchString("==") | MatchString("!=") |
                           MatchString("<=") | MatchString(">=") |
                           MatchChar('<') | MatchChar('>')) + WS;
    public Rule Expr => Atom + (CompOp + Atom).Optional();

    // Statements
    // Note: a keyword followed by WS (zero or more) also matches a keyword
    // prefix such as "let" in "letter"; add a negative lookahead on
    // identifier characters if that matters for your language.
    public Rule LetStmt => MatchString("let") + WS + Ident +
                           MatchChar('=') + WS + Expr + MatchChar(';') + WS;
    public Rule PrintStmt => MatchString("print") + WS + Expr + MatchChar(';') + WS;
    public Rule IfStmt => MatchString("if") + WS + Expr +
                          MatchChar('{') + WS + NL.ZeroOrMore() +
                          Statements +
                          MatchChar('}') + WS;
    public Rule Statement => LetStmt | PrintStmt | IfStmt;
    public Rule Statements => (Statement + NL.ZeroOrMore()).ZeroOrMore();

    public override Rule Start => WS + NL.ZeroOrMore() + Statements;
}
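Because `WS` matches zero or more characters, a rule like `MatchString("let") + WS` also accepts the first three letters of `letter`. The usual PEG remedy is a negative lookahead that rejects the keyword when an identifier character follows. The sketch below demonstrates the idea with hypothetical function-based helpers (`Str`, `Not`, `Seq`, `identChar`), not Parakeet's API. In this document's notation the equivalent would presumably be a `.NotAt()` on an identifier-character rule placed right after the keyword, though that exact form is an assumption.

```csharp
using System;
using Parser = System.Func<string, int, int>;

// Illustrative helpers only; these are NOT Parakeet's API.

// Match a literal string at the current position.
Parser Str(string lit) => (s, i) =>
    i + lit.Length <= s.Length && s.Substring(i, lit.Length) == lit
        ? i + lit.Length
        : -1;

// Negative lookahead: succeed without consuming when p does NOT match.
Parser Not(Parser p) => (s, i) => p(s, i) < 0 ? i : -1;

Parser Seq(Parser a, Parser b) => (s, i) =>
{
    int j = a(s, i);
    return j < 0 ? -1 : b(s, j);
};

// Any character that may continue an identifier.
Parser identChar = (s, i) =>
    i < s.Length && (char.IsLetterOrDigit(s[i]) || s[i] == '_') ? i + 1 : -1;

Parser looseLet = Str("let");
Parser strictLet = Seq(Str("let"), Not(identChar));

Console.WriteLine(looseLet("letter = 1;", 0));  // 3  (keyword prefix accepted)
Console.WriteLine(strictLet("letter = 1;", 0)); // -1 (rejected: 't' follows)
Console.WriteLine(strictLet("let x = 1;", 0));  // 3  (accepted: space follows)
```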
Parse Tree Traversal
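using System;
using System.Collections.Generic;
using System.Linq;
using Parakeet;

public static class ParseTreeInterpreter
{
    // Child indexes below assume the tree contains only nodes for named rules
    // (Ident, Expr, Statements, ...); adjust them to the tree actually produced.
    public static object? Evaluate(ParseNode node, Dictionary<string, object?> env)
    {
        return node.RuleName switch
        {
            "Integer" => int.Parse(node.Text.Trim()),
            // Drop trailing whitespace, then exactly one quote from each end
            "StringLit" => node.Text.TrimEnd()[1..^1],
            "BoolLit" => bool.Parse(node.Text.Trim()),
            "Ident" => Lookup(node.Text.Trim(), env),
            "LetStmt" => EvaluateLet(node, env),
            "PrintStmt" => EvaluatePrint(node, env),
            "IfStmt" => EvaluateIf(node, env),
            // Comparisons are not evaluated in this sketch: an Expr with a
            // CompOp falls through here and yields its last operand.
            _ => EvaluateChildren(node, env)
        };
    }

    private static object? Lookup(string name, Dictionary<string, object?> env) =>
        env.TryGetValue(name, out var value)
            ? value
            : throw new InvalidOperationException($"Undefined variable '{name}'");

    private static object? EvaluateLet(ParseNode node, Dictionary<string, object?> env)
    {
        // Assumes children are [Ident, Expr]
        var children = node.Children.ToList();
        var name = children[0].Text.Trim();
        var value = Evaluate(children[1], env);
        env[name] = value;
        return value;
    }

    private static object? EvaluatePrint(ParseNode node, Dictionary<string, object?> env)
    {
        var value = Evaluate(node.Children.First(), env);
        Console.WriteLine(value);
        return value;
    }

    private static object? EvaluateIf(ParseNode node, Dictionary<string, object?> env)
    {
        // Assumes children are [condition Expr, Statements]; the body runs
        // only when the condition evaluates to true.
        var children = node.Children.ToList();
        var condition = Evaluate(children[0], env);
        return condition is true ? Evaluate(children[1], env) : null;
    }

    private static object? EvaluateChildren(ParseNode node, Dictionary<string, object?> env)
    {
        // Evaluate children in order and return the last result (null if none)
        object? result = null;
        foreach (var child in node.Children)
            result = Evaluate(child, env);
        return result;
    }
}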
Parakeet vs Other Parsers
| Feature | Parakeet | Pidgin | FParsec | Regex |
|---|---|---|---|---|
| Language | C# | C# | F# | Any |
| Grammar style | PEG (class-based) | Combinator functions | Combinator functions | Pattern strings |
| Parse tree | Automatic | Manual construction | Manual construction | Capture groups |
| Recursion | Direct property refs | Forward references | Forward references | Not supported |
| Error messages | Position-based | Good | Excellent | Poor |
| Best for | Grammar-oriented DSLs | High-performance parsing | F# projects | Simple patterns |
Best Practices
- Define grammars as classes inheriting from `Grammar` with each rule as a property, using PEG operators (`+` for sequence, `|` for ordered choice) for readable grammar definitions.
- Use `Named()` to label important rules in the parse tree so traversal code can identify semantic nodes by name rather than position.
- Handle whitespace explicitly by adding `+ WS` after token rules; PEG grammars do not skip whitespace automatically.
- Use `.Lookahead()` and `.NotAt()` for zero-width assertions to disambiguate grammar rules without consuming input.
- Use `.Optional()` for optional elements rather than choice with empty; it produces cleaner parse trees.
- Define the grammar entry point via `override Rule Start` so the parser knows which rule to begin parsing from.
- Test grammars incrementally: verify each rule parses correctly in isolation before combining into complex grammars.
- Use ordered choice (`|`) carefully in PEG grammars; alternatives are tried left-to-right and the first match wins, which can prevent later alternatives from being reached.
- Traverse parse trees with pattern matching on `RuleName` to interpret or transform parsed results into domain objects.
- For performance-critical parsing of large inputs, consider Pidgin or FParsec instead; Parakeet prioritizes grammar readability over raw throughput.
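
The ordered-choice caveat is easiest to see with operators that share a prefix. The following sketch uses hypothetical function-based helpers (`Str`, `Or`), not Parakeet's API, to show how putting `<` before `<=` makes the longer alternative unreachable.

```csharp
using System;
using Parser = System.Func<string, int, int>;

// Illustrative helpers only; these are NOT Parakeet's API.

// Match a literal string at the current position.
Parser Str(string lit) => (s, i) =>
    i + lit.Length <= s.Length && s.Substring(i, lit.Length) == lit
        ? i + lit.Length
        : -1;

// Ordered choice: the first alternative that matches wins.
Parser Or(Parser a, Parser b) => (s, i) =>
{
    int j = a(s, i);
    return j >= 0 ? j : b(s, i);
};

Parser wrongOrder = Or(Str("<"), Str("<="));
Parser rightOrder = Or(Str("<="), Str("<"));

Console.WriteLine(wrongOrder("<= 3", 0)); // 1 (only "<" consumed; "=" is left over)
Console.WriteLine(rightOrder("<= 3", 0)); // 2 ("<=" consumed)
Console.WriteLine(rightOrder("< 3", 0));  // 1 (falls back to "<")
```

A practical rule of thumb: when one alternative is a prefix of another, list the longer one first.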