FParsec-Pipes


FParsec-Pipes: making FParsec parsers even more concise and consistent.

FParsec is an F# library for writing parsers. This library extends FParsec with a new set of combinators, which I believe make it even easier to translate from a formal grammar like EBNF to F# parsing code and end up with a fast, highly readable parser.

Show me an example

Here's a parser for ISO-8601-formatted datetimes.

// Parses `count` digits and returns the result as an integer.
let digits (count : int) =
    %% +.(qty.[count] * digit)
    -|> (String >> Int32.Parse)

// Parses a yyyy-mm-dd date and returns the result as a tuple of 3 integers.
let date =
    %% +.digits 4 -- '-' -- +.digits 2 -- '-' -- +.digits 2
    -%> auto

// Parses a hh:mm:ss time and returns the result as a tuple of 3 integers.
let time =
    %% +.digits 2 -- ':' -- +.digits 2 -- ':' -- +.digits 2
    -%> auto

// Parses a DateTime in the format yyyy-mm-ddThh:mm:ss
let datetime : Parser<_, unit> =
    %% +.date -- 'T' -- +.time
    -|> fun (yyyy, mm, dd) (hh, mi, ss) ->
        new DateTime(yyyy, mm, dd, hh, mi, ss)

What are the features?

Default parsers for values

The unary operator % can be used to convert various values to the "obvious" corresponding parser. This cuts down on clutter from explicitly using pstring, pchar, and others. See the examples below:

// Expression:              // Equivalent to:
%"bob"                      // pstring "bob"
%ci "jim"                   // pstringCI "jim" // case insensitive
%'b'                        // pchar 'b'
%ci 'b'                     // pcharCI 'b' // case insensitive
%['a'; 'b']                 // choice [pchar 'a'; pchar 'b']
%["alice"; "bob"]           // choice [pstring "alice"; pstring "bob"]
%[pint32; preturn (-1)]     // choice [pint32; preturn (-1)]

The value you pass to % can also be of type DefaultParserOf<'a>, which you can obtain by using p<'a>. If 'a is a supported type, the resulting parser will be one to parse any 'a.

This works for several primitive types, including all integer types.

// Expression:  // Equivalent to:
%p<int>         // pint32
%p<uint16>      // puint16
%p<float>       // pfloat
%p<Position>    // getPosition

You can make p work for your own types, too. Simply define:

static member DefaultParser : Parser<'a, 'u>

type Vector3D =
    {
        X : float
        Y : float
        Z : float
    }
    // Parses a vector in the format (x, y, z)
    static member DefaultParser =
        let comma = spaces >>. %',' >>. spaces
        pipe3
            (%'(' >>. spaces >>. %p<float>)
            (comma >>. %p<float>)
            (comma >>. %p<float> .>> spaces .>> %')')
            (fun x y z -> { X = x; Y = y; Z = z })

%p<Vector3D> // equivalent to Vector3D.DefaultParser

Pipes

Notice that the above DefaultParser example used the function pipe3, and the combinators >>. and .>>. Using vanilla FParsec, when you want to make a parser like "parserA followed by parserB followed by parserC", you must choose between >>., .>>, .>>., pipe2 through pipe5, or tuple2 through tuple5, depending on which of the chained parsers you want to consume the output of.

This means that two parsers written to consume the same grammar may look quite different, if their authors had different requirements about how much information to capture in their output AST. A "recognizer" that simply validates the source passed to it will have a very different structure from a parser that produces an AST.

FParsec-Pipes replaces all such sequencing combinators with "pipes", so named because they fill the purpose of the pipe2 family of operators. The rules for combining parsers with these operators are simple:

  • Start the pipe with %% followed by the first parser.
  • Add additional parsers to the pipe with the -- operator.
  • "Capture" parser outputs in the pipe with the +. prefix operator.
  • Complete the pipe with -|> by supplying a function to combine all the "captured" outputs.

All the operators use % to convert their operands to parsers, so you can use raw characters and strings in the pipeline. Here's the rewritten version of the Vector3D parser.

let vector3pipe : Parser<_, unit> =
    %% '(' -- spaces
    -- +.p<float> -- spaces -- ',' -- spaces
    -- +.p<float> -- spaces -- ',' -- spaces
    -- +.p<float> -- spaces -- ')'
    -|> fun x y z -> { X = x; Y = y; Z = z }

Sometimes it's overkill to write a function to combine the captured outputs of a pipe. Maybe the pipe has no outputs, or only one, or you just need a tuple.

For pipes with no outputs, you can give any value to -|> and that value will be returned from the parser on success. This is like the >>% operator in FParsec.

let unitexample : Parser<unit, unit> =
    %% "this pipe" -- "has no" -- "captures" -|> ()

// Always produces the value 1337 on success
let constexample : Parser<int, unit> =
    %% "some text" -|> 1337

For pipes with 1 to 5 outputs, you can automatically combine them into a tuple by using -%> auto as the terminator.

// Parses an int
let auto1example : Parser<_, unit> =
    %% "this pipe" -- "has one" -- +.p<int> -- "capture"
    -%> auto

// Parse a tuple (float, int)
let auto2example : Parser<_, unit> =
    %% "this pipe" -- +.p<float> -- "has two" -- +.p<int> -- "captures"
    -%> auto

Backtracking with pipes

Sometimes it is necessary to backtrack in a parser. For example, suppose that at a given place in a file format, it is valid to find either a date/time in ISO 8601 format, or a Unix timestamp as a 64-bit integer.

Here is one attempt at this:

type Timestamp =
    | UnixTimestamp of int64
    | CalendarTimestamp of DateTime

let unixTimestamp = %% +.p<int64> -|> UnixTimestamp
let calendarTimestamp =
    let digits (count : int) = %% +.(qty.[count] * digit) -|> (String >> Int32.Parse)
    %% +.digits 4 -- '-' -- +.digits 2 -- '-' -- +.digits 2 -- 'T'
    -- +.digits 2 -- ':' -- +.digits 2 -- ':' -- +.digits 2
    -|> fun yyyy mm dd h m s ->
        new DateTime(yyyy, mm, dd, h, m, s) |> CalendarTimestamp

let timestamp : Parser<_, unit> =
    %[calendarTimestamp; unixTimestamp]

The problem with this approach is that, given the Unix timestamp input 1458730894, the first part of calendarTimestamp will succeed since it successfully parses 4 digits. It will fail when it encounters 7 as the next character instead of the expected -, but since it's already consumed input and advanced the parser state, the alternative unixTimestamp parser will not be attempted.

The solution is to add backtracking to calendarTimestamp. What we want is to ensure that, until we've seen the first -, the whole pipe can be backtracked to the beginning.

Vanilla FParsec provides the attempt, >>?, .>>?, and .>>.? combinators for this type of situation. However, there's no need to abandon the pipe syntax -- it supports backtracking too, with the ?- and -? operators.

The ?- operator wraps everything to its left in attempt.

let calendarTimestampBacktrack1 : Parser<_, unit> =
    let digits (count : int) = %% +.(qty.[count] * digit) -|> (String >> Int32.Parse)
    %% +.digits 4 -- '-' ?- +.digits 2 -- '-' -- +.digits 2 -- 'T'
                      // ^^ Backtrack the whole pipe if anything to the left of this fails.
    -- +.digits 2 -- ':' -- +.digits 2 -- ':' -- +.digits 2
    -|> fun yyyy mm dd h m s ->
        new DateTime(yyyy, mm, dd, h, m, s) |> CalendarTimestamp

The -? also wraps everything to its left in attempt, but also joins its left and right together using .>>.?, so that if the right side fails without changing the parser state, the pipe will backtrack to the beginning.

This is a bit harder to understand, but can sometimes produce better error messages.

let calendarTimestampBacktrack2 : Parser<_, unit> =
    let digits (count : int) = %% +.(qty.[count] * digit) -|> (String >> Int32.Parse)
    %% +.digits 4 -? '-' -- +.digits 2 -- '-' -- +.digits 2 -- 'T'
                  // ^^ Backtrack the whole pipe if anything to the left of this fails,
                  // or if the parser to the right fails without changing the parser state.
    -- +.digits 2 -- ':' -- +.digits 2 -- ':' -- +.digits 2
    -|> fun yyyy mm dd h m s ->
        new DateTime(yyyy, mm, dd, h, m, s) |> CalendarTimestamp

You can think of ?- and -? as marking the end of ambiguity.

?- says "if we've gotten this far, we're definitely looking at something that is supposed to match this parser, and if it doesn't, that's a syntax error."

-? is similar, but changes the condition to "if we've gotten this far, and the next little bit doesn't immediately look wrong".

Repetition and separation

FParsec includes many combinators for defining parsers like "zero or more of parserA" or "one or more of parserA separated by parserB". These include many, many1, sepBy, and sepEndBy.

FParsec-Pipes uses F#'s slice notation to handle the concept of ranges. You can slice into qty to get a Range object, and multiply a parser (or a value that can be converted to a parser with %) by that range.

The resulting parser parses a System.Collections.Generic.List (not an F# list) of values.

'a' * qty.[3]    // parse exactly 3 'a's   -- similar to `parray 3 %'a'`
'a' * qty.[1..]  // parse one or more 'a's -- similar to `many1 %'a'`
'a' * qty.[0..]  // parse 0 or more 'a's   -- similar to `many %'a'`
'a' * qty.[..5]  // parse up to 5 'a's
'a' * qty.[3..6] // parse at least 3 and at most 6 'a's
qty.[3] * 'a'    // the range can be on either side of the *

For separation, / is also overloaded on ranges. Another variant, /., allows (but does not require) a separator to appear after the last element.

Mathematically this is of course nonsense, but / was chosen because you can say that the elements are "divided" by their separators, in the same way that a fence divides pieces of property.

'a' * (qty.[0..] / ',') // parse 0 or more 'a's separated by ','s
                        // similar to sepBy %'a' %','

qty.[0..] / ',' * 'a'   // same as above, just written in a different order
                        // this may be preferable as it doesn't require parentheses

qty.[1..] /. ',' * 'a'  // parse one or more 'a's separated by ','s, with an optional trailing ','
                        // similar to sepEndBy1 %'a' %','

let comma() = %% "," -- spaces -|> ()
let element() = %% 'a' -- spaces -|> ()
in qty.[2..10] / comma() * element() // parse 2 to 10 'a's separated by commas, with whitespace allowed
namespace System
namespace FParsec
namespace FParsec.Pipes
val digits : count:int -> Parser<int,'a>

Full name: Intro.digits
val count : int
Multiple items
val int : value:'T -> int (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.int

--------------------
type int = int32

Full name: Microsoft.FSharp.Core.int

--------------------
type int<'Measure> = int

Full name: Microsoft.FSharp.Core.int<_>
val qty : RangeDefiner

Full name: FParsec.Pipes.Many.qty
val digit : Parser<char,'u>

Full name: FParsec.CharParsers.digit
Multiple items
type String =
  new : value:char -> string + 7 overloads
  member Chars : int -> char
  member Clone : unit -> obj
  member CompareTo : value:obj -> int + 1 overload
  member Contains : value:string -> bool
  member CopyTo : sourceIndex:int * destination:char[] * destinationIndex:int * count:int -> unit
  member EndsWith : value:string -> bool + 2 overloads
  member Equals : obj:obj -> bool + 2 overloads
  member GetEnumerator : unit -> CharEnumerator
  member GetHashCode : unit -> int
  ...

Full name: System.String

--------------------
String(value: nativeptr<char>) : unit
String(value: nativeptr<sbyte>) : unit
String(value: char []) : unit
String(c: char, count: int) : unit
String(value: nativeptr<char>, startIndex: int, length: int) : unit
String(value: nativeptr<sbyte>, startIndex: int, length: int) : unit
String(value: char [], startIndex: int, length: int) : unit
String(value: nativeptr<sbyte>, startIndex: int, length: int, enc: Text.Encoding) : unit
type Int32 =
  struct
    member CompareTo : value:obj -> int + 1 overload
    member Equals : obj:obj -> bool + 1 overload
    member GetHashCode : unit -> int
    member GetTypeCode : unit -> TypeCode
    member ToString : unit -> string + 3 overloads
    static val MaxValue : int
    static val MinValue : int
    static member Parse : s:string -> int + 3 overloads
    static member TryParse : s:string * result:int -> bool + 1 overload
  end

Full name: System.Int32
Int32.Parse(s: string) : int
Int32.Parse(s: string, provider: IFormatProvider) : int
Int32.Parse(s: string, style: Globalization.NumberStyles) : int
Int32.Parse(s: string, style: Globalization.NumberStyles, provider: IFormatProvider) : int
val date : Parser<(int * int * int),unit>

Full name: Intro.date
val auto : DefaultEnding

Full name: FParsec.Pipes.Pipes.auto
val time : Parser<(int * int * int),unit>

Full name: Intro.time
val datetime : Parser<DateTime,unit>

Full name: Intro.datetime
type Parser<'Result,'UserState> = CharStream<'UserState> -> Reply<'Result>

Full name: FParsec.Primitives.Parser<_,_>
type unit = Unit

Full name: Microsoft.FSharp.Core.unit
val yyyy : int
val mm : int
val dd : int
val hh : int
val mi : int
val ss : int
Multiple items
type DateTime =
  struct
    new : ticks:int64 -> DateTime + 10 overloads
    member Add : value:TimeSpan -> DateTime
    member AddDays : value:float -> DateTime
    member AddHours : value:float -> DateTime
    member AddMilliseconds : value:float -> DateTime
    member AddMinutes : value:float -> DateTime
    member AddMonths : months:int -> DateTime
    member AddSeconds : value:float -> DateTime
    member AddTicks : value:int64 -> DateTime
    member AddYears : value:int -> DateTime
    ...
  end

Full name: System.DateTime

--------------------
DateTime()
   (+0 other overloads)
DateTime(ticks: int64) : unit
   (+0 other overloads)
DateTime(ticks: int64, kind: DateTimeKind) : unit
   (+0 other overloads)
DateTime(year: int, month: int, day: int) : unit
   (+0 other overloads)
DateTime(year: int, month: int, day: int, calendar: Globalization.Calendar) : unit
   (+0 other overloads)
DateTime(year: int, month: int, day: int, hour: int, minute: int, second: int) : unit
   (+0 other overloads)
DateTime(year: int, month: int, day: int, hour: int, minute: int, second: int, kind: DateTimeKind) : unit
   (+0 other overloads)
DateTime(year: int, month: int, day: int, hour: int, minute: int, second: int, calendar: Globalization.Calendar) : unit
   (+0 other overloads)
DateTime(year: int, month: int, day: int, hour: int, minute: int, second: int, millisecond: int) : unit
   (+0 other overloads)
DateTime(year: int, month: int, day: int, hour: int, minute: int, second: int, millisecond: int, kind: DateTimeKind) : unit
   (+0 other overloads)
val ci : x:'a -> CaseInsensitive<'a>

Full name: FParsec.Pipes.DefaultParsers.ci
val pint32 : Parser<int32,'u>

Full name: FParsec.CharParsers.pint32
val preturn : 'a -> Parser<'a,'u>

Full name: FParsec.Primitives.preturn
val p<'a> : DefaultParserOf<'a>

Full name: FParsec.Pipes.DefaultParsers.p
Multiple items
val uint16 : value:'T -> uint16 (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.uint16

--------------------
type uint16 = UInt16

Full name: Microsoft.FSharp.Core.uint16
Multiple items
val float : value:'T -> float (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.float

--------------------
type float = Double

Full name: Microsoft.FSharp.Core.float

--------------------
type float<'Measure> = float

Full name: Microsoft.FSharp.Core.float<_>
Multiple items
type Position =
  new : streamName:string * index:int64 * line:int64 * column:int64 -> Position
  member Column : int64 with get, set
  member CompareTo : other:Position -> int
  member Equals : obj:obj -> bool + 1 overload
  member GetHashCode : unit -> int
  member Index : int64 with get, set
  member Line : int64 with get, set
  member StreamName : string with get, set
  member ToString : unit -> string
  static member Compare : left:Position * right:Position -> int

Full name: FParsec.Position

--------------------
Position(streamName: string, index: int64, line: int64, column: int64) : unit
type Vector3D =
  {X: float;
   Y: float;
   Z: float;}
  static member DefaultParser : Parser<Vector3D,unit>

Full name: Intro.Vector3D
Vector3D.X: float
Vector3D.Y: float
Vector3D.Z: float
Multiple items
static member Vector3D.DefaultParser : Parser<Vector3D,unit>

Full name: Intro.Vector3D.DefaultParser

--------------------
type DefaultParser =
  | DefaultParser
  static member ( %!!~~% ) : DefaultParser:DefaultParser * list:'a0 list -> (CharStream<'a2> -> Reply<'a1>) (requires member ( %!!~~% ))
  static member ( %!!~~% ) : DefaultParser:DefaultParser * literal:string -> Parser<string,'a0>
  static member ( %!!~~% ) : DefaultParser:DefaultParser * literal:char -> Parser<char,'a0>
  static member ( %!!~~% ) : DefaultParser:DefaultParser * existing:Parser<'a,'u> -> Parser<'a,'u>
  static member ( %!!~~% ) : DefaultParser:DefaultParser * cap:Fitting<'a0,'a1,'a2> -> Fitting<'a0,'a1,'a2> (requires 'a2 :> Appendable)

Full name: FParsec.Pipes.DefaultParsers.DefaultParser
val comma : Parser<unit,unit>
val spaces : Parser<unit,'u>

Full name: FParsec.CharParsers.spaces
val pipe3 : Parser<'a,'u> -> Parser<'b,'u> -> Parser<'c,'u> -> ('a -> 'b -> 'c -> 'd) -> Parser<'d,'u>

Full name: FParsec.Primitives.pipe3
val x : float
val y : float
val z : float
val vector3pipe : Parser<Vector3D,unit>

Full name: Intro.vector3pipe
val unitexample : Parser<unit,unit>

Full name: Intro.unitexample
val constexample : Parser<int,unit>

Full name: Intro.constexample
val auto1example : Parser<int32,unit>

Full name: Intro.auto1example
val auto2example : Parser<(float * int32),unit>

Full name: Intro.auto2example
type Timestamp =
  | UnixTimestamp of int64
  | CalendarTimestamp of DateTime

Full name: Intro.Timestamp
union case Timestamp.UnixTimestamp: int64 -> Timestamp
Multiple items
val int64 : value:'T -> int64 (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.int64

--------------------
type int64 = Int64

Full name: Microsoft.FSharp.Core.int64

--------------------
type int64<'Measure> = int64

Full name: Microsoft.FSharp.Core.int64<_>
union case Timestamp.CalendarTimestamp: DateTime -> Timestamp
val unixTimestamp : Parser<Timestamp,unit>

Full name: Intro.unixTimestamp
val calendarTimestamp : Parser<Timestamp,unit>

Full name: Intro.calendarTimestamp
val digits : (int -> Parser<int,'a>)
val h : int
val m : int
val s : int
val timestamp : Parser<Timestamp,unit>

Full name: Intro.timestamp
val calendarTimestampBacktrack1 : Parser<Timestamp,unit>

Full name: Intro.calendarTimestampBacktrack1
val calendarTimestampBacktrack2 : Parser<Timestamp,unit>

Full name: Intro.calendarTimestampBacktrack2
val comma : unit -> Parser<unit,'a>

Full name: Intro.comma
val element : unit -> Parser<unit,'a>

Full name: Intro.element
Fork me on GitHub