module Biocaml_fasta: sig .. end
FASTA files.
FASTA files are in the format:
  # comment
  # comment
  ...
  >header
  content
  >header
  content
  ...
where the content may span multiple lines.
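The layout above can be illustrated with a small standalone sketch that groups lines into leading comments and (header, content) pairs. This is not the Biocaml implementation: the function name parse_lines is hypothetical, only the OCaml standard library is used, and the choices to skip blank lines and to concatenate multi-line content without separators are assumptions.

```ocaml
(* Hypothetical helper: split FASTA lines into leading '#' comments and
   (header, content) records. Multi-line content is concatenated. *)
let parse_lines (lines : string list) : string list * (string * string) list =
  let starts_with c s = String.length s > 0 && s.[0] = c in
  let drop_first s = String.sub s 1 (String.length s - 1) in
  (* Close the record being built, if any, and push it onto [acc]. *)
  let flush current acc =
    match current with
    | None -> acc
    | Some (header, parts) ->
        (header, String.concat "" (List.rev parts)) :: acc
  in
  let rec go comments current acc = function
    | [] -> (List.rev comments, List.rev (flush current acc))
    | "" :: rest -> go comments current acc rest (* assumption: skip blanks *)
    | line :: rest when starts_with '>' line ->
        go comments (Some (drop_first line, [])) (flush current acc) rest
    | line :: rest when starts_with '#' line && current = None ->
        go (drop_first line :: comments) current acc rest
    | line :: rest ->
        (match current with
         | Some (header, parts) ->
             go comments (Some (header, line :: parts)) acc rest
         | None -> failwith "parse_lines: content before first header")
  in
  go [] None [] lines
```

For example, parse_lines ["# a comment"; ">seq1 desc"; "ACGT"; "TTGA"] yields one comment and one record whose content is the two sequence lines joined together.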
Header lines begin with the '>' character. By convention, the characters up to the first whitespace are taken as the name of the content, and any remaining characters carry additional information in a format specific to the file provider.
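The name convention can be sketched as follows. The function split_header is a hypothetical helper, not part of this module, and it assumes that only spaces and tabs count as whitespace.

```ocaml
(* Hypothetical helper: split a header line into (name, extra information).
   Assumes the line begins with '>' and treats ' ' and '\t' as whitespace. *)
let split_header (line : string) : string * string =
  let n = String.length line in
  if n = 0 || line.[0] <> '>' then invalid_arg "split_header: missing '>'";
  let body = String.sub line 1 (n - 1) in
  let len = String.length body in
  let is_space c = c = ' ' || c = '\t' in
  (* Index of the first whitespace character, or [len] if there is none. *)
  let rec find i = if i >= len || is_space body.[i] then i else find (i + 1) in
  let stop = find 0 in
  let name = String.sub body 0 stop in
  let extra = String.trim (String.sub body stop (len - stop)) in
  (name, extra)
```

A header with no whitespace gives an empty second component, so callers can treat the extra information as optional.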
The content section is most often a sequence of characters
denoting nucleotides, but is also sometimes a sequence of
ASCII-encoded quality scores, e.g. as supported by the PhredScore
module. Sometimes, the quality scores are provided as
space-separated integers.
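As an illustration of the space-separated integer form, the sketch below converts a content string into an int list, the shape used for the content of recordi. The name parse_scores is hypothetical and the handling of tabs and line breaks as separators is an assumption; the module's own parsing may differ.

```ocaml
(* Hypothetical helper: parse whitespace-separated integer quality scores.
   Tabs and line breaks are treated like spaces. Raises Failure if a token
   is not an integer. *)
let parse_scores (content : string) : int list =
  content
  |> String.map (fun c ->
         if c = '\n' || c = '\r' || c = '\t' then ' ' else c)
  |> String.split_on_char ' '
  |> List.filter (fun s -> s <> "")
  |> List.map int_of_string
```

Filtering out empty tokens lets the function tolerate repeated spaces and content that spans several lines.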
Thus, the FASTA format is really a family of formats with a fairly loose specification of the header and content formats. The only consistently followed meaning of the format is:
exception Error of string
type record = string * string
type recordi = string * int list
val enum_input : Batteries.IO.input ->
Biocaml_comments.t * record Batteries.Enum.t
val enum_of_file : string -> Biocaml_comments.t * record Batteries.Enum.t
val enum_inputi : Batteries.IO.input ->
Biocaml_comments.t * recordi Batteries.Enum.t