module Biocaml_fasta:sig
..end
  # comment
  # comment
  ...
  >header
  sequence
  >header
  sequence
  ...
where the sequence may span multiple lines, and a ';' may be used instead of '#' to start comments.
Header lines begin with the '>' character. It is often considered that all characters until the first whitespace define the name of the content, and any characters beyond that define additional information in a format specific to the file provider.
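The name/description convention above can be sketched as follows. This is only an illustration using the OCaml standard library; split_header is a hypothetical helper and is not part of Biocaml_fasta, and treating only spaces and tabs as separators is an assumption.

```ocaml
(* Hypothetical helper, not part of Biocaml_fasta: split a header line
   (with or without its leading '>') into the name, i.e. everything up
   to the first whitespace, and the remaining provider-specific text. *)
let split_header (line : string) : string * string =
  (* Drop the leading '>' if present. *)
  let body =
    if String.length line > 0 && line.[0] = '>'
    then String.sub line 1 (String.length line - 1)
    else line
  in
  let body = String.trim body in
  (* Assumption: only spaces and tabs separate the name from the rest. *)
  let is_space c = c = ' ' || c = '\t' in
  let n = String.length body in
  (* Index of the first whitespace character, or n if there is none. *)
  let rec first_space i =
    if i >= n || is_space body.[i] then i else first_space (i + 1)
  in
  let i = first_space 0 in
  let name = String.sub body 0 i in
  let info = String.trim (String.sub body i (n - i)) in
  (name, info)
```

For a header with no whitespace, the second component is the empty string.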
Sequences are most often strings of characters denoting nucleotides or amino acids. However, FASTA files sometimes provide quality scores instead, either ASCII-encoded (e.g. as supported by modules Biocaml_phred_score and Biocaml_solexa_score) or as space-separated integers.
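As a rough illustration of the space-separated integer form, one sequence line could be turned into an int list as below. int_seq_of_line is a hypothetical name, not a function of this module, and returning None on a malformed field is a choice made for the sketch only.

```ocaml
(* Hypothetical helper, not part of Biocaml_fasta: parse one line of
   space-separated integers. Returns None if any field is rejected by
   int_of_string_opt. *)
let int_seq_of_line (line : string) : int list option =
  let fields =
    String.split_on_char ' ' line
    (* Consecutive spaces produce empty fields; ignore them. *)
    |> List.filter (fun s -> s <> "")
  in
  let rec go acc = function
    | [] -> Some (List.rev acc)
    | x :: rest ->
      (match int_of_string_opt x with
       | Some n -> go (n :: acc) rest
       | None -> None)
  in
  go [] fields
```

A character sequence needs no such step, since a line of nucleotides or amino acids is already a string.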
Thus, the FASTA format is really a family of formats with a fairly loose specification of the header and content formats. The only consistently followed meaning of the format is:
Within this module, sequence is used to generically mean either kind of data found in the sequence lines, char_seq to mean specifically a sequence of characters, and int_seq to mean specifically a sequence of integers.
Parsing functions throughout this module take the following optional arguments:
filename - used only for error messages when the data source is not a file.
pedantic - if true, which is the default, report more errors: Biocaml_transform.no_error lines, non-standard characters.
sharp_comments and semicolon_comments - if true, allow comments beginning with a '#' or ';' character, respectively. Setting both to true is okay, although it is not recommended to have such files. Setting both to false implies that comments are disallowed.
type char_seq = string
type int_seq = int list
type 'a item = {
  header : string;
  sequence : 'a;
}
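To make the item type and the comment flags more concrete, here is a minimal sketch that groups a list of lines into char_seq items. It is not the Biocaml implementation: items_of_lines is a hypothetical function, the types are redeclared locally so the block stands alone, and the sketch assumes that a header is a plain string, that empty lines are skipped, and that comments are skipped wherever they occur.

```ocaml
type char_seq = string
type 'a item = { header : string; sequence : 'a }

(* Hypothetical sketch, not the Biocaml implementation: group lines
   into char_seq items, concatenating multi-line sequences. *)
let items_of_lines ?(sharp_comments = true) ?(semicolon_comments = true)
    (lines : string list) : char_seq item list =
  let is_comment l =
    l <> "" &&
    ((sharp_comments && l.[0] = '#') || (semicolon_comments && l.[0] = ';'))
  in
  (* Close the item being built, if any, and add it to the result. *)
  let flush current acc =
    match current with
    | None -> acc
    | Some (header, buf) -> { header; sequence = Buffer.contents buf } :: acc
  in
  let rec go current acc = function
    | [] -> List.rev (flush current acc)
    (* Simplification: skip empty lines and enabled comments anywhere. *)
    | l :: rest when l = "" || is_comment l -> go current acc rest
    (* A '>' line starts a new item; the '>' itself is dropped. *)
    | l :: rest when l.[0] = '>' ->
      let header = String.sub l 1 (String.length l - 1) in
      go (Some (header, Buffer.create 64)) (flush current acc) rest
    (* Any other line extends the sequence of the current item. *)
    | l :: rest ->
      (match current with
       | Some (_, buf) -> Buffer.add_string buf l; go current acc rest
       | None -> failwith ("sequence line before any header: " ^ l))
  in
  go None [] lines
```

With both flags set to false, a line starting with '#' or ';' is no longer recognized as a comment, so in this sketch it is treated as sequence data, or rejected if it appears before the first header.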
module Error:sig
..end
exception Error of Error.t
val in_channel_to_char_seq_item_stream : ?buffer_size:int ->
?filename:string ->
?pedantic:bool ->
?sharp_comments:bool ->
?semicolon_comments:bool ->
Pervasives.in_channel -> char_seq item Stream.t
Returns a stream of char_seq items. Initial comments are discarded.
Raises Error in case of any errors.
val in_channel_to_int_seq_item_stream : ?buffer_size:int ->
?filename:string ->
?pedantic:bool ->
?sharp_comments:bool ->
?semicolon_comments:bool ->
Pervasives.in_channel -> int_seq item Stream.t
Returns a stream of int_seq items. Initial comments are discarded.
Raises Error in case of any errors.
module Result:sig
..end
module Transform:sig
..end
val sexp_of_char_seq : char_seq -> Sexplib.Sexp.t
val char_seq_of_sexp : Sexplib.Sexp.t -> char_seq
val sexp_of_int_seq : int_seq -> Sexplib.Sexp.t
val int_seq_of_sexp : Sexplib.Sexp.t -> int_seq
val sexp_of_item : ('a -> Sexplib.Sexp.t) -> 'a item -> Sexplib.Sexp.t
val item_of_sexp : (Sexplib.Sexp.t -> 'a) -> Sexplib.Sexp.t -> 'a item