Module Biocaml_fasta

module Biocaml_fasta: sig .. end
FASTA files. The FASTA family of file formats has several mutually incompatible descriptions (1, 2, 3, etc.). Roughly, FASTA files have the following layout:

    # comment
    # comment
    ...
    >header
    sequence
    >header
    sequence
    ...
   

where the sequence may span multiple lines, and a ';' may be used instead of '#' to start comments.
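
As an illustration of the layout above, here is a minimal sketch that groups lines into (header, sequence) pairs using only the standard library. It is not this module's parser: the names is_comment, strip_marker and parse_lines are hypothetical, and the sketch assumes comment lines may be skipped wherever they occur, which is more lenient than some descriptions of the format.

```ocaml
(* Hypothetical helper: true when [line] starts with '#' or ';'. *)
let is_comment line =
  String.length line > 0 && (line.[0] = '#' || line.[0] = ';')

(* Hypothetical helper: drop the leading '>' from a header line. *)
let strip_marker line = String.sub line 1 (String.length line - 1)

(* Group lines into (header, sequence) pairs. Sequence lines that follow
   the same header are concatenated; comment and empty lines are skipped
   (an assumption of this sketch, not a rule of the format). *)
let parse_lines (lines : string list) : (string * string) list =
  let flush current acc =
    match current with
    | None -> acc
    | Some (header, buf) -> (header, Buffer.contents buf) :: acc
  in
  let rec loop current acc = function
    | [] -> List.rev (flush current acc)
    | line :: rest ->
      if line = "" || is_comment line then loop current acc rest
      else if line.[0] = '>' then
        loop (Some (strip_marker line, Buffer.create 64)) (flush current acc) rest
      else begin
        (match current with
         | Some (_, buf) -> Buffer.add_string buf line
         | None -> failwith "sequence line before any header");
        loop current acc rest
      end
  in
  loop None [] lines
```

A sequence line that appears before any header makes the sketch fail with Failure; a real parser would report the line number instead.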

Header lines begin with the '>' character. By convention, the characters up to the first whitespace give the name of the content, and any characters after that give additional information in a format specific to the file provider.
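
The name/information convention can be sketched as follows. The function split_header is a hypothetical helper, not part of this module; it assumes the leading '>' has already been removed and treats only spaces and tabs as whitespace.

```ocaml
(* Hypothetical helper: split a header (without its leading '>') into the
   name, i.e. everything before the first space or tab, and the remaining
   provider-specific information, trimmed of surrounding whitespace. *)
let split_header (header : string) : string * string =
  let is_space c = c = ' ' || c = '\t' in
  let n = String.length header in
  (* Index of the first whitespace character, or [n] if there is none. *)
  let rec find i =
    if i >= n || is_space header.[i] then i else find (i + 1)
  in
  let i = find 0 in
  let name = String.sub header 0 i in
  let info =
    if i >= n then "" else String.trim (String.sub header i (n - i))
  in
  (name, info)
```

Because the convention is only loosely followed, the information part is best kept as an opaque string unless the file provider documents its format.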

Sequences are most often strings of characters denoting nucleotides or amino acids. However, FASTA files sometimes provide quality scores instead, either ASCII-encoded, e.g. as supported by modules Biocaml_phred_score and Biocaml_solexa_score, or as space-separated integers.
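
For the space-separated integer form, a sequence line can be decoded roughly as below. The function int_seq_of_line is a hypothetical helper for illustration, not this module's implementation; it assumes scores are separated by one or more spaces.

```ocaml
(* Hypothetical helper: decode one line of space-separated integers.
   Empty tokens produced by repeated spaces are ignored.
   Raises Failure if a token is not a valid integer. *)
let int_seq_of_line (line : string) : int list =
  String.split_on_char ' ' line
  |> List.filter (fun tok -> tok <> "")
  |> List.map int_of_string
```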

Thus, the FASTA format is really a family of formats with a fairly loose specification of the header and content formats. The only consistently followed meaning of the format is:

Throughout this module, sequence generically means either kind of data found in the sequence lines, char_seq means specifically a sequence of characters, and int_seq means specifically a sequence of integers.

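Parsing functions throughout this module take the optional arguments that appear in the signatures below: ?buffer_size, ?filename, ?pedantic, ?sharp_comments and ?semicolon_comments.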



type char_seq = string 
type int_seq = int list 
type 'a item = {
   header :string;
   sequence :'a;
}
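
A short sketch of how these types fit together. The type definitions are repeated locally so the example stands alone; map_sequence is a hypothetical helper and the sample values are made up for illustration.

```ocaml
(* Local copies of the types declared in this module. *)
type char_seq = string
type int_seq = int list
type 'a item = { header : string; sequence : 'a }

(* The same record type carries either kind of sequence. *)
let read : char_seq item = { header = "read1"; sequence = "ACGT" }
let quals : int_seq item = { header = "read1"; sequence = [40; 38; 35; 30] }

(* Hypothetical helper: transform the sequence and keep the header. *)
let map_sequence (f : 'a -> 'b) (item : 'a item) : 'b item =
  { header = item.header; sequence = f item.sequence }

(* Example: replace a character sequence by its length. *)
let read_length : int item = map_sequence String.length read
```

Keeping the item type polymorphic in its sequence lets one set of functions handle both character and integer data.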
module Error: sig .. end
Errors.
exception Error of Error.t
val in_channel_to_char_seq_item_stream : ?buffer_size:int ->
?filename:string ->
?pedantic:bool ->
?sharp_comments:bool ->
?semicolon_comments:bool ->
Pervasives.in_channel -> char_seq item Stream.t
Returns a stream of char_seq items. Initial comments are discarded.
Raises Error in case of any errors.
val in_channel_to_int_seq_item_stream : ?buffer_size:int ->
?filename:string ->
?pedantic:bool ->
?sharp_comments:bool ->
?semicolon_comments:bool ->
Pervasives.in_channel -> int_seq item Stream.t
Returns a stream of int_seq items. Initial comments are discarded.
Raises Error in case of any errors.
module Result: sig .. end
module Transform: sig .. end
Low-level transforms.

S-expressions

val sexp_of_char_seq : char_seq -> Sexplib.Sexp.t
val char_seq_of_sexp : Sexplib.Sexp.t -> char_seq
val sexp_of_int_seq : int_seq -> Sexplib.Sexp.t
val int_seq_of_sexp : Sexplib.Sexp.t -> int_seq
val sexp_of_item : ('a -> Sexplib.Sexp.t) -> 'a item -> Sexplib.Sexp.t
val item_of_sexp : (Sexplib.Sexp.t -> 'a) -> Sexplib.Sexp.t -> 'a item