Module Biocaml_sam

SAM files and a high-level representation of SAM alignments.

module Biocaml_sam: 

Basic Types

Low-Level Items

type raw_alignment = {
   qname : string;
   flag : int;
   rname : string;
   pos : int;
   mapq : int;
   cigar : string;
   rnext : string;
   pnext : int;
   tlen : int;
   seq : string;
   qual : string;
   optional : (string * char * string) list;
}
The contents of an alignment line.
type raw_item = [ `alignment of raw_alignment
| `comment of string
| `header of string * (string * string) list ]
The "items" of a parsed SAM file stream.
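To make the raw_alignment fields concrete, here is a minimal, self-contained sketch (not Biocaml's parser) that splits one tab-separated SAM alignment line into a local copy of the record. The helpers raw_alignment_of_line and split_optional are hypothetical, and errors are plain strings rather than the module's Error variants.

```ocaml
(* Local copy of the documented record so the sketch compiles without Biocaml. *)
type raw_alignment = {
  qname : string; flag : int; rname : string; pos : int; mapq : int;
  cigar : string; rnext : string; pnext : int; tlen : int;
  seq : string; qual : string;
  optional : (string * char * string) list;
}

(* Split "TAG:TYPE:VALUE" on the first two colons only, because the
   value of a Z field may itself contain colons. *)
let split_optional field =
  match String.index_opt field ':' with
  | Some i when i + 2 < String.length field && field.[i + 2] = ':' ->
    let tag = String.sub field 0 i in
    let typ = field.[i + 1] in
    let value = String.sub field (i + 3) (String.length field - i - 3) in
    Some (tag, typ, value)
  | _ -> None

(* Hypothetical: 11 mandatory tab-separated fields, then optional ones. *)
let raw_alignment_of_line line =
  match String.split_on_char '\t' line with
  | qname :: flag :: rname :: pos :: mapq :: cigar :: rnext :: pnext
    :: tlen :: seq :: qual :: rest ->
    (match int_of_string_opt flag, int_of_string_opt pos,
           int_of_string_opt mapq, int_of_string_opt pnext,
           int_of_string_opt tlen with
     | Some flag, Some pos, Some mapq, Some pnext, Some tlen ->
       let optional = List.filter_map split_optional rest in
       if List.length optional <> List.length rest
       then Error "malformed optional field"
       else Ok { qname; flag; rname; pos; mapq; cigar; rnext; pnext;
                 tlen; seq; qual; optional }
     | _ -> Error "a numeric field is not an integer")
  | _ -> Error "fewer than 11 mandatory fields"
```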

High-Level Items

type reference_sequence = {
   ref_name : string;
   ref_length : int;
   ref_assembly_identifier : string option;
   ref_checksum : string option;
   ref_species : string option;
   ref_uri : string option;
   ref_unknown : (string * string) list;
}
Definition of a reference sequence.
val reference_sequence : ?assembly_identifier:string ->
?checksum:string ->
?species:string ->
?uri:string ->
?unknown_data:(string * string) list ->
string -> int -> reference_sequence
Create a reference sequence.
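The SAM specification defines the @SQ header tags SN (name), LN (length), AS (assembly), M5 (MD5 checksum), SP (species) and UR (URI). The sketch below mirrors the reference_sequence constructor on a local record and adds a hypothetical reference_sequence_of_tags helper showing one plausible mapping from those tags onto the fields. It borrows error names from the Error module for illustration only and is not the library's code.

```ocaml
(* Local copy of the documented record. *)
type reference_sequence = {
  ref_name : string;
  ref_length : int;
  ref_assembly_identifier : string option;
  ref_checksum : string option;
  ref_species : string option;
  ref_uri : string option;
  ref_unknown : (string * string) list;
}

(* Mirrors the documented constructor: omitted optional fields are None. *)
let reference_sequence ?assembly_identifier ?checksum ?species ?uri
    ?(unknown_data = []) name length =
  { ref_name = name; ref_length = length;
    ref_assembly_identifier = assembly_identifier;
    ref_checksum = checksum; ref_species = species; ref_uri = uri;
    ref_unknown = unknown_data }

(* Hypothetical helper: build a reference sequence from @SQ tag/value pairs.
   Tags outside the known set are kept as unknown data. *)
let reference_sequence_of_tags tags =
  let find t = List.assoc_opt t tags in
  let known = ["SN"; "LN"; "AS"; "M5"; "SP"; "UR"] in
  let unknown_data = List.filter (fun (t, _) -> not (List.mem t known)) tags in
  match find "SN", find "LN" with
  | None, _ -> Error (`missing_ref_sequence_name tags)
  | _, None -> Error (`missing_ref_sequence_length tags)
  | Some name, Some ln ->
    (match int_of_string_opt ln with
     | None -> Error (`wrong_ref_sequence_length tags)
     | Some length ->
       Ok (reference_sequence ?assembly_identifier:(find "AS")
             ?checksum:(find "M5") ?species:(find "SP") ?uri:(find "UR")
             ~unknown_data name length))
```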

module Flags: 

Manipulate the alignment flags.
type t = private int 
Flags are represented as a bit map.
val of_int : int -> t
val has_multiple_segments : t -> bool
val segment_unmapped : t -> bool
val next_segment_unmapped : t -> bool
val first_segment : t -> bool
val last_segment : t -> bool
val secondary_alignment : t -> bool
include Core.Sexpable.S
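The flag predicates test individual bits of the SAM FLAG field. This standalone Flags_sketch module (hypothetical, not Biocaml) uses the bit values from the SAM specification: 0x1 multiple segments, 0x4 segment unmapped, 0x8 next segment unmapped, 0x40 first segment, 0x80 last segment and 0x100 secondary alignment. Its of_int performs no range check; whether the library's version does is not stated here.

```ocaml
(* Standalone stand-in for the Flags module; bit values follow the SAM spec. *)
module Flags_sketch : sig
  type t = private int
  val of_int : int -> t
  val has_multiple_segments : t -> bool
  val segment_unmapped : t -> bool
  val next_segment_unmapped : t -> bool
  val first_segment : t -> bool
  val last_segment : t -> bool
  val secondary_alignment : t -> bool
end = struct
  type t = int
  (* No validation in this sketch. *)
  let of_int x = x
  (* True when the given bit is set in the flag value. *)
  let is_set bit f = f land bit <> 0
  let has_multiple_segments = is_set 0x1
  let segment_unmapped = is_set 0x4
  let next_segment_unmapped = is_set 0x8
  let first_segment = is_set 0x40
  let last_segment = is_set 0x80
  let secondary_alignment = is_set 0x100
end
```

Because t is private, a flag can still be read back as an int with a coercion such as (f :> int), but cannot be built without going through of_int.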
type cigar_op = [ `D of int
| `Eq of int
| `H of int
| `I of int
| `M of int
| `N of int
| `P of int
| `S of int
| `X of int ]
CIGAR operations.
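A common use of a cigar_op array is computing how many reference and query bases an alignment spans. Per the SAM specification, M, D, N, = and X consume the reference, while M, I, S, = and X consume the query. The functions reference_length and query_length below are hypothetical helpers over a local copy of the cigar_op type.

```ocaml
type cigar_op = [ `D of int | `Eq of int | `H of int | `I of int | `M of int
                | `N of int | `P of int | `S of int | `X of int ]

(* Number of reference bases covered: M, D, N, = and X. *)
let reference_length (ops : cigar_op array) =
  Array.fold_left
    (fun acc op ->
       match op with
       | `M n | `D n | `N n | `Eq n | `X n -> acc + n
       | `I _ | `S _ | `H _ | `P _ -> acc)
    0 ops

(* Number of query (SEQ) bases consumed: M, I, S, = and X. *)
let query_length (ops : cigar_op array) =
  Array.fold_left
    (fun acc op ->
       match op with
       | `M n | `I n | `S n | `Eq n | `X n -> acc + n
       | `D _ | `N _ | `H _ | `P _ -> acc)
    0 ops
```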
type optional_content_value = [ `array of char * optional_content_value array
| `char of char
| `float of float
| `int of int
| `string of string ]
Meta-value used to store “optional content”.
type optional_content = (string * char * optional_content_value) list 
Alignment optional content.
type alignment = {
   query_template_name : string;
   flags : Flags.t;
   reference_sequence : [ `name of string
| `none
| `reference_sequence of reference_sequence ]
   position : int option;
   mapping_quality : int option;
   cigar_operations : cigar_op array;
   next_reference_sequence : [ `name of string
| `none
| `qname
| `reference_sequence of reference_sequence ]
   next_position : int option;
   template_length : int option;
   sequence : [ `none | `reference | `string of string ];
   quality : Biocaml_phred_score.t array;
   optional_content : optional_content;
}
High-level representation of a parsed alignment.
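Several alignment fields are options because SAM uses sentinel values for missing data: 0 for POS and PNEXT, 255 for MAPQ, 0 for TLEN, and "*" for QUAL, which is otherwise Phred+33 encoded. The sketch below assumes the lifting to alignment maps those sentinels to None; the helpers are hypothetical, and plain int scores stand in for Biocaml_phred_score.t.

```ocaml
(* Hypothetical helper: map a SAM "unavailable" sentinel to None. *)
let option_unless sentinel x = if x = sentinel then None else Some x

(* Assumed conventions from the SAM specification:
   POS and PNEXT use 0, MAPQ uses 255, TLEN uses 0. *)
let position_of_raw pos = option_unless 0 pos
let mapping_quality_of_raw mapq = option_unless 255 mapq
let template_length_of_raw tlen = option_unless 0 tlen

(* QUAL is Phred+33 ('!' = 0 up to '~' = 93); "*" means no qualities.
   Returns the first offending character on failure. *)
let phred_scores_of_qual qual =
  if qual = "*" then Ok [||]
  else begin
    let scores = Array.make (String.length qual) 0 in
    let bad = ref None in
    String.iteri
      (fun i c ->
         let q = Char.code c - 33 in
         if q < 0 || q > 93 then (if !bad = None then bad := Some c)
         else scores.(i) <- q)
      qual;
    match !bad with
    | None -> Ok scores
    | Some c -> Error c
  end
```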
type item = [ `alignment of alignment
| `comment of string
| `header of string * (string * string) list
| `header_line of
string * [ `coordinate | `queryname | `unknown | `unsorted ] *
(string * string) list
| `reference_sequence_dictionary of reference_sequence array ]
High-level representation of a parsed entity.

Error Types

module Error: 

The possible errors.
type optional_content_parsing = [ `wrong_optional of
(string * char * string) list *
[ `not_a_char of string
| `not_a_float of string
| `not_an_int of string
| `unknown_type of char
| `wrong_array of
[ `not_a_char of string
| `not_a_float of string
| `not_an_int of string
| `unknown_type of char
| `wrong_type of string ]
| `wrong_type of string ] ]
Errors which can happen while parsing optional content.
type string_to_raw = [ `incomplete_input of Biocaml_pos.t * string list * string option
| `invalid_header_tag of Biocaml_pos.t * string
| `invalid_tag_value_list of Biocaml_pos.t * string list
| `not_an_int of Biocaml_pos.t * string * string
| `wrong_alignment of Biocaml_pos.t * string
| `wrong_optional_field of Biocaml_pos.t * string ]
The possible errors one can get while parsing SAM files.
type raw_to_item = [ `comment_after_end_of_header of int * string
| `duplicate_in_reference_sequence_dictionary of
Biocaml_sam.reference_sequence array
| `header_after_end_of_header of int * (string * (string * string) list)
| `header_line_not_first of int
| `header_line_without_version of (string * string) list
| `header_line_wrong_sorting of string
| `missing_ref_sequence_length of (string * string) list
| `missing_ref_sequence_name of (string * string) list
| `wrong_cigar_text of string
| `wrong_flag of Biocaml_sam.raw_alignment
| `wrong_mapq of Biocaml_sam.raw_alignment
| `wrong_optional of
(string * char * string) list *
[ `not_a_char of string
| `not_a_float of string
| `not_an_int of string
| `unknown_type of char
| `wrong_array of
[ `not_a_char of string
| `not_a_float of string
| `not_an_int of string
| `unknown_type of char
| `wrong_type of string ]
| `wrong_type of string ]
| `wrong_phred_scores of Biocaml_sam.raw_alignment
| `wrong_pnext of Biocaml_sam.raw_alignment
| `wrong_pos of Biocaml_sam.raw_alignment
| `wrong_qname of Biocaml_sam.raw_alignment
| `wrong_ref_sequence_length of (string * string) list
| `wrong_tlen of Biocaml_sam.raw_alignment ]
The possible errors one can get while lifting SAM raw_items to higher-level representations. (Note: raw_to_item includes optional_content_parsing; OCamldoc expands it inline.)
type item_to_raw = [ `wrong_phred_scores of Biocaml_sam.alignment ] 
The error that may happen while downgrading the higher-level representation of an alignment.
type parse = [ `comment_after_end_of_header of int * string
| `duplicate_in_reference_sequence_dictionary of
Biocaml_sam.reference_sequence array
| `header_after_end_of_header of int * (string * (string * string) list)
| `header_line_not_first of int
| `header_line_without_version of (string * string) list
| `header_line_wrong_sorting of string
| `incomplete_input of Biocaml_pos.t * string list * string option
| `invalid_header_tag of Biocaml_pos.t * string
| `invalid_tag_value_list of Biocaml_pos.t * string list
| `missing_ref_sequence_length of (string * string) list
| `missing_ref_sequence_name of (string * string) list
| `not_an_int of Biocaml_pos.t * string * string
| `wrong_alignment of Biocaml_pos.t * string
| `wrong_cigar_text of string
| `wrong_flag of Biocaml_sam.raw_alignment
| `wrong_mapq of Biocaml_sam.raw_alignment
| `wrong_optional of
(string * char * string) list *
[ `not_a_char of string
| `not_a_float of string
| `not_an_int of string
| `unknown_type of char
| `wrong_array of
[ `not_a_char of string
| `not_a_float of string
| `not_an_int of string
| `unknown_type of char
| `wrong_type of string ]
| `wrong_type of string ]
| `wrong_optional_field of Biocaml_pos.t * string
| `wrong_phred_scores of Biocaml_sam.raw_alignment
| `wrong_pnext of Biocaml_sam.raw_alignment
| `wrong_pos of Biocaml_sam.raw_alignment
| `wrong_qname of Biocaml_sam.raw_alignment
| `wrong_ref_sequence_length of (string * string) list
| `wrong_tlen of Biocaml_sam.raw_alignment ]
All possible parsing errors. It is defined as:
      type parse = [
      | string_to_raw
      | raw_to_item ]
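The union relies on OCaml's polymorphic variants: listing type names inside [ ... ] merges their constructors, and #name patterns match any constructor of a named type. The toy types below, much smaller than the real ones, show the mechanism and the coercion that lifts a narrower error into the union; describe and lift are hypothetical names.

```ocaml
(* Toy error types standing in for string_to_raw and raw_to_item. *)
type string_to_raw = [ `incomplete_input of string | `wrong_alignment of string ]
type raw_to_item = [ `wrong_cigar_text of string | `wrong_mapq of int ]

(* The union, written the same way as Error.parse. *)
type parse = [ string_to_raw | raw_to_item ]

(* #type patterns dispatch on which component type a constructor came from. *)
let describe : parse -> string = function
  | #string_to_raw -> "error while splitting the text into raw items"
  | #raw_to_item -> "error while lifting raw items to high-level items"

(* A value of a narrower error type can be coerced into the union. *)
let lift (e : raw_to_item) : parse = (e :> parse)
```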

type t = parse 
The union of all possible errors.

S-Expressions conversions for Errors

val string_to_raw_of_sexp : Sexplib.Sexp.t -> string_to_raw
val sexp_of_string_to_raw : string_to_raw -> Sexplib.Sexp.t
val raw_to_item_of_sexp : Sexplib.Sexp.t -> raw_to_item
val sexp_of_raw_to_item : raw_to_item -> Sexplib.Sexp.t
val item_to_raw_of_sexp : Sexplib.Sexp.t -> item_to_raw
val sexp_of_item_to_raw : item_to_raw -> Sexplib.Sexp.t
val parse_of_sexp : Sexplib.Sexp.t -> parse
val sexp_of_parse : parse -> Sexplib.Sexp.t
val t_of_sexp : Sexplib.Sexp.t -> parse
val sexp_of_t : parse -> Sexplib.Sexp.t
exception Error of Error.t
The only exception raised by *_exn functions in this module.
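The *_exn functions pair each result-returning function with a variant that raises. A hypothetical miniature of that convention, with its own error type and an exception named Parse_error standing in for Error, might look like this:

```ocaml
(* Hypothetical single-constructor error type. *)
type error = [ `not_an_int of string ]

(* One exception carries every error, as with Error in this module. *)
exception Parse_error of error

(* Result-returning version: never raises. *)
let int_of_field s : (int, [> error ]) result =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error (`not_an_int s)

(* _exn version: unwraps Ok, raises Parse_error on Error. *)
let int_of_field_exn s =
  match int_of_field s with
  | Ok n -> n
  | Error e -> raise (Parse_error e)
```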

Stream functions

val in_channel_to_item_stream : ?buffer_size:int ->
?filename:string ->
Pervasives.in_channel ->
(item, [> Error.parse ]) Core.Result.t Stream.t
Parse an input-channel into a stream of high-level items.
val in_channel_to_raw_item_stream : ?buffer_size:int ->
?filename:string ->
Pervasives.in_channel ->
(raw_item, [> Error.parse ]) Core.Result.t Stream.t
Parse an input-channel into a stream of low-level (“raw”) items.
val in_channel_to_item_stream_exn : ?buffer_size:int ->
?filename:string -> Pervasives.in_channel -> item Stream.t
Like in_channel_to_item_stream, but consuming the stream may raise Error _.
val in_channel_to_raw_item_stream_exn : ?buffer_size:int ->
?filename:string -> Pervasives.in_channel -> raw_item Stream.t
Like in_channel_to_raw_item_stream, but consuming the stream may raise Error _.
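With the non-raising stream functions, each element has to be checked for an error. The consumer below is a hypothetical sketch that counts successful items and stops at the first error. It uses Seq in place of Stream.t (Stream was removed from the standard library in OCaml 5.0) and a generic result type in place of item and Error.parse.

```ocaml
(* Count Ok elements; on the first Error, report how many came before it. *)
let count_until_error (items : ('a, 'e) result Seq.t) : (int, int * 'e) result =
  let rec loop n s =
    match s () with
    | Seq.Nil -> Ok n
    | Seq.Cons (Ok _, rest) -> loop (n + 1) rest
    | Seq.Cons (Error e, _) -> Error (n, e)
  in
  loop 0 items
```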

Low-level partial parsing

These functions are used for parsing by both Biocaml_sam.Transform and Biocaml_bam.Transform. Most users can ignore them, but they can be useful on their own.
val parse_cigar_text : string ->
(cigar_op array, [> `wrong_cigar_text of string ]) Core.Result.t
Parse CIGAR operations from a string.
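A CIGAR string is a sequence of length/operation pairs such as 3S10M1I2D. The following hypothetical re-implementation (not the library's code) shows how a value of parse_cigar_text's documented result type can be produced. Treating "*" (CIGAR unavailable in SAM) as an empty array is an assumption.

```ocaml
type cigar_op = [ `D of int | `Eq of int | `H of int | `I of int | `M of int
                | `N of int | `P of int | `S of int | `X of int ]

let parse_cigar_text (text : string)
  : (cigar_op array, [> `wrong_cigar_text of string ]) result =
  if text = "*" then Ok [||]
  else begin
    let fail () = Error (`wrong_cigar_text text) in
    let n = String.length text in
    let rec loop i acc =
      if i = n then Ok (Array.of_list (List.rev acc))
      else begin
        (* Scan the run of digits giving the operation length. *)
        let j = ref i in
        while !j < n && text.[!j] >= '0' && text.[!j] <= '9' do incr j done;
        (* No digits, or digits with no operation letter after them. *)
        if !j = i || !j = n then fail ()
        else
          match int_of_string_opt (String.sub text i (!j - i)) with
          | None -> fail ()
          | Some len ->
            let op =
              match text.[!j] with
              | 'M' -> Some (`M len) | 'I' -> Some (`I len)
              | 'D' -> Some (`D len) | 'N' -> Some (`N len)
              | 'S' -> Some (`S len) | 'H' -> Some (`H len)
              | 'P' -> Some (`P len) | '=' -> Some (`Eq len)
              | 'X' -> Some (`X len) | _ -> None
            in
            (match op with
             | None -> fail ()
             | Some op -> loop (!j + 1) (op :: acc))
      end
    in
    loop 0 []
  end
```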
val parse_optional_content : (string * char * string) list ->
[> Error.optional_content_parsing ])
Parse optional content from a “tokenized” string.
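Each optional field arrives as a (tag, type, value) triple, where the SAM type code is A (character), i (integer), f (float), Z (string), H (hex string) or B (array written as subtype,v1,v2,...). The sketch below is a hypothetical parser that produces the documented optional_content_value and the error constructors of Error.optional_content_parsing; keeping H values as strings is an assumption.

```ocaml
type optional_content_value =
  [ `array of char * optional_content_value array
  | `char of char
  | `float of float
  | `int of int
  | `string of string ]

let parse_int raw =
  match int_of_string_opt raw with
  | Some i -> Ok (`int i)
  | None -> Error (`not_an_int raw)

let parse_float raw =
  match float_of_string_opt raw with
  | Some f -> Ok (`float f)
  | None -> Error (`not_a_float raw)

(* B values look like "c,1,2,3": an element type code, then the elements. *)
let parse_array raw =
  match String.split_on_char ',' raw with
  | sub :: values when String.length sub = 1 ->
    let element =
      match sub.[0] with
      | 'c' | 'C' | 's' | 'S' | 'i' | 'I' -> Ok parse_int
      | 'f' -> Ok parse_float
      | c -> Error (`unknown_type c)
    in
    (match element with
     | Error e -> Error (`wrong_array e)
     | Ok parse ->
       let rec go acc = function
         | [] -> Ok (`array (sub.[0], Array.of_list (List.rev acc)))
         | v :: rest ->
           (match parse v with
            | Ok x -> go (x :: acc) rest
            | Error e -> Error (`wrong_array e))
       in
       go [] values)
  | _ -> Error (`wrong_array (`wrong_type raw))

(* Dispatch on the top-level SAM type code. *)
let parse_value typ raw : (optional_content_value, _) result =
  match typ with
  | 'A' when String.length raw = 1 -> Ok (`char raw.[0])
  | 'A' -> Error (`not_a_char raw)
  | 'i' -> parse_int raw
  | 'f' -> parse_float raw
  | 'Z' | 'H' -> Ok (`string raw)
  | 'B' -> parse_array raw
  | c -> Error (`unknown_type c)

(* Stop at the first bad field, reporting the whole input list. *)
let parse_optional_content fields =
  let rec go acc = function
    | [] -> Ok (List.rev acc)
    | (tag, typ, raw) :: rest ->
      (match parse_value typ raw with
       | Ok v -> go ((tag, typ, v) :: acc) rest
       | Error e -> Error (`wrong_optional (fields, e)))
  in
  go [] fields
```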
val parse_header_line : 'a ->
string ->
([> `comment of string | `header of string * (string * string) list ],
[> `invalid_header_tag of 'a * string
| `invalid_tag_value_list of 'a * string list ])
Parse a header line from a string. The first argument is used to pass the location to the error values (cf. Biocaml_sam.Error.string_to_raw).
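A minimal sketch of the documented behaviour, assuming header lines are tab-separated with TAG:VALUE fields and that @CO lines carry free text. This is a hypothetical re-implementation, not the library's code; the loc argument plays the role of the location parameter and is only copied into errors.

```ocaml
let parse_header_line loc line =
  match String.split_on_char '\t' line with
  (* @CO: everything after the record type is the comment text. *)
  | "@CO" :: rest -> Ok (`comment (String.concat "\t" rest))
  (* Other records: "@XY" followed by two-letter TAG:VALUE fields. *)
  | tag :: rest when String.length tag = 3 && tag.[0] = '@' ->
    let pair f =
      match String.index_opt f ':' with
      | Some 2 -> Some (String.sub f 0 2, String.sub f 3 (String.length f - 3))
      | _ -> None
    in
    let pairs = List.filter_map pair rest in
    if List.length pairs = List.length rest
    then Ok (`header (String.sub tag 1 2, pairs))
    else Error (`invalid_tag_value_list (loc, rest))
  | tag :: _ -> Error (`invalid_header_tag (loc, tag))
  (* Unreachable: split_on_char never returns an empty list. *)
  | [] -> Error (`invalid_header_tag (loc, line))
```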
val expand_header_line : (string * string) list ->
([> `header_line of
string * [ `coordinate | `queryname | `unknown | `unsorted ] *
(string * string) list ],
[> `header_line_without_version of (string * string) list
| `header_line_wrong_sorting of string ])
Parse a header line into a more detailed type.
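For an @HD line, the SAM specification requires VN (format version) and allows SO with the values unknown (the default), unsorted, queryname and coordinate. The sketch below is a hypothetical expand_header_line over the tag list; keeping the remaining tags as the third component is an assumption.

```ocaml
let expand_header_line tags =
  match List.assoc_opt "VN" tags with
  | None -> Error (`header_line_without_version tags)
  | Some version ->
    (* A missing SO tag is treated as "unknown", the spec's default. *)
    let sorting =
      match List.assoc_opt "SO" tags with
      | None | Some "unknown" -> Ok `unknown
      | Some "unsorted" -> Ok `unsorted
      | Some "queryname" -> Ok `queryname
      | Some "coordinate" -> Ok `coordinate
      | Some other -> Error (`header_line_wrong_sorting other)
    in
    match sorting with
    | Error e -> Error e
    | Ok so ->
      let others = List.filter (fun (t, _) -> t <> "VN" && t <> "SO") tags in
      Ok (`header_line (version, so, others))
```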

Low-level Transforms

module Transform: 

Low-level, threading-model agnostic transforms (cf. Biocaml_transform).
val string_to_raw : ?filename:string ->
unit ->
(Biocaml_sam.raw_item, [> Biocaml_sam.Error.string_to_raw ]) Core.Result.t)
Create a parsing "stoppable" transform.
val raw_to_string : unit -> (Biocaml_sam.raw_item, string) Biocaml_transform.t
Create a printing "stoppable" transform.
val raw_to_item : unit ->
(Biocaml_sam.item, [> Biocaml_sam.Error.raw_to_item ]) Core.Result.t)
Create a transform that lifts raw_items to items.
val item_to_raw : unit ->
(Biocaml_sam.raw_item, [> Biocaml_sam.Error.item_to_raw ]) Core.Result.t)
Create a transform that downgrades items to raw_items.
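raw_to_string turns raw items back into text. Ignoring the transform machinery and header lines, the core formatting step for one alignment is joining the 11 mandatory fields and the TAG:TYPE:VALUE optional fields with tabs, as in this hypothetical sketch over a local copy of raw_alignment:

```ocaml
(* Local copy of the documented record. *)
type raw_alignment = {
  qname : string; flag : int; rname : string; pos : int; mapq : int;
  cigar : string; rnext : string; pnext : int; tlen : int;
  seq : string; qual : string;
  optional : (string * char * string) list;
}

(* Render one optional field as TAG:TYPE:VALUE. *)
let string_of_optional (tag, typ, value) = Printf.sprintf "%s:%c:%s" tag typ value

(* Hypothetical: tab-join mandatory fields, then optional fields. *)
let string_of_raw_alignment r =
  String.concat "\t"
    ([ r.qname; string_of_int r.flag; r.rname; string_of_int r.pos;
       string_of_int r.mapq; r.cigar; r.rnext; string_of_int r.pnext;
       string_of_int r.tlen; r.seq; r.qual ]
     @ List.map string_of_optional r.optional)
```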


val cigar_op_of_sexp : Sexplib.Sexp.t -> cigar_op
val sexp_of_cigar_op : cigar_op -> Sexplib.Sexp.t
val optional_content_value_of_sexp : Sexplib.Sexp.t -> optional_content_value
val sexp_of_optional_content_value : optional_content_value -> Sexplib.Sexp.t
val optional_content_of_sexp : Sexplib.Sexp.t -> optional_content
val sexp_of_optional_content : optional_content -> Sexplib.Sexp.t
val alignment_of_sexp : Sexplib.Sexp.t -> alignment
val sexp_of_alignment : alignment -> Sexplib.Sexp.t
val item_of_sexp : Sexplib.Sexp.t -> item
val sexp_of_item : item -> Sexplib.Sexp.t