Data structures to represent sets of (possibly annotated) genomic regions

This module handles sets of genomic regions. It provides set operations such as union, intersection, difference and membership tests. Dedicated data types are also provided for regions that are annotated with a value.

Genomic regions are represented as a pair formed by a range and an abstract representation of a sequence/chromosome identifier. The data structures implemented here are parameterized over this abstract type. To obtain an implementation for the most common case, where chromosomes are identified by a string, simply apply the functor `Make` to the `String` module.
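The parameterization described above can be illustrated with a small stand-in functor. This is only a sketch using the standard library: `Chromosome`, `Make_sketch`, `String_map` and `same_chromosome` are hypothetical names chosen for the example, and the inclusive `(lo, hi)` pair is an assumed stand-in for `Biocaml_range.t`.

```ocaml
(* Hypothetical stand-in for the chromosome identifier signature. *)
module type Chromosome = sig
  type t
  val compare : t -> t -> int
end

(* Toy functor: a location pairs a chromosome identifier with an
   inclusive (lo, hi) range. *)
module Make_sketch (Chr : Chromosome) = struct
  type range = int * int
  type location = Chr.t * range

  (* Two locations can only interact if they lie on the same chromosome. *)
  let same_chromosome ((c1, _) : location) ((c2, _) : location) =
    Chr.compare c1 c2 = 0
end

(* The common case: chromosomes identified by strings. *)
module String_map = Make_sketch (String)

let a : String_map.location = ("chr1", (100, 200))
let b : String_map.location = ("chr1", (500, 600))
let c : String_map.location = ("chr2", (100, 200))
```

Because the identifier type stays abstract inside the functor, the same code would work unchanged for integer or variant-typed chromosome identifiers.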

The functor `Make` provides four datatypes, which correspond to variants where:

- the regions in the set can overlap or not

- the regions are annotated with some values or not

`module Biocaml_genomeMap: `

`sig`

`module Make: `

`sig`

`type range = Biocaml_range.t`

`type location = Biocaml_genomeMap.Chromosome.t * range`

A collection of non-overlapping regions (e.g. a set of CpG islands)

`module Selection: `

`sig`

`type t`

`val intersects : t -> Biocaml_genomeMap.Make.location -> bool`

`intersects sel loc` returns `true` if `loc` has a non-empty intersection with `sel`, and `false` otherwise.

`val overlap : t -> Biocaml_genomeMap.Make.location -> int`

`val to_stream : t -> Biocaml_genomeMap.Make.location Stream.t`

`val of_stream : Biocaml_genomeMap.Make.location Stream.t -> t`

`of_stream e` computes a selection (i.e. a set of non-overlapping locations) as the union of the locations contained in `e`.
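To make the idea of a selection as a union concrete, here is a minimal sketch that merges overlapping locations and tests for intersection. It is not the library's implementation: it assumes string chromosome names, inclusive `(lo, hi)` ranges, and plain lists in place of streams, and `selection_of_list` is a hypothetical name.

```ocaml
(* Assumption: a location is a string chromosome name paired with an
   inclusive (lo, hi) range; lists stand in for streams. *)
type location = string * (int * int)

(* Union of locations: sort, then merge ranges that overlap on the
   same chromosome, so the result contains no overlapping regions. *)
let selection_of_list (locs : location list) : location list =
  let merge acc (chr, (lo, hi)) =
    match acc with
    | (chr', (lo', hi')) :: rest when chr' = chr && lo <= hi' ->
        (chr, (lo', max hi hi')) :: rest
    | _ -> (chr, (lo, hi)) :: acc
  in
  List.rev (List.fold_left merge [] (List.sort compare locs))

(* True if [loc] shares at least one position with the selection. *)
let intersects (sel : location list) ((chr, (lo, hi)) : location) =
  List.exists (fun (c, (l, h)) -> c = chr && l <= hi && lo <= h) sel

let sel =
  selection_of_list
    [ ("chr1", (10, 20)); ("chr1", (15, 30)); ("chr2", (5, 8)) ]
```

In this sketch the two overlapping `chr1` ranges collapse into a single region, which is what distinguishes a selection from a plain set of locations.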

`end`

`module type Signal = `

`sig`

`type 'a t`

`val eval : 'a t -> default:'a -> Biocaml_genomeMap.Chromosome.t -> int -> 'a`

Function evaluation at some point in the genome.

`val fold : 'a t -> init:'c -> f:('c -> Biocaml_genomeMap.Make.location -> 'b -> 'c) -> 'c`

Folds over the constant intervals of the function, in increasing order.

`val to_stream : 'a t -> (Biocaml_genomeMap.Make.location * 'a) Stream.t`

Stream over all constant intervals of the function, in increasing order.

`val of_stream : ('a -> 'a -> 'a) -> (Biocaml_genomeMap.Make.location * 'a) Stream.t -> 'a t`

`of_stream f ls` builds a signal from a collection of annotated locations. `f` is used when two locations intersect, to compute the annotation on their intersection. *Beware*: `f` should be associative and commutative, since when many locations in `ls` intersect, there is no guarantee on the order in which they and their annotations are aggregated.

`end`
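The role of the combining function can be sketched by evaluating a signal directly from its annotated locations. This is an illustration only, not how the library stores signals: it assumes string chromosome names, inclusive `(lo, hi)` ranges and a list in place of a stream, `eval_sketch` is a hypothetical name, and returning `default` where no location covers the point is an assumption about the `default` argument.

```ocaml
(* Assumptions: string chromosome names, inclusive (lo, hi) ranges, and
   a list in place of a stream. *)
type location = string * (int * int)

(* Value of the signal at position [pos] of [chr]: combine with [f] the
   annotations of every location covering that point, or return
   [default] if none does. *)
let eval_sketch f (ls : (location * 'a) list) ~default chr pos =
  let covering =
    List.filter_map
      (fun ((c, (lo, hi)), v) ->
        if c = chr && lo <= pos && pos <= hi then Some v else None)
      ls
  in
  match covering with
  | [] -> default
  | v :: vs -> List.fold_left f v vs

let peaks = [ (("chr1", (1, 10)), 2); (("chr1", (5, 15)), 3) ]
```

With an associative and commutative function such as `( + )`, the value on the shared interval does not depend on the order in which the locations arrive. A function like `fun a _ -> a` would give order-dependent results, which is the situation the warning above is about.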

Partial function over the genome (e.g.

A set of locations (e.g. a set of gene loci)

`module LSet: `

`sig`

`type t`

`val to_stream : t -> Biocaml_genomeMap.Make.location Stream.t`

`val of_stream : Biocaml_genomeMap.Make.location Stream.t -> t`

`val intersects : t -> Biocaml_genomeMap.Make.location -> bool`

`intersects lset loc` returns `true` if `loc` has a non-empty intersection with one of the locations in `lset`, and returns `false` otherwise.

`val closest : t -> Biocaml_genomeMap.Make.location -> (Biocaml_genomeMap.Make.location * int) option`

`closest lset loc` returns the location in `lset` that is the closest to `loc`, along with the actual (minimal) distance. Returns `None` if there is no location in `lset` that comes from the same chromosome as `loc`.

`val intersecting_elems : t -> Biocaml_genomeMap.Make.location -> Biocaml_genomeMap.Make.location Stream.t`

`intersecting_elems lset loc` returns a stream of all locations in `lset` that intersect `loc`.

`end`
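The behaviour described for `closest` can be sketched with a linear scan. This is an assumption-laden illustration rather than the library's algorithm: locations are string chromosome names with inclusive `(lo, hi)` ranges, the set is a plain list, and the distance is assumed to be 0 for intersecting ranges and the difference between the nearest endpoints otherwise. `range_distance` and `closest_sketch` are hypothetical names.

```ocaml
type location = string * (int * int)

(* Assumed distance: 0 when the ranges share a position, otherwise the
   difference between the two nearest endpoints. *)
let range_distance (lo1, hi1) (lo2, hi2) =
  if hi1 < lo2 then lo2 - hi1
  else if hi2 < lo1 then lo1 - hi2
  else 0

(* Nearest location on the same chromosome, with its distance; [None]
   when the set holds nothing on that chromosome. *)
let closest_sketch (lset : location list) ((chr, r) : location) =
  List.fold_left
    (fun best (c, r') ->
      if c <> chr then best
      else
        let d = range_distance r r' in
        match best with
        | Some (_, d') when d' <= d -> best
        | _ -> Some ((c, r'), d))
    None lset

let genes = [ ("chr1", (100, 200)); ("chr1", (400, 500)); ("chr2", (1, 50)) ]
```

Locations on other chromosomes are skipped entirely, which is why a query on a chromosome absent from the set yields `None` rather than some large distance.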

A set of locations with an attached value on each of them

`module LMap: `

`sig`

`type 'a t`

`val to_stream : 'a t -> (Biocaml_genomeMap.Make.location * 'a) Stream.t`

`val of_stream : (Biocaml_genomeMap.Make.location * 'a) Stream.t -> 'a t`

`val intersects : 'a t -> Biocaml_genomeMap.Make.location -> bool`

`intersects lmap loc` returns `true` if `loc` has a non-empty intersection with one of the locations in `lmap`, and returns `false` otherwise.

`val closest : 'a t -> Biocaml_genomeMap.Make.location -> (Biocaml_genomeMap.Make.location * 'a * int) option`

`closest lmap loc` returns the location in `lmap` that is the closest to `loc`, along with its annotation and the actual (minimal) distance. Returns `None` if there is no location in `lmap` that comes from the same chromosome as `loc`.

`val intersecting_elems : 'a t -> Biocaml_genomeMap.Make.location -> (Biocaml_genomeMap.Make.location * 'a) Stream.t`

`intersecting_elems lmap loc` returns a stream of elements in `lmap` whose location intersects with `loc`.

`end`
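For annotated locations, a query returns each matching location together with its value. The sketch below shows this with a list filter, assuming string chromosome names, inclusive `(lo, hi)` ranges, and a list of pairs in place of the map and its stream. `intersecting_elems_sketch` and the gene names are hypothetical.

```ocaml
type location = string * (int * int)

(* Keep every (location, annotation) pair whose location shares at
   least one position with the query on the same chromosome. *)
let intersecting_elems_sketch (lmap : (location * 'a) list)
    ((chr, (lo, hi)) : location) =
  List.filter (fun ((c, (l, h)), _) -> c = chr && l <= hi && lo <= h) lmap

let annotated =
  [ (("chr1", (100, 200)), "geneA");
    (("chr1", (150, 300)), "geneB");
    (("chr2", (100, 200)), "geneC") ]
```

Overlapping entries such as the two `chr1` loci are kept as distinct elements here, so one query can return several annotations.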

`end`

`end`