The following libraries are “kind of” new. Most of them have been part of some other package, but they are general enough to stand on their own, apart from bioinformatics.
An optimization scheme that has been used successfully for NLP, RNA secondary structure prediction and (I guess ;-) other tasks.
Depending on time constraints, a variant of the Haskell GLPK library, but for convex optimizers.
biocore contains a set of definitions useful for a wide range of bioinformatics solutions, and using it will aid consistency and compatibility. Thus, most other libraries should depend on it for their basic definitions, and biocore should be extended with further definitions that are general, unambiguous, and have limited external dependencies.
The old bio library contains a variety of functionality. The plan is to split this up into smaller and more focused libraries with a minimum of dependencies. For backwards compatibility, the next version (i.e. 0.6) of the bio library will then re-export these.
The following are already done, and available via (darcs get) http://malde.org/~ketil/biohaskell/$LIBRARY_NAME. Eventually, they will be made available on Hackage as well.
biosff contains functionality for SFF files (Roche 454 and Ion Torrent sequences), and bundles the flower program.
biopsl contains the (very simple) functionality for parsing and unparsing PSL files (e.g. from BLAT alignments).
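To give an idea of what parsing and unparsing PSL involves, here is a minimal sketch for a single data line, assuming the standard 21-column, tab-separated PSL layout written by BLAT (header lines are assumed to be skipped beforehand). The `PslHit` type and the `parsePsl`/`unparsePsl` names are hypothetical and are not biopsl's actual API.

```haskell
import Data.List (intercalate)
import Text.Read (readMaybe)

-- Hypothetical record: a few typed PSL columns, plus all raw columns
-- so that unparsing reproduces the original line exactly.
data PslHit = PslHit
  { matches   :: Int
  , strand    :: String
  , qName     :: String
  , qStart    :: Int
  , qEnd      :: Int
  , tName     :: String
  , tStart    :: Int
  , tEnd      :: Int
  , rawFields :: [String]
  } deriving (Show, Eq)

-- Split a line on tab characters (PSL columns are tab-separated).
splitTabs :: String -> [String]
splitTabs s = case break (== '\t') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitTabs rest

-- Parse one PSL data line; Nothing if there are not exactly 21 columns
-- or one of the numeric columns we keep is malformed.
-- 0-based columns: 0 matches, 8 strand, 9 qName, 11 qStart, 12 qEnd,
-- 13 tName, 15 tStart, 16 tEnd.
parsePsl :: String -> Maybe PslHit
parsePsl line
  | length fs /= 21 = Nothing
  | otherwise = do
      m  <- readMaybe (fs !! 0)
      qs <- readMaybe (fs !! 11)
      qe <- readMaybe (fs !! 12)
      ts <- readMaybe (fs !! 15)
      te <- readMaybe (fs !! 16)
      pure PslHit { matches = m, strand = fs !! 8, qName = fs !! 9
                  , qStart = qs, qEnd = qe, tName = fs !! 13
                  , tStart = ts, tEnd = te, rawFields = fs }
  where
    fs = splitTabs line

-- Unparse by re-joining the stored columns with tabs.
unparsePsl :: PslHit -> String
unparsePsl = intercalate "\t" . rawFields
```

Keeping the raw columns makes `unparsePsl` an exact inverse of `parsePsl` for unmodified hits; a fuller implementation would more likely type all 21 columns and rebuild the line from them.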
Four other libraries have been split off and made available via (darcs get) http://patch-tag.com:/r/dfornika/$LIBRARY_NAME and also via Hackage.
biophd: library for reading phd sequence files.
bioace: library for reading ace assembly files.
bioalign: data structures and helper functions for calculating alignments.
biofasta: library for reading FASTA sequence files.
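As an illustration of what reading FASTA involves, here is a minimal sketch on plain `String`s, assuming each record starts with a `>` header line followed by one or more sequence lines. The `Fasta` type and `parseFasta` are hypothetical names, not biofasta's API; a real library would more likely work on lazy ByteStrings for speed.

```haskell
import Data.Char (isSpace)

-- Hypothetical record: header text (without '>') and the concatenated sequence.
data Fasta = Fasta { fastaHeader :: String, fastaSeq :: String }
  deriving (Show, Eq)

-- Parse a whole FASTA text into records. Blank lines and any lines before
-- the first header are ignored; sequence lines are joined, whitespace removed.
parseFasta :: String -> [Fasta]
parseFasta = go . filter (not . all isSpace) . lines
  where
    isHeader l = take 1 l == ">"
    go [] = []
    go (l : ls)
      | isHeader l =
          let (body, rest) = break isHeader ls
          in Fasta (drop 1 l) (concatMap (filter (not . isSpace)) body) : go rest
      | otherwise = go ls  -- stray line before the first header
```

For example, `parseFasta ">seq1 first\nACGT\nTTGA\n>seq2\nGG\n"` yields two records, with sequences `"ACGTTTGA"` and `"GG"`.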
Further development will involve factoring out more parts of the library in the same way. Then all existing applications will be tested against version 0.6, and new releases will be developed to depend directly on the smaller libraries.
- import/export secondary structures based on some form of the Vienna dot-bracket notation (((...(((...)))..)))
- import/export extended secondary structures as used by RNAwolf
- FR3D provides already parsed RNA structures from the PDB
- this library extracts basepairs and sequence from FR3D data
- including complete directories full of entries
- verbose hits
- tabulated hits
- stockholm files
- covariance models
- currently being converted to iteratee
- reading of MAF files
- based on iteratee
- TrainingData to be used for training RNAwolf
- imports from FR3D and DotP
- exports trainingdata elements
- imports trainingdata
- import Turner 2004 energy parameter files
- RNA primary and secondary structure
- tree-based representations
- some data sources for reading and writing dot-bracket and similar notations (e.g. rnastrand data)
- named -xna instead, to support both -dna and -rna. The internals are a bit rough, but since this targets high-performance code, that is acceptable
- Vienna RNAfold v2.0
- import and export of Turner and Vienna tables
- asymptotically fast reimplementation of MC-Fold (Parisien and Major, 2008)
- import of mcfold-db
- extended RNA secondary structure folding
- version 0.3 includes full stacking
- folding is reasonably fast due to the use of additional arrays (expect to fold 300–500 nt in seconds)
- 2-diagrams will be back soon, if no bugs show up
- complete 2-diagrams for multibranched loops will follow later due to the large constant overhead
- enumerator (moving out)
- iteratee (moving in)
- deepseq (for NFData instances)
- GHC >= 7 (possibly >= 7.2 later, as I’d like to make more use of generics)
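The dot-bracket notation from the import/export bullet encodes each paired position as a matching pair of round brackets and each unpaired position as a dot. A minimal stack-based sketch of converting between dot-bracket strings and 0-based base-pair lists; `dotBracketPairs` and `pairsToDotBracket` are hypothetical names, not functions of the libraries above, and only round brackets are handled (no pseudoknot brackets such as `[]` or `<>`).

```haskell
-- Convert a dot-bracket string to 0-based (i, j) base pairs with i < j,
-- listed in order of the closing position. Returns Nothing on unbalanced
-- brackets or an unexpected character.
dotBracketPairs :: String -> Maybe [(Int, Int)]
dotBracketPairs = go [] [] . zip [0 ..]
  where
    go stack acc [] = if null stack then Just (reverse acc) else Nothing
    go stack acc ((k, c) : rest) = case c of
      '.' -> go stack acc rest
      '(' -> go (k : stack) acc rest            -- remember the opening position
      ')' -> case stack of
               (i : st) -> go st ((i, k) : acc) rest  -- pair with latest open
               []       -> Nothing
      _   -> Nothing

-- Render n positions and a list of (i, j) pairs back to dot-bracket.
-- Quadratic lookup, which is fine for a sketch.
pairsToDotBracket :: Int -> [(Int, Int)] -> String
pairsToDotBracket n ps = [ charAt k | k <- [0 .. n - 1] ]
  where
    charAt k
      | k `elem` map fst ps = '('
      | k `elem` map snd ps = ')'
      | otherwise           = '.'
```

For the example structure `(((...(((...)))..)))`, `dotBracketPairs` gives `Just [(8,12),(7,13),(6,14),(2,17),(1,18),(0,19)]`.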
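For the MAF-reading bullet: MAF files consist of alignment blocks, each starting with an `a` line and containing `s` lines of the form `s src start size strand srcSize text`. A minimal sketch that collects the `s` lines of each block from a `String`; the names are hypothetical, and unlike the library this is not iteratee-based (an iteratee version would consume the input incrementally instead of holding it in memory).

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- Hypothetical record for one "s" (sequence) line of a MAF block.
data MafSeq = MafSeq
  { mafSrc     :: String
  , mafStart   :: Int
  , mafSize    :: Int
  , mafStrand  :: Char
  , mafSrcSize :: Int
  , mafText    :: String
  } deriving (Show, Eq)

-- Parse one "s" line; any other line (i, e, q, blank, ...) gives Nothing.
parseSLine :: String -> Maybe MafSeq
parseSLine l = case words l of
  ["s", src, st, sz, [str], ssz, txt] ->
    MafSeq src <$> readMaybe st <*> readMaybe sz <*> pure str
               <*> readMaybe ssz <*> pure txt
  _ -> Nothing

-- Split MAF text into blocks (each starting at an "a" line) and keep
-- the parsed "s" lines of each block. Header/comment lines are skipped.
mafBlocks :: String -> [[MafSeq]]
mafBlocks = go . lines
  where
    isA l = l == "a" || take 2 l == "a "
    go [] = []
    go (l : ls)
      | isA l     = let (body, rest) = break isA ls
                    in mapMaybe parseSLine body : go rest
      | otherwise = go ls
```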
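To show the general shape of array-based dynamic-programming folding in Haskell, here is a minimal sketch of the classic Nussinov base-pair maximization, memoized through a lazily defined `Data.Array`. This is a swapped-in textbook algorithm, not RNAwolf's extended model or the MC-Fold reimplementation; it uses no energy parameters, and the minimum loop length is an assumed parameter.

```haskell
import Data.Array

-- Watson-Crick or GU wobble pair?
canPair :: Char -> Char -> Bool
canPair a b = (a, b) `elem`
  [('A','U'), ('U','A'), ('G','C'), ('C','G'), ('G','U'), ('U','G')]

-- Maximum number of nested base pairs, requiring at least minLoop
-- unpaired bases inside every pair (minLoop assumed non-negative).
nussinov :: Int -> String -> Int
nussinov minLoop s
  | n == 0    = 0
  | otherwise = table ! (0, n - 1)
  where
    n  = length s
    sq = listArray (0, n - 1) s :: Array Int Char
    -- Lazy table: each cell only refers to cells for shorter subsequences.
    table = array ((0, 0), (n - 1, n - 1))
      [ ((i, j), cell i j) | i <- [0 .. n - 1], j <- [0 .. n - 1] ]
      :: Array (Int, Int) Int
    -- Safe lookup: subsequences too short to hold a pair score 0.
    best i j
      | j - i <= minLoop = 0
      | otherwise        = table ! (i, j)
    -- Either j is unpaired, or j pairs with some k leaving >= minLoop inside.
    cell i j
      | j - i <= minLoop = 0
      | otherwise = maximum $
          best i (j - 1) :
          [ best i (k - 1) + 1 + best (k + 1) (j - 1)
          | k <- [i .. j - minLoop - 1], canPair (sq ! k) (sq ! j) ]
```

This runs in O(n³) time; for instance, `nussinov 3 "GGGAAAUCC"` finds the three stacked pairs around the `AAA` loop.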
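The deepseq dependency provides `NFData` instances, which let `force` and `deepseq` evaluate a structure completely (for example, a freshly parsed file) instead of leaving thunks behind. A minimal sketch with a hypothetical `BasePair` type, which is only an illustration and not one of the libraries' actual types:

```haskell
import Control.DeepSeq (NFData (..), force)
import Control.Exception (evaluate)

-- Hypothetical base-pair type: two positions and the paired nucleotides.
data BasePair = BasePair !Int !Int Char Char
  deriving (Show, Eq)

-- Fully evaluate every field; Int and Char already have NFData instances.
instance NFData BasePair where
  rnf (BasePair i j a b) = rnf i `seq` rnf j `seq` rnf a `seq` rnf b

-- Force a whole list of pairs to normal form inside IO.
strictPairs :: [BasePair] -> IO [BasePair]
strictPairs = evaluate . force
```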