High throughput data analysis for Genomics

Our team focuses on the analysis of genomic data from high-throughput experiments in the functional, evolutionary and environmental fields.

We are currently working on transcriptome assembly from RNA-Seq data in order to create reliable genomic datasets for non-model eukaryotic organisms. Our goal is to develop methodologies and bioinformatics tools that help biologists perform data analyses even when no reference genome is available for the organisms they work on.


Our team, “High throughput data analysis for genomics”, was created in September 2011. It first belonged to the Department of Platforms and Technology Development of the Institute of Biology Paris Seine (IBPS) as an associated research group, then moved to the Evolution unit in August 2015.

New generations of high-throughput sequencing machines are supplying the scientific community with an exponentially increasing amount of data. In this context, the major bottleneck is now data analysis. The bioinformatics effort to meet this need is strong, as shown by the large number of software tools available. But data analysis requires large computing infrastructures, which come with high staffing and maintenance costs.

Highlights

We developed several tools in collaboration with the IBENS genomic platform, in order to automate primary data processing and to support users as far as possible in more elaborate data analyses. For example, the distributed data analysis we ran on cloud computing infrastructures (Eoulsan - Jourdren et al. 2012) was one of the first in France for genomic applications. We take advantage of this close collaboration to build the new functionalities we need into the Eoulsan workflow.

Future directions

Currently, we focus our research on RNA-Seq de novo assembly to build functional transcriptomes in various experimental conditions. We notably collaborate with the UPMC Roscoff marine station to study transcriptomes of non-model dinoflagellate lineages. We compare transcripts from organisms in the free-living stage to transcripts from organisms in symbiosis with other non-model rhizarian lineages. The combination of RNA-Seq de novo assembly methods with transcript annotation techniques and graph theory approaches helps us better understand the relations between hosts and symbionts. We implemented our tools in a Python pipeline, which has been used to study transcriptomes from other non-model eukaryotic organisms (i.e. Ostreococcus lineages) in collaboration with a research team from the Jacques Monod Institute.

Our tools and methodologies are developed specifically to support the study of non-model eukaryotes, which remains limited despite the potentially considerable evolutionary and ecological interest of these organisms.

Collaborations

  • Genomic platform, École normale supérieure Institute of Biology (IBENS), Paris, France
  • Computational Biology Unit, IBPS, UPMC, Paris, France
  • ABIMS platform, UPMC marine station, Roscoff, France
  • Plankton Group, UMR 7144 - CNRS and UPMC, Station Biologique de Roscoff (SBR), France
  • Mitochondria, Metals and Oxidative Stress, Institut Jacques Monod (IJM), CNRS UMR 7592, Université Paris Diderot/CNRS, Paris, France
  • Laboratoire d'Océanographie de Villefranche-sur-Mer (LOV), UPMC marine station, Villefranche, France
  • Ecology and Evolutionary Biology Section, Institut de Biologie de l'École normale supérieure (IBENS), Paris, France