Analytical genomics

We work on problems connected with the functioning and evolution of biological systems. We use mathematical tools from statistics and combinatorics, together with algorithmic and molecular physics tools, to study the basic principles of cellular functioning, starting from genomic data. Our projects aim at understanding the basic principles of evolution and co-evolution of molecular structures in the cell.

  • Domain annotation and metagenomics - We are developing a new approach to domain annotation and remote homology detection, and are extending it to metagenomic annotation.
  • Transcriptomics and sequence analysis - We combine statistical modeling with combinatorial optimization to address each analysis step of a transcriptome sequencing experiment.
  • Protein evolution and interactions - We are creating a large-scale map of protein-protein interactions with information at the molecular level. We use sequence- and structure-based bioinformatics methods to predict the conformations of interacting proteins, their interaction sites, and their partners.
  • Protein conformational dynamics - We study protein conformational dynamics to decipher the molecular mechanisms underlying protein functions and interactions, and to characterize alternative functional conformations that can be targeted by drugs.

Our projects concern the development of bioinformatics and molecular modeling tools and their application to answer biological questions:

  • Domain annotation. We use machine learning and combinatorial optimization to detect protein remote homology, and are adapting the methodology for metagenome annotation.
  • Periodicity in genomes and 3D chromosome structure in unicellular organisms. We apply spectral analysis to high-throughput data to detect periodicities in genomes and their connections with the three-dimensional structure of chromosomes.
  • High-throughput sequencing methods. We are involved in two collaborative methodological efforts: correction of sequencing errors and gapped alignment of sequences.
  • microRNA detection and functional analysis. We developed algorithms to predict new miRNAs and have analyzed their structural clusters along chromosomes.
  • Transcriptome analysis. We develop statistical methods for the analysis of transcriptome sequencing data: annotation of alternative splicing events and reconstruction of the transcriptional landscape.
  • Reshaping of protein fold space by alternative splicing events (ASEs). We seek to infer quantitative models that describe the impact of ASEs on protein conformational stability.
  • Protein-protein interaction (PPI) prediction. We are creating a large-scale map of PPIs, aiming at identifying partners in the cell and predicting their conformations. We use tools we have developed for binding site prediction from evolutionary and structural information, and for the detection of co-evolution signals.
  • Networks of functional residues. We combine co-evolution signals and allosteric communication detection to reconstruct a large-scale map of disease-associated mutation sites in proteins.
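
As a rough illustration of what spectral analysis of positions along a chromosome can look like, the sketch below scores candidate periods with a simple periodogram. It is not the team's method: the function names, the toy positions and the candidate periods are all assumptions made for this example.

```python
import cmath
import math


def periodicity_power(positions, period):
    """Periodogram-style power of a set of positions at one candidate period.

    Each position is mapped to a phase on the unit circle; if the positions
    recur with the given period, the phases align and the power approaches
    len(positions). Hypothetical helper, for illustration only.
    """
    n = len(positions)
    if n == 0:
        return 0.0
    total = sum(cmath.exp(2j * math.pi * x / period) for x in positions)
    return abs(total) ** 2 / n


def best_period(positions, candidate_periods):
    """Return the candidate period with the highest power."""
    return max(candidate_periods, key=lambda p: periodicity_power(positions, p))


# Toy data (assumed): 40 "genes" spaced every 100 units along a chromosome.
positions = [100 * k + 3 for k in range(40)]
candidates = [70, 85, 100, 115, 130]

print(best_period(positions, candidates))
```

Note that integer fractions of the true period (here 50, 25, ...) would score equally well, so a real analysis would need to account for harmonics and for noise in the positions.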

Highlights

Our most striking results of the past five years are:

  • We demonstrated that genes with a highly biased codon composition in E. coli K12 are periodically distributed along the chromosomal arcs, suggesting an encoded 3D organization that facilitates functional activities.
  • We developed MIReNA, a highly specific tool for miRNA prediction from sequences or deep sequencing data. It has been independently evaluated as one of the best choices for miRNA search in humans, other mammals and plants.
  • We proposed a novel strategy for genome annotation that enabled the construction of highly probable domain architectures and the re-annotation of the P. falciparum genome, which is known to be hard to annotate.
  • We developed methods to reconstruct networks of co-evolved residues for highly conserved protein families and for families with few sequences. We also developed a highly specific method for the prediction of protein interaction sites.
  • We developed a method for the unsupervised clustering of high-dimensional biological datasets. The method is designed to perform well on datasets consisting of a small number of data points in high-dimensional spaces.
  • We developed general methods for reconstructing ancestral genomes and the history of their rearrangements. We applied them to yeast and vertebrates.
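
To give a concrete sense of what a co-evolution signal between residues can mean, the sketch below computes the mutual information between two columns of a multiple sequence alignment. Mutual information is a generic textbook measure used here only for illustration; it is not claimed to be the method developed by the team, and the toy alignment and function name are assumptions.

```python
import math
from collections import Counter


def column_mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    High values mean that the residue at one position is informative about
    the residue at the other, a simple proxy for co-variation.
    Hypothetical helper, for illustration only.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_a)
    if n == 0:
        return 0.0
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


# Toy alignment (assumed): columns 0 and 1 co-vary, column 2 is fully conserved.
alignment = ["AKL", "AKL", "GEL", "GEL"]
columns = list(zip(*alignment))

print(column_mutual_information(columns[0], columns[1]))  # co-varying pair
print(column_mutual_information(columns[0], columns[2]))  # conserved column
```

A fully conserved column carries no mutual information with any other column, and small families give noisy frequency estimates, which is one reason why highly conserved families and families with few sequences call for dedicated methods.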

Future directions

  • Large-scale mapping of PPIs with information at the molecular level. This work requires, on the one hand, understanding the principles of residue co-evolution in proteins and, on the other hand, unraveling the relations between sequence evolution and structural dynamics.
  • Large-scale map of mutational sites and identification of combinations of hotspots that are specific to different pathologies in human proteins. Recent work from our team has revealed that mutational hotspot residues in p53, a protein known to play a crucial role in cancer development, display co-evolution patterns and that these patterns can be predicted by co-evolution analysis. The identification of combinations of hotspots in different pathologies would be of major importance in genetics.
  • Discovery of regulatory non-coding RNAs (ncRNAs) in diatoms. We wish to identify diatom small ncRNAs and their targets, and to characterize the diatom RNAi silencing machinery.
  • Development of novel methods for the domain annotation of metagenomic sequences. These approaches are likely to open new perspectives for the identification of genes and their products in genomes that have been separated by very large evolutionary distances.

Collaborations

  • University of Milano-Bicocca (Italy), University of Udine (Italy), Freie Universität Berlin (Germany), Max Planck Institute for Molecular Genetics (Berlin, Germany), Ecole Normale Supérieure de Cachan, Institut Pasteur, INSA Lyon, Max Planck Institute for Informatics (Saarbrücken, Germany), Computational Biology Research Center (Tokyo, Japan).
  • Involvement of the team in large-scale structures:

- Labex CALSIMLAB: Labex for Scientific Modeling and Simulation in Research, 2012-2020, Coordinator: Pascal Frey, Institut du Calcul et de la Simulation, UPMC.

- Réseaux de Coordination Scientifique internationale “Physics of living systems” (GDRi PoLS), 2012-2016, Coordinator: Catherine Royer.

- FANTOM5 (Functional Annotation of the Mammalian Genome) as collaborator with RIKEN, Japan.

- Groupement De Recherche (GDR) Bioinformatique Moléculaire.