Evo is a biological foundation model capable of long-context modeling and design. Evo uses the StripedHyena architecture to enable modeling of sequences at a single-nucleotide, byte-level resolution with near-linear scaling of compute and memory relative to context length. Evo has 7 billion parameters and is trained on a prokaryotic whole-genome dataset containing ~300 billion nucleotides.
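To make "single-nucleotide, byte-level resolution" concrete, here is a minimal illustrative sketch (not Evo's actual tokenizer): byte-level tokenization maps each nucleotide to one raw byte, so token count equals sequence length and no information is lost to subword merging.

```python
# Illustrative sketch only: byte-level tokenization treats each nucleotide
# as one token, so sequence length == token count.
def tokenize_bytes(dna: str) -> list[int]:
    """Map a DNA string to its raw byte values, one token per nucleotide."""
    return list(dna.encode("ascii"))

def detokenize_bytes(tokens: list[int]) -> str:
    """Invert the byte-level mapping back to the original string."""
    return bytes(tokens).decode("ascii")

seq = "ATGCGTAA"
tokens = tokenize_bytes(seq)
print(tokens)                           # [65, 84, 71, 67, 71, 84, 65, 65]
print(detokenize_bytes(tokens) == seq)  # True
```

Because every position stays addressable as its own token, a single-nucleotide substitution changes exactly one token, which is what makes long-context modeling at this resolution demanding.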
Preprint, January 2024
Click here to access Evo on Github.
Click here to access the Evo integration with HuggingFace.
Click here to read our blog post to learn more about Evo.
Evo learns across DNA, RNA, and proteins, achieving zero-shot function-prediction performance in prokaryotes competitive with state-of-the-art protein language models, without being explicitly shown protein-coding regions.
Evo learns that small mutations to genes can have large effects on whole-organism function, a capability we use to perform zero-shot gene essentiality prediction.
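A minimal sketch of how zero-shot mutation-effect scoring of this kind can work: score wild-type and mutant sequences with a model's log-likelihood and take the difference. Here `log_likelihood` is a hypothetical stand-in for a trained model's sequence score, and `toy_log_likelihood` is a dummy scorer for illustration only.

```python
# Minimal sketch of zero-shot mutation-effect scoring. A larger drop in
# likelihood under mutation suggests the site is more functionally constrained.
def mutation_effect(wild_type: str, mutant: str, log_likelihood) -> float:
    """Return the log-likelihood change caused by a mutation (negative = deleterious)."""
    return log_likelihood(mutant) - log_likelihood(wild_type)

# Toy scorer for illustration only: penalizes mismatches against a reference.
def toy_log_likelihood(seq: str, reference: str = "ATGAAATGA") -> float:
    return -float(sum(a != b for a, b in zip(seq, reference)))

wt = "ATGAAATGA"
mut = "ATGTAATGA"  # single-nucleotide substitution at position 3
print(mutation_effect(wt, mut, toy_log_likelihood))  # -1.0
```

In practice the scorer would be a genomic language model evaluated over the whole sequence; aggregating such per-mutation effects across a gene is one way to estimate its essentiality without supervision.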
Evo can generate sequences that include molecular complexes (Cas proteins bound to noncoding RNA), systems (mobile genetic elements), and coding-rich genome-length sequences.
Evo can model and design long sequences without losing single-nucleotide resolution, enabled by fundamental changes to the machine learning model architecture based on the latest advances in deep signal processing.