Computational biology · Computational genomics

Nina Deng

Computational Genomics & Biological AI

I work on transposable element (TE) annotation and ML-ready genomics datasets, and I am developing a concept for a multimodal TE annotation model. I am now extending this work toward physics-informed biological foundation models and virtual cell systems.

Early-career researcher with a background in mathematics and biology, computational genomics, data modeling, and visual communication.

01 · Research focus

A concrete foundation,
with a clear direction.

My current work is grounded in computational genomics and TE annotation. From this base, I am exploring how biological foundation models can become more mechanistically grounded, physically constrained, and useful for reasoning across biological scales.

Current focus

Computational genomics and TE data

  • Transposable element annotation
  • TE taxonomy construction
  • ML-ready biological dataset curation
  • Computational genomics
  • Biological sequence modeling
  • Multimodal TE annotation model concept

Expanding toward

Mechanistically grounded biological AI

  • Genomics foundation models
  • Biological foundation models
  • Physics-informed biological AI
  • Virtual cell systems
  • Dynamical-systems thinking
  • Mechanistically grounded representation learning

02 · Questions I’m exploring

Questions before claims.

A few questions currently shaping my research direction:

  1. How can genomic sequence modeling connect to cell-level biological representation learning?
  2. How can biological foundation models become more mechanistically grounded rather than purely predictive?
  3. What dynamical, physical, or energy-based constraints are meaningful for biological AI?
  4. How can messy biological annotation systems become model-ready, interpretable structures?
  5. How can sequence, domain, structure, and taxonomy evidence be integrated into annotation models?

03 · Current research projects

Research assets
and active directions.

Three connected areas: building sound data foundations, developing an interpretable annotation framework, and defining a research direction that links genomics to mechanistically grounded biological AI.

[Placeholder figure: TE labels converging into a harmonized taxonomy and dataset]

Harmonizing TE labels and metadata into a structured dataset for downstream modeling.

Project 01 · Active foundation work

TE Dataset Curation & Taxonomy Harmonization

Curating and organizing transposable element consensus-sequence data from Repbase and Dfam-derived sources into a cleaner, ML-ready format. The work includes sequence deduplication, label-conflict review, ontology-aware taxonomy mapping, source metadata organization, and hierarchical label harmonization.

Why it matters: This work turns heterogeneous biological database records into a more structured, traceable resource for downstream modeling.

Computational genomics · TE taxonomy · Data curation · ML-ready datasets
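
The curation steps above (sequence deduplication, label-conflict review, and hierarchical label harmonization) can be illustrated with a minimal Python sketch. It is a hypothetical illustration, not the pipeline used in this project: the `TERecord` type, the `LABEL_MAP` table, the label strings, and the toy sequences are all placeholders, and real Repbase or Dfam records would need their own parsers. Duplicates here mean exact sequence identity on either strand; near-identical consensus sequences would need clustering or alignment instead.

```python
from collections import defaultdict
from dataclasses import dataclass

# Illustrative only: record IDs, sources, label strings, and LABEL_MAP are
# hypothetical placeholders, not real Repbase or Dfam entries.

@dataclass(frozen=True)
class TERecord:
    record_id: str  # hypothetical source identifier
    source: str     # which database the record came from (placeholder)
    label: str      # source-specific classification string
    sequence: str   # consensus sequence (DNA)

# Hypothetical lookup: source label string -> (class, order, superfamily).
LABEL_MAP = {
    "LINE/L1": ("Retrotransposon", "LINE", "L1"),
    "L1": ("Retrotransposon", "LINE", "L1"),
    "LTR/Gypsy": ("Retrotransposon", "LTR", "Gypsy"),
    "DNA/hAT": ("DNA transposon", "TIR", "hAT"),
    "hAT": ("DNA transposon", "TIR", "hAT"),
}

# Complement table for A, C, G, T and N only; other IUPAC ambiguity codes
# are left unchanged, which is a simplifying assumption.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def canonical(seq: str) -> str:
    """Strand-independent key: the smaller of a sequence and its reverse complement."""
    s = "".join(seq.split()).upper()
    rc = s.translate(_COMPLEMENT)[::-1]
    return min(s, rc)


def harmonize(records):
    """Collapse exact duplicates (either strand), map labels, and flag conflicts."""
    groups = defaultdict(list)
    for rec in records:
        groups[canonical(rec.sequence)].append(rec)

    harmonized, conflicts, unmapped = [], [], []
    for recs in groups.values():
        ids = sorted(r.record_id for r in recs)
        paths = set()
        for rec in recs:
            path = LABEL_MAP.get(rec.label)
            if path is None:
                unmapped.append(rec.record_id)  # label needs manual review
            else:
                paths.add(path)
        if len(paths) == 1:
            harmonized.append((ids, next(iter(paths))))
        elif len(paths) > 1:
            # Same sequence, disagreeing hierarchical labels: a label conflict.
            conflicts.append((ids, sorted(paths)))
    return harmonized, conflicts, unmapped


if __name__ == "__main__":
    demo = [
        TERecord("seqA", "db1", "LINE/L1", "ACGTTGCA"),
        TERecord("seqB", "db2", "L1", "tgcaacgt"),  # reverse complement of seqA
        TERecord("seqC", "db1", "DNA/hAT", "GGGCCCAT"),
        TERecord("seqD", "db2", "LTR/Gypsy", "GGGCCCAT"),
        TERecord("seqE", "db2", "Unknown", "TTTTAAAA"),
    ]
    for name, result in zip(("harmonized", "conflicts", "unmapped"), harmonize(demo)):
        print(name, result)
```

In the demo, seqA and seqB collapse into one entry with a single harmonized path, seqC and seqD share a sequence but disagree on labels and are flagged as a conflict, and seqE is set aside because its label has no mapping. Keeping conflicts and unmapped labels as explicit outputs, rather than silently picking one label, is what keeps the resulting dataset traceable.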

[Placeholder figure: conceptual architecture combining sequence, protein-domain, structural, and taxonomy evidence]

Conceptual architecture for multimodal TE annotation combining sequence, domain, structural, and taxonomy evidence.

Project 02 · Model concept

Multimodal TE Annotation Model Concept

Developing a research framework for library-independent transposable element annotation: an approach that would not rely on matches to a curated repeat library and would instead integrate genomic sequence representations, protein-domain evidence, structural signals, and hierarchical TE taxonomy. The aim is to explore a path beyond purely library-based annotation toward interpretable, multimodal biological sequence modeling.

Current scope: This is a conceptual research framework and a developing set of modeling ideas, not a completed production system.

Genomics AI · Sequence modeling · Multimodal modeling · TE annotation

[Placeholder figure: research roadmap connecting computational genomics with dynamics, biological foundation models, and virtual cell systems]

A research direction connecting computational genomics to physics-informed biological foundation models.

Project 03 · Research direction

Toward Mechanistically Grounded Biological AI

Exploring how biological foundation models might become more mechanistically grounded through dynamical-systems thinking, energy-dissipation principles, and cross-scale biological representation learning. This direction connects computational genomics with future-facing questions in physics-informed biological AI and virtual cell systems.

Current scope: A future-facing research direction being developed from the concrete base of TE data and genomic sequence modeling.

Foundation models · Physics-informed AI · Virtual cell · Dynamical systems

04 · Cross-domain evidence

One underlying skill:
structuring complexity.

Earlier work that shaped how I model data, communicate complex ideas, and organize projects across research, design, and public-facing contexts.

Data Modeling &
Analytical Thinking

Turning messy data and abstract problems into structured analytical workflows.

  • Used Python to analyze rare-earth-element data from Appalachian Mountains samples during undergraduate research at Miami University.
  • Presented geochemical and mineralogical analysis at a student research symposium.
  • Led algorithm development for shredded-paper reconstruction using textual features, hierarchical clustering, and a Traveling Salesman Problem formulation.
  • Worked with business-intelligence (BI) and reporting workflows using Python, Excel, SQL, and PowerPoint.

Research relevance: Biological datasets are similarly heterogeneous and sensitive to modeling assumptions.

Visual Communication &
Data Storytelling

Making complex and public-facing information visible, readable, and persuasive.

  • Created data visualizations and presentation materials using MATLAB, Python, and PowerPoint.
  • Designed public-facing articles and posters for large-scale cycling events.
  • Used Photoshop, Procreate, and visual design workflows to communicate beyond plain text.
  • Translated data and concepts into presentation-ready visual formats.

Research relevance: Supports research presentations, proposal writing, visual explanation, and outreach.

Leadership &
Outreach

Coordinating small analytical projects and contributing to public-interest communication.

  • Served as team leader for a business-intelligence and win-rate analysis project using Python and PowerPoint.
  • Led algorithm development within a mathematical modeling project.
  • Contributed visual design and volunteer support for animal-protection and adoption events.
  • Built experience coordinating tasks, presenting results, and supporting public-facing work.

Research relevance: Supports collaborative research, lab communication, outreach, and project execution.

05 · Education

The Ohio State University

B.S. Mathematics, Biology Track

Research and coursework in computational biology, molecular genetics, statistics, mathematics, and genomics.

Earlier training included mathematics, modeling, and data analysis.

06 · Contact

Let’s talk about biological data, models, and the questions between them.

Open to research conversations, research-assistant opportunities, and collaborations in computational genomics, AI for biology, biological foundation models, and physics-informed biological modeling.

A PDF version of the research memo will be added before public launch.