Hi there 👋, I am Melaku!

🔬 Melaku Garsamo

Protein Engineer • Machine Learning Scientist • Computational Biologist
Creator of EmbedDiff-ESM 🧬 and EmbedDiff-Dayhoff 🔄


👋 About Me

🔬 I'm a hybrid protein engineer and ML scientist with deep experience in both wet-lab experimentation and machine learning for protein design.
I bridge experimental biochemistry with generative AI, building next-gen tools to accelerate biologics discovery.

  • 🧠 Currently developing: EmbedDiff-ESM (ESM-2 backbone) and EmbedDiff-Dayhoff (Dayhoff ablation) — exploring how protein LMs affect generative design
  • 🧬 Passionate about generative AI in biotech & synthetic biology
  • 🧪 Experienced in sequence modeling, folding, and structure–function pipelines

🔧 Skills & Tools

  • Languages: Python, PyTorch, R, SQL, Bash
  • Tools: Git, Docker, VS Code, Conda, Jupyter, SnapGene, PyMOL, Prism, ELN, Tableau
  • ML/BioAI: ESM-2, Dayhoff Atlas, AlphaFold, Transformers, Diffusion Models, t-SNE, BLAST

🧪 Wet Lab Expertise

  • Enzyme characterization: Km, Vmax, kcat
  • Thermal stability: Prometheus Panta, residual activity assays
  • Protein visualization: SDS-PAGE, Western blot
  • Molecular biology: PCR, qPCR, SDM, Golden Gate, high-throughput cloning
  • PPIs: FRET assays
  • Automation: Tecan, Echo, LabChip, ZAG
  • Purification: SEC-MALS, IEX, affinity (FPLC/ÄKTA)
  • Biophysics: DLS, BLI (Octet® RH96), FT-IR, TGA
  • Quantification: MS, analytical SEC (HPLC)
  • Microscopy: confocal, SEM, EDS
  • Crystallization & genotype screening, Agrobacterium methods

🚀 Featured Projects

Two complementary pipelines for de novo protein design with latent diffusion models, probing how the ESM-2 and Microsoft Dayhoff-3B backbones shape generative outcomes.

👉 EmbedDiff-ESM report
👉 EmbedDiff-Dayhoff report


📑 Comparative Benchmark: EmbedDiff-ESM2 vs EmbedDiff-Dayhoff

I developed and compared two parallel latent diffusion pipelines for de novo protein design, each conditioned on a different pretrained embedding backbone: EmbedDiff-ESM2, which leverages Meta's ESM-2 protein language model trained at evolutionary scale, and EmbedDiff-Dayhoff, which uses Microsoft's Dayhoff-3B model trained on clustered UniRef with substitution-aware geometry. Both pipelines share the same workflow: embedding natural protein sequences into latent space, training a denoising diffusion model to learn biologically meaningful manifolds, and decoding embeddings into amino acid sequences with a Transformer-based decoder, followed by rigorous multi-metric evaluation. Unlike traditional structure-based or template-driven design approaches, EmbedDiff explores protein sequence space without structural supervision, enabling a direct test of how different embedding backbones influence novelty, plausibility, and functional diversity.

To benchmark generated sequences, I combined perplexity scoring with ProtT5, t-SNE domain clustering, logistic regression probes, entropy-vs-identity trade-offs, cosine similarity distributions, domain overlays, and AlphaFold2 structural validation, providing a holistic view of backbone performance.

The results show that both models produce very high-perplexity sequences, confirming that diffusion pushes into novel sequence space beyond the immediate training manifold, while global plausibility remains comparable between ESM-2 and Dayhoff. At the local level, however, differences emerge: ESM-2 tends to generate more conservative, higher-identity outputs that preserve natural priors, whereas Dayhoff explores higher-entropy, more divergent solutions. Together, these findings demonstrate that embedding choice directly steers generative exploration of protein space, with ESM-2 offering stability and conservation and Dayhoff driving evolutionary exploration: two complementary strategies for advancing generative protein engineering.
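The first stage, embedding natural sequences into latent space, can be illustrated with a minimal sketch. It assumes the Hugging Face checkpoint `facebook/esm2_t33_650M_UR50D` and mean pooling over residues; the actual pipeline may use a different ESM-2 size or pooling scheme.

```python
# Minimal sketch: mean-pooled ESM-2 embeddings for a batch of sequences.
# Checkpoint name and pooling are assumptions, not the pipeline's exact setup.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
model = AutoModel.from_pretrained("facebook/esm2_t33_650M_UR50D")
model.eval()

sequences = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", "MADEEKLPPGWEKRMSRSSGRVYYFNHITNASQWERPSG"]

with torch.no_grad():
    batch = tokenizer(sequences, return_tensors="pt", padding=True)
    hidden = model(**batch).last_hidden_state          # (B, L, D) per-residue states
    mask = batch["attention_mask"].unsqueeze(-1)        # ignore padding tokens
    embeddings = (hidden * mask).sum(1) / mask.sum(1)   # mean-pool over residues

print(embeddings.shape)  # e.g. torch.Size([2, 1280]) for the 650M model
```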


🧭 Domain-Colored t-SNE (Overview)

Takeaway: ESM-2 embeddings produce slightly tighter domain separation, while Dayhoff preserves broader evolutionary diversity in latent space.

*(Figure panels: ESM-2 | Dayhoff)*
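A minimal sketch of the domain-colored t-SNE, assuming the per-sequence embeddings and domain labels have been saved to NumPy arrays (file names here are placeholders):

```python
# Minimal sketch: domain-colored t-SNE of latent embeddings.
# "esm2_embeddings.npy" / "domain_labels.npy" are hypothetical file names.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

embeddings = np.load("esm2_embeddings.npy")                  # (N, D)
domains = np.load("domain_labels.npy", allow_pickle=True)    # N domain labels

coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)

for dom in np.unique(domains):
    idx = domains == dom
    plt.scatter(coords[idx, 0], coords[idx, 1], s=8, label=dom)
plt.legend(markerscale=2, fontsize=7)
plt.title("Domain-colored t-SNE of sequence embeddings")
plt.savefig("tsne_domains.png", dpi=200)
```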


✅ Logistic Regression Backbone Check (Classification Sanity)

Takeaway: Both backbones retain strong class separability, validating that embeddings encode sufficient biological signal for downstream classifiers.

*(Figure panels: ESM-2 | Dayhoff)*
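The sanity check is a standard linear probe: train a logistic regression classifier on frozen embeddings and see whether domain classes remain separable. A minimal sketch, with the same placeholder arrays as above:

```python
# Minimal sketch of the linear-probe check on frozen embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

embeddings = np.load("esm2_embeddings.npy")                  # placeholder file
domains = np.load("domain_labels.npy", allow_pickle=True)    # placeholder file

X_tr, X_te, y_tr, y_te = train_test_split(
    embeddings, domains, test_size=0.2, stratify=domains, random_state=0
)

probe = LogisticRegression(max_iter=2000)
probe.fit(X_tr, y_tr)
print(classification_report(y_te, probe.predict(X_te)))
```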



🔬 Latent Diffusion Training — Cross-Entropy Loss

Takeaway: Training dynamics are comparable across backbones, with both models converging steadily under diffusion noise scheduling.

*(Figure panels: ESM-2 | Dayhoff)*
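For orientation, here is a generic sketch of one latent-diffusion training step in PyTorch, using a standard DDPM-style noise-prediction objective with MSE. The pipeline's plots track a cross-entropy loss and its denoiser architecture differs, so this only illustrates the overall structure of training a denoiser on embeddings, not the exact objective used.

```python
# Generic latent-diffusion training step (noise prediction, MSE objective).
# Denoiser, schedule, and loss are illustrative assumptions.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

denoiser = nn.Sequential(          # placeholder; a Transformer/MLP in practice
    nn.Linear(1280 + 1, 2048), nn.SiLU(), nn.Linear(2048, 1280)
)
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)

def training_step(x0):             # x0: (B, 1280) batch of clean embeddings
    t = torch.randint(0, T, (x0.size(0),))
    a = alphas_cumprod[t].unsqueeze(-1)
    noise = torch.randn_like(x0)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * noise        # forward (noising) process
    t_embed = (t.float() / T).unsqueeze(-1)            # crude timestep conditioning
    pred = denoiser(torch.cat([xt, t_embed], dim=-1))  # predict the added noise
    loss = nn.functional.mse_loss(pred, noise)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```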

🧮 Entropy vs Sequence Identity

Takeaway: Both backbones show a comparable global entropy–identity distribution.

*(Figure panels: ESM-2 | Dayhoff)*
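A minimal sketch of the two quantities behind this plot: per-sequence Shannon entropy over amino-acid composition, and the best identity to the natural set. The ungapped identity here is a simplification; the pipeline may use BLAST or global alignment instead.

```python
# Minimal entropy-vs-identity sketch (alignment-free identity is an assumption).
import math
from collections import Counter

def shannon_entropy(seq: str) -> float:
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def identity(a: str, b: str) -> float:
    L = min(len(a), len(b))                    # naive ungapped identity
    return sum(x == y for x, y in zip(a[:L], b[:L])) / L

generated = ["MKTAYIAKQRQ", "GGSSGGTTAA"]       # placeholder sequences
natural = ["MKTAYLAKQRE", "MADEEKLPPGW"]

for g in generated:
    best_id = max(identity(g, n) for n in natural)
    print(f"{g:12s} entropy={shannon_entropy(g):.2f} max_identity={best_id:.2f}")
```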

📊 Identity & Similarity Distributions

Takeaway: Identity and cosine similarity histograms reveal overlapping regimes for both ESM-2 and Dayhoff.

*(Figure panels: ESM-2 identity | Dayhoff identity; ESM-2 cosine histograms | Dayhoff cosine histograms)*
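The cosine-similarity histograms come from comparing every generated embedding against every natural embedding. A minimal sketch, again with placeholder file names:

```python
# Minimal sketch: cosine similarities between generated and natural embeddings.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

gen = np.load("generated_embeddings.npy")     # (G, D), placeholder file
nat = np.load("esm2_embeddings.npy")          # (N, D), placeholder file

sims = cosine_similarity(gen, nat)            # (G, N) all generated-vs-natural pairs
print("mean cosine:", sims.mean())
print("max cosine per generated sequence:", sims.max(axis=1)[:5])
```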

🧩 Domain Overlay — Real vs Generated (t-SNE)

Takeaway: Generated sequences cluster near real domains but backbone choice shifts how tightly generated points adhere to natural evolutionary space.

*(Figure panels: ESM-2 | Dayhoff)*
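The overlay differs from the plain domain-colored t-SNE in one design choice: real and generated embeddings are concatenated and embedded by a single t-SNE fit so both sets share one 2-D space, then colored by origin. A minimal sketch under that assumption:

```python
# Minimal sketch: joint t-SNE of real and generated embeddings, colored by origin.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

real = np.load("esm2_embeddings.npy")         # placeholder file
gen = np.load("generated_embeddings.npy")     # placeholder file

coords = TSNE(n_components=2, random_state=0).fit_transform(np.vstack([real, gen]))
n_real = len(real)

plt.scatter(coords[:n_real, 0], coords[:n_real, 1], s=8, c="grey", label="natural")
plt.scatter(coords[n_real:, 0], coords[n_real:, 1], s=8, c="crimson", label="generated")
plt.legend()
plt.savefig("tsne_overlay.png", dpi=200)
```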

🧮 Perplexity (ESM-2 vs Dayhoff Results)

Takeaway: Despite absolute perplexity being high for both, distributions overlap strongly—suggesting backbone choice does not dramatically alter global plausibility.

*(Figure panels: identity, entropy, and perplexity comparisons, ESM-2 vs Dayhoff)*
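Whatever language model does the scoring (the pipeline uses ProtT5), perplexity reduces to exp(-mean log p) over residues. A minimal sketch; the per-residue log-probabilities below are made-up example values standing in for real model scores:

```python
# Minimal sketch: per-residue log-probabilities -> sequence perplexity.
import math

def perplexity(logprobs: list[float]) -> float:
    return math.exp(-sum(logprobs) / len(logprobs))

# Example: hypothetical per-residue natural-log probabilities from a scoring model
residue_logprobs = [-2.1, -1.8, -3.0, -2.4, -2.7]
print(f"perplexity = {perplexity(residue_logprobs):.2f}")   # exp(2.4) ~ 11
```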

🧬 AlphaFold2 Structural Validation (ESM-2 vs Dayhoff)

Takeaway: Dayhoff shows slight structural advantages, with higher mean pLDDT (62.41 vs 60.50) and pTM (0.596 vs 0.559), though the differences are not statistically significant, indicating that both models produce comparable structural quality.

*(Figure: AlphaFold2 structural quality comparison)*

Key Structural Metrics:

  • High-confidence sequences (pLDDT > 70): Dayhoff 25% vs ESM-2 12.5%
  • Good structural quality (pTM > 0.5): Both models ~75-88%
  • Statistical significance: Not significant (p > 0.05), confirming comparable quality
  • Winner: Dayhoff by small margins, but both models validate as viable alternatives
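A minimal sketch of pulling these metrics from AlphaFold2/ColabFold outputs. AlphaFold2 stores per-residue pLDDT in the PDB B-factor column; the pTM is read here from a scores JSON whose file name and key depend on the AF2 variant used, so treat that layout as an assumption.

```python
# Minimal sketch: mean pLDDT from a predicted PDB and pTM from a scores JSON.
# JSON file name/key are assumptions that vary by AlphaFold2/ColabFold version.
import json
from statistics import mean

def mean_plddt_from_pdb(pdb_path: str) -> float:
    plddts = []
    with open(pdb_path) as fh:
        for line in fh:
            if line.startswith("ATOM") and line[12:16].strip() == "CA":
                plddts.append(float(line[60:66]))   # B-factor column holds pLDDT
    return mean(plddts)

def ptm_from_json(json_path: str) -> float:
    with open(json_path) as fh:
        return json.load(fh)["ptm"]                 # key name varies by pipeline

print(mean_plddt_from_pdb("ranked_0.pdb"), ptm_from_json("scores_rank_001.json"))
```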

These side-by-side comparisons reveal how the embedding backbone steers generative design — domain separation, entropy/identity trade-offs, similarity structure, and structural quality all shift with the latent geometry learned by ESM-2 vs Dayhoff.


🌍 Let's Connect


Popular repositories

  1. domain-boundary-parser-- (Python): Detects and visualizes confident structural domains from AlphaFold2 models using pLDDT scores.
  2. Unique-DNA-Barcodes-Generator (Jupyter Notebook): Generate diverse and unique DNA barcodes for sample identification and genetic tracking with this open-source Python tool.
  3. geez-biotech (HTML)
  4. EmbedDiff (Jupyter Notebook): 🧬 A modular machine learning pipeline combining ESM2 embeddings, latent diffusion, and transformer-based decoding for de novo protein design.
  5. mgarsamo (HTML): Hybrid protein engineer & ML scientist building generative AI tools for protein design.
  6. ereft (Python)