A learning project focused on protein modeling and BioAI, building towards expertise in computational biology and AI-driven protein analysis.
This repository documents my progress in learning protein modeling and BioAI, with the goal of becoming a strong candidate for companies such as EvolutionaryScale. The project focuses on:
- FASTA sequence analysis - Reading and processing protein sequences
- Protein mutations - Generating and analyzing point mutations
- ESM embeddings - Using Meta AI's ESM (Evolutionary Scale Modeling) protein language models to compute protein representations
- RESTful API - Building a FastAPI service for protein analysis
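To make the FASTA item concrete, here is a minimal sketch of reading records and computing residue frequencies. It uses only the standard library so it runs anywhere; the project itself would more likely rely on Biopython's `SeqIO.parse`. The names `parse_fasta`, `residue_fractions`, and the toy record are hypothetical and not taken from this repository.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def residue_fractions(sequence: str) -> Dict[str, float]:
    """Return the fraction of each residue letter in the sequence."""
    total = len(sequence)
    if total == 0:
        return {}
    return {aa: n / total for aa, n in Counter(sequence).items()}


# Hypothetical toy record, not a real protein entry.
EXAMPLE_FASTA = """>toy_protein hypothetical example record
MKTAYIAK
QRQISFVK
"""

records = list(parse_fasta(EXAMPLE_FASTA.splitlines()))
for name, seq in records:
    print(name, len(seq), residue_fractions(seq))
```

The parser is a generator, so large files can be streamed record by record instead of being loaded into memory at once.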
- Python 3.12+
- Biopython - Biological sequence manipulation
- Jupyter Notebooks - Interactive experimentation
- FastAPI - Modern web framework for APIs
- ESM Models - Pre-trained protein language models (Meta AI)
bioai-protein-modeling/
├── src/
│   ├── data/          # FASTA file utilities
│   ├── mutations/     # Protein mutation tools
│   ├── embeddings/    # ESM embedding generation
│   └── api/           # FastAPI REST service
├── notebooks/
│   ├── week1/         # Week 1 experiments
│   └── week2/         # Week 2 experiments
└── reports/           # Weekly progress reports
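As an illustration of the kind of helper that could live under `src/mutations/`, the sketch below generates every single-residue substitution of a sequence and labels each one in the common wild-type/position/mutant form (for example `M1A`). The function names and the choice of 1-based positions are assumptions made for this example, not the repository's actual API.

```python
from typing import Iterator, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def apply_point_mutation(sequence: str, position: int, new_residue: str) -> str:
    """Return a copy of `sequence` with the residue at 1-based `position` replaced."""
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside 1..{len(sequence)}")
    if new_residue not in AMINO_ACIDS:
        raise ValueError(f"{new_residue!r} is not a standard amino acid")
    return sequence[: position - 1] + new_residue + sequence[position:]


def single_point_mutants(sequence: str) -> Iterator[Tuple[str, str]]:
    """Yield (label, mutant_sequence) for every single substitution."""
    for position, wild_type in enumerate(sequence, start=1):
        for candidate in AMINO_ACIDS:
            if candidate == wild_type:
                continue  # skip the silent "mutation" to the same residue
            label = f"{wild_type}{position}{candidate}"
            yield label, apply_point_mutation(sequence, position, candidate)


# Assumed toy input: a three-residue peptide.
mutants = dict(single_point_mutants("MKT"))
print(len(mutants), mutants["M1A"], mutants["K2R"])
```

A sequence of length n made of standard residues has 19 × n single-point mutants, so exhaustive enumeration stays cheap for typical proteins.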
- Initial project structure
- Basic FASTA sequence analysis
- Mutation system implementation
- ESM model embeddings integration
- FastAPI service development
- Unit tests and documentation
- Deployment and CI/CD
- Master protein sequence analysis and manipulation
- Understand transformer-based protein language models
- Build production-ready BioAI applications
- Contribute to open-source computational biology tools
This is a personal learning project, but suggestions and feedback are welcome!
This project is licensed under the MIT License - see the LICENSE file for details.
- GitHub Repository
- EvolutionaryScale - Target company inspiration
Built with ❤️ for BioAI