A comprehensive Python toolkit for genomic data analysis and bioinformatics, created as part of the Coursera "Python for Genomic Data Science" course.
This repository provides everything you need to learn Python for genomics:
- Tutorials - Learn Python fundamentals with genomics examples
- Modules - Reusable tools for DNA/RNA analysis
- Examples - Complete genomics workflows
- Practice - Exercises to test your skills
- Quick Reference - Fast lookup cheat sheet
```text
python-for-genomics/
├── tutorials/          # Python fundamentals (START HERE!)
│   ├── 01_strings_and_dna.py
│   ├── 02_lists_and_sequences.py
│   ├── 03_dictionaries_and_codons.py
│   ├── 04_conditionals_in_genomics.py
│   ├── 05_loops_and_iteration.py
│   └── 06_file_io_genomics.py
├── modules/            # Core reusable modules
│   ├── dna_tools.py
│   ├── sequence_analysis.py
│   └── file_parsers.py
├── examples/           # Complete genomics workflows
│   ├── 00_basic_operations.py
│   ├── 01_my_first_analysis.py
│   ├── 02_interactive_analyzer.py
│   ├── 03_compare_sequences.py
│   ├── 04_gc_content_analysis.py
│   ├── 05_orf_finder.py
│   └── 06_fasta_file_operations.py
├── practice/           # 💪 Test your skills
│   ├── exercises.py
│   └── solutions.py
├── data/               # Sample data files
├── QUICK_REFERENCE.md  # Cheat sheet
└── README.md
```
Start with the tutorials! They teach Python fundamentals using genomics examples.
```bash
cd tutorials
python3 01_strings_and_dna.py
```
Work through tutorials 01-06 in order. Each tutorial is a complete, runnable script with explanations.
Alternatively, jump straight to the examples to see complete genomics workflows:
```bash
cd examples
python3 00_basic_operations.py
```
The tutorials teach Python basics in a genomics context:
- Strings and DNA - String operations for sequences
- Lists - Working with multiple sequences
- Dictionaries - Genetic code and mappings
- Conditionals - Sequence validation
- Loops - Iterating through data
- File I/O - Reading and writing FASTA files
Each tutorial includes:
- Clear explanations
- Code examples
- Genomics applications
- Try-it-yourself exercises
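To show how several tutorial topics fit together, here is a minimal sketch (not taken from the tutorial scripts themselves) that validates a sequence with a conditional and counts bases with a dictionary and a loop. The function names `is_valid_dna` and `base_counts` are hypothetical, and treating an empty string as invalid is an assumption of this sketch.

```python
# Minimal sketch (not from the tutorial scripts) combining strings,
# conditionals, dictionaries and loops on a DNA sequence.

VALID_BASES = set("ACGT")


def is_valid_dna(seq: str) -> bool:
    """Return True if seq contains only A, C, G, T (case-insensitive).

    Treating an empty string as invalid is an assumption of this sketch.
    """
    return len(seq) > 0 and set(seq.upper()) <= VALID_BASES


def base_counts(seq: str) -> dict:
    """Count each nucleotide using a dictionary and a for loop."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    for base in seq.upper():
        if base in counts:  # skip anything that is not A/C/G/T
            counts[base] += 1
    return counts


if __name__ == "__main__":
    dna = "ATGCGCTAGGGTAA"
    if is_valid_dna(dna):
        print(base_counts(dna))
    else:
        print("Invalid DNA sequence")
```

Running it prints `{'A': 4, 'C': 2, 'G': 5, 'T': 3}`.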
See complete workflows in action:
Beginner
- 00: Basic DNA operations
- 01: Your first analysis
- 02: Interactive analyzer
- 03: Compare sequences
Intermediate
- 04: GC content analysis
- 05: ORF finding
Advanced
- 06: Complete FASTA operations
Test your skills with exercises:
- 12 exercises covering all concepts
- Solutions provided
- Real genomics problems
QUICK_REFERENCE.md provides a fast lookup for Python syntax and common patterns.
Clone the repository:
```bash
git clone https://github.com/amritasule/python-for-genomics.git
cd python-for-genomics
```
There are no external dependencies: everything uses only the Python standard library.
Functions in `modules/dna_tools.py`:

| Function | Description |
|---|---|
| `validate_dna(seq)` | Check whether a sequence is valid DNA |
| `gc_content(seq)` | Calculate GC percentage |
| `complement(seq)` | Get the DNA complement |
| `reverse_complement(seq)` | Get the reverse complement |
| `transcribe(dna)` | Convert DNA to RNA |
| `translate(dna)` | Translate DNA to protein |
| `count_nucleotides(seq)` | Count each base |
| `has_start_codon(seq)` | Check for an ATG start codon |
| `has_stop_codon(seq)` | Check for stop codons |
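These operations are simple enough to sketch with the standard library alone. The version below is a hypothetical illustration of how functions with these names could work, not the code in `dna_tools.py`. Assumptions: `gc_content` returns a percentage, lowercase input is accepted, and `translate` reads frame 0 in full, writing `*` for stop codons and `X` for unrecognized codons while ignoring a trailing partial codon. The codon table is the standard genetic code, built compactly from the conventional TCAG ordering.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order (first base varies
# slowest), paired with one-letter amino acids; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Translation table for base pairing; lowercase is handled too.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def gc_content(seq: str) -> float:
    """Percentage of G and C bases (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    return complement(seq)[::-1]


def transcribe(dna: str) -> str:
    """DNA to RNA: replace thymine (T) with uracil (U)."""
    return dna.upper().replace("T", "U")


def translate(dna: str) -> str:
    """Translate frame 0; '*' = stop, 'X' = unrecognized codon.

    A trailing partial codon is ignored, and translation does not halt at
    stop codons (both are assumptions of this sketch).
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


if __name__ == "__main__":
    seq = "ATGCGCTAGGGTAA"
    print(f"GC: {gc_content(seq):.2f}%")
    print("Reverse complement:", reverse_complement(seq))
    print("RNA:", transcribe(seq))
    print("Protein:", translate(seq))
```

For `ATGCGCTAGGGTAA` this sketch reports 50.00% GC and the protein string `MR*G` (ATG, CGC, TAG, GGT; the final `AA` is not a full codon).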
Functions in `modules/sequence_analysis.py`:

| Function | Description |
|---|---|
| `find_motif(seq, motif)` | Find occurrences of a pattern |
| `find_orfs(seq)` | Find open reading frames |
| `calculate_melting_temp(seq)` | Calculate melting temperature (Tm) |
| `hamming_distance(seq1, seq2)` | Count positions where two sequences differ |
| `gc_content_window(seq, size)` | GC content in a sliding window |
| `find_repeats(seq, min_len)` | Find repeated sequences |
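A possible standard-library sketch of four of these functions follows. It is an assumption about their behavior, not the module's actual code: `find_motif` returns 0-based, possibly overlapping start positions; `gc_content_window` slides one base at a time; and `calculate_melting_temp` uses the Wallace rule, Tm = 2 °C per A/T plus 4 °C per G/C, a rough estimate meant for short oligonucleotides. The repository may use a different Tm formula.

```python
def find_motif(seq: str, motif: str) -> list:
    """0-based start positions of every match, overlapping matches included."""
    if not motif:
        return []
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)  # step by one to allow overlaps
    return positions


def hamming_distance(seq1: str, seq2: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(a != b for a, b in zip(seq1, seq2))


def gc_content_window(seq: str, size: int) -> list:
    """GC percentage of each window of `size` bases, sliding one base at a time."""
    if size <= 0:
        raise ValueError("window size must be positive")
    seq = seq.upper()
    result = []
    for i in range(len(seq) - size + 1):
        window = seq[i:i + size]
        result.append(100.0 * (window.count("G") + window.count("C")) / size)
    return result


def calculate_melting_temp(seq: str) -> int:
    """Wallace rule: Tm (deg C) = 2 * (A + T) + 4 * (G + C); short oligos only."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    print(find_motif("ATATAT", "ATA"))
    print(hamming_distance("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"))
    print(gc_content_window("GGAA", 2))
    print(calculate_melting_temp("ATGCGCTAGG"))
```

The demo prints `[0, 2]`, `7`, `[100.0, 50.0, 0.0]` and `32`. Note that the overlapping match at position 2 would be missed by a plain `str.count`-style search.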
Functions in `modules/file_parsers.py`:

| Function | Description |
|---|---|
| `read_fasta(filename)` | Read a FASTA file |
| `write_fasta(seqs, filename)` | Write a FASTA file |
| `read_fastq(filename)` | Read a FASTQ file |
| `write_fastq(records, filename)` | Write a FASTQ file |
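For reference, a minimal FASTA/FASTQ reader and writer might look like the sketch below. It is not the code in `file_parsers.py`. Assumptions: `read_fasta` returns a dictionary of header to sequence (consistent with the `.items()` loop in the usage example), a repeated header overwrites the earlier record, `write_fasta` wraps sequence lines at a hypothetical `width` parameter defaulting to 60, and `read_fastq` expects strict four-line records with no wrapped sequence or quality lines.

```python
import os
import tempfile


def read_fasta(filename: str) -> dict:
    """Map each header (without '>') to its sequence; wrapped lines are joined.

    A repeated header overwrites the earlier record in this sketch.
    """
    sequences = {}
    header, chunks = None, []
    with open(filename) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    sequences[header] = "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        sequences[header] = "".join(chunks)
    return sequences


def write_fasta(seqs: dict, filename: str, width: int = 60) -> None:
    """Write header -> sequence records, wrapping sequence lines at `width`."""
    with open(filename, "w") as handle:
        for header, seq in seqs.items():
            handle.write(f">{header}\n")
            for i in range(0, len(seq), width):
                handle.write(seq[i:i + width] + "\n")


def read_fastq(filename: str) -> list:
    """Return (header, sequence, quality) tuples from strict 4-line records."""
    with open(filename) as handle:
        lines = [line.strip() for line in handle if line.strip()]
    records = []
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, quality = lines[i:i + 4]
        records.append((header[1:], seq, quality))  # drop the leading '@'
    return records


if __name__ == "__main__":
    # Round-trip two records through a temporary FASTA file.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.fasta")
        write_fasta({"seq1": "ATGC" * 20, "seq2": "GGCC"}, path)
        for header, seq in read_fasta(path).items():
            print(f"{header}: {len(seq)} bp")
```

The demo prints `seq1: 80 bp` and `seq2: 4 bp`; the 80-base record is written on two lines and joined again on reading.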
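Usage examples (run from the repository root so that `modules/` is importable):
```python
import sys

sys.path.append('modules')  # make the modules/ folder importable
import dna_tools

sequence = "ATGCGCTAGGGTAA"
print(f"GC Content: {dna_tools.gc_content(sequence):.2f}%")
print(f"Protein: {dna_tools.translate(sequence)}")
```
```python
import sys

sys.path.append('modules')
import file_parsers

# read_fasta returns a dict mapping each header to its sequence
sequences = file_parsers.read_fasta('data/sample_sequences.fasta')
for header, seq in sequences.items():
    print(f"{header}: {len(seq)} bp")
```
```python
import sys

sys.path.append('modules')
import sequence_analysis

# Each ORF is unpacked as (start, end, sequence)
orfs = sequence_analysis.find_orfs("ATGCGCGCGTAGGGTAA")
for start, end, orf in orfs:
    print(f"ORF: {orf}")
```
Features:
- No external dependencies (pure Python)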
- Complete tutorials with genomics examples
- Runnable code examples
- Practice exercises with solutions
- FASTA/FASTQ file support
- Complete genetic code table
- ORF finding in all reading frames
- Quick reference guide
- Portfolio-ready quality
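One way to implement ORF finding across reading frames is sketched below; it is not the repository's implementation. Assumptions: an ORF runs from `ATG` to the first in-frame stop codon (stop included), ORFs with no stop codon are skipped, `ATG`s inside an already-open ORF are not reported separately, and `end` is exclusive like a Python slice. `find_orfs` scans the three forward frames and returns `(start, end, orf)` tuples, the shape unpacked in the usage example; the hypothetical `find_orfs_six_frames` adds the three reverse-complement frames, with coordinates relative to the reverse-complement string.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(seq: str) -> list:
    """(start, end, orf) for ORFs in the three forward frames.

    An ORF runs from ATG to the first in-frame stop codon, stop included;
    `end` is exclusive like a Python slice. Open ORFs with no stop codon are
    skipped, and ATGs inside an open ORF are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3, seq[start:i + 3]))
                start = None
    return orfs


def find_orfs_six_frames(seq: str) -> list:
    """(strand, start, end, orf) over all six frames.

    For '-' ORFs, coordinates refer to the reverse-complement string.
    """
    rc = seq.upper().translate(COMPLEMENT)[::-1]
    return [("+", *orf) for orf in find_orfs(seq)] + [
        ("-", *orf) for orf in find_orfs(rc)
    ]


if __name__ == "__main__":
    for start, end, orf in find_orfs("ATGCGCGCGTAGGGTAA"):
        print(f"ORF {start}-{end}: {orf}")
    print(find_orfs_six_frames("TTATTTCAT"))
```

The demo prints `ORF 0-12: ATGCGCGCGTAG`, then `[('-', 0, 9, 'ATGAAATAA')]`: `TTATTTCAT` has no forward ORF, but its reverse complement `ATGAAATAA` does.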
Python skills covered:
- Variables and data types
- Strings, lists, dictionaries
- If/elif/else statements
- For and while loops
- Functions and modules
- File I/O operations
Genomics concepts covered:
- DNA sequence manipulation
- GC content calculation
- Sequence complement
- Transcription and translation
- ORF finding
- FASTA file parsing
- Sequence comparison
- Pattern matching
Use cases:
- Education - Learn Python and bioinformatics
- Research - Quick sequence analysis
- Pipelines - Building blocks for workflows
- Prototyping - Test ideas before scaling
After completing the tutorials, you'll be able to:
- Analyze DNA sequences (GC content, composition, etc.)
- Read and write FASTA files
- Find open reading frames (ORFs)
- Translate DNA to protein
- Compare sequences
- Filter sequences by criteria
- Search for patterns and motifs
- Build analysis pipelines
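Several of these skills chain naturally into a small pipeline. The sketch below filters records by length and GC content, then prints a tab-separated summary. `filter_sequences`, `summarize` and `gc_percent` are hypothetical helpers, not functions from the repository, and the input is assumed to be a header-to-sequence dictionary like the one `read_fasta` returns in the usage example.

```python
def gc_percent(seq: str) -> float:
    """GC percentage of a sequence (0.0 when empty)."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def filter_sequences(seqs: dict, min_length: int = 0,
                     min_gc: float = 0.0, max_gc: float = 100.0) -> dict:
    """Keep records whose length and GC% fall within the given bounds."""
    return {
        header: seq
        for header, seq in seqs.items()
        if len(seq) >= min_length and min_gc <= gc_percent(seq) <= max_gc
    }


def summarize(seqs: dict) -> list:
    """One tab-separated report line per record: header, length, GC%."""
    return [f"{h}\t{len(s)} bp\tGC={gc_percent(s):.1f}%" for h, s in seqs.items()]


if __name__ == "__main__":
    # Toy records standing in for a FASTA reader's output (header -> sequence).
    records = {
        "short": "ATG",
        "at_rich": "ATATATATAT",
        "balanced": "ATGCGCTAGGGTAA",
    }
    kept = filter_sequences(records, min_length=5, min_gc=40.0, max_gc=60.0)
    for line in summarize(kept):
        print(line)
```

Only `balanced` (14 bp, 50.0% GC) passes both filters. Keeping each stage as a plain function over a dictionary makes it easy to reorder or add steps.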
Contributions welcome! Feel free to:
- Add new features
- Improve documentation
- Report bugs
- Suggest enhancements
Further documentation:
- tutorials/README.md - Detailed tutorial guide
- practice/README.md - Exercise instructions
- QUICK_REFERENCE.md - Python syntax cheat sheet
- examples/ - Working code examples
Created as part of the Coursera course "Python for Genomic Data Science" offered by Johns Hopkins University.
MIT License - feel free to use this code for learning and research.
Ready to start? Head to tutorials/ and begin with 01_strings_and_dna.py!
Happy coding! 🧬