This is the code repository for the CELLM paper. CELLM is an AI-powered tool that bridges synthetic biology and natural language processing. It enables the design and analysis of genetic circuits through simple text descriptions by integrating large language models (LLMs), the CELLO synthetic biology framework, and a modular backend.
Synthetic biology is an interdisciplinary field combining biology, engineering, and computation to create or modify biological systems for specific purposes. This project aims to automate the creation of genetic circuits by leveraging Verilog code generation, LLMs, and retrieval-augmented UCF selection.
In this first example, a 3-input AND gate was described to CELLM. The specific description prompt is: "Hi, I would like you to design a genetic circuit with three inputs: LacI, AraC, and TetR. The output should be YFP, which is activated only when all three inputs (LacI, AraC, and TetR) are present (i.e., ‘on’)."
- The full reasoning process: reasoning process
- The final response of the model: model response
- Generated Verilog:
module top(
input wire AraC, // Alphabetical ordering
input wire LacI,
input wire TetR,
output wire YFP // Single output
);
assign YFP = AraC & LacI & TetR; // AND operation
endmodule
In this second example, the Repressilator logic was described to CELLM. The prompt given to the system was: "Hi CELLM. Please produce a single circuit in which a LacI protein (regulated by an Ara promoter) represses a Lac promoter. This Lac promoter regulates a TetR protein that in turn represses a Tet promoter. Finally, this Tet promoter regulates the AraC protein that represses the first Ara promoter."
- The full reasoning process: reasoning process
- The final response of the model: model response
- Generated Verilog:
module top(
output wire LacI, // LacI protein level
output wire TetR, // TetR protein level
output wire AraC // AraC protein level
);
// Internal promoter activity wires
wire P_ara_active, P_lac_active, P_tet_active;
// Core regulatory logic
assign P_ara_active = ~AraC; // Ara promoter repressed by AraC
assign LacI = P_ara_active; // LacI produced by Ara promoter
assign P_lac_active = ~LacI; // Lac promoter repressed by LacI
assign TetR = P_lac_active; // TetR produced by Lac promoter
assign P_tet_active = ~TetR; // Tet promoter repressed by TetR
assign AraC = P_tet_active; // AraC produced by Tet promoter
endmodule
For this example, the logic table was provided to CELLM with the following prompt: "I want to create a genetic circuit based on this truth table: \n Inputs | Outputs | 0 0 0
- The full reasoning process: reasoning process
- The final response of the model: model response
- Generated Verilog:
module top(
input wire A, // Input 1
input wire B, // Input 2
input wire C, // Input 3
output wire Y // Output
);
// Truth table implementation using full case statement
always @(*) begin
case({A, B, C})
3'b000: Y = 1'b0;
3'b001: Y = 1'b0;
3'b010: Y = 1'b0;
3'b011: Y = 1'b1;
3'b100: Y = 1'b1;
3'b101: Y = 1'b1;
3'b110: Y = 1'b0;
3'b111: Y = 1'b1;
default: Y = 1'b0; // Full coverage guarantee
endcase
end
endmodule
For this example, we used the multiplexer Verilog module provided by Cello in their v2.0 paper. The system was prompted with the following request: "Help me create the circuit for this Verilog module:"
module multiplexer_alt(output out, input A, B, C);
always @(C, B, A)
begin
case({C, B, A})
3'b000: {out} = 1'b0;
3'b001: {out} = 1'b0;
3'b010: {out} = 1'b0;
3'b011: {out} = 1'b1;
3'b100: {out} = 1'b1;
3'b101: {out} = 1'b1;
3'b110: {out} = 1'b0;
3'b111: {out} = 1'b1;
endcase
end
endmodule
- The full reasoning process: reasoning process
- The final response of the model: model response
- Generated Verilog:
module top(
input wire A, // Input A
input wire B, // Input B
input wire C, // Input C (selector)
output wire out // Output
);
// Simplified logic implementation
assign out = (C & ~B) | (B & A);
endmodule
Several UCFs (User Constraints Files) for this repository are uploaded to Supabase. Because the full UCFs are very extensive, they were not uploaded in their entirety. Instead, each one was summarized so that the LLM can identify its important characteristics when selecting a UCF.
Below you'll find direct links to the TXT files to view their content:
The files are stored in Supabase, so access to the database is not public.
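As a rough illustration of how summary-based UCF selection can work, the sketch below ranks UCF summaries by cosine similarity to a user request. It is not the repository's implementation: the word-count `embed` function stands in for a real embedding model, the in-memory `SUMMARIES` dictionary stands in for the Supabase vector store, and `ExampleEcoliUCF` is a hypothetical entry added only for contrast.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_ucf(query, summaries):
    """Return the name of the summary most similar to the query."""
    q = embed(query)
    return max(summaries, key=lambda name: cosine(q, embed(summaries[name])))


# In-memory stand-in for the stored summaries (second entry is hypothetical).
SUMMARIES = {
    "Bth1C1G1T1": "organism Bacteroides thetaiotaomicron inputs BA_sensor "
                  "IPTG_sensor aTc_sensor output nanoluc_reporter",
    "ExampleEcoliUCF": "organism Escherichia coli inputs LacI_sensor "
                       "TetR_sensor output YFP_reporter",
}
```

Calling `select_ucf("circuit for Bacteroides thetaiotaomicron with aTc_sensor", SUMMARIES)` picks the `Bth1C1G1T1` summary, because only that summary shares words with the request.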
- If you want to use exactly the same model I used locally:
- Minimum: 64 GB RAM, M1 Ultra / Intel Core i9-12900K, 100 GB disk
- Recommended: 128 GB RAM, M1 Ultra / Intel Core i9-12900K, 100 GB disk
- To install Ollama, go to the Ollama website and follow the installation instructions in the documentation. (NOTE: if you have an NVIDIA GPU, you have to configure the NVIDIA CUDA drivers.)
- To start Ollama, run:
ollama serve
- Create the custom Verilog generation model from the "custom-llama" file and pull the other required models with the following commands:
cd llm/ollama/
ollama pull deepseek-r1:32b
ollama create verilog-r1-32b -f custom-llama
ollama pull phi4
ollama pull mxbai-embed-large:latest
NOTE: this setup uses the deepseek-r1:32b model, so check the system requirements above.
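After running the commands above, it can be useful to confirm that every model is actually available. The sketch below is one way to do that, assuming Ollama is listening on its default local address (`http://localhost:11434`) and using its `/api/tags` endpoint to list installed models. The helper names are illustrative and not part of this repository.

```python
import json
import urllib.request

# Models pulled or created in the steps above.
REQUIRED = ["deepseek-r1:32b", "verilog-r1-32b", "phi4", "mxbai-embed-large:latest"]


def normalize(name):
    """Ollama lists untagged models as '<name>:latest', so add the tag if absent."""
    return name if ":" in name else name + ":latest"


def missing_models(installed, required=REQUIRED):
    """Return the required models that do not appear in the installed list."""
    have = {normalize(name) for name in installed}
    return [name for name in required if normalize(name) not in have]


def fetch_installed(host="http://localhost:11434"):
    """Ask a running Ollama server for its installed models (assumed default host)."""
    with urllib.request.urlopen(host + "/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]
```

With `ollama serve` running, `missing_models(fetch_installed())` returns an empty list once all four models are in place.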
- You need Docker installed. Then run the command:
docker run -p 8000:8000 -d lexodat2111/cello-api
This will run the Cello image on port 8000 in the background. If you want to see the logs of the container (in case something failed), run:
docker logs <container id>
NOTE: this will serve the Cello API at 0.0.0.0:8000. For more information about the endpoints, go to 0.0.0.0:8000/docs.
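Before starting the frontend or the CLI, you may want to check that the container is accepting connections. The following is a minimal sketch using only the standard library. It checks that the TCP port is open, not that the API itself is healthy, and the function name is illustrative.

```python
import socket


def is_port_open(host="127.0.0.1", port=8000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

If `is_port_open()` returns `False` after `docker run`, the `docker logs` command above is the first place to look.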
- Node.js and pnpm installed
- To install all the dependencies and run the frontend, run the following commands:
cd frontend
npm install
npm run dev
NOTE: The frontend is the most up-to-date way to use the system.
NOTE: you have to create a Supabase account and locate the SUPABASE_URL and SUPABASE_KEY in order to create the vector database. You also have to create a .env file:
SUPABASE_URL="SUPABASE URL"
SUPABASE_KEY="SUPABASE KEY"
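A script that depends on these values can fail early with a clear message when one of them is missing. The sketch below parses the `.env` format shown above without third-party packages. It is an illustration and not necessarily how `upload.py` reads its configuration, and the function names are hypothetical.

```python
def parse_env(text):
    """Parse KEY="VALUE" lines, ignoring blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def require(values, keys=("SUPABASE_URL", "SUPABASE_KEY")):
    """Return the requested entries, raising KeyError if any is missing or empty."""
    missing = [key for key in keys if not values.get(key)]
    if missing:
        raise KeyError("missing .env entries: " + ", ".join(missing))
    return {key: values[key] for key in keys}
```

A typical call would be `require(parse_env(open(".env").read()))`, run from the folder that contains the `.env` file.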
- First, obtain all the Supabase keys.
- Then, move your TXT file into the root folder.
- Make sure the Ollama image is running.
- Finally, run the following command:
python upload.py
- The UCF TXT file has to look like this:
# Bth1C1G1T1 inputs
Input Sensors:
- Bth1C1G1T1 input: BA_sensor
- Bth1C1G1T1 input: IPTG_sensor
- Bth1C1G1T1 input: aTc_sensor
# Bth1C1G1T1 outputs
Output Sensors:
- Bth1C1G1T1 output: nanoluc_reporter
- Bth1C1G1T1 output: nanoluc_reporter_2
# Bth1C1G1T1 organism
Organism:
- Bth1C1G1T1 organism: Bacteroides thetaiotaomicron VPI-5482
# Bth1C1G1T1 genome
Genome:
- Bth1C1G1T1 genome: wildtype with dCas9 integrated
# Bth1C1G1T1 media
Media:
- Bth1C1G1T1 media: TYG (10 g/L Tryptone Peptone, 5 g/L Yeast Extract, 11 mM Glucose, 100 mM KPO4 (pH7.2), 72 µM CaCl2, 0.4 µg/ml FeSO4 and 1 µg/mL Resazurin, 1.2 µg/ml hematin, 0.5g/mL of L-cysteine, and 1 µg/ml of Vitamin K (menadione))
# Bth1C1G1T1 temperature
Temperature:
- Bth1C1G1T1 temperature: 37 degrees Celsius
# Bth1C1G1T1 growth
Growth:
- Bth1C1G1T1 growth: Inoculation: Inoculate individual colonies into TYG media without antibiotics and grow 18 hours overnight in the anaerobic chamber. Dilution and Induction: Next day, dilute 100-fold into pre-reduced TYG with inducers (no antibiotics), grow for 6 hours in the anaerobic chamber. Measurement: Plate Reader, data processing for RPUL normalization
# Bth1C1G1T1 posible use
Posible Use:
- Bth1C1G1T1 posible use: It can be used in genetic circuits as a logical switch where dCas9 blocks a promoter until it receives a signal (e.g., chemical induction), enabling combinational control in biological systems.
NOTE: if it's your first time running the app, upload all the .txt files inside the App/cello/library/constraints folder. These TXT files contain all the UCF information used by the UCF recognition system. If you want to upload a custom UCF, follow the structure of the TXT file shown above.
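When writing a custom UCF summary, a quick structural check can catch formatting mistakes before uploading. The sketch below reads the layout shown above, assuming every data line has the form `- <UCF name> <field>: <value>` and that `#` lines and section titles carry no data. The parser is illustrative and is not part of `upload.py`.

```python
from collections import defaultdict


def parse_ucf_summary(text):
    """Return (ucf_name, {field: [values]}) from a UCF summary in the layout above."""
    name = None
    fields = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("- ") or ": " not in line:
            continue  # skip '# ...' comment lines and section titles
        left, value = line[2:].split(": ", 1)
        # The first word is the UCF name; the rest is the field (may contain spaces).
        ucf_name, _, field = left.partition(" ")
        name = name or ucf_name
        fields[field].append(value.strip())
    return name, dict(fields)


SAMPLE = """# Bth1C1G1T1 inputs
Input Sensors:
- Bth1C1G1T1 input: BA_sensor
- Bth1C1G1T1 input: IPTG_sensor
# Bth1C1G1T1 temperature
Temperature:
- Bth1C1G1T1 temperature: 37 degrees Celsius
"""
```

For `SAMPLE`, `parse_ucf_summary` returns the name `Bth1C1G1T1` with two `input` values and one `temperature` value. A field that appears under an unexpected key usually points to a missing space or colon in the file.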
NOTE: If your machine has a graphical interface, this step is optional, because all the files will be saved inside the Downloads folder.
To use this package, you'll need to obtain an application-specific password for the Gmail account you want to send emails from. This password is required for authentication when sending emails.
To obtain an application-specific password for Gmail:
- Go to the account settings of your sender Gmail account: https://myaccount.google.com/
- Click on "Security" in the left sidebar.
- Activate two-factor authentication (2-Step Verification).
- Go to the 2-Step Verification menu and look for the app passwords option.
- Create an application and copy the generated code. That code is your password.
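To show where the app password ends up being used, here is a minimal sketch of sending a result file through Gmail with Python's standard library. It assumes Gmail's SMTP server over SSL (`smtp.gmail.com`, port 465). The function names and the default attachment name are hypothetical, and the repository's own mailing code may differ.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body, attachment=None, filename="result.txt"):
    """Build an email, optionally attaching raw bytes under the given filename."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    if attachment is not None:
        msg.add_attachment(
            attachment,
            maintype="application",
            subtype="octet-stream",
            filename=filename,
        )
    return msg


def send(msg, sender, app_password):
    """Log in with the app password (not the account password) and send."""
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(sender, app_password)
        server.send_message(msg)
```

Keeping message construction separate from sending makes it possible to inspect the message locally before any credentials are involved.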
There are two options:
- web chat
- CLI app
For the frontend, you need the following running:
- the Cello image
- the Ollama gateway
- the frontend app
For the CLI, you need:
- the Cello image running
- the Ollama gateway image running
- the CLI script
Pre-requisites:
- Go (golang) installed on your machine
To run the CLI, run:
go mod init
go mod tidy
go run cmd/main.go
All the information about how to run these modules is in the prerequisites section.
If you have questions, suggestions, or just want to connect: