AIVIVE: AI-aided In Vitro-In Vivo Extrapolation

AIVIVE is a novel Generative Adversarial Network (GAN) framework that combines a GAN-based translator with local optimizers. It uses biologically meaningful gene clusters (modules) to facilitate the demonstration of in vitro-in vivo extrapolation (IVIVE).

**The paper is published in Toxicological Sciences and can be accessed via https://doi.org/10.1093/toxsci/kfaf100

Introduction

This repository contains code files and models for training the AIVIVE framework and generating predictions for in vitro-in vivo extrapolation. Additionally, it provides scripts for applying the model to biologically relevant tasks, such as analyzing differentially expressed genes (DEGs), predicting necrosis, and other related applications.

Code Scripts

Preprocessing

This folder contains the code and scripts for data preprocessing, specifically focusing on RMA (Robust Multi-array Average) normalization and annotation of gene expression data. These preprocessing steps are critical for preparing data before training the AIVIVE model.

Files:

rma_vivo_single.R - R script that performs RMA normalization on the raw gene expression data which is widely used for processing microarray data
rat_annotation.R - R script processes rat gene expression data by extracting probe IDs and merging them with gene annotations (relevant gene identifiers) to generate a final dataset of unique rat genes used for AIVIVE model development.

These preprocessing steps ensure that the gene expression data is properly prepared for the downstream tasks of training the AIVIVE model.

AIVIVE Model Development, Training & Predictions

This folder contains the core code for developing and training the AIVIVE framework. It includes the implementation of a GAN-based translator model and local optimizers networks.

GAN-Based Translator framework for translating in-vitro transcriptomic profiles to in-vivo transcriptomic profiles.
Local optimizers for enhancing the model's performance.
Predictions for generating the test set predictions.

Files:

vitro_vivo_GAN.py - GAN-based translator framework script to train the AIVIVE model on the IVIVE dataset.
train_test_samples.py - Generating test set predictions using the optimal generator from the GAN-based translator
optim_neural_net_#.py - Local optimizer neural network frameworks for specific modules, where # refers to the module number (e.g., optim_neural_net_18.py, optim_neural_net_20.py, etc.). These scripts contain implementations for training different modules.
module_test_evals.py - Generating test set predicitons for specific modules using the optimal local optimizers.

Model Evaluation

This contains the scripts for performing AIVIVE model evaluation on the test set using three metrices: cosine similarity, root mean squared error (RMSE), and mean absolute percentage error (MAPE).

Negative Control - Measurements were calculated between any two real in vivo profiles (excluding biological replicates), serving as a baseline for comparison.
AIVIVE - Measurements were calculated between the optimized synthetic in vivo profiles generated by the model and its corresponding real profiles.
Positive Control - Measurements were calculated between biological replicates within a treatment, providing a benchmark for real biological performance.
Files:
- negative_control.py - Python script for calculating the metrices for the negative control group as defined above.
- cosine.py - For calculating the cosine similarity for AIVIVE group (generated profiles).
- rmse.py - For calculating the rmse for AIVIVE group (generated profiles).
- mape.py - For calculating the mape for AIVIVE group (generated profiles).
- positive_control.py - For calculating the metrics for positive control group.

DEG Analysis

This folder contains the code and scripts related to Differentially Expressed Genes (DEG) analysis for both real and AIVIVE-generated synthetic profiles. The code is the same for both rat S1500+ and modules gene sets with different input files depending on the task.

Foldchange Calculation: The Foldchange Calculation is the difference of gene expression between treatment and control condition. It is calulcated as: Fold Change = Treatment Expression - Control Expression
DEG Identification: DEGs were defined as genes with an absolute fold change > 1 in both real and synthetic profiles. It is given by: DEG = |FoldChange| > 1.
DEG Overlap Calculation: Measures the overlap between real and generated DEGs by comparing the intersection of both sets and normalizing by the total number of real DEGs. It is calculated as: Overlap = (DEGs_real ∩ DEGs_generated) / DEGs_real

Files:

real_fold.py - Foldchange calculation script for real profiles.
gen_fold.py - Foldchange calculation script for generated profiles (to be used for both rat S1500+ and module genes with modifications in input).
real_deg.py - DEG identification script for real profiles.
gen_union.py - DEG identification script for generated profiles. For synthetic profiles, an additional step was taken to calculate the union of DEGs across the subgroups of a specific treatment group.
overlap_deg.py - DEGs overlap calculation script.

KEGG Pathway Analysis

This contains code for performing KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analysis, which is used to identify biological pathways enriched in differentially expressed genes (DEGs) for understanding underlying biological processes.

KEGG Pathway Enrichment Analysis: This step would map genes to known pathways using the KEGG database.
Pathway Overlap Calculation: Calculates the overlap of KEGG pathways enriched in real and generated profiles DEGs.
Files:
- kegg_pathway.R - KEGG pathway enrichment analysis.
- overlap_pathways.py - Python script for calculating the pathway overlap between real and synthetic profiles.

AOP-Gene Expression Analysis

This contains code for performing AOP (Adverse Outcome Pathway) analysis combined with Gene Expression Analysis. The goal is to investigate the percentage error in gene expression values between real and AIVIVE-generated synthetic test profiles for AOP-related genes.

Percentage Error in Gene Expression: The AOP genes were identified from the rat S1500+ gene set and percentage error was calulcated for the gene expression values between the real and AIVIVE-generated synthetic profiles. It was calculated as: % Error (Gene Expression) = [(Real - Synthetic)/Real] * 100
Files:
- AOP.py - Python script for calculating the percentage error in gene expression values between real and AIVIVE-generated profiles, followed by visualization using a heatmap to identify patterns in gene expression between the two profiles.

Necrosis Prediction Model

This contains code for modeling necrosis gene expression predictions.

Necrosis Predictive Model Development - Model trained on real in vivo train profiles
Necrosis Model Evaluation - The model was evaluated on two distinct test sets: one consisting of real in vivo profiles and another consisting of optimized synthetic in vivo profiles generated by AIVIVE.
Files:
- necrosis_prediction.py - Python script for developing and evaluating the necrosis predictive model.

Data Files

This folder consists of the data files that were used in model development and training.

final_rat_genes.csv - Rat S1500+ gene set used for AIVIVE model development and training.
open_tggates_cel_file_attribute.csv - Open TG-GATEs metadata downloaded from [https://dbarchive.biosciencedbc.jp/en/open-tggates/download.html]
rat_vitro_vivo_train_test - Rat liver in vitro and in vivo(single dose) transcriptomic profiles that are RMA normalized. The data is split into train (80%) and test (20%) set based on unique compounds.
modules_genes - Module gene sets used in local optimization
aop_overlap_genes.csv - AOP genes overlapped with rat S1500+ genes (used for AOP-Gene Expression Analysis)
necrosis_df.csv - Pathological findings (from Open TG-GATEs)

Installation

Prerequisites

Before using this repository, ensure you have the following installed:

Python (version 3.11.7)
Tensorflow-GPU (version 2.4.1)
R (version 4.4.1)
Bioconductor (version 3.19)
rat2302.db (version 3.13.0 from Bioconductor version 3.19 release)
Other packages specified in the code scripts

License

This project is licensed under the MIT License - see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 151 Commits
applications		applications
data		data
degs		degs
docs		docs
preprocessing		preprocessing
training		training
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AIVIVE: AI-aided In Vitro-In Vivo Extrapolation

Table of Contents

Introduction

Code Scripts

Preprocessing

AIVIVE Model Development, Training & Predictions

Model Evaluation

DEG Analysis

KEGG Pathway Analysis

AOP-Gene Expression Analysis

Necrosis Prediction Model

Data Files

Installation

Prerequisites

License

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

CHANDMX20/AIVIVE

Folders and files

Latest commit

History

Repository files navigation

AIVIVE: AI-aided In Vitro-In Vivo Extrapolation

Table of Contents

Introduction

Code Scripts

Installation

Prerequisites

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Uh oh!

Languages