API and Architecture Refactoring
[1.2.0] - 2025-11-23
API and Architecture Refactoring (copied from CHANGELOG.md)
Namespace Hierarchy Overhaul: Transitioned from a flat namespace to a hybrid hierarchical structure to enhance modularity and prevent namespace pollution.
Core Classes: DPMON, GNNEmbedding, SubjectRepresentation, SmCCNet, and DatasetLoader remain accessible at the top level (e.g., bnn.DPMON).
Utilities and Metrics: Functional tools are now scoped to their respective submodules (e.g., bnn.metrics.plot_network, bnn.utils.preprocess_clinical).
Utils Module Restructuring: Decomposed the monolithic utils module into specialized submodules for improved maintainability:
utils.data: Contains summary statistics functions (e.g., variance_summary).
utils.preprocess: Contains data transformation functions (e.g., impute_omics, normalize_omics).
utils.reproducibility: Dedicated module for seeding functions (set_seed).
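As an illustration of what a seeding helper in utils.reproducibility typically covers, here is a minimal sketch. The name set_seed_sketch is hypothetical and the body is an assumption about common practice, not the library's actual set_seed. NumPy and PyTorch are seeded only if they happen to be installed.

```python
import os
import random


def set_seed_sketch(seed: int) -> int:
    """Seed the random number generators that are available.

    Hypothetical stand-in; the library's own set_seed may differ.
    """
    random.seed(seed)
    # Only affects Python subprocesses started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)

    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed.

    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed.

    return seed


# Re-seeding with the same value reproduces the same draws.
set_seed_sketch(42)
first = [random.random() for _ in range(3)]
set_seed_sketch(42)
second = [random.random() for _ in range(3)]
```

Calling the helper once at the top of a script, before any data splitting or model initialization, is the usual pattern.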
New Features
Graph Engineering Module (graph_tools): Introduced a new module for the diagnosis and repair of network topology issues.
repair_graph_connectivity: Implemented an algorithm to reconnect fragmented network components (islands) to the global network using eigenvector centrality hubs or omics-driven correlation.
find_optimal_graph: Added an AutoML-style search function that benchmarks various graph construction strategies (Gaussian, Correlation, Threshold) using a structural proxy task to optimize downstream stability.
graph_analysis: Added diagnostic utilities to log topological metrics (clustering coefficient, average degree) and identify isolated subgraphs broken down by omics modality.
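The idea behind reconnecting islands can be sketched in a few lines of standard-library Python. This is not the repair_graph_connectivity implementation: for brevity it picks the hub by degree rather than eigenvector centrality, it ignores the omics-driven correlation option, and the function names and the adjacency-dict format are assumptions made for the example.

```python
from collections import deque


def connected_components(adj):
    """Return the components of an undirected graph as sets, largest first."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    component.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def repair_connectivity_sketch(adj):
    """Attach every island to the highest-degree node of the largest component.

    Degree is a simplified stand-in for eigenvector centrality.
    Modifies adj in place and returns the list of added edges.
    """
    components = connected_components(adj)
    if not components:
        return []
    main, islands = components[0], components[1:]
    # sorted() makes tie-breaking deterministic for sortable node labels.
    hub = max(sorted(main), key=lambda n: len(adj[n]))
    added = []
    for island in islands:
        # Bridge from the island's own best-connected node.
        anchor = max(sorted(island), key=lambda n: len(adj[n]))
        adj[hub].add(anchor)
        adj[anchor].add(hub)
        added.append((hub, anchor))
    return added


graph = {
    "a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"},
    "e": {"f"}, "f": {"e"},  # a two-node island
    "g": set(),              # an isolated node
}
new_edges = repair_connectivity_sketch(graph)
remaining = connected_components(graph)
```

After the call, the graph forms a single component, and new_edges records which bridges were introduced so they can be logged or reviewed.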
DPMON Enhancements: Expanded the NeuralNetwork backbone to support multiple dimensionality reduction strategies beyond the standard AutoEncoder.
Linear Projection: Added ScalarProjection, utilizing a linear layer to map embeddings to feature weights.
MLP Projection: Added MLPProjection, utilizing a non-linear Multilayer Perceptron for complex feature weighting.
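The difference between the two projection styles can be shown with plain Python lists. The sketch below is only an illustration of the arithmetic (one scalar weight per feature embedding); the function names, the single hidden layer, and the ReLU activation are assumptions, not the ScalarProjection or MLPProjection code.

```python
def linear_projection(embeddings, weights, bias=0.0):
    """Map each d-dimensional embedding to one scalar: w . e + b."""
    return [
        sum(w * x for w, x in zip(weights, emb)) + bias
        for emb in embeddings
    ]


def mlp_projection(embeddings, w_hidden, b_hidden, w_out, b_out=0.0):
    """One hidden ReLU layer followed by a linear read-out to one scalar."""
    scores = []
    for emb in embeddings:
        hidden = [
            max(0.0, sum(w * x for w, x in zip(row, emb)) + b)
            for row, b in zip(w_hidden, b_hidden)
        ]
        scores.append(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return scores


# Two features, each with a 2-dimensional embedding.
embeddings = [[1.0, 2.0], [0.0, -1.0]]

linear_weights = linear_projection(embeddings, weights=[0.5, 0.25])

mlp_weights = mlp_projection(
    embeddings,
    w_hidden=[[1.0, 0.0], [0.0, 1.0]],
    b_hidden=[0.0, 0.0],
    w_out=[1.0, 1.0],
)
```

The linear variant can only rank features along one direction in embedding space, while the hidden non-linearity lets the MLP variant suppress or combine directions, at the cost of more parameters to train.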
Dataset Loaders:
Implemented functional loaders (load_brca, load_kipan, load_lgg, load_paad, load_monet, load_example) to provide immediate access to data dictionaries, aligning with scikit-learn conventions.
Added __getitem__ support to the DatasetLoader class for direct key access (e.g., loader['rna']).
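The two access styles can be illustrated with a small stand-in class. LoaderSketch, load_toy, and the toy tables are hypothetical and only mirror the behavior described above; they are not the DatasetLoader implementation.

```python
class LoaderSketch:
    """Hypothetical stand-in for a loader that holds named tables."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)

    def __getitem__(self, key):
        # Delegate to the underlying dictionary, with a clearer error.
        try:
            return self.data[key]
        except KeyError:
            available = ", ".join(sorted(self.data))
            raise KeyError(
                f"{key!r} not in dataset {self.name!r}; available: {available}"
            ) from None

    def keys(self):
        return self.data.keys()


def load_toy():
    """Functional loader: return the data dictionary directly."""
    return LoaderSketch("toy", {"rna": [[1.0, 2.0]], "clinical": [[0]]}).data


loader = LoaderSketch("toy", {"rna": [[1.0, 2.0]], "clinical": [[0]]})
rna = loader["rna"]      # direct key access on the loader object
toy = load_toy()         # plain dictionary from the functional loader
clinical = toy["clinical"]
```

The functional form suits quick experiments, while the class form leaves room for metadata such as the dataset name alongside the tables.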
Data Standardization
BRCA Clinical Update: Removed 15 duplicated columns from the BRCA clinical dataset, reducing the feature dimensionality from 118 to 103 to ensure data uniqueness.
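For readers who want to run a similar check on their own tables, the sketch below drops columns whose values repeat an earlier column. The entry does not say how the 15 duplicates were identified, so matching on identical values is an assumption, and the function name and toy table are hypothetical.

```python
def drop_duplicate_columns(table):
    """Keep the first of any columns whose values are identical.

    table maps column name -> list of hashable values.
    Returns (kept_columns, [(dropped_name, duplicate_of), ...]).
    """
    first_seen = {}
    kept, dropped = {}, []
    for name, values in table.items():
        signature = tuple(values)
        if signature in first_seen:
            dropped.append((name, first_seen[signature]))
        else:
            first_seen[signature] = name
            kept[name] = values
    return kept, dropped


clinical = {
    "age": [61, 47, 55],
    "stage": ["II", "I", "III"],
    "age_at_diagnosis": [61, 47, 55],  # same values as "age"
}
kept, dropped = drop_duplicate_columns(clinical)
```

Returning the dropped pairs alongside the cleaned table makes the removal auditable.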
Dataset Renaming:
Renamed the synthetic dataset example1 to example.
Renamed gbmlgg to lgg (Brain Lower Grade Glioma).
Target Variable Update: Updated the target variable for the lgg dataset from 'histological type' to 'vital_status' to better align with prognostic prediction tasks.
Key Standardization: Removed redundant _data suffixes from dataset dictionary keys (e.g., monet['mirna_data'] is now monet['mirna']).
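Code that still uses the old key names can be bridged with a small helper. This is a hypothetical migration aid, not part of the library; it renames keys ending in _data and refuses to overwrite an existing key.

```python
def strip_data_suffix(dataset, suffix="_data"):
    """Return a copy of dataset with the suffix removed from its keys."""
    renamed = {}
    for key, value in dataset.items():
        if key.endswith(suffix) and len(key) > len(suffix):
            new_key = key[: -len(suffix)]
        else:
            new_key = key
        if new_key in renamed:
            raise ValueError(f"key collision after renaming: {new_key!r}")
        renamed[new_key] = value
    return renamed


# Toy dictionary using old-style keys; the values are placeholders.
old_style = {"mirna_data": "mirna table", "gene_data": "gene table", "clinical": "clinical table"}
new_style = strip_data_suffix(old_style)
```

Keys without the suffix pass through unchanged, so the helper is safe to apply to dictionaries that are already in the new format.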
Dataset Specifications: Updated documentation to explicitly define the dimensions (samples × features) for all included datasets:
BRCA: miRNA (769, 503), Target (769, 1), Clinical (769, 103), RNA (769, 2500), Meth (769, 2203).
LGG: miRNA (511, 548), Target (511, 1), Clinical (511, 13), RNA (511, 2127), Meth (511, 1823).
PAAD: CNV (177, 1035), Target (177, 1), Clinical (177, 19), RNA (177, 1910), Meth (177, 1152).
KIPAN: miRNA (658, 472), Target (658, 1), Clinical (658, 19), RNA (658, 2284), Meth (658, 2102).
Monet: Gene (107, 5039), miRNA (107, 789), Phenotype (106, 1), RPPA (107, 175), Clinical (107, 5).
Example: X1 (358, 500), X2 (358, 100), Y (358, 1), Clinical (358, 6).
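The listed shapes lend themselves to a quick consistency check: within one dataset, every table is normally expected to share the same number of samples. The sketch below is a hypothetical helper that encodes the dimensions exactly as listed above and reports any table whose row count differs from the most common one.

```python
from collections import Counter

# Shapes (samples, features) copied from the specifications above.
SPECS = {
    "BRCA": {"miRNA": (769, 503), "Target": (769, 1), "Clinical": (769, 103),
             "RNA": (769, 2500), "Meth": (769, 2203)},
    "LGG": {"miRNA": (511, 548), "Target": (511, 1), "Clinical": (511, 13),
            "RNA": (511, 2127), "Meth": (511, 1823)},
    "PAAD": {"CNV": (177, 1035), "Target": (177, 1), "Clinical": (177, 19),
             "RNA": (177, 1910), "Meth": (177, 1152)},
    "KIPAN": {"miRNA": (658, 472), "Target": (658, 1), "Clinical": (658, 19),
              "RNA": (658, 2284), "Meth": (658, 2102)},
    "Monet": {"Gene": (107, 5039), "miRNA": (107, 789), "Phenotype": (106, 1),
              "RPPA": (107, 175), "Clinical": (107, 5)},
    "Example": {"X1": (358, 500), "X2": (358, 100), "Y": (358, 1),
                "Clinical": (358, 6)},
}


def sample_count_outliers(tables):
    """Return the tables whose row count differs from the most common one."""
    counts = Counter(rows for rows, _ in tables.values())
    expected, _ = counts.most_common(1)[0]
    return {name: shape for name, shape in tables.items() if shape[0] != expected}


report = {}
for dataset, tables in SPECS.items():
    outliers = sample_count_outliers(tables)
    if outliers:
        report[dataset] = outliers
```

Run against the shapes as listed, the only entry reported is the Monet Phenotype table, which has 106 rows against 107 in the other Monet tables. The changelog does not say whether that difference is intentional, so samples may need to be aligned by ID before joining Monet tables.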
Improvements and Fixes
Documentation: Refactored all docstrings across the library to adhere to strict Google Style formatting (Args/Returns) to ensure consistent API documentation generation.
Clustering:
Hybrid Louvain: Corrected the parameter tuning logic for the k3 and k4 weights and improved the iterative refinement loop that identifies phenotype-associated subgraphs.
Correlated PageRank: Enhanced input validation to ensure proper alignment between graph nodes and omics features.
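The kind of alignment check described for Correlated PageRank can be sketched as a set comparison between node labels and feature names. The function name, the error wording, and the choice to require an exact match are assumptions for illustration, not the library's validation code.

```python
def check_alignment(graph_nodes, omics_columns):
    """Verify that graph nodes and omics feature names match one-to-one.

    Returns the feature names in graph-node order, so the omics columns
    can be reordered to match the graph.
    """
    nodes, columns = set(graph_nodes), set(omics_columns)
    missing_in_omics = sorted(nodes - columns)
    missing_in_graph = sorted(columns - nodes)
    if missing_in_omics or missing_in_graph:
        raise ValueError(
            f"nodes without omics features: {missing_in_omics}; "
            f"omics features without nodes: {missing_in_graph}"
        )
    return list(graph_nodes)


# Same names in a different order: the check passes and fixes the order.
aligned = check_alignment(["g1", "g2", "g3"], ["g3", "g1", "g2"])
```

Failing early with both difference lists in the message tends to be easier to debug than an index error raised deep inside the clustering routine.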
Removed
Metrics Evaluation: Removed the metrics.evaluation module. Its functionality has been consolidated into the metrics module or deprecated in favor of external validation workflows.
Left to Do
Errors: Online docs will be updated. On v1.2.1, package tests should pass once the datasets are posted to PyPI.