
Commit cc440d0

Merge branch 'main' into random_spike_selection_new_methods
2 parents 88f32f2 + 5a58beb commit cc440d0


13 files changed

+294
-113
lines changed


.github/scripts/determine_testing_environment.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -77,6 +77,7 @@
7777
sorters_external_changed = True
7878
elif "internal" in changed_file.parts:
7979
sorters_internal_changed = True
80+
sorters_changed = True
8081
else:
8182
sorters_changed = True
8283
elif ".github" in changed_file.parts:

README.md

Lines changed: 9 additions & 12 deletions
@@ -43,16 +43,8 @@
4343
</tr>
4444
</table>
4545

46-
[![Twitter](https://img.shields.io/badge/@spikeinterface-%231DA1F2.svg?style=for-the-badge&logo=Twitter&logoColor=white)](https://twitter.com/spikeinterface) [![Mastodon](https://img.shields.io/badge/-@spikeinterface-%232B90D9?style=for-the-badge&logo=mastodon&logoColor=white)](https://fosstodon.org/@spikeinterface)
47-
4846

49-
> :rocket::rocket::rocket:
50-
> **New features!**: after months of development and testing, we are happy to announce that
51-
> the latest release (0.101.0) includes a major API improvement: the `SortingAnalyzer`!
52-
> To read more about why we did this, checkout the
53-
> [SpikeInterface Enhancement Proposal](https://github.com/SpikeInterface/spikeinterface/issues/2282).
54-
> Please follow this guide to transition from the old API to the new one:
55-
> [Updating from legacy](https://spikeinterface.readthedocs.io/en/0.101.0/tutorials/waveform_extractor_to_sorting_analyzer.html).
47+
[![Twitter](https://img.shields.io/badge/@spikeinterface-%231DA1F2.svg?style=for-the-badge&logo=Twitter&logoColor=white)](https://twitter.com/spikeinterface) [![Mastodon](https://img.shields.io/badge/-@spikeinterface-%232B90D9?style=for-the-badge&logo=mastodon&logoColor=white)](https://fosstodon.org/@spikeinterface)
5648

5749

5850
SpikeInterface is a Python framework designed to unify preexisting spike sorting technologies into a single code base.
@@ -64,14 +56,19 @@ With SpikeInterface, users can:
6456

6557
- read/write many extracellular file formats.
6658
- pre-process extracellular recordings.
67-
- run many popular, semi-automatic spike sorters (also in Docker/Singularity containers).
68-
- post-process sorted datasets.
59+
- run many popular, semi-automatic spike sorters (kilosort1-4, mountainsort4-5, spykingcircus,
60+
tridesclous, ironclust, herdingspikes, yass, waveclus)
61+
- run sorters developed in-house (lupin, spykingcircus2, tridesclous2, simple) that compete with kilosort4
62+
- run these sorters without installation using containers (Docker/Singularity).
63+
- post-process sorted datasets using the `SortingAnalyzer`
6964
- compare and benchmark spike sorting outputs.
7065
- compute quality metrics to validate and curate spike sorting outputs.
7166
- visualize recordings and spike sorting outputs in several ways (matplotlib, sortingview, jupyter, ephyviewer)
7267
- export a report and/or export to phy
73-
- offer a powerful Qt-based viewer in a separate package [spikeinterface-gui](https://github.com/SpikeInterface/spikeinterface-gui)
68+
- curate your sorting with several strategies (ML-based, metrics-based, manual, ...)
69+
- offer a powerful Qt-based or web-based viewer in a separate package [spikeinterface-gui](https://github.com/SpikeInterface/spikeinterface-gui) for manual curation that replaces phy.
7470
- have powerful sorting components to build your own sorter.
71+
- have a full motion/drift correction framework
7572

7673

7774
## Documentation

doc/get_started/install_sorters.rst

Lines changed: 17 additions & 3 deletions
@@ -284,7 +284,7 @@ working not only at peak times but at all times, recovering more spikes close to
284284

285285
pip install hdbscan
286286
pip install spikeinterface
287-
pip install numba (or conda install numba as recommended by conda authors)
287+
pip install numba (or conda install numba as recommended by conda authors)
288288

289289

290290
Tridesclous2
@@ -294,11 +294,25 @@ This is an upgraded version of Tridesclous, natively written in SpikeInterface.
294294

295295

296296
* Python
297-
* Requires: HDBSCAN and Numba
297+
* Requires: Numba
298298
* Authors: Samuel Garcia
299299
* Installation::
300300

301-
pip install hdbscan
301+
pip install spikeinterface
302+
pip install numba
303+
304+
305+
Lupin
306+
^^^^^
307+
308+
This is a representative spike sorting pipeline, natively written in SpikeInterface.
309+
310+
311+
* Python
312+
* Requires: Numba
313+
* Authors: Samuel Garcia & Pierre Yger
314+
* Installation::
315+
302316
pip install spikeinterface
303317
pip install numba
304318

doc/how_to/analyze_neuropixels.rst

Lines changed: 1 addition & 1 deletion
@@ -516,7 +516,7 @@ pipeline, in SpikeInterface this is dead-simple: one function.
516516
- most sorters are wrapped from external tools (kilosort,
517517
kilosort2.5, spykingcircus, mountainsort4 …) that often also need
518518
other requirements (e.g., MATLAB, CUDA)
519-
- some sorters are internally developed (spyekingcircus2)
519+
- some sorters are internally developed (spykingcircus2, tridesclous2, lupin)
520520
- external sorter can be run inside a container (docker, singularity)
521521
WITHOUT pre-installation
522522

doc/index.rst

Lines changed: 13 additions & 17 deletions
@@ -7,17 +7,6 @@ SpikeInterface is a Python module to analyze extracellular electrophysiology dat
77
With a few lines of code, SpikeInterface enables you to load and pre-process the recording, run several
88
state-of-the-art spike sorters, post-process and curate the output, compute quality metrics, and visualize the results.
99

10-
.. warning::
11-
12-
Version 0.101.0 introduces a major API improvement: the :code:`SortingAnalyzer`.
13-
To read more about the motivations, checkout the
14-
`enhancement proposal <https://github.com/SpikeInterface/spikeinterface/issues/2282>`_.
15-
Learn how to :ref:`update your code here <tutorials/waveform_extractor_to_sorting_analyzer:From WaveformExtractor to SortingAnalyzer>`
16-
and read more about the :code:`SortingAnalyzer`, please refer to the
17-
:ref:`core <modules/core:SortingAnalyzer>` and :ref:`postprocessing <modules/postprocessing:Postprocessing module>` module
18-
documentation.
19-
20-
2110

2211
Overview of SpikeInterface modules
2312
----------------------------------
@@ -30,16 +19,23 @@ SpikeInterface is made of several modules to deal with different aspects of the
3019

3120
- read/write many extracellular file formats.
3221
- pre-process extracellular recordings.
33-
- run many popular, semi-automatic spike sorters (also in Docker/Singularity containers).
34-
- post-process spike sorted data.
22+
- run many popular, semi-automatic spike sorters (kilosort1-4, mountainsort4-5, spykingcircus,
23+
tridesclous, ironclust, herdingspikes, yass, waveclus)
24+
- run sorters developed in-house (lupin, spykingcircus2, tridesclous2, simple) that compete with
25+
kilosort4
26+
- run these sorters without installation using containers (Docker/Singularity).
27+
- post-process sorted datasets using the :code:`SortingAnalyzer`
3528
- compare and benchmark spike sorting outputs.
3629
- compute quality metrics to validate and curate spike sorting outputs.
37-
- visualize recordings and spike sorting outputs.
38-
- export a report and/or export to Phy.
39-
- offer a powerful Qt-based viewer in a separate package `spikeinterface-gui <https://github.com/SpikeInterface/spikeinterface-gui>`_
40-
- have some powerful sorting components to build your own sorter.
30+
- visualize recordings and spike sorting outputs in several ways (matplotlib, sortingview, jupyter, ephyviewer)
31+
- export a report and/or export to phy
32+
- curate your sorting with several strategies (ML-based, metrics-based, manual, ...)
33+
- offer a powerful Qt-based or web-based viewer in a separate package `spikeinterface-gui <https://github.com/SpikeInterface/spikeinterface-gui>`_ for manual curation that replaces phy.
34+
- have powerful sorting components to build your own sorter.
4135
- have a full motion/drift correction framework (See :ref:`motion_correction`)
4236

37+
38+
4339
.. toctree::
4440
:maxdepth: 1
4541
:caption: Contents:

doc/modules/index.rst

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ Modules documentation
88
extractors
99
preprocessing
1010
sorters
11+
sorters_internal
1112
postprocessing
1213
metrics
1314
comparison

doc/modules/sorters.rst

Lines changed: 27 additions & 16 deletions
@@ -8,9 +8,10 @@ Kilosort, Mountainsort, etc. (see :ref:`compatible-sorters`). All these sorter c
88
from the :py:class:`~spikeinterface.sorters.BaseSorter` class, which provides the common tools to
99
run spike sorters.
1010

11-
On the other hand SpikeInterface directly implements some internal sorters (**spykingcircus2**)
11+
On the other hand SpikeInterface directly implements some internal sorters
1212
that do not depend on external tools, but depend on the :py:mod:`spikeinterface.sortingcomponents`
13-
module. **Note that internal sorters are currently experimental and under development**.
13+
module. Check the :ref:`internal_sorters` page for more details on internal sorters and their
14+
strategies.
1415

1516
A drawback of using external sorters is the separate installation of these tools. Sometimes they need MATLAB,
1617
specific versions of CUDA, specific gcc versions or outdated versions of
@@ -322,15 +323,12 @@ an :code:`engine` that supports parallel processing (such as :code:`joblib` or :
322323
:py:func:`~spikeinterface.sorters.run_sorters` has several "engines" available to launch the computation:
323324

324325
* "loop": sequential
325-
* "joblib": in parallel
326326
* "slurm": in parallel, using the SLURM job manager
327327

328328
.. code-block:: python
329329
330330
run_sorter_jobs(job_list=job_list, engine='loop')
331331
332-
run_sorter_jobs(job_list=job_list, engine='joblib', engine_kwargs={'n_jobs': 2})
333-
334332
run_sorter_jobs(job_list=job_list, engine='slurm', engine_kwargs={'cpus_per_task': 10, 'mem': '5G'})
335333
336334
@@ -343,6 +341,7 @@ Alternatively, for long silicon probes, such as Neuropixels, one could think of
343341
separately, for example using a different sorter for the hippocampus, the thalamus, or the cerebellum.
344342
Running spike sorting by group is indeed a very common need.
345343

344+
346345
A :py:class:`~spikeinterface.core.BaseRecording` object has the ability to split itself into a dictionary of
347346
sub-recordings given a certain property (see :py:meth:`~spikeinterface.core.BaseRecording.split_by`).
348347
The :py:func:`~spikeinterface.sorters.run_sorter` method can accept the dictionary which is returned
@@ -404,10 +403,10 @@ In this example, we create a 16-channel recording with 4 tetrodes:
404403
sorting = run_sorter(sorter_name='kilosort2', recording=recording, folder=f"folder_KS2_group{group}")
405404
sortings[group] = sorting
406405
407-
Read more about preprocessing and sorting by group in our How To, :ref:`recording-by-channel-group`.
408406
409-
Note: you can feed the dict of sortings and dict of recordings directly to :code:`create_sorting_analyzer` to make
410-
a SortingAnalyzer from the split data: :ref:`read more <process_by_group>`.
407+
.. note::
408+
409+
Read more about preprocessing and sorting by group in our How To, :ref:`process_by_group`.
411410

412411

413412
Handling multi-segment recordings
@@ -447,11 +446,14 @@ do not handle multi-segment, and in that case we will use the
447446
448447
# Case 2: the sorter DOES NOT handle multi-segment objects
449448
# The `concatenate_recordings()` mimics a mono-segment object that concatenates all segments
450-
multirecording = si.concatenate_recordings(recordings_list)
451-
# multirecording has 1 segment of 40s each
449+
recording_concat = si.concatenate_recordings(recordings_list)
450+
# recording_concat has 1 segment of 40s each
452451
453452
# run mountainsort4 in mono-segment mode
454-
multisorting = si.run_sorter(sorter_name='mountainsort4', recording=multirecording)
453+
sorting_concat = si.run_sorter(sorter_name='mountainsort4', recording=recording_concat)
454+
455+
# split sorting back to multi-segment using concatenation info
456+
multisorting = si.split_sorting(sorting_concat, recording_concat)
455457
456458
See also the :ref:`multi_seg` section.
457459

@@ -485,11 +487,12 @@ Here is the list of external sorters accessible using the run_sorter wrapper:
485487
* **Combinato** :code:`run_sorter(sorter_name='combinato')`
486488
* **HDSort** :code:`run_sorter(sorter_name='hdsort')`
487489

488-
Here is a list of internal sorter based on `spikeinterface.sortingcomponents`; they are totally
489-
experimental for now:
490+
Here is a list of internal sorters based on :py:mod:`spikeinterface.sortingcomponents`:
490491

492+
* **Lupin** :code:`run_sorter(sorter_name='lupin')`
491493
* **Spyking Circus2** :code:`run_sorter(sorter_name='spykingcircus2')`
492494
* **Tridesclous2** :code:`run_sorter(sorter_name='tridesclous2')`
495+
* **Simple** :code:`run_sorter(sorter_name='simple')`
493496

494497
Here is the list of legacy sorters that are no longer supported, but can still be run
495498
with an older version of SpikeInterface:
@@ -546,8 +549,10 @@ Internal sorters
546549
In 2022, we started the :py:mod:`spikeinterface.sortingcomponents` module to break a sorting pipeline into components.
547550
These components can be gathered to create a new sorter. We already have 4 sorters to showcase this new module:
548551

549-
* :code:`spykingcircus2` (experimental, but ready to be tested)
550-
* :code:`tridesclous2` (experimental, not ready to be used)
552+
* :code:`spykingcircus2`
553+
* :code:`tridesclous2`
554+
* :code:`lupin`
555+
* :code:`simple`
551556

552557
There are some benefits of using these sorters:
553558
* they directly handle SpikeInterface objects, so they do not need any data copy.
@@ -560,7 +565,13 @@ From the user's perspective, they behave exactly like the external sorters:
560565
561566
sorting = run_sorter(sorter_name="spykingcircus2", recording=recording, folder="/tmp/folder")
562567
563-
Read more in the :ref:`sorting-components-module` docs.
568+
These sorters are based on the :py:mod:`spikeinterface.sortingcomponents` module, allowing fast and modular implementations
569+
of various algorithms often encountered in spike-sorting.
570+
571+
Please go to :ref:`internal_sorters` for more details on how they work.
572+
573+
See the :ref:`sorting-components-module` docs for low-level details on the components.
574+
564575

565576
Contributing
566577
------------

doc/modules/sorters_internal.rst

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
1+
.. _internal_sorters:
2+
3+
Internal sorters
4+
================
5+
6+
:py:mod:`spikeinterface.sortingcomponents` implements algorithms that break a sorting pipeline
7+
into individual components. With these components it is easy to develop a new sorter.
8+
9+
These components and sorters have been benchmarked in the `sorting components benchmark <https://github.com/samuelgarcia/sorting_components_benchmark_paper>`_.
10+
11+
12+
At the moment, there are 4 internal sorters implemented in ``spikeinterface``:
13+
14+
* :code:`lupin`
15+
* :code:`spykingcircus2`
16+
* :code:`tridesclous2`
17+
* :code:`simple`
18+
19+
20+
Lupin
21+
-----
22+
23+
Lupin is a components-based sorter: it combines the components that give the best results on benchmarks
24+
for each step. It is theoretically the "best" sorter that ``spikeinterface`` can offer internally.
25+
26+
Lupin components are:
27+
* preprocessing (filtering, CMR, whitening)
28+
* the *DREDGE* motion correction algorithm (optional)
29+
* peak detection with *matched filtering*
30+
* iterative splits for clustering *Iter-ISOPLIT*
31+
* augmented matching pursuit for the spike deconvolution with *Wobble*
32+
33+
34+
Some notes on this algorithm and related parameters:
35+
* the waveform size is different for clustering and for template matching:
36+
``clustering_ms_before``, ``clustering_ms_after``, ``ms_before``, ``ms_after``
37+
* the filtering is quite smooth by default to filter out high-frequency noise: ``freq_max=7000``
38+
* ``n_pca_features`` can impact the clustering step
39+
* there is a cleaning step before the template matching using ``template_sparsify_threshold``,
40+
``template_min_snr_ptp``, ``template_max_jitter_ms``, and ``min_firing_rate``. This step can have a substantial impact on the result.
41+
* Lupin is a bit slower than ``tridesclous2`` and ``spykingcircus2``, but more accurate!
42+
43+
SpyKING-CIRCUS 2
44+
----------------
45+
46+
This is an updated version of SpyKING-CIRCUS [Yger2018]_ based on the modular
47+
components. In summary, this spike sorting pipeline optionally uses the DREDGE motion
48+
correction algorithm before filtering and whitening the data. On the whitened data, the chain of
49+
components is: matched filtering for peak detection, iterative splits for clustering (Iter-HDBSCAN),
50+
and orthogonal matching pursuit for template reconstruction (Circus-OMP).
51+
52+
SpyKING-CIRCUS 2 components are:
53+
* preprocessing (filtering, CMR, whitening)
54+
* the *DREDGE* motion correction algorithm (optional)
55+
* peak detection with *matched filtering*
56+
* iterative splits for clustering *Iter-HDBSCAN*
57+
* orthogonal matching pursuit for the spike deconvolution with *Circus-OMP*
58+
59+
TriDesClous 2
60+
-------------
61+
62+
This is an updated version of TriDesClous based on the modular components.
63+
It is not as good as ``Lupin`` in terms of accuracy, but it is much faster.
64+
This sorter is a good choice for a very fast exploration of a dataset.
65+
66+
TriDesClous 2 components are:
67+
* preprocessing (filtering, CMR) but no whitening
68+
* the *DREDGE* motion correction algorithm (optional)
69+
* peak detection with *locally_exclusive*
70+
* iterative splits for clustering *Iter-ISOPLIT*
71+
* fast template matching using the *TDC-peeler*
72+
73+
74+
Simple
75+
------
76+
77+
This is a simple sorter that **does not use template matching**.
78+
It can be seen as an "old school" sorter with only peak detection, feature reduction (SVD), and
79+
clustering.
80+
This sorter can be very useful on single-channel and tetrode datasets.
81+
On 1-4 channel datasets, when the SNR is too low, template matching is often an overkill
82+
feature that gives worse results.
83+
84+
The clustering step is quite flexible and several algorithms can be tested (k-means, isosplit, hdbscan, ...).

doc/references.rst

Lines changed: 2 additions & 2 deletions
@@ -44,7 +44,7 @@ please include the appropriate citation for the :code:`sorter_name` parameter yo
4444
- :code:`kilosort` [Pachitariu]_
4545
- :code:`mountainsort` [Chung]_
4646
- :code:`rtsort` [van_der_Molen]_
47-
- :code:`spykingcircus` [Yger]_
47+
- :code:`spykingcircus` [Yger2018]_
4848
- :code:`wavclus` [Chaure]_
4949
- :code:`yass` [Lee]_
5050

@@ -178,6 +178,6 @@ References
178178
179179
.. [Windolf_b] `DREDge: robust motion correction for high-density extracellular recordings across species. 2023 <https://www.biorxiv.org/content/10.1101/2023.10.24.563768v1>`_
180180
181-
.. [Yger] `A spike sorting toolbox for up to thousands of electrodes validated with ground truth recordings in vitro and in vivo. 2018. <https://pubmed.ncbi.nlm.nih.gov/29557782/>`_
181+
.. [Yger2018] `A spike sorting toolbox for up to thousands of electrodes validated with ground truth recordings in vitro and in vivo. 2018. <https://pubmed.ncbi.nlm.nih.gov/29557782/>`_
182182
183183
.. [Scopin2024] `Localization of neurons from extracellular footprints <https://doi.org/10.1016/j.jneumeth.2024.110297>`_
