Commit 9cb5224

Clarify non-span-normalised pca
Fixes #3358
1 parent 03ad862 commit 9cb5224

1 file changed, +19 -14 lines changed

python/tskit/trees.py

Lines changed: 19 additions & 14 deletions
@@ -9299,23 +9299,28 @@ def pca(
 eigenvectors of the genetic relatedness matrix, which are obtained by a
 randomized singular value decomposition (rSVD) algorithm.

-Concretely, if :math:`M` is the matrix of genetic relatedness values, with
-:math:`M_{ij}` the output of
-:meth:`genetic_relatedness <.TreeSequence.genetic_relatedness>`
-between sample :math:`i` and sample :math:`j`, then by default this returns
-the top ``num_components`` eigenvectors of :math:`M`, so that
+Concretely, take :math:`M` as the matrix of non-span-normalised
+branch-based genetic relatedness values, for instance obtained by
+setting :math:`M_{ij}` to be the :meth:`~.TreeSequence.genetic_relatedness`
+between sample :math:`i` and sample :math:`j` with ``mode="branch"``,
+``proportion=False`` and ``span_normalise=False``. Then by default this
+returns the top ``num_components`` eigenvectors of :math:`M`, so that
 ``output.factors[i,k]`` is the position of sample `i` on the `k` th PC.
-If ``samples`` or ``individuals`` are provided, then this does the same thing,
-except with :math:`M_{ij}` either the relatedness between ``samples[i]``
-and ``samples[j]`` or the nodes of ``individuals[i]`` and ``individuals[j]``,
-respectively.
+If ``samples`` or ``individuals`` are provided, then this does the same
+thing, except with :math:`M_{ij}` either the relatedness between
+``samples[i]`` and ``samples[j]`` or the average relatedness between the
+nodes of ``individuals[i]`` and ``individuals[j]``, respectively.
+Factors are normalized to have L2 norm 1, i.e.,
+``(output.factors[:,k] ** 2).sum() == 1`` for any ``k``.

 The parameters ``centre`` and ``mode`` are passed to
-:meth:`genetic_relatedness <.TreeSequence.genetic_relatedness>`;
-if ``windows`` are provided then PCA is carried out separately in each window.
-If ``time_windows`` is provided, then genetic relatedness is measured using only
-ancestral material within the given time window (see
-:meth:`decapitate <.TreeSequence.decapitate>` for how this is defined).
+:meth:`~.TreeSequence.genetic_relatedness`: the default ``centre=True`` results
+in factors whose elements sum to zero; ``mode`` currently only supports the
+``"branch"`` setting. If ``windows`` are provided then PCA is carried out
+separately in each genomic window. If ``time_windows`` is provided, then genetic
+relatedness is measured using only ancestral material within the given time
+window (see :meth:`decapitate <.TreeSequence.decapitate>` for how this is
+defined).

 So that the method scales to large tree sequences, the underlying method
 relies on a randomized SVD algorithm, using
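
The two properties the revised docstring states for the factors (L2 norm 1, and elements summing to zero under the default ``centre=True``) can be checked on a toy matrix. The sketch below is only an illustration under stated assumptions: it does not call tskit, the relatedness values in ``M`` are made up, the helper names ``centre`` and ``top_eigenvector`` are hypothetical, and plain power iteration stands in for the randomized SVD that the real method uses. It sticks to the standard library so it runs anywhere.

```python
import math


def centre(matrix):
    """Double-centre a symmetric matrix: subtract the row mean and the
    column mean from each entry, then add back the grand mean."""
    n = len(matrix)
    row_means = [sum(row) / n for row in matrix]
    grand_mean = sum(row_means) / n
    return [
        [matrix[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def top_eigenvector(matrix, num_iterations=200):
    """Approximate the leading eigenvector by power iteration,
    rescaling to L2 norm 1 after every step."""
    n = len(matrix)
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-constant start
    for _ in range(num_iterations):
        vec = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
    return vec


# Hypothetical relatedness values for four samples (not real tskit output):
# samples 0 and 1 are close to each other, as are samples 2 and 3.
M = [
    [4.0, 3.0, 1.0, 0.5],
    [3.0, 4.0, 1.0, 0.5],
    [1.0, 1.0, 4.0, 2.0],
    [0.5, 0.5, 2.0, 4.0],
]

factor = top_eigenvector(centre(M))
squared_norm = sum(x * x for x in factor)  # unit L2 norm, up to rounding
element_sum = sum(factor)                  # zero, up to rounding, because of centring
```

The sum is zero because every row of the centred matrix sums to zero, so any vector produced by multiplying by it is orthogonal to the all-ones vector. Without the centring step the norm property still holds, but the sum generally does not vanish.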
