Harvest items duplicate key issue #788

@maudetes

Description

The v-for loop over harvest items currently renders with duplicate keys:

          <tr
            v-for="item in paginatedItems"
            :key="item.remote_id"
          >

item.remote_id is not unique: "skipped" items have a null remote_id, and some items share the same remote id.

I looked for a unique property to use instead, but items have no id, and started, created and ended do not seem to be 100% unique either.
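
To make the check concrete, here is a minimal Python sketch of the same idea on plain dictionaries. It is only an illustration: `duplicated_values` is a hypothetical helper and the sample items are invented. Only the field names `remote_id` and `started` come from this issue.

```python
from collections import Counter


def duplicated_values(items, field):
    """Return the values of `field` shared by more than one item, with their counts."""
    counts = Counter(item.get(field) for item in items)
    return {value: n for value, n in counts.items() if n > 1}


# Invented sample data: two skipped items without a remote_id and one repeated id.
items = [
    {"remote_id": "a", "started": "2025-11-25T10:00:00"},
    {"remote_id": None, "started": "2025-11-25T10:00:01"},
    {"remote_id": None, "started": "2025-11-25T10:00:01"},
    {"remote_id": "a", "started": "2025-11-25T10:00:02"},
]

print(duplicated_values(items, "remote_id"))
print(duplicated_values(items, "started"))
```

Any field for which this returns a non-empty result would trigger the same duplicate key problem if used as the `:key`.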

Pipeline to check for uniqueness
from udata.harvest.models import HarvestJob

pipeline = [
    # Step 1: unwind each HarvestJob into one document per HarvestItem
    {
        "$unwind": "$items"
    },
    # Step 2: group by HarvestJob and by the items' started date
    # (swap "$items.started" for "$items.created" or "$items.ended" to test those fields)
    {
        "$group": {
            "_id": {
                "job_id": "$_id",
                "started_date": "$items.started"
            },
            "count": {"$sum": 1},
            "job": {"$first": "$$ROOT"}
        }
    },
    # Step 3: keep only the groups where count > 1 (several items with the same date)
    {
        "$match": {
            "count": {"$gt": 1}
        }
    },
    # Step 4: regroup by job_id so each job appears only once
    {
        "$group": {
            "_id": "$_id.job_id",
            "job": {"$first": "$job"},
            "duplicate_dates": {"$push": "$_id.started_date"}
        }
    },
    # Step 5: reshape the document for a clear display
    {
        "$project": {
            "job": 1,
            "duplicate_dates": 1
        }
    }
]
# Only look at jobs created since 2025-11-25
results = list(HarvestJob.objects(created__gte="2025-11-25").aggregate(*pipeline))
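
Once the aggregation has run, `results` should be a list of dictionaries carrying the keys kept by the final `$project` step (`_id`, `job`, `duplicate_dates`). A minimal sketch of how they could be summarised follows. `summarize` is a hypothetical helper, and the sample result is invented so the snippet runs without a database.

```python
def summarize(results):
    """Map each job id to the number of timestamps shared by several of its items."""
    return {str(r["_id"]): len(r["duplicate_dates"]) for r in results}


# Invented stand-in for the output of the aggregation above.
sample_results = [
    {"_id": "job-1", "job": {}, "duplicate_dates": ["2025-11-25T10:00:01"]},
]

for job_id, count in summarize(sample_results).items():
    print(f"{job_id}: {count} duplicated timestamp(s)")
```

An empty summary would mean the tested date field is unique within every job in the queried range.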
