WC2-875: Add command to clean-up instance duplicates#2747
WC2-875: Add command to clean-up instance duplicates#2747
Conversation
| self.stdout.write(self.style.ERROR(" - file_name is None, that's odd!")) | ||
| continue | ||
| # We should use the content of the latest file received. | ||
| file_name = Instance.objects.filter(uuid__isnull=True, file_name=duplicate["file_name"]).order_by( |
There was a problem hiding this comment.
The file_name vairable could be renamed file_name_qs to express that it's a queryset
tdethier
left a comment
There was a problem hiding this comment.
I didn't test that because I don't have a WFP setup, but it looks good. I think there's just an order_by() missing
| has_content = Instance.objects.filter( | ||
| uuid=duplicate["uuid"], file_name=duplicate["file_name"], json__isnull=False | ||
| ) | ||
| if has_content.count() == 1: | ||
| self.stdout.write( | ||
| f"{index}. Duplicate with uuid '{duplicate['uuid']}' has only one instance with content." | ||
| ) | ||
| self._delete_instances( | ||
| uuid=duplicate["uuid"], file_name=duplicate["file_name"], first_id=has_content.first().id | ||
| ) | ||
| single_content += 1 | ||
| elif has_content.count() > 1: | ||
| self.stdout.write( | ||
| f"{index}. Duplicate with uuid '{duplicate['uuid']}' has {has_content.count()} instances with content." | ||
| ) | ||
| # Based on the code of `Instance.import_data`, the first one is always the one that has the data. | ||
| self._delete_instances( | ||
| uuid=duplicate["uuid"], file_name=duplicate["file_name"], first_id=has_content.first().id |
There was a problem hiding this comment.
You do has_content.first() but your has_content queryset doesn't have an .order_by(). I'm afraid you can't know for sure which one you're going to get here
| def _delete_instances(self, uuid: str, file_name: Union[str, None], first_id: str): | ||
| to_delete = Instance.objects.filter(uuid=uuid).exclude(id=first_id) | ||
| if file_name: | ||
| to_delete = to_delete.filter(file_name=file_name) | ||
| self.stdout.write(self.style.WARNING(f" - Deleting {to_delete.count()} instances and linked entities")) | ||
| Entity.objects.filter(attributes__in=to_delete).delete() | ||
| to_delete.delete() |
There was a problem hiding this comment.
Could these instances also have InstanceFile objects that need to be deleted? I'm not sure if that scenario can happen with the issue we've had
There was a problem hiding this comment.
I don't think we'll have InstanceFile as there are no attachment in WFP
| multiple_content += 1 | ||
| else: | ||
| self.stdout.write(f"{index}. Duplicate with uuid '{duplicate['uuid']}' has no instances with content.") | ||
| first = Instance.objects.filter(uuid=duplicate["uuid"], file_name=duplicate["file_name"]).first() |
There was a problem hiding this comment.
Same comment about .first() and .order_by()
| list_filter = ("birth_date", "gender", "account", "guidelines", ProgrammeType) | ||
| list_display = ("id", "birth_date", "gender", "account", "guidelines") | ||
| actions = [create_uuid_index_action] | ||
| actions = [create_uuid_index_action, clean_up_duplicates_action, clean_up_duplicates_action_dry_run] |
What problem is this PR solving?
Some instances have duplicates instances (Instance with the same UUID). This adds a command and a celery task to fix the issue.
Related JIRA tickets
WC2-875
Changes
The command looks for duplicate. Then, for each one, it tries to merge them the best it can.
How to test
With a dump of a database with duplicates. Either execute the command or launch the celery task.