Merged

Changes from all 34 commits
74f7633
Turn FooterBanner into InfoBanner, add to all views
naglepuff Apr 17, 2025
aa7934e
Fix oauth2 setting
mvandenburgh Apr 18, 2025
7426499
Merge pull request #2335 from dandi/fix-setting-name
mvandenburgh Apr 18, 2025
ca4b478
Don't override oauth2_provider settings dict
mvandenburgh Apr 18, 2025
0817a5d
Merge pull request #2337 from dandi/fix-overwritten-settings
mvandenburgh Apr 18, 2025
b703541
Revert "Don't override oauth2_provider settings dict"
mvandenburgh Apr 18, 2025
964712d
Revert "Fix oauth2 setting"
mvandenburgh Apr 18, 2025
0259d05
Revert "Switch staging back to builtin oauth `Application`"
mvandenburgh Apr 18, 2025
57eb078
Merge pull request #2338 from dandi/revert-oauth-model
mvandenburgh Apr 18, 2025
799ec3a
Merge pull request #2329 from dandi/2302-move-banner
naglepuff Apr 18, 2025
af5d20f
auto shipit - CHANGELOG.md etc
dandibot Apr 18, 2025
9df9b6c
Auto-allow people with `@nih.gov` and `@janelia.hhmi.org` email addre…
kabilar Apr 20, 2025
a76574d
Fix format
kabilar Apr 20, 2025
c1b7f40
Add some documentation for playwright test data
mvandenburgh Apr 21, 2025
65014af
Regenerate playwright test fixture w/ excludes
mvandenburgh Apr 21, 2025
6048fe2
Merge pull request #2341 from dandi/fix-test-data
mvandenburgh Apr 21, 2025
d38100a
Convert StagingApplication to a proxy model
jjnesbitt Apr 18, 2025
875f776
Split up GC service into multiple modules
mvandenburgh Apr 21, 2025
d663ad4
Refactor GC module to allow for dry-run
mvandenburgh Apr 21, 2025
b1bf772
Hook up GC service layer to `collect-garbage` script
mvandenburgh Apr 21, 2025
07e392c
Add additional confirmation to `collect_garbage.py`
mvandenburgh Apr 21, 2025
f6327dc
Merge pull request #2343 from dandi/add-gc-management-command
mvandenburgh Apr 21, 2025
600aaea
auto shipit - CHANGELOG.md etc
dandibot Apr 21, 2025
538e330
Refactor email patterns into `any()` call
waxlamp Apr 22, 2025
390d1bd
Merge pull request #2340 from kabilar/auto-email
kabilar Apr 23, 2025
12227a2
Remove import_dandisets command from docs
asmacdo Apr 23, 2025
24f6813
Merge pull request #2351 from asmacdo/dev-doc-cleanup-import-dandisets
waxlamp Apr 24, 2025
f8bbd24
Merge pull request #2339 from dandi/proxy-staging-application
jjnesbitt Apr 24, 2025
85ff137
Revert "Convert StagingApplication to a proxy model"
jjnesbitt Apr 24, 2025
426d879
Merge pull request #2357 from dandi/revert-2339-proxy-staging-applica…
jjnesbitt Apr 24, 2025
75271ea
Check to see if cookies are enabled for message
naglepuff Apr 25, 2025
02c594f
Make eslint fail on warning
mvandenburgh Apr 28, 2025
37c7075
Merge pull request #2359 from dandi/2271-cookies-disabled-banner
naglepuff Apr 28, 2025
9dbe20b
Merge pull request #2360 from dandi/web-fail-lint-on-warning
mvandenburgh Apr 28, 2025
55 changes: 55 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,58 @@
# v0.8.1 (Mon Apr 21 2025)

#### 🏠 Internal

- Integrate `garbage_collection` service into `collect_garbage.py` [#2343](https://github.com/dandi/dandi-archive/pull/2343) ([@mvandenburgh](https://github.com/mvandenburgh))

#### 🧪 Tests

- Remove unneeded data from playwright test fixture [#2341](https://github.com/dandi/dandi-archive/pull/2341) ([@mvandenburgh](https://github.com/mvandenburgh))

#### Authors: 1

- Mike VanDenburgh ([@mvandenburgh](https://github.com/mvandenburgh))

---

# v0.8.0 (Fri Apr 18 2025)

#### 🚀 Enhancement

- Move banner with info blurb to top of all pages [#2329](https://github.com/dandi/dandi-archive/pull/2329) ([@naglepuff](https://github.com/naglepuff))

#### 🐛 Bug Fix

- Don't override oauth2_provider settings dict [#2337](https://github.com/dandi/dandi-archive/pull/2337) ([@mvandenburgh](https://github.com/mvandenburgh))
- Fix oauth2 setting [#2335](https://github.com/dandi/dandi-archive/pull/2335) ([@mvandenburgh](https://github.com/mvandenburgh))
- Require minimum version of 2.0 for django-oauth-toolkit [#2326](https://github.com/dandi/dandi-archive/pull/2326) ([@jjnesbitt](https://github.com/jjnesbitt))

#### 🏠 Internal

- Revert OAuth model change [#2338](https://github.com/dandi/dandi-archive/pull/2338) ([@mvandenburgh](https://github.com/mvandenburgh))
- Switch from `runtime.txt` to `.python-version` [#2332](https://github.com/dandi/dandi-archive/pull/2332) ([@mvandenburgh](https://github.com/mvandenburgh))
- Switch staging back to builtin oauth `Application` [#2331](https://github.com/dandi/dandi-archive/pull/2331) ([@mvandenburgh](https://github.com/mvandenburgh))
- Update swagger/redocs urls to align with Resonant [#2327](https://github.com/dandi/dandi-archive/pull/2327) ([@mvandenburgh](https://github.com/mvandenburgh))

#### 📝 Documentation

- DOC: fixup description of the interaction with auto for releases based on labels [#2285](https://github.com/dandi/dandi-archive/pull/2285) ([@yarikoptic](https://github.com/yarikoptic) [@waxlamp](https://github.com/waxlamp))

#### 🔩 Dependency Updates

- Clean up `setup.py` [#2324](https://github.com/dandi/dandi-archive/pull/2324) ([@mvandenburgh](https://github.com/mvandenburgh))
- Update Heroku Python runtime [#2323](https://github.com/dandi/dandi-archive/pull/2323) ([@mvandenburgh](https://github.com/mvandenburgh))
- Unpin `django-oauth-toolkit`, generate migrations for downstream `StagingApplication` [#2320](https://github.com/dandi/dandi-archive/pull/2320) ([@mvandenburgh](https://github.com/mvandenburgh))

#### Authors: 5

- Jacob Nesbitt ([@jjnesbitt](https://github.com/jjnesbitt))
- Michael Nagler ([@naglepuff](https://github.com/naglepuff))
- Mike VanDenburgh ([@mvandenburgh](https://github.com/mvandenburgh))
- Roni Choudhury ([@waxlamp](https://github.com/waxlamp))
- Yaroslav Halchenko ([@yarikoptic](https://github.com/yarikoptic))

---

# v0.7.0 (Wed Apr 16 2025)

#### 🚀 Enhancement
34 changes: 0 additions & 34 deletions DEVELOPMENT.md
@@ -146,40 +146,6 @@ This creates a dummy dandiset with valid metadata and a single dummy asset.
The dandiset should be valid and publishable out of the box.
This script is a simple way to get test data into your DB without having to use dandi-cli.

### import_dandisets
```
python manage.py import_dandisets [API_URL] --all
```

This imports all dandisets (versions + metadata only, no assets) from the dandi-api deployment
living at `API_URL`. For example, to import all dandisets from the production server into your
local dev environment, run `python manage.py import_dandisets https://api.dandiarchive.org` from
your local terminal. Note that if a dandiset with the same identifier as the one being imported
already exists, that dandiset will not be imported.

```
python manage.py import_dandisets [API_URL] --all --replace
```

Same as the previous example, except if a dandiset with the same identifier as the one being imported
already exists, the existing dandiset will be replaced with the one being imported.

```
python manage.py import_dandisets [API_URL] --all --offset 100000
```

This imports all dandisets (versions + metadata only, no assets) from the dandi-api deployment
living at `API_URL` and offsets their identifiers by 100000. This is helpful if you want to import
a dandiset that has the same identifier as one already in your database.

```
python manage.py import_dandisets [API_URL] --identifier 000005
```

This imports dandiset 000005 from `API_URL` into your local dev environment. Note that if there is already
a dandiset with an identifier of 000005, nothing will happen. Use the --replace flag to have the script
overwrite it instead if desired.

## Abbreviations

- DLP: Dataset Landing Page (e.g. https://dandiarchive.org/dandiset/000027)
12 changes: 12 additions & 0 deletions dandiapi/api/fixtures/README.md
@@ -0,0 +1,12 @@
# Playwright Test Data Fixture

This directory contains a [Django fixture](https://docs.djangoproject.com/en/5.2/topics/db/fixtures/) with test data for the Playwright-based e2e tests.

## How was this data generated?

To generate this data, a local DB was populated with test data and then dumped to a Django fixture using `manage.py dumpdata`. The `--exclude` flags are important here: they prevent unneeded and/or deployment-specific DB tables from being included in the dump.

```bash
./manage.py dumpdata --output dandiapi/api/fixtures/playwright.json.xz --exclude auth.permission --exclude authtoken --exclude contenttypes --exclude oauth2_provider --exclude sites
```
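For local use, the fixture can likely be loaded back with Django's `loaddata` command (Django 3.2+ accepts xz-compressed fixtures). The exact command used by the Playwright test setup is not shown in this diff, so treat the following as a sketch:

```bash
# Assumes a migrated local database; the fixture path is relative to the repo root.
./manage.py loaddata dandiapi/api/fixtures/playwright.json.xz
```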
Binary file modified dandiapi/api/fixtures/playwright.json.xz
33 changes: 24 additions & 9 deletions dandiapi/api/management/commands/collect_garbage.py
@@ -1,14 +1,29 @@
from __future__ import annotations

from django.db.models import Sum
import djclick as click

from dandiapi.api.garbage import stale_assets
from dandiapi.api.services import garbage_collection


def echo_report():
    click.echo(f'Assets: {stale_assets().count()}')
    click.echo('AssetBlobs: Coming soon')
    click.echo('Uploads: Coming soon')
    garbage_collectable_assets = stale_assets()
    assets_count = garbage_collectable_assets.count()

    garbage_collectable_asset_blobs = garbage_collection.asset_blob.get_queryset()
    asset_blobs_count = garbage_collectable_asset_blobs.count()
    asset_blobs_size_in_bytes = garbage_collectable_asset_blobs.aggregate(Sum('size'))['size__sum']

    garbage_collectable_uploads = garbage_collection.upload.get_queryset()
    uploads_count = garbage_collectable_uploads.count()

    click.echo(f'Assets: {assets_count}')
    click.echo(
        f'AssetBlobs: {asset_blobs_count} ({asset_blobs_size_in_bytes} bytes / '
        f'{asset_blobs_size_in_bytes / (1024 ** 3):.2f} GB)'
    )
    click.echo(f'Uploads: {uploads_count}')
    click.echo('S3 Blobs: Coming soon')


@@ -24,13 +39,13 @@ def collect_garbage(*, assets: bool, assetblobs: bool, uploads: bool, s3blobs: b
    if doing_deletes:
        echo_report()

    if assetblobs:
        raise click.NoSuchOption('Deleting AssetBlobs is not yet implemented')
    if uploads:
        raise click.NoSuchOption('Deleting Uploads is not yet implemented')
    if s3blobs:
    if assetblobs and click.confirm('This will delete all AssetBlobs. Are you sure?'):
        garbage_collection.asset_blob.garbage_collect()
    if uploads and click.confirm('This will delete all Uploads. Are you sure?'):
        garbage_collection.upload.garbage_collect()
    if s3blobs and click.confirm('This will delete all S3 Blobs. Are you sure?'):
        raise click.NoSuchOption('Deleting S3 Blobs is not yet implemented')
    if assets:
    if assets and click.confirm('This will delete all Assets. Are you sure?'):
        assets_to_delete = stale_assets()
        if click.confirm(f'This will delete {assets_to_delete.count()} assets. Are you sure?'):
            assets_to_delete.delete()
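For reference, a possible invocation of the updated management command is sketched below. The flag names are assumed from the keyword arguments in the signature above (the `djclick` option declarations are not shown in this diff), so treat them as illustrative rather than authoritative.

```bash
# Report what is eligible for garbage collection, then delete stale assets,
# asset blobs, and uploads after interactive confirmation (assumed flag names).
./manage.py collect_garbage --assets --assetblobs --uploads
```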
91 changes: 4 additions & 87 deletions dandiapi/api/services/garbage_collection/__init__.py
@@ -1,21 +1,13 @@
from __future__ import annotations

from concurrent.futures import Future, ThreadPoolExecutor, wait
from datetime import timedelta
import json

from celery.utils.log import get_task_logger
from django.core import serializers
from django.db import transaction
from django.utils import timezone
from more_itertools import chunked

from dandiapi.api.models import (
    AssetBlob,
    GarbageCollectionEvent,
    GarbageCollectionEventRecord,
    Upload,
)
from dandiapi.api.models import GarbageCollectionEvent
from dandiapi.api.services.garbage_collection import asset_blob, upload
from dandiapi.api.storage import DandiMultipartMixin

logger = get_task_logger(__name__)
@@ -33,85 +25,10 @@
) # TODO: pick this up from env var set by Terraform to ensure consistency?


def _garbage_collect_uploads() -> int:
    qs = Upload.objects.filter(
        created__lt=timezone.now() - UPLOAD_EXPIRATION_TIME,
    )
    if not qs.exists():
        return 0

    deleted_records = 0
    futures: list[Future] = []

    with transaction.atomic(), ThreadPoolExecutor() as executor:
        event = GarbageCollectionEvent.objects.create(type=Upload.__name__)
        for uploads_chunk in chunked(qs.iterator(), GARBAGE_COLLECTION_EVENT_CHUNK_SIZE):
            GarbageCollectionEventRecord.objects.bulk_create(
                GarbageCollectionEventRecord(
                    event=event, record=json.loads(serializers.serialize('json', [u]))[0]
                )
                for u in uploads_chunk
            )

            # Delete the blobs from S3
            futures.append(
                executor.submit(
                    lambda chunk: [u.blob.delete(save=False) for u in chunk],
                    uploads_chunk,
                )
            )

            deleted_records += Upload.objects.filter(
                pk__in=[u.pk for u in uploads_chunk],
            ).delete()[0]

        wait(futures)

    return deleted_records


def _garbage_collect_asset_blobs() -> int:
    qs = AssetBlob.objects.filter(
        assets__isnull=True,
        created__lt=timezone.now() - ASSET_BLOB_EXPIRATION_TIME,
    )
    if not qs.exists():
        return 0

    deleted_records = 0
    futures: list[Future] = []

    with transaction.atomic(), ThreadPoolExecutor() as executor:
        event = GarbageCollectionEvent.objects.create(type=AssetBlob.__name__)
        for asset_blobs_chunk in chunked(qs.iterator(), GARBAGE_COLLECTION_EVENT_CHUNK_SIZE):
            GarbageCollectionEventRecord.objects.bulk_create(
                GarbageCollectionEventRecord(
                    event=event, record=json.loads(serializers.serialize('json', [a]))[0]
                )
                for a in asset_blobs_chunk
            )

            # Delete the blobs from S3
            futures.append(
                executor.submit(
                    lambda chunk: [a.blob.delete(save=False) for a in chunk],
                    asset_blobs_chunk,
                )
            )

            deleted_records += AssetBlob.objects.filter(
                pk__in=[a.pk for a in asset_blobs_chunk],
            ).delete()[0]

        wait(futures)

    return deleted_records


def garbage_collect():
    with transaction.atomic():
        garbage_collected_uploads = _garbage_collect_uploads()
        garbage_collected_asset_blobs = _garbage_collect_asset_blobs()
        garbage_collected_uploads = upload.garbage_collect()
        garbage_collected_asset_blobs = asset_blob.garbage_collect()

        GarbageCollectionEvent.objects.filter(
            timestamp__lt=timezone.now() - RESTORATION_WINDOW
71 changes: 71 additions & 0 deletions dandiapi/api/services/garbage_collection/asset_blob.py
@@ -0,0 +1,71 @@
from __future__ import annotations

from concurrent.futures import Future, ThreadPoolExecutor, wait
from datetime import timedelta
import json
from typing import TYPE_CHECKING

from celery.utils.log import get_task_logger
from django.core import serializers
from django.db import transaction
from django.utils import timezone
from more_itertools import chunked

from dandiapi.api.models import (
    AssetBlob,
    GarbageCollectionEvent,
    GarbageCollectionEventRecord,
)

if TYPE_CHECKING:
    from django.db.models import QuerySet

logger = get_task_logger(__name__)

ASSET_BLOB_EXPIRATION_TIME = timedelta(days=7)


def get_queryset() -> QuerySet[AssetBlob]:
"""Get the queryset of AssetBlobs that are eligible for garbage collection."""
return AssetBlob.objects.filter(
assets__isnull=True,
created__lt=timezone.now() - ASSET_BLOB_EXPIRATION_TIME,
)


def garbage_collect() -> int:
    from . import GARBAGE_COLLECTION_EVENT_CHUNK_SIZE

    qs = get_queryset()

    if not qs.exists():
        return 0

    deleted_records = 0
    futures: list[Future] = []

    with transaction.atomic(), ThreadPoolExecutor() as executor:
        event = GarbageCollectionEvent.objects.create(type=AssetBlob.__name__)
        for asset_blobs_chunk in chunked(qs.iterator(), GARBAGE_COLLECTION_EVENT_CHUNK_SIZE):
            GarbageCollectionEventRecord.objects.bulk_create(
                GarbageCollectionEventRecord(
                    event=event, record=json.loads(serializers.serialize('json', [a]))[0]
                )
                for a in asset_blobs_chunk
            )

            # Delete the blobs from S3
            futures.append(
                executor.submit(
                    lambda chunk: [a.blob.delete(save=False) for a in chunk],
                    asset_blobs_chunk,
                )
            )

            deleted_records += AssetBlob.objects.filter(
                pk__in=[a.pk for a in asset_blobs_chunk],
            ).delete()[0]

        wait(futures)

    return deleted_records
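The companion `upload.py` module referenced by `garbage_collection.upload.get_queryset()` and `upload.garbage_collect()` is not included in this view; presumably it mirrors `asset_blob.py` with `Upload` in place of `AssetBlob`. A hypothetical sketch of its queryset helper, inferred from the `UPLOAD_EXPIRATION_TIME` usage removed from `__init__.py` above (the actual expiration value is not shown in this diff):

```python
# Hypothetical sketch of dandiapi/api/services/garbage_collection/upload.py;
# the real module is not part of this diff view.
from __future__ import annotations

from datetime import timedelta
from typing import TYPE_CHECKING

from django.utils import timezone

from dandiapi.api.models import Upload

if TYPE_CHECKING:
    from django.db.models import QuerySet

# Expiration window; assumed value, the actual constant is defined in the real module.
UPLOAD_EXPIRATION_TIME = timedelta(days=1)


def get_queryset() -> QuerySet[Upload]:
    """Return Uploads old enough to be eligible for garbage collection."""
    return Upload.objects.filter(
        created__lt=timezone.now() - UPLOAD_EXPIRATION_TIME,
    )
```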