Merged
31 commits
ace5ee0
Changes for testing warnings with js
BharatVe Mar 24, 2025
e987387
Update APi to API&Data
BharatVe Mar 26, 2025
7c145c0
Merge branch 'main' into enhancement/Download_all_geometries_and_meta…
nuest Mar 31, 2025
ba487d4
Addition of GeoPackage + Dynamic Size Calculation ( GeoPackage needs …
Apr 2, 2025
af50835
Updated implementation for GeoPackage download
Apr 7, 2025
83cc473
Updated test file
Apr 9, 2025
699e78b
Merge remote-tracking branch 'origin/main' into enhancement/Download_…
Apr 9, 2025
1cc2193
Merge remote-tracking branch 'origin/main' into enhancement/Download_…
Apr 9, 2025
b21233d
update test( with pygdal)
Apr 9, 2025
823ad18
Update test_geo_data.py
BharatVe Apr 9, 2025
bcec9a1
Update requirements.txt
BharatVe Apr 9, 2025
aca2962
updated views.py, requirements.txt using fiona and shapely (vs osgeo)
Apr 10, 2025
7869a82
Changes for updated pull request. (Work in progress)
Apr 20, 2025
17965b0
Update tasks.py, minor updates
BharatVe Apr 20, 2025
988b5e1
Completed implementation with recommended changes (final check needed)
Apr 22, 2025
c4cc194
Minor corrections tasks.py
BharatVe Apr 23, 2025
f135ef3
updated test
Apr 23, 2025
052c42f
Update data.html
BharatVe Apr 23, 2025
acfe536
Updated data message
BharatVe Apr 23, 2025
30cb5f1
now to timezone (Fix unittest issue)
Apr 23, 2025
9d24d63
add logos and colours to README, closes #33
nuest Apr 9, 2025
fc02e9b
Updated scripts: changed time formats, modified test, added humanize time
Apr 28, 2025
daed800
Merge branch 'main' into enhancement/Download_all_geometries_and_meta…
BharatVe Apr 28, 2025
a9f7a8d
fixed tests, removed fiona and updated requirements.txt
Apr 28, 2025
939e0b8
install GDAL package from PyPI
nuest Apr 29, 2025
e7d9701
fix test
nuest Apr 29, 2025
a2f829f
Updated links, changed URLs, corrected footer, added automated cache …
May 5, 2025
479e10e
Updated apps and tests
May 6, 2025
0e4b16b
Update apps.py
BharatVe May 6, 2025
6c072ed
Use Humanize, added checks for link validity.
May 12, 2025
100dc2a
added humanize
May 12, 2025
17 changes: 9 additions & 8 deletions .github/workflows/ci.yml
Expand Up @@ -41,14 +41,14 @@ jobs:
run: |
sudo apt-get update
sudo add-apt-repository ppa:ubuntugis/ppa
sudo apt-get install -y -qq gdal-bin libgdal-dev python3-gdal

sudo apt-get install -y -qq gdal-bin libgdal-dev
- name: Install Python Dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -r requirements-dev.txt

python -m pip install gdal=="$(gdal-config --version).*"
python -m pip install -r requirements.txt
python -m pip install -r requirements-dev.txt

- name: Run Django migrations
run: |
python manage.py migrate
Expand Down Expand Up @@ -110,13 +110,14 @@ jobs:
run: |
sudo apt-get update
sudo add-apt-repository ppa:ubuntugis/ppa
sudo apt-get install -y -qq gdal-bin libgdal-dev python3-gdal
sudo apt-get install -y -qq gdal-bin libgdal-dev

- name: Install Python Dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -r requirements-dev.txt
python -m pip install gdal=="$(gdal-config --version).*"
python -m pip install -r requirements.txt
python -m pip install -r requirements-dev.txt

- name: Run Django migrations
run: |
Expand Down
3 changes: 2 additions & 1 deletion Dockerfile
Expand Up @@ -21,7 +21,8 @@ RUN apt-get update && \
RUN apt-get update && \
apt-get install -y -qq software-properties-common && \
add-apt-repository ppa:ubuntugis/ppa && \
apt-get install -y -qq gdal-bin libgdal-dev python3-gdal

RUN pip install gdal=="$(gdal-config --version).*"

RUN mkdir -p /code

Expand Down
7 changes: 5 additions & 2 deletions README.md
Expand Up @@ -111,10 +111,13 @@ source .venv/bin/activate
# Confirm Python path
which python

# Instal GDAL and the Python GDAL bindings, see Dockerfile for example on Ubuntu
# Install GDAL
gdalinfo --version

# Install non-GDAL Python dependencies
# Install the gdal Python library matching your GDAL version
pip install gdal=="$(gdal-config --version).*"

# Install Python dependencies
pip install -r requirements.txt

# create local DB container (once)
Expand Down
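The `pip install gdal=="$(gdal-config --version).*"` line used in the workflow, the Dockerfile, and the README pins the Python bindings to the GDAL release installed on the system. A minimal sketch of the same idea in Python, assuming `gdal-config` is on the `PATH`; the helper names `gdal_requirement` and `system_gdal_version` are hypothetical and not part of this PR:

```python
import shutil
import subprocess
from typing import Optional


def gdal_requirement(version: str) -> str:
    """Build a pip requirement that matches any build of the given GDAL release."""
    return f"gdal=={version.strip()}.*"


def system_gdal_version() -> Optional[str]:
    """Return the output of `gdal-config --version`, or None if the tool is missing."""
    if shutil.which("gdal-config") is None:
        return None
    return subprocess.check_output(["gdal-config", "--version"], text=True).strip()


if __name__ == "__main__":
    version = system_gdal_version()
    print(gdal_requirement(version) if version else "gdal-config not found")
```

The trailing `.*` is a prefix match, so any packaging revision of the bindings for that GDAL release satisfies the requirement.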
4 changes: 4 additions & 0 deletions optimap/settings.py
Expand Up @@ -59,6 +59,7 @@
'django.contrib.staticfiles',
'django.contrib.gis',
'django.contrib.sitemaps',
'django.contrib.humanize',
'publications',
'rest_framework',
'rest_framework_gis',
Expand Down Expand Up @@ -191,6 +192,7 @@
OAI_USERNAME = env("OPTIMAP_OAI_USERNAME", default="")
OAI_PASSWORD = env("OPTIMAP_OAI_PASSWORD", default="")
EMAIL_SEND_DELAY = 2
DATA_DUMP_INTERVAL_HOURS = 6

MIDDLEWARE = [
'django.middleware.cache.UpdateCacheMiddleware',
Expand All @@ -210,6 +212,8 @@
"django.contrib.sites.middleware.CurrentSiteMiddleware",
"sesame.middleware.AuthenticationMiddleware",
"django_currentuser.middleware.ThreadLocalUserMiddleware",
"django.middleware.gzip.GZipMiddleware",

]

ROOT_URLCONF = 'optimap.urls'
Expand Down
13 changes: 5 additions & 8 deletions optimap/urls.py
Expand Up @@ -25,8 +25,8 @@
}

urlpatterns = [
path('admin/', admin.site.urls),
path('', include('publications.urls')),
path('admin/', admin.site.urls),
path('', include(('publications.urls', 'optimap'), namespace='optimap')),
path(
"sitemap.xml",
sitemaps_views.index,
Expand All @@ -40,16 +40,13 @@
name="django.contrib.sitemaps.views.sitemap",
),
re_path(r'^robots.txt', RobotsView.as_view(), name="robots_file"),
]
]

# https://stackoverflow.com/a/18272203/261210
# Context processor for the site
from django.contrib.sites.shortcuts import get_current_site
from django.utils.functional import SimpleLazyObject

def site(request):
protocol = 'https' if request.is_secure() else 'http'
site = SimpleLazyObject(lambda: "{0}://{1}".format(protocol, get_current_site(request)))

return {
'site': site,
}
return {'site': site}
38 changes: 34 additions & 4 deletions publications/apps.py
@@ -1,9 +1,39 @@
import logging
from django.apps import AppConfig
from django.db.models.signals import post_migrate
from django.conf import settings
from django.utils import timezone

logger = logging.getLogger(__name__)

def schedule_data_dump(sender, **kwargs):
from django_q.models import Schedule
from django_q.tasks import schedule

func_name = "publications.tasks.regenerate_geopackage_cache"
if not Schedule.objects.filter(func=func_name).exists():
schedule(
func_name,
schedule_type="I",
minutes=settings.DATA_DUMP_INTERVAL_HOURS * 60,
next_run=timezone.now(),
repeats=-1,
)
logger.info(
"Scheduled data-dump task '%s' every %d hours",
func_name,
settings.DATA_DUMP_INTERVAL_HOURS,
)

class PublicationsConfig(AppConfig):
default_auto_field = 'django.db.models.BigAutoField'
name = 'publications'
name = "publications"
default_auto_field = "django.db.models.BigAutoField"

def ready(self):
# Implicitly connect signal handlers decorated with @receiver.
from . import signals
import publications.signals
post_migrate.connect(
schedule_data_dump,
sender=self,
weak=False,
dispatch_uid="publications.schedule_data_dump",
)
19 changes: 19 additions & 0 deletions publications/management/commands/schedule_geojson.py
@@ -0,0 +1,19 @@
from django.core.management.base import BaseCommand
from django_q.tasks import schedule
from django_q.models import Schedule

class Command(BaseCommand):
help = "Schedule the GeoJSON regeneration task every 6 hours."

def handle(self, *args, **options):
func_name = 'publications.tasks.regenerate_geojson_cache'
if not Schedule.objects.filter(func=func_name).exists():
schedule(
func_name,
schedule_type='I', # interval
minutes=360, # every 6 hours
repeats=-1
)
self.stdout.write(self.style.SUCCESS("Scheduled GeoJSON regeneration every 6h."))
else:
self.stdout.write("GeoJSON regeneration already scheduled.")
78 changes: 60 additions & 18 deletions publications/tasks.py
@@ -1,36 +1,36 @@
import logging
logger = logging.getLogger(__name__)

from django_q.models import Schedule
from publications.models import Publication, HarvestingEvent, Source
from bs4 import BeautifulSoup
import os
import json
import subprocess
import gzip
import re
import tempfile
import time
import calendar
from datetime import datetime, timedelta
import xml.dom.minidom
from django.contrib.gis.geos import GEOSGeometry
import requests
from django.core.mail import send_mail, EmailMessage
from django.utils import timezone
from bs4 import BeautifulSoup
from requests.auth import HTTPBasicAuth
import os
from urllib.parse import quote
from django.conf import settings
from django.utils.timezone import now
from django.core.mail import send_mail, EmailMessage
from django.core.serializers import serialize
from django.contrib.gis.geos import GEOSGeometry, GeometryCollection
from django.utils import timezone
from publications.models import Publication, HarvestingEvent, Source
from .models import EmailLog, Subscription
from django.contrib.auth import get_user_model
User = get_user_model()
from .models import EmailLog, Subscription
from datetime import datetime, timedelta
from django.urls import reverse
from urllib.parse import quote
from datetime import datetime
from django_q.tasks import schedule
from django.utils import timezone
from django_q.tasks import schedule
from django_q.models import Schedule
import time
import calendar
import re
from django.contrib.gis.geos import GeometryCollection

BASE_URL = settings.BASE_URL
DOI_REGEX = re.compile(r'10\.\d{4,9}/[-._;()/:A-Z0-9]+', re.IGNORECASE)

def extract_geometry_from_html(content):
for tag in content.find_all("meta"):
Expand Down Expand Up @@ -211,7 +211,7 @@ def harvest_oai_endpoint(source_id, user=None):

def send_monthly_email(trigger_source='manual', sent_by=None):
recipients = User.objects.filter(userprofile__notify_new_manuscripts=True).values_list('email', flat=True)
last_month = now().replace(day=1) - timedelta(days=1)
last_month = timezone.now().replace(day=1) - timedelta(days=1)
new_manuscripts = Publication.objects.filter(creationDate__month=last_month.month)

if not recipients.exists() or not new_manuscripts.exists():
Expand Down Expand Up @@ -319,4 +319,46 @@ def schedule_subscription_email_task(sent_by=None):
kwargs={'trigger_source': 'scheduled', 'sent_by': sent_by.id if sent_by else None}
)
logger.info(f"Scheduled 'send_subscription_based_email' for {next_run_date}")

def regenerate_geojson_cache():
cache_dir = os.path.join(tempfile.gettempdir(), "optimap_cache")
os.makedirs(cache_dir, exist_ok=True)

json_path = os.path.join(cache_dir, 'geojson_cache.json')
with open(json_path, 'w') as f:
serialize(
'geojson',
Publication.objects.filter(status="p"),
geometry_field='geometry',
srid=4326,
stream=f
)

gzip_path = json_path + '.gz'
with open(json_path, 'rb') as fin, gzip.open(gzip_path, 'wb') as fout:
fout.writelines(fin)

size = os.path.getsize(json_path)
logger.info("Cached GeoJSON at %s (%d bytes), gzipped at %s", json_path, size, gzip_path)
return json_path

def convert_geojson_to_geopackage(geojson_path):
cache_dir = os.path.dirname(geojson_path)
gpkg_path = os.path.join(cache_dir, 'publications.gpkg')
try:
output = subprocess.check_output(
["ogr2ogr", "-f", "GPKG", gpkg_path, geojson_path],
stderr=subprocess.STDOUT,
text=True,
)
logger.info("ogr2ogr output:\n%s", output)
return gpkg_path
except subprocess.CalledProcessError as e:
# log the failure so a missing GeoPackage can be diagnosed; e.output holds stdout and stderr
logger.error("ogr2ogr failed with exit code %d:\n%s", e.returncode, e.output)
return None


def regenerate_geopackage_cache():
geojson_path = regenerate_geojson_cache()
return convert_geojson_to_geopackage(geojson_path)
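
The compression step in `regenerate_geojson_cache` copies the file with `writelines`, which iterates line by line; serialized JSON may contain few or no newlines, so a single "line" can be the whole file. A minimal stdlib-only sketch of the same step using `shutil.copyfileobj`, which copies in fixed-size chunks instead. The helper name `gzip_copy` and the demo file are hypothetical, not code from this PR:

```python
import gzip
import os
import shutil
import tempfile


def gzip_copy(src_path: str) -> str:
    """Write a gzip-compressed copy of src_path next to it and return the new path."""
    gz_path = src_path + ".gz"
    with open(src_path, "rb") as fin, gzip.open(gz_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # chunked copy, independent of line lengths
    return gz_path


# Usage with a throwaway file standing in for the GeoJSON cache
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "geojson_cache.json")
with open(demo_path, "w") as f:
    f.write('{"type": "FeatureCollection", "features": []}')

demo_gz = gzip_copy(demo_path)
with gzip.open(demo_gz, "rt") as f:
    restored = f.read()
```

The behavior matches the original block (same output path, same content); only the copy strategy differs.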
83 changes: 64 additions & 19 deletions publications/templates/data.html
@@ -1,32 +1,77 @@
{% extends "main.html" %}
{% load optimap_extras humanize %}
{% block title %}Data & API | {% endblock %}

{% load optimap_extras %}
{% block content %}
<div class="row justify-content-center">
<div class="col-md-6 py-5">

{% block title %}API | {% endblock %}
<h1 class="mb-4">OPTIMAP Data &amp; API Access</h1>
<p class="lead">
All publication metadata published in OPTIMAP is licensed under a Creative Commons Zero
(<a href="https://creativecommons.org/publicdomain/zero/1.0/" target="_blank">CC-0</a>) license.
</p>

{% block content %}
<h2 class="py-2">API Endpoint</h2>
<p>
The API endpoint is <b>{{ site|addstr:"/api"|urlize }}</b>. Visit in your browser for
an interactive interface.
</p>

<div class="row justify-content-center">
<div class="col-4 py-5 text-wrap">
<h1 class="py-2">OPTIMAP data access</h1>
<p>
Query all publications via:
<pre class="bg-light p-2">
curl -X GET {{ site|addstr:"/api" }}/api/optimap/ | jq
</pre>
</p>

<p class="lead">All publication metadata published in OPTIMAP is licensed under a Create Commons Zero (<a href='https://creativecommons.org/publicdomain/zero/1.0/'>CC-0</a>) license.</p>
<h2 class="py-2">OpenAPI Schema</h2>
<p>
Download the OpenAPI spec at <b>{{ site|addstr:"/api/schema"|urlize }}</b>.
</p>

<h2 class="py-2">API endpoint</h2>
<p>The API endpoint is <b>{{ site|addstr:"/api"|urlize }}</b>. Visit the URL in your browser to get an interactive interface for exploring the API.</p>
<h2 class="py-2">OpenAPI UI</h2>
<p>
Explore interactively at <b>{{ site|addstr:"/api/schema/ui"|urlize }}</b>.
</p>

<p>You can query all publications with the following request (using <a href="https://stedolan.github.io/jq/" title="Link to jq project website"><code>jq</code></a> for formatting of the response):</p>
<pre>
curl -X GET {{ site|addstr:"/api" }}/api/publications/ | jq
</pre>
<hr>

<h2 class="py-2">OpenAPI schema</h2>
<p>You can download an OpenAPI specification of the api at <b>{{ site|addstr:"/api/schema"|urlize }}</b>.</p>
<h2 class="py-2">Download Publication Data</h2>
<ul class="list-unstyled mb-4">
{% if geojson_size %}
<li class="mb-3">
<div class="d-flex align-items-center">
<a class="btn btn-primary btn-sm" href="{% url 'optimap:download_geojson' %}">
Download GeoJSON
</a>
(<a href="https://geojson.org/" target="_blank" class="ms-2 small">GeoJSON spec</a>)
</div>
<div class="small text-muted mt-1">
File size: {{ geojson_size }}
</div>
</li>
{% endif %}

<h2 class="py-2">OpenAPI user interface</h2>
<p>You can explore the API with an interactive user intreface built based on the OpenAPI schema at <b>{{ site|addstr:"/api/schema/ui"|urlize }}</b>.</p>
{% if geopackage_size %}
<li>
<div class="d-flex align-items-center">
<a class="btn btn-primary btn-sm" href="{% url 'optimap:download_geopackage' %}">
Download GeoPackage
</a>
(<a href="https://www.geopackage.org/" target="_blank" class="ms-2 small">GeoPackage spec</a>)
</div>
<div class="small text-muted mt-1">
File size: {{ geopackage_size }}
</div>
</li>
{% endif %}
</ul>
<p class="small text-muted text-center mb-0">
Data dumps run every {{ interval }} hour{{ interval|pluralize }}.<br>
Last updated: {{ last_updated|naturaltime }}
</p>

</div>
</div>

{% endblock %}
{% endblock %}
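
The template expects `geojson_size`, `geopackage_size`, `interval`, and `last_updated` in its context; the view that supplies them is not part of this excerpt. A rough sketch of how such values could be derived from a cached file on disk, assuming they come from the file's size and modification time. `human_size` and `cache_file_info` are hypothetical helpers, not code from this PR:

```python
import os
import tempfile
from datetime import datetime, timezone


def human_size(num_bytes: int) -> str:
    """Format a byte count with binary units, e.g. 1536 -> '1.5 KB'."""
    size = float(num_bytes)
    for unit in ("bytes", "KB", "MB", "GB"):
        if size < 1024 or unit == "GB":
            return f"{int(size)} {unit}" if unit == "bytes" else f"{size:.1f} {unit}"
        size /= 1024


def cache_file_info(path: str):
    """Return (size label, timezone-aware mtime), or (None, None) if the file is missing."""
    if not os.path.exists(path):
        return None, None
    modified = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    return human_size(os.path.getsize(path)), modified


# Usage with a throwaway file standing in for the cached GeoPackage
demo_file = os.path.join(tempfile.mkdtemp(), "publications.gpkg")
with open(demo_file, "wb") as f:
    f.write(b"\0" * 1536)

size_label, modified = cache_file_info(demo_file)
```

Returning `None` for a missing file fits the `{% if geojson_size %}` and `{% if geopackage_size %}` guards, which hide a download entry when no size is available. A timezone-aware timestamp is used here on the assumption that the project runs with timezone support enabled.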