Conversation

@sravfeyn (Member)

Technical Summary

  • Creates a CSV of ConnectUser records and uploads it to Superset via its REST API.
  • Uses the instance itself to store and delete the CSV file, so a 100k-user limit was added (current user count is 6k). A separate task will be filed for a proper file backend, either mounted on the host or an S3 backend like Connect, though that is probably not necessary in the short term.

I have created a role-restricted user on Superset for this purpose and will update the env once this gets deployed.

Logging and monitoring

Added logs; exceptions are sent to Sentry if the upload fails.

Safety Assurance

Safety story

Tested the full flow using a test CSV and a test table.

  • I am confident that this change will not break current and/or previous versions of CommCare apps

Automated test coverage

QA Plan

NA

Labels & Review

  • The set of people pinged as reviewers is appropriate for the level of risk of the change

@coderabbitai bot commented Dec 25, 2025

Walkthrough

Adds configuration and a scheduled job to export ConnectUser records to Superset. New environment variables and a SUPERSET_UPLOAD_CONFIG setting were introduced. A Celery beat schedule entry runs a new task users.tasks.upload_connect_users_to_superset daily at midnight. The new users/tasks.py implements ConnectUserSupersetExporter which generates a CSV of user records, authenticates with Superset (login and CSRF handling), uploads the CSV to a Superset database endpoint, and handles errors and cleanup. Unit tests for CSV generation, upload request composition, and authentication header handling were added.
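
For reference, the login-plus-CSRF handshake described above typically looks like the sketch below. This is not the PR's exporter code; it is a minimal illustration with the requests library, where the base URL, credentials, and endpoint paths follow Superset's public security API and are assumptions rather than the repository's values.

# Sketch only: a common Superset login + CSRF handshake with requests.
# BASE_URL and the credentials are hypothetical; the real task reads them from settings.
import requests

BASE_URL = "https://superset.example.com"
session = requests.Session()

# 1. Log in to obtain a JWT access token.
login = session.post(
    f"{BASE_URL}/api/v1/security/login",
    json={"username": "svc-user", "password": "secret", "provider": "db", "refresh": True},
    timeout=60,
)
login.raise_for_status()
session.headers["Authorization"] = f"Bearer {login.json()['access_token']}"

# 2. Fetch a CSRF token, which Superset requires for mutating requests such as the CSV upload.
csrf = session.get(f"{BASE_URL}/api/v1/security/csrf_token/", timeout=60)
csrf.raise_for_status()
session.headers["X-CSRFToken"] = csrf.json()["result"]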

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The title accurately summarizes the main change: a daily midnight task to upload a ConnectUser table to Superset, which is the central objective of all added code.
  • Description check: ✅ Passed. The description is directly related to the changeset, providing technical context about CSV generation, Superset upload, file handling, logging, and safety testing.
✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
      • Create PR with unit tests
      • Post copyable unit tests in a comment
      • Commit unit tests in branch sr/userupload



@coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
users/tests/test_tasks.py (1)

84-97: Consider adding tests for error scenarios.

The authentication test is well-structured. For more comprehensive coverage, consider adding tests for:

  • SuperusetUserUploadException when login fails (missing access_token)
  • SuperusetUserUploadException when CSRF token is missing
  • The MAX_USER_UPLOAD_LIMIT check in generate_csv()

Would you like me to generate additional test cases for these error scenarios?
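
For illustration, the limit check could be exercised along these lines. This is a sketch only: it assumes pytest with unittest.mock and that ConnectUserSupersetExporter can be instantiated without arguments, which may not match the real constructor.

from unittest import mock

import pytest

from users.tasks import ConnectUserSupersetExporter, SuperusetUserUploadException


def test_generate_csv_rejects_oversized_user_table():
    # Hypothetical test: pretend the table exceeds the 100k MAX_USER_UPLOAD_LIMIT.
    exporter = ConnectUserSupersetExporter()  # adjust to the real constructor signature
    with mock.patch("users.tasks.ConnectUser.objects") as objects:
        objects.count.return_value = 100_001
        with pytest.raises(SuperusetUserUploadException):
            exporter.generate_csv()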

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ace11e6 and ddc77fd.

📒 Files selected for processing (5)
  • .env_template
  • connectid/celery_app.py
  • connectid/settings.py
  • users/tasks.py
  • users/tests/test_tasks.py
🧰 Additional context used
🧬 Code graph analysis (2)
users/tasks.py (1)
users/models.py (1)
  • ConnectUser (30-118)
users/tests/test_tasks.py (1)
users/tasks.py (5)
  • ConnectUserSupersetExporter (38-135)
  • generate_csv (67-77)
  • _serialize_value (128-135)
  • upload (106-119)
  • authenticate (79-83)
🪛 GitHub Actions: CI
connectid/celery_app.py

[error] 31-31: TypeError: 'hour' is an invalid keyword argument for timedelta. In connectid/celery_app.py line 31, replace timedelta(hour=0, minute=0) with timedelta(hours=0, minutes=0) or use the appropriate keyword. This caused the pytest/Django setup to fail during import.

🪛 Ruff (0.14.10)
users/tasks.py

64-64: Do not catch blind exception: Exception

(BLE001)


73-73: Avoid specifying long messages outside the exception class

(TRY003)


93-93: Avoid specifying long messages outside the exception class

(TRY003)


103-103: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (5)
.env_template (1)

40-46: LGTM!

The new environment variable placeholders for Superset configuration follow the existing patterns in the template and are appropriately left empty.

connectid/settings.py (1)

350-359: LGTM!

The SUPERSET_UPLOAD_CONFIG dictionary is well-structured. The empty defaults for username and password ensure the upload won't accidentally run without proper configuration, and the task implementation guards against this.
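
As a rough illustration of the pattern being described (not the repository's exact settings.py; the key names and env-var names are assumptions based on this review), the dictionary would look roughly like:

import environ

env = environ.Env()

# Sketch of the described pattern; empty defaults keep the upload disabled until configured.
SUPERSET_UPLOAD_CONFIG = {
    "base_url": env("SUPERSET_BASE_URL", default=""),
    "username": env("SUPERSET_USERNAME", default=""),
    "password": env("SUPERSET_PASSWORD", default=""),
    "database_id": env.int("SUPERSET_DATABASE_ID", default=0),
    "table_name": env("SUPERSET_TABLE_NAME", default=""),
}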

users/tests/test_tasks.py (2)

24-60: LGTM!

Good test coverage for CSV generation. The test validates field serialization, row content, and column ordering correctly.


63-81: LGTM!

The upload test effectively validates the HTTP request structure, payload, and file handling by mocking _request.

users/tasks.py (1)

138-146: LGTM!

The Celery task properly captures exceptions to Sentry, logs them, and re-raises to ensure task failure is recorded. This is good practice for observability and debugging.
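
The pattern being praised here, capture to Sentry, log, then re-raise so Celery records the failure, looks roughly like the following sketch (not the file's exact code; the task body is elided and the decorator may differ):

import logging

import sentry_sdk
from celery import shared_task

logger = logging.getLogger(__name__)


@shared_task
def upload_connect_users_to_superset():
    try:
        ...  # run ConnectUserSupersetExporter here
    except Exception as e:
        # Capture to Sentry and log, then re-raise so Celery marks the task as failed.
        sentry_sdk.capture_exception(e)
        logger.exception("ConnectUser upload to Superset failed")
        raise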

Comment on lines 29 to 33
"upload_connect_users_to_superset": {
"task": "users.tasks.upload_connect_users_to_superset",
"schedule": timedelta(hour=0, minute=0),
},

⚠️ Potential issue | 🔴 Critical

Fix invalid timedelta arguments and incorrect schedule interval.

Two issues here:

  1. Syntax error (causes pipeline failure): hour and minute are invalid keyword arguments for timedelta. The correct keywords are hours and minutes (plural).

  2. Logic error: Even with corrected keywords, timedelta(hours=0, minutes=0) equals zero duration, which would cause the task to run continuously. Based on the PR title "midnight daily", this should run once every 24 hours.

🔎 Proposed fix

For a daily interval:

     "upload_connect_users_to_superset": {
         "task": "users.tasks.upload_connect_users_to_superset",
-        "schedule": timedelta(hour=0, minute=0),
+        "schedule": timedelta(hours=24),
     },

Alternatively, for running at exactly midnight UTC, use crontab:

from celery.schedules import crontab

# In beat_schedule:
"upload_connect_users_to_superset": {
    "task": "users.tasks.upload_connect_users_to_superset",
    "schedule": crontab(hour=0, minute=0),  # Runs at midnight UTC
},
🧰 Tools
🪛 GitHub Actions: CI

[error] 31-31: TypeError: 'hour' is an invalid keyword argument for timedelta. In connectid/celery_app.py line 31, replace timedelta(hour=0, minute=0) with timedelta(hours=0, minutes=0) or use the appropriate keyword. This caused the pytest/Django setup to fail during import.

🤖 Prompt for AI Agents
In connectid/celery_app.py around lines 29 to 32, the schedule uses invalid
timedelta keywords (hour, minute) and even if corrected would be a zero-duration
causing continuous execution; change the schedule to a 24-hour interval (e.g.,
timedelta(days=1)) or, if you need the task to run exactly at midnight UTC,
replace the schedule with celery.schedules.crontab(hour=0, minute=0) and add the
necessary import for crontab at the top of the file.

Comment on lines +34 to +35
class SuperusetUserUploadException(Exception):
    pass

⚠️ Potential issue | 🟡 Minor

Typo in exception class name.

SuperusetUserUploadException should be SupersetUserUploadException (missing 's' in "Superset").

🔎 Proposed fix
-class SuperusetUserUploadException(Exception):
+class SupersetUserUploadException(Exception):
     pass

Also update the references on lines 73, 93, and 103.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-class SuperusetUserUploadException(Exception):
-    pass
+class SupersetUserUploadException(Exception):
+    pass
🤖 Prompt for AI Agents
In users/tasks.py around lines 34–35, the exception class name has a typo:
rename SuperusetUserUploadException to SupersetUserUploadException (add the
missing "s"), and update all references to that class on lines 73, 93, and 103
to use the new correct name; ensure any imports or string mentions are updated
consistently and run tests/lint to confirm no remaining references.

Comment on lines +67 to +77
    def generate_csv(self) -> Path:
        tmp = tempfile.NamedTemporaryFile(mode="w", newline="", suffix=".csv", delete=False)
        writer = csv.DictWriter(tmp, fieldnames=CONNECT_USER_DUMP_FIELDS)
        writer.writeheader()
        MAX_USER_UPLOAD_LIMIT = 100000
        if ConnectUser.objects.count() > MAX_USER_UPLOAD_LIMIT:
            raise SuperusetUserUploadException(f"ConnectUser count is greater than {MAX_USER_UPLOAD_LIMIT}")
        for user in ConnectUser.objects.all().values(*CONNECT_USER_DUMP_FIELDS).iterator(chunk_size=500):
            writer.writerow({k: self._serialize_value(user[k]) for k in CONNECT_USER_DUMP_FIELDS})
        tmp.close()
        return Path(tmp.name)

⚠️ Potential issue | 🟠 Major

Temp file not cleaned up if an exception occurs during CSV generation.

If generate_csv() raises an exception (e.g., when MAX_USER_UPLOAD_LIMIT is exceeded at line 73, or during iteration), the temporary file won't be cleaned up because run() only cleans up after generate_csv() returns successfully.

🔎 Proposed fix using context manager pattern
     def generate_csv(self) -> Path:
-        tmp = tempfile.NamedTemporaryFile(mode="w", newline="", suffix=".csv", delete=False)
-        writer = csv.DictWriter(tmp, fieldnames=CONNECT_USER_DUMP_FIELDS)
-        writer.writeheader()
         MAX_USER_UPLOAD_LIMIT = 100000
         if ConnectUser.objects.count() > MAX_USER_UPLOAD_LIMIT:
-            raise SuperusetUserUploadException(f"ConnectUser count is greater than {MAX_USER_UPLOAD_LIMIT}")
-        for user in ConnectUser.objects.all().values(*CONNECT_USER_DUMP_FIELDS).iterator(chunk_size=500):
-            writer.writerow({k: self._serialize_value(user[k]) for k in CONNECT_USER_DUMP_FIELDS})
-        tmp.close()
-        return Path(tmp.name)
+            raise SupersetUserUploadException(f"ConnectUser count is greater than {MAX_USER_UPLOAD_LIMIT}")
+        
+        tmp = tempfile.NamedTemporaryFile(mode="w", newline="", suffix=".csv", delete=False)
+        try:
+            writer = csv.DictWriter(tmp, fieldnames=CONNECT_USER_DUMP_FIELDS)
+            writer.writeheader()
+            for user in ConnectUser.objects.all().values(*CONNECT_USER_DUMP_FIELDS).iterator(chunk_size=500):
+                writer.writerow({k: self._serialize_value(user[k]) for k in CONNECT_USER_DUMP_FIELDS})
+        except Exception:
+            tmp.close()
+            Path(tmp.name).unlink(missing_ok=True)
+            raise
+        tmp.close()
+        return Path(tmp.name)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    def generate_csv(self) -> Path:
-        tmp = tempfile.NamedTemporaryFile(mode="w", newline="", suffix=".csv", delete=False)
-        writer = csv.DictWriter(tmp, fieldnames=CONNECT_USER_DUMP_FIELDS)
-        writer.writeheader()
-        MAX_USER_UPLOAD_LIMIT = 100000
-        if ConnectUser.objects.count() > MAX_USER_UPLOAD_LIMIT:
-            raise SuperusetUserUploadException(f"ConnectUser count is greater than {MAX_USER_UPLOAD_LIMIT}")
-        for user in ConnectUser.objects.all().values(*CONNECT_USER_DUMP_FIELDS).iterator(chunk_size=500):
-            writer.writerow({k: self._serialize_value(user[k]) for k in CONNECT_USER_DUMP_FIELDS})
-        tmp.close()
-        return Path(tmp.name)
+    def generate_csv(self) -> Path:
+        MAX_USER_UPLOAD_LIMIT = 100000
+        if ConnectUser.objects.count() > MAX_USER_UPLOAD_LIMIT:
+            raise SupersetUserUploadException(f"ConnectUser count is greater than {MAX_USER_UPLOAD_LIMIT}")
+        tmp = tempfile.NamedTemporaryFile(mode="w", newline="", suffix=".csv", delete=False)
+        try:
+            writer = csv.DictWriter(tmp, fieldnames=CONNECT_USER_DUMP_FIELDS)
+            writer.writeheader()
+            for user in ConnectUser.objects.all().values(*CONNECT_USER_DUMP_FIELDS).iterator(chunk_size=500):
+                writer.writerow({k: self._serialize_value(user[k]) for k in CONNECT_USER_DUMP_FIELDS})
+        except Exception:
+            tmp.close()
+            Path(tmp.name).unlink(missing_ok=True)
+            raise
+        tmp.close()
+        return Path(tmp.name)
🧰 Tools
🪛 Ruff (0.14.10)

73-73: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In users/tasks.py around lines 67 to 77, the temporary file created by
NamedTemporaryFile is not removed if generate_csv() raises an exception (e.g.,
limit check or during iteration); change the implementation to ensure the temp
file is always cleaned up on error by either creating the temp filepath and
using a try/finally to remove the file on exception or by using a context
manager that guarantees deletion (or TemporaryDirectory) while writing;
specifically, open/write the CSV inside a try block and in the finally block
close and delete the tempfile if an exception occurred, leaving successful
return behavior unchanged.

Comment on lines +121 to +125
    def _request(self, method: str, path: str, **kwargs):
        url = f"{self.base_url}{path}"
        response = self.session.request(method, url, timeout=60, **kwargs)
        response.raise_for_status()
        return response.json()

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "tasks.py" -path "*/users/*" | head -20

Repository: dimagi/connect-id

Length of output: 77


🏁 Script executed:

cat -n ./users/tasks.py | sed -n '115,135p'

Repository: dimagi/connect-id

Length of output: 936


🏁 Script executed:

cat -n ./users/tasks.py | sed -n '100,125p'

Repository: dimagi/connect-id

Length of output: 1146


🌐 Web query:

Superset API csv_upload endpoint response format success

💡 Result:

Short answer: use Superset's OpenAPI (Swagger) — the CSV upload endpoints follow the standard REST patterns and return JSON on success (HTTP 201/200) with a success message plus the created resource info (ID/metadata). See Superset API docs and the shipped openapi.json for exact response schema for your Superset version (inspect /swagger/v1 on your instance or the repo openapi.json). [1][2]

Sources

  • Superset API docs (use /swagger/v1 on your Superset instance). [1]
  • Superset openapi.json (contains the upload/file-related schemas and response definitions). [2]

References
[1] https://superset.apache.org/docs/6.0.0/api/
[2] Superset openapi.json (docs/static/resources/openapi.json in the Superset repo).


🌐 Web query:

Superset /api/v1/database/csv_upload endpoint response body empty

💡 Result:

Short troubleshooting checklist — common causes and fixes when /api/v1/database/csv_upload (or related CSV-upload endpoints) return an empty body:

  1. Wrong endpoint or method — recent Superset versions expose CSV upload via the Database REST API (check /swagger/v1 on your instance or docs). Confirm you’re calling the correct POST URL and path for your Superset version. [1]

  2. Permissions / DB not enabled for uploads — the target database must have “Allow file uploads to database” enabled and schemas_allowed_for_csv_upload configured correctly; if not, the UI/backend can fail silently. Check the database’s extra JSON (schemas_allowed_for_csv_upload) and toggle file uploads in the DB settings. [2][3]

  3. Request format / headers — upload must be sent as multipart/form-data with the file field and other form fields expected by the API (Authorization header, database id, schema, table name). Incorrect Content-Type or missing fields can produce empty/no-content responses. [1][4]

  4. Proxy / size limits — Nginx/Proxy or Flask limits can drop the body: check nginx client_max_body_size, any reverse-proxy timeouts, and Flask MAX_CONTENT_LENGTH. If the file is large the server/proxy may return an empty response. [5]

  5. Temporary file handling / server filesystem — Superset writes uploaded files to a tmp/uploads path; FileNotFoundError or permission issues on that path will break upload. Inspect Superset logs for FileNotFoundError or permission errors. [6]

  6. Inspect Superset logs and request/response — enable debug logging and check superset-worker and webserver logs for stack traces. Capture the full HTTP request/response (curl -v or tcpdump) to see status code and headers; empty body often accompanies 4xx/5xx responses with details in logs. [6][4]

  7. API changes or client bug — if you’re scripting against /api/v1/dataset/ vs database upload endpoints, try the official upload flow (UI) and replicate the exact request with curl to find differences. Search recent GitHub issues for version-specific bugs. [2][7]

If you want, tell me:

  • Superset version,
  • Exact curl/postman request you’re sending (headers + body),
  • Status code and any webserver logs — and I’ll give a targeted fix.

Sources: Superset API/docs and upload docs, GitHub issues and community posts, and StackOverflow. [1][2][3][4][6][7]

References
[1] Superset API docs and swagger (API versions vary).
[2] GitHub issue and discussion about enabling CSV upload / schemas_allowed_for_csv_upload.
[3] Superset docs: enabling file uploads in Database Connections.
[4] Postman/StackOverflow examples of API CSV upload problems.
[5] Common nginx/Flask upload limits (proxy config).
[6] StackOverflow issue: FileNotFoundError during CSV upload.
[7] Recent GitHub discussions about CSV upload behavior.


🌐 Web query:

Superset csv_upload 201 success response schema JSON

💡 Result:

Do you want the OpenAPI/Swagger response schema (exact JSON) for the 201 success from your Superset instance's csv-upload endpoint, or an example 201 response body? If the former I can fetch the OpenAPI spec from your Superset server (e.g. GET /swagger/v1) — please provide the Superset base URL and version (or allow me to try localhost). If you want an example response, tell me which Superset version.


🌐 Web query:

site:github.com/apache/superset csv_upload endpoint response

💡 Result:

I found GitHub issues, discussions and docs showing common responses/errors and behavior for Superset's CSV upload endpoint (Upload a CSV):

  • "Failed to fetch schemas allowed for csv upload in this database! Please contact your Superset Admin!" — reported in issue #15708. [1]
  • "No database is allowed for your csv upload" / guidance on enabling CSV upload via the Database > Advanced > Security > Allow file uploads to database setting (docs/discussion). [2][6]
  • FileNotFoundError when the uploads directory is missing in container (traceback shown in StackOverflow). [3]
  • Several threads about programmatic CSV import and API limitations / migration permissions referencing related endpoints and permissions. [7][5]

Key docs showing how CSV upload is enabled and used in the UI: Superset docs "Exploring Data" (Upload CSV instructions). [6]

Sources:
[1] github.com/apache/superset/issues/15708
[2] github.com/apache/superset/discussions/14143
[3] stackoverflow.com/questions/63244063
[5] github.com/apache/superset/discussions/23399
[6] superset.apache.org/docs/6.0.0/using-superset/exploring-data
[7] github.com/apache/superset/issues/8796

Would you like me to fetch the exact server-side handler/source file for the csv upload endpoint in the Apache Superset repo?


🏁 Script executed:

rg -i "csv_upload|upload.*test" --type py -A 5 -B 5

Repository: dimagi/connect-id

Length of output: 2234


🏁 Script executed:

cat -n ./users/tests/test_tasks.py

Repository: dimagi/connect-id

Length of output: 4208


🏁 Script executed:

rg "_request\(" --type py -B 2 -A 2

Repository: dimagi/connect-id

Length of output: 5623


🏁 Script executed:

cat -n ./users/tasks.py | sed -n '80,110p'

Repository: dimagi/connect-id

Length of output: 1384


🌐 Web query:

site:github.com/apache/superset "csv_upload" POST response json

💡 Result:

I couldn't find a single repository file named "csv_upload" that shows a direct POST response JSON. The CSV upload-related API/permission is described in Superset's OpenAPI spec (schemas_allowed_for_file_upload / csv upload settings) and in PR/discussion notes about upload endpoints. See the OpenAPI spec and the PR discussion. (fossies.org)


🌐 Web query:

site:github.com/apache/superset database_upload endpoint handler response

💡 Result:

The database file-upload endpoints the frontend calls are:

  • POST /api/v1/database/upload_metadata/ — returns parsed file metadata (columns, sheets, etc.). [1]
  • POST /api/v1/database/upload/ — uploads the file to be ingested. [1]

On the backend these are implemented/handled in the databases API code (see superset/databases/api.py) — that file contains the schema-access / upload-related handlers used by the UI. There are also recent changes noted in the 5.0 release notes that affect legacy per-file endpoints (see UPDATING). [2][3]

Sources:

  1. Frontend calls to the endpoints (UploadDataModel component). [Superset frontend — UploadDataModel]. [4]
  2. Code-review / API discussion referencing superset/databases/api.py upload/schema-access logic. [superset/databases/api.py review]. [3]
  3. UPDATING.md (notes about removal/changes to data-upload endpoints in 5.0). [2]

🌐 Web query:

Superset "csv_upload" endpoint HTTP 200 empty response OR "204" OR "202"

💡 Result:

Short answer

  • An empty HTTP 200/204/202 from Superset's csv upload usually means the request reached Superset but no synchronous JSON payload was returned (either the server accepted the upload and responded with no body, or the request failed earlier — permission/config or a CORS/preflight problem). See Superset issues and StackOverflow examples. [1][2][3]

What to check (quick checklist)

  1. Confirm CSV upload is enabled on the target database: "Allow CSV Upload" and proper "Extra" settings. If not allowed Superset will not process uploads. [3][4]
  2. Inspect response headers (Content-Length, Content-Type, Location) and server logs — a 202/204 often indicates asynchronous acceptance (no body) while errors appear in the server log. Check Superset logs for traceback. [1][5]
  3. Permissions / filesystem: permission denied errors when Superset writes temp files cause silent failures; check process user and upload directory permissions. [2]
  4. CORS / browser preflight: browser clients can show “empty” responses when preflight or CORS blocks the real response — verify network tab (request vs preflight) and server CORS settings. [1]
  5. Use the API endpoints documented for CSV/dataset import (api/v1/dataset or dataset/import) with correct multipart/form-data or JSON format; examples and recent issues show people hitting empty responses when using the wrong endpoint or parameters. [5][6]
  6. If using overwrite/import flows, note some API calls return 202/204 on accepted async jobs — check task status endpoints or the database table to confirm. See related GitHub issues. [6]

Helpful links

  • GitHub issues and discussions about csv upload behavior and errors. [1][3][6]
  • StackOverflow thread about permission-related CSV upload failures. [2]
  • Superset handbook / docs on uploading CSVs. [4]

If you want, paste the exact curl/Postman request (headers + response headers + status) and a short excerpt of the Superset log and I’ll identify the likely cause and exact fix.

References
[1] apache/superset issue and discussion search results (CSV upload).
[2] "Apache Superset permissions issue upload csv" — StackOverflow.
[3] apache/superset GitHub issues about CSV upload failures.
[4] "Upload a CSV — Superset Handbook" documentation.
[5] StackOverflow: posting to /api/v1/dataset for CSV upload (example and pitfalls).
[6] GitHub issues re: dataset import async behavior.


Handle empty or non-JSON responses from Superset's csv_upload endpoint

Superset's csv_upload endpoint can return empty responses (HTTP 204) or responses without JSON body in certain scenarios (e.g., asynchronous job acceptance). Calling response.json() unconditionally will raise a JSONDecodeError in these cases, causing the upload to fail.

Since upload() discards the return value from _request(), apply this fix:

    def _request(self, method: str, path: str, **kwargs):
        url = f"{self.base_url}{path}"
        response = self.session.request(method, url, timeout=60, **kwargs)
        response.raise_for_status()
+       if not response.content:
+           return {}
        return response.json()
🤖 Prompt for AI Agents
In users/tasks.py around lines 121 to 125, the helper always calls
response.json() which raises on empty (204) or non-JSON responses from
Superset's csv_upload endpoint; update the method to handle these cases by:
after response.raise_for_status(), check for response.status_code == 204 or
response.content is empty and return None, or inspect Content-Type (not
containing "application/json") and return None; otherwise call response.json()
inside a try/except catching json.JSONDecodeError and return None on decode
failure so upload() can proceed without expecting a JSON body.


@coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
users/tasks.py (3)

34-35: Typo in exception class name. SuperusetUserUploadException should be SupersetUserUploadException; same fix as proposed in the first review, including the references on lines 73, 93, and 103.

67-77: Temp file not cleaned up if an exception occurs during CSV generation; same try/finally cleanup as proposed in the first review.

121-125: Handle empty or non-JSON responses from Superset's csv_upload endpoint; same empty-content guard in _request as proposed in the first review.
🧹 Nitpick comments (1)
users/tasks.py (1)

71-71: Consider extracting magic number to a class or module constant.

The MAX_USER_UPLOAD_LIMIT is defined inline within the method. For better maintainability and reusability, consider defining it as a class or module-level constant.

🔎 Proposed refactor

At the module level (after imports):

MAX_USER_UPLOAD_LIMIT = 100000

Then in the method:

     def generate_csv(self) -> Path:
         tmp = tempfile.NamedTemporaryFile(mode="w", newline="", suffix=".csv", delete=False)
         writer = csv.DictWriter(tmp, fieldnames=CONNECT_USER_DUMP_FIELDS)
         writer.writeheader()
-        MAX_USER_UPLOAD_LIMIT = 100000
         if ConnectUser.objects.count() > MAX_USER_UPLOAD_LIMIT:
             raise SuperusetUserUploadException(f"ConnectUser count is greater than {MAX_USER_UPLOAD_LIMIT}")
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ddc77fd and 4504547.

📒 Files selected for processing (5)
  • .env_template
  • connectid/celery_app.py
  • connectid/settings.py
  • users/tasks.py
  • users/tests/test_tasks.py
🧰 Additional context used
🧬 Code graph analysis (2)
users/tasks.py (1)
users/models.py (1)
  • ConnectUser (30-118)
users/tests/test_tasks.py (1)
users/tasks.py (5)
  • ConnectUserSupersetExporter (38-135)
  • generate_csv (67-77)
  • _serialize_value (128-135)
  • upload (106-119)
  • authenticate (79-83)
🪛 Ruff (0.14.10)
users/tasks.py

64-64: Do not catch blind exception: Exception

(BLE001)


73-73: Avoid specifying long messages outside the exception class

(TRY003)


93-93: Avoid specifying long messages outside the exception class

(TRY003)


103-103: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (6)
.env_template (1)

41-46: LGTM!

The new environment variable placeholders are well-named and align with the Superset upload configuration defined in connectid/settings.py.

connectid/celery_app.py (2)

5-5: LGTM!

The crontab import is correctly added to support the midnight schedule.


30-33: Previous issue resolved.

The schedule now correctly uses crontab(hour=0, minute=0) to run at midnight UTC daily. This fixes the previous syntax error with invalid timedelta keyword arguments.

connectid/settings.py (1)

351-359: LGTM!

The Superset upload configuration is well-structured with appropriate defaults. The use of env.int() for database_id ensures proper type conversion.

users/tests/test_tasks.py (1)

1-97: Excellent test coverage!

The tests comprehensively validate:

  • CSV generation with correct field serialization
  • Upload request payload construction
  • Authentication header setup

The use of mocking to isolate concerns and fixtures for configuration is appropriate.
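
For example, a configuration fixture of the kind referred to here might look like the following hypothetical sketch using pytest-django's settings fixture; the real tests may structure this differently and use different config keys.

import pytest


@pytest.fixture
def superset_config(settings):
    # pytest-django's `settings` fixture lets tests point the exporter at throwaway config.
    settings.SUPERSET_UPLOAD_CONFIG = {
        "base_url": "https://superset.example.com",
        "username": "test-user",
        "password": "test-pass",
        "database_id": 1,
        "table_name": "connect_users",
    }
    return settings.SUPERSET_UPLOAD_CONFIG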

users/tasks.py (1)

138-146: LGTM!

The Celery task properly wraps the exporter, captures exceptions to Sentry, and logs errors appropriately.
