Add Support for DATE_TIMEs in STATA #325

FlipperPA · 2025-03-17T14:00:10Z

When converting a CSV to STATA, I received the following error: src/bin/read_csv/mod_dta.c:404 unsupported variable type 3

This PR adds support for DATE_TIMEs to the STATA writer. Here's the test CSV that was used:

orgpermid,valuecalcdt,justdate
4295533401,2025-03-01 00:00:00,2024-03-01
4295533401,2023-08-26 00:00:00,2024-08-26
4295533401,1974-02-08 00:00:00,2024-02-08
4295533401,1972-06-29 23:59:59.123456+05,2024-02-21
4295533401,1854-03-01 00:00:00,2024-01-01
4295533401,1974-02-08 11:38:23.543212-02,2024-02-08
4295533401,,2024-02-21
4295533401,1972-06-29 23:59:59,2024-02-21
4295533401,1854-03-01 10:42:42,2024-01-01

Here's the JSON mapping file:

{
    "type": "STATA",
    "variables": [
        {
            "type": "NUMERIC",
            "name": "orgpermid",
            "label": "OrgPermID (orgpermid)",
            "format": "UNSPECIFIED"
        },
        {
            "type": "NUMERIC",
            "name": "valuecalcdt",
            "label": "ValueCalcDt (valuecalcdt)",
            "format": "DATE_TIME"
        },
        {
            "type": "NUMERIC",
            "name": "justdate",
            "label": "JustDate (justdate)",
            "format": "DATE"
        }
    ]
}

...and the output in STATA from the written file:

. list

     +-------------------------------------------+
     | orgper~d          valuecalcdt    justdate |
     |-------------------------------------------|
  1. | 4.30e+09   01mar2025 00:00:00   01mar2024 |
  2. | 4.30e+09   26aug2023 00:00:00   26aug2024 |
  3. | 4.30e+09   08feb1974 00:00:00   08feb2024 |
  4. | 4.30e+09   29jun1972 23:59:59   21feb2024 |
  5. | 4.30e+09   01mar1854 00:00:00   01jan2024 |
     |-------------------------------------------|
  6. | 4.30e+09   08feb1974 11:38:23   08feb2024 |
  7. | 4.30e+09                    .   21feb2024 |
  8. | 4.30e+09   29jun1972 23:59:59   21feb2024 |
  9. | 4.30e+09   01mar1854 10:42:42   01jan2024 |
     +-------------------------------------------+

...and to validate millisecond display:

. format valuecalcdt %tCHH:MM:SS.sss

. list

     +-------------------------------------+
     | orgper~d    valuecalcdt    justdate |
     |-------------------------------------|
  1. | 4.30e+09   00:00:00.000   01mar2024 |
  2. | 4.30e+09   00:00:00.000   26aug2024 |
  3. | 4.30e+09   00:00:00.000   08feb2024 |
  4. | 4.30e+09   23:59:59.123   21feb2024 |
  5. | 4.30e+09   00:00:00.000   01jan2024 |
     |-------------------------------------|
  6. | 4.30e+09   11:38:23.543   08feb2024 |
  7. | 4.30e+09              .   21feb2024 |
  8. | 4.30e+09   23:59:59.000   21feb2024 |
  9. | 4.30e+09   10:42:42.000   01jan2024 |
     +-------------------------------------+
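
For readers checking the values above by hand: Stata stores a date-time as a number of milliseconds counted from 01jan1960 00:00:00.000. The following is a minimal sketch of that arithmetic, not the code from this PR. The helper names `days_from_civil` and `stata_tc_ms` are hypothetical, and the sketch assumes every day has exactly 86,400 seconds, so it corresponds to `%tc` rather than the leap-second-adjusted `%tC` used in the format command above.

```c
#include <assert.h>
#include <stdint.h>

/* Days between 1970-01-01 and a proleptic Gregorian date
 * (negative for earlier dates). */
static int64_t days_from_civil(int64_t y, int m, int d) {
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
    y -= (m <= 2);                        /* count Jan/Feb with the previous year */
    int64_t era = (y >= 0 ? y : y - 399) / 400;
    int64_t yoe = y - era * 400;          /* year of era, 0..399 */
    int64_t doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

/* Hypothetical helper: milliseconds since 1960-01-01 00:00:00.000,
 * assuming no leap seconds (every day is 86,400,000 ms long). */
static int64_t stata_tc_ms(int y, int mo, int d, int h, int mi, int s, int ms) {
    int64_t days = days_from_civil(y, mo, d) + 3653;  /* 1960-01-01 is day -3653 */
    return days * 86400000LL + h * 3600000LL + mi * 60000LL + s * 1000LL + ms;
}
```

Dates before 1960, such as the 1854 rows in the test CSV, come out as negative values.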

While reviewing the code, I also noticed that the three DATE_TIME patterns did not match the Date-Time units in the source spec here: https://libguides.library.kent.edu/SPSS/DatesTime

I've also changed those patterns to match the spec.

evanmiller

Thanks for the contribution. I left some suggestions.

src/bin/read_csv/mod_dta.c

evanmiller · 2025-03-23T13:16:15Z

+        fprintf(stderr, "%s:%d not a valid date-time: %s (expected format: yyyy-mm-dd hh:MM:SS with optional milliseconds. Datetime string is truncated at 23 characters to ignore microseconds and timezone information.)\n", __FILE__, __LINE__, date_time);
        exit(EXIT_FAILURE);
    }
-    int missing_ranges_count = readstat_variable_get_missing_ranges_count(var);


Why was this logic removed?

FlipperPA · Mar 24, 2025

I had initially copied value_double_dta to value_double_date_time_dta to confirm the code was being reached, before implementing value_double_date_time_dta as its own function. I wasn't sure how this code would handle blank date fields in the CSV.

evanmiller · Mar 24, 2025

OK, I'm guessing missing ranges are seldom used with dates, so let's leave it out.
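
The error message in the diff above describes the accepted input: `yyyy-mm-dd hh:MM:SS` with optional milliseconds, truncated at 23 characters so that microseconds and timezone suffixes are ignored. Here is a minimal sketch of that parsing rule, assuming nothing about the actual function body in mod_dta.c. The type `date_time_parts`, the function `parse_date_time`, and its return codes (0 parsed, 1 blank field, -1 invalid) are all hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

typedef struct {
    int year, month, day, hour, minute, second, millisecond;
} date_time_parts;

/* Hypothetical parser: returns 0 on success, 1 for a blank field
 * (a caller could then write a missing value), -1 if invalid. */
static int parse_date_time(const char *text, date_time_parts *out) {
    char buf[24];   /* 23 characters of "yyyy-mm-dd hh:MM:SS.sss" plus NUL */
    int n = 0;

    assert(text != NULL && out != NULL);
    if (text[0] == '\0')
        return 1;

    /* snprintf copies at most 23 characters, dropping microseconds
     * and any timezone suffix that follows the milliseconds. */
    snprintf(buf, sizeof buf, "%s", text);

    if (sscanf(buf, "%4d-%2d-%2d %2d:%2d:%2d%n",
               &out->year, &out->month, &out->day,
               &out->hour, &out->minute, &out->second, &n) != 6)
        return -1;
    if (out->month < 1 || out->month > 12 || out->day < 1 || out->day > 31 ||
        out->hour < 0 || out->hour > 23 || out->minute < 0 || out->minute > 59 ||
        out->second < 0 || out->second > 60)
        return -1;

    /* Optional fraction: up to three digits, so ".5" means 500 ms. */
    out->millisecond = 0;
    if (buf[n] == '.') {
        int scale = 100;
        for (const char *p = buf + n + 1;
             isdigit((unsigned char)*p) && scale > 0; p++) {
            out->millisecond += (*p - '0') * scale;
            scale /= 10;
        }
    }
    return 0;
}
```

With this sketch, the CSV value `1972-06-29 23:59:59.123456+05` yields 123 milliseconds, and the empty field in the seventh data row returns 1 instead of failing. A timezone suffix that follows the seconds directly, with no fraction, is not removed by the truncation and is simply ignored here.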

FlipperPA · 2025-03-24T14:24:11Z

@evanmiller Thanks for the feedback and suggestions. I'll be the first to admit my C skills are very, very rusty. If there's value in re-inserting the missing_ranges_count, I'm happy to add it back.

evanmiller · 2025-03-24T15:32:49Z

The macOS build is unhappy for some reason; not sure if related to your change or not

FlipperPA · 2025-03-24T17:30:39Z

> The macOS build is unhappy for some reason; not sure if related to your change or not

@evanmiller Any tips on how I can best figure these out? I've been compiling on Rocky 9 but don't have an easy way to test on macOS. Thanks again for your guidance and expertise.

evanmiller · 2025-03-24T18:47:24Z

Looks like a pre-existing error, so I'm not going to sweat it. Thanks for the contribution.

FlipperPA added 3 commits March 17, 2025 09:55

Fix patterns for DATE_TIMEs.

646d4de

Create is_date_time.

8edb171

Complete timestamp support for STATA with leap seconds accounted for.

2376f47
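
The last commit message mentions leap seconds. Stata's `%tC` encoding counts the leap seconds inserted into UTC, so a `%tC` value is the plain millisecond count plus 1,000 ms for each leap second inserted before the moment in question. The sketch below illustrates that adjustment only. It is not taken from the commit, the name `leap_seconds_before` is hypothetical, and the table lists the leap seconds announced through the end of 2016.

```c
#include <assert.h>
#include <stddef.h>

/* UTC dates (yyyymmdd) whose last minute ended with an inserted leap second. */
static const int LEAP_SECOND_DATES[] = {
    19720630, 19721231, 19731231, 19741231, 19751231, 19761231,
    19771231, 19781231, 19791231, 19810630, 19820630, 19830630,
    19850630, 19871231, 19891231, 19901231, 19920630, 19930630,
    19940630, 19951231, 19970630, 19981231, 20051231, 20081231,
    20120630, 20150630, 20161231
};

/* Hypothetical helper: number of leap seconds inserted before the
 * start of the given UTC date. */
static int leap_seconds_before(int y, int m, int d) {
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
    int key = y * 10000 + m * 100 + d;
    int count = 0;
    size_t n = sizeof LEAP_SECOND_DATES / sizeof LEAP_SECOND_DATES[0];
    for (size_t i = 0; i < n; i++) {
        if (LEAP_SECOND_DATES[i] < key)
            count++;
    }
    return count;
}
```

Under these assumptions, the test row dated 1972-06-29 falls just before the first leap second and needs no adjustment, while the 2025 row would be shifted by 27 seconds.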

FlipperPA changed the title ~~WIP: Fix patterns for DATE_TIMEs.~~ WIP: Support DATE_TIMEs in STATA Mar 18, 2025

FlipperPA changed the title ~~WIP: Support DATE_TIMEs in STATA~~ Add Support for DATE_TIMEs in STATA Mar 18, 2025

evanmiller reviewed Mar 23, 2025

Switch to snprintf for safer operations.

e5451bb

evanmiller merged commit 14f3937 into WizardMac:dev Mar 24, 2025
9 of 12 checks passed
