Commit 5df4cf2

add README
1 parent 3aed375 commit 5df4cf2

2 files changed

+58
-0
lines changed

scripts/README.md

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
1+
# Helper Scripts
2+
3+
## Dumping Database Fixtures
4+
5+
To dump JSON copies of database fixtures, first load a copy of a production database backup into your Beagle PostgreSQL database instance.
6+
7+
The included `dump_db_fixtures.py` script can be used to dump some types of fixtures. See `dump_db_fixtures.py -h` for the most up-to-date details.
8+
9+
### Dump Requests
10+
11+
A request can be dumped with `dump_db_fixtures.py request <request_id>`. This outputs one JSON file for the File entries associated with the request and one for the FileMetadata entries. Use the `--sanitize` flag to replace personal information in the fixtures (e.g. people's names and email addresses) with fake values. Currently, `--sanitize`
12+
13+
Example:
14+
15+
```
16+
$ ./dump_db_fixtures.py request 09603_I --sanitize
17+
18+
$ ls
19+
09603_I.file.json
20+
09603_I.filemetadata.json
21+
```
22+
23+
- NOTE: `dump_db_fixtures.py` should be run from your Beagle dev environment, since it requires Python 3 and Django. If you installed with the `Makefile` in the `beagle` root directory, you can run `make bash` from there to load your dev instance environment and then run the script.
24+
25+
## Sanitize Fixtures
26+
27+
Fixtures for use in the repo should be further sanitized to remove sample IDs and replace CMO IDs.
28+
29+
Some easy helper scripts have been included for this purpose.
30+
31+
Use the script `get_fields_to_sanitize.sh` to print a list of all recognized CMO IDs and sample IDs in the JSON you generated with `dump_db_fixtures.py`.
32+
33+
```
34+
$ ./get_fields_to_sanitize.sh 09603_I.filemetadata.json
35+
# a list of id's
36+
```
37+
38+
Use the list of IDs that is printed out to create a `patterns.tsv` file in this same directory. The `sanitize.sh` script will use this file later. The `patterns.tsv` file should contain two tab-delimited columns, the old pattern and its replacement, formatted like this:
39+
40+
```
41+
old_pattern1 new_pattern1
42+
old_pattern2 new_pattern2
43+
```
44+
45+
A quick and easy way to do this is to use both Excel and a plain-text editor (`nano`, Atom, Sublime, etc.). Copy the terminal output from `get_fields_to_sanitize.sh` into Excel, then fill in dummy identifiers such as `Sample1`, `Sample2`, etc. in the adjacent column. For CMO IDs, you can generate fake ones with the included `generate_cmo_id.py` script. Once you have two columns, simply highlight them in Excel (just the two columns, not the entire sheet) and paste them into `nano`/Atom/Sublime/etc. They should be entered as tab-delimited text, which you can save with the filename `patterns.tsv`.
46+
47+
- TODO: write a script to do this instead
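
One possible shape for such a script is sketched below. It is not part of this repo and rests on assumptions: that `get_fields_to_sanitize.sh` prints one ID per line, that a CMO ID looks like `C-` followed by six uppercase alphanumeric characters (inferred only from `example.patterns.tsv`), and that everything else is a sample ID. The `fake_cmo_id` helper is a hypothetical stand-in for `generate_cmo_id.py`, whose actual logic is not shown here.

```python
#!/usr/bin/env python3
"""Sketch: turn a list of IDs into patterns.tsv rows (old<TAB>new)."""
import re
import secrets
import sys

# Assumption: a CMO ID is "C-" plus six uppercase alphanumerics, as in
# example.patterns.tsv; any other ID is treated as a sample ID.
CMO_ID = re.compile(r"^C-[A-Z0-9]{6}$")


def fake_cmo_id():
    """Hypothetical stand-in for generate_cmo_id.py (real logic not shown)."""
    return "C-" + secrets.token_hex(3).upper()


def build_patterns(ids):
    """Map each unique ID to a replacement, preserving first-seen order."""
    patterns = {}
    sample_count = 0
    for old in ids:
        old = old.strip()
        if not old or old in patterns:
            continue  # skip blank lines and duplicates
        if CMO_ID.match(old):
            patterns[old] = fake_cmo_id()
        else:
            sample_count += 1
            patterns[old] = f"Sample{sample_count}"
    return patterns


if __name__ == "__main__":
    # Read IDs from stdin, one per line, and print tab-delimited pairs.
    for old, new in build_patterns(sys.stdin).items():
        print(f"{old}\t{new}")
```

If saved under a hypothetical name such as `make_patterns.py`, it could be used as `./get_fields_to_sanitize.sh 09603_I.filemetadata.json | python3 make_patterns.py > patterns.tsv`. Review the resulting file by hand before using it.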
48+
49+
Once `patterns.tsv` is present, you can run the `sanitize.sh` script on your JSON to replace all the old patterns with new ones.
50+
51+
52+
```
53+
$ ./sanitize.sh 09603_I.filemetadata.json
54+
```
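
The contents of `sanitize.sh` are not shown in this README, so the following Python sketch only illustrates the idea of applying `patterns.tsv` to a file; it is not the script's actual implementation. It assumes each line of `patterns.tsv` holds an old pattern and its replacement separated by a tab, and that patterns are literal strings rather than regular expressions. Replacing the longest patterns first is a choice made for this sketch.

```python
#!/usr/bin/env python3
"""Sketch: apply patterns.tsv replacements to a text file in place."""
import csv
import sys


def load_patterns(path):
    """Read (old, new) pairs from a two-column, tab-delimited file."""
    with open(path, newline="") as handle:
        rows = csv.reader(handle, delimiter="\t")
        return [(row[0], row[1]) for row in rows if len(row) >= 2]


def apply_patterns(text, patterns):
    """Replace every old pattern with its new value as a literal string."""
    # Longest patterns first, so a sample ID that contains a CMO ID is
    # rewritten as a whole before the shorter CMO ID is replaced.
    ordered = sorted(patterns, key=lambda pair: len(pair[0]), reverse=True)
    for old, new in ordered:
        text = text.replace(old, new)
    return text


if __name__ == "__main__":
    target = sys.argv[1]
    with open(target) as handle:
        original = handle.read()
    sanitized = apply_patterns(original, load_patterns("patterns.tsv"))
    with open(target, "w") as handle:
        handle.write(sanitized)
```

The ordering matters when one identifier is a prefix of another, as with a sample ID such as `C-ABCDEF-P001-d` and a CMO ID such as `C-ABCDEF`.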
55+
56+
For good measure, double-check the contents of the JSON to verify that it looks correct before committing it. The JSON should be moved to the appropriate `fixtures` directory.
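
Part of that check can be automated. The sketch below is an illustration, not a script included in this repo: it confirms that the file still parses as JSON and reports any old pattern from the first column of `patterns.tsv` that still appears in it. It does not replace reading through the file yourself.

```python
#!/usr/bin/env python3
"""Sketch: check a sanitized fixture before committing it."""
import json
import sys


def leftover_patterns(json_text, old_patterns):
    """Return the old patterns that still occur in the text.

    Raises json.JSONDecodeError (a ValueError) if the text is not valid JSON.
    """
    json.loads(json_text)  # parsed only to confirm the file is valid JSON
    return [old for old in old_patterns if old in json_text]


if __name__ == "__main__":
    fixture_path, patterns_path = sys.argv[1], sys.argv[2]
    with open(patterns_path) as handle:
        # The first tab-delimited column holds the old patterns.
        old_patterns = [
            line.rstrip("\n").split("\t")[0] for line in handle if line.strip()
        ]
    with open(fixture_path) as handle:
        leftovers = leftover_patterns(handle.read(), old_patterns)
    for old in leftovers:
        print(f"still present: {old}")
    sys.exit(1 if leftovers else 0)
```

Saved under a hypothetical name such as `check_fixture.py`, it would be run as `python3 check_fixture.py 09603_I.filemetadata.json patterns.tsv` and exits with a non-zero status if anything is left over.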

scripts/example.patterns.tsv

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
C-ABCDEF-P001-d Sample1
2+
C-FB3D87 C-47D8BA
