Commit 5470e76

PC-1435 Update docs for harvesting from folio
Merge branch 'docs' into 'main'
See merge request msu-libraries/catalog/catalog!977
2 parents 2325728 + ea8592f commit 5470e76

1 file changed: +83 -4 lines changed

docs/harvesting-and-importing.md

Lines changed: 83 additions & 4 deletions
@@ -49,13 +49,18 @@ each source.
 ### FOLIO
 
 <!-- markdownlint-disable MD031 -->
-1. Ensure that your OAI settings on the FOLIO tenant are what you want
+1. Notify your hosting provider prior to your harvest attempt (EBSCO
+in our case) so that they can prepare the environment to allocate
+appropriate additional resources and advise you on the window in
+which you should start the harvest.
+
+2. Ensure that your OAI settings on the FOLIO tenant are what you want
 them to be for this particular harvest. For example, if you wish to
 include storage and inventory records (i.e. the records without a
 MARC source) then you will need to modify the "Record Source" field
 in the OAI Settings in FOLIO.
 
-2. Next you will need to clear out the contents of the `harvest_folio`
+3. Next you will need to clear out the contents of the `harvest_folio`
 directory before the next cron job will run. Assuming you want to
 preserve the last harvest for the time being, you can simply move
 those directories somewhere else and rename them. Below is an example,
@@ -65,17 +70,91 @@ each source.
 (they technically can have files in them, you just will not want
 them to have files since they will get mixed in with your new harvest).
 ```bash
-cd /mnt/shared/oai/[STACK_NAME]/harvest_folio/
+STACK_NAME=catalog-preview
+cd /mnt/shared/oai/${STACK_NAME}/harvest_folio/
+
+# Option 1: Preserving the last harvest set
 mv processed processed_old
 mv log log_old
 mv last_state.txt last_state.txt.old
 mv harvest.log harvest.log.old
+
+# Option 2: Not preserving the last harvest set (if you have
+# other copies already such as in -beta or -prod and you are
+# doing the harvest in -preview)
+sudo find . -type f -delete
 ```
 
-3. Monitor progress after it starts via the cron job in the monitoring app
+4. Monitor progress after it starts via the cron job in the monitoring app
 or in the log file on the container or volume (`/mnt/logs/harvests/`).
 <!-- markdownlint-enable MD031 -->
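
The "Option 1" commands in the FOLIO steps use fixed `_old` / `.old` names, so a second run would collide with the first backup. A minimal sketch of one way around that, assuming the same four items (`processed`, `log`, `last_state.txt`, `harvest.log`) and GNU `date -I` as used elsewhere in this change. The function name `archive_last_harvest` and the dated suffix are hypothetical and not part of the commit:

```bash
#!/bin/bash
# Hypothetical helper (not part of the commit): set the previous harvest
# aside under dated names instead of fixed "_old" names.
archive_last_harvest() {
    local harvest_dir="$1"
    local suffix item
    suffix="old_$(date -I)"   # e.g. old_2024-01-31 (GNU date)

    for item in processed log last_state.txt harvest.log; do
        # Skip items that are missing, e.g. after a partial harvest.
        [ -e "${harvest_dir}/${item}" ] || continue
        # Refuse to overwrite a backup already made today.
        if [ -e "${harvest_dir}/${item}.${suffix}" ]; then
            echo "Refusing to overwrite ${item}.${suffix}" >&2
            return 1
        fi
        mv -- "${harvest_dir}/${item}" "${harvest_dir}/${item}.${suffix}" || return 1
    done
}
```

It could be called as `archive_last_harvest "/mnt/shared/oai/${STACK_NAME}/harvest_folio"` from a shell with write access to that directory.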
 
+#### Syncing the harvest files to other environments
+
+If you have done the harvest for one environment and you want to sync the
+newly harvested set over to the other environments you can follow these steps:
+
+<!-- markdownlint-disable MD013 MD031 -->
+1. Disable the FOLIO cron for the source and target environment
+```bash
+# Define the source and target
+SOURCE_STACK=catalog-preview
+TARGET_STACK=catalog-beta
+
+sudo mv /mnt/shared/oai/${SOURCE_STACK}/enabled /mnt/shared/oai/${SOURCE_STACK}/disabled
+sudo mv /mnt/shared/oai/${TARGET_STACK}/enabled /mnt/shared/oai/${TARGET_STACK}/disabled
+```
+
+2. Create a faster compression script (if not done already)
+```bash
+#!/bin/bash
+# Contents of /usr/local/bin/fastpigz
+nice -n 19 /usr/bin/pigz -p 4 --fast "$@"
+```
+
+3. Give the script execute permissions (if not done already)
+```bash
+chmod +x /usr/local/bin/fastpigz
+```
+
+4. Archive the target environment's `harvest_folio` directory
+```bash
+sudo screen
+cd /mnt/shared/oai/${TARGET_STACK}/harvest_folio
+tar -cv -I /usr/local/bin/fastpigz -f ../archives/${TARGET_STACK}-$(date -I).tar.gz .
+```
+
+5. Clear out the `harvest_folio` directory and sync over the source
+environment files
+```bash
+cd /mnt/shared/oai/${TARGET_STACK}/harvest_folio
+sudo find . -type f -delete
+# Run this once with -n for dry-run, then again without the -n to do it for real
+sudo rsync -aivn /mnt/shared/oai/${SOURCE_STACK}/harvest_folio/ /mnt/shared/oai/${TARGET_STACK}/harvest_folio/
+```
+
+6. Run the `pc-full-import` script on the target environment
+```bash
+sudo screen
+pc-full-import ${TARGET_STACK} --email [email protected] --yes --debug 2>&1 | tee /mnt/shared/logs/${TARGET_STACK}-import_$(date -I).log
+```
+
+7. Verify that you are happy with the counts in the new environment
+
+8. Repeat steps 1 through 7 with any other environment you want to sync to
+
+9. Once you are done syncing to all environments, you can re-enable the
+source environment's cron. There is no need to do this for the target
+environment(s) since the full import script will have done that.
+```bash
+sudo mv /mnt/shared/oai/${SOURCE_STACK}/disabled /mnt/shared/oai/${SOURCE_STACK}/enabled
+```
+
+10. You will eventually want to clean up the archives to save disk
+space, but keep them around for a while until you are confident
+in the data in the new set
+<!-- markdownlint-enable MD013 MD031 -->
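
Steps 1 and 9 of the sync procedure toggle the cron by renaming `enabled` to `disabled` and back. A minimal sketch of a pre-flight check that could run before the destructive steps (4 and 5), assuming those marker names are what the cron looks at. The function names and the `OAI_ROOT` override are hypothetical and not part of the commit:

```bash
#!/bin/bash
# Hypothetical pre-flight check (not part of the commit).
# OAI_ROOT is an assumed override; it defaults to the path used in the steps.

# Succeeds only if the stack has a "disabled" marker and no "enabled" one.
cron_is_disabled() {
    local stack="$1"
    local root="${OAI_ROOT:-/mnt/shared/oai}"
    [ -e "${root}/${stack}/disabled" ] && [ ! -e "${root}/${stack}/enabled" ]
}

# Checks every stack given as an argument; fails on the first one whose
# cron still looks enabled.
require_crons_disabled() {
    local stack
    for stack in "$@"; do
        if ! cron_is_disabled "${stack}"; then
            echo "FOLIO cron does not look disabled for ${stack}" >&2
            return 1
        fi
    done
}
```

For example, `require_crons_disabled "${SOURCE_STACK}" "${TARGET_STACK}" || exit 1` could be placed ahead of the archive and `rsync` commands.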
+
 ### HLM
 
 This can just be done with the script's `--full` `--harvest` flags in a
