
Probabilistic Data Synthesis

Kai edited this page Mar 10, 2023 · 9 revisions

PROBABILISTIC DATA SYNTHESIS

This step synthesizes the Word List and the Clause List and formats the result for Probabilistic Data Analysis (prob_analysis.py). The output, prob_analysis_raw.pkl, holds one entry per clause, each formatted as follows:

[
    line_number,
    {
         verse_1: v1_occurrences,
         verse_2: v2_occurrences,
         verse_3: v3_occurrences,
         ...
    }
], ...

The line number corresponds to the line number of the given clause. The structure following it is a dictionary keyed by every unique verse in which any word of the clause was found. The synthesis iterates through every word in a given clause, looks up that word in the Word List if it exists there, and combines the verse lists from the matching word objects. This process is shown in the flowchart below:

Data Synthesis Flowchart
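The synthesis described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. The shapes of `word_list` (a word mapped to its verse list) and `clause_list` (pairs of line number and words), the function name `synthesize`, and the sample data are all assumptions made for the example. It also assumes that each occurrence value is the number of times a verse appears across the combined verse lists.

```python
import pickle
from collections import Counter

# Hypothetical stand-ins: the real Word List and Clause List structures
# are not shown on this page.
word_list = {
    "alpha": ["verse_1", "verse_2", "verse_3"],
    "beta": ["verse_2", "verse_3"],
}
clause_list = [
    (12, ["alpha", "and", "beta"]),  # "and" is not in the Word List
    (13, ["gamma"]),                 # no word of this clause is in the Word List
]


def synthesize(clause_list, word_list):
    """Build one [line_number, {verse: occurrences}] entry per clause."""
    records = []
    for line_number, words in clause_list:
        verse_counts = Counter()
        for word in words:
            # Words missing from the Word List contribute nothing.
            verse_counts.update(word_list.get(word, []))
        records.append([line_number, dict(verse_counts)])
    return records


records = synthesize(clause_list, word_list)

# The entries are plain lists and dicts, so they survive a pickle round trip.
# Writing them to prob_analysis_raw.pkl would use pickle.dump with an open file.
restored = pickle.loads(pickle.dumps(records))
```

With this sample data, the clause on line 12 yields counts of 1, 2 and 2 for `verse_1`, `verse_2` and `verse_3`, and the clause on line 13 yields an empty dictionary.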
