Data Visualization
The functions provided in data_vis.py graph the data produced by the probability synthesis. That data is stored in a file called "prob_analysis_raw.pkl", which contains a list of gresult objects.
These functions rely on the NumPy and Matplotlib libraries to manipulate and graph data.
The data visualization currently relies entirely on New Testament quotations.
Support for recognizing quotations from the Septuagint is not yet integrated.
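As a minimal sketch of how the input file might be read, the snippet below loads a pickled list from disk. The helper name load_results is hypothetical and not part of data_vis.py. Unpickling real gresult objects also requires that the module defining the gresult class be importable at load time.

```python
import pickle


def load_results(path="prob_analysis_raw.pkl"):
    """Load the pickled list of results from `path`.

    Hypothetical helper for illustration only. Only unpickle files you
    trust, since pickle can execute arbitrary code while loading.
    """
    with open(path, "rb") as f:
        return pickle.load(f)
```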
gen_graph determines which verses are most similar to a given clause. It does so by examining the clause's verse data, which includes all matched verses (referenced as strings) and the number of words in each verse that match the clause. Two extra parameters manage the output:
- num_bars is used to determine how many bars the user wants in the graph (e.g. a value of 5 would show the top 5 most similar verses to the clause).
- match_threshold is used to determine the minimum number of word matches between a verse and the clause for the verse to be included in the graph.
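The selection described above can be sketched as follows. This is an illustration under stated assumptions, not the code in data_vis.py: the verse data is assumed to be a list of (verse reference, match count) pairs, and the names top_matches and plot_top_matches are hypothetical.

```python
def top_matches(verse_data, num_bars=5, match_threshold=1):
    """Return up to `num_bars` (verse, count) pairs, best match first.

    `verse_data` is assumed to be a list of (verse_ref, match_count)
    tuples; the real gresult layout may differ.
    """
    # Keep only verses that meet the minimum number of word matches.
    kept = [(ref, n) for ref, n in verse_data if n >= match_threshold]
    # Highest match counts first, then truncate to the requested bars.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:num_bars]


def plot_top_matches(clause_label, verse_data, num_bars=5, match_threshold=1):
    """Draw a bar graph of the selected verses (hypothetical helper)."""
    # Imported here so the selection logic works without Matplotlib.
    import matplotlib.pyplot as plt

    bars = top_matches(verse_data, num_bars, match_threshold)
    fig, ax = plt.subplots()
    ax.bar([ref for ref, _ in bars], [n for _, n in bars])
    ax.set_xlabel("Verse")
    ax.set_ylabel("Matching words")
    ax.set_title(clause_label)
    return fig
```

For example, with num_bars=2 and match_threshold=3, a verse with only 2 matching words is dropped before the top two remaining verses are chosen.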
This function calls gen_graph for every clause in the output data from "prob_analysis_raw.pkl". The following parameters manage the output:
- max_num is the number of plots to generate before quitting (e.g. 100 will generate a bar graph for the first 100 clauses).
- num_bars is the num_bars parameter for each generated graph.
- min_thresh is the match_threshold value for each generated graph.
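A rough sketch of this batch behavior is shown below. It assumes each clause is available as a (clause text, verse data) pair, where verse data is a list of (verse reference, match count) tuples. Both that layout and the name graph_inputs are assumptions made for illustration. The generator yields the bar data for each clause rather than drawing it, so any plotting routine can consume the result.

```python
from itertools import islice


def graph_inputs(clauses, max_num=100, num_bars=5, min_thresh=1):
    """Yield (clause_text, bars) for at most the first `max_num` clauses.

    `clauses` is assumed to be an iterable of (clause_text, verse_data)
    pairs; this layout is hypothetical.
    """
    # islice stops after max_num clauses, mirroring "quit after N plots".
    for clause_text, verse_data in islice(clauses, max_num):
        # Same filtering as a single graph: threshold, sort, truncate.
        kept = [(ref, n) for ref, n in verse_data if n >= min_thresh]
        kept.sort(key=lambda pair: pair[1], reverse=True)
        yield clause_text, kept[:num_bars]
```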
This function determines which verses appear to be referenced most often within the text, using the data from "prob_analysis_raw.pkl". It outputs a stacked bar graph in which the x-axis shows the most potentially referenced verses and the y-axis shows the number of clauses that potentially reference each verse. Each bar is built from differently colored segments, and the legend indicates how many word matches between the clause and the verse each color represents. The following parameters manage the output:
- num_bars is the number of bars to be displayed in the graph. These will always be the verses with the most potential references (e.g. 20 would display information for the 20 most potentially referenced verses).
- match_threshold is the minimum number of word matches required for a verse and clause pair to be counted in the graph.
- bible_ordered is a boolean that selects whether the graph's bars are ordered by their location within the Bible rather than by the number of potential matches (e.g. with Bible ordering, Matthew 28:20 would appear before Mark 1:1 if both were among the num_bars most likely referenced verses, even if Matthew 28:20 had 20 potential references and Mark 1:1 had 100).
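The aggregation behind such a stacked bar graph can be sketched as follows. This is an illustration under assumptions, not the implementation in data_vis.py: each clause is assumed to be a list of (verse reference, match count) pairs, BOOK_ORDER is a deliberately truncated stand-in for a full list of books, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Truncated, hypothetical book order; a real list would cover every book.
BOOK_ORDER = ["Matthew", "Mark", "Luke", "John"]


def bible_key(ref):
    """Sort key for references such as 'Matthew 28:20'."""
    book, chapter_verse = ref.rsplit(" ", 1)
    chapter, verse = chapter_verse.split(":")
    # Raises ValueError for books missing from BOOK_ORDER.
    return (BOOK_ORDER.index(book), int(chapter), int(verse))


def stacked_counts(clauses, num_bars=20, match_threshold=1, bible_ordered=False):
    """Return [(verse_ref, {match_count: num_clauses}), ...] for the top verses."""
    per_verse = defaultdict(Counter)
    for verse_data in clauses:
        for ref, n in verse_data:
            if n >= match_threshold:
                per_verse[ref][n] += 1  # one more clause with n matching words
    # Pick the verses with the most potential references overall.
    ranked = sorted(per_verse, key=lambda r: sum(per_verse[r].values()),
                    reverse=True)[:num_bars]
    if bible_ordered:
        ranked.sort(key=bible_key)  # reorder the selected verses canonically
    return [(ref, dict(per_verse[ref])) for ref in ranked]


def plot_stacked(bars):
    """Draw one colored segment per match count (hypothetical helper)."""
    import matplotlib.pyplot as plt  # lazy import; counting needs no plotting

    refs = [ref for ref, _ in bars]
    levels = sorted({n for _, segments in bars for n in segments})
    fig, ax = plt.subplots()
    bottoms = [0] * len(refs)
    for n in levels:
        heights = [segments.get(n, 0) for _, segments in bars]
        ax.bar(refs, heights, bottom=bottoms, label=f"{n} matching words")
        bottoms = [b + h for b, h in zip(bottoms, heights)]
    ax.set_xlabel("Verse")
    ax.set_ylabel("Clauses potentially referencing the verse")
    ax.legend()
    return fig
```

Note that in this sketch the top num_bars verses are selected by total count first, and bible_ordered only changes the order in which those selected bars are drawn.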