Calling `get_member_govinfo()` returns a `dict` containing the GovInfo data corresponding to the given Bioguide ID.

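For illustration, a minimal sketch of how this call might look. The import name used below is a guess, and the function may require a GovInfo API key to be configured first; check the setup instructions earlier in the README for the real details:

```python
# Sketch only: the import path is an assumption, and a GovInfo API key may
# need to be configured before this call will succeed (not shown here).
import quinque as v

govinfo_data = v.get_member_govinfo("K000105")  # "K000105" is a sample Bioguide ID
assert isinstance(govinfo_data, dict)
print(sorted(govinfo_data.keys()))
```
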
#### `.number` <a name="congress_number"></a>

#### `.govinfo` <a name="congress_govinfo"></a>

The `govinfo` property returns GovInfo data as a `GovInfoCongressRecord`.

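A minimal sketch of reading this property follows. The constructor call is hypothetical: a `Congress` class is implied by this section's headings, but its exact name, signature, and how a GovInfo API key is supplied should be taken from the README's earlier examples rather than from this snippet:

```python
# Hypothetical setup: the import path and the Congress(...) constructor call
# are assumptions for illustration only.
import quinque as v

c = v.Congress(116)           # a Congress object, as used in the README's examples
record = c.govinfo            # GovInfo data for the 116th Congress
print(type(record).__name__)  # expected: "GovInfoCongressRecord"
```
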
#### `.members` <a name="congress_members"></a>

1. GovInfo data only goes as far back as the 105<sup>th</sup> Congress

The GovInfo API makes data on members of Congress available via "Congressional Directories", which are only provided starting with the 105<sup>th</sup> Congress. If data for an earlier congress is needed, use Bioguide data instead.

1. Downloading bills is very slow

The GovInfo API is geared towards bulk data and does not handle low-granularity queries efficiently. To download the bills for a single congress, V may have to send requests to as many as twenty thousand different endpoints, taking as long as an hour to download the full dataset. To understand why, a deeper explanation of the GovInfo API is needed.

First, GovInfo datasets are organized into collections, which contain packages. A package is a snapshotted version of a given dataset. For example, in the Congressional Directory collection (denoted as CDIR), each package represents a unique version of a directory. Each time a new directory is created or an existing one is updated, it is made available in the CDIR collection under a new modified date. To get the most recent Congressional Directory for a given congress, you would need to look for the package with the most recent modified date.

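To make the collection/package model concrete, here is a small, self-contained sketch of that selection step. The metadata below is fabricated for illustration, and the field names (`packageId`, `congress`, `lastModified`) are assumptions modeled on the listings the collections service returns:

```python
from datetime import datetime

# Fabricated CDIR package metadata for illustration; real listings come from
# the GovInfo collections service.
packages = [
    {"packageId": "CDIR-2018-10-29", "congress": "115", "lastModified": "2019-01-04T16:01:36Z"},
    {"packageId": "CDIR-2017-11-13", "congress": "115", "lastModified": "2018-02-20T14:10:02Z"},
    {"packageId": "CDIR-2016-12-01", "congress": "114", "lastModified": "2017-03-15T09:30:00Z"},
]

def latest_directory(packages, congress):
    """Return the most recently modified Congressional Directory package
    for the given congress number."""
    candidates = [p for p in packages if p["congress"] == str(congress)]
    return max(candidates,
               key=lambda p: datetime.strptime(p["lastModified"], "%Y-%m-%dT%H:%M:%SZ"))

print(latest_directory(packages, 115)["packageId"])  # CDIR-2018-10-29
```
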
Bills are a unique collection, queryable by four parameters: a start date, an end date, the congress number, and the class of document you're looking for (in the case of bills, this could be Senate Bills, House Joint Resolutions, Senate Concurrent Resolutions, etc.). Filtering by any combination of the latter two (congress number and document class) can still result in thousands of records. For example, the 115<sup>th</sup> Congress had 10,740 House bills.

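As a rough sketch of what such a query looks like, the request below asks the BILLS collection for House bills from a single congress within a modified-date window. The endpoint path and the parameter names (`congress`, `docClass`, `pageSize`, `api_key`) are assumptions based on the public GovInfo collections service and should be verified against its current documentation:

```python
import requests  # third-party: pip install requests

# Assumed shape of the GovInfo collections endpoint; the start/end path
# segments filter by the packages' last-modified dates, not by issue dates.
COLLECTIONS_URL = "https://api.govinfo.gov/collections/BILLS/{start}/{end}"

def count_house_bills(api_key, congress, start, end):
    """Request a single record and read the total count reported for the query."""
    url = COLLECTIONS_URL.format(start=start, end=end)
    params = {
        "congress": congress,  # e.g. 115
        "docClass": "hr",      # assumed code for House bills
        "pageSize": 1,
        "api_key": api_key,
    }
    response = requests.get(url, params=params)
    response.raise_for_status()
    return response.json().get("count")

# Example call (requires a real api.data.gov key):
# count_house_bills("YOUR_KEY", 115, "2017-01-03T00:00:00Z", "2019-01-03T00:00:00Z")
```
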
The maximum dataset size that can be downloaded from a single endpoint is 10,000 records, so to download all of the House bills for the 115<sup>th</sup> Congress, the start date and end date parameters have to be used to limit the size of the dataset. However, these date parameters do not use the date the bills were issued, as one might expect. Instead, they use the last modified date of the packages. This is made even more difficult by the fact that a bill package can be modified outside of the term during which it was issued, so incrementally searching the dates between the beginning and end of the congress you're querying does not work.

If that didn't make matters difficult enough, a large number of records have modified dates occurring on the same day, meaning that once you've found the right date to query, you'd have to segment your time window even further to accommodate the 10,000-record limit.

To work around these limitations, V begins searching for bills by doing an "open query" for one record and checking the header information for the total number of expected records. Using that total, V then works its way backwards over each year until it finds records. If the number of records encountered is larger than the record limit, V begins searching the months of the year to segment the data further. If the dataset for a month is larger than the record limit, V searches the days. It repeats this pattern until it finds a unit of time small enough to segment the data to a size below the record limit, all the way down to seconds. V continues this recursive search until it has downloaded the total expected records. V might end up sending hundreds of requests before even being able to download data, and if 10,000 records were ever modified in a single second, V would break, as seconds are the maximum depth to which V searches. Obviously, this is not the ideal approach, but it's an approach that works (mostly).

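The narrowing strategy described above can be sketched independently of any HTTP details. In the sketch below, `count_records` is a caller-supplied stand-in for a request that reports how many packages were modified in a window (not a real client), and the generator yields windows small enough to download in full:

```python
from datetime import datetime, timedelta

RECORD_LIMIT = 10_000  # maximum records a single endpoint will return

# Rough granularities for the year -> month -> day -> ... -> second descent.
STEPS = [timedelta(days=365), timedelta(days=31), timedelta(days=1),
         timedelta(hours=1), timedelta(minutes=1), timedelta(seconds=1)]

def downloadable_windows(count_records, start, end, depth=0):
    """Yield (start, end, count) windows whose record counts fit under
    RECORD_LIMIT, splitting any oversized window into the next finer unit."""
    step = STEPS[depth]
    current = start
    while current < end:
        window_end = min(current + step, end)
        total = count_records(current, window_end)
        if total == 0:
            pass                              # nothing modified here; skip it
        elif total <= RECORD_LIMIT:
            yield current, window_end, total  # small enough to download whole
        elif depth + 1 < len(STEPS):
            # Too many records for one request: descend to finer windows.
            yield from downloadable_windows(count_records, current, window_end, depth + 1)
        else:
            # More than RECORD_LIMIT records modified within one second cannot
            # be segmented any further by this strategy.
            raise RuntimeError("cannot segment below one second")
        current = window_end

# Toy demonstration with a fake counter (no network access):
def fake_count(window_start, window_end):
    # Pretend a year-sized window is oversized while month-sized ones are not.
    return 25_000 if (window_end - window_start) > timedelta(days=31) else 4_000

for s, e, n in downloadable_windows(fake_count, datetime(2017, 1, 1), datetime(2018, 1, 1)):
    print(s, e, n)
```
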
It may seem abhorrent that, in this era of "big data" and numerous tools capable of acting on hundreds of millions of records in a few seconds, a dataset in the tens of thousands could take so long to download. However, this approach is a necessary evil until the design of the GovInfo API is improved.