Replies: 1 comment
Thanks for the question! You're right that there is currently no built-in way to pass an "exclude set" of accession numbers.

Current Behavior

from edgar import get_filings
filings = get_filings(2024, 1)
# Include only these accession numbers
filtered = filings.filter(accession_number=["0001193125-24-001234", "0001193125-24-005678"])

This keeps only the specified accession numbers.

Workaround: Manual Exclusion

Since the filings are backed by a PyArrow table (filings.data), you can apply the inverse filter yourself:

import pyarrow.compute as pc
import pyarrow as pa
from edgar import get_filings, Filings
# Get your filings
filings = get_filings(2024, 1)
# Your exclude list from metadata file
exclude_accessions = ["0001193125-24-001234", "0001193125-24-005678", ...]
# Create inverse filter (exclude these accession numbers).
# is_in expects an Arrow array as its value set; pc.invert negates the boolean mask.
filtered_table = filings.data.filter(
    pc.invert(pc.is_in(filings.data['accession_number'], value_set=pa.array(exclude_accessions)))
)
# Wrap back into Filings object
filtered_filings = Filings(filtered_table)

The filter keeps only the filings whose accession numbers are not in the exclude list.

Alternative: Load From Metadata Cache

If you're maintaining a metadata file of downloaded filings, you could also:

import pandas as pd
import pyarrow as pa
from edgar import Filings, get_filings
# Load your cached metadata
cached_metadata = pd.read_parquet("my_cached_filings.parquet")
# Get new filings
new_filings = get_filings(2024, 1)
# Convert to pandas and exclude cached accession numbers
new_df = new_filings.to_pandas()
cached_accessions = set(cached_metadata['accession_number'])
filtered_df = new_df[~new_df['accession_number'].isin(cached_accessions)]
# Convert back to Filings (drop the filtered pandas index so it is not stored as an extra column)
filtered_table = pa.Table.from_pandas(filtered_df, preserve_index=False)
filtered_filings = Filings(filtered_table)

Feature Request?

This could be a useful enhancement! We could add an exclusion option to filter(), for example:

# Hypothetical API
filings.filter(accession_number=exclude_list, exclude=True)
# or
filings.filter(exclude_accession_numbers=exclude_list)

Let me know if the workaround helps or if you'd like to open a feature request for native exclusion support!
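
One more option that stays within the existing inclusion filter: compute the list of accession numbers you still need in plain Python, then pass that list to filter(accession_number=...). The sketch below is only an illustration. The helper name accessions_to_fetch is hypothetical (it is not part of edgartools), and the third sample accession number is made up for the example.

```python
from typing import Iterable, List


def accessions_to_fetch(candidates: Iterable[str], already_cached: Iterable[str]) -> List[str]:
    """Return the candidates that are not in already_cached.

    Order of first appearance is preserved and duplicates are dropped.
    Hypothetical helper for illustration; not an edgartools function.
    """
    cached = set(already_cached)  # set membership checks are O(1)
    seen = set()
    keep = []
    for accession in candidates:
        if accession in cached or accession in seen:
            continue
        seen.add(accession)
        keep.append(accession)
    return keep


# Sample data for illustration only (the last accession number is invented)
candidates = ["0001193125-24-001234", "0001193125-24-005678", "0001193125-24-009999"]
cached = ["0001193125-24-001234"]

to_fetch = accessions_to_fetch(candidates, cached)
print(to_fetch)
```

In a real pipeline, candidates would presumably come from something like filings.to_pandas()['accession_number'] and cached from your metadata file, and the result could then be passed to filings.filter(accession_number=to_fetch).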
I’m working on a pipeline that reads a previously downloaded metadata file and decides which filings to get, so that it avoids downloading the metadata or the actual filings again. I notice the get_filings method has a way to filter for a set of accession numbers, but is there any way to pass in an “exclude set” of accession numbers?