Description
Introduction
To make filings from companies with the same CIK group together correctly, the name format was changed so that every name is uppercase with no punctuation. This is a weak implementation: a company could change its name slightly, and the normalization code would need to be reworked again. A better approach is needed.
Proposed approach
For EDGAR, the durable identifier is the Central Index Key (CIK). It should be used instead of the name, because the name can change even for public companies. The present code for temporarily tracking a company is:
# If we've seen this company before then add the form, otherwise include both firmographics and the initial form definition
if tmp_companies.get(company_name) == None:
    tmp_companies[company_name] = company_info
    tmp_companies[company_name]['forms'] = {accession_key: form}
else:
    tmp_companies[company_name]['forms'][accession_key] = form
The proposed change could look something like this:
# If we've seen this company before then add the form, otherwise include both firmographics and the initial form definition
if tmp_companies.get(cik_no) is None:
    tmp_companies[cik_no] = company_info
    tmp_companies[cik_no]['forms'] = {accession_key: form}
else:
    tmp_companies[cik_no]['forms'][accession_key] = form
Since company_info is a dict() that also keeps the companyName attribute, the name is still tracked there. Because the modules that consume these data require a dict() keyed on companyName, a function to rekey on companyName is needed. This function would loop over all cik_no keys, replace each with the corresponding companyName, and return a new dict(). The exact details of this change are left to the time of implementation.
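A minimal sketch of what such a rekey function might look like is below. The function name rekey_by_company_name and the collision policy (merging forms when two CIKs carry the same companyName) are assumptions for illustration, not decisions made in this issue; the real implementation may choose to raise or log instead.

```python
def rekey_by_company_name(companies_by_cik):
    """Return a new dict keyed on companyName instead of cik_no.

    Hypothetical helper: assumes every company_info has a 'companyName'
    string and a 'forms' dict, as in the snippets above.
    """
    companies_by_name = {}
    for cik_no, company_info in companies_by_cik.items():
        name = company_info['companyName']
        if name in companies_by_name:
            # Assumed policy: if two CIKs share a name, merge their forms.
            companies_by_name[name]['forms'].update(company_info['forms'])
        else:
            # Copy the entry and its forms dict so the input is not mutated.
            entry = dict(company_info)
            entry['forms'] = dict(company_info['forms'])
            companies_by_name[name] = entry
    return companies_by_name
```

It would be called once, after all filings have been collected under their CIK, for example companies = rekey_by_company_name(tmp_companies). Returning a new dict leaves the CIK-keyed data available if a later step needs it.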