I hit this error running against https://github.com/simonw/fara-history:
(git-history) git-history % git-history file fara.db ../fara-history/FARA_All_Registrants.csv --repo ../fara-history --id Registration_Number --changed --branch master --csv
[------------------------------------] 1/376 0%Traceback (most recent call last):
File "/Users/simon/.local/share/virtualenvs/git-history-nXMauUZE/bin/git-history", line 33, in <module>
sys.exit(load_entry_point('git-history', 'console_scripts', 'git-history')())
File "/Users/simon/.local/share/virtualenvs/git-history-nXMauUZE/lib/python3.10/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/Users/simon/.local/share/virtualenvs/git-history-nXMauUZE/lib/python3.10/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/Users/simon/.local/share/virtualenvs/git-history-nXMauUZE/lib/python3.10/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/simon/.local/share/virtualenvs/git-history-nXMauUZE/lib/python3.10/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/simon/.local/share/virtualenvs/git-history-nXMauUZE/lib/python3.10/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/Users/simon/Dropbox/Development/git-history/git_history/cli.py", line 246, in file
item = fix_reserved_columns(item)
File "/Users/simon/Dropbox/Development/git-history/git_history/utils.py", line 8, in fix_reserved_columns
if not any(reserved_with_suffix_re.match(key) for key in item):
File "/Users/simon/Dropbox/Development/git-history/git_history/utils.py", line 8, in <genexpr>
if not any(reserved_with_suffix_re.match(key) for key in item):
TypeError: expected string or bytes-like object
After much debugging, it turns out the problem is running the CSV parser against this specific revision of the file: https://github.com/simonw/fara-history/blob/ab27087f642680697db6c914d094bf3d06b363f3/FARA_All_Registrants.csv
Here's what's happening:
>>> import csv, httpx, io
>>> content = httpx.get("https://raw.githubusercontent.com/simonw/fara-history/ab27087f642680697db6c914d094bf3d06b363f3/FARA_All_Registrants.csv").content
>>> decoded = content.decode("utf-8")
>>> dialect = csv.Sniffer().sniff(decoded[:512])
>>> (dialect.delimiter, dialect.doublequote, dialect.escapechar, dialect.lineterminator, dialect.quotechar, dialect.quoting, dialect.skipinitialspace)
(',', False, None, '\r\n', '"', 0, False)
>>> reader = csv.DictReader(io.StringIO(decoded), dialect=dialect)
>>> items = list(reader)
>>> [it for it in items if it["Registration_Number"] == '4797']
[{'Registration_Number': '4797',
'Registration_Date': '04/20/1993',
'Termination_Date': '05/06/1993',
'Name': 'National Petroleum Company, "Sudan""',
'Business_Name': ' Ltd."',
'Address_1': '',
'Address_2': '525 South Lancaster Street',
'City': '',
'State': 'Arlington',
'Zip': 'VA',
None: ['22204']}]
What is going on with that last item, None: ['22204']?
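A possible explanation, as a minimal sketch: `csv.DictReader` stores any fields beyond the header count under its `restkey`, which defaults to `None`, so a row that parses into 11 fields against 10 headers ends up with a `None` key. The sniffed dialect above has `doublequote=False`, which would make the parser treat the `""` inside the quoted name as a closing quote plus literal characters, splitting the name at its inner comma. The `SAMPLE` row below is an assumption reconstructed from the parsed output above, not copied from the file, and `parse` is a hypothetical helper.

```python
import csv
import io

# ASSUMPTION: this row is reconstructed from the parsed output above,
# not copied from the actual FARA_All_Registrants.csv revision.
SAMPLE = (
    "Registration_Number,Registration_Date,Termination_Date,Name,Business_Name,"
    "Address_1,Address_2,City,State,Zip\r\n"
    '4797,04/20/1993,05/06/1993,"National Petroleum Company, ""Sudan"", Ltd.",,'
    "525 South Lancaster Street,,Arlington,VA,22204\r\n"
)


def parse(text, doublequote):
    """Hypothetical helper: parse CSV text with an explicit doublequote setting."""
    return list(csv.DictReader(io.StringIO(text), doublequote=doublequote))


# doublequote=False mirrors the dialect the Sniffer reported above.
broken = parse(SAMPLE, doublequote=False)[0]
# doublequote=True is the csv module's default "excel" behaviour.
fixed = parse(SAMPLE, doublequote=True)[0]

print(broken["Name"])   # the name is cut short at the inner comma
print(broken[None])     # the extra 11th field lands under the None key
print(fixed["Name"])    # the doubled quotes collapse to single quotes
print(None in fixed)    # no leftover fields when the row parses into 10 columns
```

If this is what is happening, the `None` key would also account for the `TypeError` in the traceback, since `re.Pattern.match` does not accept `None`. Forcing `doublequote=True`, or not trusting the sniffed value for that attribute, looks like one way to avoid it.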