Add a `--dialect` option for forcing a CSV dialect · Issue #27 · simonw/git-history · GitHub
Add a --dialect option for forcing a CSV dialect #27
Closed
@simonw

Running against https://github.com/simonw/fara-history raises a TypeError:

(git-history) git-history % git-history file fara.db ../fara-history/FARA_All_Registrants.csv --repo ../fara-history --id Registration_Number --changed --branch master --csv
  [------------------------------------]  1/376    0%
Traceback (most recent call last):
  File "/Users/simon/.local/share/virtualenvs/git-history-nXMauUZE/bin/git-history", line 33, in <module>
    sys.exit(load_entry_point('git-history', 'console_scripts', 'git-history')())
  File "/Users/simon/.local/share/virtualenvs/git-history-nXMauUZE/lib/python3.10/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/Users/simon/.local/share/virtualenvs/git-history-nXMauUZE/lib/python3.10/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/Users/simon/.local/share/virtualenvs/git-history-nXMauUZE/lib/python3.10/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/simon/.local/share/virtualenvs/git-history-nXMauUZE/lib/python3.10/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/simon/.local/share/virtualenvs/git-history-nXMauUZE/lib/python3.10/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/Users/simon/Dropbox/Development/git-history/git_history/cli.py", line 246, in file
    item = fix_reserved_columns(item)
  File "/Users/simon/Dropbox/Development/git-history/git_history/utils.py", line 8, in fix_reserved_columns
    if not any(reserved_with_suffix_re.match(key) for key in item):
  File "/Users/simon/Dropbox/Development/git-history/git_history/utils.py", line 8, in <genexpr>
    if not any(reserved_with_suffix_re.match(key) for key in item):
TypeError: expected string or bytes-like object
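
The last frame calls `reserved_with_suffix_re.match(key)` for every key in `item`, and `re` raises this `TypeError` whenever the argument is not a `str` or bytes-like object. A minimal sketch of that failure mode, using a hypothetical pattern and a hypothetical row (the real regex in `git_history/utils.py` is not shown in the traceback):

```python
import re

# Hypothetical stand-in for reserved_with_suffix_re; the real pattern is an assumption.
pattern = re.compile(r"^_(id|item|version|commit)_*$")

# Hypothetical row in which one key is not a string.
item = {"Name": "Example Ltd.", None: ["extra"]}

try:
    # Same shape as the failing line: match every key of the row.
    any(pattern.match(key) for key in item)
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__, error)
```

So the traceback suggests that at least one key in the parsed row is not a string.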

After much debugging, it turns out the problem is running the CSV parser against this specific revision of the file: https://github.com/simonw/fara-history/blob/ab27087f642680697db6c914d094bf3d06b363f3/FARA_All_Registrants.csv

Here's what's happening:

>>> import csv, httpx, io
>>> content = httpx.get("https://raw.githubusercontent.com/simonw/fara-history/ab27087f642680697db6c914d094bf3d06b363f3/FARA_All_Registrants.csv").content
>>> decoded = content.decode("utf-8")
>>> dialect = csv.Sniffer().sniff(decoded[:512])
>>> (dialect.delimiter, dialect.doublequote, dialect.escapechar, dialect.lineterminator, dialect.quotechar, dialect.quoting, dialect.skipinitialspace)
(',', False, None, '\r\n', '"', 0, False)
>>> reader = csv.DictReader(io.StringIO(decoded), dialect=dialect)
>>> items = list(reader)
>>> [it for it in items if it["Registration_Number"] == '4797']
[{'Registration_Number': '4797',
  'Registration_Date': '04/20/1993',
  'Termination_Date': '05/06/1993',
  'Name': 'National Petroleum Company, "Sudan""',
  'Business_Name': ' Ltd."',
  'Address_1': '',
  'Address_2': '525 South Lancaster Street',
  'City': '',
  'State': 'Arlington',
  'Zip': 'VA',
  None: ['22204']}]

What is going on with that last item of None: ['22204']?
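
One likely reading, sketched here with made-up data rather than the FARA file: the sniffed dialect has `doublequote=False`, so a `""` inside a quoted field is no longer treated as an escaped quote. The first `"` ends the quoted section, the comma that follows splits the field in two, and the row ends up with one more value than there are headers. `csv.DictReader` collects surplus values in a list under its `restkey`, which defaults to `None`. The column names and the row below are hypothetical.

```python
import csv
import io

# Hypothetical sample: a quoted field containing doubled quotes and a comma.
sample = (
    "id,name,city\r\n"
    '1,"Acme, ""North"", Ltd.",Springfield\r\n'
)

# Parse as the sniffed dialect would: doubled quotes are not an escape.
sniffed_like = list(csv.DictReader(io.StringIO(sample), doublequote=False))

# Parse with a forced, known dialect instead of the sniffed one.
forced = list(csv.DictReader(io.StringIO(sample), dialect=csv.get_dialect("excel")))

print(sniffed_like[0])  # the surplus value lands under the None key
print(forced[0])        # three string keys, no None key
```

With the `excel` dialect forced, the quoted name stays in one field and no `None` key appears, which is the motivation for letting the user override the sniffer with a `--dialect` option.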

Labels: bug
