Text file encoding problem on Windows · Issue #34 · Skyscanner/whispers · GitHub
This repository was archived by the owner on Oct 25, 2023. It is now read-only.
Text file encoding problem on Windows #34
Closed
@rtt-ncc

Description

This produces an error message similar to the one in #33, but it is not the same problem.

Running whispers v1.3.7 on Windows, default config.

Encountering errors due to non-ASCII characters in the codebase, such as:

Traceback (most recent call last):
  File "C:\Python36_64\Lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Python36_64\Lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Envs\whispers\Scripts\whispers.exe\__main__.py", line 7, in <module>
  File "c:\envs\whispers\lib\site-packages\whispers\cli.py", line 50, in cli
    for secret in run(args):
  File "c:\envs\whispers\lib\site-packages\whispers\core.py", line 90, in run
    for secret in whispers.scan(filename):
  File "c:\envs\whispers\lib\site-packages\whispers\secrets.py", line 92, in scan
    for ret in plugin.pairs():
  File "c:\envs\whispers\lib\site-packages\whispers\plugins\__init__.py", line 83, in pairs
    yield from self.plugin.pairs(self.filepath)
  File "c:\envs\whispers\lib\site-packages\whispers\plugins\javascript.py", line 8, in pairs
    for line in filepath.open("r").readlines():
  File "c:\envs\whispers\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 87: character maps to <undefined>

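This happens because filepath.open("r") is called without an explicit encoding, so Python falls back to the system default, which on this Windows machine is cp1252. That code page leaves a few byte values undefined (e.g. 0x8d), so any file containing one of them fails to decode.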
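To illustrate the failure in isolation, here is a minimal sketch that does not touch whispers at all. It assumes the offending file was UTF-8 encoded, which the report suggests but does not confirm; the character "č" and the helper name try_decode are chosen for illustration only, because the UTF-8 encoding of "č" happens to contain the byte 0x8d.

```python
# "č" (U+010D) is encoded in UTF-8 as the two bytes C4 8D.
raw = "č".encode("utf-8")


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or the exception class name if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return type(exc).__name__


# Decoding with the encoding the bytes were written in works.
utf8_result = try_decode(raw, "utf-8")

# cp1252 has no character assigned to 0x8d, so decoding fails.
cp1252_result = try_decode(raw, "cp1252")

print(raw, utf8_result, cp1252_result)
```

The same bytes decode cleanly as UTF-8 and raise under cp1252, which matches the 'charmap' error in the traceback above.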

I was able to fix this by changing line 8 of javascript.py to:

for line in filepath.open("r", encoding="utf8").readlines():

I also made the same change to the corresponding calls in json.py and plaintext.py.

Probably makes sense to make sure that whispers is using UTF-8 throughout on Windows.
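One way to apply this consistently might be to route every text read through a single helper. The sketch below is only an illustration: read_lines is a hypothetical name, not an existing whispers function, and errors="replace" is an added assumption (it substitutes U+FFFD for undecodable bytes instead of aborting the scan) that goes beyond the one-line fix described above.

```python
import tempfile
from pathlib import Path


def read_lines(filepath: Path, encoding: str = "utf-8") -> list:
    """Hypothetical helper: read all lines using an explicit encoding.

    errors="replace" is an assumption, not part of the reported fix; it
    keeps a scan going when a file is not valid in the chosen encoding.
    """
    with filepath.open("r", encoding=encoding, errors="replace") as handle:
        return handle.readlines()


# Small demonstration on a temporary UTF-8 file with a non-ASCII character.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.js"
    sample.write_bytes('const name = "č";\n'.encode("utf-8"))
    lines = read_lines(sample)

print(lines)
```

Because the encoding is passed explicitly, the result no longer depends on the platform default, so the same call should behave identically on Windows and elsewhere.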

Of course, I can't be sure this won't break something elsewhere, but it certainly allowed my run to complete (and it found some useful things, thanks!).
