Better MBOX parser · Issue #19 · terhechte/postsack · GitHub
Better MBOX parser #19
Open
@terhechte

Description

Postsack currently uses mbox-reader for MBOX parsing, but it doesn't properly implement the standard. It only checks for the string "From" at the beginning of a line, so any email whose body contains a line starting with "From" is treated as two different emails. According to RFC 4155, the correct way to detect a new email in an MBOX file is:

Each message in the mbox database MUST be immediately preceded
by a single separator line, which MUST conform to the following
syntax:

  • The exact character sequence of "From";

  • a single Space character (0x20);

  • the email address of the message sender (as obtained from the message envelope or other authoritative source), conformant with the "addr-spec" syntax from RFC 2822;

  • a single Space character;

  • a timestamp indicating the UTC date and time when the message was originally received, conformant with the syntax of the traditional UNIX 'ctime' output sans timezone (note that the use of UTC precludes the need for a timezone indicator);

  • an end-of-line marker.
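
The rules above could be checked roughly as follows. This is only a sketch using the Rust standard library, not how mbox-reader or Postsack works: the function names `is_separator_line`, `looks_like_addr_spec` and `is_ctime_timestamp` are made up for illustration, the address check is a deliberate simplification of the full "addr-spec" grammar (no quoted local parts), and the timestamp is only checked for shape, not for being a valid calendar date.

```rust
const DAYS: [&str; 7] = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
const MONTHS: [&str; 12] = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

/// Returns true if `line` (with its end-of-line marker already removed)
/// has the shape: "From", space, address, space, ctime timestamp.
fn is_separator_line(line: &str) -> bool {
    // The exact character sequence "From" followed by a single space.
    let rest = match line.strip_prefix("From ") {
        Some(rest) => rest,
        None => return false,
    };
    // Assumption: the address contains no spaces, so it ends at the next space.
    let (addr, timestamp) = match rest.split_once(' ') {
        Some(parts) => parts,
        None => return false,
    };
    looks_like_addr_spec(addr) && is_ctime_timestamp(timestamp)
}

/// Simplified stand-in for "addr-spec": non-empty local part, '@', non-empty domain.
fn looks_like_addr_spec(addr: &str) -> bool {
    match addr.rsplit_once('@') {
        Some((local, domain)) => !local.is_empty() && !domain.is_empty(),
        None => false,
    }
}

/// Checks the fixed-width ctime layout "Www Mmm dd hh:mm:ss yyyy" (24 characters),
/// where the day of month may be padded with a space instead of a leading zero.
fn is_ctime_timestamp(ts: &str) -> bool {
    let b = ts.as_bytes();
    if b.len() != 24 || !ts.is_ascii() {
        return false;
    }
    let digits = |range: std::ops::Range<usize>| b[range].iter().all(u8::is_ascii_digit);
    DAYS.iter().any(|&d| d == &ts[0..3])
        && b[3] == b' '
        && MONTHS.iter().any(|&m| m == &ts[4..7])
        && b[7] == b' '
        && (b[8] == b' ' || b[8].is_ascii_digit())
        && b[9].is_ascii_digit()
        && b[10] == b' '
        && digits(11..13)
        && b[13] == b':'
        && digits(14..16)
        && b[16] == b':'
        && digits(17..19)
        && b[19] == b' '
        && digits(20..24)
}
```

With a check like this, a body line such as "From the beginning, this was a bad idea." would no longer start a new message, while "From alice@example.com Thu Jan  1 00:00:00 1970" would. Real-world MBOX files may not follow RFC 4155 this strictly, so an actual parser might need to be more lenient.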
