8000 Only \r\n, \n\r, \r, \n and \f will be treat as line break characters -- No longer use preg_replace · Pull Request #80 · cebe/markdown · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Only \r\n, \n\r, \r, \n and \f will be treat as line break characters -- No longer use preg_replace #80

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

ghost
Copy link
@ghost ghost commented Oct 24, 2014

As the preg_replace('\R') will actually breaks some UTF-8 character some how.

And the '\Ru' will break the test case: BaseMarkdownTest::testInvalidUtf8 as it replace too many i guess.

I also add the test data that actually saved in UTF-8 into test case. See unicode.md and unicode.html for detail.

…. Which reduced the problem caused by miss replace of a legal UTF-8 character when replacing with preg_replace('~\R~').
@samdark
Copy link
Contributor
samdark commented Oct 24, 2014

Couldn't it be solved by using \R?

@cebe
Copy link
Owner
cebe commented Oct 24, 2014

Thanks for the test case, will try to look into this issue today.

@cebe cebe added this to the 1.0.1 milestone Oct 24, 2014
@ghost
Copy link
Author
ghost commented Oct 24, 2014

@samdark Hi Sam and Cebe.

Sorry for the delay. It's not the only thing i have to do today, so...

The story is all about \x85 and may be \x0b. In the case of \x85, this one is included in some legal characters that saved in UTF-8.

So now guess what happened? Those characters has been replaced to \n.

Now take look the test case unicode.md (may be UTF-8.md is a better name). The word 電腦 give me E9 9B BB E8 85 A6. Then, you know, this is what i got:

See this screenshot

Now how about the test case will still pass even no code is changed? Well, that's because the imported html will be convert by another preg_replace('\R') in testParse method.

So, that's what i found. May be this pull not a good idea, all i want to show is There is a problem. :)

Please investigate. Thank you!

@cebe
Copy link
Owner
cebe commented Oct 24, 2014

Thanks for the detailed information, will look into this issue.

@chigix
Copy link
8000
chigix commented Oct 24, 2014

In fact, this implementation is really used at first. You could search the git log to view the changes from str_replace solution to the preg_replace case now. In addition, the change log and change reason is also written in the old comments.

@ghost
Copy link
Author
ghost commented Oct 25, 2014

Hi @chigix, I can't found your commit in the master repo, but #78 as pull request (The acutal change made by @cebe in commit #67). Problem is, It's not truly passed as you forgot to change the method parseParagraph at same time with parse. Which causes ... nothing ... Because the method testInvalidUtf8 in test case BaseMarkdownTest uses parseParagraph to convert. You will see the test error when you made that change.

BUT, i'm not saying ~\R~u is not working. In fact, i also tried this way to fix my case (seems we both inspired by the link in comment 😄), and it works perfectly (In my case). -- It only breaks the test that try to compare <code>�</code> with \x80. Another reason I don't support your commit is because this very reason: I don't know why there is this comparison, AND i don't like breaked test case.

Hope it fixed any way.

@chigix
Copy link
chigix commented Oct 25, 2014

Here is the first change from str_replace to preg_replace for new line format convert: f7f1582

That commit is from original author, not by me.

@ghost
Copy link
Author
ghost commented Oct 25, 2014

LOL i like to post comment here as i don't need to pay any copper coin.

@chigix, Then, what's the point to see the original commit of How that changes during time? As it eventally changed to \R and caused problem? I don't understand, can you clear it for me please? (Or may be the speed one?)

And speak of \r\n? one, it completely ignored LF+CR that used -- acordding to Wikipedia -- by Acorn BBC and RISC OS spooled text output, and convert it to two \ns. (But who use that now days LOL, so i actually like this one).

@cebe cebe closed this in 4752766 Oct 25, 2014
@ghost
Copy link
Author
ghost commented Oct 25, 2014

@cebe Works bright! Thank you!

@ghost
Copy link
Author
ghost commented Oct 25, 2014

@cebe

Buuuuut, you got a failed test.... which is the one i curious about.

I will use this one anyway until next update. Thank you again.

@cebe
Copy link
Owner
cebe commented Oct 25, 2014

I have already fixed that.

Just released a new version: https://github.com/cebe/markdown/releases/tag/1.0.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0