Only \r\n, \n\r, \r, \n and \f will be treat as line break characters -- No longer use preg_replace #80

ghost · 2014-10-24T07:44:07Z

As the preg_replace('\R') will actually breaks some UTF-8 character some how.

And the '\Ru' will break the test case: BaseMarkdownTest::testInvalidUtf8 as it replace too many i guess.

I also add the test data that actually saved in UTF-8 into test case. See unicode.md and unicode.html for detail.

…. Which reduced the problem caused by miss replace of a legal UTF-8 character when replacing with preg_replace('~\R~').

samdark · 2014-10-24T07:53:49Z

Couldn't it be solved by using \R?

cebe · 2014-10-24T09:23:48Z

Thanks for the test case, will try to look into this issue today.

ghost · 2014-10-24T09:33:01Z

@samdark Hi Sam and Cebe.

Sorry for the delay. It's not the only thing i have to do today, so...

The story is all about \x85 and may be \x0b. In the case of \x85, this one is included in some legal characters that saved in UTF-8.

So now guess what happened? Those characters has been replaced to \n.

Now take look the test case unicode.md (may be UTF-8.md is a better name). The word 電腦 give me E9 9B BB E8 85 A6. Then, you know, this is what i got:

Now how about the test case will still pass even no code is changed? Well, that's because the imported html will be convert by another preg_replace('\R') in testParse method.

So, that's what i found. May be this pull not a good idea, all i want to show is There is a problem. :)

Please investigate. Thank you!

cebe · 2014-10-24T09:38:53Z

Thanks for the detailed information, will look into this issue.

chigix · 2014-10-24T16:58:33Z

In fact, this implementation is really used at first. You could search the git log to view the changes from str_replace solution to the preg_replace case now. In addition, the change log and change reason is also written in the old comments.

ghost · 2014-10-25T05:39:23Z

Hi @chigix, I can't found your commit in the master repo, but #78 as pull request (The acutal change made by @cebe in commit #67). Problem is, It's not truly passed as you forgot to change the method parseParagraph at same time with parse. Which causes ... nothing ... Because the method testInvalidUtf8 in test case BaseMarkdownTest uses parseParagraph to convert. You will see the test error when you made that change.

BUT, i'm not saying ~\R~u is not working. In fact, i also tried this way to fix my case (seems we both inspired by the link in comment 😄), and it works perfectly (In my case). -- It only breaks the test that try to compare <code>�</code> with \x80. Another reason I don't support your commit is because this very reason: I don't know why there is this comparison, AND i don't like breaked test case.

Hope it fixed any way.

chigix · 2014-10-25T05:59:38Z

Here is the first change from str_replace to preg_replace for new line format convert: f7f1582

That commit is from original author, not by me.

ghost · 2014-10-25T06:45:00Z

LOL i like to post comment here as i don't need to pay any copper coin.

@chigix, Then, what's the point to see the original commit of How that changes during time? As it eventally changed to \R and caused problem? I don't understand, can you clear it for me please? (Or may be the speed one?)

And speak of ~~\r\n?~~ one, it completely ignored LF+CR that used -- acordding to Wikipedia -- by Acorn BBC and RISC OS spooled text output, and convert it to two \ns. (But who use that now days LOL, so i actually like this one).

ghost · 2014-10-25T15:26:00Z

@cebe Works bright! Thank you!

ghost · 2014-10-25T15:30:09Z

@cebe

Buuuuut, you got a failed test.... which is the one i curious about.

I will use this one anyway until next update. Thank you again.

cebe · 2014-10-25T16:27:09Z

I have already fixed that.

Just released a new version: https://github.com/cebe/markdown/releases/tag/1.0.1

Only \r\n, \n\r, \r, \n and \f will be treat as line break characters…

35dd4be

…. Which reduced the problem caused by miss replace of a legal UTF-8 character when replacing with preg_replace('~\R~').

cebe added this to the 1.0.1 milestone Oct 24, 2014

cebe closed this in 4752766 Oct 25, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Only \r\n, \n\r, \r, \n and \f will be treat as line break characters -- No longer use preg_replace #80

Only \r\n, \n\r, \r, \n and \f will be treat as line break characters -- No longer use preg_replace #80

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Only \r\n, \n\r, \r, \n and \f will be treat as line break characters -- No longer use preg_replace #80

Only \r\n, \n\r, \r, \n and \f will be treat as line break characters -- No longer use preg_replace #80

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!