-
Notifications
You must be signed in to change notification settings - Fork 140
Only \r\n, \n\r, \r, \n and \f will be treat as line break characters -- No longer use preg_replace #80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…. Which reduced the problem caused by miss replace of a legal UTF-8 character when replacing with preg_replace('~\R~').
Couldn't it be solved by using |
Thanks for the test case, will try to look into this issue today. |
@samdark Hi Sam and Cebe. Sorry for the delay. It's not the only thing i have to do today, so... The story is all about \x85 and may be \x0b. In the case of \x85, this one is included in some legal characters that saved in UTF-8. So now guess what happened? Those characters has been replaced to \n. Now take look the test case unicode.md (may be UTF-8.md is a better name). The word 電腦 give me E9 9B BB E8 85 A6. Then, you know, this is what i got: Now how about the test case will still pass even no code is changed? Well, that's because the imported html will be convert by another preg_replace(' So, that's what i found. May be this pull not a good idea, all i want to show is There is a problem. :) Please investigate. Thank you! |
Thanks for the detailed information, will look into this issue. |
In fact, this implementation is really used at first. You could search the git log to view the changes from str_replace solution to the preg_replace case now. In addition, the change log and change reason is also written in the old comments. |
Hi @chigix, I can't found your commit in the master repo, but #78 as pull request (The acutal change made by @cebe in commit #67). Problem is, It's not truly passed as you forgot to change the method BUT, i'm not saying Hope it fixed any way. |
Here is the first change from str_replace to preg_replace for new line format convert: f7f1582 That commit is from original author, not by me. |
LOL i like to post comment here as i don't need to pay any copper coin. @chigix, Then, what's the point to see the original commit of How that changes during time? As it eventally changed to And speak of |
@cebe Works bright! Thank you! |
Buuuuut, you got a failed test.... which is the one i curious about. I will use this one anyway until next update. Thank you again. |
I have already fixed that. Just released a new version: https://github.com/cebe/markdown/releases/tag/1.0.1 |
As the preg_replace('
\R') will actually breaks some UTF-8 character some how.And the '
\Ru' will break the test case: BaseMarkdownTest::testInvalidUtf8 as it replace too many i guess.I also add the test data that actually saved in UTF-8 into test case. See unicode.md and unicode.html for detail.