Encoding problem on Windows with non-US English locale #54
I think it's the … See this comment, which describes the same issue but with base functions.
Could you provide a simple reproducible example for me? (i.e. an XML file that fails to parse, uploaded to Dropbox or similar)
Sure. This small script should do it. I've set the default text encoding to UTF-8 in RStudio > Tools > Global Options > Default text encoding.
Results in:
The charToRaw() output is completely expected, so I deleted that. I'm pretty sure the problem is with
I didn't actually test it, but I'm pretty sure this should fix the problem.
I just updated the package to the master version on GitHub, ran the script again, and the issue is still there:
@jeroenooms that looks good to me
@jeroenooms @hadley @jennybc It works like a charm. Thanks! This also fixed the issue in googlesheets.
The problem seems to happen again :'(
gives:
R version 3.1.3 (2015-03-09)
Just for info: I ran @katossky's example with the CRAN and GitHub versions of xml2 on Windows, and it's not an issue here.
@katossky You need to provide a bit more evidence that the problem is with xml2, and not with
The problem first occurred with an external file I did not
I have various reports of problems listing Google Sheets, specifying worksheet names, or (possibly) reading Sheet data from users on [1] Windows with [2] non-US English locales. Examples: Spanish_Spain.1252, Danish_Denmark.1252, and someone in Colombia who didn't provide session info.
I think all the problematic text has been successfully processed with
httr::content(as = "text", encoding = "UTF-8")
but then a problem is introduced in xml2::read_xml(). The most relevant issue is jennybc/googlesheets#151. I apologize for some noise there; a recently posted example Sheet seems to have a different problem. If you look there, focus on the comments from @krose. He has done a lot of digging and posed a related question on Stack Overflow. From his work, it seems the problem might come from
read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html)
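To make the suspected mechanism concrete, here is a minimal base-R sketch. It assumes (this is a hypothesis, not confirmed in the thread) that the trouble arises when the bytes in x are UTF-8 but R believes x is in some other encoding, so enc2utf8() re-encodes bytes that were already UTF-8. The latin1 mark below is a stand-in for an "unknown"-encoding string in a Windows CP1252 locale, and bytes_sent_to_parser() is a hypothetical helper that simply mirrors the charToRaw(enc2utf8(x)) expression quoted above.

```r
# Hypothetical helper: mirrors charToRaw(enc2utf8(x)), i.e. the bytes that
# read_xml.raw() would receive when read_xml() is given a character string.
bytes_sent_to_parser <- function(x) charToRaw(enc2utf8(x))

# "España" as UTF-8 bytes; the "ñ" is the two-byte sequence 0xC3 0xB1.
utf8_bytes <- as.raw(c(0x45, 0x73, 0x70, 0x61, 0xc3, 0xb1, 0x61))

# Case 1: the string is correctly marked as UTF-8, so enc2utf8() leaves it alone.
good <- rawToChar(utf8_bytes)
Encoding(good) <- "UTF-8"

# Case 2 (assumption): the same bytes, but R is told they are latin1.
# This stands in for a native string in a CP1252 locale such as Spanish_Spain.1252.
bad <- rawToChar(utf8_bytes)
Encoding(bad) <- "latin1"

# Case 1: 45 73 70 61 c3 b1 61 (unchanged, 7 bytes)
print(bytes_sent_to_parser(good))

# Case 2: 45 73 70 61 c3 83 c2 b1 61 (9 bytes; each byte of "ñ" re-encoded)
print(bytes_sent_to_parser(bad))
```

In case 2 the parser would receive "EspaÃ±a", the classic double-encoding pattern, which would be consistent with mangled Sheet or worksheet names on Windows with non-US English locales.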