Fix breakage of non-Windows binary emulation on Windows host by nmantani · Pull Request #1143 · qilingframework/qiling · GitHub

Fix breakage of non-Windows binary emulation on Windows host #1143


Merged
xwings merged 1 commit into qilingframework:dev from the fix-ql_open_flag_mapping branch on Apr 30, 2022


nmantani (Contributor) commented Apr 29, 2022

Checklist

Which kind of PR do you create?

  • This PR only contains minor fixes.
  • This PR contains major feature update.
  • This PR introduces a new function/api for Qiling Framework.

Coding convention?

  • The new code conforms to Qiling Framework naming convention.
  • The imports are arranged properly.
  • Essential comments are added.
  • The reference of the new code is pointed out.

Extra tests?

  • No extra tests are needed for this PR.
  • I have added enough tests for this PR.
  • Tests will be added after some discussion and review.

Changelog?

  • This PR doesn't need to update Changelog.
  • Changelog will be updated after some proper review.
  • Changelog has been updated in my PR.

Target branch?

  • The target branch is dev branch.

One last thing


Emulating dynamically linked Linux executables on a Windows host fails with the error unicorn.unicorn.UcError: Invalid memory read (UC_ERR_READ_UNMAPPED), as I mentioned in my pull request #1064 and as another person reported in #974.

I have investigated this issue over the past few months and finally found that truncation in the read() system call is the cause. In the following example, the read() system call on the beginning of libc.so.6 returns only 0x304 of the requested 0x340 bytes. The byte \x1a is located at offset 0x304. \x1a is the ASCII Substitute character (SUB), which Windows treats as EOF (Ctrl-Z). The truncation happens because files are opened in text mode. It may also break emulation of binaries for other operating systems. Opening files in binary mode by setting the O_BINARY flag prevents it.
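The effect of the O_BINARY flag can be checked outside Qiling with a minimal Python sketch. The helper name read_prefix, the file name sample.bin, and the synthetic file contents are assumptions made for illustration. Only the sizes (0x340 bytes requested, \x1a at offset 0x304) come from the log in this report.

```python
import os
import tempfile

# os.O_BINARY exists only on Windows; fall back to 0 elsewhere so the
# same code runs unchanged on POSIX hosts.
O_BINARY = getattr(os, "O_BINARY", 0)


def read_prefix(path, length, extra_flags=0):
    """Hypothetical helper: read up to `length` bytes via os.open()/os.read()."""
    fd = os.open(path, os.O_RDONLY | extra_flags)
    try:
        return os.read(fd, length)
    finally:
        os.close(fd)


# Synthetic stand-in for the start of libc.so.6: 0x340 bytes with a
# Ctrl-Z (0x1a) byte at offset 0x304, mirroring the sizes in the log.
payload = bytearray(b"A" * 0x340)
payload[0x304] = 0x1A

with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.bin")
    with open(sample, "wb") as f:
        f.write(payload)
    # With O_BINARY set, the Ctrl-Z byte is ordinary data.
    data = read_prefix(sample, 0x340, O_BINARY)

print(hex(len(data)))
```

On a Windows host, calling read_prefix without O_BINARY is the case in which the short read described above would be expected.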

PS > python .\qltool run -f .\examples\rootfs\x8664_linux\bin\x8664_hello --rootfs .\examples\rootfs\x8664_linux -v debug
[+]     Profile: default
[+]     Mapping GDT at 0x30000 with limit 0x1000
[+]     Mapped 0x555555554000-0x555555555000
[+]     Mapped 0x555555754000-0x555555756000
[+]     mem_start : 0x555555554000
[+]     mem_end   : 0x555555756000
[+]     Interpreter path: examples\rootfs\x8664_linux\lib64\ld-linux-x86-64.so.2
[+]     Interpreter addr: 0x7ffff7dd5000
[+]     Mapped 0x7ffff7dd5000-0x7ffff7dfc000
[+]     Mapped 0x7ffff7ffc000-0x7ffff7fff000
[+]     mmap_address is : 0x7fffb7dd6000
[+]     rel name b'_ITM_deregisterTMCloneTable'
[+]     rel name b'__libc_start_main'
[+]     rel name b'__gmon_start__'
[+]     rel name b'_ITM_registerTMCloneTable'
[+]     rel name b'__cxa_finalize'
[+]     rel name b'puts'

[snip]

[+]     read() CONTENT: b'\x7fELF\x02\x01\x01\x03\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00>\x00\x01\x00\x00\x00\xb0\x1c\x02\x00\x00\x00\x00\x00@\x00\x00\x00\x00\x00\x00\x00\x90\xe9\x1e\x00\x00\x00\x00\x00\x00\x00\x00\x00@\x008\x00\n\x00@\x00I\x00H\x00\x06\x00\x00\x00\x04\x00\x00\x00@\x00\x00\x00\x00\x00\x00\x00@\x00\x00\x00\x00\x00\x00\x00@\x00\x00\x00\x00\x00\x00\x000\x02\x00\x00\x00\x00\x00\x000\x02\x00\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00P\xdd\x1b\x00\x00\x00\x00\x00P\xdd\x1b\x00\x00\x00\x00\x00P\xdd\x1b\x00\x00\x00\x00\x00\x1c\x00\x00\x00\x00\x00\x00\x00\x1c\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xa0j\x1e\x00\x00\x00\x00\x00\xa0j\x1e\x00\x00\x00\x00\x00\x00\x00 \x00\x00\x00\x00\x00\x01\x00\x00\x00\x06\x00\x00\x00 v\x1e\x00\x00\x00\x00\x00 v>\x00\x00\x00\x00\x00 v>\x00\x00\x00\x00\x00@R\x00\x00\x00\x00\x00\x00\xc0\x94\x00\x00\x00\x00\x00\x00\x00\x00 \x00\x00\x00\x00\x00\x02\x00\x00\x00\x06\x00\x00\x00\x80\xab\x1e\x00\x00\x00\x00\x00\x80\xab>\x00\x00\x00\x00\x00\x80\xab>\x00\x00\x00\x00\x00\xe0\x01\x00\x00\x00\x00\x00\x00\xe0\x01\x00\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x04\x00\x00\x00p\x02\x00\x00\x00\x00\x00\x00p\x02\x00\x00\x00\x00\x00\x00p\x02\x00\x00\x00\x00\x00\x00D\x00\x00\x00\x00\x00\x00\x00D\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x07\x00\x00\x00\x04\x00\x00\x00 v\x1e\x00\x00\x00\x00\x00 v>\x00\x00\x00\x00\x00 
v>\x00\x00\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x90\x00\x00\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00P\xe5td\x04\x00\x00\x00l\xdd\x1b\x00\x00\x00\x00\x00l\xdd\x1b\x00\x00\x00\x00\x00l\xdd\x1b\x00\x00\x00\x00\x00\xdcY\x00\x00\x00\x00\x00\x00\xdcY\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00Q\xe5td\x06\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00R\xe5td\x04\x00\x00\x00 v\x1e\x00\x00\x00\x00\x00 v>\x00\x00\x00\x00\x00 v>\x00\x00\x00\x00\x00\xe09\x00\x00\x00\x00\x00\x00\xe09\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x14\x00\x00\x00\x03\x00\x00\x00GNU\x00\xb4\x17\xc0\xba|\xc5\xcf\x06\xd1\xd1\xbe\xd6e,\xed\xb9%<`\xd0\x04\x00\x00\x00\x10\x00\x00\x00\x01\x00\x00\x00GNU\x00\x00\x00\x00\x00\x03\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf3\x03\x00\x00\n\x00\x00\x00\x00\x01\x00\x00\x0e\x00\x00\x00\x000\x10D\xa0 \x02\x01\x88\x03\xe6\x90\xc5E\x8c\x00\xc4\x00X\x00\x07\x84\x00p\xc2\x80\x00\r\x8a\x0cA\x04\x10\x00\x88@2\x08*@\x88T<- \x0e2H&\x84\xc0\x8c\x04\x08\x00\x02\x02\x0e\xa1\xac'
[+]     0x00007ffff7df1da2: read(fd = 0x3, buf = 0x80000000d538, length = 0x340) = 0x304

[snip]

Traceback (most recent call last):
  File "C:\Users\user\Desktop\qiling\qltool", line 253, in <module>
    ql.run(timeout=options.timeout)
  File "C:\Users\user\Desktop\qiling\qiling\core.py", line 572, in run
    self.os.run()
  File "C:\Users\user\Desktop\qiling\qiling\os\linux\linux.py", line 157, in run
    self.ql.emu_start(self.ql.loader.entry_point, entry_address, self.ql.timeout)
  File "C:\Users\user\Desktop\qiling\qiling\core.py", line 705, in emu_start
    self.uc.emu_start(begin, end, timeout, count)
  File "C:\Users\user\Desktop\qiling-test\lib\site-packages\unicorn\unicorn.py", line 525, in emu_start
    raise UcError(status)
unicorn.unicorn.UcError: Invalid memory read (UC_ERR_READ_UNMAPPED)
PS >

xwings (Member) commented Apr 30, 2022

Hi,

Two comments:

  1. I am not sure whether hardcoding O_BINARY is a good idea. I need to look at the new code.

  2. Again, Linux binaries have far more issues. For example, some Linux binaries just can't run properly. So technically, we do have a lot of Linux binaries that are not expected to work under Windows.

nmantani (Contributor, Author) commented

@xwings Thank you for your comment. The POSIX standard (IEEE Std 1003.1-2017) defines neither text mode (the O_TEXT flag) nor binary mode (the O_BINARY flag). These modes are provided only by Windows, as described in the documentation for File Translation Constants:

These constants specify the mode of translation ("b" or "t"). The mode is included in the string specifying the type of access ("r", "w", "a", "r+", "w+", "a+").

The translation modes are as follows:

  • t
    Opens in text (translated) mode. In this mode, carriage return-line feed (CR-LF) combinations are translated into single line feeds (LF) on input, and LF characters are translated into CR-LF combinations on output. Also, CTRL+Z is interpreted as an end-of-file character on input. In files opened for reading or reading and writing, fopen checks for CTRL+Z at the end of the file and removes it, if possible. This is done because using the fseek and ftell functions to move within a file ending with CTRL+Z may cause fseek to behave improperly near the end of the file.

Note

The t option is not part of the ANSI standard for fopen and freopen. It is a Microsoft extension and should not be used where ANSI portability is desired.

  • b
    Opens in binary (untranslated) mode. The above translations are suppressed.
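The two translation modes quoted above can be modeled in a few lines of Python. This is a deliberately simplified sketch of the read side only, not the actual C runtime implementation, and both function names are hypothetical.

```python
CTRL_Z = b"\x1a"


def simulate_text_mode_read(raw: bytes) -> bytes:
    """Simplified model of a text-mode read.

    Input stops at the first Ctrl-Z byte, and CR-LF pairs become LF.
    """
    eof = raw.find(CTRL_Z)
    if eof != -1:
        raw = raw[:eof]
    return raw.replace(b"\r\n", b"\n")


def simulate_binary_mode_read(raw: bytes) -> bytes:
    """Binary mode: the bytes are returned untranslated."""
    return raw


sample = b"line1\r\nline2\r\n" + CTRL_Z + b"hidden"
text_view = simulate_text_mode_read(sample)
binary_view = simulate_binary_mode_read(sample)

print(text_view)
print(binary_view)
```

In this model, a binary file that happens to contain a 0x1a byte is cut off at that byte in text mode, which matches the short read() result reported in this pull request.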

Troublingly, according to the documentation of the _fmode variable, the default on Windows is text mode. This causes unintended data truncation or conversion (CR-LF <-> LF) that never happens on POSIX systems (such as Linux and FreeBSD). So the O_BINARY flag has to be set to keep file I/O compatible when emulating POSIX systems.

I wrote the change carefully so that it does not break compatibility of Qiling Framework:

  • Emulation of binaries on non-Windows hosts: flags are not modified
  • Emulation of non-Windows binaries on Windows hosts: the O_BINARY flag is set
  • Emulation of Windows binaries on Windows hosts: flags are not modified
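The three cases listed above can be sketched as a single decision function. This is an illustrative sketch only, not the code merged in this pull request: the function name adjust_open_flags and its parameters are hypothetical.

```python
import os


def adjust_open_flags(flags, host_is_windows, guest_is_windows, o_binary=None):
    """Hypothetical helper: add O_BINARY only for non-Windows guests on Windows hosts."""
    if o_binary is None:
        # os.O_BINARY exists only on Windows; 0 leaves the flags unchanged elsewhere.
        o_binary = getattr(os, "O_BINARY", 0)
    if host_is_windows and not guest_is_windows:
        return flags | o_binary
    # Non-Windows host, or Windows guest on Windows host: flags pass through.
    return flags


# Example call for a Linux guest; the host check uses os.name.
host_flags = adjust_open_flags(os.O_RDONLY, os.name == "nt", guest_is_windows=False)
print(host_flags)
```

Passing o_binary explicitly makes the helper testable on any host. By default it uses whatever value the host's os module provides.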

Though I understand that a lot of Linux binaries are currently not expected to work under Windows, this issue is critical because truncation or conversion (CR-LF <-> LF) of libc.so.6 breaks every emulation of a dynamically linked Linux binary on a Windows host. I use Qiling Framework for my own tool, FileInsight-plugins, which runs on Windows, but I have to stay on version 1.2.3 of Qiling Framework because of this issue.

xwings (Member) commented Apr 30, 2022

OK, that makes sense!

@xwings xwings merged commit d92e50c into qilingframework:dev Apr 30, 2022
@nmantani nmantani deleted the fix-ql_open_flag_mapping branch April 30, 2022 15:30