python3 -m pip install docx-parser
paragraph: text paragraph, with style_id
multipart: paragraph with image or hyperlink
table: table data with merged_cells
- CMD
docx_parser --help
# parse image as file
docx_parser tests/demo.docx -D tests/media -o tests/out.file.jl
# parse image as base64 string
docx_parser tests/demo.docx -A base64 -o tests/out.base64.jl
- Python
from docx_parser import DocumentParser
infile = 'tests/demo.docx'
doc = DocumentParser(infile)
for _type, item in doc.parse():
    print(_type, item)
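Since `doc.parse()` yields `(_type, item)` pairs, it can be handy to collect the items per type before processing them. The following is only a minimal sketch: `group_by_type` is a hypothetical helper, not part of docx_parser, and the `sample` list is stand-in data whose item contents are invented for illustration (the real item structure comes from the parser).

```python
from collections import defaultdict


def group_by_type(pairs):
    """Group (type, item) pairs into a dict mapping type -> list of items.

    Hypothetical helper, not part of docx_parser. It accepts any iterable
    of 2-tuples, such as the pairs yielded by DocumentParser.parse().
    """
    grouped = defaultdict(list)
    for _type, item in pairs:
        grouped[_type].append(item)
    return dict(grouped)


# Stand-in data: the item dictionaries below are assumptions made for
# illustration and do not reflect the parser's actual output format.
sample = [
    ("paragraph", {"text": "Hello"}),
    ("table", {"data": [["a", "b"]]}),
    ("paragraph", {"text": "World"}),
]

grouped = group_by_type(sample)
for _type, items in grouped.items():
    print(_type, len(items))
```

With a real document, `group_by_type(doc.parse())` should work the same way, provided `parse()` yields two-element tuples as in the loop above.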
- parse text style: color, bgcolor, font, bold, italic ...
- parse paragraph format