Tags: skyeyester/pypdf
REL: 5.1.0

## What's new

### New Features (ENH)
- Add `layout_mode_font_height_weight` argument to `PageObject.extract_text()` (py-pdf#2920) by @hpierre001

### Bug Fixes (BUG)
- Fix font specifier for FreeText annotation (py-pdf#2893) by @ssjkamei
- Line breaks are not generated due to incorrect calculation of text leading (py-pdf#2890) by @ssjkamei
- Improve handling of spaces in text extraction (py-pdf#2882) by @ssjkamei

### Robustness (ROB)
- Soft failure for flate encode image mode 1 with wrong LUT size (py-pdf#2900) by @stefan6419846

### Documentation (DOC)
- Use latest package versions (py-pdf#2907) by @stefan6419846
- Correct example of reading FileAttachment annotation (py-pdf#2906) by @j-t-1

### Developer Experience (DEV)
- Update pinned requirements (py-pdf#2918) by @stefan6419846
- Make make_release.py compatible with Windows environment (py-pdf#2894) by @pubpub-zz

### Maintenance (MAINT)
- Remove references to outdated Python versions (py-pdf#2919) by @stefan6419846
- Generalize the method of obtaining space_code (py-pdf#2891) by @ssjkamei
- Unnecessary character mapping process (py-pdf#2888) by @ssjkamei
- New LZW decoding implementation (py-pdf#2887) by @MartinThoma

### Testing (TST)
- Add LzwCodec for encoding (py-pdf#2883) by @MartinThoma

### Code Style (STY)
- Capitalize error messages (py-pdf#2903) by @j-t-1
- Modify error messages in PdfWriter (py-pdf#2902) by @j-t-1

[Full Changelog](py-pdf/pypdf@5.0.1...5.1.0)
REL: 5.0.1 (py-pdf#2884)

## Version 5.0.1, 2024-09-29

### New Features (ENH)
- Add `full` parameter to PdfWriter constructor (py-pdf#2865)

### Bug Fixes (BUG)
- Update pyproject.toml with minimum Python version of 3.8 (py-pdf#2859)
- Cope with unbalanced delimiters in dictionary object (py-pdf#2878)
- Cope with encoding with too many differences (py-pdf#2873)
- Missing spaces in extract_text() method (py-pdf#1328) (py-pdf#2868)
- Tolerate truncated files and no warning when jumping startxref (py-pdf#2855)

### Robustness (ROB)
- Repair PDF with invalid Root object (py-pdf#2880)
- Continue parsing dictionary object when error is detected (py-pdf#2872)
- Merge documents with invalid pages in named destinations (py-pdf#2857)
- Tolerate comments in arrays (py-pdf#2856)

### Developer Experience (DEV)
- Use latest Python version for benchmarking (py-pdf#2879)

### Maintenance (MAINT)
- Add tests to source distributions (py-pdf#2874)
- Refactor _update_field_annotation (py-pdf#2862)

[Full Changelog](py-pdf/pypdf@5.0.0...5.0.1)
REL: 5.0.0 (py-pdf#2851)

## Version 5.0.0, 2024-09-15

This version drops support for Python 3.7 (not maintained since July 2023), PdfMerger (use PdfWriter instead) and AnnotationBuilder (use annotations instead).

### Deprecations (DEP)
- Remove the deprecated PdfMerger and AnnotationBuilder classes and other deprecations cleanup (py-pdf#2813)
- Drop Python 3.7 support (py-pdf#2793)

### New Features (ENH)
- Add capability to remove /Info from PDF (py-pdf#2820)
- Add incremental capability to PdfWriter (py-pdf#2811)
- Add UniGB-UTF16 encodings (py-pdf#2819)
- Accept utf strings for metadata (py-pdf#2802)
- Report PdfReadError instead of RecursionError (py-pdf#2800)
- Compress PDF files merging identical objects (py-pdf#2795)

### Bug Fixes (BUG)
- Fix sheared image (py-pdf#2801)

### Robustness (ROB)
- Robustify .set_data() (py-pdf#2821)
- Raise PdfReadError when missing /Root in trailer (py-pdf#2808)
- Fix extract_text() issues on damaged PDFs (py-pdf#2760)
- Handle images with empty data when processing an image from bytes (py-pdf#2786)

### Developer Experience (DEV)
- Fix coverage uploads (py-pdf#2832)
- Test against Python 3.13 (py-pdf#2776)

[Full Changelog](py-pdf/pypdf@4.3.1...5.0.0)
## Version 4.3.1, 2024-07-21

### Bug Fixes (BUG)
- Cope with Matrix entry in field annotations (py-pdf#2736)

### Robustness (ROB)
- Cope with fields with upside down box/rectangle (py-pdf#2729)

### Maintenance (MAINT)
- Add deprecate_with_replacement to StreamObject.initializeFromD… (py-pdf#2728)
- Deal with cryptography>=43 moving ARC4 (py-pdf#2765)

[Full Changelog](py-pdf/pypdf@4.3.0...4.3.1)
REL: 4.3.0

## What's new

### New Features (ENH)
- Accept ETen-B5 and UniCNS-UTF16 encodings (py-pdf#2721) by @pubpub-zz
- Add decode_as_image() to ContentStreams (py-pdf#2615) by @pubpub-zz
- context manager for PdfReader (py-pdf#2666) by @tibor-reiss
- Add capability to set font and size in fields (py-pdf#2636) by @pubpub-zz
- Allow to pass input file without named argument (py-pdf#2576) by @pubpub-zz

### Bug Fixes (BUG)
- Fix deprecation for Ressources when using old constants (py-pdf#2705) by @stefan6419846
- Fix images issue 4 bits encoding and LUT starting with UTF16_BOM (py-pdf#2675) by @pubpub-zz
- Reading large compressed images takes huge time to process (py-pdf#2644) by @snanda85
- Highlighted Text Cannot Be Printed (py-pdf#2604) by @Nifury
- Fix UnboundLocalError on malformed pdf (py-pdf#2619) by @farjasju

### Documentation (DOC)
- Various improvements on docstrings and examples by @j-t-1

### Robustness (ROB)
- Cope with missing Standard 14 fonts in fields (py-pdf#2677) by @pubpub-zz
- Improve inline image extraction (py-pdf#2622) by @pubpub-zz
- Cope with loops in Fields tree (py-pdf#2656) by @pubpub-zz
- Discard /I in choice fields for compatibility with Acrobat (py-pdf#2614) by @pubpub-zz
- Cope with some issues in pillow (py-pdf#2595) by @pubpub-zz
- Cope with some image extraction issues (py-pdf#2591) by @pubpub-zz

### Maintenance (MAINT)
- Deprecate interiour_color with replacement interior_color (py-pdf#2706) by @j-t-1
- Add deprecate_with_replacement to PdfWriter.find_bookmark (py-pdf#2674) by @j-t-1

### Code Style (STY)
- Change Link to be a non-markup annotation (py-pdf#2714) by @j-t-1

[Full Changelog](py-pdf/pypdf@4.2.0...4.3.0)
Version 4.2.0, 2024-04-07

## What's new

### New Features (ENH)
- Allow multiple charsets for NameObject.read_from_stream (py-pdf#2585)
- Add support for /Kids in page labels (py-pdf#2562)
- Allow to update fields on many pages (py-pdf#2571)
- Tolerate PDF with invalid xref pointed objects (py-pdf#2335)
- Add Enforce from PDF2.0 in viewer_preferences (py-pdf#2511)
- Add += and -= operators to ArrayObject (py-pdf#2510)

### Bug Fixes (BUG)
- Fix merge_page sometimes generating unknown operator 'QQ' (py-pdf#2588)
- Fix fields update where annotations are kids of field (py-pdf#2570)
- Process CMYK images without a filter correctly (py-pdf#2557)
- Extract text in layout mode without finding resources (py-pdf#2555)
- Prevent recursive loop in some PDF files (py-pdf#2505)

### Robustness (ROB)
- Tolerate "truncated" xref (py-pdf#2580)
- Replace error by warning for EOD in RunLengthDecode/ASCIIHexDecode (py-pdf#2334)
- Rebuild xref table if one entry is invalid (py-pdf#2528)
- Robustify stream extraction (py-pdf#2526)

### Documentation (DOC)
- Update release process for latest changes (py-pdf#2564)
- Encryption/decryption: Clone document instead of copying all pages (py-pdf#2546)
- Minor improvements (py-pdf#2542)
- Update annotation list (py-pdf#2534)
- Update references and formatting (py-pdf#2529)
- Correct threads reference, plus minor changes (py-pdf#2521)
- Minor readability increases (py-pdf#2515)
- Simplify PaperSize examples (py-pdf#2504)
- Minor improvements (py-pdf#2501)

### Developer Experience (DEV)
- Remove unused dependencies (py-pdf#2572)
- Remove page labels PR link from message (py-pdf#2561)
- Fix changelog generator regarding whitespace and handling of "Other" group (py-pdf#2492)
- Add REL to known PR prefixes (py-pdf#2554)
- Release using the REL commit instead of git tag (py-pdf#2500)
- Unify code between PdfReader and PdfWriter (py-pdf#2497)
- Bump softprops/action-gh-release from 1 to 2 (py-pdf#2514)

### Maintenance (MAINT)
- Ressources → Resources (and internal name childs) (py-pdf#2550)
- Fix typos found by codespell (py-pdf#2549)
- Update Read the Docs configuration (py-pdf#2538)
- Add root_object, _info and _ID to PdfReader (py-pdf#2495)

### Testing (TST)
- Allow loading truncated images if required (py-pdf#2586)
- Fix download issues from py-pdf#2562 (py-pdf#2578)
- Improve test_get_contents_from_nullobject to show real use-case (py-pdf#2524)
- Add missing test annotations (py-pdf#2507)

[Full Changelog](py-pdf/pypdf@4.1.0...4.2.0)
Version 4.1.0, 2024-03-03

## What's new

### New Features (ENH)
- Add get_pages_from_field (py-pdf#2494) by @pubpub-zz
- Add reattach_fields function (py-pdf#2480) by @pubpub-zz
- Automatic access to pointed object for IndirectObject (py-pdf#2464) by @pubpub-zz

### Bug Fixes (BUG)
- missing error on name without leading / (py-pdf#2387) by @Rak424
- encode_pdfdocencoding() always returns bytes (py-pdf#2440) by @sbourlon
- BI in text content identified as image tag (py-pdf#2459) by @pubpub-zz

### Robustness (ROB)
- Missing basefont entry in type 3 font (py-pdf#2469) by @pubpub-zz

### Documentation (DOC)
- Amend robustness documentation (py-pdf#2479) by @j-t-1

### Developer Experience (DEV)
- Fix changelog for UTF-8 characters (py-pdf#2462) by @stefan6419846

### Maintenance (MAINT)
- Add _get_page_number_from_indirect in writer (py-pdf#2493) by @pubpub-zz
- Remove user assignment for feature requests (py-pdf#2483) by @stefan6419846
- Remove reference to old 2.0.0 branch (py-pdf#2482) by @stefan6419846

### Testing (TST)
- Fix benchmark failures (py-pdf#2481) by @stefan6419846
- Resolve file naming conflict in test_iss1767 (py-pdf#2445) by @sbourlon

[Full Changelog](py-pdf/pypdf@4.0.2...4.1.0)
Version 4.0.2, 2024-02-18

## What's new

### Bug Fixes (BUG)
- Use NumberObject for /Border elements of annotations (py-pdf#2451) by @rsinger417

### Documentation (DOC)
- Document easier way to update metadata (py-pdf#2454) by @stefan6419846
- Typo `Polyline` → `PolyLine` in adding-pdf-annotations.md (py-pdf#2426) by @CWKSC

### Developer Experience (DEV)
- Bump codecov/codecov-action from 3 to 4 (py-pdf#2430) by @dependabot[bot]

### Testing (TST)
- Avoid catching not emitted warnings (py-pdf#2429) by @stefan6419846

[Full Changelog](py-pdf/pypdf@4.0.1...4.0.2)
Version 4.0.1, 2024-01-28

## What's new

### Bug Fixes (BUG)
- layout mode text extraction ZeroDivisionError (py-pdf#2417) by @shartzog

### Testing (TST)
- Skip tests using fpdf2 if it's not installed (py-pdf#2419) by @MartinThoma

[Full Changelog](py-pdf/pypdf@4.0.0...4.0.1)
Version 4.0.0, 2024-01-19

## What's new

pypdf==4.0.0 is a big milestone forward:

* We finally have a layout-mode text extraction. This enables users who want to detect / extract tables with heuristics to give it a try.
* We deprecated a lot of the old PyPDF2 API that was either not following PEP8 naming styles or was not using a property. Users coming from PyPDF2 might want to switch first to pypdf<4.0.0 to get helpful error messages that show the new API in their specific cases.

A big 'Thank you!' to the whole pypdf community for your work. Thanks to you, pypdf is better than ever. Kudos to @shartzog who added the layout-mode with his first contribution!

### Deprecations (DEP)
- Drop Python 3.6 support (py-pdf#2369) by @MartinThoma
- Remove deprecated code (py-pdf#2367) by @MartinThoma
- Remove deprecated XMP properties (py-pdf#2386) by @stefan6419846

### New Features (ENH)
- Add "layout" mode for text extraction (py-pdf#2388) by @shartzog
- Add Jupyter Notebook integration for PdfReader (py-pdf#2375) by @MartinThoma
- Improve/rewrite PDF permission retrieval (py-pdf#2400) by @stefan6419846

### Bug Fixes (BUG)
- PdfWriter.add_uri was setting the wrong type (py-pdf#2406) by @pmiller66
- Add support for GBK2K cmaps (py-pdf#2385) by @stefan6419846

### Documentation (DOC)
- Add pmiller66 for py-pdf#2406 as a contributor by @MartinThoma
- Add missing expand parameter (py-pdf#2393) by @Atomnp
- Resolve build warnings (py-pdf#2380) by @stefan6419846
- Fix testing prerequisites (py-pdf#2381) by @stefan6419846
- Improve formatting of contributors page (py-pdf#2383) by @stefan6419846
- Add Tobeabellwether as a contributor for py-pdf#2341 by @MartinThoma

### Developer Experience (DEV)
- Make dependabot aware of our PR prefixes (py-pdf#2415) by @stefan6419846
- Fail on Sphinx issues (py-pdf#2405) by @stefan6419846
- Move title check to own workflow (py-pdf#2384) by @MasterOdin
- Write to temporary files instead of the working directory (py-pdf#2379) by @stefan6419846
- Ensure that the PR titles have the correct format (py-pdf#2378) by @stefan6419846

### Maintenance (MAINT)
- Return None instead of -1 when page is not attached (py-pdf#2376) by @MartinThoma
- Complete FileSpecificationDictionaryEntries constants (py-pdf#2416) by @MartinThoma
- Replace warning with logging.error (py-pdf#2377) by @MartinThoma

### Testing (TST)
- Add missing pytest.mark.samples annotations (py-pdf#2412) by @kitterma
- Correctly close temporary files (py-pdf#2396) by @stefan6419846
- Fix side effect py-pdf#2379 (py-pdf#2395) by @pubpub-zz
- Add test for layout extraction mode (py-pdf#2390) by @MartinThoma

### Code Style (STY)
- Use the UserAccessPermissions enum (py-pdf#2398) by @MartinThoma
- Run black (py-pdf#2370) by @MartinThoma

[Full Changelog](py-pdf/pypdf@3.17.4...4.0.0)