Tags: gentisaliu/PyPDF2
Version 2.0.0, 2022-06-01

The 2.0.0 release of PyPDF2 includes three core changes:

1. Dropping support for Python 3.5 and older.
2. Introducing type annotations.
3. Interface changes, mostly to have PEP8-compliant names.

We introduced a [deprecation process](py-pdf#930) that hopefully helps users to avoid unexpected breaking changes.

Breaking Changes (DEP):
- PyPDF2 2.0 requires Python 3.6+. Support for Python 2.7 and 3.5 was dropped.
- PdfFileReader: The "warndest" parameter was removed.
- PdfFileReader and PdfFileMerger no longer have the `overwriteWarnings` parameter. The new behavior is `overwriteWarnings=False`.
- merger: OutlinesObject was removed without replacement.
- merger.py ➔ _merger.py: You must import PdfFileMerger from PyPDF2 directly.
- utils:
  * `ConvertFunctionsToVirtualList` was removed
  * `formatWarning` was removed
  * `isInt(obj)`: Use `isinstance(obj, int)` instead
  * `u_(s)`: Use `s` directly
  * `chr_(c)`: Use `chr(c)` instead
  * `barray(b)`: Use `bytearray(b)` instead
  * `isBytes(b)`: Use `isinstance(b, bytes)` instead
  * `xrange_fn`: Use `range` instead
  * `string_type`: Use `str` instead
  * `isString(s)`: Use `isinstance(s, str)` instead
  * `_basestring`: Use `str` instead
- All Exceptions are now in `PyPDF2.errors`:
  * PageSizeNotDefinedError
  * PdfReadError
  * PdfReadWarning
  * PyPdfError
- `PyPDF2.pdf` (the `pdf` module) no longer exists. The contents were moved within the library. You should most likely import directly from `PyPDF2` instead. The `RectangleObject` is in `PyPDF2.generic`.
- The `Resources`, `Scripts`, and `Tests` will no longer be part of the distribution files on PyPI. This should have little to no impact on most people. The `Tests` are renamed to `tests`, the `Resources` are renamed to `resources`. Both are still in the git repository. The `Scripts` are now in https://github.com/py-pdf/cpdf. `Sample_Code` was moved to the `docs`.

For a full list of deprecated functions, please see the changelog of version 1.28.0.
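The removed `PyPDF2.utils` helpers map directly onto Python 3 built-ins. A minimal sketch of the replacements listed above, using only the standard library; the sample values are made up for illustration:

```python
# Python 3 built-ins that stand in for the removed PyPDF2.utils helpers.
# The sample values below are illustrative only.

value = 42
text = "outline"
raw = b"%PDF-1.7"

# isInt(obj)  -> isinstance(obj, int)
is_int = isinstance(value, int)

# isString(s) -> isinstance(s, str)
is_string = isinstance(text, str)

# isBytes(b)  -> isinstance(b, bytes)
is_bytes = isinstance(raw, bytes)

# chr_(c)     -> chr(c)
percent_sign = chr(37)

# barray(b)   -> bytearray(b)
mutable = bytearray(raw)

# xrange_fn   -> range
indices = list(range(3))

print(is_int, is_string, is_bytes, percent_sign, mutable[:4], indices)
```

Keep in mind that `isinstance(True, int)` is also `True`, because `bool` is a subclass of `int`.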
New Features (ENH):
- Improve space setting for text extraction (py-pdf#922)
- Allow setting the decryption password in `PdfReader.__init__` (py-pdf#920)
- Add Page.add_transformation (py-pdf#883)

Bug Fixes (BUG):
- Fix error adding transformation to page without /Contents (py-pdf#908)

Robustness (ROB):
- Cope with invalid length in streams (py-pdf#861)

Documentation (DOC):
- Fix style of 1.25 and 1.27 patch notes (py-pdf#927)
- Transformation (py-pdf#907)

Developer Experience (DEV):
- Create flake8 config file (py-pdf#916)
- Use relative imports (py-pdf#875)

Maintenance (MAINT):
- Use Python 3.6 language features (py-pdf#849)
- Add wrapper function for PendingDeprecationWarnings (py-pdf#928)
- Use new PEP8 compliant names (py-pdf#884)
- Explicitly represent transformation matrix (py-pdf#878)
- Inline PAGE_RANGE_HELP string (py-pdf#874)
- Remove unnecessary generics imports (py-pdf#873)
- Remove star imports (py-pdf#865)
- merger.py ➔ _merger.py (py-pdf#864)
- Type annotations for all functions/methods (py-pdf#854)
- Add initial type support with mypy (py-pdf#853)

Testing (TST):
- Regression test for xmp_metadata converter (py-pdf#923)
- Checkout submodule sample-files for benchmark
- Add text extracting performance benchmark
- Use new PyPDF2 API in benchmark (py-pdf#902)
- Make test suite fail for uncaught warnings (py-pdf#892)
- Remove -OO testrun from CI (py-pdf#901)
- Improve tests for convert_to_int (py-pdf#899)

Full Changelog: py-pdf/pypdf@1.28.4...2.0.0
Version 1.28.4, 2022-05-29

Bug Fixes (BUG):
- XmpInformation._converter_date was unusable (py-pdf#921)

Full Changelog: py-pdf/pypdf@1.28.3...1.28.4
Version 1.28.3

Deprecations (DEP):
- PEP8 renaming (py-pdf#905)

Bug Fixes (BUG):
- XmpInformation missing method _getText (py-pdf#917)
- Fix PendingDeprecationWarning on _merge_page (py-pdf#904)

Full Changelog: py-pdf/pypdf@1.28.2...1.28.3
Version 1.28.2, 2022-05-23

Bug Fixes (BUG):
- PendingDeprecationWarning for getContents (py-pdf#893)
- PendingDeprecationWarning on using PdfMerger (py-pdf#891)
Version 1.28.1, 2022-05-22

Bug Fixes (BUG):
- Incorrectly show deprecation warnings on internal usage (py-pdf#887)

Maintenance (MAINT):
- Add stacklevel=2 to deprecation warnings (py-pdf#889)
- Remove duplicate warnings imports (py-pdf#888)

Full Changelog: py-pdf/pypdf@1.28.0...1.28.1
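Passing `stacklevel=2` makes a warning point at the code that called the deprecated function rather than at the `warnings.warn` line inside the library. A minimal sketch of that effect using only the standard library; `get_num_pages_old` and `call_site` are hypothetical names for illustration, not PyPDF2 functions:

```python
import warnings


def get_num_pages_old():
    """Hypothetical deprecated function, not part of PyPDF2."""
    warnings.warn(
        "get_num_pages_old is deprecated; use len(reader.pages) instead",
        PendingDeprecationWarning,
        stacklevel=2,  # attribute the warning to our caller's line
    )
    return 0


def call_site():
    return get_num_pages_old()  # the recorded warning refers to this line


with warnings.catch_warnings(record=True) as caught:
    # PendingDeprecationWarning is ignored by default, so enable it here.
    warnings.simplefilter("always")
    result = call_site()

first = caught[0]
print(first.category.__name__, first.filename, first.lineno)
```

With the default `stacklevel=1`, the reported line would instead be the `warnings.warn` call itself, which tells users nothing about where their own code needs to change.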
Version 1.28.0, 2022-05-22

This release adds many deprecation warnings in preparation for the PyPDF2 2.0.0 release. Most changes switch function, method, and variable names to snake_case and replace getter methods with properties.

Maintenance (MAINT):
- Remove IronPython Fallback for zlib (py-pdf#868)

Full Changelog: py-pdf/pypdf@1.27.12...1.27.13

* Make the `PyPDF2.utils` module private
* Rename of core classes:
  * PdfFileReader ➔ PdfReader
  * PdfFileWriter ➔ PdfWriter
  * PdfFileMerger ➔ PdfMerger
* Use PEP8 conventions for function names and parameters
* If a property and a getter-method are both present, use the property

In many places:
- getObject ➔ get_object
- writeToStream ➔ write_to_stream
- readFromStream ➔ read_from_stream

PyPDF2.generic:
- readObject ➔ read_object
- convertToInt ➔ convert_to_int
- DocumentInformation.getText ➔ DocumentInformation._get_text: This method should typically not be used; please let me know if you need it.

PdfReader class:
- `reader.getPage(pageNumber)` ➔ `reader.pages[page_number]`
- `reader.getNumPages()` / `reader.numPages` ➔ `len(reader.pages)`
- getDocumentInfo ➔ metadata
- flattenedPages attribute ➔ flattened_pages
- resolvedObjects attribute ➔ resolved_objects
- xrefIndex attribute ➔ xref_index
- getNamedDestinations / namedDestinations attribute ➔ named_destinations
- getPageLayout / pageLayout ➔ page_layout attribute
- getPageMode / pageMode ➔ page_mode attribute
- getIsEncrypted / isEncrypted ➔ is_encrypted attribute
- getOutlines ➔ get_outlines
- readObjectHeader ➔ read_object_header (TODO: read vs get?)
- cacheGetIndirectObject ➔ cache_get_indirect_object (TODO: public vs private?)
- cacheIndirectObject ➔ cache_indirect_object (TODO: public vs private?)
- getDestinationPageNumber ➔ get_destination_page_number
- readNextEndLine ➔ read_next_end_line
- _zeroXref ➔ _zero_xref
- _authenticateUserPassword ➔ _authenticate_user_password
- _pageId2Num attribute ➔ _page_id2num
- _buildDestination ➔ _build_destination
- _buildOutline ➔ _build_outline
- _getPageNumberByIndirect(indirectRef) ➔ _get_page_number_by_indirect(indirect_ref)
- _getObjectFromStream ➔ _get_object_from_stream
- _decryptObject ➔ _decrypt_object
- _flatten(..., indirectRef) ➔ _flatten(..., indirect_ref)
- _buildField ➔ _build_field
- _checkKids ➔ _check_kids
- _writeField ➔ _write_field
- _write_field(..., fieldAttributes) ➔ _write_field(..., field_attributes)
- _read_xref_subsections(..., getEntry, ...) ➔ _read_xref_subsections(..., get_entry, ...)

PdfWriter class:
- `writer.getPage(pageNumber)` ➔ `writer.pages[page_number]`
- `writer.getNumPages()` ➔ `len(writer.pages)`
- addMetadata ➔ add_metadata
- addPage ➔ add_page
- addBlankPage ➔ add_blank_page
- addAttachment(fname, fdata) ➔ add_attachment(filename, data)
- insertPage ➔ insert_page
- insertBlankPage ➔ insert_blank_page
- appendPagesFromReader ➔ append_pages_from_reader
- updatePageFormFieldValues ➔ update_page_form_field_values
- cloneReaderDocumentRoot ➔ clone_reader_document_root
- cloneDocumentFromReader ➔ clone_document_from_reader
- getReference ➔ get_reference
- getOutlineRoot ➔ get_outline_root
- getNamedDestRoot ➔ get_named_dest_root
- addBookmarkDestination ➔ add_bookmark_destination
- addBookmarkDict ➔ add_bookmark_dict
- addBookmark ➔ add_bookmark
- addNamedDestinationObject ➔ add_named_destination_object
- addNamedDestination ➔ add_named_destination
- removeLinks ➔ remove_links
- removeImages(ignoreByteStringObject) ➔ remove_images(ignore_byte_string_object)
- removeText(ignoreByteStringObject) ➔ remove_text(ignore_byte_string_object)
- addURI ➔ add_uri
- addLink ➔ add_link
- getPage(pageNumber) ➔ get_page(page_number)
- getPageLayout / setPageLayout / pageLayout ➔ page_layout attribute
- getPageMode / setPageMode / pageMode ➔ page_mode attribute
- _addObject ➔ _add_object
- _addPage ➔ _add_page
- _sweepIndirectReferences ➔ _sweep_indirect_references

PdfMerger class:
- `__init__` parameter: strict=True ➔ strict=False (the PdfFileMerger still has the old default)
- addMetadata ➔ add_metadata
- addNamedDestination ➔ add_named_destination
- setPageLayout ➔ set_page_layout
- setPageMode ➔ set_page_mode

Page class:
- artBox / bleedBox / cropBox / mediaBox / trimBox ➔ artbox / bleedbox / cropbox / mediabox / trimbox
- getWidth, getHeight ➔ width / height
- getLowerLeft_x / getUpperLeft_x ➔ left
- getUpperRight_x / getLowerRight_x ➔ right
- getLowerLeft_y / getLowerRight_y ➔ bottom
- getUpperRight_y / getUpperLeft_y ➔ top
- getLowerLeft / setLowerLeft ➔ lower_left property
- upperRight ➔ upper_right
- mergePage ➔ merge_page
- rotateClockwise / rotateCounterClockwise ➔ rotate_clockwise
- _mergeResources ➔ _merge_resources
- _contentStreamRename ➔ _content_stream_rename
- _pushPopGS ➔ _push_pop_gs
- _addTransformationMatrix ➔ _add_transformation_matrix
- _mergePage ➔ _merge_page

XmpInformation class:
- getElement(..., aboutUri, ...) ➔ get_element(..., about_uri, ...)
- getNodesInNamespace(..., aboutUri, ...) ➔ get_nodes_in_namespace(..., about_uri, ...)
- _getText ➔ _get_text

utils.py:
- matrixMultiply ➔ matrix_multiply
- RC4_encrypt is moved to the security module
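When migrating a code base, it can help to scan for the old names before upgrading. The sketch below is a hypothetical helper, not part of PyPDF2: `RENAMES` holds only a small subset of the renames listed above, and `find_old_names` is a name invented for this example. The regular expression matches whole identifiers only.

```python
import re

# Small subset of the renames listed above (old name -> new name).
RENAMES = {
    "PdfFileReader": "PdfReader",
    "PdfFileWriter": "PdfWriter",
    "PdfFileMerger": "PdfMerger",
    "getObject": "get_object",
    "getDocumentInfo": "metadata",
    "addPage": "add_page",
    "addBlankPage": "add_blank_page",
    "mergePage": "merge_page",
}

# \b keeps "addPage" from matching inside "addPageLabel" or "_addPage".
_OLD_NAME = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def find_old_names(source):
    """Return (line_number, old_name, new_name) for each deprecated name found."""
    hits = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in _OLD_NAME.finditer(line):
            old = match.group(1)
            hits.append((line_number, old, RENAMES[old]))
    return hits


sample = """\
from PyPDF2 import PdfFileReader, PdfFileWriter
writer = PdfFileWriter()
writer.addPage(page)
"""

for hit in find_old_names(sample):
    print(hit)
```

Some renames are not one-to-one (for example, `reader.getPage(n)` becomes `reader.pages[n]`), so a scanner like this only flags lines for review; it does not rewrite them.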
Version 1.27.12, 2022-05-02

Bug Fixes (BUG):
- _rebuild_xref_table expects trailer to be a dict (py-pdf#857)

Documentation (DOC):
- Security Policy

Full Changelog: py-pdf/pypdf@1.27.11...1.27.12
Version 1.27.11, 2022-05-02

Bug Fixes (BUG):
- Incorrectly issued xref warning/exception (py-pdf#855)

Full Changelog: py-pdf/pypdf@1.27.10...1.27.11
Version 1.27.10, 2022-05-01

Robustness (ROB):
- Handle missing destinations in reader (py-pdf#840)
- warn-only in readStringFromStream (py-pdf#837)
- Fix corruption in startxref or xref table (py-pdf#788 and py-pdf#830)

Documentation (DOC):
- Project Governance (py-pdf#799)
- History of PyPDF2
- PDF feature/version support (py-pdf#816)
- More details on text parsing issues (py-pdf#815)

Developer Experience (DEV):
- Add benchmark command to Makefile
- Ignore IronPython parts for code coverage (py-pdf#826)

Maintenance (MAINT):
- Split pdf module (py-pdf#836)
- Separated CCITTFax param parsing/decoding (py-pdf#841)
- Update requirements files

Testing (TST):
- Use external repository for larger/more PDFs for testing (py-pdf#820)
- Swap incorrect test names (py-pdf#838)
- Add test for PdfFileReader and page properties (py-pdf#835)
- Add tests for PyPDF2.generic (py-pdf#831)
- Add tests for utils, form fields, PageRange (py-pdf#827)
- Add test for ASCII85Decode (py-pdf#825)
- Add test for FlateDecode (py-pdf#823)
- Add test for filters.ASCIIHexDecode (py-pdf#822)

Code Style (STY):
- Apply pre-commit (black, isort) + use snake_case variables (py-pdf#832)
- Remove debug code (py-pdf#828)
- Documentation, Variable names (py-pdf#839)

Full Changelog: py-pdf/pypdf@1.27.9...1.27.10
Version 1.27.9, 2022-04-24

A change I would like to highlight is the performance improvement for large PDF files (py-pdf#808) 🎉

New Features (ENH):
- Add papersizes (py-pdf#800)
- Allow setting permission flags when encrypting (py-pdf#803)
- Allow setting form field flags (py-pdf#802)

Bug Fixes (BUG):
- TypeError in xmp._converter_date (py-pdf#813)
- Improve spacing for text extraction (py-pdf#806)
- Fix PDFDocEncoding Character Set (py-pdf#809)

Robustness (ROB):
- Use null ID when encrypted but no ID given (py-pdf#812)
- Handle recursion error (py-pdf#804)

Documentation (DOC):
- CMaps (py-pdf#811)
- The PDF Format + commit prefixes (py-pdf#810)
- Add compression example (py-pdf#792)

Developer Experience (DEV):
- Add Benchmark for Performance Testing (py-pdf#781)

Maintenance (MAINT):
- Validate PDF magic byte in strict mode (py-pdf#814)
- Make PdfFileMerger.addBookmark() behave like PdfFileWriter's (py-pdf#339)
- Quadratic runtime while parsing reduced to linear (py-pdf#808)

Testing (TST):
- Newlines in text extraction (py-pdf#807)

Full Changelog: py-pdf/pypdf@1.27.8...1.27.9
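The magic-byte validation mentioned under Maintenance can be illustrated as follows. This is a sketch of the general idea, not PyPDF2's implementation: the function name `check_pdf_header`, the choice of `ValueError`, and the message text are assumptions made for the example. It relies only on the fact that a PDF header starts with the bytes `%PDF-`.

```python
import io
import warnings

PDF_MAGIC = b"%PDF-"


def check_pdf_header(stream, strict=True):
    """Return True if the stream starts with the PDF magic bytes.

    Hypothetical helper: raises ValueError in strict mode, warns otherwise.
    """
    position = stream.tell()
    stream.seek(0)
    header = stream.read(len(PDF_MAGIC))
    stream.seek(position)  # leave the stream where we found it

    if header == PDF_MAGIC:
        return True
    message = f"expected {PDF_MAGIC!r} at start of file, found {header!r}"
    if strict:
        raise ValueError(message)
    warnings.warn(message)
    return False


print(check_pdf_header(io.BytesIO(b"%PDF-1.7\n...")))
```

Raising only in strict mode mirrors the changelog entry's wording: a lenient reader can still try to recover from a damaged header, while a strict one fails early.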