feature: unarchive files, add support for online files by lucasrodes · Pull Request #45 · owid/owid-datautils-py · GitHub
This repository was archived by the owner on Nov 1, 2023. It is now read-only.

feature: unarchive files, add support for online files #45

Merged: 20 commits, Oct 17, 2022


lucasrodes (Member) commented Oct 14, 2022

Overview

Additions:

  • New module owid.datautils.decorators.
  • Extends the capabilities of decompress_file so it can handle .tar.gz and .tar.bz2.

Improves changes from PR #42.

New module owid.datautils.decorators

This PR creates a new module, owid.datautils.decorators. Decorators are a handy way to extend what a function can do. You can read more about them in this guide.

The first decorator is enable_file_download, which adds the functionality to read or process a file directly from a URL.

Suppose you have the following function, which reads a LOCAL file and processes it.

def process_file(path: str) -> None:
    """Read the local file in `path` and process it"""
    ...

Now, imagine that you want this function to also accept a remote file (hosted at a URL), download it, and then apply the same processing. With the decorator, you can do this by adding the following line above the function declaration:

@enable_file_download("path")

Here, the decorator's argument is the name of the process_file parameter that holds the input path to the file (which can now also be a URL).

So, all together:

from owid.datautils.decorators import enable_file_download


@enable_file_download("path")
def process_file(path: str) -> None:
    """Read the local file in `path` and process it"""
    ...
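
To give a rough idea of how a decorator of this kind can work, here is a minimal sketch. It is not the code from this PR: enable_file_download_sketch and count_bytes are hypothetical names, the download uses the standard library's urllib instead of the project's own web utilities, and treating any path that starts with a URL scheme as remote is an assumption.

```python
import functools
import inspect
import shutil
import tempfile
import urllib.request
from pathlib import Path
from typing import Any, Callable
from urllib.parse import urlparse


def enable_file_download_sketch(path_arg_name: str) -> Callable:
    """Hypothetical stand-in for `enable_file_download` (not the library's code)."""

    def decorator(func: Callable) -> Callable:
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Map positional and keyword arguments to their parameter names.
            bound = signature.bind(*args, **kwargs)
            path = str(bound.arguments[path_arg_name])
            # Assumption: a path starting with one of these schemes is remote.
            if not path.startswith(("http://", "https://", "file://")):
                return func(*args, **kwargs)
            with tempfile.TemporaryDirectory() as tmp_dir:
                # Keep the original file name so extension-based logic still works.
                name = Path(urlparse(path).path).name or "downloaded_file"
                local_path = Path(tmp_dir) / name
                with urllib.request.urlopen(path) as response:
                    with open(local_path, "wb") as out:
                        shutil.copyfileobj(response, out)
                # Call the wrapped function with the local copy instead of the URL.
                bound.arguments[path_arg_name] = str(local_path)
                return func(*bound.args, **bound.kwargs)

        return wrapper

    return decorator


@enable_file_download_sketch("path")
def count_bytes(path: str) -> int:
    """Return the size in bytes of the local file at `path`."""
    return Path(path).stat().st_size
```

With this sketch, count_bytes("data.csv") reads the local file directly, while passing a URL makes the wrapper download the file to a temporary folder first. The temporary copy is deleted as soon as the wrapped function returns.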

.tar.gz and .tar.bz2

decompress_file should now also support tar archives (.tar.gz and .tar.bz2). I have used the standard library module tarfile. Tests have been added, too.
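
As an illustration of the tarfile approach, here is a minimal sketch. It is not the implementation in owid/datautils/io/archive.py: decompress_tar_sketch and its parameters are hypothetical, chosen only for this example.

```python
import tarfile
from pathlib import Path


def decompress_tar_sketch(input_file: str, output_folder: str) -> None:
    """Hypothetical helper: extract a .tar.gz or .tar.bz2 archive into `output_folder`."""
    Path(output_folder).mkdir(parents=True, exist_ok=True)
    # Mode "r:*" lets tarfile detect the compression (gzip, bzip2, ...) on its own.
    with tarfile.open(input_file, mode="r:*") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions offer extraction filters; "data" rejects
            # unsafe members such as absolute paths.
            archive.extractall(path=output_folder, filter="data")
        else:
            archive.extractall(path=output_folder)
```

Because the compression is detected when the archive is opened, the same call covers both formats. Archives from untrusted sources still deserve care, since on Python versions without extraction filters extractall can write outside the target folder.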

@lucasrodes lucasrodes added the enhancement New feature or request label Oct 14, 2022
@lucasrodes lucasrodes changed the title from "enhance(online files)" to "feature: unarchive files, add support for online files" Oct 14, 2022
@lucasrodes lucasrodes changed the base branch from add-extract-zip to main October 15, 2022 19:11
codecov bot commented Oct 16, 2022

Codecov Report

Merging #45 (6c8b945) into main (fc013f7) will increase coverage by 1.30%.
The diff coverage is 98.00%.

@@            Coverage Diff             @@
##             main      #45      +/-   ##
==========================================
+ Coverage   86.10%   87.40%   +1.30%     
==========================================
  Files           8       10       +2     
  Lines         511      548      +37     
==========================================
+ Hits          440      479      +39     
+ Misses         71       69       -2     

Impacted Files                  Coverage Δ
owid/datautils/io/local.py      100.00% <ø> (+7.69%) ⬆️
owid/datautils/io/archive.py    95.83% <95.83%> (ø)
owid/datautils/decorators.py    100.00% <100.00%> (ø)


pabloarosado (Collaborator) left a comment

Thank you for the detailed PR! A few minor concerns:

  • Now that we have datautils.io.json, datautils.io.archive and others, we may end up not knowing where things actually are when trying to load a utils function (like load_json). So maybe we should import all utils functions in datautils.io.__init__ (or even datautils.__init__, given that the risk of duplicate function names is low) so that they are always easily accessible?
  • I think your initial concern was to avoid having interdependencies among modules. But I don't see how this has changed: Now io.archive imports decorators, and decorators imports web. In any case, I don't think we should worry about this.

Feel free to merge!

@lucasrodes lucasrodes merged commit 631a3fb into main Oct 17, 2022
@lucasrodes lucasrodes deleted the enhance/io-local-online branch October 17, 2022 14:25