[PATCH] Setting hierarchy parameter in recursive function to None to a… by dherincx92 · Pull Request #19 · dherincx92/fpds · GitHub
[PATCH] Setting hierarchy parameter in recursive function to None to a… #19

Merged 4 commits on Jun 6, 2024
26 changes: 14 additions & 12 deletions README.md
@@ -1,6 +1,6 @@
# fpds
A no-frills parser for the Federal Procurement Data System (FPDS)
at https://www.fpds.gov/fpdsng_cms/index.php/en/.
A no-frills parser for the Federal Procurement Data System (FPDS) found
[here](https://www.fpds.gov/fpdsng_cms/index.php/en/).


## Motivation
@@ -17,15 +17,14 @@ To install this package for development, create a virtual environment
and install dependencies.

```
$ python3 -m venv venv
$ python3.8 -m venv venv
$ source venv/bin/activate
$ pip install -e .
```


## Usage
For a list of valid search criteria parameters, consult FPDS documentation
found at: https://www.fpds.gov/wiki/index.php/Atom_Feed_Usage. Parameters
found [here](https://www.fpds.gov/wiki/index.php/Atom_Feed_Usage). Parameters
will follow the `URL String` format shown in the link above, with the
following exceptions:

@@ -36,13 +35,11 @@ entire criteria string in quotes.

For example, `AGENCY_CODE:”3600”` should be used as `"AGENCY_CODE=3600"`.


Via CLI:
```
$ fpds parse "LAST_MOD_DATE=[2022/01/01, 2022/05/01]" "AGENCY_CODE=7504"
```


By default, data will be dumped into an `.fpds` folder at the user's
`$HOME` directory. If you wish to override this behavior, provide the `-o`
option. The directory will be created if it doesn't exist:
@@ -56,19 +53,22 @@ Same request via python interpreter:
from fpds import fpdsRequest

request = fpdsRequest(
target_database_url_env_key="SOME_ENVIRONMENT_VAR",
LAST_MOD_DATE="[2022/01/01, 2022/05/01]",
AGENCY_CODE="7504"
)

# handles automatic conversion of XML --> JSON
data = request()

# or conversely, you can call the explicit `process_records` method
data = request.process_records()

# URL magic method for assistance / debugging
url = request.__url__()

# if you wish to bypass `multiprocessing`
request = fpdsRequest(
LAST_MOD_DATE="[2022/01/01, 2022/05/01]",
AGENCY_CODE="7504"
)
data = request.run_asyncio_loop()
records = [xml.jsonified_entries() for xml in data]
```

For linting and formatting, we use `flake8` and `black`.
@@ -90,6 +90,8 @@ $ make test
```

## What's New
As of 06/05/2024, `v1.3.2` patches a bug that was caching attributes due to a misuse of a mutable default argument.
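
The pitfall behind this patch can be illustrated with a minimal sketch. The function names `collect_shared` and `collect_fresh` are hypothetical and are not part of `fpds`; they only show why a `dict()` default is shared across calls and how a `None` sentinel avoids it:

```python
from typing import Dict, Optional


def collect_shared(key: str, store: Dict[str, int] = {}) -> Dict[str, int]:
    # Pitfall: the default dict is built once, when the function is defined,
    # and is reused by every call that omits `store`.
    store[key] = len(store)
    return store


def collect_fresh(key: str, store: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    # None sentinel: a new dict is created on each call that omits `store`.
    if store is None:
        store = {}
    store[key] = len(store)
    return store


first = collect_shared("a")
second = collect_shared("b")  # still contains "a" from the previous call

fresh_first = collect_fresh("a")
fresh_second = collect_fresh("b")  # contains only "b"
```

Here `first` and `second` are the same object, so data from the first call leaks into the second, while `fresh_first` and `fresh_second` are independent dictionaries.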

`fpds` now supports asynchronous requests! As of `v1.3.0`, users can instantiate
the class as usual, but will now need to call the `process_records` method
to get records as JSON. Note: due to some recursive function calls in the XML
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -6,7 +6,7 @@ readme = "README.md"
license = {file = "LICENSE"}
requires-python = ">=3.8"
keywords = ["fpds", "python", "atom feed", "cli", "xml"]
version = "1.3.1"
version = "1.3.2"
classifiers = [
"Intended Audience :: Developers",
"Natural Language :: English",
3 changes: 2 additions & 1 deletion src/fpds/core/mixins.py
@@ -2,8 +2,9 @@
fpds mixin classes

author: derek663@gmail.com
last_updated: 01/20/2024
last_updated: 06/05/2024
"""

from xml.etree.ElementTree import Element, ElementTree


26 changes: 11 additions & 15 deletions src/fpds/core/xml.py
@@ -2,7 +2,7 @@
XML classes for parsing FPDS content.

author: derek663@gmail.com
last_updated: 01/20/2024
last_updated: 06/05/2024
"""

import re
@@ -299,7 +299,7 @@ def contract_type(self) -> str:

def get_entry_data(self) -> Dict[str, str]:
"""Extracts award data from an entry."""
entry_tags = dict() # type: Dict[str, str]
entry_tags = dict()
hierarchy = self.content_tag_hierarchy()

for prefix, tag in hierarchy.items():
@@ -314,7 +314,7 @@ def content_tag_hierarchy(
self,
element: Optional[Element] = None,
parent: Optional[str] = None,
hierarchy: Dict[str, str] = dict(),
hierarchy: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
"""Added on v1.2.0

@@ -358,20 +358,25 @@ def content_tag_hierarchy(
The hierarchy dictionary structure to be passed through each
recursive function call.
"""
if hierarchy is None:
hierarchy = {}

if element is None:
element = self.element # type: ignore

_parent = Parent(content=element)

# continue parsing XML hierarchy because children exist and we want
# to get every possible bit of data
if _parent.children():
for child in _parent.children():
_child = Parent(content=child, parent_name=parent)
parent_tag_name = _child.parent_child_hierarchy_name()
hierarchy[parent_tag_name] = child

self.content_tag_hierarchy(
element=child, parent=parent_tag_name, hierarchy=hierarchy
element=child,
parent=parent_tag_name,
hierarchy=hierarchy,
)
return hierarchy

@@ -385,22 +390,13 @@ def __init__(self, parent_name=None, *args, **kwargs):
super().__init__(*args, **kwargs)
self.parent_name = parent_name

@property
def tag_exclusions(self) -> List[str]:
"""Tag names that should be excluded from the hierarchy tree. Because
some of the XML hierarchy doesn't provide much value, we provide a
mechanism for `award_tag_hierarchy` to avoid using such tags in the
final string concatenation.
"""
return ["content", "IDV", "award"]

def children(self):
"""Returns children if they exist."""
if list(self.element):
return list(self.element)

def parent_child_hierarchy_name(self, delim="__"):
if self.parent_name and self.parent_name not in self.tag_exclusions:
if self.parent_name:
name = self.parent_name + delim + self.clean_tag
else:
name = self.clean_tag
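
For reviewers, the recursive flattening that `content_tag_hierarchy` performs can be approximated with a standalone sketch. This is not the project's code: `flatten` is a hypothetical helper, the sample XML is invented, and it assumes tags without namespaces (the real code relies on `clean_tag`). It uses the same `None` sentinel and `__` delimiter as the diff above:

```python
from typing import Dict, Optional
from xml.etree.ElementTree import Element, fromstring


def flatten(
    element: Element,
    parent: Optional[str] = None,
    hierarchy: Optional[Dict[str, Element]] = None,
    delim: str = "__",
) -> Dict[str, Element]:
    # A fresh dict per top-level call, so results never leak between calls.
    if hierarchy is None:
        hierarchy = {}

    for child in element:
        # Prefix the child's tag with its parent's hierarchy name, if any.
        name = f"{parent}{delim}{child.tag}" if parent else child.tag
        hierarchy[name] = child
        # Recurse with the same dict so nested tags accumulate in one place.
        flatten(child, parent=name, hierarchy=hierarchy, delim=delim)
    return hierarchy


# Invented sample document, used only for illustration.
doc = fromstring(
    "<award><vendor><name>ACME</name></vendor><amount>10</amount></award>"
)
tags = flatten(doc)
keys = sorted(tags)
other = flatten(fromstring("<a><b/></a>"))
```

Because `hierarchy` defaults to `None`, the second call on a different document yields only that document's tags; with the previous `dict()` default, entries from the first call would have carried over.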