
Pymarc Utilities is a set of functions that help with manipulating large MARC files. It works with the Pymarc library for handling bibliographic data encoded in MARC 21.


Pymarc Utilities

Pymarc Utilities offers a suite of functions designed to facilitate the handling and manipulation of MARC (Machine-Readable Cataloging) records, the international standard for bibliographic and related information. These utilities require Pymarc 5.1.2, a Python library, as a prerequisite.

The utilities include features for finding and replacing data within MARC records, such as specific tags, indicators, or subfields, and can even utilize regular expressions for more complex search patterns. Additionally, they provide the ability to swap linked fields, particularly useful for managing vernacular and Romanized fields in bibliographic records. Another notable function is the uncombining of diacritics, which separates combined UTF-8 characters into their base characters and diacritic marks, aiding in the normalization of text. Lastly, the utilities can count the number of records in a raw MARC file, providing a quick overview of the dataset size. These tools are essential for librarians, archivists, and anyone working with large volumes of bibliographic data, streamlining the process of cataloging and data management.

1- PyMARC Utilities:

Class FIND_AND_REPLACE

The FIND_AND_REPLACE class provides methods to locate specific data within records, such as tags, indicators, or subfields, and replace them as needed. This can be particularly useful for correcting or updating information across multiple entries. The find function, for example, can locate records with a specific tag, such as Tag 100, and can be refined further to search for records where the first indicator is 1 and the subfield $a contains a certain value, like 'James'. Regular expressions can also be employed for more complex queries, enhancing the precision of searches.

1.1) Find:

- The find functions allow you to search for a target field, indicators, or subfields.

- Finds MARC records that have Tag 100

- Finds MARC records that have Tag 100 and first indicator equals 1

- Finds MARC records where the tag is 100, the first indicator equals 1, and subfield $a equals James. You can use a regex pattern to match subfield values; for example, ^James finds subfield values that start with James.

- You can find data in a control field as well.

     find(record, sample_field)

See test_find_replace.py
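The matching criteria above can be sketched in plain Python. This is a conceptual illustration only, not the library's implementation: the dict-based field representation and the `matches()` helper are hypothetical.

```python
import re

def matches(field, tag, ind1=None, subfield=None, pattern=None):
    """Return True if a field matches the given tag, indicator,
    and optional subfield regex pattern."""
    if field["tag"] != tag:
        return False
    if ind1 is not None and field["ind1"] != ind1:
        return False
    if subfield is not None:
        value = field["subfields"].get(subfield, "")
        if pattern is not None and not re.search(pattern, value):
            return False
    return True

# A hypothetical 100 field: personal name main entry
field_100 = {"tag": "100", "ind1": "1",
             "subfields": {"a": "James Joyce,", "d": "1882-1941."}}

matches(field_100, "100")                                            # True
matches(field_100, "100", ind1="1", subfield="a", pattern="^James")  # True
```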

1.2) Find and Replace:

The find-and-replace functions allow you to search for a target field, indicators, or subfields and replace them with something else.

     find_and_replace(record, find_field, replace_with_field)

See test_find_replace.py
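The replace step can be sketched the same way. Again, this is a hypothetical illustration using a dict-based field, not the library's code; it applies a regex substitution to one subfield value.

```python
import re

def replace_subfield(field, code, pattern, replacement):
    """If subfield `code` matches `pattern`, rewrite it via regex substitution."""
    value = field["subfields"].get(code)
    if value is not None and re.search(pattern, value):
        field["subfields"][code] = re.sub(pattern, replacement, value)
    return field

# Fix a hypothetical typo in subfield $a of a 100 field
field = {"tag": "100", "ind1": "1", "subfields": {"a": "Jams Joyce,"}}
replace_subfield(field, "a", "^Jams", "James")
# field["subfields"]["a"] is now "James Joyce,"
```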

1.3) Swap linked fields:

The utilities facilitate the swapping of linked fields, which is invaluable for maintaining the integrity of vernacular and Romanized fields in bibliographic records.

The swap function makes the 880 fields the main fields and converts the main fields into linked 880 fields. Use this function if you want to make the vernacular fields the main fields and the Romanized fields the linked fields.

     swap_bib_linked_fields(record)

It converts this:

     =100 1\$6880-01$aMuṣṭafá, ʻAbd al-ʻAzīz.
     =880 1\$6100-01/(3/r$aمصطفى، عبد العزيز.

into this:

     =100 1\$6880-01$aمصطفى، عبد العزيز.
     =880 1\$6100-01$aMuṣṭafá, ʻAbd al-ʻAzīz.
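The linkage bookkeeping can be sketched in plain Python. This is a hedged illustration using hypothetical dict-based fields, not the library's implementation, and it ignores the script-identification suffix (e.g. /(3/r) that a real $6 may carry.

```python
def swap_linked(main, linked):
    """Swap the contents of a main field and its 880 linked field,
    rewriting each $6 so the tag linkage stays consistent."""
    main_tag = main["tag"]                               # e.g. "100"
    occurrence = main["subfields"]["6"].split("-")[1]    # e.g. "01"
    # The 880's content becomes the main field, pointing back at the 880
    new_main = {"tag": main_tag,
                "subfields": {**linked["subfields"], "6": f"880-{occurrence}"}}
    # The old main field's content becomes the 880, pointing at the main tag
    new_linked = {"tag": "880",
                  "subfields": {**main["subfields"], "6": f"{main_tag}-{occurrence}"}}
    return new_main, new_linked

main = {"tag": "100", "subfields": {"6": "880-01", "a": "Muṣṭafá, ʻAbd al-ʻAzīz."}}
linked = {"tag": "880", "subfields": {"6": "100-01", "a": "مصطفى، عبد العزيز."}}
new_main, new_linked = swap_linked(main, linked)
# new_main:   tag "100", $6 "880-01", $a now the vernacular (Arabic) form
# new_linked: tag "880", $6 "100-01", $a now the Romanized form
```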

2- PyMARC Utilities:

Class ENCODING

The uncombining of diacritics feature is another significant function, as it helps normalize text by separating combined UTF-8 characters into their base characters and diacritic marks.

2.1) Uncombine diacritics:

Replaces combined UTF-8 characters with uncombined characters (base character + diacritic).

For example: it changes the combined a-with-macron (ā, one code point) into two characters, a plus a combining macron (ā).

     uncombine_diacritics(record, skip_subfield_code)

Use '?' as skip_subfield_code if you want to uncombine the data in all subfields.
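This operation corresponds to Unicode canonical decomposition (NFD), which Python's standard library exposes directly; the snippet below illustrates the character-level transformation only, not the library's record-walking logic.

```python
import unicodedata

combined = "\u0101"  # "ā" as a single precomposed code point (LATIN SMALL LETTER A WITH MACRON)
uncombined = unicodedata.normalize("NFD", combined)

len(combined)    # 1
len(uncombined)  # 2: "a" followed by U+0304 COMBINING MACRON
```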

3- PyMARC Utilities:

Class File

The ability to count the number of records in a raw MARC file provides a quick assessment of the dataset's size, aiding in data management tasks.

3.1) Count number of records in a raw MARC file:

Retrieves the total number of MARC records in a file.

     get_records_count()
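Assuming a well-formed raw MARC (ISO 2709) file, every record ends with the record terminator byte 0x1D, so counting terminators counts records. The byte-counting helper below is an illustrative sketch, not the library's code:

```python
def count_marc_records(data: bytes) -> int:
    """Count records in raw MARC data by counting record terminators (0x1D)."""
    return data.count(b"\x1d")

# Three fake records, each ending with a field terminator (0x1E)
# and a record terminator (0x1D)
sample = b"00026nam...\x1e\x1d" * 3
count_marc_records(sample)  # 3
```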

3.2) Get MARC records:

This function retrieves a set of MARC records starting from a given record number in a file.

     get_records(5000,5009)

Returns a list of 10 records, starting from record number 5000 in the file and ending with record number 5009.
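Conceptually, this splits the raw file on record terminators and takes an inclusive slice. The sketch below uses 0-based indices and in-memory bytes as an assumption for illustration; the library itself works with record numbers and a file on disk.

```python
def get_record_slice(data: bytes, start: int, end: int):
    """Return raw records start..end inclusive (0-based) from raw MARC bytes."""
    # Split on the record terminator 0x1D, re-appending it to each chunk
    records = [chunk + b"\x1d" for chunk in data.split(b"\x1d") if chunk]
    return records[start:end + 1]

# Twenty fake records: rec00 ... rec19
data = b"".join(b"rec%02d\x1d" % i for i in range(20))
chunk = get_record_slice(data, 5, 9)
len(chunk)  # 5 records: indices 5 through 9 inclusive
```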

4- Export MARC fields to CSV file:

For instance, you might use Pymarc to extract all the titles, authors, and publication dates from a collection of MARC records and then save this information into a CSV file for further analysis or reporting.

The process involves writing a script in Python that utilizes the Pymarc library to read MARC records, select the desired fields, and then write those fields to a CSV file. There are resources and examples available online that can guide you through this process.

Class EXPORT_CSV

4.1) Export Normalized Fields:

This function removes all subfield codes and delimiters from a field and saves the field into one row of a CSV file. The CSV column headers are the tags in tags_list.

     normalized_fields_to_csv(records list of PyMARC records, list of tags)

     # CSV class instantiation
     fieldsto_csv = export_csv.EXPORT_CSV('normalized.csv')
     # normalized_fields_to_csv(records_list, tags_list)
     fieldsto_csv.normalized_fields_to_csv(records_list, ['100','245','650'])

Creates a CSV file with the following headers
100,245,650
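The normalization step can be sketched with the standard library alone. The MARC-breaker-style field strings, the `normalize_field()` helper, and the sample record dict below are all hypothetical illustrations of what "stripping subfield codes" means, not the library's implementation:

```python
import csv
import io
import re

def normalize_field(text: str) -> str:
    """Drop each "$x" subfield code marker but keep its value."""
    return re.sub(r"\$\w", " ", text).strip()

# Hypothetical record: one field string per tag
record = {"100": "$aJoyce, James,$d1882-1941.",
          "245": "$aUlysses /$cJames Joyce.",
          "650": "$aNovelists, Irish."}
tags = ["100", "245", "650"]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(tags)  # headers: 100,245,650
writer.writerow([normalize_field(record.get(t, "")) for t in tags])
```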

4.2) Export subfields:

It takes a 2D list of tags and subfields, like [["245","a","b","z"], ["300","a"], ["264","a","c"]], and exports the subfield values in one CSV row. This function retrieves only the first occurrence of repeated fields and subfields.

     subfields_to_csv(records list of PyMARC records, 2D list of tags and subfields codes)

     # Create a 2D list of tags and subfield codes
     thislist = [["245","a","b","h"], ["300","a"], ["264","a","c"]]
     # CSV class instantiation
     fieldsto_csv = export_csv.EXPORT_CSV('subfields.csv')
     # subfields_to_csv
     fieldsto_csv.subfields_to_csv(records_list, thislist)

Creates a CSV file with the following headers
245a,245b,245h,300a,264a,264c
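Building those "tag+code" headers from the 2D list can be sketched as follows; the nested dict record here is a hypothetical stand-in for a PyMARC record, shown only to illustrate the header and row layout:

```python
import csv
import io

# 2D list: each inner list is a tag followed by its subfield codes
spec = [["245", "a", "b", "h"], ["300", "a"], ["264", "a", "c"]]
headers = [tag + code for tag, *codes in spec for code in codes]

# Hypothetical record: tag -> {subfield code -> first-occurrence value}
record = {"245": {"a": "Ulysses /", "b": "", "h": ""},
          "300": {"a": "732 p."},
          "264": {"a": "Paris :", "c": "1922."}}
row = [record.get(tag, {}).get(code, "") for tag, *codes in spec for code in codes]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(headers)  # 245a,245b,245h,300a,264a,264c
writer.writerow(row)
```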

4.3) DB Normalized Export:

It extracts records into a single CSV file. Each record is written as multiple rows, and all rows belonging to one record can be linked by a primary key and the control number in field 001. This function also retains the sequence of tags and subfields.

     db_normalized_to_csv(records list of PyMARC records)

     # CSV class instantiation
     fieldsto_csv = export_csv.EXPORT_CSV('DB_Normalized.csv')
     # db_normalized_to_csv
     fieldsto_csv.db_normalized_to_csv(records_list)

Creates a CSV file with the following headers
PK,001,Tag_Sequence,Tag,Ind1,Ind2,Subfield_Squence,Subfield_code,Field_Value
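Flattening a record into that long format can be sketched as below. The tuple-based record structure is a hypothetical stand-in for a PyMARC record, and the header spelling is normalized here; the point is the one-row-per-subfield layout with sequence counters:

```python
import csv
import io

header = ["PK", "001", "Tag_Sequence", "Tag", "Ind1", "Ind2",
          "Subfield_Sequence", "Subfield_code", "Field_Value"]

# Hypothetical record: control number plus (tag, ind1, ind2, subfields) tuples
record = {"001": "ocm12345",
          "fields": [("100", "1", " ",
                      [("a", "Joyce, James,"), ("d", "1882-1941.")])]}

rows = []
pk = 1  # one primary key per record
for tag_seq, (tag, ind1, ind2, subfields) in enumerate(record["fields"], start=1):
    for sf_seq, (code, value) in enumerate(subfields, start=1):
        # One row per subfield, preserving tag and subfield order
        rows.append([pk, record["001"], tag_seq, tag, ind1, ind2,
                     sf_seq, code, value])

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(rows)
```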

Finally:

Overall, Pymarc Utilities serve as an essential resource for librarians, archivists, and anyone involved in the handling of extensive bibliographic data, ensuring efficient and accurate cataloging and data management operations.
