Pymarc Utilities offer a suite of functions designed to facilitate the handling and manipulation of MARC (Machine-Readable Cataloging) records, the international standard for bibliographic and related information. These utilities require Pymarc 5.1.2, a Python library, as a prerequisite for their operation.
The utilities include features for finding and replacing data within MARC records, such as specific tags, indicators, or subfields, and can even utilize regular expressions for more complex search patterns. Additionally, they provide the ability to swap linked fields, particularly useful for managing vernacular and Romanized fields in bibliographic records. Another notable function is the uncombining of diacritics, which separates combined UTF-8 characters into their base characters and diacritic marks, aiding in the normalization of text. Lastly, the utilities can count the number of records in a raw MARC file, providing a quick overview of the dataset size. These tools are essential for librarians, archivists, and anyone working with large volumes of bibliographic data, streamlining the process of cataloging and data management.
The FIND_AND_REPLACE class provides methods to locate specific data within records, such as tags, indicators, or subfields, and replace them as needed. This can be particularly useful for correcting or updating information across multiple entries. The find function, for example, can locate records with a specific tag, such as Tag 100, and can be refined further to search for records where the first indicator is 1 and the subfield $a contains a certain value, like 'James'. Regular expressions can also be employed for more complex queries, enhancing the precision of searches.
- The find functions allow you to search for a target field, indicators, or subfields.
- Finds MARC records that have Tag 100
- Finds MARC records that have Tag 100 and first indicator equals 1
- Finds MARC records where the tag is 100, the first indicator equals 1, and subfield $a equals James. You can also use a regex pattern to match subfield values; for example, ^James finds subfield values that start with James.
- You can find data in a control field as well.
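The project's own find API lives in the FIND_AND_REPLACE class; as a rough standard-library sketch of the idea (the function name and field layout here are illustrative, not the library's actual API), a find over MARC-breaker-style field strings could look like this:

```python
import re

# Hypothetical sketch: "find" over MARC-breaker-style field strings,
# e.g. "=100 1\$aJames, Henry." (tag, indicators, then $-delimited subfields).
def find_fields(fields, tag, ind1=None, sub_a_pattern=None):
    """Return fields matching a tag, an optional first indicator,
    and an optional regex on subfield $a."""
    hits = []
    for f in fields:
        if not f.startswith("=" + tag):
            continue
        if ind1 is not None and f[5] != ind1:  # first indicator follows "=NNN "
            continue
        if sub_a_pattern is not None:
            m = re.search(r"\$a([^$]*)", f)
            if not m or not re.search(sub_a_pattern, m.group(1)):
                continue
        hits.append(f)
    return hits

fields = [
    r"=100 1\$aJames, Henry.",
    r"=100 0\$aAusten, Jane.",
    r"=245 10$aThe ambassadors.",
]
print(find_fields(fields, "100"))                                # both 100 fields
print(find_fields(fields, "100", ind1="1"))                      # first field only
print(find_fields(fields, "100", ind1="1", sub_a_pattern="^James"))
```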
See test_find_replace.py. The find-and-replace functions allow you to search for a target field, indicators, or subfields, and replace them with something else.
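A minimal sketch of the replace step, again over breaker-style strings and using only the standard library (the real FIND_AND_REPLACE methods may take different arguments):

```python
import re

# Hypothetical sketch of find-and-replace on one MARC-breaker field string;
# the project's real class is FIND_AND_REPLACE (see test_find_replace.py),
# whose exact signatures may differ.
def replace_subfield(field, code, pattern, replacement):
    """Apply a regex substitution inside subfield $<code> of a breaker field."""
    def _sub(m):
        return "$" + code + re.sub(pattern, replacement, m.group(1))
    return re.sub(r"\$" + code + r"([^$]*)", _sub, field)

field = r"=100 1\$aJames, Henry,$eauthor."
print(replace_subfield(field, "a", r"^James", "JAMES"))
# -> =100 1\$aJAMES, Henry,$eauthor.
```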
See test_find_replace.py. The utilities also facilitate the swapping of linked fields, which is invaluable for maintaining the integrity of vernacular and Romanized fields in bibliographic records.
The swap function makes the 880 fields the main fields and converts the main fields into 880 linked fields. Use this function if you want to make the vernacular fields the main fields and the Romanized fields the linked fields.
It converts this:
=100 1\$6880-01$aMuṣṭafá, ʻAbd al-ʻAzīz.
=880 1\$6100-01/(3/r$aمصطفى، عبد العزيز.
To:
=100 1\$6880-01$aمصطفى، عبد العزيز.
=880 1\$6100-01$aMuṣṭafá, ʻAbd al-ʻAzīz.
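The transformation above can be sketched with the standard library (the project's real swap function may differ; the parsing and names here are illustrative): the data after the $6 linkage trades places between the main field and its 880 pair, the $6 occurrence tags are rewritten, and the 880's script suffix (e.g. "/(3/r") is dropped.

```python
import re

# Breaker field: "=TAG II$6NNN-OO[...]$<subfield data>"
PAT = r"=(\d{3}) (..)\$6([^$]*)(\$.*)"

def swap_linked(main, linked):
    """Swap subfield data between a main field and its linked 880 field."""
    mt, mi, m6, md = re.match(PAT, main).groups()
    lt, li, l6, ld = re.match(PAT, linked).groups()
    occ = m6.split("-")[1]                 # occurrence number, e.g. "01"
    new_main = f"={mt} {mi}$6880-{occ}{ld}"
    new_880 = f"=880 {li}$6{mt}-{occ}{md}"
    return new_main, new_880

main = "=100 1\\$6880-01$aMustafa, Abd al-Aziz."
linked = "=880 1\\$6100-01/(3/r$aمصطفى، عبد العزيز."
print(swap_linked(main, linked))
```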
The uncombining of diacritics feature is another significant function, as it helps normalize text by separating combined UTF-8 characters into their base characters and diacritic marks.
Replaces combined (precomposed) UTF-8 characters with uncombined characters (base character + diacritic).
For example: it changes the combined a-macron (ā) into two characters, a + combining macron (ā).
Use '?' as the skip_subfield_code if you want to uncombine the data in all subfields.
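In Unicode terms, uncombining is canonical decomposition (NFD); Python's standard library unicodedata module demonstrates the idea:

```python
import unicodedata

# NFD (canonical decomposition) splits a precomposed character into its
# base character plus combining diacritic.
combined = "\u0101"                       # ā as a single precomposed code point
uncombined = unicodedata.normalize("NFD", combined)
print([hex(ord(c)) for c in uncombined])  # ['0x61', '0x304'] -> a + combining macron
```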
The ability to count the number of records in a raw MARC file provides a quick assessment of the dataset's size, aiding in data management tasks.
Retrieves the total number of MARC records in a file. A companion function retrieves a dataset of MARC records starting from a given record number in a file; for example, it can return a list of 10 records starting at record number 5000 and ending at record number 5009.
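The counting itself can be sketched with nothing but the record-terminator convention (the helper names here are illustrative, not the project's API): every raw ISO 2709 / MARC record ends with the record terminator byte 0x1D, so counting those bytes gives the record count.

```python
RT = b"\x1d"  # MARC / ISO 2709 record terminator byte

def count_records(raw: bytes) -> int:
    """Count records in a raw MARC byte string by its terminators."""
    return raw.count(RT)

def record_slice(raw: bytes, start: int, n: int):
    """Return records start .. start+n-1 (0-based) as raw byte strings."""
    recs = [r + RT for r in raw.split(RT) if r]
    return recs[start:start + n]

# Two fake "records" (leaders faked; only the terminators matter here):
raw = b"00026nam a22...record one" + RT + b"00026nam a22...record two" + RT
print(count_records(raw))             # 2
print(len(record_slice(raw, 1, 10)))  # 1 (only one record exists after index 1)
```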
For instance, you might use Pymarc to extract all the titles, authors, and publication dates from a collection of MARC records and then save this information into a CSV file for further analysis or reporting.
The process involves writing a script in Python that utilizes the Pymarc library to read MARC records, select the desired fields, and then write those fields to a CSV file. There are resources and examples available online that can guide you through this process.
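As a sketch of the CSV-export step, assuming the title/author/date strings have already been pulled out of each record (e.g. via pymarc's MARCReader and Record accessors), writing them out is standard-library work:

```python
import csv
import io

# Rows like (title, author, pub_date) extracted from MARC records;
# the values here are made up for illustration.
rows = [
    ("The ambassadors", "James, Henry", "1903"),
    ("Pride and prejudice", "Austen, Jane", "1813"),
]

buf = io.StringIO()  # in a script, open('out.csv', 'w', newline='') instead
writer = csv.writer(buf)
writer.writerow(["title", "author", "pub_date"])
writer.writerows(rows)
csv_text = buf.getvalue()
print(csv_text)
```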
This function removes all subfield codes and delimiters from a field and saves the field into one row of a CSV file. The CSV column headers are the tags in tags_list.
#CSV class instantiation
fieldsto_csv = export_csv.EXPORT_CSV('normalized.csv')
#normalized_fields_to_csv(records_list, tags_list)
fieldsto_csv.normalized_fields_to_csv(records_list, ['100', '245', '650'])
Creates a CSV file with the following headers: 100, 245, 650
#Create 2d list
thislist = [["245","a","b","h"], ["300","a"], ["264","a","c"]]
#CSV class instantiation
fieldsto_csv = export_csv.EXPORT_CSV('subfields.csv')
#subfields_to_csv
fieldsto_csv.subfields_to_csv(records_list,thislist)
Creates a CSV file with the following headers: 245a, 245b, 245h, 300a, 264a, 264c
#CSV class instantiation
fieldsto_csv = export_csv.EXPORT_CSV('DB_Normalized.csv')
#subfields_to_csv
fieldsto_csv.db_normalized_to_csv(records_list)
Creates a CSV file with the following headers: PK, 001, Tag_Sequence, Tag, Ind1, Ind2, Subfield_Squence, Subfield_code, Field_Value
Overall, Pymarc Utilities serve as an essential resource for librarians, archivists, and anyone involved in the handling of extensive bibliographic data, ensuring efficient and accurate cataloging and data management operations.