collectionsasdatajam2019

Experiments with World Digital Library in ACH Collections as Data Jam 2019 in Pittsburgh, PA (description of the event: http://ach2019.ach.org/full-day-collections-as-data-jam-30/)

At the ACH Conference in Pittsburgh, PA, July 23, 2019, at the Collections as Data Jam workshop. The team included: Kristen Mapes, Mickey Casad, David Newbury, Ellen Prokop, Anindita Basu Sempere, and Ece Turnator. We explored the World Digital Library (https://www.wdl.org/en/), and used its API (http://api.wdl.org/) to undertake a data exploration of a search for "dragon" in the library. The csv included here is the compiled data we created, which includes latitude and longitude, dates, subject headings, and image measurements (hue, saturation, brightness). The pdf is a slideshow with many screenshots of the work we did, including a data model of the process, and Palladio visualizations.

The tools we used were:

Palladio
Vistorian (well, we tried)
Carto
Image measurements - ImageJ (& Imageplot in some ways)
RAW Graphs
IIIF
node-microdata-scraper
Typhoeus

Running the code

There are two scripts that were written for this. First is the main data extraction library, which is a node.js script.

You can install the dependencies with npm install, and then run the script with node index.js. It will export a CSV file to standard output, which is not that useful. I ran it with node index.js > locations.csv and node index.js > item_data.csv which actually generate the output. To toggle between the two modes, you can comment out one or the other lines at the bottom of the index.js file:

      createLocationSpreadsheet(allURLs)
      // createObjectSpreadsheet(allURLs)

The other file, test.rb, is a small Ruby script to scrape URLs. You can install the dependencies with bundle install (Assuming you have both Ruby and Bundler installed on your system). It crawls a pre-determined list of URLs, munges an Image API url into a Manifest URL, and then downloads the thumbnail URLs parsed from that manifest. It's not a very elegant solution.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
images		images
.gitignore		.gitignore
DragonDataset_imagemeasurements.csv		DragonDataset_imagemeasurements.csv
Gemfile		Gemfile
Gemfile.lock		Gemfile.lock
README.md		README.md
Team Dragon Slides.pdf		Team Dragon Slides.pdf
WorldDigitalLibraryDragonDataset.csv		WorldDigitalLibraryDragonDataset.csv
index.js		index.js
item_data.csv		item_data.csv
locations.csv		locations.csv
output.json		output.json
package-lock.json		package-lock.json
package.json		package.json
test.rb		test.rb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

collectionsasdatajam2019

Running the code

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

kmapes/collectionsasdatajam2019

Folders and files

Latest commit

History

Repository files navigation

collectionsasdatajam2019

Running the code

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages