Hi! Trying to run SciDocs on my own model. I produce the three files of embeddings and then run the evaluation suite:
from scidocs import get_scidocs_metrics
from scidocs.paths import DataPaths
# point to the data, which should be in scidocs/data by default
data_paths = DataPaths()
# now run the evaluation
scidocs_metrics = get_scidocs_metrics(
    data_paths,
    str(classification_embeddings_path),
    str(user_activity_and_citations_embeddings_path),
    str(recomm_embeddings_path),
    val_or_test='test',  # set to 'val' if tuning hyperparams
    n_jobs=12,  # the classification tasks can be parallelized
    cuda_device=0  # the recomm task can use a GPU if this is set to 0, 1, etc
)
print(scidocs_metrics)
The first few tasks seem to work okay:
Loading MAG/MeSH embeddings...
reading embeddings from file...: 48473it [00:16, 2908.12it/s]
Running the MAG task...
Fitting 3 folds for each of 7 candidates, totalling 21 fits
[Parallel(n_jobs=12)]: Using backend LokyBackend with 12 concurrent workers.
[Parallel(n_jobs=12)]: Done 21 out of 21 | elapsed: 5.9min finished
Running the MeSH task...
Fitting 3 folds for each of 7 candidates, totalling 21 fits
[Parallel(n_jobs=12)]: Using backend LokyBackend with 12 concurrent workers.
[Parallel(n_jobs=12)]: Done 21 out of 21 | elapsed: 4.6min finished
Loading co-view, co-read, cite, and co-cite embeddings...
reading embeddings from file...: 142009it [00:50, 2803.86it/s]
But when it hits the recomm task, it errors out:
Running the recomm task...
/content/scidocs/scidocs/__init__.py in get_scidocs_metrics(data_paths, classification_embeddings_path, user_activity_and_citations_embeddings_path, recomm_embeddings_path, val_or_test, n_jobs, cuda_device)
39 scidocs_metrics.update(get_mag_mesh_metrics(data_paths, classification_embeddings_path, val_or_test=val_or_test, n_jobs=n_jobs))
40 scidocs_metrics.update(get_view_cite_read_metrics(data_paths, user_activity_and_citations_embeddings_path, val_or_test=val_or_test))
---> 41 scidocs_metrics.update(get_recomm_metrics(data_paths, recomm_embeddings_path, val_or_test=val_or_test, cuda_device=cuda_device))
42
43 return scidocs_metrics
/content/scidocs/scidocs/recomm_click_eval.py in get_recomm_metrics(data_paths, embeddings_path, val_or_test, cuda_device)
166 subprocess.run(command)
167 metrics = evaluate_ranking_performance(simpapers_model_path, data_paths.recomm_test if val_or_test=='test'
--> 168 else data_paths.recomm_val, int(cuda_device))
169 return {'recomm': {
170 'adj-NDCG': np.round(100 * float(metrics['Adj-ndcg']), 2),
/content/scidocs/scidocs/recomm_click_eval.py in evaluate_ranking_performance(archive_path, test_data_path, cuda_device)
22 def evaluate_ranking_performance(archive_path, test_data_path, cuda_device):
23
---> 24 archive = archival.load_archive(archive_path, cuda_device=cuda_device)
25 params = archive.config
26 sr = archive.model
/usr/local/lib/python3.7/dist-packages/allennlp/models/archival.py in load_archive(archive_file, cuda_device, overrides, weights_file)
168 """
169 # redirect to the cache, if necessary
--> 170 resolved_archive_file = cached_path(archive_file)
171
172 if resolved_archive_file == archive_file:
/usr/local/lib/python3.7/dist-packages/allennlp/common/file_utils.py in cached_path(url_or_filename, cache_dir)
104 elif parsed.scheme == '':
105 # File, but it doesn't exist.
--> 106 raise FileNotFoundError("file {} not found".format(url_or_filename))
107 else:
108 # Something unknown
FileNotFoundError: file /content/scidocs/data/recomm-tmp/model.tar.gz not found
Looks like it can't find /content/scidocs/data/recomm-tmp/model.tar.gz. Should this have been downloaded by the call to aws s3 sync?
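For anyone trying to reproduce this, here is a minimal sketch of how the missing archive could be checked before calling get_scidocs_metrics. The helper name describe_archive is hypothetical (it is not part of scidocs), and the path is simply the one from the error message above. It only reports whether the file and its directory exist and what else is in that directory.

```python
from pathlib import Path


def describe_archive(archive_path):
    """Report whether the archive exists and what sits next to it.

    Hypothetical helper for debugging; not part of scidocs.
    """
    archive = Path(archive_path)
    parent = archive.parent
    # List sibling files only if the directory is actually there.
    siblings = sorted(p.name for p in parent.iterdir()) if parent.is_dir() else []
    return {
        "archive_exists": archive.is_file(),
        "dir_exists": parent.is_dir(),
        "dir_contents": siblings,
    }


if __name__ == "__main__":
    # Path copied from the FileNotFoundError above.
    print(describe_archive("/content/scidocs/data/recomm-tmp/model.tar.gz"))
```

If the directory exists but the archive does not, that might suggest the step that is supposed to produce model.tar.gz did not finish. The traceback shows a subprocess.run(command) call just before the load, so a failure there is one possibility, though that is only a guess.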