IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed · Issue #868 · deepset-ai/FARM · GitHub
This repository was archived by the owner on Apr 8, 2025. It is now read-only.
IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed #868
Open
@ShuhaoZhangTony

Describe the bug

I'm trying to use Haystack's API to build a RAG pipeline with FAISSDocumentStore and EmbeddingRetriever.

The code looks like this:

# Create the document store using the factory
document_store = create_document_store(store_type, **store_config)

documents = []
documents_dir = args.docs_path
for filename in os.listdir(documents_dir):
    file_path = os.path.join(documents_dir, filename)
    if os.path.isfile(file_path):
        with open(file_path, 'r', encoding='utf-8') as file:
            content = file.read()
            document = Document(content=content)
            documents.append(document)
document_store.write_documents(documents)

# Ensure the retriever is initialized before updating embeddings
retriever = RetrieverFactory.get_retriever(retriever_type=args.retriever_type,
                                           document_store=document_store,
                                           query_embedding_model=args.query_embedding_model,
                                           passage_embedding_model=args.passage_embedding_model
                                           )

# Update embeddings right after writing documents.
# The hasattr check skips this step for document stores that have no update_embeddings method.
if hasattr(document_store, 'update_embeddings'):
    document_store.update_embeddings(retriever=retriever, batch_size=10)

Error message

haystack/modeling/model/language_model.py", line 222, in _pool_tokens
ignore_mask_3d[:, :, :] = ignore_mask_2d[:, :, np.newaxis]
~~~~~~~~~~~~~~^^^^^^^^^
IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed
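
The error can be reproduced with NumPy alone, which may help narrow down where it comes from. In the traceback the caret markers appear to sit under `ignore_mask_3d[:, :, :]` (the left-hand side), and `np.newaxis` does not count as an index, so the array with only two dimensions is most likely `ignore_mask_3d` rather than `ignore_mask_2d`. The sketch below assumes that reading; the shapes and the names `mask_ok` and `mask_bad` are made up for illustration and are not taken from the library.

```python
import numpy as np

# Hypothetical shapes, chosen only for illustration.
batch, seq_len, hidden = 2, 4, 8

ignore_mask_2d = np.zeros((batch, seq_len), dtype=bool)

# np.newaxis inserts a new axis and is not counted as an index,
# so indexing a 2-D array this way is valid.
expanded = ignore_mask_2d[:, :, np.newaxis]
print(expanded.shape)

# Case 1: the target is 3-D (batch, seq_len, hidden). The assignment broadcasts.
mask_ok = np.zeros((batch, seq_len, hidden), dtype=bool)
mask_ok[:, :, :] = expanded

# Case 2: the target is only 2-D, e.g. (batch, hidden). Three slices are too many.
mask_bad = np.zeros((batch, hidden), dtype=bool)
try:
    mask_bad[:, :, :] = expanded
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(error_message)
```

If this reading is right, the array passed into the pooling step has one dimension fewer than expected, for example one vector per sequence instead of one vector per token. Printing the shape of the model output for a single document before calling `update_embeddings` would confirm or rule this out.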

System:

  • OS: Ubuntu 18.04
  • GPU/CPU:
  • FARM version:

Labels: bug