finetune · Issue #52 · songlab-cal/gpn · GitHub
finetune #52
Open

@CAU-LEI

Description

Hi, thank you for the excellent work on GPN — it's a really well-structured and efficient framework for genomic sequence modeling. I used it to pretrain a model on my own dataset with the GPNForMaskedLM architecture.

After training, my config.json looks like this:

{
"architectures": ["GPNForMaskedLM"],
"model_type": "GPN",
"vocab_size": 7,
"embedding": "one_hot",
"embedding_size": 768,
"encoder": "convnet",
"num_hidden_layers": 25,
"hidden_size": 512,
"pad_token_id": 0,
"max_position_embeddings": 1536,
...
}
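To make questions 1 and 2 concrete, here is how I would patch the config on disk. "num_labels" and "problem_type" are the standard Hugging Face config keys; whether GPNForSequenceClassification actually reads them this way is exactly what I'm unsure about:

```python
import json

# Start from (a subset of) my pretrained config as shown above.
config = {
    "architectures": ["GPNForMaskedLM"],
    "model_type": "GPN",
    "vocab_size": 7,
}

# Hypothetical changes for binary classification. These key names
# follow the usual Hugging Face convention; I don't know whether
# GPN's finetune.py expects them or sets them automatically.
config["architectures"] = ["GPNForSequenceClassification"]
config["num_labels"] = 2
config["problem_type"] = "single_label_classification"

print(json.dumps(config, indent=2))
```

Is this roughly the right shape, or does finetune.py handle these fields itself?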
My questions are:
1. Should I manually add "num_labels": 2 to config.json before fine-tuning?

2. Is it sufficient to change "architectures" to "GPNForSequenceClassification" in the config, or is that inferred automatically?

3. Could you kindly provide an example fine-tuning command for a binary classification task, including required arguments such as --problem_type, dataset formatting, etc.?

4. If possible, could you also include a short explanation of how finetune.py loads the model/config/tokenizer and prepares the dataset?

Having an example would make it much easier to understand the full fine-tuning pipeline. Thank you again for your great work!
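For question 3, this is the dataset layout I'm currently assuming: JSON-Lines with a sequence column and an integer label, mirroring the common Hugging Face datasets convention. The column names "seq" and "label" are my guess, not something I found in finetune.py:

```python
import json

# Hypothetical binary-classification dataset in JSON-Lines form.
# Column names "seq" and "label" are my assumption; finetune.py
# may expect different ones.
examples = [
    {"seq": "ACGTACGTAGGCTA", "label": 1},  # positive class
    {"seq": "TTGACCGGATATCC", "label": 0},  # negative class
]

jsonl = "\n".join(json.dumps(ex) for ex in examples) + "\n"
print(jsonl)
```

If the expected format (column names, file type, train/validation split layout) differs, a pointer to the right one would be very helpful.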

