SetReplication Error. · Issue #18721 · Alluxio/alluxio · GitHub
SetReplication Error. #18721
Open
@JiGuoDing

Description

Alluxio Version:
v2.9.4

Describe the bug
First, I deployed Alluxio with Helm in a Kubernetes cluster that has 1 master node and 7 worker nodes.

Second, from inside an Alluxio worker pod, I ran "alluxio fs setReplication --max 3 --min 3 /test_ufs.txt", and it worked the first time. However, when I then ran "alluxio fs setReplication --max 4 --min 4 /test_ufs.txt", it had no effect: the replica count remained 3.

Third, I found the following warnings in the Alluxio master logs:

2025-03-30 06:33:39,802 WARN Master Replication Check - Unexpected exception encountered when starting a REPLICATE job (uri=/test_ufs.txt, block ID=16777216, num replicas=5) : alluxio.exception.status.NotFoundException: There's SetReplica job running for path:/test_ufs.txt blockId:16777216, try later
2025-03-30 06:44:39,783 WARN Master Replication Check - Unexpected exception encountered when starting a REPLICATE job (uri=/test_ufs.txt, block ID=16777216, num replicas=5) : alluxio.exception.status.NotFoundException: There's SetReplica job running for path:/test_ufs.txt blockId:16777216, try later

and the following in the Alluxio job master logs:

2025-03-30 06:43:39,782 WARN grpc-default-executor-0 - Exit (Error): run: request=jobConfig: "\254\355\000\005sr\000+alluxio.job.plan.replicate.SetReplicaConfig\031\027\020|\037\027z\302\002\000\003J\000\bmBlockIdI\000\tmReplicasL\000\005mPatht\000\022Ljava/lang/String;xp\000\000\000\000\001\000\000\000\000\000\000\005t\000\r/test_ufs.txt"
, Error=alluxio.exception.JobDoesNotExistException: There's SetReplica job running for path:/test_ufs.txt blockId:16777216, try later
2025-03-30 06:44:39,782 WARN grpc-default-executor-3 - Exit (Error): run: request=jobConfig: "\254\355\000\005sr\000+alluxio.job.plan.replicate.SetReplicaConfig\031\027\020|\037\027z\302\002\000\003J\000\bmBlockIdI\000\tmReplicasL\000\005mPatht\000\022Ljava/lang/String;xp\000\000\000\000\001\000\000\000\000\000\000\005t\000\r/test_ufs.txt"
, Error=alluxio.exception.JobDoesNotExistException: There's SetReplica job running for path:/test_ufs.txt blockId:16777216, try later
2025-03-30 06:45:39,783 WARN grpc-default-executor-3 - Exit (Error): run: request=jobConfig: "\254\355\000\005sr\000+alluxio.job.plan.replicate.SetReplicaConfig\031\027\020|\037\027z\302\002\000\003J\000\bmBlockIdI\000\tmReplicasL\000\005mPatht\000\022Ljava/lang/String;xp\000\000\000\000\001\000\000\000\000\000\000\005t\000\r/test_ufs.txt"
, Error=alluxio.exception.JobDoesNotExistException: There's SetReplica job running for path:/test_ufs.txt blockId:16777216, try later
2025-03-30 06:46:39,783 WARN grpc-default-executor-3 - Exit (Error): run: request=jobConfig: "\254\355\000\005sr\000+alluxio.job.plan.replicate.SetReplicaConfig\031\027\020|\037\027z\302\002\000\003J\000\bmBlockIdI\000\tmReplicasL\000\005mPatht\000\022Ljava/lang/String;xp\000\000\000\000\001\000\000\000\000\000\000\005t\000\r/test_ufs.txt"
, Error=alluxio.exception.JobDoesNotExistException: There's SetReplica job running for path:/test_ufs.txt blockId:16777216, try later
2025-03-30 06:47:39,782 WARN grpc-default-executor-5 - Exit (Error): run: request=jobConfig: "\254\355\000\005sr\000+alluxio.job.plan.replicate.SetReplicaConfig\031\027\020|\037\027z\302\002\000\003J\000\bmBlockIdI\000\tmReplicasL\000\005mPatht\000\022Ljava/lang/String;xp\000\000\000\000\001\000\000\000\000\000\000\005t\000\r/test_ufs.txt"
, Error=alluxio.exception.JobDoesNotExistException: There's SetReplica job running for path:/test_ufs.txt blockId:16777216, try later
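
The jobConfig in these job master lines is a Java-serialized object printed with octal escapes, which makes it hard to read. As a rough aid, the Python sketch below unescapes the bytes and pulls out the three field values. It assumes the field order implied by the class descriptor visible in the log (long mBlockId, int mReplicas, String mPath); it is not a general Java deserializer, and the function names are illustrative only.

```python
import struct

# jobConfig exactly as printed in the job master log (octal/C-style escapes).
ESCAPED = (
    r"\254\355\000\005sr\000+alluxio.job.plan.replicate.SetReplicaConfig"
    r"\031\027\020|\037\027z\302\002\000\003J\000\bmBlockIdI\000\tmReplicas"
    r"L\000\005mPatht\000\022Ljava/lang/String;xp"
    r"\000\000\000\000\001\000\000\000\000\000\000\005t\000\r/test_ufs.txt"
)


def unescape(text):
    """Turn the escaped log text back into the raw serialized bytes."""
    return text.encode("ascii").decode("unicode_escape").encode("latin-1")


def decode_set_replica_config(data):
    """Read the field values that follow the class descriptor.

    Assumption: the descriptor ends at the first b"xp" and the values are
    laid out as long mBlockId, int mReplicas, then a short string mPath,
    as the field list in this particular log line suggests.
    """
    start = data.index(b"xp") + 2
    block_id, replicas = struct.unpack_from(">qi", data, start)  # big-endian
    pos = start + 12
    if data[pos:pos + 1] != b"t":  # 0x74 marks a string in the stream
        raise ValueError("expected a string value for mPath")
    (length,) = struct.unpack_from(">H", data, pos + 1)
    path = data[pos + 3:pos + 3 + length].decode("utf-8")
    return {"block_id": block_id, "replicas": replicas, "path": path}


print(decode_set_replica_config(unescape(ESCAPED)))
# {'block_id': 16777216, 'replicas': 5, 'path': '/test_ufs.txt'}
```

For the payload above this gives block ID 16777216 and 5 replicas, the same values the master log reports as "block ID=16777216, num replicas=5".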

Fourth, I entered an Alluxio worker pod and checked the Alluxio job list:

sh-4.2# alluxio job ls
1743316123474 Persist COMPLETED
1743316123475 Replicate COMPLETED

This indicates that all the jobs had completed.
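
To repeat this check without reading the listing by eye, a small sketch like the following could parse the "alluxio job ls" output and report any job that is not in a finished state. The three-column layout (ID, name, status) is taken from the listing above; the set of finished status names other than COMPLETED is an assumption and may need adjusting for a given Alluxio version.

```python
# Statuses treated as finished. Only COMPLETED appears in the listing above;
# FAILED and CANCELED are assumed names, adjust them if your version differs.
TERMINAL = {"COMPLETED", "FAILED", "CANCELED"}


def parse_job_ls(output):
    """Parse lines of the form '<job id> <job name> <status>'."""
    jobs = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # skip shell prompts, blank lines, anything unexpected
        job_id, name, status = parts
        jobs.append({"id": int(job_id), "name": name, "status": status})
    return jobs


def unfinished(jobs):
    """Return the jobs whose status is not in TERMINAL."""
    return [job for job in jobs if job["status"] not in TERMINAL]


SAMPLE = """sh-4.2# alluxio job ls
1743316123474 Persist COMPLETED
1743316123475 Replicate COMPLETED
"""

jobs = parse_job_ls(SAMPLE)
print(len(jobs), "jobs listed,", len(unfinished(jobs)), "unfinished")
```

On the listing above this finds two jobs and none unfinished, which is what makes the "SetReplica job running" message in the logs surprising.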

My confusion is this: why does the job list say that all jobs are completed while the logs still report a running SetReplica job? This problem prevents me from repeatedly adjusting the number of replicas for a file in Alluxio.

To Reproduce
Steps to reproduce the behavior (as minimally and precisely as possible)
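
The sequence from "Describe the bug" can be sketched as a small script. This is only an outline under assumptions: it uses the two setReplication commands and the "alluxio job ls" command quoted in this report, expects to run inside a worker pod where the alluxio CLI is on the PATH, and defaults to a dry run that only prints the commands. The helper names are hypothetical.

```python
import subprocess


def repro_commands(path="/test_ufs.txt", first=3, second=4):
    """Build the command sequence described in this report."""
    def set_replication(n):
        # Same form as the commands quoted in the description.
        return ["alluxio", "fs", "setReplication",
                "--max", str(n), "--min", str(n), path]

    return [
        set_replication(first),   # reported to work
        set_replication(second),  # reported to leave the replica count at 3
        ["alluxio", "job", "ls"],  # lists jobs and their status
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print("$ " + " ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=False)


run_all(repro_commands())
```

Calling run_all(repro_commands(), dry_run=False) inside a worker pod would execute the commands for real; the master and job master logs then need to be inspected separately.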

Expected behavior
A clear and concise description of what you expected to happen.

Urgency
Describe the impact and urgency of the bug.

Are you planning to fix it
Yes.

Additional context
properties in values.yaml

properties:
  alluxio.security.stale.channel.purge.interval: 365d
  alluxio.conf.dynamic.update.enabled: true
  alluxio.user.file.metadata.sync.interval: 0
  alluxio.master.mount.table.root.ufs: "hdfs://<hadoop-ip>:9001/alluxio/ufs"
  alluxio.underfs.address: "hdfs://<hadoop-ip>:9001/alluxio/ufs"
  alluxio.underfs.hdfs.configuration: "/secrets/hdfsConfig/core-site.xml:/secrets/hdfsConfig/hdfs-site.xml"
  alluxio.master.journal.ufs.option.alluxio.underfs.hdfs.configuration: "/secrets/hdfsConfig/core-site.xml:/secrets/hdfsConfig/hdfs-site.xml" 
  alluxio.master.journal.ufs.folder: "hdfs://<hadoop-ip>:9001/alluxio/journal"
  alluxio.security.authentication.type: "NOSASL"
  alluxio.security.authorization.permission.enabled: false
  alluxio.debug: true
  alluxio.proxy.s3.v2.version.enabled: false
  alluxio.proxy.s3.v2.async.processing.enabled: false
  alluxio.underfs.hdfs.user: "root"
  alluxio.user.metadata.cache.enabled: true
  alluxio.security.login.username: "root"

Labels: type-bug (This issue is about a bug)