8000 o.e.s.storage.types.StorageChannelTask : Error occurred in storage channel#0org.eclipse.store.storage.exceptions.StorageExceptionIoWriting: Physical length 3047424 of current head file 146 is not equal its expected length of 3047612 · Issue #95 · eclipse-store/store · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

o.e.s.storage.types.StorageChannelTask : Error occurred in storage channel#0org.eclipse.store.storage.exceptions.StorageExceptionIoWriting: Physical length 3047424 of current head file 146 is not equal its expected length of 3047612 #95

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ghilainm opened this issue Dec 14, 2023 · 10 comments

Comments

@ghilainm
Copy link
ghilainm commented Dec 14, 2023
8000

Environment Details

  • EclipseStore Version: 1.0.0

  • JDK version: /eclipse_adoptium-21-amd64-linux/jdk-21.0.1+12

  • OS:
    os.arch = amd64
    os.name = Linux
    os.version = 6.1.61-85.141.amzn2023.x86_64

  • Used frameworks: None.

Describe the bug

Exception occur while storing data.

o.e.s.storage.types.StorageChannelTask : Error occurred in storage channel#0org.eclipse.store.storage.exceptions.StorageExceptionIoWriting: Physical length 3047424 of current head file 146 is not equal its expected length of 3047612

And another one with the same use case.

2023-12-14 15:16:54.791 ERROR   --- [torageChannel-0]  o.e.store.storage.types.StorageChannel   : StorageChannel#0 encountered disrupting exceptionorg.eclipse.store.storage.exceptions.StorageExceptionGarbageCollector: Exception in channel #0
        at org.eclipse.store.storage.types.StorageEntityCache$Default.incrementalGarbageCollection(StorageEntityCache.java:1451) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageChannel$Default.performGarbageCollection(StorageChannel.java:314) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageHousekeepingBroker$Default.performGarbageCollection(StorageHousekeepingBroker.java:118) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageChannel$Default.houseKeepingGarbageCollection(StorageChannel.java:376) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageChannel$Default.houseKeeping(StorageChannel.java:238) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageChannel$Default.work(StorageChannel.java:429) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageChannel$Default.run(StorageChannel.java:486) [storage-1.0.0.jar]
        at java.lang.Thread.run(Thread.java:1583)
Caused by: org.eclipse.store.storage.exceptions.StorageExceptionIoReading
        at org.eclipse.store.storage.types.StorageFile$Abstract.readBytes(StorageFile.java:163) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageFileManager$Default.loadData(StorageFileManager.java:643) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageLiveDataFile$Default.loadEntityData(StorageLiveDataFile.java:372) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageEntity$Default.internalLoadData(StorageEntity.java:397) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageEntity$Default.loadData(StorageEntity.java:392) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageEntity$Default.ensureCachedReferenceData(StorageEntity.java:364) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageEntity$Default.iterateReferenceIds(StorageEntity.java:528) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageEntityCache$Default.incrementalMark(StorageEntityCache.java:904) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageEntityCache$Default.internalIncrementalGarbageCollection(StorageEntityCache.java:1502) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageEntityCache$Default.incrementalGarbageCollection(StorageEntityCache.java:1447) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageChannel$Default.performGarbageCollection(StorageChannel.java:314) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageHousekeepingBroker$Default.performGarbageCollection(StorageHousekeepingBroker.java:118) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageChannel$Default.houseKeepingGarbageCollection(StorageChannel.java:376) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageChannel$Default.houseKeeping(StorageChannel.java:238) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageChannel$Default.work(StorageChannel.java:429) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageChannel$Default.run(StorageChannel.java:486) [storage-1.0.0.jar]
        at java.lang.Thread.run(Thread.java:1583)
Caused by: org.eclipse.serializer.exceptions.IndexBoundsException: Index out of bounds: 8063955 not in [0;8062879[ (StartIndex >= bound)
        at org.eclipse.serializer.io.XIO.internalRead(XIO.java:1230) [base-1.0.0.jar]
        at org.eclipse.serializer.io.XIO.read(XIO.java:1208) [base-1.0.0.jar]
        at org.eclipse.store.afs.nio.types.NioIoHandler$Default.specificReadBytes(NioIoHandler.java:431) [afs-nio-1.0.0.jar]
        at org.eclipse.store.afs.nio.types.NioIoHandler$Default.specificReadBytes(NioIoHandler.java:81) [afs-nio-1.0.0.jar]
        at org.eclipse.serializer.afs.types.AIoHandler$Abstract.readBytes(AIoHandler.java:777) [afs-1.0.0.jar]
        at org.eclipse.serializer.afs.types.AReadableFile.readBytes(AReadableFile.java:98) [afs-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageFile$Abstract.readBytes(StorageFile.java:159) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageFileManager$Default.loadData(StorageFileManager.java:643) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageLiveDataFile$Default.loadEntityData(StorageLiveDataFile.java:372) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageEntity$Default.internalLoadData(StorageEntity.java:397) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageEntity$Default.loadData(StorageEntity.java:392) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageEntity$Default.ensureCachedReferenceData(StorageEntity.java:364) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageEntity$Default.iterateReferenceIds(StorageEntity.java:528) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageEntityCache$Default.incrementalMark(StorageEntityCache.java:904) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageEntityCache$Default.internalIncrementalGarbageCollection(StorageEntityCache.java:1502) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageEntityCache$Default.incrementalGarbageCollection(StorageEntityCache.java:1447) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageChannel$Default.performGarbageCollection(StorageChannel.java:314) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageHousekeepingBroker$Default.performGarbageCollection(StorageHousekeepingBroker.java:118) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageChannel$Default.houseKeepingGarbageCollection(StorageChannel.java:376) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageChannel$Default.houseKeeping(StorageChannel.java:238) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageChannel$Default.work(StorageChannel.java:429) [storage-1.0.0.jar]
        at org.eclipse.store.storage.types.StorageChannel$Default.run(StorageChannel.java:486) [storage-1.0.0.jar]
        at java.lang.Thread.run(Thread.java:1583)

To Reproduce

Here are relevant snippets.

void doIt {
     try (EmbeddedStorageManager storageManager = EmbeddedStorage.start(Paths.get(dataDirectory))) {
            PerfData root = getRoot(storageManager);
            var userData = root.getUsers();
            Stopwatch stopwatch = Stopwatch.createStarted();
            for (int userIndex = 0; userIndex < USER_COUNT; userIndex++) {
                completionService.submit(this::createUser);
            }
            log.info(
                    "Submitted all {} tasks to the executor, waiting for generation of all users",
                    USER_COUNT);
            AtomicInteger atomicInteger = new AtomicInteger();
            for (int userIndex = 0; userIndex < USER_COUNT; userIndex++) {
                Future<UserDataCreated> future = completionService.take();
                try {
                    var userDataCreated = future.get();
                    userData.add(userDataCreated);
                    storageManager.store(userDataCreated);
                    atomicInteger.incrementAndGet();
                } catch (ExecutionException e) {
                    log.warn("Error while generating user", e);
                }
                // Display number of generated user and production rate per minute
                int successfullyGeneratedUsers = atomicInteger.get();
                if (successfullyGeneratedUsers % 100 == 0) {
                    log.info(
                            "Generated {} users in {} ms. Production rate: {} users per minute",
                            successfullyGeneratedUsers,
                            stopwatch.elapsed().toMillis(),
                            (successfullyGeneratedUsers * 60000) / stopwatch.elapsed().toMillis());
                    storageManager.store(userData);
                }
            }
            log.info("Persisting list of users");
            storageManager.store(userData);
        } finally {
            taskExecutor.shutdown();
        }
}

    private PerfData getRoot(EmbeddedStorageManager storageManager) {
        var stopwatch = Stopwatch.createStarted();
        var elapsed = stopwatch.elapsed();
        PerfData perfData;
        if (storageManager.root() == null) {
            perfData = new PerfData();
            perfData.setUsers(new LinkedList<>());
            storageManager.setRoot(perfData);
            storageManager.storeRoot();
            log.info("Generating new root for data.");
        } else {
            perfData = (PerfData) storageManager.root();
            log.info("Reusing data. Already {} users generated", perfData.getUsers().size());
        }
        log.info("Loading all data took {} ms", stopwatch.elapsed().minus(elapsed).toMillis());
        return perfData;
    }

Expected behavior

No Error.

@ghilainm
Copy link
Author
ghilainm commented Dec 14, 2023

I tried with channel size 1, I have the same issue.

@mstahv
Copy link
mstahv commented Jan 31, 2024

I got a similar error and once it is on the collected data is lost forever. Any idea what causes this? I took down a copy of the data from a live system using scp, could that have caused it (locked the file temporarily or something) 🤷‍♂️

@hg-ms
Copy link
Contributor
hg-ms commented Feb 1, 2024

@ghilainm:

Hello,
Unfortunately, we can’t reproduce this issue.
The StorageExceptionIoWriting exception indicates that the latests storage file is has not been written completely. The IndexBoundsException is similar, the storage tries to read data that is not present. Why the incomplete write happened is not visible in the provided trace.

@hg-ms
Copy link
Contributor
hg-ms commented Feb 1, 2024

@mstahv:
Copying an active storage is risky, as you must ensure that no write operations happen during the copy process. Even if no user code issues a store this can’t be ensured as the storage does cleanup operation if idle.
To copy a storage you need stop the storage or you can issue a full backup

@ghilainm
Copy link
Author
ghilainm commented Feb 1, 2024

@hg-ms This issue occurs to me all the time. I have stopped using eclipse store for this reason. The problem occurs on my laptop and on EC2 provisionned on AWS. Do you see any issue with the code I have provided? At least from a conceptual point of view?

Moreover, it gives the feeling that eclipse-store is either super fragile or that it can be easily used in a wrong way.

Maybe more documentation and snippets about how to use it properly would be useful? Reading the documentation I didn't find many example. Would be nice to have more use cases showcased.

@mstahv
Copy link
mstahv commented Feb 1, 2024

Yeah, I guess so, at least for getting functional backup. In my case it is the live server that got this issue.

Another idea that might have affected this: I'm storing the data on a directory in the server and then that is mounted to the Docker container running the actual app. There might be two apps/containers accesing the same files during the update as there is "zero downtime update" configure for the system. If it is this that cause my issue, I need a different strategy (this has worked fine with RDBMSs).

I wonder still if the default is right here to throw the exeption, instead of logging some error and overriding? At least to some backup file. My case (where I'm storing weather station data for my hobby/learning project) not a biggie, but in some production system I might have lost a full day of entries (althogh store in server memory just fine), if EclipseStore would have overridden I assume would have only lost just one entry 🤷‍♂️

@hg-ms
Copy link
Contributor
hg-ms commented Feb 2, 2024

@mstahv:
More than one EclipseStore instance accessing the persisted data is not supported. This could explain your corrupted storage files. This is a major difference to RDBMS.

@ghilainm:
Regarding your code, I need to assume that the method createUser doesn’t modify the object graph in any way or does not do any call to EclipseStore so that there can’t be any concurrency problem, same for the class PerfData.
Looking at the store calls in the code snippet there is one unnecessary: store.storageManager.store(userDataCreated) only persists the userDataCreated object without any relation to the userData as the change (add) to user dataData is not persisted jet. This is done later, every 100th future. Here you persist the userData. The call storageManager.store(userData) perists the userData change and would also perist the referenced new UserDataCreated if not done before.

@amagnolo
Copy link

It just happened to me on a live server that had been running without issues for a month.
Is there any way to fix the corrupt storage?

To fix the problem, I tried to restart the application but it refused to start. I had to completely wipe the storage to run it again.
I kept a copy of the corrupt storage to try to recover at least some data. There is only one application instance accessing the storage, channel size is 1, no docker involved.

@amagnolo
Copy link

As @ghilainm mentioned earlier, this kind of bugs hinders the whole purpose of using a persistent storage solution. One single write error means you have lost all your data forever.
As suggested by @mstahv, I would expect at least the resiliency of losing just one entry, not everything - whether this bug is caused by eclipsestore itself or a user mistake (I still wonder what happened, in my case).

I wasn't able to find any recovery/repair tool or instructions in the official documentation nor elsewhere: this makes me wonder if file storage corruption was never considered a possibility in the design of eclipsestore (hint: it does happen in real life).
Randomly losing everything since the last backup is not a solution for my use case, so I have to revert to the reliability of RDBMS. It was nice while it lasted. 😞

@hg-ms
Copy link
Contributor
hg-ms commented Sep 3, 2024

Fixed in v 1.4.0. Please reopen if the error occurs again.

@hg-ms hg-ms closed this as completed Sep 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants
0