Deadlock scenario · Issue #185 · sustrik/libdill · GitHub
Deadlock scenario #185


Closed
mulle-nat opened this issue Feb 14, 2019 · 12 comments

@mulle-nat

I am making this its own issue, split off from the other thread, because I think this is a bug. If this is intentional behaviour, it would be interesting to know. I reduced my test case even further, to make it easier to understand and the problem more prominent.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#include <libdill.h>

coroutine void   consumer( int ch)
{
   for(;;)
   {
      fprintf( stderr, "before chrecv: %lld\n", (long long) dill_now());
      if( chrecv( ch, NULL, 0, -1) == -1)
      {
         fprintf( stderr, "failed chrecv: %lld\n", (long long) dill_now());
         switch( errno)
         {
         case EPIPE     :
         case ECANCELED : return;
         default        : perror("chrecv"); abort();
         }
      }
      fprintf( stderr, "after chrecv: %lld\n", (long long) dill_now());

      printf( "Press RETURN to continue\n");
      fflush( stdout);
   }
}


int    main( int argc, char *argv[])
{
   int   handle;
   int   ch[ 2];

   chmake( ch);

   handle = go( consumer( ch[ 0]));

   fprintf( stderr, "before chsend: %lld\n", (long long) dill_now());
   chsend( ch[ 1], NULL, 0, -1);
   fprintf( stderr, "after chsend: %lld\n", (long long) dill_now());

   //
   // The chsend didn't wake up the receiver. 
   // Neither does a chdone wake up the receiver
   //
   chdone( ch[ 1]);

   //
   // The user will wait endlessly without the prompt from consumer
   //
   getchar();

   bundle_wait( handle, -1);
   return( 0);
}

This is the input/output. (The empty line is me pressing RETURN for getchar)

before chrecv: 2564798
before chsend: 2564798
after chsend: 2564798

after chrecv: 2566314
Press RETURN to continue
before chrecv: 2566314
failed chrecv: 2566314

It is clear that the chrecv is not triggered during chsend. This is curious because there is a dill_trigger in dill_chsend, presumably to wake up channel readers, but it does not seem to do its job. The chdone doesn't trigger the chrecv either. But the bundle_wait eventually does.

@sustrik
Owner
sustrik commented Feb 16, 2019

You are using getchar, which is a blocking function and blocks all the coroutines. Try deleting that line.

@sustrik closed this as completed Feb 16, 2019
@mulle-nat
Author

The getchar is just there to make the timestamps in the log somewhat easier to read and to illustrate the general problem. It really has nothing to do with the question.

@sustrik
Owner
sustrik commented Feb 17, 2019

But it blocks the entire thread, and the coroutines thus can't run. Try deleting it. Or replace it with msleep.

@mulle-nat
Author
mulle-nat commented Feb 17, 2019

The outcome is really no different. Did you try it? I don't know how to explain this better than I already did:

It is clear that the chrecv is not triggered during chsend. This is curious because there
is a dill_trigger in dill_chsend to wake up channel readers presumably, but it does not
seem to do its job. The chdone doesn't trigger the chrecv either. But the
bundle_wait eventually does.

Do I expect too much from dill_trigger?

@sustrik
Owner
sustrik commented Feb 17, 2019

Well, this is your output from above:

before chrecv: 2564798
before chsend: 2564798
after chsend: 2564798

after chrecv: 2566314 <---  chrecv got a message
Press RETURN to continue
before chrecv: 2566314
failed chrecv: 2566314

It only happens after you press a key because getchar blocks the entire thread.

@mulle-nat
Author

Again, the question is not about that, but about why it doesn't print:

before chrecv: 2564798
before chsend: 2564798
after chrecv: 2564798  <<<<<<<<<<<<<< NOT PRINTED BUT WHY ??
after chsend: 2564798

The getchar comes way after the point in question; it's just there for illustration purposes. It has NO bearing on the question.

@sustrik
Owner
sustrik commented Feb 17, 2019

Why doesn't it print what?

The line of code is:

fprintf( stderr, "after chrecv: %lld\n", dill_now());

The printed string is:

after chrecv: 2564798

That looks OK to me, no?

@mulle-nat
Author

I'll try one more time, putting the question

It is clear that the chrecv is not triggered during chsend.
This is curious because there is a dill_trigger in dill_chsend
to wake up channel readers presumably, but it does not
seem to do its job.
Do I expect too much from dill_trigger?

in a kind of flow diagram. If chsend woke the receiver blocked in chrecv, the sequence of calls would be:

main coroutine
go  
  printf "before chrecv"
  chrecv #1
  trigger
printf "before chsend"
chsend #1
trigger
  chrecv #1 continued
  printf "after chrecv"
  chrecv #2
  trigger
chsend #1 continued
printf "after chsend"
...

This would print the sequence:

before chrecv
before chsend
after chrecv
after chsend

But the actual sequence printed is

before chrecv
before chsend
after chsend

This indicates to me that either there is a bug in libdill or that
"I expect too much from dill_trigger", though I don't see why
not switching during a send would be desirable behaviour.

@sustrik
Owner
sustrik commented Feb 17, 2019

You should make no assumptions about how the scheduler schedules coroutines.

After a message is passed between coroutines, both the sending coroutine and the receiving coroutine are free to continue. The scheduler will pick one of them. You can't know which one in advance.
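
To make that point concrete, here is a toy model in plain C. It is not libdill code and says nothing about how libdill's scheduler is implemented; `handoff_trace` and `enum policy` are names invented for this sketch. It only shows that once the hand-off has happened, either resumption order yields a valid trace.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model only, NOT libdill's scheduler: which of the two runnable
   coroutines resumes first after a message hand-off. */
enum policy { SENDER_FIRST, RECEIVER_FIRST };

/* Appends one line to the NUL-terminated trace in out (capacity cap). */
static void emit(char *out, size_t cap, const char *line)
{
    size_t used = strlen(out);
    if (used < cap)
        snprintf(out + used, cap - used, "%s\n", line);
}

/* Writes the trace of a single hand-off under the given policy. */
void handoff_trace(enum policy p, char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    out[0] = '\0';

    emit(out, cap, "before chrecv");  /* receiver blocks, waiting for data */
    emit(out, cap, "before chsend");  /* sender hands the message over */

    /* From here on both coroutines are runnable; the policy picks one. */
    const char *runnable[2] = { "after chsend", "after chrecv" };
    int first = (p == SENDER_FIRST) ? 0 : 1;
    emit(out, cap, runnable[first]);
    emit(out, cap, runnable[1 - first]);
}
```

Calling `handoff_trace(SENDER_FIRST, buf, sizeof buf)` produces the order seen in the log above, while `RECEIVER_FIRST` produces the order the flow diagram expected. A program that is correct under only one of the two policies is relying on an unspecified scheduling decision.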

@mulle-nat
Author
mulle-nat commented Feb 17, 2019

But does a dill_yield after dill_chsend guarantee it, though?

I wrote my own dill_chbroadcast (see: https://github.com/mulle-nat/libdill/blob/master/chan.c#L224), which uses dill_yield after dill_trigger. It seems to work OK, but I am just assuming a reliable context switch there. Can dill_yield after dill_trigger really guarantee it?

I wonder if doing this with libdill is a good idea and if I shouldn't roll my own stuff based on deboost.context or something to get the reliability (and speed) I need. After all I just don't need much...

@sustrik
Owner
sustrik commented Feb 17, 2019

No, you can't rely on the scheduler working in a deterministic manner.

Also, dill_trigger is an internal function and shouldn't be used from outside.

As for deboost, I have no experience with it, so I can't tell.

@mulle-nat
Author

Ok thanks for the help.
