8000 MonitorAid · devlaam/Leucine Wiki · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

MonitorAid

devlaam edited this page Jun 8, 2023 · 6 revisions

MonitorAid: Defining what is traced.

Text compliant with Leucine version 0.5.x

Introduction

If the application contains may actors things may behave in an unexpected manner. In such a situation some insights in the system are required. But also during normal operation it is nice to actually see everything runs as designed. In order to facilitate the user here, the MonitorAid is provided.

If you mixin MonitorAid there are two possibilities:

  • use a global actor monitor, derived from GlobalMonitor
  • use a local actor monitor, usually an instance of LocalMonitor

The choice is made by supplying MonitorAid the appropriate trait parameter. Here we will discuss the latter, which gives the possibility to investigate what is going on in this particular actor, without having to set up a system wide monitor. On the wiki page ActorMonitor it will be shown to do just that.

Mixin MonitorAid with a LocalMonitor

The library provides a predefined LocalMonitor class that can be instantiated directly at the definition of your Actor like this:

class Ticker extends AcceptActor(Ticker), MonitorAid(LocalMonitor(1.second))

but you are of course free to first derive a personal version with extra capabilities and make use of that. Due to the generic signature of MonitorAid:

trait MonitorAid[Monitor <: ActorMonitor](val monitor: Monitor)(using context: ActorContext) extends ActorInit, ActorDefs :

those methods will be available in the newly defined actor as well. But the former is also perfectly fine. What you should not do is use an instance of a LocalMonitor in multiple actors. Use a new one for each actor instance. The time parameter of LocalMonitor is obligatory and defines the probe frequency for the monitor actions.

Internals of MonitorAid

The way the monitor works is that it starts taking samples from the actor at the given interval. The bare actor and each mixin generate performance metrics that are collected during activity. At each sample, those metrics are copied and, depending on the type of metric, are reset again. Those metrics can be interrogated at three different levels, with increasing granularity. This will be discussed below.

Besides that, when MonitorAid is mixed in, the following two methods become available:

  /**
   * Calculate the relative time this actor spend performing processing letters. Returns a value between
   * zero and one. Zero means no time was spend at processing messages and one means 100% of the time was
   * spend therein. With alltime being true this value reflects the overall performance of the actor, and
   * with false, the value since the last call to processLoad(false). If this actor is part of a system
   * ActorMonitor, do not use the latter call, for it may interfere with similar calls from the ActorMonitor
   * which utilizes the same call to determine the actor process load. */
  protected def processLoad(alltime: Boolean): Double

  /**
   * Returns the total time this actor is active in nano seconds. After the actor has
   * stopped (i.e. does not accept any messages any more) this value remains constant. */
  protected def activeTime: Long

which provide direct access to the load and lifetime of the actor. If you are probing the actor you should only call processLoad(true) and not processLoad(false) which may interfere with the probed measurements. Otherwise you can implement your own probing system by repeating calling on processLoad(false).

The processLoad returns a number between 0 and 1 indicating which fraction of its existence is spend on processing messages. Typically this number is very small, in the order of 0.0001 to 0.001.

There are two important parameters that need to be set as well:

  /**
   * In order to reduce the monitor output, you can switch the probing on and off at will. Per
   * default probing is on, but you may switch it off directly in the constructor if needed. */
  protected def probing(enable: Boolean): Unit

  /**
   * This is the personal setting of tracing. There is a public setting as well.
   * If tracing is active for this actor depends on both settings, in a symmetric manner.
   * There are two levels of tracing. TraceFull and TraceCount. TraceFull traces every
   * message with timestamp and action. This may be memory intensive. TraceCount just
   * counts the messages from sender to receiver per letter. Usually this number is limited.
   * However, both forms should only be used at debugging, since it requires a lot of
   * synchronized operations.
   * If both are Enabled or one is Enabled and the other is Default, the TraceFull is
   * active. In all other cases it is not. TraceCount is active when both are at Default
   * or one of the settings is Enabled. This implies that you can enable/disable
   * the tracing TraceFull here as long as the global tracing is Default or Enabled.
   * Setting this to Disabled will always prohibit tracing TraceFull of this actor,
   * and setting it to Default/Enabled leaves the fate in the hands of the global setting.
   * NOTE: This value is needed at object construction, so DO NOT override with a val.
   * You may however change its value during execution to be able to focus on a specific
   * time window or issue and reduce information you don't need. */
  protected def tracing: Tracing = Tracing.Default

So you learn that probing is automatically started normally. Usually this is what you want. However, there can be circumstances that the focus of interest lies further away in time, and you do not want to see all the stuff in between. Then it can be handy to switch off the probing in the constructor by calling probing(false) only to be switched on when a certain message comes in for example.

The second parameter also deserves some attention, and deals with the level messages should be traced. As described there are two levels of tracing, and the tracing can of course be completely turned off tracing = Tracing.Disabled, in which case only actor metrics are collected. In short, with tracing fully enabled tracing = Tracing.Enabled, every message occurrence is captured and reported with a time stamp. This can be handy to exactly reconstruct what is going on to find a particular problem, but also can be quite overwhelming.

Examples of tracing

With only TraceCount active tracing = Tracing.Default, every message type is counted, so you get a quick overview of which messages arrive and in what quantity. The particular order is lost however. This could look something like this:

from=:x,          to=F0.F2.F2.F1, letter=Forward: 1
from=:x,          to=F0.F2.F2.F2, letter=Forward: 1
from=F0,          to=F0,          letter=Create(2,3): 1
from=F0,          to=F0,          letter=Forward: 1
from=F0.F1,       to=F0,          letter=Backward: 4
from=F0.F1.F1,    to=F0.F1,       letter=Backward: 2
from=F0.F1.F1.F1, to=F0.F1.F1,    letter=Backward: 1

This indicates per letter kind "Forward", "Backward" ,"Create(2,3)" how many occurrences there were in the interval period, and also from which actor to which actor the message was send. The actor :x is the Anonymous Actor. So here we see that in the 2 seconds of the interval 4 letters "Backward" were send from actor called "F0.F1" to actor called "F0". (And we learn the "F1" is a child actor from "F0").

A small part of the result of full tracing could be:

time=50217024, nr=86, action=Completed, from=:x,          to=F0.F1.F2.F1,  mail=Processed, letter=Forward
time=50220140, nr=87, action=Accepted,  from=F0.F1.F1.F2, to=F0.F1.F1,     mail=Received,  letter=Backward
time=50220659, nr=88, action=Completed, from=:x,          to=F0.F2.F2.F2,  mail=Processed, letter=Forward
time=50243378, nr=89, action=Initiated, from=F0.F2.F1.F2, to=F0.F2.F1,     mail=Received,  letter=Backward
time=50261115, nr=90, action=Completed, from=:x,          to=F0.F1.F1.F2,  mail=Processed, letter=Forward
time=50276470, nr=91, action=Initiated, from=F0.F1.F2.F2, to=F0.F1.F2,     mail=Received,  letter=Backward

Observe that we see each action of each letter with time stamp, sender, target etc. The time is measured in nano seconds from start of the actor.

Note, these are the default string representations. If you want, you can define other representations as well that suit your application better. The perfect place for that would be in your own extension of LocalMonitor if you want to reuse it over different actors, or directly in your the actor being under surveillance.

There are two fields left to explain. These are action and mail. Although these are mostly for internal purposes, we give their meaning here for completeness. The action is an enumerated from:

enum Action :
  /* The actor is created */
  case Created
  /* The actor is terminated */
  case Terminated
  /* The message is accepted for delivery */
  case Accepted
  /* The message is refused */
  case Refused
  /* Letter processing has begone */
  case Initiated
  /* Letter processing is done */
  case Completed

where as the mail stems from:

  /**
   * These are the stages a mail can be in. There can be a few reasons why mail is not handled by
   * the actor. Those will be collected in a separate list for manual inspection. The reason is
   * specified by one of these causes. Otherwise we have the regular stages: Empty, Received and
   * Processed. */
  enum Mail extends EnumOrder[Mail] :
    /** The mail is not yet defined. */
    case Empty
    /** The mail was not delivered to the actor because it has stopped. */
    case Undelivered
    /** The mail was not accepted by the actor because the mailbox is full. */
    case Unaccepted
    /** The mail was received by the actor for processing. */
    case Received
    /** The mail was not matched by the actor because the type is unknown. */
    case Unmatched
    /** The mail was not read completely by the actor because of an unhandled exception. */
    case Unreadable
    /** The mail was not processed by the actor because it stopped prematurely. */
    case Unprocessed
    /** The mail was processed by the actor, this completes the handling. */
    case Processed

Actor metrics

As discussed the MonitorActor collect some internal metrics of the complete actor with all its mixins. Lets see what is meant by that. Here you see the metrics that are defined in the form of case classes:

  /** Class to marshal all the KPI's of the bare actor. */
  case class Bare(phase: BareActor.Phase, stop: Actor.Stop, lettersSum: Int, lettersMax: Int, exceptionsSum: Int, failedSum: Int, needles: Int, processLoad: Double) extends Sample :
    def userPPM = (processLoad * 1000000).toInt
    def show = s"phase=$phase, stop=$stop, lettersSum=$lettersSum, lettersMax=$lettersMax, exceptionsSum=$exceptionsSum, failedSum=$failedSum, needles=$needles, processLoad=${userPPM}ppm"

  /** Class to marshal all the KPI's of the Stash mixin. */
  case class Stash(lettersSum: Int, lettersMax: Int) extends Sample :
    def show = s"stashSum=$lettersSum, stashMax=$lettersMax"

  /** Class to marshal all the KPI's of the Timing mixin. */
  case class Timing(lettersSum: Int, lettersMax: Int, anchorsSize: Int) extends Sample :
    def show = s"timersSum=$lettersSum, timersMax=$lettersMax, timersNow=$anchorsSize"

  /** Class to marshal all the KPI's of the Families mixins. */
  case class Family(namedChildrenNow: Int, allChildrenNow: Int, workersSum: Long) extends Sample :
    def show = s"namedChildrenNow=$namedChildrenNow, allChildrenNow=$allChildrenNow, workersSum=$workersSum"

  /** Class to marshal all the KPI's of the Timing mixin. */
  case class Protect(alarmsSum: Int) extends Sample :
    def show = s"alarmsSum=$alarmsSum"

For the Bare actor we have:

  • phase: BareActor.Phase: mainly for internal use, but in principle tells you if the actor is running or not
  • stop: Actor.Stop: shows in which stop mode this actor is
  • lettersSum: Int: the total amount of letters processed within the probe interval
  • lettersMax: Int: the highest level of letters in the mailbox within the probe interval
  • exceptionsSum: the total amount of exceptions over the entire lifetime of the actor
  • failedSum: the total amount of failed messages over the entire lifetime of the actor
  • needles: the actual value of the amount of needles dropped (to determine if the actor is silent)
  • processLoad: the relative time spend in processing user code within the probe interval

For the Stash:

  • lettersSum: Int: the total amount of stashed letters processed within the probe interval
  • lettersMax: Int: the highest level of the stashed letters in the mailbox within the probe interval

For the Timing:

  • lettersSum: Int: the total amount of timer/event letters processed within the probe interval
  • lettersMax: Int: the highest level of the timer/event letters in the mailbox within the probe interval
  • anchorsSize: Int: the actual number of anchors holding a timer/event

For the Family:

  • namedChildrenNow: Int: the actual number of children with a user defined name for this actor
  • allChildrenNow: Int: the actual total number of children
  • workersSum: Long: total number of workers started within lifetime of the actor

For the Protect:

  • alarmsSum: the total amount of alarms raised over the entire lifetime of the actor

Each class carries a show method to quickly see what the values are.

The Ticker example with local monitor

Let us see this all come to live in a short example. In the demo's we have some code that just sends a messages around. This demo can be started with ticker, see the code and the README for more information. Here we discuss a simplified version of that code. This example only starts one actor which can be in two states (tick(x) and tock(x)). It sends a message to itself until x exceeds the value 10, and reports each new state.

The code reads:

/* Simple ticker example */
class Ticker extends AcceptActor(Ticker,"Ticker") :
  import Actor.Stop

  /* In receive we handle the incoming letters. */
  final protected def receive(letter: Letter, sender: Sender): Receive = (state: State) => {
    /* In this example, we do not care about the letters that much, but more about the state. */
    state match
      case Ticker.Tick(value: Int) =>
        /* Report that we are in the 'tick' state*/
        println(s"tick = $value")
        /* Send a new letter to myself to continue the work */
        this ! Ticker.Work
        /* Change the state to a new one. This is obligatory. */
        Ticker.Tock(value+1)
      case Ticker.Tock(value: Int) =>
        /* Report that we are in the 'tock' state*/
        println(s"tock = $value")
        /* As long as we are below 10 we continue the work, otherwise we stop. */
        if value<10 then this ! Ticker.Work else stop(Stop.Finish)
        /* Change the state to a new one. This is obligatory. */
        Ticker.Tick(value+1) }

object Ticker extends AcceptDefine :
  /* The ticker only excepts one letter */
  sealed trait Letter extends Actor.Letter[Actor]
  case object Work extends Letter
  /* This actor can be in two 'states' */
  sealed trait State extends Actor.State
  case class Tick(value: Int) extends State
  case class Tock(value: Int) extends State
  /* The initial state of of a state actor */
  val initial: State = Ticker.Tick(0)


/* Main entry point for the demo. */
object Main :
  /* Create the root of the tree. */
  private val ticker = Ticker
  /* Main code entrance point */
  def main(args: Array[String]): Unit =
    /* Set the ticker to work. */
    ticker ! Ticker.Work
    /* Give it some time to complete */
    Thread.sleep(200)

This code produces:

tick = 0
tock = 1
tick = 2
tock = 3
tick = 4
tock = 5
tick = 6
tock = 7
tick = 8
tock = 9
tick = 10
tock = 11

which you can observe here in Scastie

Now we add the debugging, and must modify the class with the following items to get the actor 'under surveillance':

  • add a parameter to the class Ticker (this is optional of course)
  • extend Ticker with a local monitor: MonitorAid(LocalMonitor(1.second))
  • define the value of tracing (use a def not a val!)
  • define three reporting methods for samples, posts and traces
  • register those methods at the monitor
  • set the value of probing to debug, without, probing is true per default
  • call the Ticker instantiation with the parameter true

This gives:

/* Simple example to illustrate debugging capabilities with a local monitor. */
class Ticker(debug: Boolean) extends AcceptActor(Ticker,"Ticker"), MonitorAid(LocalMonitor(1.second)) :
  import Actor.{Post, Stop}
  import MonitorAid.{Action, Sample, Trace, Tracing}

  /* We allow full tracing for this actor. */
  final override def tracing = Tracing.Enabled

  /* Define report functions for each capability */
  private def sampled(samples: List[Sample]): Unit      = samples.foreach(sample => println(s"== SAMPLE ==> ${sample.show}"))
  private def posted(posts: SortedMap[Post,Long]): Unit = posts.foreach((post,i) => println(s"== POST ====> ${post.short}: $i"))
  private def traced(traces: SortedSet[Trace]): Unit    = traces.foreach(trace   => println(s"== TRACE ===> ${trace.show}"))

  /* Register the report functions so they are called by the local monitor. */
  monitor.register(sampled)
  monitor.register(posted)
  monitor.register(traced)

  /* Only activate the monitor when we are in the debug mode. */
  probing(debug)

  /* In receive we handle the incoming letters. */
  final protected def receive(letter: Letter, sender: Sender): Receive = (state: State) => {
    /* In this example, we do not care about the letters that much, but more about the state. */
    state match
      case Ticker.Tick(value: Int) =>
        /* Report that we are in the 'tick' state*/
        println(s"tick = $value")
        /* Send a new letter to myself to continue the work */
        this ! Ticker.Work
        /* Change the state to a new one. This is obligatory. */
        Ticker.Tock(value+1)
      case Ticker.Tock(value: Int) =>
        /* Report that we are in the 'tock' state*/
        println(s"tock = $value")
        /* As long as we are below 10 we continue the work, otherwise we stop. */
        if value<10 then this ! Ticker.Work else stop(Stop.Finish)
        /* Change the state to a new one. This is obligatory. */
        Ticker.Tick(value+1) }

object Ticker extends AcceptDefine :
  /* The ticker only excepts one letter */
  sealed trait Letter extends Actor.Letter[Actor]
  case object Work extends Letter
  /* This actor can be in two 'states' */
  sealed trait State extends Actor.State
  case class Tick(value: Int) extends State
  case class Tock(value: Int) extends State
  /* The initial state of of a state actor */
  val initial: State = Ticker.Tick(0)


/* Main entry point for the demo. */
object Main :
  /* Create the root of the tree. */
  private val ticker = Ticker(true)
  /* Main code entrance point */
  def main(args: Array[String]): Unit =
    /* Set the ticker to work. */
    ticker ! Ticker.Work
    /* Give it some time to complete */
    Thread.sleep(200)

If we now run this code we get (slightly beautified):

tick = 0
tock = 1
tick = 2
tock = 3
tick = 4
tock = 5
tick = 6
tock = 7
tick = 8
tock = 9
tick = 10
tock = 11
== SAMPLE ==> phase=Finish, stop=Finish, lettersSum=12, lettersMax=1, exceptionsSum=0, failedSum=0, needles=0, processLoad=208547ppm
== POST ====> from=:x,     to=Ticker, letter=Work: 1
== POST ====> from=Ticker, to=Ticker, letter=Work: 11
== TRACE ===> time=1959442,  nr=1,  action=Created, actor=Ticker
== TRACE ===> time=11487944, nr=2,  action=Accepted, from=:x,      to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=12360009, nr=3,  action=Initiated, from=:x,     to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=13062339, nr=4,  action=Accepted,  from=Ticker, to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=13995010, nr=5,  action=Completed, from=:x,     to=Ticker, mail=Processed, letter=Work
== TRACE ===> time=14282467, nr=6,  action=Initiated, from=Ticker, to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=14462866, nr=7,  action=Accepted,  from=Ticker, to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=14488807, nr=8,  action=Completed, from=Ticker, to=Ticker, mail=Processed, letter=Work
== TRACE ===> time=15423130, nr=9,  action=Initiated, from=Ticker, to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=15555440, nr=10, action=Accepted,  from=Ticker, to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=15576662, nr=11, action=Completed, from=Ticker, to=Ticker, mail=Processed, letter=Work
== TRACE ===> time=15784287, nr=12, action=Initiated, from=Ticker, to=Ticker, mail
10000
=Received,  letter=Work
== TRACE ===> time=15907734, nr=13, action=Accepted,  from=Ticker, to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=15928372, nr=14, action=Completed, from=Ticker, to=Ticker, mail=Processed, letter=Work
== TRACE ===> time=16139677, nr=15, action=Initiated, from=Ticker, to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=16258106, nr=16, action=Accepted,  from=Ticker, to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=16277687, nr=17, action=Completed, from=Ticker, to=Ticker, mail=Processed, letter=Work
== TRACE ===> time=16463901, nr=18, action=Initiated, from=Ticker, to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=16574024, nr=19, action=Accepted,  from=Ticker, to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=16592622, nr=20, action=Completed, from=Ticker, to=Ticker, mail=Processed, letter=Work
== TRACE ===> time=16768664, nr=21, action=Initiated, from=Ticker, to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=16872800, nr=22, action=Accepted,  from=Ticker, to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=16890789, nr=23, action=Completed, from=Ticker, to=Ticker, mail=Processed, letter=Work
== TRACE ===> time=17123063, nr=24, action=Initiated, from=Ticker, to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=17249651, nr=25, action=Accepted,  from=Ticker, to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=17270324, nr=26, action=Completed, from=Ticker, to=Ticker, mail=Processed, letter=Work
== TRACE ===> time=17476122, nr=27, action=Initiated, from=Ticker, to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=17587760, nr=28, action=Accepted,  from=Ticker, to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=17606682, nr=29, action=Completed, from=Ticker, to=Ticker, mail=Processed, letter=Work
== TRACE ===> time=17796116, nr=30, action=Initiated, from=Ticker, to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=17905111, nr=31, action=Accepted,  from=Ticker, to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=17924338, nr=32, action=Completed, from=Ticker, to=Ticker, mail=Processed, letter=Work
== TRACE ===> time=18401725, nr=33, action=Initiated, from=Ticker, to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=18527726, nr=34, action=Accepted,  from=Ticker, to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=18547249, nr=35, action=Completed, from=Ticker, to=Ticker, mail=Processed, letter=Work
== TRACE ===> time=18746084, nr=36, action=Initiated, from=Ticker, to=Ticker, mail=Received,  letter=Work
== TRACE ===> time=19404071, nr=37, action=Completed, from=Ticker, to=Ticker, mail=Processed, letter=Work

See this code run in Scastie

Internals of an actor

Clone this wiki locally
0