-
Notifications
You must be signed in to change notification settings - Fork 1
MonitorAid
Text compliant with Leucine version 0.5.x
If the application contains may actors things may behave in an unexpected manner.
In such a situation some insights in the system are required. But also
during normal operation it is nice to actually see everything runs
as designed. In order to facilitate the user here, the MonitorAid
is
provided.
If you mixin MonitorAid
there are two possibilities:
- use a global actor monitor, derived from
GlobalMonitor
- use a local actor monitor, usually an instance of
LocalMonitor
The choice is made by supplying MonitorAid
the appropriate trait
parameter. Here we will discuss the latter, which gives the possibility
to investigate what is going on in this particular actor, without having
to set up a system wide monitor. On the wiki page
ActorMonitor
it will be shown to do just that.
The library provides a predefined LocalMonitor
class that can be instantiated
directly at the definition of your Actor
like this:
class Ticker extends AcceptActor(Ticker), MonitorAid(LocalMonitor(1.second))
but you are of course free to first derive a personal version with extra
capabilities and make use of that. Due to the generic signature of MonitorAid
:
trait MonitorAid[Monitor <: ActorMonitor](val monitor: Monitor)(using context: ActorContext) extends ActorInit, ActorDefs :
those methods will be available in the newly defined actor as well. But the
former is also perfectly fine. What you should not do is use an instance
of a LocalMonitor in multiple actors. Use a new one for each actor instance.
The time parameter of LocalMonitor
is obligatory and defines the probe
frequency for the monitor actions.
The way the monitor works is that it starts taking samples from the actor at the given interval. The bare actor and each mixin generate performance metrics that are collected during activity. At each sample, those metrics are copied and, depending on the type of metric, are reset again. Those metrics can be interrogated at three different levels, with increasing granularity. This will be discussed below.
Besides that, when MonitorAid
is mixed in, the following two methods become
available:
/**
* Calculate the relative time this actor spend performing processing letters. Returns a value between
* zero and one. Zero means no time was spend at processing messages and one means 100% of the time was
* spend therein. With alltime being true this value reflects the overall performance of the actor, and
* with false, the value since the last call to processLoad(false). If this actor is part of a system
* ActorMonitor, do not use the latter call, for it may interfere with similar calls from the ActorMonitor
* which utilizes the same call to determine the actor process load. */
protected def processLoad(alltime: Boolean): Double
/**
* Returns the total time this actor is active in nano seconds. After the actor has
* stopped (i.e. does not accept any messages any more) this value remains constant. */
protected def activeTime: Long
which provide direct access to the load and lifetime of the actor. If you are
probing the actor you should only call processLoad(true)
and not
processLoad(false)
which may interfere with the probed measurements. Otherwise
you can implement your own probing system by repeating calling on processLoad(false)
.
The processLoad returns a number between 0 and 1 indicating which fraction of its existence is spend on processing messages. Typically this number is very small, in the order of 0.0001 to 0.001.
There are two important parameters that need to be set as well:
/**
* In order to reduce the monitor output, you can switch the probing on and off at will. Per
* default probing is on, but you may switch it off directly in the constructor if needed. */
protected def probing(enable: Boolean): Unit
/**
* This is the personal setting of tracing. There is a public setting as well.
* If tracing is active for this actor depends on both settings, in a symmetric manner.
* There are two levels of tracing. TraceFull and TraceCount. TraceFull traces every
* message with timestamp and action. This may be memory intensive. TraceCount just
* counts the messages from sender to receiver per letter. Usually this number is limited.
* However, both forms should only be used at debugging, since it requires a lot of
* synchronized operations.
* If both are Enabled or one is Enabled and the other is Default, the TraceFull is
* active. In all other cases it is not. TraceCount is active when both are at Default
* or one of the settings is Enabled. This implies that you can enable/disable
* the tracing TraceFull here as long as the global tracing is Default or Enabled.
* Setting this to Disabled will always prohibit tracing TraceFull of this actor,
* and setting it to Default/Enabled leaves the fate in the hands of the global setting.
* NOTE: This value is needed at object construction, so DO NOT override with a val.
* You may however change its value during execution to be able to focus on a specific
* time window or issue and reduce information you don't need. */
protected def tracing: Tracing = Tracing.Default
So you learn that probing is automatically started normally. Usually this is what you
want. However, there can be circumstances that the focus of interest lies further
away in time, and you do not want to see all the stuff in between. Then it can be
handy to switch off the probing in the constructor by calling probing(false)
only to be switched on when a certain message comes in for example.
The second parameter also deserves some attention, and deals with the level
messages should be traced. As described there are two levels of tracing, and
the tracing can of course be completely turned off tracing = Tracing.Disabled
,
in which case only actor metrics are collected. In short, with tracing
fully enabled tracing = Tracing.Enabled
, every message occurrence is
captured and reported with a time stamp. This can be handy to exactly reconstruct
what is going on to find a particular problem, but also can be quite overwhelming.
With only TraceCount active tracing = Tracing.Default
, every message type is counted, so you
get a quick overview of which messages arrive and in what quantity.
The particular order is lost however. This could look something like
this:
from=:x, to=F0.F2.F2.F1, letter=Forward: 1
from=:x, to=F0.F2.F2.F2, letter=Forward: 1
from=F0, to=F0, letter=Create(2,3): 1
from=F0, to=F0, letter=Forward: 1
from=F0.F1, to=F0, letter=Backward: 4
from=F0.F1.F1, to=F0.F1, letter=Backward: 2
from=F0.F1.F1.F1, to=F0.F1.F1, letter=Backward: 1
This indicates per letter kind "Forward", "Backward" ,"Create(2,3)"
how many occurrences there were in the interval period, and also
from which actor to which actor the message was send. The actor :x
is the Anonymous Actor. So here we see that in the 2 seconds of the
interval 4 letters "Backward" were send from actor called "F0.F1"
to actor called "F0". (And we learn the "F1" is a child
actor from "F0").
A small part of the result of full tracing could be:
time=50217024, nr=86, action=Completed, from=:x, to=F0.F1.F2.F1, mail=Processed, letter=Forward
time=50220140, nr=87, action=Accepted, from=F0.F1.F1.F2, to=F0.F1.F1, mail=Received, letter=Backward
time=50220659, nr=88, action=Completed, from=:x, to=F0.F2.F2.F2, mail=Processed, letter=Forward
time=50243378, nr=89, action=Initiated, from=F0.F2.F1.F2, to=F0.F2.F1, mail=Received, letter=Backward
time=50261115, nr=90, action=Completed, from=:x, to=F0.F1.F1.F2, mail=Processed, letter=Forward
time=50276470, nr=91, action=Initiated, from=F0.F1.F2.F2, to=F0.F1.F2, mail=Received, letter=Backward
Observe that we see each action of each letter with time stamp, sender, target etc. The time is measured in nano seconds from start of the actor.
Note, these are the default string representations. If you want,
you can define other representations as well that suit your
application better. The perfect place for that would be in your
own extension of LocalMonitor
if you want to reuse it
over different actors, or directly in your the actor being under
surveillance.
There are two fields left to explain. These are action
and mail
. Although these are
mostly for internal purposes, we give their meaning here for completeness.
The action is an enumerated
from:
enum Action :
/* The actor is created */
case Created
/* The actor is terminated */
case Terminated
/* The message is accepted for delivery */
case Accepted
/* The message is refused */
case Refused
/* Letter processing has begone */
case Initiated
/* Letter processing is done */
case Completed
where as the mail stems from:
/**
* These are the stages a mail can be in. There can be a few reasons why mail is not handled by
* the actor. Those will be collected in a separate list for manual inspection. The reason is
* specified by one of these causes. Otherwise we have the regular stages: Empty, Received and
* Processed. */
enum Mail extends EnumOrder[Mail] :
/** The mail is not yet defined. */
case Empty
/** The mail was not delivered to the actor because it has stopped. */
case Undelivered
/** The mail was not accepted by the actor because the mailbox is full. */
case Unaccepted
/** The mail was received by the actor for processing. */
case Received
/** The mail was not matched by the actor because the type is unknown. */
case Unmatched
/** The mail was not read completely by the actor because of an unhandled exception. */
case Unreadable
/** The mail was not processed by the actor because it stopped prematurely. */
case Unprocessed
/** The mail was processed by the actor, this completes the handling. */
case Processed
As discussed the MonitorActor
collect some internal metrics of
the complete actor with all its mixins. Lets see what is meant by
that. Here you see the metrics that are defined in the form of case classes:
/** Class to marshal all the KPI's of the bare actor. */
case class Bare(phase: BareActor.Phase, stop: Actor.Stop, lettersSum: Int, lettersMax: Int, exceptionsSum: Int, failedSum: Int, needles: Int, processLoad: Double) extends Sample :
def userPPM = (processLoad * 1000000).toInt
def show = s"phase=$phase, stop=$stop, lettersSum=$lettersSum, lettersMax=$lettersMax, exceptionsSum=$exceptionsSum, failedSum=$failedSum, needles=$needles, processLoad=${userPPM}ppm"
/** Class to marshal all the KPI's of the Stash mixin. */
case class Stash(lettersSum: Int, lettersMax: Int) extends Sample :
def show = s"stashSum=$lettersSum, stashMax=$lettersMax"
/** Class to marshal all the KPI's of the Timing mixin. */
case class Timing(lettersSum: Int, lettersMax: Int, anchorsSize: Int) extends Sample :
def show = s"timersSum=$lettersSum, timersMax=$lettersMax, timersNow=$anchorsSize"
/** Class to marshal all the KPI's of the Families mixins. */
case class Family(namedChildrenNow: Int, allChildrenNow: Int, workersSum: Long) extends Sample :
def show = s"namedChildrenNow=$namedChildrenNow, allChildrenNow=$allChildrenNow, workersSum=$workersSum"
/** Class to marshal all the KPI's of the Timing mixin. */
case class Protect(alarmsSum: Int) extends Sample :
def show = s"alarmsSum=$alarmsSum"
For the Bare
actor we have:
-
phase: BareActor.Phase
: mainly for internal use, but in principle tells you if the actor is running or not -
stop: Actor.Stop
: shows in which stop mode this actor is -
lettersSum: Int
: the total amount of letters processed within the probe interval -
lettersMax: Int
: the highest level of letters in the mailbox within the probe interval -
exceptionsSum
: the total amount of exceptions over the entire lifetime of the actor -
failedSum
: the total amount of failed messages over the entire lifetime of the actor -
needles
: the actual value of the amount of needles dropped (to determine if the actor is silent) -
processLoad
: the relative time spend in processing user code within the probe interval
For the Stash
:
-
lettersSum: Int
: the total amount of stashed letters processed within the probe interval -
lettersMax: Int
: the highest level of the stashed letters in the mailbox within the probe interval
For the Timing
:
-
lettersSum: Int
: the total amount of timer/event letters processed within the probe interval -
lettersMax: Int
: the highest level of the timer/event letters in the mailbox within the probe interval -
anchorsSize: Int
: the actual number of anchors holding a timer/event
For the Family
:
-
namedChildrenNow: Int
: the actual number of children with a user defined name for this actor -
allChildrenNow: Int
: the actual total number of children -
workersSum: Long
: total number of workers started within lifetime of the actor
For the Protect
:
-
alarmsSum
: the total amount of alarms raised over the entire lifetime of the actor
Each class carries a show method to quickly see what the values are.
Let us see this all come to live in a short example. In the demo's we have some
code that just sends a messages around. This demo can be started with ticker
,
see the code
and the README for more information.
Here we discuss a simplified version of that code. This example only starts one actor
which can be in two states (tick(x) and tock(x)). It sends a message to itself until
x exceeds the value 10, and reports each new state.
The code reads:
/* Simple ticker example */
class Ticker extends AcceptActor(Ticker,"Ticker") :
import Actor.Stop
/* In receive we handle the incoming letters. */
final protected def receive(letter: Letter, sender: Sender): Receive = (state: State) => {
/* In this example, we do not care about the letters that much, but more about the state. */
state match
case Ticker.Tick(value: Int) =>
/* Report that we are in the 'tick' state*/
println(s"tick = $value")
/* Send a new letter to myself to continue the work */
this ! Ticker.Work
/* Change the state to a new one. This is obligatory. */
Ticker.Tock(value+1)
case Ticker.Tock(value: Int) =>
/* Report that we are in the 'tock' state*/
println(s"tock = $value")
/* As long as we are below 10 we continue the work, otherwise we stop. */
if value<10 then this ! Ticker.Work else stop(Stop.Finish)
/* Change the state to a new one. This is obligatory. */
Ticker.Tick(value+1) }
object Ticker extends AcceptDefine :
/* The ticker only excepts one letter */
sealed trait Letter extends Actor.Letter[Actor]
case object Work extends Letter
/* This actor can be in two 'states' */
sealed trait State extends Actor.State
case class Tick(value: Int) extends State
case class Tock(value: Int) extends State
/* The initial state of of a state actor */
val initial: State = Ticker.Tick(0)
/* Main entry point for the demo. */
object Main :
/* Create the root of the tree. */
private val ticker = Ticker
/* Main code entrance point */
def main(args: Array[String]): Unit =
/* Set the ticker to work. */
ticker ! Ticker.Work
/* Give it some time to complete */
Thread.sleep(200)
This code produces:
tick = 0
tock = 1
tick = 2
tock = 3
tick = 4
tock = 5
tick = 6
tock = 7
tick = 8
tock = 9
tick = 10
tock = 11
which you can observe here in Scastie
Now we add the debugging, and must modify the class with the following items to get the actor 'under surveillance':
- add a parameter to the class
Ticker
(this is optional of course) - extend
Ticker
with a local monitor:MonitorAid(LocalMonitor(1.second))
- define the value of
tracing
(use adef
not aval
!) - define three reporting methods for samples, posts and traces
- register those methods at the monitor
- set the value of probing to
debug
, without, probing is true per default - call the
Ticker
instantiation with the parametertrue
This gives:
/* Simple example to illustrate debugging capabilities with a local monitor. */
class Ticker(debug: Boolean) extends AcceptActor(Ticker,"Ticker"), MonitorAid(LocalMonitor(1.second)) :
import Actor.{Post, Stop}
import MonitorAid.{Action, Sample, Trace, Tracing}
/* We allow full tracing for this actor. */
final override def tracing = Tracing.Enabled
/* Define report functions for each capability */
private def sampled(samples: List[Sample]): Unit = samples.foreach(sample => println(s"== SAMPLE ==> ${sample.show}"))
private def posted(posts: SortedMap[Post,Long]): Unit = posts.foreach((post,i) => println(s"== POST ====> ${post.short}: $i"))
private def traced(traces: SortedSet[Trace]): Unit = traces.foreach(trace => println(s"== TRACE ===> ${trace.show}"))
/* Register the report functions so they are called by the local monitor. */
monitor.register(sampled)
monitor.register(posted)
monitor.register(traced)
/* Only activate the monitor when we are in the debug mode. */
probing(debug)
/* In receive we handle the incoming letters. */
final protected def receive(letter: Letter, sender: Sender): Receive = (state: State) => {
/* In this example, we do not care about the letters that much, but more about the state. */
state match
case Ticker.Tick(value: Int) =>
/* Report that we are in the 'tick' state*/
println(s"tick = $value")
/* Send a new letter to myself to continue the work */
this ! Ticker.Work
/* Change the state to a new one. This is obligatory. */
Ticker.Tock(value+1)
case Ticker.Tock(value: Int) =>
/* Report that we are in the 'tock' state*/
println(s"tock = $value")
/* As long as we are below 10 we continue the work, otherwise we stop. */
if value<10 then this ! Ticker.Work else stop(Stop.Finish)
/* Change the state to a new one. This is obligatory. */
Ticker.Tick(value+1) }
object Ticker extends AcceptDefine :
/* The ticker only excepts one letter */
sealed trait Letter extends Actor.Letter[Actor]
case object Work extends Letter
/* This actor can be in two 'states' */
sealed trait State extends Actor.State
case class Tick(value: Int) extends State
case class Tock(value: Int) extends State
/* The initial state of of a state actor */
val initial: State = Ticker.Tick(0)
/* Main entry point for the demo. */
object Main :
/* Create the root of the tree. */
private val ticker = Ticker(true)
/* Main code entrance point */
def main(args: Array[String]): Unit =
/* Set the ticker to work. */
ticker ! Ticker.Work
/* Give it some time to complete */
Thread.sleep(200)
If we now run this code we get (slightly beautified):
tick = 0
tock = 1
tick = 2
tock = 3
tick = 4
tock = 5
tick = 6
tock = 7
tick = 8
tock = 9
tick = 10
tock = 11
== SAMPLE ==> phase=Finish, stop=Finish, lettersSum=12, lettersMax=1, exceptionsSum=0, failedSum=0, needles=0, processLoad=208547ppm
== POST ====> from=:x, to=Ticker, letter=Work: 1
== POST ====> from=Ticker, to=Ticker, letter=Work: 11
== TRACE ===> time=1959442, nr=1, action=Created, actor=Ticker
== TRACE ===> time=11487944, nr=2, action=Accepted, from=:x, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=12360009, nr=3, action=Initiated, from=:x, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=13062339, nr=4, action=Accepted, from=Ticker, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=13995010, nr=5, action=Completed, from=:x, to=Ticker, mail=Processed, letter=Work
== TRACE ===> time=14282467, nr=6, action=Initiated, from=Ticker, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=14462866, nr=7, action=Accepted, from=Ticker, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=14488807, nr=8, action=Completed, from=Ticker, to=Ticker, mail=Processed, letter=Work
== TRACE ===> time=15423130, nr=9, action=Initiated, from=Ticker, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=15555440, nr=10, action=Accepted, from=Ticker, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=15576662, nr=11, action=Completed, from=Ticker, to=Ticker, mail=Processed, letter=Work
== TRACE ===> time=15784287, nr=12, action=Initiated, from=Ticker, to=Ticker, mail
10000
=Received, letter=Work
== TRACE ===> time=15907734, nr=13, action=Accepted, from=Ticker, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=15928372, nr=14, action=Completed, from=Ticker, to=Ticker, mail=Processed, letter=Work
== TRACE ===> time=16139677, nr=15, action=Initiated, from=Ticker, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=16258106, nr=16, action=Accepted, from=Ticker, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=16277687, nr=17, action=Completed, from=Ticker, to=Ticker, mail=Processed, letter=Work
== TRACE ===> time=16463901, nr=18, action=Initiated, from=Ticker, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=16574024, nr=19, action=Accepted, from=Ticker, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=16592622, nr=20, action=Completed, from=Ticker, to=Ticker, mail=Processed, letter=Work
== TRACE ===> time=16768664, nr=21, action=Initiated, from=Ticker, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=16872800, nr=22, action=Accepted, from=Ticker, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=16890789, nr=23, action=Completed, from=Ticker, to=Ticker, mail=Processed, letter=Work
== TRACE ===> time=17123063, nr=24, action=Initiated, from=Ticker, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=17249651, nr=25, action=Accepted, from=Ticker, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=17270324, nr=26, action=Completed, from=Ticker, to=Ticker, mail=Processed, letter=Work
== TRACE ===> time=17476122, nr=27, action=Initiated, from=Ticker, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=17587760, nr=28, action=Accepted, from=Ticker, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=17606682, nr=29, action=Completed, from=Ticker, to=Ticker, mail=Processed, letter=Work
== TRACE ===> time=17796116, nr=30, action=Initiated, from=Ticker, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=17905111, nr=31, action=Accepted, from=Ticker, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=17924338, nr=32, action=Completed, from=Ticker, to=Ticker, mail=Processed, letter=Work
== TRACE ===> time=18401725, nr=33, action=Initiated, from=Ticker, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=18527726, nr=34, action=Accepted, from=Ticker, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=18547249, nr=35, action=Completed, from=Ticker, to=Ticker, mail=Processed, letter=Work
== TRACE ===> time=18746084, nr=36, action=Initiated, from=Ticker, to=Ticker, mail=Received, letter=Work
== TRACE ===> time=19404071, nr=37, action=Completed, from=Ticker, to=Ticker, mail=Processed, letter=Work