GitHub - pwmcintyre/logging: thoughts on logging (slides use revealjs)

pwmcintyre/logging

Logging

a non-exhaustive opinionated guide


Observability

Building predictable systems that you can
reason about

In other words: Can you ask open ended questions about your system?

For example:

  • Did the latest release impact performance?
  • Is there a correlation between system load and latency?
  • Why is kafka partition 2 so hot?

--

O11y 1.0 Pillars

  • Metrics — coarse-grain
  • Logs — complete freedom!
  • Traces — fine-grain
- https://www.oreilly.com/library/view/distributed-systems-observability/9781492033431/ch04.html
- https://medium.com/@copyconstruct/logs-and-metrics-6d34d3026e38

--

Reality Check

"Pillars" are marketing

You emit data to a product that can query it.

The product often limits the precision you can include.

Metrics are low-fidelity aggregates: they tell you that something failed, but not why.

Tracing is logging with opinions + tooling

Multiple sources of truth suffer weak correlation

--

O11y 2.0

Canonical Logs: Wide and structured

  1. Uncover unknown unknowns
  2. Useful to everyone
Ask arbitrary questions. Useful to product owners, support, developers, SREs, platform engineers, etc.

--

Actionable:

Focus on good logs


Basics

  1. context
  2. correlation
  3. level

1. Context

Cannot predict future questions

Add context ✅ not data ⚠️

In future you will want to explore the data in ways you cannot predict today.

Do not add whole request/response payloads, these contain data which your observability tooling is not sancti

--

Example

{
    "time": "2021-07-25T04:12:50Z",
    "application": "authorizer@3.0.1",
    "msg": "authorized",
    "user_id": "123",
    "groups": ["a", "b"],
    "cache_used": "1627186370",
    "request_id": "a39b28c9",
    "correlation_id": "d4289bd7"
}
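
An event like the one above could be produced as follows. This is a minimal sketch that assumes Go 1.21+ and the standard library's log/slog package rather than whatever logger the slides had in mind. The helper name authorizedEvent is hypothetical, and the output is written to a buffer only so the fields can be inspected.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log/slog"
)

// authorizedEvent (hypothetical helper) writes one structured "authorized"
// event and returns its decoded JSON fields. Field names mirror the example
// above; the logger choice (log/slog) is an assumption.
func authorizedEvent(userID, requestID, correlationID string, groups []string) (map[string]any, error) {
	var buf bytes.Buffer

	// Context that applies to every event is attached once, up front.
	logger := slog.New(slog.NewJSONHandler(&buf, nil)).With(
		"application", "authorizer@3.0.1",
		"request_id", requestID,
		"correlation_id", correlationID,
	)

	// Event-specific context travels as separate fields, not inside msg.
	logger.Info("authorized", "user_id", userID, "groups", groups)

	var fields map[string]any
	err := json.Unmarshal(buf.Bytes(), &fields)
	return fields, err
}

func main() {
	fields, err := authorizedEvent("123", "a39b28c9", "d4289bd7", []string{"a", "b"})
	if err != nil {
		panic(err)
	}
	fmt.Println(fields["msg"], fields["user_id"], fields["correlation_id"])
}
```

The handler adds its own "time" and "level" fields to every event.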

--

Common Mistakes

--

message overloading

{
	"msg": "Task finished: ThingProcessor: duration=3.014"
}
  • hard to parse
  • slow to filter (needs a LIKE-style substring match)
  • ambiguous unit

--

message overloading fixed

{
	"msg": "Task finished",
	"processor": "ThingProcessor",
	"duration_ms": 3.014
}
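
One way to see the benefit of the fixed form: once the processor and duration are separate fields, a consumer can filter on them with plain comparisons instead of parsing the message. The sketch below assumes newline-delimited JSON logs. The taskEvent type and slowTasks function are hypothetical names for illustration.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// taskEvent (hypothetical) picks out the fields of the fixed log line above.
type taskEvent struct {
	Msg        string  `json:"msg"`
	Processor  string  `json:"processor"`
	DurationMS float64 `json:"duration_ms"`
}

// slowTasks returns the processors whose "Task finished" events took longer
// than thresholdMS. Input is assumed to be one JSON object per line.
func slowTasks(logs string, thresholdMS float64) ([]string, error) {
	var slow []string
	scanner := bufio.NewScanner(strings.NewReader(logs))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		var ev taskEvent
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			return nil, fmt.Errorf("bad log line: %w", err)
		}
		// No string parsing: the unit is in the field name and the value is numeric.
		if ev.Msg == "Task finished" && ev.DurationMS > thresholdMS {
			slow = append(slow, ev.Processor)
		}
	}
	return slow, scanner.Err()
}

func main() {
	logs := `{"msg":"Task finished","processor":"ThingProcessor","duration_ms":3.014}
{"msg":"Task finished","processor":"OtherProcessor","duration_ms":120.5}`
	slow, err := slowTasks(logs, 100)
	if err != nil {
		panic(err)
	}
	fmt.Println(slow)
}
```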

--

Architecture

--

Pipe architecture

Q: Which service should log?

graph TD;
	queue --> validator;
	validator --> sender;
Each service might need to, but how to pass context + correlation?

--

Controller

graph TD;
	controller <--> queue;
	controller <--> validator;
	controller <--> sender;

The controller handles flow, errors, and logging.

--

Canonical Example

func (s *Service) Process() (err error) {
	// prepare log context
	start := time.Now()
	log := StandardLogger.WithField("service", "controller")
	defer func() {
		if err != nil {
			log = log.WithError(err)
		}
		duration := time.Since(start)
		log.WithField("duration_ms", duration.Milliseconds()).
			Info("done")
	}()

	// get next work
	var work Work
	work, err = s.queue.Pop()
	if err != nil {
		return fmt.Errorf("failed to get work: %w", err)
	}
	log = log.WithField("work_id", work.ID)

	// validate
	if err = s.validator.IsValid(work.Body); err != nil {
		return fmt.Errorf("invalid work: %w", err)
	}

	// send
	if err = s.sender.Send(work.Body); err != nil {
		return fmt.Errorf("failed to send: %w", err)
	}

	// commit work
	work.Delete()
	return nil
}

2. Correlation

a cross-component concern: find consensus

--

Examples

  • event context:
    correlation_id, request_id

  • business context:
    user_id, asset_id

  • application context:
    application, version, environment

--

Tooling

reaching consensus through tooling

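package appcontext

import "context"

// contextKey is an unexported key type so other packages cannot collide with it.
// (Assumed definition: the original does not show how key is declared.)
type contextKey struct{}

var key = contextKey{}
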
type SystemContext struct {
	Application string `json:"application,omitempty"`
	Version     string `json:"version,omitempty"`
	Environment string `json:"environment,omitempty"`
}

func WithSystemContext(ctx context.Context, val SystemContext) context.Context {
	return context.WithValue(ctx, key, val)
}

func GetSystemContext(ctx context.Context) (val SystemContext, ok bool) {
	val, ok = ctx.Value(key).(SystemContext)
	return
}

...

3. Levels

Broadly categorize an event

Reach consensus on meaning

--

Level Definitions

--

fatal: The system cannot continue

FATAL: failed to connect to database

--

error: Failed to do the job

ERROR: timeout while saving

--

warning: Processing degraded but can continue

WARN: config unset; using default

--

info: System did what you asked it to do

INFO: user created

INFO: batch complete

--

debug: Low-level supporting steps.

Usually disabled due to poor signal-to-noise ratio.

Danger zone: Take care with sensitive data!

--

Common Mistakes

--

non-ERROR

ERROR: client is not authorized

This belongs in the response to the client:
401 Unauthorized

--

non-INFO

Uninteresting plumbing

INFO: executed 'SELECT * FROM foo'

INFO: parsed JSON

a.k.a. I was prototyping and accidentally committed it

--

predictions

Predicting the future

INFO: about to handle request

Trust your error handling!


Closing

you'll get it wrong the first time; iterate
