ULTIMATE DEVOPS MONITORING PROJECT

This project shows how virtual machines and web applications can be monitored using tools like Prometheus and Grafana, and how Alertmanager can be used to send e-mails alerting team members about the status of the application.

In the context of DevOps monitoring, the terms "Black box exporter," "Node exporter," and "Alertmanager" refer to components within the Prometheus ecosystem, a popular open-source monitoring and alerting toolkit. Each of these components plays a distinct role in gathering, processing, and managing metrics and alerts. Here’s a detailed explanation and summary of each:

Black Box Exporter

Purpose:

The Black Box Exporter is used to probe endpoints (websites, APIs, etc.) and services from the outside, treating them as "black boxes."

Functionality:
  • It performs various types of probes (HTTP, HTTPS, DNS, TCP, ICMP) to check the availability and performance of services.

  • Configurable probes allow for detailed checks and validations, such as ensuring a web page returns the expected content.

  • Results of these probes are exported as metrics, which Prometheus can scrape and analyze.

Use Case:
  • Monitoring the uptime and performance of external services and endpoints.

  • Ensuring service level agreements (SLAs) are met by checking the availability of critical web services.
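For illustration, here is a minimal sketch of a blackbox.yml probe module (the http_2xx module name matches the one used later in the Prometheus scrape config; the timeout and method values are assumptions, not taken from this project):

modules:
  http_2xx:                     # module name referenced from prometheus.yml (params: module: [http_2xx])
    prober: http                # probe the target over HTTP/HTTPS
    timeout: 5s                 # assumed timeout; adjust as needed
    http:
      method: GET
      valid_status_codes: []    # empty list defaults to accepting any 2xx response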

Node Exporter

Purpose:

The Node Exporter is designed to expose hardware and OS metrics from *nix systems (Linux, Unix).

Functionality:
  • Collects a wide variety of system metrics such as CPU usage, memory usage, disk I/O, network statistics, and more.

  • Metrics are gathered using system calls and various kernel interfaces, ensuring accuracy and relevance.

  • Exposes these metrics over HTTP in a format that Prometheus can scrape.

Use Case:
  • Monitoring the health and performance of individual servers.

  • Collecting detailed system-level metrics to diagnose performance issues or resource bottlenecks.
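As a quick sanity check (a sketch, assuming Node Exporter is already running on its default port 9100), you can pull a few of these metrics directly with curl:

curl -s http://localhost:9100/metrics | grep -E '^node_(cpu_seconds_total|memory_MemAvailable_bytes|filesystem_avail_bytes)' | head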

Alertmanager

Purpose:

Alertmanager handles alerts sent by client applications such as the Prometheus server.

Functionality:
  • Receives, deduplicates, groups, and routes alerts to various notification channels (email, Slack, PagerDuty, etc.).

  • Allows for complex alerting rules and logic, such as silencing alerts during maintenance windows or escalating alerts based on severity.

  • Supports templating to customize alert messages and notifications.

Use Case:
  • Centralized management of alerts generated by Prometheus.

  • Ensuring that alerts reach the right teams or individuals, minimizing alert fatigue and ensuring critical issues are addressed promptly.

  • Customizing alert notifications to include relevant information, improving incident response times.
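For example, escalation by severity is expressed as a routing tree. This is only a sketch; the receiver names below are hypothetical and not part of this project's configuration:

route:
  receiver: default-notifications       # fallback receiver for everything else
  group_by: [alertname]
  routes:
    - match:
        severity: critical              # escalate critical alerts separately
      receiver: oncall-notifications
      repeat_interval: 30m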

Summary

In summary, within the Prometheus ecosystem for DevOps monitoring:

  • Black Box Exporter probes external services to ensure they are up and performing as expected.

  • Node Exporter collects and exposes metrics from the system hardware and OS, providing insight into the performance and health of servers.

  • Alertmanager processes and manages alerts generated by Prometheus, ensuring they are routed to the appropriate channels and handled according to defined rules.

These components work together to provide comprehensive monitoring and alerting capabilities, helping DevOps teams maintain high availability, performance, and reliability of their systems and services.

STEPS

Create two EC2 instances (monitoring and VM) with instance type t2.medium and 20 GB of storage.
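If you prefer the AWS CLI, here is a rough sketch (the AMI ID, key pair, and security group are placeholders, not values from this project):

# One monitoring server and one VM; replace the placeholder AMI, key pair, and security group IDs.
aws ec2 run-instances \
  --image-id ami-xxxxxxxx \
  --instance-type t2.medium \
  --count 2 \
  --key-name my-key \
  --security-group-ids sg-xxxxxxxx \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":20}}]'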

Install the following using wget and extract

  • Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.53.0-rc.0/prometheus-2.53.0-rc.0.linux-amd64.tar.gz

Extract using this command

tar -xvf <filename>

Delete the old archive and change the name of the extracted directory.
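For example (renaming to prometheus is just the convention used here; the archive and directory names match the wget above):

rm prometheus-2.53.0-rc.0.linux-amd64.tar.gz          # delete the downloaded archive
mv prometheus-2.53.0-rc.0.linux-amd64 prometheus      # rename the extracted directory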

  • Blackbox Exporter

Repeat the process for the Blackbox Exporter and Alertmanager:

wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.25.0/blackbox_exporter-0.25.0.linux-amd64.tar.gz

  • Alertmanager
wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz

VM Instance

Clone the web app:

git clone https://github.com/UzonduEgbombah/BoardGame.git

Install node exporter

wget https://github.com/prometheus/node_exporter/releases/download/v1.8.1/node_exporter-1.8.1.linux-amd64.tar.gz

Repeat the same process: extract, delete the archive, and rename the directory.

cd into the node_exporter directory and run:

./node_exporter &

Copy the VM's IP address, append port 9100, and open it in your browser (metrics are served at /metrics).

To run the BoardGame app, install:

  • Java

  • Maven

and run the command

mvn package

cd into the target directory and run:

java -jar database_service_project-0.0.2.jar

Now you can open the VM's IP on port 8080 to access the BoardGame web app.
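Put together, the steps on the VM look roughly like this (a sketch assuming an Ubuntu VM; the apt package names, in particular openjdk-17-jdk, are assumptions and may need adjusting for the project's Java version):

sudo apt update
sudo apt install -y openjdk-17-jdk maven   # assumed package names on Ubuntu
git clone https://github.com/UzonduEgbombah/BoardGame.git
cd BoardGame
mvn package                                # builds the jar into target/
cd target
java -jar database_service_project-0.0.2.jar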

On your monitoring server, cd into the prometheus directory and start it with the command below:

./prometheus &

Access it with the server's IP on port 9090.
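If you want to be explicit about which configuration file is used, a sketch (--config.file defaults to prometheus.yml in the working directory):

cd prometheus
./prometheus --config.file=prometheus.yml &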

  • Now let's set up the alert rules

Create a new file in the prometheus directory:

vi alert_rules.yml

Paste the following rules, then save and exit with :wq

groups:
- name: alert_rules                   # Name of the alert rules group
  rules:
    - alert: InstanceDown
      expr: up == 0                   # Expression to detect instance down
      for: 1m
      labels:
        severity: "critical"
      annotations:
        summary: "Endpoint {{ $labels.instance }} down"
        description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."

    - alert: WebsiteDown
      expr: probe_success == 0        # Expression to detect website down
      for: 1m
      labels:
        severity: critical
      annotations:
        description: The website at {{ $labels.instance }} is down.
        summary: Website down

    - alert: HostOutOfMemory
      expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 25  # Expression to detect low memory (metric names as exposed by node_exporter 1.x)
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Host out of memory (instance {{ $labels.instance }})"
        description: "Node memory is filling up (< 25% left)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

    - alert: HostOutOfDiskSpace
      expr: (node_filesystem_avail_bytes{mountpoint="/"} * 100) / node_filesystem_size_bytes{mountpoint="/"} < 50  # Expression to detect low disk space
      for: 1s
      labels:
        severity: warning
      annotations:
        summary: "Host out of disk space (instance {{ $labels.instance }})"
        description: "Disk is almost full (< 50% left)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

    - alert: HostHighCpuLoad
      expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{job="node_exporter",mode="idle"}[5m])) * 100) > 80  # Expression to detect high CPU load (100 minus idle percentage)
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Host high CPU load (instance {{ $labels.instance }})"
        description: "CPU load is > 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

    - alert: ServiceUnavailable
      expr: up{job="node_exporter"} == 0  # Expression to detect service unavailability
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: "Service Unavailable (instance {{ $labels.instance }})"
        description: "The service {{ $labels.job }} is not available\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

    - alert: HighMemoryUsage
      expr: (node_memory_Active_bytes / node_memory_MemTotal_bytes) * 100 > 90  # Expression to detect high memory usage
      for: 10m
      labels:
        severity: critical
      annotations:
        summary: "High Memory Usage (instance {{ $labels.instance }})"
        description: "Memory usage is > 90%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

    - alert: FileSystemFull
      expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 < 10  # Expression to detect file system almost full
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "File System Almost Full (instance {{ $labels.instance }})"
        description: "File system has < 10% free space\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

Now open prometheus.yml and uncomment (remove the #) the "alert_rules.yml" entry under rule_files.

cd into the alertmanager directory and run it, then access it on port 9093.

For the rules added to alert_rules.yml to take effect, Prometheus must be restarted.

Use pgrep to find the Prometheus process ID and kill it, then start Prometheus again.
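A minimal sketch of that restart (assuming Prometheus was started from the prometheus directory and the config file is prometheus.yml):

pgrep prometheus                              # prints the Prometheus process ID
kill <pid>                                    # replace <pid> with the ID printed above
./prometheus --config.file=prometheus.yml &   # start it again in the background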

The rules should now show up in Prometheus; click on the Alerts tab.

Notes:

  • The & at the end of each command ensures the process runs in the background.
  • Ensure that you have configured the prometheus.yml and alertmanager.yml configuration files correctly before starting the services.
  • Adjust the firewall and security settings to allow the necessary ports (typically 9090 for Prometheus, 9093 for Alertmanager, 9115 for Blackbox Exporter, and 9100 for Node Exporter) to be accessible.
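On AWS this usually means opening the ports in the instances' security groups. A rough sketch with the AWS CLI (the security group ID is a placeholder, and 0.0.0.0/0 is only sensible for a short-lived test setup):

for port in 9090 9093 9100 9115 8080; do
  aws ec2 authorize-security-group-ingress \
    --group-id sg-xxxxxxxx \
    --protocol tcp \
    --port "$port" \
    --cidr 0.0.0.0/0
done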

Prometheus and Alertmanager Configuration

Prometheus Configuration (prometheus.yml)

Global Configuration

global:
  scrape_interval: 15s                # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s            # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

Alertmanager Configuration

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - 'localhost:9093'          # Alertmanager endpoint

Rule Files

rule_files:
   - "alert_rules.yml"                # Path to alert rules file
  # - "second_rules.yml"              # Additional rule files can be added here

Scrape Configuration

Prometheus Itself

scrape_configs:
  - job_name: "prometheus"            # Job name for Prometheus

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]   # Target to scrape (Prometheus itself)

Node Exporter

  - job_name: "node_exporter"         # Job name for node exporter

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["3.110.195.114:9100"]  # Target node exporter endpoint

Blackbox Exporter

  - job_name: 'blackbox'              # Job name for blackbox exporter
    metrics_path: /probe              # Path for blackbox probe
    params:
      module: [http_2xx]              # Module to look for HTTP 200 response
    static_configs:
      - targets:
        - http://prometheus.io        # HTTP target
        - https://prometheus.io       # HTTPS target
        - http://3.110.195.114:8080/  # HTTP target with port 8080
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 13.235.248.225:9115  # Blackbox exporter address
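Two optional sanity checks at this point (sketches, reusing the addresses above): validate the whole file with promtool, and hit the Blackbox Exporter's /probe endpoint directly to confirm it responds:

./promtool check config prometheus.yml
curl -s "http://13.235.248.225:9115/probe?target=http://prometheus.io&module=http_2xx" | grep probe_success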

Alertmanager Configuration (alertmanager.yml)

Routing Configuration

route:
  group_by:
    - alertname
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: email-notifications
receivers:
  - name: email-notifications
    email_configs:
      - to: uzonduegbombah419@gmail.com
        from: test@gmail.com
        smarthost: smtp.gmail.com:587
        auth_username: uzonduegbombah419@gmail.com
        auth_identity: uzonduegbombah419@gmail.com
        auth_password: <your-gmail-app-password>   # use your own Gmail app password; never commit real credentials
        send_resolved: true
inhibit_rules:
  - source_match:
      severity: critical
    target_match:
      severity: warning
    equal:
      - alertname
      - dev
      - instance
  • Remember to set up your own authentication (Gmail app) password.
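You can also validate alertmanager.yml before restarting with amtool, which is included in the Alertmanager archive (run from the alertmanager directory):

./amtool check-config alertmanager.yml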

Now restart Prometheus and Alertmanager.

The result should look similar.

To make sure this setup works, I shut down the BoardGame application (the mvn package / java -jar process) and checked whether I also got an e-mail.

It reflected successfully in Prometheus.

The e-mail came in after about 1 minute.

Okay, one more test: kill the Node Exporter and see what happens.

Reflected successfully.

Now let's wait for the e-mail.
