Task #6541

closed

Monitor fans and power supplies

Added by Nico Schottelius over 5 years ago. Updated 11 months ago.

Status:
Rejected
Priority:
Normal
Target version:
-
Start date:
03/21/2019
Due date:
04/17/2019
% Done:

0%

Estimated time:
PM Check date:
04/10/2019

Description

  • Fans and PSU of switches need to be monitored
    • Probably using snmp for getting metrics
    • Maybe/likely storing in prometheus
  • Fans and PSU of servers need to be monitored
    • probably using local ipmitool
    • Maybe extending node_exporter or writing our own exporter
    • Might already exist
    • Saving in Prometheus

For both, appropriate alarms should be created.

Actions #1

Updated by Jason Kim over 5 years ago

  • PM Check date set to 03/30/2019
Actions #2

Updated by Jason Kim over 5 years ago

  • Status changed from New to Seen
  • PM Check date changed from 03/30/2019 to 03/31/2019
Actions #3

Updated by Jason Kim over 5 years ago

  • % Done changed from 0 to 10
  • PM Check date changed from 03/31/2019 to 04/10/2019
  • Due date set to 04/17/2019
  • Status changed from Seen to In Progress
  • Assignee changed from Jason Kim to Jin-Guk Kwon
Actions #4

Updated by Jason Kim over 5 years ago

  • Assignee changed from Jin-Guk Kwon to Samuel Hailu
  • % Done changed from 10 to 0
Actions #5

Updated by Nico Schottelius over 5 years ago

Not sure if this is a good fit for Samuel

Actions #6

Updated by Mirjana Rupar over 5 years ago

  • Assignee deleted (Samuel Hailu)
Actions #7

Updated by Mirjana Rupar over 5 years ago

  • Project changed from 97 to queue
Actions #8

Updated by Mirjana Rupar over 5 years ago

  • Status changed from In Progress to New
Actions #9

Updated by Nico Schottelius over 5 years ago

  • Project changed from queue to Open Infrastructure
  • Assignee set to Dominique Roux
Actions #10

Updated by Dominique Roux over 5 years ago

  • Status changed from New to Seen
Actions #12

Updated by Nico Schottelius over 5 years ago

Most important for this task is handing over / explaining to llnu & kjg

Actions #13

Updated by Dominique Roux over 5 years ago

Current freeipmi version on devuan available: 1.4.11
Version needed for having ipv6 support: >= 1.6.1
...
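
A quick way to check whether an installed freeipmi is new enough is to compare the dotted version numbers as integer tuples. This is only a sketch: the function names are made up for illustration, and it assumes plain numeric versions such as the two quoted above.

```python
def parse_version(text):
    """Turn a dotted version such as '1.4.11' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Minimum freeipmi version with IPv6 support, as stated in this ticket.
MIN_FREEIPMI_FOR_IPV6 = parse_version("1.6.1")


def supports_ipv6(installed):
    """Return True if the installed freeipmi version is >= 1.6.1."""
    return parse_version(installed) >= MIN_FREEIPMI_FOR_IPV6


print(supports_ipv6("1.4.11"))  # version currently packaged for devuan
print(supports_ipv6("1.6.1"))
```

Comparing tuples avoids the string-comparison trap where "1.4.11" would sort after "1.4.2".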

Actions #14

Updated by Dominique Roux over 5 years ago

Nico Schottelius wrote:

Most important for this task is handing over / explaining to llnu & kjg

OK, so in general, the IPMI part should not be a big deal:
I recommend running the ipmi exporter only on monitoring.place{5,6} and using remote IPMI to access the servers. That way we don't need to change the BootOS, and we can get the data as long as the iDRAC interface is up.

IPMI

  • Getting the required version running on monitoring.place{5,6}
    • Either compile it yourself or check whether the prepackaged version for Debian buster works
    • Probably best to move it to our package mirror: add a new repository (probably the ungleich repo) on the monitoring hosts and install it from there
  • Installing ipmi_exporter on monitoring.place{5,6}
  • Configuring prometheus accordingly
  • Put everything in cdist:
    • Update the monitoring.place{5,6} cdist manifest (to install the new repository and install the required freeipmi package)
    • Update the prometheus config according to the example in the README
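
With remote IPMI, prometheus does not scrape the servers directly. It asks the exporter on the monitoring host to query one server per request. The sketch below builds such a request URL. It assumes an ipmi_exporter listening on port 9290 with an /ipmi endpoint that takes target and module query parameters; these values and the function name are assumptions and should be checked against the README of the exporter that actually gets installed.

```python
from urllib.parse import urlencode


def ipmi_scrape_url(target, exporter="localhost:9290", module="default"):
    """Build the URL that asks the exporter to query one remote BMC.

    Assumptions (not taken from this ticket): port 9290, the /ipmi path,
    and the 'target' and 'module' parameter names.
    """
    query = urlencode({"target": target, "module": module})
    return f"http://{exporter}/ipmi?{query}"


# 192.0.2.10 is a documentation address standing in for a server's iDRAC.
print(ipmi_scrape_url("192.0.2.10"))
```

In the prometheus config, the same mapping is normally done with relabeling rules, so that the server stays the instance label while the exporter address is what gets contacted.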
Actions #15

Updated by Dominique Roux over 5 years ago

SNMP

Documentation for enabling SNMP on aristas: https://www.arista.com/en/um-eos/eos-section-43-3-configuring-snmp#ww1159793

The installation of the snmp-exporter is quite similar to the freeipmi one:
  • Get the prebuilt package from https://github.com/prometheus/snmp_exporter/releases
  • Create a devuan package, move it to the ungleich mirror
    • Don't forget the init.d file
    • Use the same path hierarchy as the other prometheus packages (also log etc.)
  • Install the exporter on monitoring.place{5,6}
  • Create the config file according to the README
  • cdistify everything (similar to IPMI)
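
Once an exporter is installed, its output can be sanity-checked before wiring up prometheus. The sketch below parses a few lines in the Prometheus text exposition format and lists the series whose value is 0. The metric name example_fan_ok is hypothetical, and the parser assumes samples without trailing timestamps.

```python
def parse_samples(text):
    """Parse 'name{labels} value' lines, skipping comments and blank lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated field on the line.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


# Hypothetical exporter output; real metric names depend on the exporter config.
SAMPLE = """\
# HELP example_fan_ok Hypothetical metric: 1 if the fan is healthy.
# TYPE example_fan_ok gauge
example_fan_ok{fan="1"} 1
example_fan_ok{fan="2"} 0
"""

failed = [series for series, value in parse_samples(SAMPLE).items() if value == 0]
print(failed)
```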
Actions #16

Updated by Dominique Roux over 5 years ago

For both services you'll have to open up ports.
Open them only from the inside, so that only monitoring.place{5,6} is allowed to talk to these ports.
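
The access rule can be expressed as a simple allowlist check. This is a sketch of the logic only, not the firewall configuration itself. The addresses are placeholders from the documentation range and do not represent the real monitoring hosts.

```python
import ipaddress

# Placeholder addresses standing in for monitoring.place5 and monitoring.place6.
ALLOWED_SOURCES = [
    ipaddress.ip_network("2001:db8:5::10/128"),
    ipaddress.ip_network("2001:db8:6::10/128"),
]


def may_scrape(source):
    """Return True only if the source address is one of the monitoring hosts."""
    addr = ipaddress.ip_address(source)
    return any(addr in net for net in ALLOWED_SOURCES)


print(may_scrape("2001:db8:5::10"))  # allowed placeholder host
print(may_scrape("2001:db8:7::1"))   # any other host
```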

Actions #17

Updated by Nico Schottelius over 5 years ago

  • Dominique, please approach @Jin-Guk Kwon in chat/infrastructure and implement it together with him
  • @Jin-Guk Kwon: please read the ticket and ping @Dominique Roux when you have understood it -> you'll then implement it together
Actions #18

Updated by Nico Schottelius almost 4 years ago

  • Assignee changed from Dominique Roux to Nico Schottelius
Actions #19

Updated by Nico Schottelius 11 months ago

  • Status changed from In Progress to Rejected