Meta-Issue for Matrix late-winter 2021 cleanup
We have some performance issues and infrastructure rot on our Matrix deployments: I'll work on them here and there over the next few weeks. This meta-issue should make it easier to follow what's going on. I might add items on the fly as I encounter them, or link to other issues later on.
- [x] Clean up and upstream the __matrix_synapse cdist type. #7345
- [x] Clean-up.
- [x] Bring configuration template up-to-date.
- [x] Add more performance-related flags.
- [x] Add support for multi-workers (a new __matrix_synapse_worker type might be needed)
- [x] Upstream to cdist-contrib. See https://code.ungleich.ch/ungleich-public/cdist-contrib/-/merge_requests/9
- [ ] Cleanup and simplify the __ungleich_matrix type
- [ ] Allow PGSQL tuning / auto-tune from explorer if not provided.
- [x] Adapt to updated __matrix_synapse type
- [ ] Revamp matrix monitoring: we need something simpler and more robust.
- [x] Get back missing instances in monitoring.
- [ ] Add alerts.
- [x] Add PGSQL performance monitoring.
- [x] Update admin UI
- [ ] Investigate performance issues.
- [~] Checking out database bottlenecks.
- [~] Checking out synapse bottlenecks.
- [ ] Possibly add periodic database cleanup.
- [ ] Check out the state of the Jitsi integration.
- [x] Rebuilt with cdist (small issue with watermark, see https://code.ungleich.ch/ungleich-public/cdist-contrib/-/issues/4)
- [x] Wire Prometheus to the new Jitsi Exporter
- [ ] Add simple blackbox monitoring
- [x] Check state of ext.ungleich.ch homeserver
- [ ] LOW_PRIO: check whether it is useful to deploy our own integration server
- [ ] Don't forget to document!
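
For the PGSQL auto-tune item, here is a rough sketch of what deriving defaults from the host's memory could look like. It is only an illustration: the function names are hypothetical, the 25% / 75% ratios are a common rule of thumb rather than values taken from our setup, and a real cdist explorer would be a shell script rather than Python.

```python
def suggest_pg_settings(mem_total_kb):
    """Suggest PostgreSQL memory settings from total RAM (in kB).

    Assumed rule of thumb (not measured on our hosts):
    shared_buffers ~ 25% of RAM, effective_cache_size ~ 75% of RAM.
    """
    mem_mb = mem_total_kb // 1024
    return {
        "shared_buffers": f"{mem_mb // 4}MB",
        "effective_cache_size": f"{mem_mb * 3 // 4}MB",
    }


def read_mem_total_kb(meminfo_path="/proc/meminfo"):
    """Read MemTotal (in kB) from a Linux meminfo file."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                # Line looks like: "MemTotal:       16384000 kB"
                return int(line.split()[1])
    raise ValueError("MemTotal not found in " + meminfo_path)
```

Values passed explicitly to the type would take precedence; the computed ones would only serve as a fallback when nothing is provided.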
Updated by Timothée Floure 4 months ago
- Description updated
The new shiny cdist pipeline seems to work nicely - it's currently deployed for staging and ungleich. We also have metrics exported to monitoring-v3. All of this will be documented and rolled out to customer deployments next week.