<h1>Open Infrastructure - Task #8069: Investigate potential bottleneck on storage/CEPH at DCL</h1>

<p><strong>Update by Timothée Floure, 2020-05-27 10:59</strong></p>
<p>Our hardware (raw link rates are converted to usable numbers in the sketch after this list):</p>
<ul>
<li>RAID controllers: PERC H700, PERC H800<br /> - Technical manual: <a class="external" href="https://www.dell.com/learn/us/en/04/shared-content~data-sheets/documents~perc-technical-guidebook.pdf">https://www.dell.com/learn/us/en/04/shared-content~data-sheets/documents~perc-technical-guidebook.pdf</a><br /> - 2x4 ports, 6 Gb/s SAS 2.0, x8 PCIe 2.0<br /> - 512 MB to 1 GB cache, 800 MHz DDR2<br /> - IO load balancing on the H800, not on the H700 <- how does it work, is it significant?</li>
<li>Each server has dual 10Gbps connectivity.</li>
<li>Arista switches: 7050S<br /> - Datasheet: <a class="external" href="https://www.arista.com/assets/data/pdf/Datasheets/7050S_Datasheet.pdf">https://www.arista.com/assets/data/pdf/Datasheets/7050S_Datasheet.pdf</a><br /> - 52 x 1/10GbE SFP+<br /> - 4 GB RAM, dual-core x86 CPU<br /> - '1.04 Tbps'<br /> - 9 MB Dynamic Buffer Allocation</li>
<li>Cables?</li>
<li>Disks?</li>
</ul>
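<p>Because Gb/s and GB/s are easy to mix up in these notes, here is a minimal unit-conversion sketch for the links listed above. The encoding overheads (8b/10b for SAS/SATA and PCIe 2.0, 64b/66b for 10GbE) are the standard per-spec figures; all results are theoretical ceilings, not measurements.</p>

<pre><code class="python">
# Rough line-rate conversions for the links discussed in this ticket.
# Encoding efficiencies are standard per-spec assumptions, not measurements.

def usable_mb_s(line_rate_gbps: float, encoding_efficiency: float) -> float:
    """Convert a raw line rate in Gbit/s to usable MB/s."""
    return line_rate_gbps * 1000 / 8 * encoding_efficiency

links = {
    "SATA II (3 Gb/s, 8b/10b)": usable_mb_s(3, 0.8),       # ~300 MB/s
    "SAS 2.0 (6 Gb/s, 8b/10b)": usable_mb_s(6, 0.8),       # ~600 MB/s
    "PCIe 2.0 x8 (5 GT/s/lane)": usable_mb_s(5, 0.8) * 8,  # ~4000 MB/s
    "10GbE (64b/66b)": usable_mb_s(10, 64 / 66),           # ~1212 MB/s
}

for name, mb_s in links.items():
    print(f"{name}: ~{mb_s:.0f} MB/s")
</code></pre>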
<p><strong>Update by Nico Schottelius, 2020-05-27 11:09</strong></p>
<p>Some questions we should be able to answer:</p>
<a name="Real-scenarios"></a>
<h2 >Real scenarios<a href="#Real-scenarios" class="wiki-anchor">¶</a></h2>
<p>NOTE: assuming all disks running at 'full speed'.<br />NOTE: the big unknown here is how the RAID controller's cache behaves.<br />NOTE: unknown IOPS limitations on the RAID controllers. <--- TODO, more important than bandwidth!</p>
<ul>
<li>The R710 server has 8 disk slots (supposedly with an H700 controller). Given that we fully populate the server, what is the maximum bandwidth available per OSD running on that machine?<br /> -> PCIe gives 4 GB/s / 8 disks = 500 MB/s, but each SATA II link caps out at 3 Gb/s = 375 MB/s raw -> ~375 MB/s per disk, modulo caching from the RAID controller (see the sketch after the next list).</li>
</ul>
<ul>
<li>The R815 has 6 disk slots (is that true? -> Balazs). Same question as above.<br /> -> SAS runs at 6 Gb/s per lane, but SATA II caps each link at 3 Gb/s = 375 MB/s raw; PCIe gives 4 GB/s / 6 = 666 MB/s per disk -> ~375 MB/s per disk (the SATA link binds, not the shared uplink), modulo caching from the RAID controller (see the sketch after this list).</li>
</ul>
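<p>A minimal sketch of the per-disk ceiling logic used in both answers above, assuming SATA II disks behind a PCIe 2.0 x8 controller: the binding limit is the smaller of the disk's own link cap and an even share of the controller's uplink.</p>

<pre><code class="python">
# Per-disk bandwidth ceiling for a fully populated server.
# 4 GB/s uplink (PCIe 2.0 x8) and a 375 MB/s raw SATA II link cap
# are taken from the controller specs above.

def per_disk_ceiling_mb_s(n_disks: int,
                          pcie_aggregate_mb_s: float = 4000,
                          link_cap_mb_s: float = 375) -> float:
    pcie_share = pcie_aggregate_mb_s / n_disks
    return min(pcie_share, link_cap_mb_s)

print(per_disk_ceiling_mb_s(8))  # R710, 8 slots -> 375.0 (link-bound)
print(per_disk_ceiling_mb_s(6))  # R815, 6 slots -> 375.0 (link-bound)
</code></pre>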
<ul>
<li>What about an R815 with an md array (12x 3.5" HDD attached via SAS cable to the H800)?<br /> -> 4 GB/s from the PCIe connector (SAS itself supports 6 Gb/s per lane) -> 333 MB/s per device.
<ul>
<li>Is the bottleneck likely a) the disk b) the controller c) the network of the server d) another component in the server?<br /> - 10 Gbps = 1.25 GB/s = 104 MB/s per disk at full speed.<br /> - The controller's PCIe uplink limits us to 333 MB/s per disk at full speed (4 GB/s / 12).<br /> -> Bottleneck likely to be the disk or the network (see the sketch after this list).</li>
</ul></li>
</ul>
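<p>Putting rough numbers on the comparison above; the 150 MB/s sequential figure for an HDD is an assumed ballpark, not a measurement of our disks.</p>

<pre><code class="python">
# Which component binds first for the 12-disk md array on an H800?
# The HDD sequential rate is an assumed ballpark, not a measurement.

N_DISKS = 12
limits_mb_s = {
    "network share (10GbE / 12)": 1250 / N_DISKS,  # ~104 MB/s
    "PCIe share (4 GB/s / 12)": 4000 / N_DISKS,    # ~333 MB/s
    "SAS 2.0 link (6 Gb/s raw)": 750,
    "HDD sequential (assumed)": 150,
}

bottleneck = min(limits_mb_s, key=limits_mb_s.get)
print(f"binding limit: {bottleneck} at ~{limits_mb_s[bottleneck]:.0f} MB/s per disk")
# -> the network share (~104 MB/s) binds before the controller or the disks.
</code></pre>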
<ul>
<li>Given an Arista 7050 and an imaginary bandwidth per disk of 50 MB/s, how many disks can we run on one 7050?<br /> - The Arista is supposed to handle 1.04 Tbps = 130,000 MB/s = 2600 * 50 MB/s => not an issue (checked in the sketch after this list).</li>
</ul>
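<p>The same headroom check in code, with the per-disk figure kept as an explicit assumption:</p>

<pre><code class="python">
# How many disks at a given average bandwidth before the 7050's fabric
# (1.04 Tbps claimed) becomes the limit? 50 MB/s per disk is an assumption.

switch_capacity_mb_s = 1.04e12 / 8 / 1e6  # 1.04 Tbps -> 130000 MB/s
per_disk_mb_s = 50

print(int(switch_capacity_mb_s / per_disk_mb_s))  # -> 2600 disks
</code></pre>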
<ul>
<li>Is the PCIe bus (not actually a bus anymore - AFAIR it is a set of point-to-point links) a limitation on either server model?
<ul>
<li>It provides access to networking and disks, and has an interconnect to the CPUs</li>
</ul></li>
</ul>
<p>-> Not worried, but TODO.</p>
<ul>
<li>We are using Ceph BlueStore (<a class="external" href="https://ceph.io/community/new-luminous-bluestore/">https://ceph.io/community/new-luminous-bluestore/</a>)
<ul>
<li>Does it make sense to switch our storage model to use 2 SSDs (e.g. 1 TB) in a RAID1 in front of the HDDs and drop the HDD/SSD distinction?
<ul>
<li>RAID1 is needed because when the SSD fails, every OSD whose RocksDB/BlueFS lives on it fails with it</li>
</ul></li>
</ul></li>
</ul>
<p>-> TODO</p>
<p>(Skip answers if they are too far from what you can gather.)</p>

<p><strong>Update by Timothée Floure, 2020-05-27 11:26</strong></p>
<p>Regarding the RAID controllers:</p>
<ul>
<li>RAID0 (striping - redundancy is handled by CEPH across physical servers).</li>
<li>Some controllers are battery-backed:<br /> - Likely write-back cache.</li>
<li>Some are not:<br /> - Likely write-through cache.<br /> - .. or write-back forced via a BIOS/firmware setting?</li>
<li>Read cache defaults to 'Adaptive Read Ahead': the controller begins using read-ahead if the two most recent disk accesses occurred in sequential sectors.<br /> - Fairly useless for random reads (see the toy model after this list).</li>
</ul>
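<p>A toy model of that 'Adaptive Read Ahead' rule as we read it in the PERC guidebook (the real firmware logic is unknown); it illustrates why a random-read workload such as Ceph OSD traffic essentially never triggers read-ahead.</p>

<pre><code class="python">
# Toy model of 'Adaptive Read Ahead': read-ahead is active only when the
# two most recent accesses hit sequential sectors. This is our reading of
# the PERC documentation, not the actual firmware logic.

def adaptive_read_ahead(accesses: list) -> list:
    """For each access, decide whether read-ahead would be active."""
    decisions = []
    for i, sector in enumerate(accesses):
        sequential = i >= 1 and sector == accesses[i - 1] + 1
        decisions.append(sequential)
    return decisions

# Sequential workload: read-ahead kicks in from the second access on.
print(adaptive_read_ahead([100, 101, 102, 103]))    # [False, True, True, True]
# Random workload (typical for Ceph OSDs): it never triggers.
print(adaptive_read_ahead([100, 57021, 998, 412]))  # [False, False, False, False]
</code></pre>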
<p><strong>Update by Timothée Floure, 2020-05-27 11:46</strong></p>
<p>Regarding PCIe and SAS/SATA:</p>
<ul>
<li>Controllers are connected via x8 PCIe 2.0 => 500 MB/s per lane for PCIe 2.0 -> x8 = 4 GB/s</li>
<li>6 Gb/s SAS 2.0 connectivity -> the 6 Gb/s is per lane, so it is not split between disks; should be fine anyway.<br /> - The PERC H700 supports SATA at 3 Gb/s; the PERC H800 does not support SATA.</li>
<li>How are our network cards connected? Should be fine anyway: 10GbE = 1.25 GB/s -> even PCIe 2.0 x4 is more than enough (see the sketch after this list).</li>
</ul>
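<p>A quick check of how many PCIe 2.0 lanes a NIC actually needs, assuming the usable 500 MB/s per lane from above; the real slot width of our cards is still unknown.</p>

<pre><code class="python">
# Minimum number of PCIe 2.0 lanes needed to carry a NIC at line rate,
# assuming 500 MB/s of usable bandwidth per lane (5 GT/s, 8b/10b).
import math

def min_pcie2_lanes(nic_gbps: float, lane_mb_s: float = 500) -> int:
    nic_mb_s = nic_gbps * 1000 / 8
    return math.ceil(nic_mb_s / lane_mb_s)

print(min_pcie2_lanes(10))      # one 10GbE port -> 3 lanes, so x4 suffices
print(min_pcie2_lanes(2 * 10))  # dual 10GbE at full rate -> 5 lanes, needs x8
</code></pre>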
<p><strong>Update by Timothée Floure, 2020-05-27 12:42</strong></p>
<p>I'll be AFK for a little while; the big pain point is the hardware RAID controller.</p>
<ul>
<li>Unknown effect on IOPS (needs more digging, not obvious).
<ul>
<li>The internet says (Reddit, random wikis, the Ceph mailing list) that using RAID0 where passthrough is not supported is BAD: lower performance/IOPS, buggy firmware, some (unknown?) implications for the cache, ...</li>
</ul>
</li>
<li>Unknown effect from the cache.</li>
</ul>

<p><strong>Update by Timothée Floure, 2020-05-29 09:02</strong></p>
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Waiting</i></li></ul><p>Waiting for @llnu to test a RAID controller with passthrough.</p>
<p><a class="external" href="https://redmine.ungleich.ch/issues/8063?issue_count=4&issue_position=1&next_issue_id=8002#note-22">https://redmine.ungleich.ch/issues/8063?issue_count=4&issue_position=1&next_issue_id=8002#note-22</a></p> Open Infrastructure - Task #8069: Investigate potential bottleneck on storage/CEPH at DCLhttp://localhost:3000/issues/8069?journal_id=523732024-01-03T18:31:09ZNico Schotteliusnico.schottelius@ungleich.ch
<ul><li><strong>Status</strong> changed from <i>Waiting</i> to <i>Closed</i></li></ul>