{"id":20327,"date":"2020-04-24T15:01:11","date_gmt":"2020-04-24T15:01:11","guid":{"rendered":"http:\/\/www.firewallhardware.it\/proxmox-ve-6-cluster-configurazione-avanzata-a-3-nodi-con-ceph\/"},"modified":"2023-06-05T16:45:34","modified_gmt":"2023-06-05T14:45:34","slug":"how-to-create-a-hyper-converged-3-node-proxmox-ve-cluster-with-ceph","status":"publish","type":"post","link":"https:\/\/blog.miniserver.it\/en\/proxmox-ve\/how-to-create-a-hyper-converged-3-node-proxmox-ve-cluster-with-ceph\/","title":{"rendered":"How to Create a Hyper-Converged 3 Node Proxmox VE Cluster with Ceph"},"content":{"rendered":"<div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap\" style=\"max-width:1123.2px;margin-left: calc(-4% \/ 2 );margin-right: calc(-4% \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_1 1_1 fusion-flex-column\" style=\"--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;\"><div class=\"fusion-column-wrapper fusion-flex-justify-content-flex-start fusion-content-layout-column\"><div class=\"fusion-text fusion-text-1\"><p>In this guide we want to deepen the <strong>creation of a 3-node cluster<\/strong> with <strong>Proxmox VE 6<\/strong> illustrating the functioning of the <strong>HA (Hight Avaibility)<\/strong> <strong>of the VMs<\/strong> through the <strong>advanced configuration of Ceph.<\/strong><\/p>\n<p>In a few words we delve deeper into the concept of <strong>hyperconvergence of Proxmox VE<\/strong>.<\/p>\n<p>To better understand the potential of the <strong>Cluster Proxmox VE solution<\/strong> and the possible configurations, we have created a laboratory aimed at viewing the possible <strong>configurations of Ceph<\/strong>.<\/p>\n<p>The lab is made up of 3 Proxmox VE virtual machines already configured in clusters with Ceph.<\/p>\n<p>Below, you will find the link to download the test environment, so you can run it on your Proxmox environment.<br \/>\nIt will be possible to follow all the steps of the guide directly on the freely downloadable test environment, test the configurations and simulate service interruptions on the various nodes.<\/p>\n<p>In the webinar that we find below, we comment on the test environment together, evaluating some interesting aspects of <strong>Ceph.<\/strong><\/p>\n<h3 style=\"color: #00a0df; font-size: 20px; text-align: left;\">Used Software<\/h3>\n<p><strong>Proxmox VE 6.1-8<\/strong><br \/>\n<strong>Ceph versione 14-2.6 (stable)<\/strong><\/p>\n<h3 style=\"color: #00a0df; font-size: 20px; text-align: left;\">Used Hardware<\/h3>\n<p>Our test environment is made up of 3 virtualized Proxmox nodes on an A3 Server (<a title=\"A3 Server\" href=\"https:\/\/www.miniserver.store\/appliance-a3-server-aluminum\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/www.miniserver.store\/appliance-a3-server-aluminum<\/a>), equipped 
<h3>Used Hardware</h3>
<p>Our test environment is made up of 3 Proxmox nodes virtualized on an A3 Server (<a title="A3 Server" href="https://www.miniserver.store/appliance-a3-server-aluminum" target="_blank" rel="noopener noreferrer">https://www.miniserver.store/appliance-a3-server-aluminum</a>), equipped with:</p>
<ul>
<li>2 &times; 2 TB SSD disks</li>
<li>1 &times; 10 TB SATA disk</li>
<li>RAM: 128 GB</li>
<li>CPUs: 16</li>
<li>Operating system: Proxmox 6.1-8</li>
</ul>
<p>Keep in mind that the test environment, freely downloadable from the link mentioned above, is <strong>NOT suitable</strong> for production, since it is <strong>the virtualization of a virtual environment</strong>; it is, however, a versatile solution for testing purposes.</p>
<p>For a production environment we recommend three dedicated hardware nodes, such as the solution consisting of 3 A3 Server nodes (<a title="A3 Server" href="https://www.miniserver.store/appliance-a3-server-aluminum" target="_blank" rel="noopener noreferrer">https://www.miniserver.store/appliance-a3-server-aluminum</a>), which can be purchased already configured and ready for use.</p>
<h3>Resources for each VM (node)</h3>
<ul>
<li>Name: miniserver-pve1</li>
<li>Disk space: 64 GB for the operating system</li>
<li>RAM: 16 GB</li>
<li>CPUs: 4</li>
</ul>
<h3>Introduction</h3>
<p><strong>The creation of the cluster</strong> is a topic we have already covered in other guides, so if you are interested you can <strong>follow</strong> this guide (<a title="Proxmox VE 6 with 3 nodes" href="https://blog.miniserver.it/en/proxmox-ve-6-3-node-cluster-with-ceph-first-considerations/" target="_blank" rel="noopener noreferrer">https://blog.miniserver.it/en/proxmox-ve-6-3-node-cluster-with-ceph-first-considerations/</a>) to create the cluster from scratch.</p>
<p>From now on we will assume that the 3-node <strong>Proxmox cluster</strong> is up and running and properly configured.</p>
<h3>Ceph: first steps</h3>
<p><strong>Ceph is a distributed storage system</strong> designed to improve scalability and reliability in clustered server environments.<br />
Ceph <strong>allows data storage</strong> (in our case the VM disks) to be performed directly <strong>on the hypervisor nodes</strong>, replicating it to the other nodes of the cluster and <strong>avoiding the use of a SAN</strong>.</p>
<p>Ceph was configured on each node of our cluster as follows (see the sketch after this list for the equivalent command-line steps):</p>
<ul>
<li>a Ceph pool on SSD disks</li>
<li>a Ceph pool on HDD disks</li>
<li>a Public Network</li>
<li>a Cluster Network</li>
</ul>
<p>We will look at these configurations in more detail later.<br />
Obviously, since this is a virtual cluster created with 3 virtual machines, the Ceph pool with SSD disks is simulated (more on this below).</p>
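<p>If you are rebuilding the same setup from scratch, the Ceph layer can be installed and initialised from the command line as well as from the GUI. A minimal sketch follows; the subnet passed to <code>--network</code> is the Public Network described in the next section, so adjust it to your environment:</p>
<pre><code># On every node: install the Ceph packages (Nautilus on Proxmox VE 6)
pveceph install

# On the first node only: write the initial Ceph configuration,
# binding monitor/client traffic to the Public Network
pveceph init --network 192.168.20.0/24

# If your pveceph version supports it, the Cluster Network can be declared here too;
# otherwise add cluster_network to /etc/pve/ceph.conf as shown further below
# pveceph init --network 192.168.20.0/24 --cluster-network 192.168.30.0/24
</code></pre>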
<h3>Ceph: the cluster</h3>
<p>The figure below shows the cluster, called <em><strong>Miniserver</strong></em>, consisting of 3 nodes:</p>
<ul>
<li>miniserver-pve1</li>
<li>miniserver-pve2</li>
<li>miniserver-pve3</li>
</ul>
<p>The lab can be reached at any of these 3 addresses (since it is a cluster, it does not matter which one):</p>
<ol>
<li>192.168.131.150:8006</li>
<li>192.168.131.151:8006</li>
<li>192.168.131.152:8006</li>
</ol>
<h3>Network configuration</h3>
<p>To ensure high reliability and get the most out of Ceph&#8217;s features, as already mentioned, we decided to separate the <strong>Cluster Network</strong> from the <strong>Public Network</strong>.</p>
<p><strong>Public Network:</strong> the network dedicated to <strong>Ceph monitoring</strong>, i.e. all the data and control traffic the nodes need in order to understand what &#8220;state&#8221; they are in.</p>
<p><strong>Cluster Network:</strong> the network dedicated to the <strong>OSDs and to the heartbeat traffic</strong>, i.e. the network that synchronizes all the &#8220;data&#8221; of the VM virtual disks.</p>
<p>Separating the traffic onto two networks <strong>is not mandatory</strong>, but it is <strong>strongly recommended</strong> in production, where cluster sizes start to become considerable.</p>
<p>There are two main reasons for using two separate networks:</p>
<ol>
<li><strong>Performance:</strong> when the <strong>Ceph OSD daemons</strong> handle <strong>data replication across the cluster</strong>, the replication traffic can add latency to the traffic of the Ceph clients, possibly causing a disservice. The monitoring traffic would also be slowed down, preventing a true view of the state of the cluster.<br />
Remember that <strong>recovery and rebalancing</strong> of the cluster after a failure must happen in the shortest possible time, i.e. the <strong>PGs (Placement Groups)</strong> must be moved between <strong>OSDs quickly to recover from a critical situation</strong>.</li>
<li><strong>Security:</strong> a <strong>DoS</strong> (Denial of Service) attack could saturate the network resources used for replication, which would also slow down (or even completely stop) the monitoring traffic. Remember that the <strong>Ceph Monitors</strong> are what allow <strong>Ceph clients</strong> to read and write data on the cluster: we can easily imagine what would happen if the network used for monitoring became congested.<br />
By separating the two networks, the monitoring traffic is not affected by that congestion at all.</li>
</ol>
<p>For security and efficiency reasons it is strongly recommended <strong>not to connect the Cluster Network and the Public Network</strong> to the Internet, keeping them &#8220;hidden&#8221; from the outside and separate from all other networks.<br />
The images below show the network card configuration of the 3 virtual machines, i.e. the 3 nodes of the cluster.<br />
To view the configuration, go to the desired node (<strong>miniserver-pve1/2/3</strong>), then click on the Network entry.</p>
<p>As can be seen in the 3 previous figures, the LAN <strong>192.168.131.0/24</strong> is used as the network for the hosts and the virtual machines, while the subnets <strong>192.168.20.0/24</strong> and <strong>192.168.30.0/24</strong> are used for the <strong>Public Network</strong> and the <strong>Cluster Network</strong> respectively.</p>
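<p>As a reference, this is roughly how the network separation ends up looking in <code>/etc/pve/ceph.conf</code>. This is only an excerpt based on the subnets above; your file will contain other settings as well:</p>
<pre><code>[global]
    # Ceph monitor and client traffic (Public Network)
    public_network = 192.168.20.0/24
    # OSD replication, recovery and heartbeat traffic (Cluster Network)
    cluster_network = 192.168.30.0/24
</code></pre>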
<h3>Monitoring configuration</h3>
<p>By selecting one of the nodes of the cluster, then Ceph, then Monitor, you reach the menu where you can create multiple Monitors and Managers.<br />
As already mentioned, monitoring is essential to understand the state of the cluster. In particular, the <strong>Ceph Monitors</strong> maintain a map of the cluster, and based on it the <strong>Ceph clients</strong> can read from or write to a <strong>specific storage space of the cluster</strong>.</p>
<p>It is therefore easy to understand that a <strong>single Ceph Monitor</strong> <strong>is not a safe solution</strong> to deploy.<br />
A cluster of monitors is needed so that monitoring is highly reliable as well.<br />
Just click on Create and add the desired monitors.<br />
The following image shows how <strong>the Ceph Monitor cluster</strong> is implemented:<br />
all three monitors are on the 192.168.20.0/24 network, i.e. on the Public Network.</p>
<h3>OSD configuration</h3>
<p>The <strong>OSD (Object Storage Daemon)</strong> is the software layer responsible for <strong>storing the data, managing replicas, recovery and data rebalancing</strong>. It also reports all its information to the Ceph Monitors and Ceph Managers.<br />
It is advisable to <strong>dedicate one OSD to each disk</strong> (SSD or HDD): this way a <strong>single OSD manages only its associated disk</strong>.</p>
<p>To create the OSDs, click on one of the cluster nodes, then Ceph, then OSD.</p>
<p>The next image shows how the OSDs were created.</p>
<p>On each host there are three disks dedicated to Ceph:</p>
<ul>
<li>200 GB HDD</li>
<li>200 GB HDD</li>
<li>200 GB SSD</li>
</ul>
<p>For reasons of &#8220;space&#8221; in the test environment, they are actually 5 GB each.<br />
The <strong>respective OSDs</strong> have therefore been allocated on <strong>each disk</strong>.<br />
In the previous image, note that there are 3 OSDs allocated on the 3 SSDs and 6 OSDs allocated on the 6 HDDs (see the &#8220;Class&#8221; column).</p>
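<p>The monitor and OSD creation steps shown above in the GUI can also be done from the shell. A sketch follows, assuming the disks dedicated to Ceph appear as /dev/sdb, /dev/sdc and /dev/sdd on each node (device names will differ in your environment):</p>
<pre><code># On each of the three nodes: create a Ceph monitor (three monitors give a safe quorum)
pveceph mon create

# On every node: create one OSD per disk dedicated to Ceph
pveceph osd create /dev/sdb
pveceph osd create /dev/sdc
pveceph osd create /dev/sdd

# Check the result: the "class" column shows whether each OSD sits on an hdd or an ssd
ceph osd tree
</code></pre>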
<h3>Creation of Ceph Pools</h3>
<p>When creating a <strong>Pool</strong> from the <strong>Proxmox</strong> interface, by default you can only use the <strong>CRUSH</strong> rule called <strong>replicated_rule</strong>. With this rule, the data replicas within that pool are distributed on both the SSD disks and the HDDs.</p>
<p>Our goal, instead, is to create 2 pools:</p>
<ul>
<li>a first pool consisting of SSDs only</li>
<li>a second pool consisting of HDDs only</li>
</ul>
<p>To do this, however, you must first create two <strong>CRUSH rules</strong> different from the default one.<br />
Unfortunately this operation cannot be done from the <strong>Proxmox GUI</strong>, but it can be done by running these two commands on <strong>one of the three hosts of the cluster</strong> (it does not matter which):</p>
<ul>
<li>ceph osd crush rule create-replicated miniserver_hdd default host hdd</li>
<li>ceph osd crush rule create-replicated miniserver_ssd default host ssd</li>
</ul>
<p>To access the pool creation menu, click on one of the nodes, then Ceph, then Pools.</p>
<p>In the following image we can see that it is now possible to select the CRUSH rules created previously.</p>
<p>By default a pool is created with 128 <strong>PGs</strong> (Placement Groups). This number can vary and depends on a few factors; we will see later how to set it according to <strong>Ceph&#8217;s own criteria</strong>.</p>
<p>The objects of the <strong>RBD block storage</strong> are placed inside the PGs, which are grouped inside a Pool. We will look at the creation of the RBD storage further on.</p>
<p>As shown in the figure below, the Pool is in fact a logical grouping of the PGs; the PGs are physically distributed among the disks managed by the OSDs. Recall that there is <strong>only one OSD</strong> for <strong>each disk</strong> dedicated to Ceph (see figure 12 above).</p>
<p>By setting the <strong>Size = 3</strong> field you specify that each PG must be replicated 3 times across the cluster.</p>
<p>This is a value to keep in mind when estimating the size of the disks (see below) while sizing the cluster.</p>
<p>Remember not to select <strong>Add as Storage</strong>, as the <strong>Proxmox storage dedicated</strong> to <strong>Ceph will be added manually</strong>.</p>
<p>Once the two pools are created we will have the following situation:</p>
<h3>Storage creation</h3>
<p>Now we are ready to create the two storages that will host our virtual machines.<br />
Go to Datacenter, then Storage, and select <strong>RBD</strong>, i.e. the block storage that uses Ceph.<br />
As shown in the figure below, for the &#8220;ceph_storage_hdd&#8221; storage we select the &#8220;ceph_pool_hdd&#8221; pool created previously.<br />
We do the same for the SSD storage &#8220;ceph_storage_ssd&#8221;.</p>
<p>We have thus created the two storages:</p>
<ul>
<li>ceph_storage_hdd</li>
<li>ceph_storage_ssd</li>
</ul>
<p>Now let&#8217;s calculate how much space is available for each storage.</p>
<ul>
<li>6 &times; 200 GB HDD disks = 1.2 TB</li>
</ul>
<p>Considering that 3 replicas must be guaranteed (Size field, figure 13), we get: ceph_storage_hdd = 1.2 TB / 3 &asymp; <strong>400 GB</strong></p>
<ul>
<li>3 &times; 200 GB SSD disks = 600 GB</li>
</ul>
<p>Considering that 2 replicas must be guaranteed (the SSD pool was created with Size = 2, figure 13), we get: ceph_storage_ssd = 600 GB / 2 &asymp; <strong>300 GB</strong></p>
<p>Let&#8217;s verify this by selecting the storages from the Proxmox GUI.</p>
<p>The two figures below show the actual size of the storages.</p>
<p><em>ceph_storage_hdd</em></p>
<p><em>ceph_storage_ssd</em></p>
<p>The actual sizes are <strong>377.86 GB and 283.49 GB</strong>, very close to those we roughly calculated above.</p>
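<p>For completeness, the pool and storage definitions above can also be created from the shell. The following is only a sketch: pool names, PG count and replica sizes mirror the values used in this article, and the pvesm options shown are the basic ones for an RBD storage on a hyper-converged cluster:</p>
<pre><code># Create the two pools, binding each one to its CRUSH rule
# (128 PGs; 3 replicas for the HDD pool, 2 for the SSD pool)
ceph osd pool create ceph_pool_hdd 128 128 replicated miniserver_hdd
ceph osd pool create ceph_pool_ssd 128 128 replicated miniserver_ssd
ceph osd pool set ceph_pool_hdd size 3
ceph osd pool set ceph_pool_ssd size 2

# Tag both pools for RBD use (avoids a health warning on Nautilus)
ceph osd pool application enable ceph_pool_hdd rbd
ceph osd pool application enable ceph_pool_ssd rbd

# Define the two Proxmox storages backed by those pools
pvesm add rbd ceph_storage_hdd --pool ceph_pool_hdd --content images,rootdir
pvesm add rbd ceph_storage_ssd --pool ceph_pool_ssd --content images,rootdir

# Verify raw and usable space per pool
ceph df
</code></pre>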
<p>The number of <strong>PGs for each Pool can be estimated</strong> with the <strong>formula</strong> that Ceph makes available on the <a href="https://old.ceph.com/pgcalc/" target="_blank" rel="noopener">official website</a>.</p>
<p>The following image (fig. 19) shows how we filled in the Ceph calculator.</p>
<p>Note that <strong>256</strong> PGs are suggested for that pool.<br />
With 128 PGs for that pool, the number of PGs per OSD is 128 PGs &times; 3 replicas / 6 OSDs = 64.</p>
<p>To understand the mechanism, let&#8217;s consider a case with 10 OSDs (fig. 21):</p>
<p><strong>256</strong> PGs are again suggested for that pool.<br />
The number of PGs per OSD therefore becomes 256 &times; 3 / 10 &asymp; 77 (fig. 22).</p>
<p>As the number of OSDs grows, the PG load that each OSD has to manage decreases, favouring better scalability of the cluster.<br />
In general, therefore, it is better to have more OSDs in order to distribute the load of the PGs better.</p>
<p>In this regard, Ceph&#8217;s default maximum number of PGs per OSD is set to <strong>250</strong>. This parameter can be changed in Ceph&#8217;s configuration files, but we do not recommend it unless you know exactly what you are doing.</p>
<h3>Ceph system vs RAID system</h3>
<p>Although the two technologies differ in substance, it is possible to compare them in terms of &#8220;useful space&#8221; for the same disks.</p>
<p><strong>Let&#8217;s compare the space of a RAID system and of a Ceph system</strong>: if you are thinking that Ceph &#8220;burns&#8221; a lot of space unnecessarily, consider what would have happened with a classic solution based on a RAID controller.<br />
Let&#8217;s take the case covered in this article as an example:<br />
with the HDDs you would have had 3 servers with 2 disks each in RAID 1.<br />
The total usable space would have been 200 GB &times; 3 servers = 600 GB. But be careful: you would not have had the <strong>replication between the servers</strong>, nor a single centralized space!</p>
<p>Now let&#8217;s do the calculation for a <strong>realistic small Proxmox cluster</strong>:<br />
4 servers with 4 disks of 8 TB each.</p>
<p><strong>RAID 10 configuration</strong>: 64 TB of total storage space. If we consider that every machine should have at least one replica, the space drops to 32 TB.<br />
So in the end there are about 32 TB of usable space net of redundancy and replicas.</p>
<p><strong>Ceph configuration</strong>: total raw space of the 4 nodes: 128 TB.<br />
If we use 3 replicas (pool size parameter = 3), we have to divide the total space by 3.</p>
<p><strong>At the end of the day we get about 40 TB of space, compared to the 32 TB of the RAID solution.</strong><br />
With Ceph we also have 3 replicas instead of the 2 of the RAID solution.</p>
<p>Note that with <strong>Ceph</strong> we also <strong>save on RAID controllers</strong>. Ceph may cost slightly more in very small setups, but it is more versatile, and it becomes far more efficient as you scale beyond 3 nodes.<br />
At this point, with the information you have, you can &#8220;play&#8221; by changing the size parameter and comparing the &#8220;consumption&#8221; of actual space with a classic RAID disk configuration. You will notice that the <strong>benefits</strong> of a <strong>Ceph</strong> solution become <strong>significant as the number of servers and disks grows</strong>.</p>
<p>In the video at the beginning of this article we simulate &#8220;system failures&#8221; and see how our cluster reacts to disruptions.</p>
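<p>If you want to follow those failure simulations from the command line while watching the video, a few health commands are enough to observe how Ceph reacts. This is only a sketch; the OSD id used here is an example, pick one from your own <code>ceph osd tree</code> output:</p>
<pre><code># Overall cluster state, refreshed continuously
watch ceph -s

# Detailed explanation of any warning or error
ceph health detail

# Simulate the loss of a single OSD and watch the PGs being rebalanced
systemctl stop ceph-osd@0.service
ceph -w
</code></pre>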
<p>You can download the test environment and simulate failures and malfunctions yourself.</p>
<p>To stay up to date on this topic, we invite you to subscribe to our mailing list.</p>
<p>We also remind you that our company can organize in-depth courses on Proxmox VE.<br />
Please fill in the form to request information.</p>