What is VMware vSAN?

New Features in vSAN 7 Update 1

vSAN Data Persistence platform: The vSAN Data Persistence platform provides a framework for providers of modern stateful services, such as object storage and NoSQL databases, to build deep integration with the underlying virtual infrastructure, leveraging the Kubernetes operator method and vSphere Pod Service. The integration allows you to run modern stateful applications with lower TCO and simplified operations.

HCI Mesh: Enable a unique, software-based approach for disaggregation of compute and storage resources. This native, cross-cluster architecture brings together multiple independent vSAN clusters to disaggregate resources and enable utilization of stranded capacity.

Capacity Optimizations: Reduce the reserve capacity (or “slack space”) required for cluster operations by up to 50%. Clusters with only eight nodes can unlock up to 7% of total capacity, while the largest clusters (48 nodes or greater) will see the greatest improvement.

Shared Witness for Two-Node vSAN Deployments: Enable multiple 2-Node vSAN deployments to share a common witness instance, with up to 64 clusters per shared witness host.

vSphere Lifecycle Manager (vLCM): With support for NSX-T updates, you can now update vSphere, vSAN and NSX-T with a single tool. vLCM will monitor for desired image compliance continuously and enable simple remediation in the event of any compliance drift.

File Services: Avoid the expense and complexity of purpose-built filers and adopt an enterprise-ready solution using the most common NFS and SMB protocols, with SMB v3 and v2.1 now added to native file services.

Note: these features are taken from the VMware website. Please see the website for more details.

VSAN Design

The relative ease of deployment by no means removes the need for careful VSAN architecture design. Here are a few points worth looking at in more detail:

Hardware compatibility. Although VSAN gives you some freedom in choosing hardware, it makes sense to stay within the list of hardware guaranteed to be compatible with VMware VSAN. That way you don't have to guess which controllers, adapters and so on will work together.
Network. In a VSAN configuration, a VM can run in one place and be stored in another. This puts fairly high demands on the network: you need at least a 10 Gbit/s network.
Disk controller performance. The disk controller must provide a large buffer for a deep command queue. It will be heavily loaded, because the controller serves data needed not only by its own server but by the whole cluster. For example, when a failed disk group is rebuilt onto a new one, a large amount of data has to be written in a short time, and the write speed depends directly on the controller's performance.
Disk capacity. Here, bigger does not mean better; if anything, the opposite. Although 4 TB and 6 TB disks are available today, VSAN is better built from 1 TB disks. Imagine an emergency in which nothing hits the cache (replacing a failed disk group, or taking or restoring a backup): 6 TB disks will take six times as long to rebuild as 1 TB disks (judging by the ratio of read speed to stored data, IOPS/GB).
SSD-to-HDD capacity ratio. This ratio directly affects the resulting performance of a disk group: the larger the SSD capacity (the more data sits in the cache), the higher the performance. At CloudLITE we use PCIe flash cards for caching, since they have lower latency than SSDs. Note that VSAN 6.0 supports disk groups made up entirely of SSDs.
Compute-to-storage ratio. When designing VSAN, everything has to be calculated carefully: the ratio of CPUs, memory and the number of disk groups, and also the ratio in which to scale out compute so that growth stays cost-effective.

Once the solution is in production, you can no longer add disk space for VSAN (a storage node) on the fly without adding a new server, and with it more CPUs and memory. Using a server purely as storage (i.e. its compute sits idle) is possible but not cost-effective: it is essentially a return to the traditional configuration and gives up the benefits of a converged solution.
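The disk-size argument above can be made concrete with a little arithmetic. The minimal Python sketch below assumes (hypothetically) that a rebuild is limited by a constant sustained throughput of 150 MB/s and that a 7200 rpm disk delivers about 150 IOPS regardless of its capacity; both numbers are illustrative, not measurements. Under those assumptions a 6 TB disk takes six times as long to refill as a 1 TB disk and offers one sixth of the IOPS per stored gigabyte.

```python
def rebuild_hours(disk_tb: float, throughput_mb_s: float = 150.0) -> float:
    """Hours needed to rewrite one full disk at a constant throughput.

    Assumption: the rebuild is limited only by the sustained throughput
    of the target disk (decimal units, 1 TB = 1,000,000 MB). Real resyncs
    also depend on the network, the controller queue and competing I/O.
    """
    megabytes = disk_tb * 1_000_000
    return megabytes / throughput_mb_s / 3600


def iops_per_gb(disk_iops: float, disk_gb: float) -> float:
    """IOPS available per stored gigabyte; shrinks as capacity grows."""
    return disk_iops / disk_gb


# 150 IOPS per spindle is an illustrative assumption for a 7200 rpm disk.
for size_tb in (1, 6):
    print(f"{size_tb} TB: ~{rebuild_hours(size_tb):.1f} h to rebuild, "
          f"{iops_per_gb(150, size_tb * 1000):.3f} IOPS/GB")
```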

Create the cluster

Now that the distributed switch and virtual network adapters are set up, we can create the cluster. Go back to Hosts and Clusters in the navigator. Right-click your folder and select New Cluster.

Give your cluster a name and, for the moment, just turn on Virtual SAN. I chose manual disk claiming because I have to specify manually which disks are flash and which are HDD: the ESXi nodes run as VMs, so all their hard disks are detected as flash.

Next, move the nodes into the cluster (drag and drop). Once all nodes are in the cluster, you should see an alert saying that there is no capacity. This is because we selected manual claiming, and no disks have been claimed for vSAN yet.

An overview of the Architecture

A vCenter High Availability cluster consists of three vCSA 6.5 appliances deployed in an Active, Passive and Witness configuration.

The Active node is a standard vCenter instance, the state of which is regularly replicated to the Passive node.
The Passive node takes over when the entire Active node or critical components fail.
The Witness node is a lighter version of the appliance that acts as a quorum. This is a clustering technique that protects against split-brain or network partitioning events, in which the Active and Passive nodes are both running but unable to talk to each other. One possible outcome is two Active nodes, which is bad, to say the least.
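The quorum idea can be illustrated with a toy model. The Python sketch below is a simplified majority rule written for this article, not VMware's actual failover algorithm: a node may serve only if, counting itself, it can reach at least two of the three nodes. It shows why an isolated Active node stands down while a Passive node that still sees the Witness takes over, so two Active nodes never coexist. The function names and the way links are represented are hypothetical.

```python
from __future__ import annotations

from typing import Optional

NODES = ("active", "passive", "witness")


def has_quorum(node: str, up: set[str], links: set[frozenset[str]]) -> bool:
    """True if `node` is running and, counting itself, reaches 2 of 3 nodes.

    `up` lists the running nodes; `links` lists node pairs whose
    HA-network connection works.
    """
    if node not in up:
        return False
    peers = sum(
        1
        for other in NODES
        if other != node and other in up and frozenset((node, other)) in links
    )
    return peers + 1 >= 2


def serving_node(up: set[str], links: set[frozenset[str]]) -> Optional[str]:
    """Node that serves vCenter under this simplified majority rule."""
    if has_quorum("active", up, links):
        return "active"
    if has_quorum("passive", up, links):
        return "passive"  # the Passive node is promoted to Active
    return None  # no node holds a majority, so nothing is served


# The Active node is cut off; Passive and Witness still see each other.
print(serving_node(set(NODES), {frozenset(("passive", "witness"))}))  # passive
```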

To set up a vCenter HA cluster, we begin by installing vCSA as we normally would. One of the requirements is a second network card connected to the HA (cluster private) network. The second NIC is added automatically when you select the Basic deployment option; see further down. The HA network is simply a port group created on ESXi.

VMware recommends placing each HA node on its own ESXi host. The hosts, in turn, should be in a DRS cluster, though this requirement is optional. In practice, this means you need at least three ESXi hosts. Also note that the HA port group must be created on every ESXi host that runs an HA node.

After the Active node is configured, the Passive and Witness nodes are cloned from it. The cloning can be either automated or done manually, depending on the selected deployment mode: Basic or Advanced.

In today’s post, I’ll be writing about the Basic option which does most of the donkey work for you.

Figure 1 illustrates a typical vCenter HA cluster architecture. The Active node has two IP addresses: 192.168.16.50, the management address, and 172.168.0.1, the HA network address. Note that while the HA network can be routable, you should not add a gateway address when configuring the second network card on the vCSA.

Figure 1 – A typical vCenter HA cluster (vSphere 6.5 only)

In the event of an Active node failure, the Passive node takes over and becomes the Active node. The management IP address is carried over to the newly designated Active node.

SATA / NL-SAS / SAS

Seagate has an interesting paper on SAS vs. SATA Flash Devices.


Differences between SAS and SATA:

Full-duplex transmission (bidirectional or 2x unidirectional)
Two concurrent channels (ports)
Wide ports (x2 and x4)
Available speed: 12Gb/s, up to 48Gb/s with two full-duplex ports
End-to-end data integrity
Full IOECC (input/output error correction)
Hot-plug
More than 32 queue depth (SATA maximum = 32)
Enterprise-type command queuing (128 to 256)
Full SCSI command set
Variable sector size
High-voltage signal level (1.2V)
Write Cache
Write Failure Notification
SAS Log Pages

The main reason there are no SATA HDDs on the HCL lies in the full SCSI command set. SATA does not strictly standardize the command that disables the DRAM write cache, so a SATA drive may simply ignore the controller's request to turn the cache off. And unlike SATA SSDs with power-loss protection capacitors, SATA HDDs have no backup power, so on a power loss the entire 64 to 256 MB cache on each drive (however much that drive has) can simply vanish, and the data in it is lost.

Drive Reliability (MTBF / AFR / UER)

There are several key metrics for drive failures.

MTBF (Mean Time Between Failures). This matters more for hard disks, which have plenty of mechanical parts that can break, but I'll describe it for completeness. Take a disk with an MTBF of 1,000,000 hours. From the description alone, it might seem that the manufacturer guarantees the disk will run for 1,000,000 / 8760 = 114 years. What it actually means is not that one disk will last 114 years, but that in a batch of 114 disks you can expect one to fail within a year.
AFR (Annualized Failure Rate). The annual failure rate of a series of disks: AFR = 1 / (MTBF / 8760).
UER (Unrecoverable Error Rate). The probability of an unrecoverable read error, for various reasons: surface defects, head or controller malfunctions, and so on.
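The MTBF and AFR arithmetic is easy to check in code. This minimal Python sketch implements the AFR formula from the text and, for comparison, the exponential-model variant 1 - exp(-8760 / MTBF), which assumes a constant failure rate (an assumption the text does not make). For an MTBF far above one year the two are almost identical. The function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 8760


def afr_simple(mtbf_hours: float) -> float:
    """AFR = 1 / (MTBF / 8760), i.e. 8760 / MTBF."""
    return HOURS_PER_YEAR / mtbf_hours


def afr_exponential(mtbf_hours: float) -> float:
    """AFR under a constant failure rate: 1 - exp(-8760 / MTBF)."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures_per_year(drives: int, mtbf_hours: float) -> float:
    """Expected number of failed drives in a fleet over one year."""
    return drives * afr_simple(mtbf_hours)


mtbf = 1_000_000
print(f"{mtbf / HOURS_PER_YEAR:.0f} years")  # 114 years
print(f"AFR {afr_simple(mtbf):.3%}")          # AFR 0.876%
print(f"AFR, exponential model {afr_exponential(mtbf):.3%}")
print(f"{expected_failures_per_year(114, mtbf):.2f} failures/year in 114 drives")
```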

A great article on SSD memory cells: how they work and why they fail (SLC, MLC, TLC, QLC).

vSAN host requirements

vSAN has a strict set of hardware requirements that extends down to the driver level. If you build a cluster yourself, check all hardware against the VMware Compatibility Guide. To simplify hardware selection, vSAN ReadyNodes are also available ‘off the shelf’ with fully supported hardware and a guaranteed rate of IOPS.

You can run vSAN across two ESXi hosts in special cases, but you will need at least three hosts for all other deployments. In general, four hosts are recommended for maintenance purposes. If you select the hardware yourself, each host will need:

SSD Cache: Minimum of one supported SSD for cache
Persistent Storage: Minimum of one HDD or SSD for storage
NIC: 1 Gbit/s (Hybrid) or 10 Gbit/s (All-flash)
SAS/SATA Controller: Must run in passthrough mode or with one RAID 0 volume per disk
Memory: 8-32 GB of RAM depending on the number of disks and disk groups

Graphic shows vSAN hardware requirements
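The per-host minimums above can be turned into a simple pre-check. This Python sketch is hypothetical (HostSpec and the check functions are invented for illustration); it encodes only the rules listed here, checks just the 8 GB lower bound of the RAM range, and treats two hosts as sufficient only for the special two-node case.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HostSpec:
    cache_ssds: int
    capacity_devices: int
    nic_gbps: float
    controller_mode: str  # "passthrough" or "raid0" (one volume per disk)
    ram_gb: int


def host_problems(host: HostSpec, all_flash: bool) -> list[str]:
    """Check one host against the per-host minimums listed above."""
    problems = []
    if host.cache_ssds < 1:
        problems.append("needs at least one cache SSD")
    if host.capacity_devices < 1:
        problems.append("needs at least one capacity device")
    min_nic = 10 if all_flash else 1
    if host.nic_gbps < min_nic:
        problems.append(f"needs a {min_nic} Gbit/s NIC")
    if host.controller_mode not in ("passthrough", "raid0"):
        problems.append("controller must use passthrough or per-disk RAID 0")
    if host.ram_gb < 8:
        problems.append("needs at least 8 GB of RAM")
    return problems


def cluster_problems(hosts: list[HostSpec], all_flash: bool,
                     two_node: bool = False) -> list[str]:
    """Two hosts are enough only for the special two-node configuration."""
    problems = []
    min_hosts = 2 if two_node else 3
    if len(hosts) < min_hosts:
        problems.append(f"needs at least {min_hosts} hosts")
    for i, host in enumerate(hosts):
        problems += [f"host {i}: {p}" for p in host_problems(host, all_flash)]
    return problems


good = HostSpec(cache_ssds=1, capacity_devices=4, nic_gbps=10,
                controller_mode="passthrough", ram_gb=64)
print(cluster_problems([good] * 3, all_flash=True))  # []
```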

vSAN Caveats

A successful vSAN rollout depends on many factors, but there are basic requirements and caveats you need to know to avoid mistakes.

Don't fill a disk group with the maximum number of capacity disks. Two disk groups of three disks each are better than one group of six. This costs more, since an extra cache SSD is needed, but it shrinks the failure domain: if a cache disk fails, three disks are rebuilt faster than six. Two disk groups also work in parallel faster than one, especially if they are attached to different controllers. It is also better not to use the largest disks available (1 to 2 TB rather than, say, 6 TB).
Do not set FailuresToTolerate below one: the cluster would then run without redundancy, and you risk losing data in a failure.
vSAN places no special demands on hardware; you can use standard x86 servers, but it is better to follow VMware's list of recommended hardware. Install the hypervisor on an internal flash drive, put the cache SSD in a PCIe slot, and connect the capacity disks to the standard controller ports.
Because data travels not over a dedicated storage network but over ordinary Ethernet, its bandwidth should be at least 10 Gbit/s.
In a hybrid configuration the SSD caches both reads and writes, while in All-Flash it caches writes only. That is why All-Flash is preferable: devoting 100% of the cache to the write buffer gives a strong advantage in processing the incoming I/O stream. The larger the cache SSD, the higher the performance.

If you use All-Flash, nothing prevents you from using a higher-spec SSD for the cache than for the other disks.
vSAN is often praised for easy scaling. Indeed, to add compute or disk space, you simply connect an additional node to the network.

But if you only need disks, you cannot grow the capacity of the virtual storage on the fly by simply adding disks or replacing them with larger ones. The ratio of storage to compute has to be calculated in advance.
You can enable vSAN and start working with it right away if you have vSphere, but remember that it is licensed separately, per CPU. So the more nodes in your cluster, the more expensive the virtual storage becomes, and you need to work out the budget in advance.
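The trade-off between two smaller disk groups and one large one comes down to simple arithmetic. This Python sketch (host_layout is a hypothetical name) counts the cache SSDs needed and the capacity that has to be rebuilt when a single cache device fails, using 2 TB disks as an example. The limit of seven capacity devices per disk group applies to the classic vSAN disk-group architecture.

```python
MAX_CAPACITY_DISKS_PER_GROUP = 7  # vSAN limit per disk group


def host_layout(total_disks: int, groups: int, disk_tb: float) -> dict:
    """Compare disk-group layouts for one host.

    A failed cache device takes its whole disk group offline, so every
    capacity disk behind it has to be rebuilt elsewhere.
    """
    if total_disks % groups:
        raise ValueError("spread disks evenly across disk groups")
    per_group = total_disks // groups
    if per_group > MAX_CAPACITY_DISKS_PER_GROUP:
        raise ValueError("too many capacity disks in one disk group")
    return {
        "cache_ssds": groups,  # one cache device per disk group
        "tb_to_rebuild_if_one_cache_fails": per_group * disk_tb,
    }


for groups in (1, 2):
    print(groups, host_layout(total_disks=6, groups=groups, disk_tb=2))
```

Doubling the cache SSDs halves the amount of data put at risk by any single cache failure, which is the cost/benefit the text describes.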

HBA / RAID Controller Requirements

Storage Controller Feature	Storage Controller Requirement
Required mode	Review the vSAN requirements in the VMware Compatibility Guide for the required mode, passthrough or RAID 0, of the controller. If both passthrough and RAID 0 modes are supported, configure passthrough mode instead of RAID 0. RAID 0 introduces complexity for disk replacement.
RAID mode	In the case of RAID 0, create one RAID volume per physical disk device. Do not enable a RAID mode other than the mode listed in the VMware Compatibility Guide. Do not enable controller spanning.
Driver and firmware version	Use the latest driver and firmware version for the controller according to the VMware Compatibility Guide. If you use the in-box controller driver, verify that the driver is certified for vSAN. OEM ESXi releases might contain drivers that are not certified and listed in the VMware Compatibility Guide.
Queue depth	Verify that the queue depth of the controller is 256 or higher, or 512 if you have 2 disk groups. Higher queue depth provides improved performance.
Cache	Disable the storage controller cache, or set it to 100 percent read if disabling cache is not possible.
Advanced features	Disable advanced features, for example, HP SSD Smart Path.
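The table translates naturally into a checklist. The Python sketch below encodes its rules for one controller; ControllerConfig and its fields are hypothetical names, and applying the 512 threshold to two or more disk groups is this sketch's reading of the queue-depth row, not official Compatibility Guide wording.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ControllerConfig:
    mode: str                          # "passthrough" or "raid0"
    passthrough_supported: bool        # per the VMware Compatibility Guide
    disks_per_raid0_volume: int        # ignored in passthrough mode
    spanning_enabled: bool
    queue_depth: int
    cache_read_percent: Optional[int]  # None = controller cache disabled
    advanced_features_enabled: bool    # e.g. HP SSD Smart Path


def controller_warnings(c: ControllerConfig, disk_groups: int) -> list[str]:
    """Return the table rules this controller configuration violates."""
    warnings = []
    if c.mode == "raid0":
        if c.passthrough_supported:
            warnings.append("prefer passthrough over RAID 0")
        if c.disks_per_raid0_volume != 1:
            warnings.append("create one RAID 0 volume per physical disk")
    if c.spanning_enabled:
        warnings.append("disable controller spanning")
    min_qd = 512 if disk_groups >= 2 else 256
    if c.queue_depth < min_qd:
        warnings.append(f"queue depth {c.queue_depth} < {min_qd}")
    if c.cache_read_percent not in (None, 100):
        warnings.append("disable the cache or set it to 100% read")
    if c.advanced_features_enabled:
        warnings.append("disable advanced features")
    return warnings


hba = ControllerConfig("passthrough", True, 1, False, 600, None, False)
print(controller_warnings(hba, disk_groups=2))  # []
```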

Every ESXi host in a vSAN cluster needs a disk controller that can operate in Passthru (pass-through), HBA or JBOD mode. In other words, the disk controller must be able to present disks "as is", without a RAID layer. This lets ESXi perform I/O operations without interference from the disk controller. The term Passthru simply means a RAID card that performs no RAID operations on the disk; it is also often called HBA mode. Many modern controllers support Passthru / HBA / JBOD mode, but some still can't, and the workaround for using them with vSAN is to create a RAID-0 volume from a single disk.

Also, if the controller cache cannot be fully disabled in a RAID-0 configuration, set the controller cache to 100% reads, effectively disabling the write cache. But this is a legacy of the vSAN 5.5 days, so forget about it. All modern controllers can pass disks through natively, without these single-disk RAID-0 contortions.

IO Controller Mode	Drive Assignments	vSAN supported	Caveats
RAID	Every drive assigned as single drive RAID-0	Yes, Tested	Must use RAID-0 and assign just a single drive per volume. All drives on this IO controller must be assigned to either vSAN OR vSphere and not utilized for other actions. Write caching must be disabled on the controller. You are strongly urged to use Passthru/HBA mode if available
Passthru/HBA	Every drive visible as a raw device	Yes	All drives on this IO controller must be assigned to either vSAN OR vSphere and not utilized for other actions
Mixed	Every drive assigned as single drive RAID-0	Yes	Must use RAID-0 and assign just a single drive per volume. All drives on this IO controller must be assigned to either vSAN OR vSphere and not utilized for other actions. Write caching must be disabled on the controller. You are strongly urged to use Passthru/HBA mode if available
Mixed	Every drive visible as a raw device	Yes	All drives on this IO controller must be assigned to either vSAN OR vSphere and not utilized for other actions. Write caching must be disabled on the controller.
Mixed	Some devices assigned as single drive RAID-0, some as raw (Passthru) HBA devices	No	All the devices must be assigned for vSphere usage regardless of mapping as RAID or non-RAID. Be careful not to create an imbalance in your storage, as RAIDed devices may exhibit a different performance profile than Passthru/HBA ones

A short digression on the required cache disk size. Documentation often mentions 10%. This 10% should be calculated from the capacity actually written (not provisioned) to VMDKs in the capacity tier, not from the raw disk capacity, and don't forget the 600 GB limit per cache disk: anything above that size is not used as cache but is held in reserve to replace worn-out memory cells.

The 10% figure assumes that the guest OSes actively access about 10% of the written data. How many copies exist on the back end makes no difference with vSAN, because reads are served from all copies, so the amount of data that must be kept in the hot tier does not change with FTT (from one to three copies across mirrors). And remember that the deeper the controller's disk queue, the faster reconfiguring or resynchronizing tasks complete.
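The sizing rule can be written down as a small calculation. This Python sketch uses the 10% ratio and the 600 GB per-device limit quoted in the text; the function names are hypothetical, and the numbers in the usage lines (20 TB written, eight 1.6 TB cache cards) are illustrative only. FTT is deliberately not a parameter, matching the argument that replicas do not grow the working set.

```python
from __future__ import annotations

CACHE_PERCENT = 10          # of data actually written to VMDKs
MAX_USABLE_CACHE_GB = 600   # per cache device, as stated in the text


def required_cache_gb(used_vmdk_gb: float) -> float:
    """10% of used (not provisioned) VMDK capacity; independent of FTT."""
    return used_vmdk_gb * CACHE_PERCENT / 100


def usable_cache_gb(device_gb: float) -> float:
    """Capacity beyond 600 GB on a cache device is not used as cache."""
    return min(device_gb, MAX_USABLE_CACHE_GB)


def cache_is_sufficient(used_vmdk_gb: float,
                        cache_devices_gb: list[float]) -> bool:
    usable = sum(usable_cache_gb(d) for d in cache_devices_gb)
    return usable >= required_cache_gb(used_vmdk_gb)


# 20 TB of written VMDK data, eight 1.6 TB cache cards (one per disk group)
print(required_cache_gb(20_000))                # 2000.0
print(cache_is_sufficient(20_000, [1600] * 8))  # True
```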

Heartbeat Failover Strategy for Stretched Cluster

Heartbeat is a technology that helps avoid the so-called “split-brain” scenario, in which HA cluster nodes are unable to synchronize but continue to accept write commands from initiators independently. It can occur when all synchronization and heartbeat channels disconnect simultaneously and the partner nodes do not respond to the node’s requests. The StarWind service then assumes the partner nodes are offline and continues operating in single-node mode using the data written to it.

If at least one heartbeat link is online, the StarWind services can communicate with each other over it. The services mark the device with the lowest priority as not synchronized, and that device is blocked for further read and write operations until the synchronization channel is restored. The partner device on the synchronized node then flushes data from its cache to disk to preserve data integrity in case the node goes down unexpectedly. It is recommended to assign several independent heartbeat channels when creating the replica, to improve system stability and avoid split-brain. With the Heartbeat Failover Strategy, the storage cluster continues working with only one StarWind node available.

Heartbeat Failover Strategy Network Design
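The decision described above can be summarized as a small state function. This Python sketch is a simplified model written for this article, not StarWind's implementation: it assumes exactly two nodes and that a larger number means a higher priority (both assumptions). It shows that a surviving heartbeat link lets the nodes agree on who steps back, while losing every channel leaves both nodes active, which is the split-brain case.

```python
from enum import Enum


class State(Enum):
    SYNCHRONIZED = "synchronized"
    ACTIVE_CACHE_FLUSHED = "serving I/O, cache flushed to disk"
    NOT_SYNCHRONIZED = "not synchronized, I/O blocked"
    ACTIVE_ALONE = "serving I/O alone (split-brain risk)"


def node_state(my_priority: int, partner_priority: int,
               sync_up: bool, heartbeat_up: bool) -> State:
    """Toy model of the Heartbeat failover strategy for one of two nodes.

    Assumption: a larger number means a higher priority.
    """
    if my_priority == partner_priority:
        raise ValueError("priorities must differ")
    if sync_up:
        return State.SYNCHRONIZED
    if heartbeat_up:
        # The nodes can still talk, so they agree who steps back:
        # the lower-priority device is blocked, the other flushes its cache.
        if my_priority < partner_priority:
            return State.NOT_SYNCHRONIZED
        return State.ACTIVE_CACHE_FLUSHED
    # Every channel is down: each node assumes its partner is offline.
    return State.ACTIVE_ALONE


for sync, hb in ((True, True), (False, True), (False, False)):
    print(sync, hb, node_state(2, 1, sync, hb).value,
          "/", node_state(1, 2, sync, hb).value)
```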

Virtual Machine Storage Policies

Virtual machines placed into a vSAN Datastore automatically inherit a “Default Storage Policy” that dictates how their objects are protected and distributed across the vSAN cluster. You can edit or replace the default policy at the VM level.

The default policy uses mirroring to protect VM objects against a single failure and has no performance-related options set. Different policies can also be applied to different disks of the same VM.

Policies contain many options, so below we cover the most important ones you need to know, along with an example in the following section.

Failure tolerance method: How objects are protected. Options are RAID-1 (Mirroring) or RAID-5/6 (Erasure Coding).
Primary failures to tolerate (PFTT): Number of host/device failures to tolerate.
Number of disk stripes per object: The number of capacity disks each replica of an object is spread across.
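How the failure tolerance method and PFTT translate into host count and raw capacity can be sketched in a few lines of Python. The rules encoded here are the commonly documented vSAN minimums (2 * FTT + 1 hosts for mirroring, 4 hosts for RAID-5 and 6 hosts for RAID-6); the function names are hypothetical and witness overhead is ignored. Note that FTT=0 yields a single copy, that is, no redundancy at all.

```python
def hosts_required(ftt: int, method: str) -> int:
    """Minimum number of hosts (fault domains) for a policy.

    mirroring: FTT + 1 replicas; 2 * FTT + 1 hosts keep a majority alive.
    erasure:   RAID-5 (3 data + 1 parity) for FTT=1,
               RAID-6 (4 data + 2 parity) for FTT=2.
    """
    if method == "mirroring":
        if not 0 <= ftt <= 3:
            raise ValueError("mirroring supports FTT 0..3")
        return 2 * ftt + 1
    if method == "erasure":
        layouts = {1: 4, 2: 6}
        if ftt not in layouts:
            raise ValueError("erasure coding supports FTT 1 or 2")
        return layouts[ftt]
    raise ValueError(f"unknown method: {method!r}")


def raw_capacity_gb(vmdk_gb: float, ftt: int, method: str) -> float:
    """Raw capacity consumed by one VMDK (witness components ignored)."""
    hosts_required(ftt, method)  # validates ftt and method
    if method == "mirroring":
        return vmdk_gb * (ftt + 1)
    data, total = {1: (3, 4), 2: (4, 6)}[ftt]
    return vmdk_gb * total / data


for ftt, method in ((0, "mirroring"), (1, "mirroring"), (1, "erasure"), (2, "erasure")):
    print(f"FTT={ftt} {method}: {hosts_required(ftt, method)} hosts, "
          f"{raw_capacity_gb(100, ftt, method):.0f} GB raw per 100 GB VMDK")
```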

Now, let’s review some important terminology and concepts. As we know, the original virtual machine files are converted into ‘objects.’ A VMDK is an object, a swap file is an object, and so on. When an object is mirrored, or broken down into RAID segments, those “sub-objects” are referred to as “components”. As a result, a single VMDK object that is protected by a mirror is broken down into two components, each a mirror of the other. In reality, mirrored objects are also composed of another type of component called a “witness component”.

The witness component, which is only a few MB in size and contains no VM data, acts as a tie-breaker in the event of a network partition. The VM will fail over to an ESXi host in the partition containing the most components. As a result, the number of components of any object must be odd, and the witness component helps ensure this.

A VMDK Object being broken down into two disk components and a witness component
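The tie-breaker role of the witness can be shown with a majority check. This Python sketch assumes one vote per component (a simplification of how vSAN weighs components) and treats an object as accessible in a partition only while strictly more than half of its votes are reachable there. The component names are hypothetical.

```python
def object_accessible(reachable: set, votes: dict) -> bool:
    """True while strictly more than half of the object's votes are reachable."""
    total = sum(votes.values())
    present = sum(v for name, v in votes.items() if name in reachable)
    return 2 * present > total


mirrored_vmdk = {"mirror-a": 1, "mirror-b": 1, "witness": 1}
print(object_accessible({"mirror-a", "witness"}, mirrored_vmdk))  # True
print(object_accessible({"mirror-b"}, mirrored_vmdk))             # False

# Without the witness, a clean split leaves neither side with a majority.
no_witness = {"mirror-a": 1, "mirror-b": 1}
print(object_accessible({"mirror-a"}, no_witness))                # False
```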

vSAN Example: VM with Mirroring, PFTT set to 1 and Disk Stripes to 2

Let’s assume we configure the following vSAN policy settings:

Failure tolerance method: Mirroring
Primary failures to tolerate (PFTT): 1
Number of disk stripes per object: 2

With these options set, the virtual machine’s VMDK object will be broken down into three components consisting of two mirrors and one witness.

These components will be distributed across three hosts for availability. Component placement is fully automated and not configurable. The two mirror components are then broken down further into stripes and written across two different, randomly selected disks.

Striping breaks each mirror component into 1 MB chunks, or ‘strips,’ with consecutive strips going to different disks. The witness component is not striped, since striping would provide no benefit for a component that is rarely accessed. Because the two mirrors and the witness must each sit on a separate host, this layout also explains why vSAN typically requires a minimum of three hosts.

Logically, because the striping distributes the component over two disks, it should perform better than a single disk. However, that isn’t always the case. Because the vSAN logic will automatically handle the striping and may stripe across disks within the same disk group, striping may not lead to a performance boost in some cases.

For example, because the same cache SSD fronts all the disks within a disk group, striping within one disk group provides no practical improvement. You can work around this by setting the number of stripes greater than the number of persistent disks in a disk group. 12 is the maximum.

The graphic demonstrates the physical placement of object components based on the VM vSAN policy
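The striping behavior can be sketched as a simple address mapping. This Python example assumes a plain round-robin RAID-0 layout with 1 MB strips (an assumption about the on-disk layout, not vSAN internals) and adds a helper showing why stripe members placed in the same disk group share one cache device. The function names are hypothetical.

```python
STRIPE_UNIT = 1024 * 1024  # 1 MB strips
MAX_STRIPE_WIDTH = 12


def member_for_offset(offset: int, stripe_width: int) -> int:
    """0-based stripe member holding byte `offset` of one mirror component,
    assuming a plain round-robin RAID-0 layout."""
    if not 1 <= stripe_width <= MAX_STRIPE_WIDTH:
        raise ValueError("stripe width must be between 1 and 12")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return (offset // STRIPE_UNIT) % stripe_width


def cache_devices_used(member_disk_groups: list) -> int:
    """Each disk group has one cache device; members in one group share it."""
    return len(set(member_disk_groups))


print([member_for_offset(i * STRIPE_UNIT, 2) for i in range(4)])  # [0, 1, 0, 1]
print(cache_devices_used(["dg1", "dg1"]))  # 1: both strips behind one cache SSD
print(cache_devices_used(["dg1", "dg2"]))  # 2
```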

Pay Attention to Virtual Disks

To build a classic failover cluster, data must be stored on a shared disk that all cluster nodes can access.

With external storage, such a configuration on a virtualization platform uses RDM disks. With vSAN, virtual VMDK disks with additional settings are used instead. By default, VMware vSphere protects data from administrator error by allowing a virtual disk to be attached to only one virtual machine at a time. To bypass this restriction, a parameter must be set manually for the shared virtual disks.

Power off the virtual machine and add the parameter to its configuration file.

Also, for the shared disks, don't forget to add to the same configuration file the parameter required for the Veritas clustering software to work correctly.

However, high availability and reliability come at the price of significant limitations you will have to live with for the rest of the system's life. The main ones are:

You cannot resize a shared disk on the fly. To do so, you have to power off both cluster nodes and only then work on the disk. This is probably the most painful point: when the disk holding a critical database runs out of space, it's no laughing matter.
You cannot take snapshots of shared disks. Consequently, agentless backup tools cannot be used to back up the data, and Changed Block Tracking for incremental backups is not supported either.
Storage vMotion is not available, so shared virtual disks cannot be migrated while running.

The good news is that regular vMotion for moving virtual machines between hypervisors works fine.

Test Bench Description

Hardware

Four identical hosts in the following configuration:

Platform: AIC SB302-LB (3U 16-bay storage server, not certified for vSphere 6.2)
CPU: Intel Xeon E5-2620 v4 @ 2.10 GHz, 8 cores, hyper-threading enabled, 2 pcs.
RAM: 128 GB
NVMe flash: HGST Ultrastar SN100 Series NVMe SSD HUSPR3216ADP301 1.6 TB PCIe, 2 pcs. (certified for Virtual SAN 6.2, all-flash only, but I don't think that matters)
HDD: HGST Ultrastar 7K6000 HUS726020AL5214 2 TB 7200 rpm SAS 12 Gbit/s, 8 pcs. (not certified for Virtual SAN 6.2, only for 6.5)
Boot device: 60 GB SSD
Disk controller: LSI Logic Fusion-MPT 12GSAS SAS3008 PCI-Express (certified for vSphere 6.2 but not for Virtual SAN 6.2)
2 × 1GbE ports
2 × 40 Gbit/s IB ports on Mellanox ConnectX-2/ConnectX-3 HCAs in IPoIB mode

IB switch: Mellanox SB7790

Software: VMware vSphere 6.2

vCenter Server Appliance 6.0.0.20100

ESXi 6.0.0.4600944

Mellanox ConnectX-3 driver version for VMware (IPoIB mode): MLNX-OFED-ESX-2.4.0.0-10EM-600.0.0.2494585

Virtual SAN Cluster Description

vSphere trial license with all features enabled

vCenter Server Appliance is deployed as a VM on the dedicated local boot SSD of one of the hosts

An HA cluster of 4 hosts, which also runs the Virtual SAN (vSAN) cluster

Virtual SAN uses all drives of the 4 vSphere cluster nodes (except the boot SSDs): 8 identical disk groups (DGs), 2 per host; each DG contains 1 NVMe flash device for cache and 4 HDDs for capacity. The result is a hybrid datastore with a total raw capacity of 57.64 TB: 32 capacity drives of 1.82 TB each (nominal drive capacity 2 TB).
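The 1.82 TB per-drive figure is simply the vendor's decimal 2 TB expressed in binary units, which is presumably what vSphere displays. The short Python check below converts it (vendor_tb_to_tib is a hypothetical helper). Multiplying by 32 drives gives about 58.2 TiB, a little above the 57.64 TB quoted; the remaining difference presumably comes from on-disk format overhead, which is an assumption rather than something measured on this test bench.

```python
TB = 10**12   # drive vendors use decimal terabytes
TIB = 2**40   # binary terabyte (tebibyte)


def vendor_tb_to_tib(vendor_tb: float) -> float:
    """Convert a vendor's decimal TB figure into binary TiB."""
    return vendor_tb * TB / TIB


per_disk = vendor_tb_to_tib(2)
print(f"{per_disk:.2f}")       # 1.82
print(f"{32 * per_disk:.1f}")  # 58.2
```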

Add ESXi host to the inventory

First of all, connect to your vSphere Web Client and navigate to Hosts and Clusters. As you can see in the following screenshot, I have already created several datacenters and folders. To add a host to the inventory, right-click a folder and select Add Host.

Next, specify the host name or IP address of the ESXi node.

Then specify the credentials to connect to the host. Once the connection is made, vCenter creates its own permanent account on the host and uses it for management instead of the account you specified.

Then select the license to assign to the ESXi node.

On the next screen, choose whether to prevent users from logging in directly to this host (lockdown mode).

To finish, choose the VM location.

Repeat these steps to add more ESXi nodes to the inventory. For vSAN, I will add two additional nodes.

Converged Storage

Traditional architecture has proven itself over many years: it is reliable, practical and versatile. Its three components, namely servers, storage systems and storage networks, interact at high speed and provide good fault tolerance, but building, maintaining and scaling it takes a lot of time and money. When results are needed quickly (and in modern business, cloud or otherwise, compute demand changes very dynamically), the "building blocks" of converged systems become more convenient.

So if you are building infrastructure for resource-intensive business workloads or plan to scale it, take a look at HCI.

Deploying a virtualization platform without using its fault-tolerance capabilities is irresponsible and inefficient. The first requirement VMware vSphere imposes for building a failover cluster is dedicated shared storage. It holds the virtual machine images, which can then be started on any cluster node in the event of an unexpected failure. To meet this requirement, various storage devices are purchased and installed alongside the servers, either block (FC, FCoE, iSCSI) or file (NFS, SMB), and connected by a storage network. As a result, additional specialists join the staff, and the infrastructure accumulates a dozen management and monitoring systems. It is complicated, tedious and expensive.

VMware offers to combine compute and storage functions on the same hardware by deploying VMware vSAN, a converged software solution. Such converged storage requires no special devices or storage networks; it runs directly on the servers and can do the same job as an external storage system. So how does vSAN differ from a simple disk array, and what tasks is it useful for?

Stretched Cluster

With the vSAN Stretched Cluster option, you can spread up to 30 ESXi hosts across two physical locations. Virtual machines can be located at either site and are mirrored to the other.

A layer two or layer three network can connect the sites. Layer two networking is simpler to manage. The network needs to be low-latency as round-trip time (RTT) must be under 5ms. Generally, this makes Stretched Cluster configurations viable across campuses or metropolitan areas, but not larger geographical distances.

In the Stretched Cluster topology, VMs can migrate using vMotion and tolerate an entire site failure without the need for any replication hardware or orchestration software. vSAN mirroring and vSphere HA handle the whole failover process in a matter of minutes.

Stretched Cluster configurations require a third site to act as a witness site to ensure that failover occurs correctly.

vSAN Stretched Cluster provides mirrored storage and failover between two sites
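The Stretched Cluster constraints described above can be expressed as a quick validation. This Python sketch encodes only the limits stated in this section (up to 30 hosts across the two data sites, RTT under 5 ms, a third witness site); StretchedCluster and stretched_problems are hypothetical names, and real deployments have further requirements not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StretchedCluster:
    hosts_site_a: int
    hosts_site_b: int
    inter_site_rtt_ms: float
    has_witness_site: bool


def stretched_problems(c: StretchedCluster) -> list[str]:
    """Return the stated Stretched Cluster limits this design violates."""
    problems = []
    if c.hosts_site_a + c.hosts_site_b > 30:
        problems.append("more than 30 hosts across the two data sites")
    if min(c.hosts_site_a, c.hosts_site_b) < 1:
        problems.append("each data site needs at least one host")
    if c.inter_site_rtt_ms >= 5:
        problems.append("inter-site RTT must be under 5 ms")
    if not c.has_witness_site:
        problems.append("a third site must host the witness")
    return problems


print(stretched_problems(StretchedCluster(4, 4, 2.0, True)))   # []
print(stretched_problems(StretchedCluster(4, 4, 12.0, True)))
```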

Reference Architecture

Azure VMware Solution provides native support for virtualized WSFC clusters. SCSI-3 Persistent Reservations (SCSI3PR) are supported at the virtual disk level. WSFC needs this support to arbitrate access to a shared disk between nodes. SCSI3PR support lets you configure WSFC with a disk resource shared by virtual machines on vSAN datastores.

The following diagram shows the architecture of WSFC virtual nodes in an Azure VMware Solution private cloud. It shows where Azure VMware Solution sits, including the WSFC virtual servers (blue box), relative to the broader Azure platform. The diagram shows a typical hub-and-spoke architecture, but a similar configuration is possible with Azure Virtual WAN. Both options offer all the benefits of the other Azure services.

VMware vSAN: Hybrid vs. All-flash

vSAN has two modes administrators can select: “All-flash” or “Hybrid”.

Hybrid mode entails using SSD devices as cache and mechanical hard disk drives (HDDs) for persistent storage. It was the only option when vSAN was initially released. The faster SSD device services VM reads and writes in real-time. Then, vSAN logic eventually destages the data from the cache and onto the mechanical disk for long-term storage. This Hybrid approach is more economical because it leverages cheap HDDs for persistent storage but has performance tradeoffs.

All-flash mode uses SSD disks for both the caching tier and persistence tier. As a result, it significantly improves performance but increases hardware costs.

With Hybrid mode, cache capacity is split 70% for reads and 30% for writes. This 70/30 split exists because the read cache performs better the more capacity it has, while writes are regularly destaged to persistent storage anyway. As a rule of thumb, the total cache for Hybrid mode should equal 10% of vSAN’s used capacity, not total capacity.

Conversely, because flash-based disks have an inherent write-endurance problem, cache is used as a 100% write tier in All-flash mode. The electrically charged cells that store data can only be written a certain number of times before they fail. By fronting all of the writes with highly write-endurant disks, we can absorb most of them before destaging them to cheaper, capacity-oriented SSDs. This approach leads to significant cost savings, and because all reads are directed straight to the persistence tier, there is no read penalty.

Additionally, All-flash mode supports RAID5 and RAID6 for VMs, as well as compression and deduplication. These features add additional I/O overhead and cannot be supported with Hybrid mode. The general recommendation is to go with All-flash if you can as it offers better performance, more features, and future-proofs your environment.

Graphic shows All-flash VS Hybrid mode
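The difference in cache usage between the two modes can be illustrated numerically. This Python sketch sizes the cache at 10% of used capacity and then splits it 70/30 for Hybrid or 100% write for All-flash. The text gives the 10% rule for Hybrid mode only, so reusing it for All-flash here is purely for comparison; cache_split_gb is a hypothetical name.

```python
def cache_split_gb(used_capacity_gb: float, all_flash: bool) -> dict:
    """Split a cache sized at 10% of used capacity into read and write parts.

    Hybrid: 70% read cache, 30% write buffer.
    All-flash: 100% write buffer; reads go straight to the capacity tier.
    """
    total = used_capacity_gb * 10 / 100
    if all_flash:
        return {"read": 0.0, "write": total}
    return {"read": total * 70 / 100, "write": total * 30 / 100}


print(cache_split_gb(10_000, all_flash=False))  # {'read': 700.0, 'write': 300.0}
print(cache_split_gb(10_000, all_flash=True))   # {'read': 0.0, 'write': 1000.0}
```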


The Requirements

There are a number of software and hardware requirements you must take care of, these being:

vCSA 6.5: vCenter HA is available only starting with this release.
HA cluster nodes are supported only on ESXi 5.5 or later, managed by vCenter Server 5.5 or later.
Ideally, every HA node should reside on a different host and datastore. Optionally, the hosts should be in a DRS cluster. A minimum of 3 hosts is then required.
The vCSA deployment size should be set to Small (4 vCPUs / 16GB RAM) or better.
Create a port group on ESXi for the private HA network. Optionally, you can have a dedicated vSwitch if network isolation is a requirement.
The HA private network must reside on a different subnet from the one used for management.
You cannot mix IPv4 and IPv6 addressing when configuring networking on the nodes.
Network latency on the HA network must be less than 10ms.
No gateway for the HA network must be specified when configuring the nodes.
You will need one IP address for management and three private IP addresses, one for each HA node.
DNS A and PTR records for the Active node’s management network (i.e. the FQDN of the vCSA).
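The networking rules in this list lend themselves to an automated pre-check. The Python sketch below uses only the standard ipaddress module; the function name and the addresses in the example are hypothetical, and it checks only the rules listed above (three distinct HA addresses in one subnet that differs from management, no gateway, no IPv4/IPv6 mix, latency under 10 ms). DNS records and host placement are not covered.

```python
from __future__ import annotations

import ipaddress
from typing import Optional


def vcha_network_problems(mgmt_ip: str, mgmt_prefix: int,
                          ha_ips: list[str], ha_prefix: int,
                          ha_gateway: Optional[str],
                          ha_latency_ms: float) -> list[str]:
    """Check the vCenter HA networking rules from the list above."""
    problems = []
    mgmt_net = ipaddress.ip_interface(f"{mgmt_ip}/{mgmt_prefix}").network
    ha = [ipaddress.ip_interface(f"{ip}/{ha_prefix}") for ip in ha_ips]
    if len({i.ip for i in ha}) != 3:
        problems.append("need three distinct HA addresses")
    if any(i.version != mgmt_net.version for i in ha):
        problems.append("do not mix IPv4 and IPv6")
    else:
        ha_nets = {i.network for i in ha}
        if len(ha_nets) != 1:
            problems.append("HA addresses must share one subnet")
        elif any(mgmt_net.overlaps(n) for n in ha_nets):
            problems.append("HA subnet must differ from the management subnet")
    if ha_gateway is not None:
        problems.append("no gateway may be set on the HA network")
    if ha_latency_ms >= 10:
        problems.append("HA network latency must be under 10 ms")
    return problems


print(vcha_network_problems("192.168.16.50", 24,
                            ["172.16.0.1", "172.16.0.2", "172.16.0.3"], 24,
                            None, 0.5))  # []
```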