Wednesday, May 9, 2018

First book - VMware NSX Cookbook

I haven't been blogging lately because I was writing a book on VMware NSX. It all started in early February 2017, when an Acquisition Editor from Packt reached out to tell me they were looking for someone to author a cookbook on NSX. I wasn't sure at first: I had never authored a book and English is not my first language, so I didn't think I was the best person for the job at the time.

After reading up on the Packt cookbook format, I learned that a cookbook focuses more on 'How to do it' and 'How it works'. Reading a few cookbook examples, I could see that most cookbooks are not heavy on theory. I would just need a small intro section, and then the focus would shift to 'How to do it'. I've been designing and deploying NSX for the past few years, so I thought I should take this opportunity to contribute more to the community by authoring a book.

But I still wasn't sure whether I should take the opportunity, so I reached out to Iwan Rahabok for further insight into authoring a book, as he has authored multiple books with Packt. He mentioned that writing a book is damn painful, and luckily he had written a few tips for first-time authors. I still thought I could do with more help to ensure I wrote a good-quality cookbook, and Packt was open to the idea of co-authors. I asked around in the VMware NSX communities and, thanks to Dale Coghlan, was introduced to one of his colleagues in the VMware NSBU, Tony Sangha, who was keen to author a book on NSX. My first impression was that we would be a great team!

After teaming up with Tony, the next step was to create an outline: which chapters, what each chapter would cover, how many recipes, how many pages, and when we could deliver each chapter. That was not easy, as we didn't know how many pages each chapter would come to once written, and this was our first time authoring a book! Once the final outline had been agreed with the publisher, we signed the contract and started writing. We began our first chapters in April 2017.

After many sleepless nights, we were finally able to jointly release the book and become published authors in late March 2018. Thanks to everyone involved, with huge thanks to Dmitri Kalintsev for the technical review and Sjors Robroek for the foreword. The book is now available on Packt and Amazon.

If you are interested in reviewing the book, I might be able to get an eBook copy from the publisher. Feel free to reach me on Twitter/email for more information or with any feedback you have!

Thursday, May 3, 2018

AWS Certified Solutions Architect - Associate (February 2018) Exam Experience

I passed the AWS-SAA February 2018 version on 30 April 2018. It was my first AWS exam and the first exam I have sat with PSI. My score was 882/1000 and I had around 60-70 minutes left after completing all 65 questions, so I still had plenty of time to review all of them. If you do the math, it took me about a minute per question to answer, and I still had another minute per question for review.
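The "do the math" step can be spelled out in a few lines of Python. This is only a rough sketch: the 130-minute exam length is an assumption (the total time is not stated above), and the 60-70 minutes left is an estimate, so the result is a range rather than an exact pace.

```python
QUESTIONS = 65
EXAM_MINUTES = 130       # assumption: total exam time, not stated in the post
MINUTES_LEFT = (60, 70)  # "around 60-70 minutes left"


def per_question(minutes: float) -> float:
    """Average number of minutes available per question."""
    return minutes / QUESTIONS


# Time spent answering is the total time minus whatever was left over.
answer_pace = [per_question(EXAM_MINUTES - left) for left in MINUTES_LEFT]
# The leftover time is what remains for reviewing the same 65 questions.
review_pace = [per_question(left) for left in MINUTES_LEFT]

print("answering: %.2f to %.2f min/question" % (min(answer_pace), max(answer_pace)))
print("review:    %.2f to %.2f min/question" % (min(review_pace), max(review_pace)))
```

Under that assumption, both the answering pace and the review allowance come out at roughly one minute per question.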

Below are the sections and the percentage of scored items for each:
  • Section 1.0: Design Resilient Architectures 34%
  • Section 2.0: Define Performant Architectures 24%
  • Section 3.0: Specify Secure Applications and Architectures 26%
  • Section 4.0: Design Cost-Optimized Architectures 10%
  • Section 5.0: Define Operationally-Excellent Architectures 6%
Ryan's ACG course and Elias' Pluralsight course are awesome! The courses took me from knowing nothing to passing the exam. I also attended the AWS Certification Exam Readiness Workshop: AWS Certified Solutions Architect – Associate, which was really helpful for understanding and preparing for the exam.

Lastly, the exam experiences shared on the ACG discussion forum were extremely helpful in the week before the exam. I have also compiled all the ACG exam experiences for the new exam in one place for easier reading. Hopefully, this will be useful for the community.

So, here are the topics on my exam:
  • Redshift, make sure you understand the use cases, limitations, security, availability, durability, backup and restore. Ryan covers Redshift for the exam pretty well, and the FAQ is also great additional material. Make sure you understand this, as I think I got 5 or 6 questions on Redshift
  • API Gateway, I can't remember the exact details, but make sure you understand the use cases and how it works at a high level, as I think I had multiple scenario questions with API Gateway as one of the answer options
  • ECS, understand the use cases, the benefits of ECS compared to other AWS solutions, and when to use it
  • ELB, I got multiple questions with ELB as an option, some quite specific, e.g. one option uses an Application Load Balancer and another uses a Classic Load Balancer, so make sure you understand the differences between the two and when to use each
  • CloudWatch and CloudTrail, understand the capabilities, what you can do with each, and when to use them
  • Kinesis, understand the use cases and when to use it
  • AWS Cognito was one of the options
  • EBS, you may need to remember the throughput/IOPS/volume size figures, as I got questions on use cases and on the best option for a scenario with specific throughput/IOPS/volume requirements. Read the FAQ; I think almost all of its topics (performance, snapshots, encryption), except billing & metering, were in my exam
  • EFS, I think I got 1-2 questions on EFS use cases and when to use it
  • S3, heaps of questions on S3, mostly covered in the FAQ, e.g. storage classes, availability, security, durability, data protection, transfer acceleration, lifecycle management, CRR
  • SQS vs SNS, use cases and when to use
  • VPC, as Ryan mentioned, you need to understand VPC inside out: how to set it up, what the default setup is, and the security options in a VPC, e.g. Security Group vs NACL. In your lab practice, every time you set up an AWS service, say EC2, S3, Lambda or anything else, I think it is worthwhile to think about how you would secure access to and from it: would you use a Security Group, a NACL or IAM? Think of a real-world scenario with multiple AWS services, even with an external non-AWS component such as an on-prem device. Understand all the components that make up a VPC, including but not limited to NAT Gateway/Instance, IGW and VPC endpoints
  • I think there was a Lambda question but I'm not too sure
  • Databases: RDS, DynamoDB, ElastiCache. Understand the differences, when to use each, and the available options, as this will help you work out how to improve an existing setup when there is a performance bottleneck. Understand availability, multi-AZ/region setups, reliability, backup and restore, and how to scale. Don't forget to read the FAQs
  • I can't remember the other topics, but I think I got multiple scenario questions involving multiple AWS services, e.g. an existing setup using Route53, ELB, EC2 and RDS, where we need to work out what is wrong with the setup or how to improve it
Hopefully, my exam experience will be useful for anyone preparing for the exam!

Tuesday, January 10, 2017

Creating an Active-Passive Pool on VMware NSX Load Balancer

In most of my NSX load balancing deployments at customers, there is always a use case for Active-Standby or Active-Passive load balancing, where the load balancer always forwards traffic to the primary member and forwards to the secondary member only if the primary member is down. The secondary member works as a standby.

As a comparison, F5 has a feature called Priority Group. It assigns a priority number to each pool member, and traffic within the pool is then load balanced according to those priority numbers. Members that are assigned a high priority receive all traffic until the load reaches a certain level or some number of members in the group become unavailable.

This feature is not available in the NSX load balancer, but we can use an NSX Application Rule to achieve a similar result. The NSX Layer 7 engine is based on HAProxy, so we can use an HAProxy ACL to achieve this Active-Standby load balancing method. Application rules enable NSX to create advanced load balancing rules that may not be possible with the application profiles or services natively available on the NSX Edge. The application rule is then applied through the virtual server configuration.

We can use nbsrv in an ACL to check whether the active pool is down (the number of usable members in the pool is 0) and, if so, switch to the standby pool. To achieve this, we create two pools, create the application rule, and apply it to the virtual server.
Below is the Application Rule for an active/standby pool:
acl pool_is_down nbsrv(active_pool_name) eq 0
use_backend standby_pool_name if pool_is_down
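
To make the decision the rule encodes easier to follow, here is a minimal Python sketch of the same logic. It is only a model for illustration, not how the NSX Edge or HAProxy implements it; the `Pool` class, `select_pool` function and member names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Pool:
    name: str
    # Maps member name -> True if its health check is currently passing.
    members_up: Dict[str, bool] = field(default_factory=dict)

    def nbsrv(self) -> int:
        """Count usable members, in the spirit of HAProxy's nbsrv()."""
        return sum(1 for up in self.members_up.values() if up)


def select_pool(active: Pool, standby: Pool) -> Pool:
    """Mirror the two-line rule: use the standby pool only when the active pool has 0 usable members."""
    pool_is_down = active.nbsrv() == 0           # acl pool_is_down nbsrv(...) eq 0
    return standby if pool_is_down else active   # use_backend ... if pool_is_down


# Hypothetical members, for illustration only.
active = Pool("pool-web-01a", {"member-a": True})
standby = Pool("pool-web-02a", {"member-b": True})

before_failure = select_pool(active, standby).name
active.members_up["member-a"] = False            # the only active member fails
during_failure = select_pool(active, standby).name
active.members_up["member-a"] = True             # the member recovers
after_recovery = select_pool(active, standby).name

print(before_failure, during_failure, after_recovery)
```

Because the check is evaluated on every selection, the sketch switches back to the active pool as soon as one of its members is up again.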
Here is a step-by-step configuration in VMware HOL-1703-SDC-1-HOL:
1. Create two pools: pool-web-01a (active pool) and pool-web-02a (standby pool)

2. Create the Application Rule. To add a comment in the script, use #. In the screenshot example below, rows 1 and 3 are comments and the script itself is in rows 2 and 4.

3. Create the Virtual Server, select the designated active pool as the Default Pool, and apply the Active-Standby Application Rule

The load balancer will now always forward to pool-web-01a and only use pool-web-02a when pool-web-01a is down or has no members. A caveat to note is that failback/preempt is enabled without delay, so the load balancer will instantly switch back and forward to pool-web-01a whenever it comes back online.
I haven't found a way to disable or delay the preempt. So if your active pool goes down, disable its members to prevent the automatic failback/preempt.

Thursday, December 15, 2016

Troubleshoot NSX DFW (Distributed Firewall) dropping or blocking traffic

I often receive questions from friends and customers about NSX DFW (Distributed Firewall) issues or troubleshooting NSX DFW.
They have just installed NSX DFW and then find that the DFW rules do not work as expected and seem to be dropping legitimate traffic.

Scenario 1: Default rule (Rule ID 1001) is set to Deny; there are Allow rules above it, but traffic does not hit the Allow rules
Scenario 2: vRA (vRealize Automation) is being used and integrated with NSX. The vRA Blueprint has App Isolation Policy enabled and overlapping IP addresses are being used
Scenario 3: Default rule is set to Allow, NSX version is 6.2.x, and traffic is somehow still blocked

There are several things we can check to troubleshoot NSX DFW. I would normally start with the following steps:
1. Check logs
Check syslog (dfwpktlogs), Traceflow or Flow Monitoring. See which DFW rule the traffic is hitting: the default rule (whether it is set to Deny or Allow) or maybe another rule.

2. NSX SpoofGuard
Verify from the SpoofGuard menu whether NSX SpoofGuard is detecting the VM's IP address

3. Check VMware tools
DFW uses VMware Tools to associate a VM and its vNICs with IP addresses. If VMware Tools is not installed on a VM, its IP address is not learned.
If for some reason you cannot install VMware Tools, starting with NSX 6.2 you can use DHCP or ARP snooping for NSX to detect the VM's IP addresses.
The IP detection method can be changed at the cluster level.
If it's only a small number of VMs, you can also enter the IP addresses manually in the SpoofGuard menu and put them under Approved IP.
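
Why an unlearned IP address leads to Scenario 1 can be illustrated with a small Python model. This is a deliberately simplified sketch, not NSX's actual implementation: it assumes rules are evaluated top-down with first match winning, and that a security group only matches the IP addresses learned for its member VMs. The VM names, the group name and rule ID 1005 are hypothetical; 1001 is the default rule from Scenario 1.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Rule:
    rule_id: int
    source_group: Optional[str]  # None means "any source"
    action: str                  # "ALLOW" or "DENY"


def group_ips(members: List[str], learned_ips: Dict[str, str]) -> Set[str]:
    """A group only contains the IPs that were actually learned for its member VMs."""
    return {learned_ips[vm] for vm in members if vm in learned_ips}


def first_match(src_ip: str, rules: List[Rule], groups: Dict[str, Set[str]]) -> Rule:
    """Walk the rules top-down and return the first one whose source matches."""
    for rule in rules:
        if rule.source_group is None or src_ip in groups[rule.source_group]:
            return rule
    raise LookupError("no rule matched; a default rule should always be last")


# Hypothetical inventory: web-02 has no VMware Tools, so no IP was learned for it.
learned = {"web-01": "10.0.0.11"}
groups = {"SG-Web": group_ips(["web-01", "web-02"], learned)}
rules = [
    Rule(1005, "SG-Web", "ALLOW"),  # hypothetical Allow rule above the default rule
    Rule(1001, None, "DENY"),       # default rule set to Deny
]

hit_known = first_match("10.0.0.11", rules, groups)    # traffic from web-01
hit_unknown = first_match("10.0.0.12", rules, groups)  # traffic from web-02
print(hit_known.rule_id, hit_known.action)
print(hit_unknown.rule_id, hit_unknown.action)
```

In this model, traffic from the VM whose IP was never learned skips the Allow rule and lands on the default Deny rule, even though the VM is a member of the group.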

4. Open VM Tools (OVT)
Some OS vendors/virtual appliance vendors may use Open VM Tools that ship together with the product.
Unfortunately, Open VM Tools has not been validated with NSX DFW, as per the NSX docs:
"Running open VMware Tools on guest or workload virtual machines has not been validated with distributed firewall."
So NSX may not be able to retrieve the IP address using Open VM Tools

5. vRA Integration & NSX App Isolation
When you are integrating NSX DFW with vRA and using App Isolation in a vRA Blueprint, make sure to change the Service Composer "Applied To" behavior from its default (applied to DFW) to applied to the Policy's Security Groups

The "Applied To" behavior must be changed to apply to the Policy's SGs to allow overlapping IPs with App Isolation and to make sure that NSX applies rules only on the VMs that are part of the SGs. We can change this option from the Service Composer settings in NSX 6.2 and later. Prior to NSX 6.2, an API call through vRO must be used. You can find the details in the NSX & vRA Micro-segmentation Tech Guide

NSX introduced ALG (Application Level Gateway) in NSX 6.2
NSX DFW supports ALG for these protocols: FTP, CIFS, ORACLE TNS, MS-RPC, SUN-RPC, and TFTP
I have a customer with an application that runs on TCP 69, the same port number as TFTP, and as soon as we installed NSX DFW it blocked the application, even though the default rule was still Allow. After we opened a Support Request with VMware GSS, the support engineer confirmed that there is a known issue with the NSX ALG in NSX 6.2.3/6.2.4.
NSX 6.2.3 introduced the TFTP ALG, and the ALG engine somehow captures this TCP 69 traffic. After checking the traffic, it drops it because it is not TFTP traffic.
This issue is also mentioned on VMTN. Hopefully it will be fixed in the next release, NSX 6.2.5

6. KB 2125437 - Troubleshooting NSX
Check this KB: Troubleshooting NSX for vSphere 6.x Distributed Firewall (DFW) (2125437)

If everything above looks good but DFW is still blocking your legitimate traffic, you may want to open a Support Request with VMware GSS.

Sunday, September 13, 2015

Mini VMware vSphere 6 Homelab on VSAN with Shuttle DS81

Finally I have a homelab to support my VCAP-DCA study and my job. My current work often requires me to test a complex upgrade or a complex setup. VMware Hands-on Labs and VMware Product Walkthroughs are great for learning how to configure specific features, but not all products and features are available, and the labs do not provide a walkthrough from scratch, for example installing an ESXi host, vCenter Server, etc.

It was not easy, because I needed to submit and present the plan to my CFO, my wife. WAF (Wife Acceptance Factor) is one of the design factors that needs to be considered, and there are also some technical constraints. I can only submit a Purchase Order after the BoQ has been approved by the CFO.
- Total budget is $2,000 USD
- Low wattage. The apartment's maximum is 1,300W, shared with a fridge, aircons, TV, lights, etc. We are looking at a lab that draws up to 200W, 300W at most.
- Compact, small form factor. The apartment is only 33m2, so there is not much free space left.

Looking at the list above, I was not sure I could get a homelab that meets these requirements and constraints, so I looked for a cloud service that could provide a lab with a bunch of VMs and nested ESXi. I found Ravello Systems, but I was not sure I could rely on online labs and still preferred a physical lab.
I was visiting Jason Langer's blog and read his post on replacing a homelab with Ravello Systems.
I was interested in the physical lab designs based on Micro-ATX w/ a Lian-Li case and on the Intel NUC. The Micro-ATX based lab can provide 32GB RAM per host, but the power supply is 400W, so I would need to go with the Intel NUC w/ 16GB RAM.

I was planning to get a small 4-bay storage unit for SOHO/SMB like the Synology DS415+, but it is quite expensive. The lowest-priced 4-bay alternative from Synology is the DS414slim, but that is still quite expensive and I was not sure the total budget would be sufficient. So I decided to go with VMware VSAN.

To build a VSAN with NUCs, I would need a NUC that supports multiple disks (1 SSD and 1 HDD). There are a lot of NUC models, and I wanted something that can run vSphere 6. Florian Grehl has an article on an ESXi 6.0 image that works with the NUC. Most local electronics/IT shops here sell the NUC as a complete set, in which case the bundled memory and disks would be useless, since I would replace them with 16GB RAM and at least a 120GB SSD + 1TB HDD. A NUC with Intel vPro is cool because it provides KVM Remote Control. I agree with Mike Tabor, who pointed out in his article "Intel NUC i5 5th Generation an ESXi lab improvement" that the best fit would be the NUC5i5MYHE: 5th Gen processor, 2 internal disks, and Intel vPro. So I contacted a local Intel distributor to ask for a BoQ on the NUC5i5MYHE / D54250WYKH2 / NUC5i5RYH. They only sell Intel products, so I would need to buy memory, disks, etc. from different shops and build my own. The total cost was above $2K, and purchasing from different shops is quite troublesome: one of the risks is that warranty and support are separate, and there could be compatibility issues if I don't pick the correct brand and model.

I looked for alternatives to the Intel NUC, and the Shuttle DS81 looked promising.
Below are some articles that cover the Shuttle DS81 for homelabs:
Ultrasmall computers for your VMware lab - Intel NUC and Shuttle DS81 preview
The Perfect vSphere 6 Home Lab | Ryan Birk – Virtual Insanity
Build a new home lab VMGuru
Building a ESXi 5.5 Server with the Shuttle DS81

It can run vSphere 6; you only need to inject the Realtek 8111G VIB driver into the ESXi 6 image with PowerCLI 6 or with the v-Front ESXi-Customizer by Andreas Peetz. The driver can be found via the articles on the Shuttle DS81 for homelabs listed above. Although it is slightly bigger than the Intel NUC, the Shuttle DS81 has the advantages of dual Gigabit Ethernet NICs and a higher processor speed than the NUCs. I decided to order 3 x Shuttle DS81 from a local distributor with the following hardware specifications:
Intel® Core™ i3-4170 Processor (3M Cache, 3.70 GHz)
Kingston 8GB 1600MHz DDR3 (PC3-12800) SODIMM Memory
Samsung 850 EVO 120 GB mSATA 2-Inch SSD
HGST Travelstar 7K1000 2.5-Inch 1TB 7200 RPM SATA III 32MB Cache Internal Hard Drive
SanDisk 16GB Class 4 SDHC Memory Card

For the switch, I started with a TP-Link 8-Port Gigabit Easy Smart Switch TL-SG108E. It supports VLANs, IGMP Snooping and Link Aggregation, and it is low wattage. I also bought a small UPS/AVR, the CyberPower BU600E. I used an energy/watt meter to validate all of the above, as the CFO would very much like to validate it herself. For all 3 Shuttle DS81 nodes + the 8-Port GbE TP-Link switch + UPS, the total draw turned out to be only between ~65W and 160W :)

To install ESXi 6 on a USB/SD card, we can either install directly to the USB/SD card as the destination (with Workstation/Fusion) or use it as the source installer. You can read the details in Vladan's post. To create a bootable ESXi 6 ISO that automatically uses a static IP address when the custom ISO first boots, we can use a ks.cfg; read more about it in William Lam's post, or you can also create a kickstart for VSAN. I'm using an SD card as the ESXi boot device, but unfortunately VMware Fusion cannot detect the Mac's internal SD card reader, so I needed to create an ESXi installer on USB and then install to the SD card from the USB. Once the ESXi hosts were ready, I could not install vCenter because there was no VMFS datastore, and we need a vCenter Server to configure VSAN. There's an article on how to bootstrap a VCSA onto a single VSAN node. Since I'm using a Mac, I cannot use the Client Integration Plug-In to deploy VCSA and need to use the vcsa-cli-installer. To install using vcsa-cli, read Romain Decker's article.

At the moment, I have only installed VCSA 6.0u1, with Windows 2003 as AD, DNS and DHCP. I used Win 2003 because of its small size and low hardware requirements. With 3 nodes, each with a 120GB SSD & 1TB HDD, I get a 2.7TB VSAN datastore in total, as in the screenshot below.
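
As a rough sanity check on the 2.7TB figure, here is a small Python calculation. It rests on two assumptions that are not confirmed above: that the datastore size is reported in binary units (TiB) while the drives are sold in decimal terabytes, and that only the HDDs count towards capacity because the SSD in each node serves as the cache tier. Any VSAN on-disk overhead is ignored.

```python
NODES = 3
HDD_BYTES_PER_NODE = 1_000_000_000_000  # one "1TB" drive in decimal bytes

# Assumption: the SSDs act as cache only, so just the HDDs add capacity.
raw_bytes = NODES * HDD_BYTES_PER_NODE

# Assumption: the reported size uses binary units (1 TiB = 2**40 bytes).
raw_tib = raw_bytes / 2**40

print(f"{raw_tib:.2f} TiB of raw capacity")
```

Under those assumptions, three 1TB drives come to a little over 2.7 in binary units, which lines up with the datastore size above.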

I'm planning to continue by installing NSX 6.2, following Thomas Beaumont's blog post.