They just installed NSX DFW and then found that the DFW rules do not work as expected and seem to be dropping legitimate traffic.
Scenario 1: The default rule (Rule ID 1001) is set to Deny. There are Allow rules above it, but the traffic does not hit them.
Scenario 2: vRA (vRealize Automation) is integrated with NSX. The vRA Blueprint has the App Isolation policy enabled, and overlapping IP addresses are in use.
Scenario 3: The default rule is set to Allow and the NSX version is 6.2.x, yet traffic is still blocked.
There are several points to check when troubleshooting NSX DFW. I would normally start with the following steps:
1. Check logs
Check syslog (dfwpktlogs), Traceflow, or Flow Monitoring. See whether a DFW rule is blocking the traffic, and whether that rule is the default rule (set to Deny or Allow) or another rule.
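To make the log check less manual, here is a minimal Python sketch that counts blocked packets per rule ID from dfwpktlogs lines. It assumes each line looks like the sample in the NSX 6.2 documentation (`<timestamp> INET match DROP domain-c7/1002 IN 242 UDP 192.168.110.10/138->192.168.110.255/138`); other NSX builds may add or reorder fields, so treat the regular expression as a starting point. The names `parse_line` and `drops_by_rule` are hypothetical helpers, not part of any NSX tooling.

```python
import re
from collections import Counter

# Assumed line shape (taken from the NSX 6.2 documentation sample):
#   <timestamp> INET match DROP domain-c7/1002 IN 242 UDP 1.1.1.1/138->2.2.2.2/138
# Trailing tokens such as TCP flags are tolerated and ignored.
LINE_RE = re.compile(
    r"(?P<af>INET6?)\s+(?P<reason>\S+)\s+(?P<action>[A-Z]+)\s+"
    r"(?P<ruleset>[^/\s]+)/(?P<rule_id>\d+)\s+(?P<direction>IN|OUT)\s+"
    r"(?P<length>\d+)\s+(?P<proto>\S+)\s+(?P<src>\S+?)->(?P<dst>\S+)"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.search(line)
    return match.groupdict() if match else None


def drops_by_rule(lines):
    """Count DROP and REJECT entries per rule ID, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields and fields["action"] in ("DROP", "REJECT"):
            counts[fields["rule_id"]] += 1
    return counts
```

Feeding it the lines of the host's dfwpktlogs file and looking at the most common rule IDs shows quickly whether the drops come from the default rule (1001 in Scenario 1) or from a rule you did not expect.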
2. NSX SpoofGuard
Verify from the SpoofGuard menu whether NSX SpoofGuard is detecting the VM's IP address.
3. Check VMware tools
DFW uses VMware Tools to associate a VM and its vNICs with IP addresses. If VMware Tools is not installed on a VM, its IP address is not learned.
If for some reason you cannot install VMware Tools, starting with NSX 6.2 you can use DHCP snooping or ARP snooping so that NSX can detect the VMs' IP addresses.
The IP detection method can be changed at the cluster level.
If it is only a small number of VMs, you can also enter the IP addresses manually in the SpoofGuard menu, under Approved IP.
4. Open VM Tools (OVT)
Some OS vendors and virtual appliance vendors use Open VM Tools, shipped together with the product.
Unfortunately, Open VM Tools has not been validated with NSX DFW, as per the NSX docs:
"Running open VMware Tools on guest or workload virtual machines has not been validated with distributed firewall."
So NSX may not be able to retrieve the IP address through Open VM Tools.
5. vRA Integration & NSX App Isolation
When you integrate NSX DFW with vRA and use App Isolation in a vRA Blueprint, make sure to change the Service Composer "Applied To" setting from its default, Distributed Firewall, to the Policy's Security Groups.
6. NSX ALG
NSX introduced ALG (Application Level Gateway) in NSX 6.2: http://blogs.vmware.com/networkvirtualization/2015/11/distributed-firewall-alg.html#.WFJW9FzHVwM
NSX DFW supports ALG for these protocols: FTP, CIFS, ORACLE TNS, MS-RPC, SUN-RPC, and TFTP
I have a customer with an application that runs on TCP 69, the same port number as TFTP. As soon as we installed NSX DFW, it blocked the application even though the default rule was still Allow. After we opened a Support Request with VMware GSS, the support engineer confirmed that there is a known issue with NSX ALG in NSX 6.2.3/6.2.4.
NSX 6.2.3 introduced the TFTP ALG, and the ALG engine somehow captures this TCP 69 traffic. After inspecting it, the engine drops the traffic because it is not TFTP traffic.
This issue is also mentioned on VMTN: https://communities.vmware.com/message/2626001#2626001. Hopefully it will be fixed in the next release, NSX 6.2.5.
7. KB 2125437 - Troubleshooting NSX
Check this KB: Troubleshooting NSX for vSphere 6.x Distributed Firewall (DFW) (2125437) https://kb.vmware.com/kb/2125437
If everything above looks good but DFW is still blocking your legitimate traffic, you may want to open a Support Request with VMware GSS.